Wikipedia:Edit filter/Instructions

From Wikipedia, the free encyclopedia

Creating a filter

This section explains how to create a filter and conduct some preliminary testing, so that you don't flood the history page.

  • Test individual expressions on the debugging tools page. For example, evaluate 'some string' rlike 'myregexp' to test your regexp: a true expression evaluates to 1, and a false one shows nothing.
  • Find someone who recently made an edit that you're trying to target, add that account's username or user IP address to the "Changes by user" text field, and click on "Test".
If you don't see positive trigger hits:
  • Tick the "show changes that do not match the filter" checkbox to enable the setting, and click "Test" again.
  • Find the edit that you targeted and click on "(details)". Check the variables: are they the values you expected?
  • Return to the debugging tools page to troubleshoot your code, if needed.
  • Create an "idle" (logging only) edit filter.
  • In the notes field, add a description such as "Testing phase, will add a warning".
  • Let the idle filter run for a while to test for hits that are false positives, or misses that are false negatives.
  • Post a message on the edit filters' noticeboard, so that other edit filter managers can have a chance to examine the filter, post feedback and suggestions, or improve the code themselves.
  • Finally, after you have performed extensive testing and are certain that the filter will not cause mass unexpected disruption or flood the edit filter log with erroneous entries and actions, you can fully enable your filter by adding a warning, disallow action, or tag.
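The check for false positives and false negatives described above can also be rehearsed offline on a few hand-labelled samples before an idle filter is created. The Python sketch below is illustrative only: the pattern, the sample strings, and the helper name evaluate_pattern are hypothetical, and Python's re module is assumed to behave like the regex engine behind rlike for simple patterns, which should be confirmed on the debugging tools page.

```python
import re

def evaluate_pattern(pattern, samples):
    """Count hits, false positives and false negatives for a regex.

    `samples` is a list of (text, should_match) pairs labelled by hand.
    """
    regex = re.compile(pattern)
    result = {"hits": 0, "false_positives": [], "false_negatives": []}
    for text, should_match in samples:
        matched = regex.search(text) is not None
        if matched and should_match:
            result["hits"] += 1
        elif matched and not should_match:
            result["false_positives"].append(text)
        elif not matched and should_match:
            result["false_negatives"].append(text)
    return result

# Hypothetical pattern and hand-labelled sample additions.
PATTERN = r"\bbuy cheap \w+ online\b"
SAMPLES = [
    ("buy cheap watches online", True),
    ("Buy Cheap Watches Online", True),  # missed: the pattern is case-sensitive
    ("where to buy cheap textbooks online was debated", False),  # wrongly caught
    ("The town was founded in 1820.", False),
]

report = evaluate_pattern(PATTERN, SAMPLES)
print(report)
```

A non-empty false_positives or false_negatives list suggests the pattern needs another pass before the filter is given any action beyond logging.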

Controlling efficiency

Because these filters are run on every single edit, a poorly written filter can severely slow down editing or even cause some larger pages to time out. However, very minor changes in how the conditions are ordered can greatly decrease the running time of a filter. Making use of the order of operations in this way can make the difference between a good filter and one that must be disabled for performance reasons.

Order of operations

Operations are generally performed left to right, but there is an order in which they are resolved. As soon as one of a filter's conditions fails, evaluation of the remaining conditions stops (due to short-circuit evaluation) and the software moves on to the next filter. The evaluation order is:

  1. Anything surrounded by parentheses, ( and ), is evaluated as a single unit.
  2. Turning variables/literals into their respective data (e.g., page_namespace to 0)
  3. Function calls (norm, lcase, etc.)
  4. Unary + and - (defining positive or negative value, e.g. -1234, +1234)
  5. Keywords (rlike, in, contains, etc.)
  6. Boolean inversion (!x)
  7. Exponentiation (2**3 → 8)
  8. Multiplication-related (multiplication, division, modulo)
  9. Addition and subtraction (3-2 → 1)
  10. Comparisons (<, >, ==)
  11. Boolean operations (&, |, ^)
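The short-circuit behaviour described above can be modelled in a few lines of Python. This is a sketch of the idea and not the filter engine itself: the helper run_filter and the sample edit record are hypothetical, and the dictionary keys merely mirror filter variable names.

```python
def run_filter(conditions, edit):
    """Evaluate (name, check) pairs left to right, stopping at the first failure.

    Returns (matched, names_of_conditions_actually_evaluated).
    """
    evaluated = []
    for name, check in conditions:
        evaluated.append(name)
        if not check(edit):
            return False, evaluated  # short-circuit: remaining checks are skipped
    return True, evaluated

# Hypothetical edit record; the keys mirror filter variable names.
edit = {"page_namespace": 1, "user_editcount": 3, "added_lines": "some new text"}

conditions = [
    ("namespace", lambda e: e["page_namespace"] == 0),
    ("editcount", lambda e: e["user_editcount"] < 50),
    ("content", lambda e: "new text" in e["added_lines"]),
]

matched, evaluated = run_filter(conditions, edit)
print(matched, evaluated)  # False ['namespace']
```

Because the namespace check fails first, the edit count and content checks are never run for this edit.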

Making expensive operations cheaper

When using keywords such as rlike, in, or contains, the filter must scan the entire string variable for the text you are searching for. Variables such as old_wikitext tend to be very large. You can often approximate them with smaller variables such as added_lines or removed_lines, which the filter can process much faster. A check on old_size can also ensure that the filter never attempts to scan a large block of wikitext.
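To see why the smaller variables help, the following Python sketch compares the amount of text a search has to scan. It rests on several assumptions: difflib only stands in for the added_lines variable that the software already provides to filters, the size is measured in characters for simplicity, and the cutoff, the pattern, and the helper names are hypothetical.

```python
import difflib
import re

def added_lines(old_text, new_text):
    """Return only the lines that the edit added or changed, joined by newlines."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return "\n".join(line[2:] for line in diff if line.startswith("+ "))

MAX_OLD_SIZE = 200_000  # hypothetical cutoff, in characters
PATTERN = re.compile(r"example spam phrase")  # hypothetical pattern

def edit_matches(old_text, new_text):
    # Cheap size check first: very large pages are never scanned at all.
    if len(old_text) > MAX_OLD_SIZE:
        return False
    # Search only the added text, not the whole page.
    return PATTERN.search(added_lines(old_text, new_text)) is not None

old = "\n".join(f"Paragraph {i} of a long article." for i in range(200))
new = old + "\nexample spam phrase here"

added = added_lines(old, new)
print(len(new), len(added))      # the added text is far smaller than the page
print(edit_matches(old, new))
```

The regex here runs over one short line instead of the whole page, and pages above the size cutoff are rejected before any text is examined.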

Always order the conditions in a filter so that the one that will knock out the largest number of edits comes first. Usually this is a user_groups or user_editcount check; in general, the last condition should be the regex that actually looks for the sort of vandalism you are targeting.
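The effect of condition order can be illustrated by counting how often the expensive regex actually runs over a batch of edits. The Python sketch below is a toy model under stated assumptions: the sample edits, the edit count threshold of 10, the pattern, and the helper count_regex_runs are all hypothetical, and Python's and operator stands in for the filter's short-circuiting &.

```python
import re

VANDAL_RE = re.compile(r"(?i)\bexample vandal phrase\b")  # hypothetical pattern

def count_regex_runs(edits, regex_first):
    """Return (hits, regex_runs) for the chosen condition order."""
    regex_runs = 0
    hits = 0
    for edit in edits:
        def regex_check():
            nonlocal regex_runs
            regex_runs += 1  # count every time the expensive check runs
            return VANDAL_RE.search(edit["added_lines"]) is not None

        def editcount_check():
            return edit["user_editcount"] < 10

        if regex_first:
            matched = regex_check() and editcount_check()
        else:
            matched = editcount_check() and regex_check()
        hits += matched
    return hits, regex_runs

# Hypothetical sample: 95 edits by experienced users, 5 by new users.
edits = (
    [{"user_editcount": 5000, "added_lines": "routine copyedit"}] * 95
    + [{"user_editcount": 2, "added_lines": "routine copyedit"}] * 3
    + [{"user_editcount": 2, "added_lines": "Example vandal phrase!"}] * 2
)

print(count_regex_runs(edits, regex_first=True))   # (2, 100)
print(count_regex_runs(edits, regex_first=False))  # (2, 5)
```

Both orders catch the same two edits, but putting the edit count check first means the regex runs on 5 edits in this sample instead of 100.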