Wikipedia:Edit filter/Instructions
Edit filters are very powerful tools - do not underestimate the damage errors can cause. If you're new to this, make sure you get appropriate input from editors more experienced with edit filters.
Creating a filter
This section explains how to create a filter and conduct some preliminary testing, so that you don't flood the history page.
- Read the filter rules documentation at mw:Extension:AbuseFilter/Rules format
- Test some regular expressions with the debugging tools:
- For example, evaluate 'some string' rlike 'myregexp' to test your regexp: a true expression evaluates to 1, and a false one shows nothing.
- Manually test your code on the edit filter batch testing page:
- Find someone who recently made an edit that you're trying to target, add that account's username or user IP address to the "Changes by user" text field, and click on "Test".
- If you don't see positive trigger hits:
- Tick the "show changes that do not match the filter" checkbox to enable the setting, and click "Test" again.
- Find the edit that you targeted and click on "(details)". Check the variables: are they the values you expected?
- Return to the debugging tools page to troubleshoot your code, if needed.
- Create an "idle" (logging only) edit filter.
- In the notes field, add a description such as "Testing phase, will add a warning".
- Let the idle filter run for a while to test for hits that are false positives, or misses that are false negatives.
- Post a message on the edit filters' noticeboard, so that other edit filter managers can have a chance to examine the filter, post feedback and suggestions, or improve the code themselves.
- Finally, after you have performed extensive testing and are certain that the filter will not cause mass unexpected disruption or flood the edit filter log with erroneous entries and actions, you can fully enable your filter by adding a warning, disallow action, or tag.
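The regular-expression check from the steps above can also be rehearsed offline before it goes into the debugging tools. The following is a minimal Python sketch, not part of the edit filter itself: the helper name rlike and the sample strings are hypothetical, and Python's re module is only an approximation of the regex engine that the filter's rlike keyword uses, so any pattern should still be confirmed in the debugging tools.

```python
import re

def rlike(text, pattern):
    """Rough stand-in for `text rlike pattern`: True if the pattern matches anywhere."""
    return re.search(pattern, text) is not None

pattern = r"myregexp"  # placeholder pattern taken from the step above

# Hypothetical samples: text the filter should catch, and text it should leave alone.
should_match = ["some myregexp string"]
should_not_match = ["some string"]

# Misses are false negatives; unwanted hits are false positives.
false_negatives = [s for s in should_match if not rlike(s, pattern)]
false_positives = [s for s in should_not_match if rlike(s, pattern)]
print(false_negatives, false_positives)
```

Both lists being empty suggests the pattern behaves as intended on these samples; it says nothing about real edits, which is what the batch testing page and the idle filter are for.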
Controlling efficiency
Because these filters are run on every single edit, a poorly worded filter has the strong potential to severely slow down editing or even cause some larger pages to time out. However, some very minor changes in how the conditions are ordered can greatly decrease the running time of the filters. Making use of the order of operations in this way can make the difference between a good filter and one that must be disabled for performance reasons.
Order of operations
Operations are generally evaluated left to right, but operators are resolved in a fixed order of precedence. As soon as the filter fails one of the conditions, it stops checking the rest of them (due to short-circuit evaluation) and moves on to the next filter. The evaluation order is:
- Anything surrounded by parentheses (( and )) is evaluated as a single unit.
- Turning variables/literals into their respective data (i.e., article_namespace to 0).
- Function calls (norm, lcase, etc.).
- Unary + and - (defining positive or negative value, e.g. -1234, +1234).
- Keywords.
- Boolean inversion (!x).
- Exponentiation (2**3 → 8).
- Multiplication-related (multiplication, division, modulo).
- Addition and subtraction (3-2 → 1).
- Comparisons (<, >, ==).
- Boolean operations (&, |, ^, in).
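The short-circuit behaviour described above can be illustrated with a small Python sketch. This is an analogy, not filter syntax: Python's and stands in for the filter's &, the check helper and the sample edit are hypothetical, and the variable names simply mirror filter variables.

```python
# Records the name of every condition that actually gets evaluated.
evaluated = []

def check(name, result):
    """Record that a condition ran, then return its result."""
    evaluated.append(name)
    return result

def filter_matches(edit):
    # Python's `and` short-circuits, standing in for `&` in a filter:
    # once one condition is false, the remaining ones are never evaluated.
    return (
        check("editcount", edit["user_editcount"] < 50)
        and check("namespace", edit["page_namespace"] == 0)
        and check("text", "some string" in edit["added_lines"])
    )

# Hypothetical edit by an experienced user: the first condition already fails.
experienced_edit = {
    "user_editcount": 12000,
    "page_namespace": 0,
    "added_lines": "some string",
}
matched = filter_matches(experienced_edit)
print(matched, evaluated)  # False ['editcount']
```

Only the first condition appears in the record, so the namespace and text checks cost nothing for this edit.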
Making expensive operations cheaper
When using keywords such as rlike, in, or contains, the filter must go through the entire string variable to look for the string you're searching for. Variables such as old_wikitext tend to be very large. Sometimes you will be able to approximate these variables by using smaller ones such as added_lines or removed_lines, which the filter can process much faster. Checking old_size can also help ensure that the filter does not even attempt to check a large block of wikitext.
You should always order a filter's conditions so that the condition that will knock out the largest number of edits comes first. Usually this is a user group or user edit count check; in general, the last condition should be the regex that is actually looking for the sort of vandalism you're targeting.
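The effect of condition ordering can be sketched in Python by counting how often the expensive regex check actually runs. Everything here is an illustrative assumption: the function count_regex_runs, the sample edits, and the threshold of 50 edits are hypothetical, and Python's and is used as a stand-in for the filter's short-circuiting &.

```python
import re

PATTERN = re.compile(r"myregexp")  # placeholder pattern

def count_regex_runs(edits, cheap_first):
    """Return (matches, regex_runs) for one ordering of the same two conditions."""
    runs = 0
    matches = 0
    for edit in edits:
        def text_check():
            # Expensive condition: scans the added text with a regex.
            nonlocal runs
            runs += 1
            return PATTERN.search(edit["added_lines"]) is not None

        def cheap_check():
            # Cheap condition: a single integer comparison.
            return edit["user_editcount"] < 50

        if cheap_first:
            hit = cheap_check() and text_check()
        else:
            hit = text_check() and cheap_check()
        matches += hit
    return matches, runs

# Hypothetical sample: 98 edits by experienced users, 2 by new users.
edits = (
    [{"user_editcount": 5000, "added_lines": "a long, harmless addition"}] * 98
    + [
        {"user_editcount": 3, "added_lines": "contains myregexp"},
        {"user_editcount": 7, "added_lines": "harmless"},
    ]
)

cheap_first_result = count_regex_runs(edits, cheap_first=True)
regex_first_result = count_regex_runs(edits, cheap_first=False)
print(cheap_first_result)  # (1, 2)
print(regex_first_result)  # (1, 100)
```

Both orderings flag the same single edit, but putting the edit count check first means the regex runs on 2 edits instead of 100.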