User:PrimeBOT/Task 17

From Wikipedia, the free encyclopedia

Status and updates for Task 17

List of params

Bugs to fix/patches to make

  • Does parameter order matter? Found a few instances where &a=___?b=___ worked but &b=___?a=___ did not
  • Avoid removing a closing --> when it sits directly against the end of the URL
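
The --> item can be reproduced with a short Python sketch. It assumes Python's re module behaves like the bot's regex engine for this pattern (the page does not say which engine the bot uses), and the sample wikitext is hypothetical. The pattern is the 8 June revision from the "Regex updates" list, copied verbatim.

```python
import re

# 8 June pattern from the "Regex updates" list, copied verbatim.
UTM_8_JUNE = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)

# Hypothetical wikitext: a commented-out URL with --> stuck to its end.
sample = "<!-- https://example.com/a?utm_source=x--> text"

# "-" and ">" are not stop characters, so the value part runs through "-->".
cleaned = UTM_8_JUNE.sub("", sample)
print(cleaned)  # <!-- https://example.com/a text
```

One possible fix, not taken from the source, would be to treat the "-->" sequence as another end-of-URL stop in the lookahead.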

Regex updates

because these things are boring

Original

  • \??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&
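
As a rough illustration of what the original pattern does, the sketch below applies it with Python's re module. The helper name strip_utm and the sample links are hypothetical, and the bot's actual regex engine may differ from Python's.

```python
import re

# Original pattern, copied verbatim from the list above.
ORIGINAL = re.compile(
    r"\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&"
)

def strip_utm(wikitext: str) -> str:
    """Remove every match of the original pattern (hypothetical helper)."""
    return ORIGINAL.sub("", wikitext)

# utm_ params make up the whole query string: the "?" goes too.
print(strip_utm("[https://example.com/a?utm_source=x&utm_medium=y Title]"))
# [https://example.com/a Title]

# utm_ param directly after "?" with another param behind it: "?" is kept.
print(strip_utm("[https://example.com/a?utm_source=x&id=5 Title]"))
# [https://example.com/a?id=5 Title]

# utm_ param between two other params: not matched by this version.
print(strip_utm("[https://example.com/a?id=5&utm_source=x&lang=en Title]"))
# [https://example.com/a?id=5&utm_source=x&lang=en Title]
```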

27 May (BRFA trial) - add green code to catch utm_ params in the middle, and catch more end-of-URL possibilities

  • \??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
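
A small Python sketch of the two behaviours this revision adds: the third alternative, anchored on a preceding &, and the } stop character. The helper name and sample wikitext are hypothetical, and Python's re is assumed to stand in for the bot's engine.

```python
import re

# 27 May pattern, copied verbatim from the list above.
MAY_27 = re.compile(
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)

def strip_utm_may_27(wikitext: str) -> str:
    """Remove every match of the 27 May pattern (hypothetical helper)."""
    return MAY_27.sub("", wikitext)

# A utm_ param in the middle is caught by the (?<=&) alternative.
print(strip_utm_may_27("[https://example.com/a?id=5&utm_source=x&lang=en Title]"))
# [https://example.com/a?id=5&lang=en Title]

# The lazy value plus "}" in the lookahead stops before a template's closing braces.
print(strip_utm_may_27("{{cite web|url=https://example.com/a?utm_source=x}}"))
# {{cite web|url=https://example.com/a}}
```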

7 June (catch ref tags) - add < to end-of-check exceptions

  • \??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
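
The effect of adding < to the lookahead can be sketched in Python as follows. The helper name and the sample reference are hypothetical, and Python's re is assumed to be close enough to the bot's engine for this case.

```python
import re

# 7 June pattern, copied verbatim from the list above.
JUNE_7 = re.compile(
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)

def strip_utm_june_7(wikitext: str) -> str:
    """Remove every match of the 7 June pattern (hypothetical helper)."""
    return JUNE_7.sub("", wikitext)

# The match now ends at "<", so the closing </ref> tag survives.
print(strip_utm_june_7("<ref>https://example.com/a?utm_source=x</ref>"))
# <ref>https://example.com/a</ref>
```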

8 June (catch malformed utm_ params) - utm_ must be followed by text and an =

  • \??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&
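
To see why the tighter name class matters, the Python sketch below runs the 7 June and 8 June patterns on the same malformed input. The sample template is hypothetical and Python's re is assumed to stand in for the bot's engine. In the older pattern the parameter name can run across a | into the next template field and borrow its = sign.

```python
import re

# 7 June pattern: the utm_ name may contain anything except "=" and whitespace.
JUNE_7 = re.compile(
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)

# 8 June pattern: the name may no longer contain | < } ] either.
JUNE_8 = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)

# Hypothetical malformed input: utm_source has no "=" of its own.
malformed = "{{cite web|url=https://example.com/a?utm_source|title=Foo}}"

print(JUNE_7.sub("", malformed))  # {{cite web|url=https://example.com/a}}
print(JUNE_8.sub("", malformed))  # unchanged: the title field is left alone

# A well-formed param is still removed by the 8 June pattern.
well_formed = "{{cite web|url=https://example.com/a?utm_source=x|title=Foo}}"
print(JUNE_8.sub("", well_formed))  # {{cite web|url=https://example.com/a|title=Foo}}
```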

10 June (avoid web archive links)

  • (?<!https://web.archive.org[\S]+)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
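
The leading lookbehind here is variable-width, which Python's built-in re module rejects because it only accepts fixed-width lookbehinds. The sketch below is therefore an approximation rather than the bot's pattern: it keeps the 8 June core and emulates the archive check in a replacement callback. The names strip_utm_skip_archive and ARCHIVE_BEFORE are hypothetical, and the dots in the host name are escaped, which the pattern above does not do.

```python
import re

# 8 June core, copied verbatim (the part inside the outer group above).
UTM_CORE = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)

# Stand-in for the lookbehind: the text before the match ends in an unbroken
# run of non-space characters that starts with the archive host.
ARCHIVE_BEFORE = re.compile(r"https://web\.archive\.org\S+\Z")

def strip_utm_skip_archive(wikitext: str) -> str:
    """Strip utm_ params except inside web.archive.org links (sketch)."""
    def replace(match):
        if ARCHIVE_BEFORE.search(wikitext[:match.start()]):
            return match.group(0)  # inside an archive link: keep as-is
        return ""
    return UTM_CORE.sub(replace, wikitext)

archived = "[https://web.archive.org/web/2017/https://example.com/a?utm_source=x Title]"
print(strip_utm_skip_archive(archived) == archived)  # True

print(strip_utm_skip_archive("[https://example.com/a?utm_source=x Title]"))
# [https://example.com/a Title]
```

The callback slices the text on every match, which is fine for a page-sized string but is not tuned for speed.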

1 July (avoid utm_ strings just hanging out in text)

  • (?<!https://web.archive.org[\S]+|\||\s)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
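
The two alternatives added to the lookbehind in this revision (a preceding | or whitespace) are both one character wide, so that part can be tried in Python's re on its own. The sketch below is a partial version under that assumption: it leaves out the variable-width web.archive.org alternative, which Python's re does not accept, and the sample strings are hypothetical.

```python
import re

# Partial 1 July pattern: only the fixed-width lookbehind alternatives
# (preceded by "|" or whitespace) in front of the 8 June core.
JULY_1_PARTIAL = re.compile(
    r"(?<![|\s])("
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r")"
)

# A utm_ string in running prose follows a space, so it is left alone.
prose = "The utm_source=foo parameter is for tracking."
print(JULY_1_PARTIAL.sub("", prose) == prose)  # True

# Inside a URL the match starts at "?", which follows a URL character.
print(JULY_1_PARTIAL.sub("", "[https://example.com/a?utm_source=x Title]"))
# [https://example.com/a Title]
```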