User:TedderBot/NewPageSearch

From Wikipedia, the free encyclopedia

New Pages Patrol is an ideal place to assess articles and find new contributors. However, it is a busy place to patrol, even with the patrolled flag. User:AlexNewArtBot made this easier by providing new-page content divided by subject area.

When that bot and its operator vanished in 2011, I coded a replacement. The code is written in Java with User:MER-C's Wikimedia API, and the source code is posted to GitHub with a reuse-friendly license. That way, if I'm hit by a bus, another user can get it running quickly.

Do you want to see new features?

Please post them on the talk page. I want to control how features are added to the following lists. If you can help clean up this page, feel free to do so.

Things I need help with

  • "search query clerk" - if you understand the search queries (or even some of them) and want to help maintain and fix queries, let me know.
  • graphical person - I need a mascot/logo. This shed deserves to be painted.
  • documentation - I'm terrible with wording. I need help explaining what this bot does and help explaining what sparklines are.

Specification notes

  • The definition of a lede, for doubling points, is narrowly construed: it is from the beginning of the page until two newlines or the beginning of a section ("=="). Effectively, it's limited to infoboxes, cleanup tags, and the first paragraph.
  • Can't process the \p{charset} thing. TODO: explain.
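
The lede rule above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the bot's actual code: the class name `LedeSketch`, the methods `extractLede` and `score`, and the exact scoring behavior (double points for a match in the lede, single points for a match elsewhere) are all hypothetical. A section start is approximated as "==" at the beginning of a line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and scoring details are assumptions, not the bot's real code.
final class LedeSketch {
    // The lede ends at the first blank line ("\n\n") or at a line starting with "==".
    private static final Pattern LEDE_END =
            Pattern.compile("\\n\\n|^==", Pattern.MULTILINE);

    private LedeSketch() {}

    /** Returns the text from the start of the page up to the first lede terminator. */
    static String extractLede(String wikitext) {
        Matcher m = LEDE_END.matcher(wikitext);
        // If no terminator is found, the whole page counts as the lede.
        return m.find() ? wikitext.substring(0, m.start()) : wikitext;
    }

    /** Assumed scoring: doubled points for a lede match, plain points elsewhere, 0 if no match. */
    static int score(Pattern rule, int points, String wikitext) {
        if (rule.matcher(extractLede(wikitext)).find()) {
            return 2 * points;
        }
        return rule.matcher(wikitext).find() ? points : 0;
    }
}
```

With this sketch, a rule worth 10 points scores 20 when it matches in the first paragraph and 10 when it matches only later in the page.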

Tasks

To implement

  • Leave annotations alone on search result page, both before and after the search text (before for User:Dudemanfellabra to mark 'unrelated', after for User:Nthep, [1])
  • Self-document search pages. Have a person or project as owner? (for User:SunCreator)
  • RevisionID of article seen (since it is cached)
  • Lazy load rules
  • article title in cached text: example
  • Configuration to turn archives off (for User:SunCreator)
  • Configuration to turn infobox parsing off. (for User:Acroterion and WP:WPARCH)

Completed

  •  (20110518) Invert output so the newest result is at the top
  •  (20110518) Move order of processing so a given ruleset is processed and posts, then the next ruleset is processed
  •  (20110518) Maintain state of each ruleset independently, start with ruleset+1
  •  (20110519) Respect bot flag
  •  (20110520) Only run on a given rule if necessary (more than 24 hours since last run)
  •  (20110520) Logging, not stdout
  •  (20110521) Detect pages removed and put them in the archive
  •  (20110523) Log page: User:AlexNewArtBot/ShipsLog (on errors page: User:TedderBot/NewPageSearch/Ships/errors)
  •  (20110602) Turned off caching for fetching rules pages
  •  (20110602) Added count of inhibitors to search logs
  •  (20110602) Bug: inhibit/excludes not working correctly? (for User:Lionelt, example is Neil McAuley on User:AlexNewArtBot/Conservatism with inhibitor "right wing back")
  •  (20110623) RevisionID of the ruleset when loaded (for User:SunCreator)
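
As an illustration of the "only run if more than 24 hours since last run" item, here is a minimal Java sketch. The class name `RunScheduleSketch`, the method `isDue`, and the choice to treat a ruleset with no recorded run as due are assumptions made for the example; the bot's real bookkeeping may differ.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; names and the null-handling choice are assumptions.
final class RunScheduleSketch {
    // Minimum gap between two runs of the same ruleset.
    static final Duration MIN_INTERVAL = Duration.ofHours(24);

    private RunScheduleSketch() {}

    /** Returns true if the ruleset has never run or last ran more than 24 hours before now. */
    static boolean isDue(Instant lastRun, Instant now) {
        if (lastRun == null) {
            return true; // assumed: no recorded run means the ruleset should run
        }
        return Duration.between(lastRun, now).compareTo(MIN_INTERVAL) > 0;
    }
}
```

Passing `now` explicitly, instead of calling `Instant.now()` inside the method, keeps the check easy to test.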

Searches to implement

  • Motorcycling
  • Redirects

Long-term

  • Also watch for redirects turned into articles (there is no longer an edit filter for this, so this task is difficult unless we can steal another list)