Wikipedia:Bots/Requests for approval/KuduBot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: KuduIO (talk · contribs)
Time filed: 00:01, Monday September 12, 2011 (UTC)
Automatic or Manual: Automatic unsupervised
Programming language(s): Python and regular expressions
Source code available: Standard pywikipedia; the regular expressions/parameters may be available on request
Function overview: Move all hatnotes to the very top of the articles per the Manual of Style.
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 43#Request for hatnote bot
Edit period(s): One-time run, then daily
Estimated number of pages affected: ? (articles)
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): N
Function details: Moves misplaced hatnotes to the top of the page; affects only the article lead section.
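The exact regular expressions were not published with this request. Purely as an illustration of the intended transformation, a minimal Python sketch might look like the following; the template list and pattern are assumptions rather than the bot's actual parameters, and nested templates are not handled.

import re

# Illustrative subset of hatnote templates (an assumption; the bot's real list was not published).
HATNOTE_RE = re.compile(
    r"\{\{\s*(?:about|for|other uses|redirect|distinguish|hatnote)[^{}]*\}\}\s*\n?",
    re.IGNORECASE,
)

def move_hatnotes_to_top(lead_text):
    """Collect hatnote templates found in the lead and re-insert them at the very top."""
    hatnotes = HATNOTE_RE.findall(lead_text)
    if not hatnotes:
        return lead_text  # nothing to move; avoids a cosmetic-only edit
    body = HATNOTE_RE.sub("", lead_text)
    return "".join(h.strip() + "\n" for h in hatnotes) + body.lstrip("\n")

If no hatnote is found, the text is returned unchanged, so pages that need no edit would simply be skipped.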
Discussion
Are there *any* cases where the top is not the best place for a hatnote? Are some used in sections, perhaps? - Jarry1250 [Weasel? Discuss.] 17:23, 12 September 2011 (UTC)[reply]
- They definitely could have been used there, so limiting to lead section is probably smart. — HELLKNOWZ ▎TALK 17:41, 12 September 2011 (UTC)[reply]
- Okay, that's one exception. Are there others? I assume we're limiting to article space here for a start? - Jarry1250 [Weasel? Discuss.] 18:02, 12 September 2011 (UTC)[reply]
- I somehow assumed this applies to articles by default; it definitely should, article layout guidelines do not apply to other namespaces. — HELLKNOWZ ▎TALK 18:07, 12 September 2011 (UTC)[reply]
- It probably applies to project space too, but limiting to article space for now seems wise. Yes, I'll add an exception for sections in the form of a lookahead. — Kudu ~I/O~ 20:12, 12 September 2011 (UTC)[reply]
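A hedged sketch of the lead-section restriction mentioned above (the bot's actual lookahead was not published; the heading pattern here is an assumption about typical wikitext): isolate the text before the first section heading and pass only that to the hatnote-moving routine.

import re

# The first level-2 heading marks the end of the lead (pattern is an assumption, not the bot's code).
FIRST_HEADING_RE = re.compile(r"(?m)^==[^=].*==\s*$")

def process_lead_only(page_text, transform):
    """Apply `transform` to the lead section only, leaving later sections untouched."""
    match = FIRST_HEADING_RE.search(page_text)
    if match is None:
        return transform(page_text)  # article has no section headings
    cut = match.start()
    return transform(page_text[:cut]) + page_text[cut:]

Combined with the earlier sketch, the call would be process_lead_only(page_text, move_hatnotes_to_top), which guarantees that hatnotes used inside sections are never touched.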
- How do you intend to find articles that suffer from this problem? Are you going to randomly crawl through every article, or is there a database report somewhere? —SW— comment 18:47, 13 September 2011 (UTC)[reply]
- Presumably by processing a dump (AWB can handle this). - Jarry1250 [Weasel? Discuss.] 20:52, 13 September 2011 (UTC)[reply]
- Right, just want to ensure that the operator is willing/able to download and process a database dump. —SW— confabulate 21:27, 13 September 2011 (UTC)[reply]
- This will be done by accessing the database directly from the toolserver. — Kudu ~I/O~ 20:35, 15 September 2011 (UTC)[reply]
- This is not possible. The toolserver database does not include page text. —SW— verbalize 22:30, 15 September 2011 (UTC)[reply]
- Right. Perhaps I can write a separate tool which uses WikiProxy and dumps a list of pages to a file, and then feed that to pywikipedia's replace.py. — Kudu ~I/O~ 14:03, 18 September 2011 (UTC)[reply]
- What is WikiProxy? How will it generate a list of problematic pages if it does not analyse a dump? (Or does it?) - Jarry1250 [Weasel? Discuss.] 15:35, 18 September 2011 (UTC)[reply]
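For reference, a minimal sketch of the "page-list file fed to an edit script" idea described above. It is written against the modern Pywikibot API purely for illustration; the 2011 compat framework (and its replace.py) used different module names, and WikiProxy is not involved here.

import pywikibot  # assumption: modern Pywikibot API; the 2011 "pywikipedia" compat framework differed

def run_from_page_list(list_path, transform, summary):
    """Read page titles (one per line) from a file and apply `transform` to each page."""
    site = pywikibot.Site("en", "wikipedia")
    with open(list_path, encoding="utf-8") as f:
        titles = [line.strip() for line in f if line.strip()]
    for title in titles:
        page = pywikibot.Page(site, title)
        new_text = transform(page.text)
        if new_text != page.text:  # skip no-op edits per WP:COSMETICBOT
            page.text = new_text
            page.save(summary=summary)

Whatever generates the list file (a dump scan, a database report, or WikiProxy) is independent of this editing loop.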
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's at least see how it performs and then (hopefully) wait for some feedback. Per WP:COSMETICBOT, be careful to not make edits that only affect whitespace and newlines, as is often the case with misformatted lead templates. — HELLKNOWZ ▎TALK 09:37, 22 September 2011 (UTC)[reply]
- One minute: Please don't use WikiProxy. If you're going to be scanning 3.5 million articles, please use a dump. It just makes sense. - Jarry1250 [Weasel? Discuss.] 18:17, 22 September 2011 (UTC)[reply]
- I agree. Sending mass queries through wikiproxy will still consume massive resources at toolserver, which is not good (and is probably a violation of toolserver policies). There is no reason that this task can't work from a database dump that is a few days old. The task doesn't require up-to-the-second versions of articles. You could also consider asking the maintainer of this tool to add a report for misplaced hatnotes if you don't want to deal with database dumps yourself. —SW— spout 19:02, 22 September 2011 (UTC)[reply]
- Comment. I'm trying to see if there are dumps available on the toolserver already, since I have a rather small quota myself. Anybody more experienced should feel free to help. — Kudu ~I/O~ 12:08, 23 September 2011 (UTC)[reply]
- If you have a fast internet connection and a moderately good processor, it is far easier to download one onto your home PC and process it with AWB. - Jarry1250 [Weasel? Discuss.] 12:20, 23 September 2011 (UTC)[reply]
- I use Mac OS and Linux, so no AWB for me. However, I'll consider running pywikipedia with a dump from my own computer. Nothing is urgent, so I'll set it up over the next few days. — Kudu ~I/O~ 12:27, 23 September 2011 (UTC)[reply]
- Doing... Downloading an XML dump to the toolserver. — Kudu ~I/O~ 19:38, 23 September 2011 (UTC)[reply]
- Here's the update: I finished downloading and extracting the dump, and now I'm running the script in a screen session. It's still analyzing the dump. — Kudu ~I/O~ 22:42, 23 September 2011 (UTC)[reply]
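As a rough illustration of that dump scan (the actual script was not published; the regular expressions are the assumed ones from the sketches above, and the namespace handling is an assumption about Pywikibot's xmlreader module), something along these lines would yield candidate articles whose first hatnote sits below other lead content:

from pywikibot import xmlreader  # the 2011 compat framework shipped a similar xmlreader module

def find_candidates(dump_path, hatnote_re, first_heading_re):
    """Scan an XML dump and yield titles of articles with a misplaced hatnote in the lead."""
    for entry in xmlreader.XmlDump(dump_path).parse():
        if str(entry.ns) != "0":  # main (article) namespace only
            continue
        match = first_heading_re.search(entry.text)
        lead = entry.text[:match.start()] if match else entry.text
        hit = hatnote_re.search(lead)
        if hit and lead[:hit.start()].strip():
            yield entry.title  # some non-hatnote text precedes the first hatnote

The resulting titles could then be written to a file for the editing loop above, keeping the slow dump pass separate from the actual edits.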
- Withdrawn by operator. Pywikipedia's support for this task turned out to be poor; it would be easier for someone to file a new BRFA using AWB. — Kudu ~I/O~ 21:59, 4 October 2011 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.