Wikipedia:Bots/Requests for approval/Plasticbot 3
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Withdrawn by operator.
Automatic or Manually Assisted: Automatic
Programming Language(s): AWB
Function Summary: Tag PDF references with "|format=pdf"
Edit period(s) (e.g. Continuous, daily, one time run): Continuous, although the bulk of the work will be in the first run.
Already has a bot flag (Y/N): Yes
Function Details: The bot performs a regex find and replace to identify citation templates with field url=http://www.example.org/file.pdf without a format=pdf field. It then appends that tag to the citation so that readers are appropriately warned of the file format.
Discussion
- Will the bot use {{PDFlink}}? Titoxd(?!? - cool stuff) 22:26, 3 September 2008 (UTC)
- Will it actually examine what is served at the URL (check the Content-Type, magic number, etc.) or just check the URL itself for a given convention (like /\.pdf$/i)? What about the reverse situation: a URL which does not refer to a PDF is marked with format=pdf? I was also thinking about the possibility of expanding this to other formats, but it seems {{cite web}} doesn't do any checking on the format param and there's no standardization for its use. In fact the [[Template:Cite web/doc|{{cite web}} docs]] have an example that uses "PDF" whereas your function summary above uses "pdf". --Jeremyb (talk) 23:55, 3 September 2008 (UTC)
- AWB doesn't have the capability to check content types and/or magic numbers. What it is doing is a very simple find and replace for citations containing urls that end in ".pdf".
FIND: {{cite web(.*)url=(.*)\.pdf(.*)}}
REPLACE:{{cite web$1url=$2.pdf|format=pdf$3}}
- with a condition to skip the article if it finds "format=pdf|format = pdf|format=.pdf|format = .pdf" (not case sensitive). I realize this condition will give false negatives (skip articles where edits should be made) but I spent quite a while on IRC with some folks trying to get a better regex working and this is the best we could do. Suggestions are welcome. To expand the find/replace to other citation formats one would only have to cut the "web" from the find and the replace string. If this goes well I was going to look into the other citation formats. I am not familiar with their treatment of PDFs. As for the (PDF) vs (pdf) issue, I have seen both used and don't have a problem switching to caps if that is where the consensus lies. Plasticup T/C 00:22, 4 September 2008 (UTC)
- Hmm, it would probably be preferable to check that the PDF still exists (tag it otherwise) and that the file served is actually a PDF (maybe tag it as a dead link otherwise; I know this part is difficult, sorry). If you get that second part going, it might also be nice to work in more ref fixes, like updating 'date accessed', maybe the title, that sort of stuff. SQLQuery me! 06:13, 7 September 2008 (UTC)
- That sort of thing is far beyond my ability and, I believe, the constraints of AWB. Actually opening and checking the links is part of an ongoing project on toolserver, but it is still in its experimental phases. Plasticup T/C 17:37, 8 September 2008 (UTC)
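The three kinds of check raised in this thread (URL convention, Content-Type, magic number) can be contrasted in a small offline sketch. It is illustrative only: the helper names are hypothetical, no network request is made, and the caller is assumed to supply the header value and the first bytes of the response obtained by some other means.

```python
import re

# The URL convention quoted earlier in the thread: /\.pdf$/i
PDF_SUFFIX = re.compile(r"\.pdf$", re.IGNORECASE)


def url_looks_like_pdf(url: str) -> bool:
    """True when the URL string itself ends in .pdf (any letter case)."""
    return bool(PDF_SUFFIX.search(url))


def content_type_is_pdf(header_value: str) -> bool:
    """True when a Content-Type header value names application/pdf."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def bytes_look_like_pdf(head: bytes) -> bool:
    """True when the first bytes of the file carry the PDF magic number."""
    return head.startswith(b"%PDF-")


if __name__ == "__main__":
    print(url_looks_like_pdf("http://www.example.org/file.pdf"))
    print(content_type_is_pdf("application/pdf"))
    print(bytes_look_like_pdf(b"%PDF-1.4\n"))
```

Only the first check is possible with a pure find and replace. The other two need the link to be opened, which is the part described above as beyond AWB.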
- Any possibility I could usurp this task for PDFbot? I can't promise that I'll get to it anytime soon but when I do I'll get more 'general fixes' in. — Dispenser 07:44, 19 September 2008 (UTC)
- The tasks aren't very related, so it is unlikely that combining them into the same bot would cut down on the number of edits. That said, if you can get the code working better than I can I would have no problem with your bot taking up the task. Plasticup T/C 03:29, 20 September 2008 (UTC)
- I've got about 50% of it done over the weekend; I just need to figure out a better way to insert the parameter so it matches the vertical or horizontal formats. — Dispenser 18:16, 23 September 2008 (UTC)
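One possible way to approach the layout problem mentioned above is to copy the spacing of the existing url parameter and to start a new line only when that parameter begins its own line. The sketch below is an assumption-laden illustration, not PDFbot's code: `insert_format` and its `value` default are hypothetical, and it does not check whether a format parameter already exists.

```python
import re

# Matches "|url=...pdf" with optional spaces around the name and "=",
# provided the value is followed by the next pipe or the closing braces.
URL_PARAM = re.compile(r"\|(\s*)url(\s*)=(\s*)([^|}\s]+\.pdf)(?=\s*[|}])",
                       re.IGNORECASE)


def insert_format(template: str, value: str = "PDF") -> str:
    """Insert a format parameter after url, mirroring the template's layout."""
    m = URL_PARAM.search(template)
    if m is None:
        return template
    lead, pre_eq, post_eq = m.group(1), m.group(2), m.group(3)
    # Vertical layout: only whitespace sits between the previous newline
    # and the pipe that opens the url parameter.
    line_start = template.rfind("\n", 0, m.start()) + 1
    indent = template[line_start:m.start()]
    vertical = line_start > 0 and indent.strip() == ""
    separator = "\n" + indent if vertical else ""
    new_param = f"{separator}|{lead}format{pre_eq}={post_eq}{value}"
    return template[:m.end()] + new_param + template[m.end():]


if __name__ == "__main__":
    print(insert_format("{{cite web|url=http://www.example.org/file.pdf|title=Report}}"))
    print(insert_format("{{cite web\n| url = http://www.example.org/file.pdf\n| title = Report\n}}"))
```

Templates that put the pipe at the end of each line instead of the start are not handled by this sketch.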
- The stuff you mention requires human intervention; updating accessdates, for instance, isn't supposed to be automated, as the bot doesn't know whether the facts are still present. — Dispenser 07:44, 19 September 2008 (UTC)
- Where will it get the list of pages to run on? Mr.Z-man 04:04, 8 September 2008 (UTC)
- A recent db-dump. Plasticup T/C 17:37, 8 September 2008 (UTC)
It has been 5-6 days. Any more thoughts on how this bot could be improved or whether it is ready for testing? Plasticup T/C 17:27, 13 September 2008 (UTC) {{BAGAssistanceNeeded}}
It has been 9 days since the last reply. The current code should produce no false positives, and I would like to request a Trial Period to prove it. Plasticup T/C 02:29, 18 September 2008 (UTC)
Approved for trial (20 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Mr.Z-man 20:45, 29 September 2008 (UTC)
- I've gotten the code working, but PDFbot is usurping the task. I am going to let User:Dispenser run with it. Plasticup T/C 23:42, 5 October 2008 (UTC)
Withdrawn by operator. Mr.Z-man 18:31, 14 October 2008 (UTC)