Wikipedia:Bots/Requests for approval/Rlink2 Bot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was: Withdrawn by operator; see the new BRFA.
Operator: Rlink2 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 00:15, Tuesday, December 7, 2021 (UTC)
Function overview: This function of the bot will be involved in bare ref filling.
Automatic, Supervised, or Manual: Supervised/Automatic: Like IAbot or Citation bot, editors will be able to pick articles for fixing. The bot will not randomly traverse through articles for fixing, at least not now. If this changes, another BRFA or a modification to this one will be filed.
Programming language(s): Multiple languages.
Source code available: When it is out of trial and bugs are fixed.
Links to relevant discussions (where appropriate): This does things Citation Bot does.
Edit period(s): Continuous.
Estimated number of pages affected:
Namespace(s): Article
Exclusion compliant (Yes/No): yes
Function details: This bot request focuses on fixing bare refs.
As BrownHairedGirl pointed out, Citation Bot and reflinks are unable to fill in numerous sites. My bot differs from Reflinks in that it employs a superior title fetching mechanism to obtain the web page title. It's already really excellent; many of the sites that BHG has labeled as unfixable can be fixed with my bot.
The bot's functioning is straightforward: an editor submits one or more articles, and the bot fixes them. Until the bot is approved, I will manually specify the articles for the trial. Also, while it is like Citation bot, the functionality of this request is far more specific: it only fixes bare refs.
For clarity these edits are similar to the ones the bot will be making:
If the link is dead or the title can't be retrieved the bot will just skip over it.
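The behaviour described above (fill a bare ref when a title is found, otherwise skip it) can be illustrated with a minimal sketch. This is not the bot's code, which has not been published; the names `BARE_REF`, `fill_bare_refs` and `get_title` are hypothetical, and the title lookup is passed in as a function so that the sketch makes no claim about how the bot actually fetches titles.

```python
import re
from typing import Callable, Optional

# Assumption: a "bare ref" is <ref>URL</ref> with nothing but one URL inside.
BARE_REF = re.compile(r"<ref>\s*(https?://[^\s<>\[\]|]+)\s*</ref>", re.IGNORECASE)


def fill_bare_refs(wikitext: str, get_title: Callable[[str], Optional[str]]) -> str:
    """Replace bare-URL refs with {{cite web}}; leave a ref alone if no title is found."""

    def replace(match: re.Match) -> str:
        url = match.group(1)
        title = get_title(url)
        if not title:
            # Dead link or title not retrievable: skip, as the proposal states.
            return match.group(0)
        return "<ref>{{cite web |url=" + url + " |title=" + title + "}}</ref>"

    return BARE_REF.sub(replace, wikitext)
```

For example, passing a dictionary's `get` method as `get_title` fills only the URLs present in the dictionary and leaves every other bare ref unchanged.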
Discussion
Do you have a list of the fixes this bot task will do? ProcrastinatingReader (talk) 11:12, 7 December 2021 (UTC)[reply]
- I'm confused by your question. This bot task will fix bare references into a CS1/CS2 cite format with title included, so one fix. Rlink2 (talk) 13:54, 7 December 2021 (UTC)[reply]
Is this edit which converts:
<ref>https://www.youtube.com/watch?v=PUwmA3Q0_OE&t=5s American Museum of Natural History – Human Population Through Time</ref>
to:
<ref>{{cite web| url-status = live| archive-url = https://ghostarchive.org/varchive/youtube/20211205/PUwmA3Q0_OE| url = https://www.youtube.com/watch?v=PUwmA3Q0_OE&t=5s| title = Human Population Through Time}}{{cbignore}} American Museum of Natural History – Human Population Through Time</ref>
- "Human Population Through Time". {{cite web}}: |archive-url= requires |archive-date= (help) American Museum of Natural History – Human Population Through Time
an example of what the bot does? The ghostarchive url has a timestamp, so shouldn't |archive-date=2021-12-05 be included in the template so that a human editor doesn't have to follow you around fixing these sorts of errors? For YouTube videos, {{cite AV media}} is a better choice than {{cite web}}.
Why add {{cbignore}}?
—Trappist the monk (talk) 14:40, 7 December 2021 (UTC)[reply]
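The point about the timestamp can be illustrated with a minimal sketch that derives an |archive-date= value from the archive URL itself. The function name `archive_date` and the pattern are assumptions made for illustration; the sketch only assumes that the timestamp appears as a path segment of 8 digits (YYYYMMDD) or 14 digits (YYYYMMDDhhmmss), as in the ghostarchive and archive.is URLs quoted in this discussion.

```python
import re
from datetime import datetime
from typing import Optional

# A path segment of exactly 14 digits (YYYYMMDDhhmmss) or 8 digits (YYYYMMDD).
TIMESTAMP = re.compile(r"/(\d{14}|\d{8})(?=/)")


def archive_date(archive_url: str) -> Optional[str]:
    """Return the snapshot date as YYYY-MM-DD, or None if no valid timestamp is found."""
    match = TIMESTAMP.search(archive_url)
    if not match:
        return None
    try:
        # Only the date part (first 8 digits) is needed for |archive-date=.
        return datetime.strptime(match.group(1)[:8], "%Y%m%d").date().isoformat()
    except ValueError:
        return None  # digits that do not form a real calendar date
```

With the ghostarchive URL from the edit above, this yields 2021-12-05, the value the template was missing.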
- Thank you for sharing your thoughts, Trappist. Procrastinating was attempting to ask me a question which I now understand. While the bot will largely address bare refs when someone tries to push articles through it, there are other supplementary options that the editor pushing can pick.
One possibility is to do archive fixes; another is to use a generic title with the website name if the title cannot be discovered, or to just leave the bare ref alone. However, as I previously stated, any options will only apply to the fixed bare refs and not to any other refs. This is similar to what you can already do with other bots. I like to think of it as a "merge" of capabilities between IA and Cite bot.
- The edits I do manually are a tad different than what the bot will be doing - obviously the bot will be doing more than Youtube links. I forgot to add the archive-date for about 10-30 edits on the 5th. I realized my mistake early on and fixed it for the future. The bot will be generating proper citations from day one. I think the cbignore thing was to fix a bug with IABot. Will keep in mind the Cite AV style. Rlink2 (talk) 15:58, 7 December 2021 (UTC)[reply]
- "I forgot to add the archive-date for about 10-30 edits on the 5th." It's a pity you didn't put the errors right after realising this rather than leaving them for somebody else to put right. Are you connected with ghostarchive? I don't see that this bot does anything other bots don't do, except that it uses ghostarchive. --John B123 (talk) 18:58, 8 December 2021 (UTC)[reply]
- edit to add: Looking at User:Rlink2 Bot, the purpose of the bot is "to convert the archive.* aliases to archive.today", which is totally different to the explanation above. Looking at this edit [1] in the bot's trial run, https://archive.is/20140328061855/http://www.academia.edu/5403302/The_Philistines_in_the_North_and_the_Kingdom_of_Taita_unpublished_paper_by_Itamar_Singer_zl_ is changed to https://archive.today/20140328061855/http://www.academia.edu/5403302/The_Philistines_in_the_North_and_the_Kingdom_of_Taita_unpublished_paper_by_Itamar_Singer_zl_ This seems totally pointless as archive.today redirects to archive.ph, just the same as the original url did. Although the website is called archive.today, the url is archive.ph. --John B123 (talk) 19:15, 8 December 2021 (UTC)[reply]
- I said in the description of Bot job 2 that it can correct bare references better than Citation bot or reflinks. If existing tools were "perfect", BrownHairedGirl would not be tagging about 200 bare ref articles a day. The purpose is the same as other bots, but it does specific functions more efficiently. If I wanted a bot to deal with archive sites, I would submit a different request for that, since it is another discussion altogether.
- The bot has one other "big task" which is to convert the archive.ph links to archive.today. If you go to request 1, you will see that the archive.today owner requested that people use archive.today because any other domain could be taken away (See archive.today blog post), hence why it changes .is to .today. People are trying to steal archive domains all the time. This already happened to vid.me: see here.
- To be clear, when selecting the OPTIONAL archiving feature, the editor will be able to pick the archiving site they want, but it will default to Wayback, since almost any website can be found archived there and archiving as the bot goes along would be too slow. However, this specific bot request is for the fixing of bare refs. Anything else is optional and can be changed/removed per consensus. I thought the option to archive the bare refs could be nice, but if people don't want it, it can be removed. Regardless, my recent focus on archiving has been Youtube ahead of their yearly video purges. Despite there being 20+ archive providers (of which I have looked at almost all), only 4 allow open and instant archiving both manually and programmatically, of which only 2 can do youtube videos: archive.org and ghostarchive, and only one of those allows you to submit videos for archiving instantly. I use both sites. But none of this has anything to do with the bot, which is a simple ref fixing bot rather than a youtube archiving tool. I also fixed the citation problems as soon as I realized it. :) (Edit: apparently not all of them, sorry!) Rlink2 (talk) 21:17, 8 December 2021 (UTC)[reply]
- I'm getting confused here, all the links to Rlink2 Bot 2 lead to User:Rlink2 Bot, which is the bot for converting to archive.today. Where is the documentation for Rlink2 Bot 2? --John B123 (talk) 21:48, 8 December 2021 (UTC)[reply]
- Rlink2 Bot (the account) will be the one bot account doing the multiple tasks (named Rlink2 Bot and Rlink2 Bot 2 in the BRFA page). I haven't gotten around to writing the full documentation yet. I will once people decide the direction of the bot and it's readied for general usage. I just updated the user page of Rlink2 Bot to reflect the 2nd BRFA. Rlink2 (talk) 22:02, 8 December 2021 (UTC)[reply]
- Actually, scratch the optional archiving feature for now. I don't want discussions about archiving to take away from the goal of Task #2, which is to fix bare refs. Rlink2 (talk) 00:15, 9 December 2021 (UTC)[reply]
Can you please change your bot so that the blank space comes before the pipe, not after? This will reduce problems caused by line-wrapping in the edit window. Thank you. --NSH001 (talk) 06:49, 11 December 2021 (UTC)[reply]
- Will do. Rlink2 (talk) 13:54, 11 December 2021 (UTC)[reply]
- Thank you, NSH001 (talk) 03:24, 12 December 2021 (UTC)[reply]
- Sorry, if those diffs are representative, then you're still putting the blank in the wrong place. (see my request below, which you said you would do?)
- Not good to create cs1|2 templates with errors. At this edit:
{{cite web| url = https://www.oyez.org/cases/1963/615| title = {{meta.fullTitle}}}}
- I agree with Editor NSH001, pipes are in the wrong place.
- —Trappist the monk (talk) 13:13, 13 December 2021 (UTC)[reply]
- @NSH001: While I did fix the pipe spacing in another config, I forgot to port the fix to the bare ref config. It has now been fixed for future edits. Thanks for the trout slap. Regarding the erroneous edit, I will need to find a way to escape the { and }, since website titles sometimes contain them. Thanks for all your hard work Trappist. Rlink2 (talk) 16:32, 13 December 2021 (UTC)[reply]
- Small niggle here. @Rlink2, please can the bot format parameters with just one space? i.e. a space before the pipe, no space after the pipe, and no space around the = sign.
- e.g.
{{cite web |url=https://example.com/foobar |title=Foo Bar news |first=Fooey |last=MacBar |date=30 February 2024}}
- That way when the wikisource is wrapped, the parameter name remains on the same line as its value, which makes the code much easier to read. BrownHairedGirl (talk) • (contribs) 02:14, 15 December 2021 (UTC)[reply]
- Done Rlink2 (talk) 04:25, 15 December 2021 (UTC)[reply]
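The formatting convention requested above (a space before each pipe, none after it, none around the = sign) and the brace problem seen in the oyez.org edit can both be illustrated with a minimal sketch. The names `escape_title` and `format_cite` are hypothetical, and replacing {, } and | with HTML character references is only one possible way to keep stray markup in a title from being parsed as a template; it is not necessarily what the bot does.

```python
from typing import Dict

# Characters that would otherwise be read as template syntax inside a parameter value.
TITLE_ESCAPES = {"{": "&#123;", "}": "&#125;", "|": "&#124;"}


def escape_title(title: str) -> str:
    """Replace braces and pipes in a fetched title with HTML character references."""
    return "".join(TITLE_ESCAPES.get(ch, ch) for ch in title)


def format_cite(template: str, params: Dict[str, str]) -> str:
    """Build a template call with one space before each pipe and none around '='."""
    parts = [template]
    for name, value in params.items():
        parts.append("|" + name + "=" + value.strip())
    return "{{" + " ".join(parts) + "}}"
```

Calling `format_cite("cite web", {"url": "https://example.com/foobar", "title": escape_title("Foo Bar news")})` produces the layout shown in the example above. A title that is nothing but an unrendered placeholder, such as the one in the oyez.org edit, is arguably better treated as "no title found" than escaped and kept.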
- Dead links. The proposal says "If the link is dead or the title can't be retrieved the bot will just skip over it." If the link is positively confirmed as dead (i.e. with a 404 error), then please please please please @Rlink2 can the bot tag that link with (datestamped) {{Dead link}}, replacing any {{Bare URL inline}} tag?
In my work over the last 5 months, I have found that a significant minority of untagged bare URLs are actually dead, and tagging them is a huge help. The latest (20211201) database dump had 276,443 pages with untagged non-PDF bare URLs inside <ref>..</ref>. Tagging those dead links excludes them from my searches, which allows me to focus on live links which are potentially fixable, and it also allows various bots to try to rescue the dead links. --BrownHairedGirl (talk) • (contribs) 02:04, 15 December 2021 (UTC)[reply]
- This seems like a non-controversial solution; will add. Rlink2 (talk) 04:25, 15 December 2021 (UTC)[reply]
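The tagging request above can be illustrated with a minimal sketch that works on the text inside a single ref. The names `tag_dead_link` and `BARE_URL_TAG` are hypothetical, the HTTP status code is assumed to have been obtained elsewhere, and only a 404 is treated as positive confirmation, as suggested in the request.

```python
import re
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Matches {{Bare URL inline}} with or without parameters, plus leading whitespace.
BARE_URL_TAG = re.compile(r"\s*\{\{\s*Bare URL inline\s*(\|[^{}]*)?\}\}", re.IGNORECASE)
DEAD_LINK_TAG = re.compile(r"\{\{\s*Dead link", re.IGNORECASE)


def tag_dead_link(ref_body: str, status_code: int, today: date) -> str:
    """On a confirmed 404, swap any {{Bare URL inline}} tag for a datestamped {{Dead link}}."""
    if status_code != 404:
        return ref_body  # not positively confirmed dead: leave the ref untouched
    cleaned = BARE_URL_TAG.sub("", ref_body).rstrip()
    if DEAD_LINK_TAG.search(cleaned):
        return cleaned  # already tagged as dead; do not add a second tag
    stamp = MONTHS[today.month - 1] + " " + str(today.year)
    return cleaned + " {{Dead link|date=" + stamp + "}}"
```

Treating timeouts or other status codes as dead would need more care, since a temporary outage would otherwise be tagged as a dead link.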
I just reverted this edit. As part of that revert, I looked at both the live and archived urls. If a YouTube video is the source, providing a link to archive.org where the video is not available seems rather pointless to me. Virtually nothing on a video's YouTube page is useful except perhaps the video's title so why bother? Shouldn't such sources be marked as permanently dead?
—Trappist the monk (talk) 15:41, 16 December 2021 (UTC)[reply]
- I assumed that the comments and title were better than nothing. Thinking about what you said, maybe it is better to mark the links as permadead instead. Will do. Rlink2 (talk) 23:59, 17 December 2021 (UTC)[reply]
- Apparently not; see this edit. Does archive.org ever actually archive a YouTube video? I seem to recall that you once said somewhere that it didn't but that ghostarchive did. Am I mistaken? If archive.org does not, why bother with these kinds of 'fixes'?
- —Trappist the monk (talk) 13:01, 20 December 2021 (UTC)[reply]
- You are mostly not mistaken. Regarding that edit, Wayback usually shows the title, and sometimes comments. Other times it shows the title, but it flashes away quickly. And in some circumstances the archive.org Youtube page only shows in certain browsers. So people might see different stuff at different times. Apparently they are working on fixing their Youtube support, but I have little to no idea of their progress. As I said before, I will transition to marking dead refs with {{dead link}} instead, since it really seems to be hit and miss. Rlink2 (talk) 14:55, 20 December 2021 (UTC)[reply]
Meta comment
- In principle, I enthusiastically welcome another bot to fill WP:Bare URLs. All of the single-page tools (WP:Reflinks, WP:Refill and ReferenceExpander) have flaws and/or limitations. The excellent Citation bot is highly accurate, and actively maintained by @AManWithNoPlan, who fixes bugs very rapidly (and clearly puts a lot of work into it). However, Citation bot has limited capacity and is grossly overloaded (partly by me!) ... so another bot to share the load is very welcome.
- But, in practice, any tool tackling this task will have errors. Many websites are malconfigured, and use inconsistent formatting, making it hard for a bot to reliably extract the relevant data. Making a tool such as this work reliably at scale will require growing exception lists and ongoing patches to cope with all the weird permutations of metadata-mangling in websites.
- I have had several discussions with @Rlink2 over the last few months, and have been delighted to find them to be skilled, conscientious, precise, diligent, and a great communicator. I am sure that they would be an excellent person to operate such a bot. (No caveats about that: Rlink2 has a rare combination of desirable attributes).
- However, once this bot goes live, the issues will flood in, and it will be a while before that flood slows to a trickle. So the bot needs systems to cope with this: an easy-to use bug-reporting mechanism, prompt updates, and possibly a mechanism to disable the bot if a bug backlog builds up. And Rlink2 needs to be ready to put in a lot of time.
- So, before this bot is fully approved, I would like to see such a bug-handling system already in place and tested, as well as a clear assurance from Rlink that they are ready and willing to deal with a flood of feedback, and can give the bot the regular attention needed (i.e. at least daily). I would hate to see this go live and provoke some big drama as Rlink struggled to respond fast enough to the issues. We don't need drama, and we really really don't need to have Rlink2 getting bruised and disillusioned.
- If it helps, I am happy to volunteer as a beta-tester for the bug-handling mechanism. --BrownHairedGirl (talk) • (contribs) 01:32, 15 December 2021 (UTC)[reply]
- Indeed, just the tidy_date code in Citation Bot is a crazy mess of date cleaning. AManWithNoPlan (talk) 01:47, 15 December 2021 (UTC)[reply]
- Thank you for your kind words and your concern. I am able to fix any sort of bug relatively quickly. Today there was a bug in my archive run, and I fixed it in a matter of minutes. As for the exception list, I certainly agree. I already have some sites and titles hardcoded, but the best way to get more is to test the code more. The nice thing is that it integrates into AWB, so I can see what the fixer is struggling with pre-launch. By the time the bot goes live I will have time to handle the initial flood of reports. For bug reports, the basic idea will be the same as for Citation bot: someone leaves a message on the bot talk page, and then I reply once the bug has been fixed. Feedback on this idea is welcome. The bot will be updated as needed, even if it means every day. Rlink2 (talk) 04:25, 15 December 2021 (UTC)[reply]
- @Rlink2: it sounds like you are ready for the deluge!
- One further suggestion: since this job is likely to generate a lot of feedback, wouldn't it be best to operate this task under a unique bot id, which describes its function? Something analogous to "Citation bot" for the bot which does citations, but in this case it's a bot which fills bare URLs. Ideally a one-word name (no spaces, using camel case if needed), so that the name can be copied more easily.
- That way, feedback on this high-volume task won't be jumbled up with feedback on other bot tasks which you may be running. BrownHairedGirl (talk) • (contribs) 17:39, 15 December 2021 (UTC)[reply]
- @BrownHairedGirl: By the time the bot is approved, I should be done with my current WP tasks, meaning I will devote all my WP time to the bot for the first while. What possible names do you have? I was thinking "Bare bot" but suggestions are welcome. Rlink2 (talk) 20:07, 15 December 2021 (UTC)[reply]
- @Rlink2: Even if you drop other tasks for now, I am sure that at some stage you will expand your repertoire again. So best to separate out the big ongoing task.
- "Bare bot" has other connotations which might be distracting; the jokes will get a bit stale after the first few hundred outings. And it has a space in it.
- I was thinking maybe "BareURLbot", "FillBareURLbot", "FillURLbot", "FillRefBot", "BareRefBot", "FillBot", "Bare2CiteBot". Plenty of other possibilities. BrownHairedGirl (talk) • (contribs) 20:20, 15 December 2021 (UTC)[reply]
- I like BareURLbot, but i will have to think about it. Rlink2 (talk) 23:59, 17 December 2021 (UTC)[reply]