Wikipedia:Bots/Requests for approval/PearBOT 7
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: Trialpears
Time filed: 00:22, Tuesday, January 7, 2020 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): Python
Source code available: Will add tomorrow
Function overview: Remove red links to sister projects using {{Sister project}}.
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 80#Remove sister project templates with no target
Edit period(s): One time run
Estimated number of pages affected: I really don't know. Somewhere between 500 and 10000 maybe?
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Many of these templates fall back to a search that uses the article's page name as the query, so the bot will remove only links whose search returns zero results.
Discussion
- Hmm. Sometimes the lack of a namespace prefix can be the actual problem; we often link from Wikipedia articles to the corresponding Commons category. Jo-Jo Eumerus (talk) 09:19, 7 January 2020 (UTC)
- Jo-Jo Eumerus, Commons search displays results from several namespaces (gallery, file, category, creator, institution, help) by default. I will search all of them, just as a user following the link would. ‑‑Trialpears (talk) 07:16, 10 January 2020 (UTC)
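A minimal sketch of the zero-result check described above, in Python. It only builds parameters for the MediaWiki search API (`action=query`, `list=search`) and inspects an already decoded JSON response; sending the request is left out. The namespace IDs (0 gallery, 6 File, 12 Help, 14 Category, 100 Creator, 106 Institution) and the helper names `build_search_params` and `has_zero_hits` are assumptions for illustration, not the bot's actual code.

```python
# Assumed Commons namespace IDs: gallery (main), File, Help, Category,
# Creator, Institution. Verify against the target wiki before relying on them.
USER_FACING_NAMESPACES = (0, 6, 12, 14, 100, 106)


def build_search_params(title, namespaces=USER_FACING_NAMESPACES):
    """Build MediaWiki API parameters for a full-text search on `title`."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": title,
        "srnamespace": "|".join(str(n) for n in namespaces),
        "srinfo": "totalhits",
        "srlimit": 1,
        "format": "json",
    }


def has_zero_hits(response):
    """Return True only when the response clearly reports no results."""
    query = response.get("query", {})
    total = query.get("searchinfo", {}).get("totalhits")
    if total is None:
        # Hit count missing: treat as "has hits" so nothing gets removed.
        return False
    return total == 0 and not query.get("search")
```

Treating a missing `totalhits` as "has hits" keeps the sketch conservative: an ambiguous response never leads to a removal.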
- Thanks for taking this on. My gut guess as the requester is that the "Estimated number of pages affected" would be much higher. I run into the problem on species articles, but I suspect the problem is Wikipedia-wide, and that every article with a sister project template should be checked. I'd guess that's into the millions. SchreiberBike | ⌨ 04:01, 10 January 2020 (UTC)
- SchreiberBike, I think this issue would affect animal/plant species disproportionately, since the articles often contain a Commons/Wikispecies link that is copied by other editors even when it's not applicable. Automatic removal should also be possible more often for species: to get zero search results, the title can't appear on any user-facing page anywhere on Commons (or the other project), which should be more common for species with their Latin names. I may very well be wrong though. I'll run it without saving on a few pages to get an actual estimate this weekend. ‑‑Trialpears (talk) 07:29, 10 January 2020 (UTC)
- @Trialpears: Pi bot does similar things with Commons categories, and is also written in Python, so there may be code that you can adapt from that to handle these cases. In particular, I would recommend using Wikidata interwiki links to replace bad links with good ones where possible. There's also a discussion with @Hike395: on my talk page which might be relevant - having some code in the templates along the lines of that used in {{Commons category}}, plus similar tracking categories to those at Category:Commons category Wikidata tracking categories, would help with manually checking cases that the bot can't handle. Thanks. Mike Peel (talk) 09:25, 18 January 2020 (UTC)
- @Trialpears and SchreiberBike: Hi, Trial. How were you planning on defining "red links"? I've been gathering data about commons links, and the concept is rather complex. You can see the wikidata dump of commons links for en articles that contain {{commons and category}} or {{commons and category-inline}} with missing arguments, here and here. Out of 924 such articles, 51 (5.5%) do not have any commons links recorded in wikidata. The Commons search quality is variable:
- Some searches, like Oeneis bore don't return a gallery, but can return a category
- Some searches, like Cleistesiopsis oricamporum return a relevant gallery with a different name
- Some searches, like Eightcubed return a valid image, but find no gallery or category
- Some searches, like Anjum Shahzad return junk.
- Were you planning on testing for an empty search result? Otherwise, I'm not sure how a bot can tell the difference between a useful search result and junk. There's also the question of timing --- it could be that a search result is empty or junk now, but will return a good result in the future. -- hike395 (talk) 00:33, 19 January 2020 (UTC)
- Hike395, my plan was to define it very strictly, removing a link only if there are no hits in any user-facing namespace, which means only Anjum Shahzad would be removed of the ones above. While it is possible there will be a good result in the future, there has repeatedly been consensus that red links in navigational templates such as this one should be removed if they are not very likely to be created shortly, as documented at WP:EXISTING. It will take some time for me to look into Mike Peel's code, but it looks very promising and seems to be more efficient than my current implementation. ‑‑Trialpears (talk) 17:36, 19 January 2020 (UTC)
- IIUC, you are planning on running Commons searches like Commons:Special:Search/Anjum Shahzad in your bot, and then removing the en template if the search results contain no files, galleries, or categories. That seems good. Mike Peel's bot code could be helpful --- it could be faster to look up entities in Wikidata, filtering out the 95% that have entries in P373, P935, sitelinks, or P910 sitelinks. Then you would only run the search on ~5% of the boxes. -- hike395 (talk) 18:58, 19 January 2020 (UTC)
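The Wikidata pre-filter suggested above could look roughly like the following Python sketch. It works on already fetched entity JSON (the shape returned by `wbgetentities`) and checks P373 (Commons category), P935 (Commons gallery), the item's own `commonswiki` sitelink, and the `commonswiki` sitelink of the item linked through P910 (topic's main category). The function names are hypothetical, and fetching the entities is left out.

```python
def _snak_value(claim):
    """Return the raw datavalue of a claim's main snak, or None."""
    return claim.get("mainsnak", {}).get("datavalue", {}).get("value")


def main_category_id(entity):
    """Return the item ID referenced by P910, if any (hypothetical helper)."""
    for claim in entity.get("claims", {}).get("P910", []):
        value = _snak_value(claim)
        if isinstance(value, dict) and value.get("id"):
            return value["id"]
    return None


def commons_targets(entity, main_category_entity=None):
    """Collect every Commons target recorded for an item in Wikidata."""
    targets = []
    claims = entity.get("claims", {})
    for prop in ("P373", "P935"):
        for claim in claims.get(prop, []):
            value = _snak_value(claim)
            if isinstance(value, str) and value:
                targets.append((prop, value))
    own = entity.get("sitelinks", {}).get("commonswiki", {}).get("title")
    if own:
        targets.append(("sitelink", own))
    if main_category_entity is not None:
        links = main_category_entity.get("sitelinks", {})
        cat = links.get("commonswiki", {}).get("title")
        if cat:
            targets.append(("P910 sitelink", cat))
    return targets


def needs_search(entity, main_category_entity=None):
    """True when Wikidata records no Commons target for the item."""
    return not commons_targets(entity, main_category_entity)
```

Only items for which `needs_search` returns True would then go on to the slower search check.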
- @Trialpears: Do you have any examples of what an edit from this task would look like? --TheSandDoctor Talk 21:07, 7 March 2020 (UTC)
- {{OperatorAssistanceNeeded}} No response to the above question. Primefac (talk) 20:05, 15 March 2020 (UTC)
- Sorry, I've had a lot going on recently; I should be back to normal tomorrow. Here are two diffs of edits which would have been done by the bot. The criterion is that the fallback search gives no results. Not only links using these two templates will be removed, but also links using any wrapper of {{sister project}}. Very sorry for the wait; the last three months have been so much more stressful than expected, but it will probably not be as bad in the future, and this bot is now my top priority together with my FL. ‑‑Trialpears (talk) 15:34, 16 March 2020 (UTC)
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please do not mark these as minor, as I'd like to get as many eyes on the changes as possible (so as per usual, link to this BRFA in your edit summary). Primefac (talk) 22:23, 22 March 2020 (UTC)[reply]
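For illustration only, the kind of edit discussed above (dropping a sister project template once its fallback search is known to be empty) might be sketched in Python as follows. This is a deliberately narrow, regex-based assumption: it removes only simple invocations that sit on their own line and contain no nested templates, and the helper name `remove_templates` is hypothetical. The real wrappers of {{Sister project}} vary more than this handles.

```python
import re


def remove_templates(wikitext, names):
    """Remove simple, own-line invocations of the named templates.

    Invocations containing nested braces are left untouched, so anything
    unusual is skipped rather than mangled.
    """
    alternatives = "|".join(
        # Spaces and underscores are interchangeable in template names.
        "[ _]+".join(re.escape(part) for part in re.split(r"[ _]+", name.strip()))
        for name in names
    )
    pattern = re.compile(
        r"^[ \t]*\{\{\s*(?:" + alternatives + r")\s*(?:\|[^{}]*)?\}\}[ \t]*\n?",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub("", wikitext)
```

A call such as `remove_templates(text, ["Commons category", "Wikispecies"])` would leave inline or nested uses for manual review.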
- I'm very sorry to say this, but I will not complete this project. I've been documenting the differences between the templates and how they handle fallbacks when different data sources are blank, and playing around with getting the bot to fetch search results from other wikis to see if the search fallback is actually useful. All of these have led to me encountering problems. The different wrappers are significantly more varied than I ever thought they would be and will each need significant amounts of unique code, already ballooning the amount of work required to perform this task. I've had issues with pywikibot complaining about being connected to many wikis at once and occasionally just refusing to work with certain sites, seemingly at random. My final and biggest issue was that I didn't get 100% consistent search results between manual searches and the bot, which could potentially lead to inappropriate removals. While all of this is solvable with enough effort and time, I do not have at all the same interest or energy I had a few months ago and cannot leave this open in good faith. SchreiberBike, I'm very sorry to disappoint. It was a great suggestion for a bot and I'm glad you raised it, but it seems like I won't finish it. Sorry. ‑‑Trialpears (talk) 21:41, 8 May 2020 (UTC)
- @Trialpears: Hey! Relax! The world has been more stressful than ever and everything is so much more complicated than it first appears. Chill! Let it flow away. I've been feeling stress knowing that I'm causing you stress. Maybe in a while I'll add it back to the requested bots list and I'll link this so whoever tries it knows the history, but Let It Go and keep up the good work. SchreiberBike | ⌨ 23:23, 8 May 2020 (UTC)
- @Trialpears: Should this be tagged {{BotWithdrawn}} then? * Pppery * it has begun... 17:50, 10 May 2020 (UTC)
- SchreiberBike, thank you for your kind words; they really helped me let it go. Pppery, yes, this is Withdrawn by operator. I don't know how closing is done for withdrawn BRFAs. Is it BAG who does it, or can anyone do it? ‑‑Trialpears (talk) 22:38, 10 May 2020 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.