Wikipedia:Bots/Requests for approval/DemonDays64 Bot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: DemonDays64 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 01:22, Saturday, November 23, 2019 (UTC)
Function overview: This bot will convert applicable links to HTTPS with JavaScript Wiki Browser, working through large lists of pages that I give it and using a list of regexes.
Automatic, Supervised, or Manual: Automatic
Programming language(s): No language; it uses JavaScript Wiki Browser (JWB), which has functionality to let bots automatically save edits. To generate lists of pages it also uses WP:JWB Annotated, a fork of JWB made by the user Colin M that can find pages matching a given regex. Lists that I generate with the search tools in the fork will be pasted into the regular JWB, as that is the version that keeps being updated, while JWB Annotated is just a fork of one day's version of the tool.
Source code available: At User:DemonDays64 Bot/JWB-settings.js (the bot is not yet approved for JWB, so that page has not been created yet). The settings will include the URLs listed at User:DemonDays64/Bot/Links table that still have pages left to edit.
Links to relevant discussions (where appropriate):
Edit period(s): When I turn on my computer and input a list of pages for it to edit. A few hours of disjointed operation per day.
Estimated number of pages affected: Estimated at more than 35 per minute, but variable; it depends on the number of skipped pages, and articles with a lot to change or very large articles take much longer to load and edit. The rate could be decreased in the JWB settings if it is too high. For the current set of links I am requesting to be allowed to edit, the total is approximately the sum of the numbers listed in the first table at User:DemonDays64/Bot/Links table; somewhere around 100,000 pages over time. The number can be decreased if I use a different set of links.
Namespace(s): Mainspace/Articles
Exclusion compliant: Yes
Function details: Uses the regex ((?<=(?<!\?)url ?= ?)|(?<=\[)|(?<=<ref>))(http:\/\/)?(www\.)?example\.com(?!\.) (a refinement of the originally proposed "http:\/\/w?w?w?.?example\.com") for sites that I determine to be HTTPS-secured, finding that string and inserting "https://example.com" in the part that matches (with the lookbehind included in the pattern). It will only affect the "http://", "http://www.", or bare "www." part, and not the "url=" (or variations of it) or the square bracket, through the power of regex lookbehinds (see a list of cases it will or will not catch, with some more details, at User:DemonDays64/Bot/Regex Example). The last part, a negative lookahead, stops it from catching things like "example.com.au".
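For illustration, here is a minimal sketch of how that replacement behaves, assuming a JavaScript-flavoured regex engine like the one JWB runs on; the test strings are invented examples, this is not the bot's actual JWB settings, and "example.com" stands in for one of the HTTPS-verified domains on the links table.

// Minimal sketch, not the bot's JWB settings: the regex described above,
// applied to a few invented test strings.
const pattern = /((?<=(?<!\?)url ?= ?)|(?<=\[)|(?<=<ref>))(http:\/\/)?(www\.)?example\.com(?!\.)/g;

const cases = [
  "url=http://www.example.com/page",     // caught: "http://www." is replaced after "url="
  "[http://example.com External link]",  // caught: bare bracketed external link
  "<ref>www.example.com</ref>",          // caught: "www."-only link inside a ref tag
  "?url=http://example.com",             // skipped: the "?" before "url=" fails the negative lookbehind
  "[http://example.com.au Other site]",  // skipped: the trailing "." fails the negative lookahead
];

for (const text of cases) {
  console.log(text.replace(pattern, "https://example.com"));
}

Because the lookbehinds are zero-width, the "url=", "[", or "<ref>" context stays in place and only the link text itself is rewritten.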
It will skip pages with the {{bots}} template, and it was going to ignore unparsed content with the built-in JWB functionality so as to avoid editing text in <nowiki> tags and other similar cases. UPDATE: it will not have that option on; the option stops links inside templates from being edited, and the cases it will edit are so limited (and it is in mainspace only) that avoiding text in <nowiki> tags and comments is unnecessary, so there should be no edge cases that this affects. See User:DemonDays64 Bot/JWB-settings.js for the bot's current settings. The {{bots}} exclusion functionality was tested on User:DemonDays64/Bot/Exclusion test.
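As background, the {{bots}}/{{nobots}} exclusion convention that JWB honours works roughly like the sketch below. This is an illustrative reimplementation, not JWB's actual code; it assumes the bot name "DemonDays64 Bot" and ignores the rarer forms of the template.

// Rough illustration of the {{bots}}/{{nobots}} exclusion check; JWB's real
// implementation differs, and the less common template forms are ignored here.
function isAllowedToEdit(wikitext: string, botName: string): boolean {
  if (/\{\{\s*nobots\s*\}\}/i.test(wikitext)) return false;  // {{nobots}} blocks all bots
  const deny = /\{\{\s*bots\s*\|\s*deny\s*=\s*([^}]*)\}\}/i.exec(wikitext);
  if (deny) {
    const names = deny[1].split(",").map(n => n.trim().toLowerCase());
    return !(names.includes("all") || names.includes(botName.toLowerCase()));
  }
  const allow = /\{\{\s*bots\s*\|\s*allow\s*=\s*([^}]*)\}\}/i.exec(wikitext);
  if (allow) {
    const names = allow[1].split(",").map(n => n.trim().toLowerCase());
    return names.includes("all") || names.includes(botName.toLowerCase());
  }
  return true;  // no exclusion template: editing is allowed
}

For example, isAllowedToEdit("{{bots|deny=DemonDays64 Bot}}", "DemonDays64 Bot") returns false, which is the sort of case the exclusion test page exercised.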
Discussion
- Please tell me how I might improve those RegExes. DemonDays64 | Tell me if I'm doing something wrong :P 02:26, 23 November 2019 (UTC)[reply]
- Comment: You may have to be more specific with the list of sites that you want to change. I don't think you'll be approved for an open-ended task as currently described above. Also, how will this task be different from those already being performed by Bender the Bot? Also, courtesy ping to bender235. – Jonesey95 (talk) 04:45, 23 November 2019 (UTC)[reply]
- @Jonesey95: it would be a different set of sites from Bender's, and more of them. There’s a lot to edit, based on the many sites I edited with JWB before being told to make a bot. Do I really have to list all of them here and get approval every time I add one? I can produce a list of the ones that I was editing with JWB tomorrow, but that sounds like such a hindrance to have to get each new site approved every time rather than being trusted a little... DemonDays64 | Tell me if I'm doing something wrong :P 06:16, 23 November 2019 (UTC)[reply]
- @Jonesey95: see User:DemonDays64/Bot/Links table for a list of the things I have currently saved in my JWB settings for this account; I would definitely include those in the set. DemonDays64 | Tell me if I'm doing something wrong :P 06:37, 23 November 2019 (UTC)[reply]
- What were the criteria for selecting the particular links that are currently on User:DemonDays64/Bot/Links table (previously User:DemonDays64/Bot Links List)? Might also help if you added a Special:LinkSearch link for each item on that list. Jo-Jo Eumerus (talk) 13:22, 23 November 2019 (UTC)[reply]
- @Jo-Jo Eumerus: The only criterion was whether the site works over HTTPS (and whether HTTP links to the site were found on pages I was editing with JWB). How I tested the links: I open an HTTP page found on Wikipedia and see if it redirects to HTTPS. If it does, I edit it to be HTTPS and put it in the list. If it doesn't, I try changing the URL to be HTTPS. If there's a "this link is a security threat!" warning, I don't add it. If the images and other resources seem not to be broken, I add it. When automated, I only do this for main domains; frequently, old subdomains don't have HTTPS. I have added links to Special:LinkSearch on User:DemonDays64/Bot/Links table. DemonDays64 | Tell me if I'm doing something wrong :P 14:55, 23 November 2019 (UTC)[reply]
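That manual procedure could be automated roughly as in the sketch below; this is illustrative only, with an invented checkHttps helper and example URL, and is not something the bot itself runs.

// Rough, illustrative automation of the manual test described above:
// does an HTTP URL redirect to, or at least work over, HTTPS?
async function checkHttps(httpUrl: string): Promise<string> {
  // Follow redirects from the HTTP URL and see where we end up.
  const viaRedirect = await fetch(httpUrl, { redirect: "follow" });
  if (viaRedirect.ok && viaRedirect.url.startsWith("https://")) {
    return "redirects to HTTPS";
  }
  // Otherwise rewrite the scheme by hand, as in the manual test.
  const httpsUrl = httpUrl.replace(/^http:\/\//, "https://");
  try {
    const direct = await fetch(httpsUrl);
    return direct.ok ? "works over HTTPS" : "HTTPS responds with an error";
  } catch {
    // Certificate problems and the like surface here
    // (the "this link is a security threat!" case).
    return "HTTPS not usable";
  }
}

checkHttps("http://example.com/").then(console.log);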
- @Jonesey95: I would like to know the specific sites you have in mind, too, just to avoid duplicating efforts. There are certainly thousands to choose from. Also, something I want to make you aware of upfront: when you Regex-search for "http://www.example.com/", make sure you don't also catch "https://web.archive.org/.../http://www.example.com/", i.e. the archive url associated with a dead link. --bender235 (talk) 17:08, 23 November 2019 (UTC)[reply]
- @Bender235: see User:DemonDays64/Bot/Links table for a list of the current set of links that I’ve tested (the ones with zero left, of course, will not be part of the list). The Regex is described in detail at User:DemonDays64/Bot/Regex Example. DemonDays64 | Tell me if I'm doing something wrong :P 17:34, 23 November 2019 (UTC)[reply]
- @Bender235: CC User:Jonesey95: User:DemonDays64/Bot/Links table now has a table showing the numbers of pages affected with links to each site and has links to the Special:LinkSearch pages for each. DemonDays64 | Tell me if I'm doing something wrong :P 19:09, 23 November 2019 (UTC)[reply]
- {{BAGAssistanceNeeded}} It's been a week, and I'm really unsure what the people who decide whether this bot is approved, and others with some experience in the field, think of this. Only three users have commented, and I have since changed and specified the task further, with some links to information and an improved description.
Can someone please take a look at this? I'd really like some input, and hope to get to the trial stage sometime soon. Thanks!
(Signature is very appropriate here LOL) | DemonDays64 | Tell me if I'm doing something wrong :P 05:03, 30 November 2019 (UTC)[reply]
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Noting that the community has expressed near-unanimous support (Sept 2016, Sept 2015) for this sort of task in the past. Enterprisey (talk!) 18:35, 3 December 2019 (UTC)[reply]
- @Enterprisey: thanks! I’ve requested the necessary AWB permission at Wikipedia:Requests for permissions/AutoWikiBrowser. DemonDays64 | Tell me if I'm doing something wrong :P 21:09, 3 December 2019 (UTC)[reply]
DemonDays64, Archive URLs take many forms; see WP:WEBARCHIVES for known archive URL schemes to skip. Generally, if the text just behind the match contains "/http" or "?url=http", it should be skipped. -- GreenC 22:15, 3 December 2019 (UTC)[reply]
- @GreenC: Hello! Thanks so much for bringing that to my attention! I have updated the regex listed on this page; it now includes a negative lookbehind for "?" in the "url=" part, which was the only thing I found on that page that would be caught. For good measure, it also won't catch "&url=example.com", either, using another negative lookbehind. Thanks!! DemonDays64 | Tell me if I'm doing something wrong :P 00:21, 4 December 2019 (UTC)[reply]
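To make the archive-URL handling concrete, the sketch below folds the two negative lookbehinds into (?<![?&]) and runs the pattern over a few invented strings, again assuming a JavaScript-flavoured engine; it is an illustrative variant, not the exact settings the bot uses.

// Illustrative variant of the updated regex, with negative lookbehinds for both
// "?" and "&" before "url=". "example.com" again stands in for a verified domain.
const updated = /((?<=(?<![?&])url ?= ?)|(?<=\[)|(?<=<ref>))(http:\/\/)?(www\.)?example\.com(?!\.)/g;

const archiveCases = [
  "|url=http://www.example.com/page",                            // rewritten: ordinary citation URL
  "https://web.archive.org/web/2019/http://www.example.com/",    // untouched: no url=, [ or <ref> right before the link
  "|archive-url=https://archive.today/?url=http://example.com",  // untouched: the "?" before url= fails the lookbehind
  "&url=http://example.com",                                      // untouched: the "&" before url= fails the lookbehind
];

for (const text of archiveCases) {
  console.log(text.replace(updated, "https://example.com"));
}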
Just to note that I granted the bot operator's request for extended confirmed rights for this bot account. As per the administrator instructions, legitimate alternative accounts can be granted extended confirmed if their main account is extended confirmed, and I could not see anything in WP:BOTPOL which did not allow this. I reviewed the bot's edits from the trial run, and unless I missed something, they all seemed to be error free, so I felt that there was very little (or even no) risk in granting the rights. There was a need for the bot to have the rights according to the bot operator (who expressed that not being able to edit extended confirmed protected pages was an obstacle to running the trial). If any of the BAG members or other administrators don't feel that the bot should have extended confirmed until it's approved and/or has the bot flag, then feel free to revert my granting of the rights (or if you cannot, ask me to do it by means of a ping). Dreamy Jazz 🎷 talk to me | my contributions 00:59, 6 December 2019 (UTC)[reply]
Trial complete. List of all the trial edits. — Preceding unsigned comment added by DemonDays64 (talk • contribs) 01:55, 6 December 2019 (UTC)[reply]
{{BAG assistance needed}} I have one point of clarification I'd like to be certain about—am I free to add new entries to the list of links to edit, or do I need a new bot request if I want to add any new ones? I keep finding more sites I could add, and am not sure if I have to bother the approvals group with the many potential requests to expand what the bot does, or if I can just add more links to the settings without another approval. Thanks! DemonDays64 | Tell me if I'm doing something wrong :P 02:02, 6 December 2019 (UTC)[reply]
- Approved. You are welcome to add new links to, or remove links from, the list as necessary; please make sure all changes are logged (especially any removals). I would also ask that you link to this BRFA in your edit summaries. Primefac (talk) 15:47, 8 December 2019 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.