User talk:JL-Bot/Archive 6
This is an archive of past discussions about User:JL-Bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
non-deterministic DOI order
This and this are somewhat weird edits. Headbomb {t · c · p · b} 12:29, 29 November 2019 (UTC)
- Added a sort so they will always be in numerical order. -- JLaTondre (talk) 20:07, 1 December 2019 (UTC)
One exclusion not working.
I can't get {{JCW-exclude|Metro (British newspaper)|Métro}} to work at WP:JCW/Questionable6#Metro (British newspaper). Headbomb {t · c · p · b} 10:38, 30 November 2019 (UTC)
- Fixed. -- JLaTondre (talk) 21:12, 1 December 2019 (UTC)
"Recognized content" Portal Philosophy
Hi, JL-Bot recently added a page to Portal_talk:Philosophy#Good_articles, but I don't have a clue why that page should have a place in a philosophy list. It's not the only addition there that raises a question (for instance Mick Aston, an archaeologist), and there might be more. What is the source for JL-Bot in these cases? To me it doesn't make sense at the moment. Greetings, Eissink (talk) 18:09, 8 December 2019 (UTC).
- @Eissink: That's because Talk:2008 attacks on Christians in southern Karnataka is tagged with {{WikiProject Philosophy}}. Headbomb {t · c · p · b} 18:15, 8 December 2019 (UTC)
- Ah, okay, thanks, Headbomb. I will ask there why this apparently should be in the Philosophy Portal, given that the article itself has hardly anything to do with philosophy and is also not categorized under any philosophy or even ethics Category. I assume you understand my surprise here. Thanks again, Eissink (talk) 18:32, 8 December 2019 (UTC).
WP:JCW/Publisher1 is now too large
The publisher pages should probably be split as
- 1-5
- 6-10
- 11-20
- 21-30
- 31-40
- ...
from now on. Headbomb {t · c · p · b} 15:28, 25 October 2019 (UTC)
- I've set up a few additional exclusions to make sure we don't run into limits in the near future, so this isn't ultra high priority. But things will likely bust again in the next few dumps as more articles get created. Headbomb {t · c · p · b} 15:52, 25 October 2019 (UTC)
- The bot uses a common logic for saving pages. Instead of having another one-off, let's switch it to a maximum number of entries vs. number of rows. The maximum can vary by type if desired, but the logic would remain the same. In other words, count the number of lines in the entries field, not the number of row templates. -- JLaTondre (talk) 00:10, 28 October 2019 (UTC)
- The biggest issue I see with this is that entries 1/2/4 dwarf nearly everything else. E.g. Elsevier has ~2650 rows alone. That's pretty much equivalent to the entirety of WP:JCW/Publisher2. Maybe something like: if the first 5 rows' entries > 5000, maxrow = 5; if the first 5 rows' entries <= 5000, maxrow = 10. Headbomb {t · c · p · b} 04:34, 28 October 2019 (UTC)
- With maxrow = 50/100 for WP:CRAPWATCH. Headbomb {t · c · p · b} 04:37, 28 October 2019 (UTC)
That said, now that the DOI duplication stuff is handled, I doubt we'll be anywhere near limits on WP:CRAPWATCH, so this is just a publisher thing for now. Headbomb {t · c · p · b} 23:52, 30 October 2019 (UTC)
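JLaTondre's suggestion above (cap each page by the number of lines in the entries field rather than by the number of row templates) can be sketched as a greedy packer. This is an illustration only, not JL-Bot's actual code: the function name, the (name, line count) input shape, the 3,000-line cap and the "Publisher B/C/D" rows are hypothetical; only the ~2,650 figure for Elsevier comes from the discussion.

```python
from typing import List, Tuple


def paginate_by_entries(rows: List[Tuple[str, int]], max_entries: int) -> List[List[str]]:
    """Split ranked rows into pages, capping the number of entry lines per page.

    rows: (row name, number of lines in its entries field), already in rank order.
    A row is never split; a single row larger than max_entries gets a page of its own.
    """
    pages: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, entry_lines in rows:
        # Start a new page if this row would push the current page over the cap.
        if current and used + entry_lines > max_entries:
            pages.append(current)
            current, used = [], 0
        current.append(name)
        used += entry_lines
    if current:
        pages.append(current)
    return pages


rows = [("Elsevier", 2650), ("Publisher B", 1800), ("Publisher C", 900), ("Publisher D", 100)]
print(paginate_by_entries(rows, 3000))
```

With these sample numbers Elsevier ends up alone on the first page and the other three rows share the second, which is the kind of behaviour Headbomb's "first 5 rows" heuristic was aiming for, without needing a per-type special case.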
The publisher listing being too large can be greatly mitigated by doing this instead. Applied to everything that makes use of doi=... in the entries column normally. This resolves this issue for now. Headbomb {t · c · p · b} 13:37, 2 January 2020 (UTC)
- Done. The next runs will have the change. -- JLaTondre (talk) 02:02, 3 January 2020 (UTC)
Default sorting tweaks
WP:JCW/A1 sorts as {{DEFAULTSORT:A-01}}, which is peachy. However, whatever is sorting as {{DEFAULTSORT:* ...}} is now growing a bit unwieldy. It would be a good time to revisit sorting on a select few pages, namely
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Missing1 ... {{DEFAULTSORT:μ-01}}
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Popular1 ... {{DEFAULTSORT:π-01}}
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Publisher1 ... {{DEFAULTSORT:ρ-01}}
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable1 ... {{DEFAULTSORT:ϙ-01}}
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Target1 ... {{DEFAULTSORT:τ-01}}
With pages like .../Target17 sorting as {{DEFAULTSORT:τ-17}} and so on. Headbomb {t · c · p · b} 03:45, 9 December 2019 (UTC)
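The naming scheme above is mechanical enough to express as a small function. The Python below is a hypothetical sketch (the `defaultsort_for` name, and returning None for series not listed in the request, are assumptions); it zero-pads the page number to two digits, as in A-01 and τ-17.

```python
import re
from typing import Optional

# Sort-key prefixes taken from the request above (Questionable uses koppa, ϙ).
SERIES_SORT_PREFIX = {
    "A": "A",
    "Missing": "μ",
    "Popular": "π",
    "Publisher": "ρ",
    "Questionable": "ϙ",
    "Target": "τ",
}


def defaultsort_for(page_title: str) -> Optional[str]:
    """Return the {{DEFAULTSORT:...}} magic word for a numbered JCW page, or None."""
    last = page_title.rsplit("/", 1)[-1]  # e.g. "Target17"
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", last)
    if m is None or m.group(1) not in SERIES_SORT_PREFIX:
        return None  # not a numbered page in one of the listed series
    key = f"{SERIES_SORT_PREFIX[m.group(1)]}-{int(m.group(2)):02d}"
    return "{{DEFAULTSORT:" + key + "}}"


print(defaultsort_for("Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Target17"))
```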
- Done. Will show up as pages are re-saved. -- JLaTondre (talk) 02:17, 27 December 2019 (UTC)
WP:JCW/DOI mockup
I had an idea for a regrouping of citations by DOI prefix. This should be relatively straightforward to implement. It would be a straight regrouping of whatever |journal= is listed, grouped by DOI prefix. No advanced matching or anything for typos and variants. However, the usual hierarchy / formatting for redirects, disambiguations, etc. would still apply.
A few things would be needed before doing a full deployment of this, the first one being how pages are organized. Namely, do we go through every DOI prefix from 10.0 (or 10.1000, depending on where they start) to 10.max, with visible gaps in between? Or just list DOI prefixes with hits?
Having some statistics on usage would be good. Namely, what is the smallest DOI prefix, the biggest DOI prefix, and how many DOI prefixes from min to max have hits, %-wise. Headbomb {t · c · p · b} 03:18, 20 December 2019 (UTC)
- The |registrant= of {{JCW-DOI-rank}} could be determined directly by parsing {{R from DOI prefix|registrant=...}} found on DOI redirects or, if that information is omitted, from where the DOI redirect points (e.g. 10.1016→Elsevier). Headbomb {t · c · p · b} 03:41, 20 December 2019 (UTC)
- I'll look into it. I will have time next week to work on the backlog of requests. -- JLaTondre (talk) 13:30, 21 December 2019 (UTC)
No rush. I think I'll have a presentation on JCW on January 22nd, so it'd be nice to have that up and running by then, but it's certainly not time critical. Headbomb {t · c · p · b} 18:44, 21 December 2019 (UTC)
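A rough sketch of that fallback order (explicit |registrant= first, redirect target second), written with regular expressions rather than a full wikitext parser. The function name and patterns are assumptions; the template name and the 10.1016→Elsevier example come from the comment above. Piped links or nested templates inside |registrant= are not handled.

```python
import re
from typing import Optional

# {{R from DOI prefix|...|registrant=...}}: capture the value up to the next | or }.
REGISTRANT_RE = re.compile(
    r"\{\{\s*R from DOI prefix\s*\|[^{}]*?\bregistrant\s*=\s*([^|}]*)",
    re.IGNORECASE,
)
# #REDIRECT [[Target]] or [[Target#Section]]: capture the page name only.
REDIRECT_RE = re.compile(r"#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def registrant_for(redirect_wikitext: str) -> Optional[str]:
    """Prefer an explicit, non-empty |registrant=; otherwise use the redirect target."""
    m = REGISTRANT_RE.search(redirect_wikitext)
    if m and m.group(1).strip():
        return m.group(1).strip()
    m = REDIRECT_RE.search(redirect_wikitext)
    return m.group(1).strip() if m else None


print(registrant_for("#REDIRECT [[Elsevier]]\n{{R from DOI prefix}}"))
```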
- I uploaded an example output. It does not have the formatting or the indent levels yet. It is just the DOIs with citations. Listing the ones without citations would produce pages of empty entries. Some questions:
- Do you want any "normalization" on the DOIs done? For example, 10.01021 is the second example. This should really be 10.1021 and there are quite a few citations under that record (it's the American Chemical Society).
- There are a number of invalid DOI entries (i.e. don't follow the prescribed format). I'm assuming a new maintenance report, but let me know if you want something different.
- Statistics would be easy, just need to define them. By smallest prefix, you mean lowest number? Least number of citations?
- -- JLaTondre (talk) 17:37, 27 December 2019 (UTC)
- I'm unsure about normalization yet. Certainly a bot report is a good first step. Also, each entry should be linked and formatted as usual. Concerning statistics, I'm thinking something like this (although perhaps even simpler, since the general idea is to have an idea of how many DOI prefixes are in use, and how they are distributed). Headbomb {t · c · p · b} 17:42, 27 December 2019 (UTC)
Someone made a Quarry query, and from those results, a few things are clear.
- Citations with DOI prefixes of the form 10.#, where # is not 4 or 5 digits, should be reported as errors.
- Citations with DOI prefixes that range from 10.0001 to 10.0999 should be reported as errors.
- Citations with DOI prefixes that range from 10.00001 to 10.09999 should be reported as errors.
- Citations with DOI prefixes that are over 10.40000 should be reported as errors.
I'll be thinking about how to organize things shortly. Headbomb {t · c · p · b} 17:56, 29 December 2019 (UTC)
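Taken together, these four rules amount to a small validity check. A Python sketch, assuming the prefix has already been split off at the first "/" of the DOI (the function name is hypothetical). Treating any leading zero as an error is a slight generalization: it also rejects 10.0000 and 10.00000, which the rules do not mention explicitly. "Over 10.40000" is read literally, so 10.40000 itself passes.

```python
import re


def is_valid_doi_prefix(prefix: str) -> bool:
    """Apply the error rules listed above to a prefix such as '10.1016'."""
    # Rule 1: the registrant code after '10.' must be 4 or 5 ASCII digits.
    m = re.fullmatch(r"10\.([0-9]{4,5})", prefix)
    if m is None:
        return False
    code = m.group(1)
    # Rules 2 and 3: 10.0001-10.0999 and 10.00001-10.09999 are errors.
    # Rejecting any leading zero covers both ranges.
    if code.startswith("0"):
        return False
    # Rule 4: anything over 10.40000 is an error.
    return int(code) <= 40000


print(is_valid_doi_prefix("10.1021"), is_valid_doi_prefix("10.01021"))
```

The second call uses the 10.01021 typo mentioned earlier in the thread, which this check would send to the maintenance report.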
The distribution of prefixes is clearly skewed towards the lower end of the range. Things should start at
- WP:JCW/DOI/10.1000 (covering 10.1000 to 10.1009) (with {{DEFAULTSORT:δ-10.1000}})
- WP:JCW/DOI/10.1010 (covering 10.1010 to 10.1019) (with {{DEFAULTSORT:δ-10.1010}})
- ... (steps of 10)
- WP:JCW/DOI/10.1050 (covering 10.1050 to 10.1074) (with {{DEFAULTSORT:δ-10.1050}})
- WP:JCW/DOI/10.1075 (covering 10.1075 to 10.1099) (with {{DEFAULTSORT:δ-10.1075}})
- WP:JCW/DOI/10.1100 (covering 10.1100 to 10.1199) (with {{DEFAULTSORT:δ-10.1100}})
- WP:JCW/DOI/10.1200 (covering 10.1200 to 10.1299) (with {{DEFAULTSORT:δ-10.1200}})
- ... (steps of 100)
- WP:JCW/DOI/10.2000 (covering 10.2000 to 10.2999) (with {{DEFAULTSORT:δ-10.2000}})
- WP:JCW/DOI/10.3000 (covering 10.3000 to 10.3999) (with {{DEFAULTSORT:δ-10.3000}})
- ... (steps of 1000)
- WP:JCW/DOI/10.10000 (covering 10.10000 to 10.19999) (with {{DEFAULTSORT:δ-10.10000}})
- WP:JCW/DOI/10.20000 (covering 10.20000 to 10.29999) (with {{DEFAULTSORT:δ-10.20000}})
- WP:JCW/DOI/10.30000 (covering 10.30000 to 10.39999) (with {{DEFAULTSORT:δ-10.30000}})
Ranges might have to be modified down the road, but that should give something reasonable-ish to start with. WP:JCW/DOI1 WP:JCW/DOI2 etc... should just be forgotten as a structure. Headbomb {t · c · p · b} 19:00, 29 December 2019 (UTC)
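Because the ranges are uneven, one simple way to assign a prefix to a page is to keep a sorted list of each page's starting registrant code and take the last start that does not exceed the prefix. A Python sketch under that assumption (names are hypothetical; the ranges are the ones in the plan above). Keeping the page list as data means later adjustments, such as breaking 10.1100 into quarters, only require editing the list.

```python
import re
from bisect import bisect_right
from typing import List, Optional


def page_starts() -> List[int]:
    """Registrant codes that start each WP:JCW/DOI/... page in the plan above."""
    starts = list(range(1000, 1050, 10))        # 10.1000 ... 10.1040, steps of 10
    starts += [1050, 1075]                      # steps of 25
    starts += list(range(1100, 2000, 100))      # 10.1100 ... 10.1900, steps of 100
    starts += list(range(2000, 10000, 1000))    # 10.2000 ... 10.9000, steps of 1000
    starts += list(range(10000, 40000, 10000))  # 10.10000, 10.20000, 10.30000
    return starts


STARTS = page_starts()


def page_for_prefix(prefix: str) -> Optional[str]:
    """Map a valid DOI prefix to the page whose range contains it, else None."""
    # 4 or 5 digits with no leading zero, so the code is at least 1000.
    m = re.fullmatch(r"10\.([1-9][0-9]{3,4})", prefix)
    if m is None:
        return None
    code = int(m.group(1))
    if code >= 40000:  # beyond the last planned page (10.30000 to 10.39999)
        return None
    start = STARTS[bisect_right(STARTS, code) - 1]  # last start <= code
    return f"WP:JCW/DOI/10.{start}"


print(page_for_prefix("10.1080"))
```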
- Note that this also means that the 'statistics' thing from above is no longer needed. Headbomb {t · c · p · b} 21:26, 29 December 2019 (UTC)
- I manually created the first two. The entries now have formatting and indent levels. However, both pages show "Warning: Template include size is too large. Some templates will not be included." and each is not displaying its last entry. Once you have reviewed the format and decided what to change, I will start working on the automated saving to the pages. -- JLaTondre (talk) 02:04, 31 December 2019 (UTC)
Looks good in general (though {{JCW-PrevNext|previous=|current=DOI1|next=DOI2}} should be {{JCW-PrevNext|previous=|current=DOI/10.1000|next=DOI/10.1010}} and similar). I'll look into reducing the template's footprint and see what's possible to solve expansion issues. Headbomb {t · c · p · b} 03:41, 31 December 2019 (UTC)
Almost there. Latest version ran and results uploaded. Need to tweak a couple of things (forgot to add the DOI to the footer and the Invalid page needs to list the actual prefixes). -- JLaTondre (talk) 00:21, 4 January 2020 (UTC)
- The first page is blank-ish. Also, 8000 isn't created (just create a 'plain' version with no hits if that's the case). The last thing I noticed is that things should be sorted by prefix by default. Otherwise looks pretty good. Headbomb {t · c · p · b} 04:25, 4 January 2020 (UTC)
- Also, whenever you have exactly 5 entries, the links to individual articles aren't given. Links should be given when there are 5 or fewer. Headbomb {t · c · p · b} 14:48, 4 January 2020 (UTC)
- Also also, 1100 is too big. It should be broken down into 1100/1125/1150/1175. Headbomb {t · c · p · b} 20:29, 4 January 2020 (UTC)
- Done. 20200101 results uploaded. -- JLaTondre (talk) 02:35, 7 January 2020 (UTC)
Didn't do a full run?
Seems the bot only processed /TAR and nothing else. Headbomb {t · c · p · b} 12:18, 27 December 2019 (UTC)
- Retrieving the questionable configuration failed. Looks like a Wikipedia API call never returned (I probably should ensure it times out and retries, but this happens so infrequently). Re-running. -- JLaTondre (talk) 12:54, 27 December 2019 (UTC)
- Did it fail again? Headbomb {t · c · p · b} 19:59, 27 December 2019 (UTC)
- There was an issue with yesterday's change for the Weird counting issue. I've reverted that change and re-run. Results are saving now. Though there was a sort issue with the questionable ones. It should be fixed (caught it before the publisher run which looks good) so should revert to normal on the next run. -- JLaTondre (talk) 02:06, 28 December 2019 (UTC)
Nothing ran last night, btw. Headbomb {t · c · p · b} 13:15, 29 December 2019 (UTC)
- I was in process of making some changes that I hadn't fully completed. It will run tonight. -- JLaTondre (talk) 01:27, 30 December 2019 (UTC)
- Still has an error. I will fix and re-run this morning. -- JLaTondre (talk) 13:27, 30 December 2019 (UTC)
Structure / sorting tweak
When you have a chance, the pages in Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance could be made proper subpages of WP:JCW/Maintenance, i.e.
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Invalid Titles → Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Invalid titles
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Miscapitalisations → Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Miscapitalisations
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Misspellings → Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Misspellings
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Patterns → Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Patterns
- Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/DOI/Invalid → Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Invalid DOI prefixes
Which could all sort as {{DEFAULTSORT:* Subpage}}. Headbomb {t · c · p · b} 03:26, 5 January 2020 (UTC)
- The latest dump is currently saving and will have these changes. Once I verify the new locations saved correctly, I will clean up the old ones. -- JLaTondre (talk) 03:25, 6 January 2020 (UTC)
- I believe I have fixed all the old links, made all the appropriate redirects, etc. for the new page names. -- JLaTondre (talk) 12:33, 6 January 2020 (UTC)
- Bit surprised you didn't just move the old ones to keep their history. Also the Invalid DOIs seem to be nowhere. Going to be up once the DOIs are updated? Headbomb {t · c · p · b} 12:49, 6 January 2020 (UTC)
- Doesn't seem like the history would have much value. But if you want it, I can do a history merge on them. Yes, the DOIs will be up later today. I wanted to get the monthly dump up first. -- JLaTondre (talk) 13:43, 6 January 2020 (UTC)
- It's not extremely useful, but there are some corner cases here and there. I mostly use it to see how things have evolved over the years. Headbomb {t · c · p · b} 14:14, 6 January 2020 (UTC)
- History moved over. -- JLaTondre (talk) 02:35, 7 January 2020 (UTC)
Weird counting issue.
This causes this. That's a bit surprising. Headbomb {t · c · p · b} 12:27, 26 December 2019 (UTC)
- The bot is working off the database dump with the exception of the configuration settings. Pre-move when it looked for matches to Expository Times (included via the Category:SAGE Publishing academic journals configuration), it would include The Expository Times as that redirected to it. Post move, it looks for The Expository Times. It still found Expository Times, but won't include it, as in the database, Expository Times remains an article and it will only report the target, redirects to the target, and non-articles. I have made a change that if the configuration specifies a redirect (as per the database dump), it will pull in the destination as well. That handles this case and also brings in a few other results. Looks good for a limited test run. Take a look at tonight's update and see if there are any issues with it. -- JLaTondre (talk) 01:54, 27 December 2019 (UTC)
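The change described above can be illustrated with a simplified model. This Python sketch is not the bot's code: redirects are assumed to be a plain dictionary built from the dump (redirect title to destination), the function name is hypothetical, and the "Expos. Times" and "Journal X" titles in the example are made up for illustration.

```python
from typing import Dict, Set


def titles_to_match(configured: str, redirects: Dict[str, str]) -> Set[str]:
    """Titles whose citations count toward a configured target.

    redirects maps redirect title -> destination title, as found in the dump.
    """
    titles = {configured}
    destination = redirects.get(configured)
    if destination is not None:
        # New behaviour: a configured redirect also pulls in its destination.
        titles.add(destination)
    root = destination if destination is not None else configured
    # Redirects pointing at the (resolved) target are counted as before.
    titles.update(src for src, dst in redirects.items() if dst == root)
    return titles


dump_redirects = {"The Expository Times": "Expository Times", "Expos. Times": "Expository Times"}
print(sorted(titles_to_match("The Expository Times", dump_redirects)))
```

In the dump, "The Expository Times" is still a redirect to "Expository Times", so configuring the post-move title now also pulls in the pre-move article and its other redirects.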
Use the current dump until the 23rd?
Dumps from the 20th of the month seem to get processed anywhere from the 22nd to the 26th normally. Since I'm giving a presentation about JCW on the 22nd, and I'll be using example results from the current dump, would it be possible to delay using the next dump until the 23rd?
I realize it's fairly likely the dump isn't ready to be used before the 23rd anyway, but I'd rather not wake up on the 22nd and have to scramble to find new examples of things I want to talk about. It'd just be for this time. Headbomb {t · c · p · b} 10:02, 20 January 2020 (UTC)
- Yeah, usually takes several days for the dump to be generated. And then it depends on me being around to kick off processing (on the list to automate that, but still manual at this point). Easy enough to wait until after the 23rd. -- JLaTondre (talk) 01:29, 22 January 2020 (UTC)
Basic stats
When the bot runs on a new dump, could you output some basic stats at WP:JCW/STATS? Namely:
- Number of {{cite xxx}} considered
- Number of non-empty |doi= found
- Number of {{doi}} found
- Number of {{doi-inline}} found
- Number of articles with any of the above found
- Number of distinct |journal= found
- Number of distinct DOIs found
- Number of distinct DOI prefixes found
- Anything else you consider interesting
Headbomb {t · c · p · b} 11:48, 8 January 2020 (UTC)
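Most of these figures are simple counts over the citations the bot already extracts. A Python sketch, assuming each considered citation is available as a small record (the `Citation` class, its fields and the function name are hypothetical; template-specific counts such as {{doi}} versus {{doi-inline}} are left out). DOIs are compared lowercased because DOI names are case-insensitive.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Citation:
    article: str            # article the citation appears in
    journal: Optional[str]  # value of |journal=, if any
    doi: Optional[str]      # value of |doi= (or a DOI template), if any


def basic_stats(citations: Iterable[Citation]) -> Dict[str, int]:
    cites = list(citations)
    # DOI names are case-insensitive, so compare them lowercased.
    dois = {c.doi.lower() for c in cites if c.doi}
    return {
        "non_empty_doi": sum(1 for c in cites if c.doi),
        "articles": len({c.article for c in cites}),
        "distinct_journals": len({c.journal for c in cites if c.journal}),
        "distinct_dois": len(dois),
        # The prefix is everything before the first '/'.
        "distinct_doi_prefixes": len({d.split("/", 1)[0] for d in dois}),
    }
```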
- Should be pretty easy. For the first one (number of templates considered), do you want all the templates examined or just all the templates with a |journal= or |doi= value? In other words, do you want {{cite magazine}} and {{cite book}} counted even if they did not result in a journal result? -- JLaTondre (talk) 00:12, 9 January 2020 (UTC)
- Just the ones the bot considers. Headbomb {t · c · p · b} 01:11, 9 January 2020 (UTC)
- I have a presentation on JCW on the ~22nd, so it would be nice to have those before then. If not, not that big a deal, I can deal without. Headbomb {t · c · p · b} 17:23, 10 January 2020 (UTC)
- That won't be an issue. I hope to wrap up this weekend. -- JLaTondre (talk) 22:01, 10 January 2020 (UTC)
- Results are posted. Please review and provide any comments, etc. I didn't do anything about formatting. If you want to edit the page for formatting / wording, I can then update the bot output to match. Interestingly, this turned up two cases of red citation templates due to capitalization mistakes (the bot is case insensitive as there are a variety of redirects based on case). I fixed these in Wikipedia so now the citations will properly display on their respective articles. -- JLaTondre (talk) 15:15, 11 January 2020 (UTC)
- Made some tweaks at Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Statistics. Headbomb {t · c · p · b} 17:12, 11 January 2020 (UTC)
- I'll change the bot output to match. -- JLaTondre (talk) 22:47, 11 January 2020 (UTC)
- Done. -- JLaTondre (talk) 14:52, 25 January 2020 (UTC)
- "15,784 DOI templates without journal names" should really be about DOI templates with journal names, I feel. That would be more interesting, I think, since they aren't really designed to have journal names to begin with. Headbomb {t · c · p · b} 17:48, 11 January 2020 (UTC)
- {{doi-inline}} has a second field which is usually a journal name. I'll invert the stat. -- JLaTondre (talk) 22:47, 11 January 2020 (UTC)
- Done. -- JLaTondre (talk) 14:52, 25 January 2020 (UTC)
I wonder what articles have {{cite map}}/{{cite av media}}/{{cite sign}}/{{cite speech}} that have journal/doi parameters in them. Headbomb {t · c · p · b} 17:32, 11 January 2020 (UTC)
- Here you go:
- {{cite map}} = Buddhism in Russia x 2, Canadian Arctic Rift System, Demographics of Russia, Ecology of the Sierra Nevada, Grand Canal Street railway works, Hinduism in Russia, Islam in Russia, List of state trunkline highways in Michigan, Religion in Russia, Russia, U.S. Route 385 in Colorado
- {{cite av media}} = Alfie Fripp, Durand Line, Hans Sydow, Hells Bells (cave formations), The Pitchfork Review
- {{cite sign}} = Andreas Gottschalk
- {{cite speech}} = Storm surge
- Though some of those are via redirects to the listed template. -- JLaTondre (talk) 22:47, 11 January 2020 (UTC)
Mbox
Headbomb, is there a way to add a parameter to make the new mbox you added optional (User:JL-Bot/Project content)? « Gonzo fan2007 (talk) @ 21:08, 22 January 2020 (UTC)
- Gimme a few hours and I'll implement something. Headbomb {t · c · p · b} 21:59, 22 January 2020 (UTC)
- Appreciate the quick reply! No need to rush on implementing it unless you want to. « Gonzo fan2007 (talk) @ 22:58, 22 January 2020 (UTC)
- @Gonzo fan2007: adding |mbox=no to your subscription should bypass it entirely. You could also add |mbox=<noinclude>no</noinclude> or |mbox=<includeonly>no</includeonly>, depending on whether you want the mbox to appear on certain pages but not others. Headbomb {t · c · p · b} 23:51, 22 January 2020 (UTC)
- Thank you!! « Gonzo fan2007 (talk) @ 01:27, 23 January 2020 (UTC)
Crossref Registrants
@Headbomb: I have completed retrieving the Crossref registrant information. Before I upload, I had one question. In the bot request, your example table showed all the prefixes (including those without results). Of the 39,000 prefixes, Crossref has registrants for less than half (15,607). That is a lot of empty lines. Do you really want the empties listed also? -- JLaTondre (talk) 13:19, 26 January 2020 (UTC)
- I was thinking of having them as placeholders for other registration agencies, but at the moment, I don't know how their APIs work, so you can omit them. Headbomb {t · c · p · b} 15:41, 26 January 2020 (UTC)
- Done. See User:JL-Bot/DOI for listing of pages. Not all prefixes had results. Also a couple notes on output format:
- If the Crossref name was an invalid Wikipedia page title (typically due to '/', but could be other cases as well), the name is not linked. If there was not a Wikipedia redirect, then the target will be listed as 'INVALID'.
- If the Crossref page name & the Wikipedia DOI redirect resulted in different targets (typically happens when Crossref is a generic name), then the targets list both. See 10.1136 for an example. Figured this might be useful in case there was an odd discrepancy.
- -- JLaTondre (talk) 01:36, 27 January 2020 (UTC)
DOI miscount?
In WP:JCW/DOI/10.1500#10.1560, there's a listing of
- Evolution (2 in 1)
However, if you follow the link, you are taken to Jerry Coyne. And in the version that was live, only one 10.1560 prefix is found (although there are two |journal=Evolution citations, one with this DOI prefix and one without).
The bot should only count the one with the matching DOI prefix. Headbomb {t · c · p · b} 22:16, 28 January 2020 (UTC)
- I hate table joins. Fixed. Results will be up shortly. -- JLaTondre (talk) 01:08, 30 January 2020 (UTC)
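The intended result is that each citation is counted only under the prefix of its own DOI. A Python sketch of that counting, producing the "(citations in articles)" pairs shown on the DOI pages (not the bot's actual join; the function name, the tuple row shape and the example DOI suffixes are hypothetical):

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Row = Tuple[str, Optional[str], Optional[str]]  # (article, journal, doi)


def prefix_journal_counts(rows: Iterable[Row]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """(prefix, journal) -> (citations, distinct articles), counting only
    citations whose own DOI carries that prefix."""
    citations: Counter = Counter()
    articles: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for article, journal, doi in rows:
        if not doi or not journal:
            continue  # nothing to group under a prefix/journal pair
        key = (doi.split("/", 1)[0], journal)
        citations[key] += 1
        articles[key].add(article)
    return {key: (n, len(articles[key])) for key, n in citations.items()}


rows = [("Jerry Coyne", "Evolution", "10.1560/abc"), ("Jerry Coyne", "Evolution", None)]
print(prefix_journal_counts(rows))
```

For the Jerry Coyne case this yields 1 citation in 1 article under 10.1560, rather than the "(2 in 1)" that was reported.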
Large page optimization
Instead of giving advanced counting functions to {{JCW-PUB-rank}}, {{JCW-CRAP-rank}}, {{JCW-DOI-rank}} and the like, if you could implement this new format, it would greatly speed up the loading of large pages like WP:JCW/Publisher1.
- |linecount= is simply the number of lines in |entries=
- |entrycount= is simply the number of entries with non-zero hits in |entries=
For instance, in WP:JCW/Publisher18#EBSCO Information Services
Rank | Target/Group | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
---|---|---|---|---|---
174 | EBSCO Information Services | {{doi|10.1033}} | 18 | 14 | 1.286
This would have |linecount=10 (10 matches of \*+), but |entrycount=9 (9 matches of \(\d+ in), because EBSCO Information Services has no hits. Headbomb {t · c · p · b} 04:51, 21 January 2020 (UTC)
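Both values can be computed when the bot writes the row, using the two patterns given above. A Python sketch (the function name and the sample entries text are hypothetical, since the real EBSCO entries are not reproduced here; the parameters were later shortened to |l-count= and |e-count=). Anchoring the asterisk pattern to line starts is a small assumption so that a stray "*" inside a title is not counted.

```python
import re
from typing import Tuple


def entry_counts(entries: str) -> Tuple[int, int]:
    """Return (l-count, e-count) for the text of an |entries= field."""
    # One match of \*+ per list line, anchored to the start of each line.
    line_count = len(re.findall(r"^\*+", entries, flags=re.MULTILINE))
    # Only entries with hits carry a "(N in M)" count.
    entry_count = len(re.findall(r"\(\d+ in", entries))
    return line_count, entry_count


sample = (
    "* [[EBSCO Information Services]]\n"
    "** [[Journal A]] (5 in 4)\n"
    "** [[Journal B]] (3 in 3)"
)
print(entry_counts(sample))
```

As in the EBSCO example, the top-level line has no hits, so the line count exceeds the entry count by one.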
- I need to re-work how I organize the data for output already to handle the three level indent request. I can do this in a way that gets me closer to handling that one. -- JLaTondre (talk) 01:31, 22 January 2020 (UTC)
- Sure. Headbomb {t · c · p · b} 01:34, 22 January 2020 (UTC)
- IMO, this is the next step, because if page loading becomes speedier, we could probably look into automatic "publisher" configuration via the targets/registrants of Category:Redirects from DOI prefixes. I've been holding off adding |doi=10.1016 to Elsevier (and similar for other publishers) since those would be so massive they would make page load times explode. Headbomb {t · c · p · b} 22:38, 3 February 2020 (UTC)
- I've been working on re-doing the output to support the three level nesting and other related requests. It's pretty easy to fit this into the current output in the meantime. I will add it into the existing output and then do it more elegantly in the re-worked version. -- JLaTondre (talk) 01:31, 6 February 2020 (UTC)
- Some things obviously didn't work here and elsewhere. Things could also probably be simplified to |l-count= and |e-count=. Also don't forget the linebreak before listing the entries. Headbomb {t · c · p · b} 01:36, 7 February 2020 (UTC)
- I know, I wasn't done. It's done now. You say "probably simplified". Does that mean those parameters are there now, or do you need to add them? -- JLaTondre (talk) 02:00, 7 February 2020 (UTC)
- None of these parameters are used yet (edit: now they are); I was waiting for them to be up before coding them. It's just as trivial to code for |l-count= as for |linecount=, but the first one reduces page size a bit and every little bit helps. Headbomb {t · c · p · b} 02:07, 7 February 2020 (UTC)
- I changed it to the shorter versions. That will show up with the next run. -- JLaTondre (talk) 03:00, 7 February 2020 (UTC)
Wikipedia:JCW/BADDOI didn't update
I suspect it's because there are no longer any bad DOI prefixes. If that's the case, the page should still get an update + timestamp (similar to WP:JCW/DOI/10.8000) so people don't wonder. Headbomb {t · c · p · b} 09:07, 7 February 2020 (UTC)
- Fixed. -- JLaTondre (talk) 00:19, 8 February 2020 (UTC)
- Also WP:JCW/DOI/10.10000 is getting too big. Splitting into WP:JCW/DOI/10.15000 (containing 10.15000 to 10.19999) would be ideal. Headbomb {t · c · p · b} 10:42, 7 February 2020 (UTC)
- I changed it so the bot is now using the pages listed in {{JCW-Main}}. If you need a new page, you can simply add it to the template and it will start using it at the next run (it just did that for 10.15000). If you'd rather move it somewhere else so that you can add an explanation of how it works (perhaps Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/DOI) just keep the list in the same format. Let me know where and I'll change it to pull from there. -- JLaTondre (talk) 00:19, 8 February 2020 (UTC)
MMWR Supplements not picked up in WP:JCW/Target3#Morbidity and Mortality Weekly Report
- MMWR Supplement is picked up, but MMWR Supplements aren't. Not sure why that's the case. Headbomb {t · c · p · b} 02:24, 9 February 2020 (UTC)
- Because 'MMWR Supplement' is normalized to 'mmwr' (with the logic of removing a trailing 'supplement'), but 'Supplements' (with a "s") is not removed; resulting in 'MMWR Supplements' normalizing to 'mmwrsupplements'. Obviously, 'mmwr' and 'mmwrsupplements' differ by more than the limit and so it's not picked up. I will update the normalization to remove 'supplements' as well as 'supplement'. I'll have to look and see if there are any other plural cases that should also be handled. -- JLaTondre (talk) 23:00, 9 February 2020 (UTC)
- Common targets have been updated. If there are any impacts to publishers & questionable, they will show up when those next update. -- JLaTondre (talk) 02:43, 10 February 2020 (UTC)
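A simplified Python illustration of the updated rule. It assumes normalization lowercases the title and drops spaces and punctuation, which matches the 'mmwr' and 'mmwrsupplements' keys quoted above; the bot's real normalization certainly does more, and the function name is hypothetical. Other plural forms, if found, would be added to the suffix tuple.

```python
def normalize_title(title: str) -> str:
    """Lowercase, keep only letters and digits, then strip a trailing
    'supplement' or 'supplements'."""
    key = "".join(ch for ch in title.casefold() if ch.isalnum())
    for suffix in ("supplements", "supplement"):
        # Guard against reducing a title like "Supplement" to an empty key.
        if key.endswith(suffix) and key != suffix:
            return key[: -len(suffix)]
    return key


print(normalize_title("MMWR Supplement"), normalize_title("MMWR Supplements"))
```

Both variants now reduce to 'mmwr', so they fall within the matching limit for the MMWR target.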
Bot needs a kick?
The bot hasn't run in a while. I haven't updated anything in the config pages, so that's a bit normal, but the dump has been out for a few days now. Any word on when the new dump can be processed? Headbomb {t · c · p · b} 07:30, 25 March 2020 (UTC)
- There was not a second dump this month. See this announcement. -- JLaTondre (talk) 10:52, 25 March 2020 (UTC)
Publisher1 Template Issue
@Headbomb: There is a template issue at Publisher1. There is an error showing and the rank 9 entry is not displaying. I'm not seeing an obvious cause. Can you see if there is an output issue or a template issue? Thanks. -- JLaTondre (talk) 14:53, 6 May 2020 (UTC)
- @JLaTondre: that seems to be a Category:Pages where template include size is exceeded type of thing. Headbomb {t · c · p · b} 14:56, 6 May 2020 (UTC)
Three-Level Hierarchy (was New version, part 8)
I notice that the Crapwatch uses a 2-level hierarchy, rather than a 3-level hierarchy as above. The 3-level hierarchy makes it clear if something is a typo, or if something is a direct match to something in WP:CRAPWATCH/SETUP.
For instance
Rank | Target/Group | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
---|---|---|---|---|---
148 | Blaze Media [WP:RSP § Generally unreliable] WP:RSP#Blaze Media | | 11 | 7 | 1.571
would be much better understood as
Rank | Target/Group | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
---|---|---|---|---|---
148 | Blaze Media [WP:RSP § Generally unreliable] WP:RSP#Blaze Media | | 11 | 7 | 1.571
since Blaze Magazine is a typo/variant of The Blaze (magazine).
Likewise with Hindawi, you have
Rank | Target/Group | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
---|---|---|---|---|---
4 | Hindawi Publishing Corporation [Beall's publisher list*] Originally listed on Beall's list, but later removed as a 'borderline case' | ... ... | 2478 | 2097 | 1.182
which would make it a lot clearer (to humans) why Int J Inflam was listed, if it were under
Rank | Target/Group | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
---|---|---|---|---|---
4 | Hindawi Publishing Corporation [Beall's publisher list*] Originally listed on Beall's list, but later removed as a 'borderline case' | ... ... | 2478 | 2097 | 1.182
Int J Inflam would still be counted only once in the statistics, even if it were listed twice. Headbomb {t · c · p · b} 09:50, 18 March 2019 (UTC)
- Still plugging away at this. Every time I think I'm close, I stumble across another special case. -- JLaTondre (talk) 17:46, 31 March 2019 (UTC)
- This is finally complete. I have refactored the code to allow the hierarchy to be generated. There are also some performance and other improvements:
- Normalization matches now exclude citations that are part of the configuration for a different publisher / questionable target. For example, the publisher configuration includes Geoscience Letters under SpringerOpen. Previously, it was showing up in the results for Geoscience e-Journals based on a normalization match. It will now only be reported under SpringerOpen.
- Added a maximum number of lines per page. The three-level hierarchy expands the size of the results, which would have increased the template expansion errors for the publisher results. Now, if a page grows too long, the bot will end it and start a new one, even if it holds fewer than the specified number of records per page. The current limit is 12,700 lines per page.
- Previously when doi template results were merged with citation template results, the article and citation counts were not always de-duped correctly. This has been fixed.
- I did a pretty thorough comparison between the old and new results. However, the citation process is pretty complex so let me know if you see anything odd. The results will be uploading shortly. -- JLaTondre (talk) 17:30, 22 May 2020 (UTC)
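The page-splitting rule described above (close a page when either the records-per-page or the lines-per-page limit would be exceeded) can be sketched in Python as follows. It assumes each record's rendered line count is known before the page is written. The names `paginate`, `max_records` and `max_lines` are hypothetical, not taken from the bot's source.

```python
from typing import List, Sequence

def paginate(record_line_counts: Sequence[int], max_records: int, max_lines: int) -> List[List[int]]:
    """Group record indices into pages, closing a page when either limit would be exceeded."""
    pages: List[List[int]] = []
    current: List[int] = []
    current_lines = 0
    for i, n in enumerate(record_line_counts):
        # Start a new page if this record would push the current one over a limit.
        if current and (len(current) >= max_records or current_lines + n > max_lines):
            pages.append(current)
            current, current_lines = [], 0
        current.append(i)
        current_lines += n
    if current:
        pages.append(current)
    return pages

print(paginate([5, 5, 5, 5], max_records=3, max_lines=12))  # [[0, 1], [2, 3]]
```

With this design a single record longer than `max_lines` still gets a page of its own rather than being dropped.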
DOI inline merge into main grouping when possible
In WP:JCW/Publisher5#Mary Ann Liebert you have
- Astrobiology (432 in 243)
and then later
- doi=10.1089
- Astrobiology (1 in 1)
The second entry comes from a {{doi-inline}} template, and isn't properly merged into the main grouping. Headbomb {t · c · p · b} 12:15, 18 December 2019 (UTC)
- Partially addressed. It will no longer produce the duplicate listing (now checks for 'TITLE (journal)' and 'TITLE (magazine)' as well as 'TITLE' matches). However, in looking at this, I realized that it's not properly updating the article counts in these cases. I will work on that. -- JLaTondre (talk) 02:51, 27 December 2019 (UTC)
- Done. This was addressed in the three-level hierarchy changes. -- JLaTondre (talk) 17:30, 22 May 2020 (UTC)
Exclude bluelinks/redlinks with JCW-patterns
It would be useful if we could exclude bluelinks/redlinks from matching with {{JCW-pattern}}. For example,
{{JCW-pattern|Online|*Online*|!Nonlinear!|exclude=bluelinks}}
would only match redlinks. This would be useful in the case of something like
- BBC News Online (15 in 13)
- BBC News online (5 in 1, 2, 3, 4, 5)
- BBC Online (2 in 1, 2)
- BBC online (3 in 1, 2)
- BBC Online Network (1 in 1)
which would exclude the first four entries, but not the last one. Conversely,
{{JCW-pattern|Online|*Online*|!Nonlinear!|exclude=redlinks}}
would only match bluelinks, and in this case, keep the first four entries, but exclude the last one. Headbomb {t · c · p · b} 12:50, 19 November 2019 (UTC)
- @Headbomb: This is complete. The |exclude=(blue|red)links parameter must be the last parameter supplied. -- JLaTondre (talk) 19:10, 24 May 2020 (UTC)
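A rough Python sketch of how `exclude=bluelinks` / `exclude=redlinks` could filter pattern matches. It assumes `*` is a wildcard, that matching is case-insensitive, and that the set of existing (blue-linked) titles is already known. The `!Nonlinear!` part of the pattern is not modelled, and `match_pattern` is a hypothetical name. Following the example above, the first four BBC titles are treated as bluelinks.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Set

def match_pattern(titles: Iterable[str], pattern: str, bluelinks: Set[str],
                  exclude: Optional[str] = None) -> List[str]:
    """Return titles matching a *-wildcard pattern, optionally dropping blue or red links."""
    matched = []
    for title in titles:
        # Assumed case-insensitive: compare lowercased title and pattern.
        if not fnmatchcase(title.lower(), pattern.lower()):
            continue
        is_blue = title in bluelinks
        if exclude == "bluelinks" and is_blue:
            continue
        if exclude == "redlinks" and not is_blue:
            continue
        matched.append(title)
    return matched

titles = ["BBC News Online", "BBC News online", "BBC Online", "BBC online", "BBC Online Network"]
blue = set(titles[:4])
print(match_pattern(titles, "*Online*", blue, exclude="bluelinks"))  # ['BBC Online Network']
```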
Missing article
1985 Tour de France turned GA on 17 May, more than a week ago. However, it does not appear at Wikipedia:WikiProject Cycling or Wikipedia:WikiProject Cycling/Tour de France task force, though the article is tagged for both? Zwerg Nase (talk) 15:57, 28 May 2020 (UTC)
- That task last ran on the 16th (the day before it was tagged). For several reasons, it did not run this past weekend. It will run this weekend & should get picked up then. -- JLaTondre (talk) 20:20, 28 May 2020 (UTC)
- @JLaTondre: since this question comes about every now and then, how about having something like User:JL-Bot/Recog-date, to transclude on pages, or perhaps just adding "Last ran on ... " somewhere on the page? Headbomb {t · c · p · b} 20:33, 28 May 2020 (UTC)
- Yes, that occurred to me when I was responding. The other thing is that I need to automate that task so that it's not dependent on my schedule. -- JLaTondre (talk) 21:29, 28 May 2020 (UTC)
- @JLaTondre: Thank you for the quick response! Zwerg Nase (talk) 08:50, 29 May 2020 (UTC)
- Task has been automated & it's worked these last two weekends. -- JLaTondre (talk) 12:14, 6 June 2020 (UTC)
How many templates for User:JL-Bot/Project content?
Are there any restrictions on how many templates JL-Bot can pick up from {{User:JL-Bot/Project content}}? Can the bot handle 30 templates? E.g. Special:Diff/960922365/961310436. —andrybak (talk) 18:53, 7 June 2020 (UTC)
- How about 131? Special:Diff/961315958 —andrybak (talk) 19:32, 7 June 2020 (UTC)
- There is no restriction on the number. It would theoretically be possible to add so many that the bot ran out of memory, but that would take far more than in those cases. -- JLaTondre (talk) 23:37, 8 June 2020 (UTC)
Problem bot edit
JLaTondre, I just rolled back this edit by JL-Bot. It added ~150K to a portal talk page in a single edit, along with a huge number of images in a gallery that was having a major impact on page loading time. I'm not sure what the point of this was, but there might be others like this that I didn't see. –Deacon Vorbis (carbon • videos) 15:22, 13 June 2020 (UTC)
- Deacon Vorbis, this is due to a big number of templates supplied to the bot's input: Special:Diff/961315958. I moved the list of featured articles without the featured pictures to a separate page, Portal:Science/Recognized content, to keep the discussion page, Portal talk:Science, clean.
- Featured pictures list supplied by the bot is not used by the portal directly yet—perhaps a separate page for featured pictures with fewer templates in the input could be set up to use it as suggestions for Portal:Science/Featured picture. —andrybak (talk) 16:28, 13 June 2020 (UTC)
WP:JCW/DOI nightly runs
Would be a good idea to do runs if Category:Redirects from DOI prefixes has new/different members in it. I don't believe anything would change except for |registrant= in the compilation, so maybe a separate subroutine to just sync |registrant= with the category would be enough. Headbomb {t · c · p · b} 15:44, 10 January 2020 (UTC)
- The doi processing is pretty quick. For now, I will have it run if there are any other updates. I can add in the category check in awhile. -- JLaTondre (talk) 22:04, 10 January 2020 (UTC)
- Didn't run alongside the other updates last night. Still to be implemented, or a bug? Headbomb {t · c · p · b} 17:17, 12 January 2020 (UTC)
- Manually running it. Should run with future ones. -- JLaTondre (talk) 01:35, 13 January 2020 (UTC)
@JLaTondre: I think the bot choked last night. Headbomb {t · c · p · b} 11:49, 18 January 2020 (UTC)
- Server had an internet outage last night. It will run tonight. -- JLaTondre (talk) 21:29, 18 January 2020 (UTC)
- @JLaTondre: Did the bot crash last night? It only edited [1], and I know for a fact that there were some changes in DOIs and exclusions. Headbomb {t · c · p · b} 19:49, 2 February 2020 (UTC)
- Issue resolved. Should run tonight. -- JLaTondre (talk) 23:19, 2 February 2020 (UTC)
- Still a nope. I wonder if it's because I'm editing the config pages midrun. That hasn't been an issue before though. Headbomb {t · c · p · b} 11:13, 3 February 2020 (UTC)
- No, it was due to a typo on my part. I uploaded last night's results. It's now running the 20200201 dump. -- JLaTondre (talk) 22:20, 3 February 2020 (UTC)
Also User:JL-Bot/DOI could be updated with every dump (with the new template-based format). Headbomb {t · c · p · b} 09:48, 16 February 2020 (UTC)
- Yes, that was originally a one-off. I'll change it to update with new pages. -- JLaTondre (talk) 00:55, 18 February 2020 (UTC)
- It's mostly to provide a semi-monthly reset because I'm changing various patterns to see if it matches something that already exists on Wikipedia. And also to get new registrants. Once per dump is all that's needed here. Headbomb {t · c · p · b} 14:37, 25 February 2020 (UTC)
- Ah, you changed topics. I was thinking this was related to WP:JCW/DOI & was a listing of its subpages. The CrossRef retrieval takes over 10 hours. That is a pretty extensive use of their resources. I've kicked it off, but let's see how much delta there is in the results before we query them monthly. Tomorrow, I'll run the upload to the User:JL-Bot/DOI pages. I've changed the output format over to the templates you created. By the way, I've been working on the three-level hierarchy as well as some performance improvements. I'm hoping to have that wrapped up in the next couple of weeks, but my schedule is constrained at the moment and it requires some significant changes (I needed to change the data structures in order to have the information required for the hierarchy at the point it is generated). I'll be spending quite a bit of time validating the output. -- JLaTondre (talk) 23:59, 25 February 2020 (UTC)
- Cool beans! If the delta is small, what could be done is something like a base-reset (no queries to CrossRef), with a full refresh once per month/three months/six months/year/whatever. Headbomb {t · c · p · b} 01:25, 26 February 2020 (UTC)
- The results are up. I fixed an issue that caused the last page (10.37000) not to be saved last time. Excluding that, there are still a significant number of differences between the two CrossRef results. Mostly minor changes in the formats of names, but some major changes as well as new listings. I will post a user friendly comparison in a bit. -- JLaTondre (talk) 22:21, 26 February 2020 (UTC)
@JLaTondre: very useful. I've removed the 37000s to get a more representative sense of what a typical delta would be. Whatever frequency we settle on for the JL-Bot/DOI updates, uploading a delta automatically would be very useful. Headbomb {t · c · p · b} 00:17, 27 February 2020 (UTC)
- The 37000s were a valid delta. While they didn't get uploaded last time, I had collected the data. -- JLaTondre (talk) 01:04, 27 February 2020 (UTC)
- Ah I see. Well I suppose it makes sense, if DOI prefixes get assigned in roughly sequential order. Headbomb {t · c · p · b} 01:07, 27 February 2020 (UTC)
- User:JL-Bot/DOI/Deltas hasn't been updated in a while. It would be useful if it did with every compilation update. Headbomb {t · c · p · b} 15:02, 6 May 2020 (UTC)
- New version uploaded. I will set it up to run monthly. -- JLaTondre (talk) 13:22, 8 May 2020 (UTC)
- The bot has been updated to automatically detect a new dump and download & process it (previously it did the nightly updates automatically, but I had to initiate the new dump download). It will also automatically pull the registrant information from Crossref monthly. -- JLaTondre (talk) 19:53, 26 June 2020 (UTC)
Source code?
I just realized I don't remember you uploading the JL-Bot code publicly. It's a fairly advanced piece of software now, and I'm starting to worry about the bus factor here. Would you be willing to put the code up somewhere (possibly in an {{infobox bot}} on the bot's userpage)? Headbomb {t · c · p · b} 21:18, 12 November 2019 (UTC)
- The source code is now on GitHub. I have added links on the bot's user page. -- JLaTondre (talk) 19:35, 26 June 2020 (UTC)
Duplication?
In Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Publisher37, and many other places, every publisher entry seems duplicated. Headbomb {t · c · p · b} 12:40, 28 June 2020 (UTC)
- Fixed. I'm about to push out the DOI redirect configuration changes. That is going to add additional publisher entries. I will do that and then re-run. Once done, I will clean-up the remaining unneeded pages. Thanks. -- JLaTondre (talk) 14:21, 28 June 2020 (UTC)
- Okay, made the fix, but forgot to update the branch with the doi-redirects so compounded the problem. Working it... -- JLaTondre (talk) 21:07, 28 June 2020 (UTC)
- All cleaned up. -- JLaTondre (talk) 00:04, 29 June 2020 (UTC)
Automatic DOI-based subscriptions for WP:JCW/PUB and WP:CITEWATCH
Now that we have a substantial amount of DOIs, it would be good if the bot automatically 'selected' publishers and journals based on Category:Redirects from DOI prefixes.
For example, 10.1068 has this:
#REDIRECT[[SAGE Publishing]] {{R from DOI prefix|registrant=Pion Ltd}}
For SAGE Publishing, this would basically be every redirect in Redirects from DOI prefixes that points to SAGE Publishing (with each |registrant= found in those redirects listed as |imprint#=), or which has |registrant=SAGE Publishing (in this case nothing):
{{JCW-selected |SAGE Publishing |imprint1=Pion Ltd |doi1=10.1068 |doi2=10.1106 |doi3=10.1177 |doi4=10.1191 |doi5=10.1243 |doi6=10.1258 |doi7=10.1345 |doi8=10.1354 |doi9=10.1369 |doi10=10.1622 |doi11=10.1630 |doi12=10.2182 |doi13=10.2189 |doi14=10.2511 |doi15=10.2968 |doi16=10.3317 |doi17=10.3821 |doi18=10.4135 |doi19=10.4137 |doi20=10.4219 |doi21=10.5034 |doi22=10.5126 |doi23=10.5193 |doi24=10.5301 |doi25=10.5367 |doi26=10.7182 |doi27=10.17322 |doi28=10.31124}}
For Pion Ltd, this would basically be everything that points to Pion Ltd (in this case nothing, since it redirects to SAGE Publishing) or has |registrant=Pion Ltd (10.1068):
{{JCW-selected|Pion Ltd|parent1=SAGE Publishing|doi1=10.1068}}
- For WP:JCW/PUB, it should do this automatically for all DOI redirects.
- For WP:CITEWATCH, it should do this automatically only if there's a target/registrant match with a corresponding entry on WP:CITEWATCH/SETUP.
Headbomb {t · c · p · b} 08:49, 16 February 2020 (UTC)
- @Headbomb: Trying to make sure I understand this one:
- You are saying if {{R from DOI prefix}} has a registrant that is different than the redirect target, it should be used to generate a result for both the redirect target and also for the registrant? Subject to existence on the CITEWATCH configuration page for CITEWATCH.
- How does {{JCW-selected}} come into play here? Are you expecting this to update the configuration pages? Why wouldn't it instead just use the {{R from DOI prefix}} redirects to drive the processing directly (i.e. remove all DOIs from the configuration pages)?
- Thanks. -- JLaTondre (talk) 00:02, 25 June 2020 (UTC)
- It's been a while, and it's late here (I'll double check tomorrow), but the first bullet seems to be what I have in mind, more or less. I'd see it as a potentially two-pronged approach
- 1a) Load all the {{R from DOI prefix}} redirects and process them directly. [This is what the main bot should mostly care about]
- 1b) Update the WP:JCW/PUBSETUP/WP:CITEWATCH/SETUP entries to reflect the {{R from DOI prefix}} structure. This is mostly for the human benefit of reviewing the structure. It should only add DOI prefixes, and only touch those with redirects, because if a publisher article is deleted, or if a DOI redirect is deleted, we'll want to take care of this manually rather than automatically.
- The effect of 1b) would make the 1a) step kinda redundant, so long as there is no change to the DOI structure, so there might be some optimizations possible there when fetching stuff.
- So basically things can't just be automated based on the DOI redirects, because many publishers (the less notable ones and most predatory ones) will have manually declared DOIs without redirects. And we want to keep those associations. Headbomb {t · c · p · b} 02:47, 25 June 2020 (UTC)
- My preference would be to have a separate template for these (perhaps {{JCW-doi-redirects}}). That way I don't have to worry about editing the existing configuration records. The bot can just update its own configuration lines and the manual ones can remain in {{JCW-selected}}. That might also reduce confusion where an editor removes a DOI entry and the bot puts it back because the editor didn't realize it was from a DOI redirect. Wasn't there a bot that used to sort the configuration pages? Is that still active? If so, is it doing anything fancier than a basic alphanumeric sort? I'd like to make sure the entries are inserted in the same order to avoid conflicts. -- JLaTondre (talk) 16:16, 27 June 2020 (UTC)
- A separate template would probably work fine, yes. Might be easier to just automatically add every DOI from {{R from DOI prefix}} in there, and I'll remove duplicates from {{JCW-selected}} manually. There was a sorting bot (User:RonBot), but its maintainer User:Ronhjones and his wife sadly died in a house fire in 2019. User:TheSandDoctor was going to revive it at one point, but I don't know where that's at. Headbomb {t · c · p · b} 16:30, 27 June 2020 (UTC)
- I've pretty much completed the configuration page update portion. The template needs to be created. I looked at the {{JCW-selected}} one, but that uses Lua which I'm not familiar with. If you can create the template, I will have it save the updates. I was planning on the following format:
{{JCW-doi-redirects|SAGE Publishing|10.1068|10.1106|10.1177|10.1191|10.1243|10.1258|10.1345|10.1354|10.1369|10.1622|10.1630|10.17322|10.2182|10.2189|10.2511|10.2968|10.31124|10.3317|10.3821|10.4135|10.4137|10.4219|10.5034|10.5126|10.5193|10.5301|10.5367|10.7182}}
- If you want to use a different name, that is fine. Due to the way it works, it will sort the entries within an individual section of the configuration page and remove any exact duplicates (duplicate lines). -- JLaTondre (talk) 23:23, 27 June 2020 (UTC)
- Should be ready. Maybe it'll look weird in the edit window, but that can be fixed later if that's the case. Headbomb {t · c · p · b} 23:28, 27 June 2020 (UTC)
- My apologies for the inactivity on this, Headbomb. Thank you also for pinging me...I did not know that Ron had passed. Rather surreal and very tragic. I am open to anyone else taking this on and I will provide any assistance that I can to such person if requested. --TheSandDoctor Talk 06:46, 28 June 2020 (UTC)
- This is complete. It should now update nightly the configurations based on the DOI redirects and use those in generating the updated publisher and questionable listings (assuming any changes). The first pass has been performed and uploaded. I have changed the sort order for the DOIs (it was normal sort, changed it to sort only by the decimal portion like the DOI pages) which means when it runs tonight, it should update based on that. If any other checks of the configuration pages are required, I can probably take that on. I don't believe I ever interacted with Ron, but still jarring to hear that. Makes me appreciate more the completed request above to have the source code public. -- JLaTondre (talk) 00:18, 29 June 2020 (UTC)
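A Python sketch of generating one {{JCW-doi-redirects}} line per redirect target, with prefixes sorted numerically by the part after "10." rather than as plain strings. The input mapping (DOI prefix to redirect target) stands in for the bot's parsing of the {{R from DOI prefix}} redirects. That mapping and the function names are assumptions. The numeric sort puts 10.2182 before 10.17322, the same order as the {{JCW-selected}} example earlier in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Set

def doi_sort_key(prefix: str) -> int:
    # "10.17322" -> 17322; assumes every prefix has the form "10.<digits>".
    return int(prefix.split(".", 1)[1])

def doi_redirect_lines(redirects: Dict[str, str]) -> List[str]:
    """`redirects` maps a DOI prefix (e.g. '10.1068') to its redirect target."""
    by_target: Dict[str, Set[str]] = defaultdict(set)
    for prefix, target in redirects.items():
        by_target[target].add(prefix)
    lines = []
    for target in sorted(by_target):
        prefixes = sorted(by_target[target], key=doi_sort_key)
        lines.append("{{JCW-doi-redirects|" + target + "|" + "|".join(prefixes) + "}}")
    return lines

print(doi_redirect_lines({"10.2182": "SAGE Publishing", "10.17322": "SAGE Publishing",
                          "10.1068": "SAGE Publishing"}))
# ['{{JCW-doi-redirects|SAGE Publishing|10.1068|10.2182|10.17322}}']
```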
The number of subpages is getting a bit unwieldy. It would probably be best to group these in groups of up to 100 when the size limit isn't reached. Headbomb {t · c · p · b} 13:55, 29 June 2020 (UTC)
- Changed. That is just a configuration setting at this point. -- JLaTondre (talk) 00:36, 30 June 2020 (UTC)
- That significantly reduced the number of pages. I will work on cleaning up the obsolete ones later today. -- JLaTondre (talk) 10:43, 30 June 2020 (UTC)
- WP:JCW/Publisher4 / 5 / 6 / 7 still exceeds size limit. Things may have to be tweaked. Headbomb {t · c · p · b} 11:57, 30 June 2020 (UTC)
- Reading WP:PEIS makes my head hurt. I've played around a little bit without any success. I can set an arbitrary small maximum line size or record size, but there should be some way to better optimize the sizing. There is an interaction between the number of templates and the number of entries in the template that is more complex than just the total number of lines. I will need to play around with it some more. -- JLaTondre (talk) 02:57, 1 July 2020 (UTC)
- I reduced the maximum lines per page. That solves it for now. -- JLaTondre (talk) 20:15, 2 July 2020 (UTC)
Splitting WP:CITEWATCH/SETUP
How much trouble would it be to split WP:CITEWATCH/SETUP into three parts?
- User:JL-Bot/Questionable.cfg/General, containing every current section from Sources to WP:RSP
- User:JL-Bot/Questionable.cfg/Publishers, containing the current Publishers section
- User:JL-Bot/Questionable.cfg/Journals, containing the current Journals section
This would avoid running into template expansion limits, and make the pages easier to load on slower machines. Headbomb {t · c · p · b} 18:41, 7 July 2020 (UTC)
- That should be pretty straightforward. I may be able to get to it this weekend. -- JLaTondre (talk) 21:10, 9 July 2020 (UTC)
- No rush. It's mostly a matter of convenience. Headbomb {t · c · p · b} 21:15, 9 July 2020 (UTC)
- Done. I had time today. I will leave it for you to split the page to your liking. The bot is loading from the current page and the subpages so no hurry. It will continue to operate. Once you have completed the split, I will remove the current page from the configuration unless there would be some reason to include it also. -- JLaTondre (talk) 17:35, 10 July 2020 (UTC)
Should be done. Headbomb {t · c · p · b} 18:10, 10 July 2020 (UTC)
Pardon my french
A year ago, I made a typo in a request. This is the fix.
Not sure if it ended up in the code, but I figured I'd point it out. Headbomb {t · c · p · b} 22:21, 10 July 2020 (UTC)
- No impact. The normalization doesn't care what is in front / behind the series | série | part, etc. Too many possibilities and allowing any word has been working fine so far. -- JLaTondre (talk) 21:50, 11 July 2020 (UTC)
Bot removals?
What's this new section? What is its purpose / How does it work? Headbomb {t · c · p · b} 10:45, 24 November 2019 (UTC)
- LOL, and here I was thinking my edit summaries were pretty clear. ;-) I saw the bot removed a couple entries it shouldn't have so I restored them all until I can figure out what went wrong. Since you had done multiple edits in between, I couldn't simply revert the bot. It was easier to copy them all to their own section. Also makes it easier on me to debug. Once done, I'll move the ones that should stay back to their proper places. -- JLaTondre (talk) 13:38, 24 November 2019 (UTC)
Category needs to lead with :
See [2]. Headbomb {t · c · p · b} 16:08, 21 August 2020 (UTC)
- Done. -- JLaTondre (talk) 12:51, 29 August 2020 (UTC)
WikiProject Gloucestershire
I recently set up "Recognised content" for Wikipedia:WikiProject Gloucestershire and thought I had followed the instructions on the configuration page, but I see JL-bot has recently done a run "updating recognized content" but nothing has appeared on the project page. Have I done something wrong in the configuration?— Rod talk 11:33, 29 August 2020 (UTC)
- You changed it since the last run. The current parameter is a valid template so the configuration looks good. The bot is currently doing this week's run (it runs this task every Saturday), but it is only in the As at the moment. It will be awhile before it gets to the Gs. -- JLaTondre (talk) 11:55, 29 August 2020 (UTC)
- Thanks. I'll check again later in the bot run.— Rod talk 11:59, 29 August 2020 (UTC)
- Yep - has worked now. Thanks— Rod talk 15:40, 29 August 2020 (UTC)
Sort order uncertainty?
The bot should make up its mind. Probably with JCW-selected>JCW-pattern>JCW-doi-redirects or something (for the same first parameter). Headbomb {t · c · p · b} 12:53, 4 July 2020 (UTC)
- Fixed that yesterday. Order is selected, pattern, doi-redirects. -- JLaTondre (talk) 13:10, 4 July 2020 (UTC)
While we're at it, any way you can re-use that to sort WP:JCW/EXCLUDE? Headbomb {t · c · p · b} 14:05, 12 July 2020 (UTC)
- Yes, I will add that in when it updates the stats. -- JLaTondre (talk) 23:09, 16 July 2020 (UTC)
- Completed. As with the others, it will only sort (and remove duplicates) within a section. This allows for arbitrary sections like the current Temp one. -- JLaTondre (talk) 13:56, 20 August 2020 (UTC)
Same in User:JL-Bot/Citations.cfg, [4][5]. Probably should be applied across the board when sorting is concerned. Headbomb {t · c · p · b} 14:53, 1 September 2020 (UTC)
- I removed the Perl library sort that was being used and switched to the sorting that is used in individual pages. It will re-sort on next save and then should be stable. -- JLaTondre (talk) 01:28, 4 September 2020 (UTC)
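A Python sketch of the sorting behaviour described in this thread. Lines are sorted only within a section, exact duplicate lines are removed, and for the same first parameter the order is selected, pattern, doi-redirects. Several details are assumptions: section boundaries are taken to be lines starting with "==", blank lines are dropped, and lines that are not one of these templates are placed at the top of their section. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Order from the reply above: selected, then pattern, then doi-redirects.
TEMPLATE_RANK = {"JCW-selected": 0, "JCW-pattern": 1, "JCW-doi-redirects": 2}
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*\|\s*([^|}]*)")

def sort_key(line: str) -> Tuple[str, int, str]:
    m = TEMPLATE_RE.match(line)
    if not m:
        return ("", 99, line)  # unrecognised lines sort first within the section
    name, first = m.group(1), m.group(2).strip()
    return (first, TEMPLATE_RANK.get(name, 99), line)

def sort_config(lines: List[str]) -> List[str]:
    out: List[str] = []
    section: List[str] = []

    def flush() -> None:
        # set() removes exact duplicate lines before sorting.
        out.extend(sorted(set(section), key=sort_key))
        section.clear()

    for line in lines:
        if line.startswith("=="):  # assumed: headings delimit sections
            flush()
            out.append(line)
        elif line.strip():  # blank lines are dropped in this sketch
            section.append(line)
    flush()
    return out
```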
Up DOI limits
Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Invalid DOI prefixes
10.40000 to 10.49999 should be accepted as valid. Headbomb {t · c · p · b} 21:05, 3 September 2020 (UTC)
- Done. I'll leave it to you if you want to create a 40000 page for those three or just let them be on the end of the 30000 pages (what will happen if page is not created). I also upped the CrossRef retrieval to 49999. -- JLaTondre (talk) 01:31, 4 September 2020 (UTC)
- New page. Headbomb {t · c · p · b} 13:05, 13 September 2020 (UTC)
- Although if you meant here, the 30000 page is fine as is for now. Headbomb {t · c · p · b} 13:42, 18 September 2020 (UTC)
- Yes, I was talking about the WP:JCW/DOI pages. Those are controlled by what is listed in the {{JCW-Main}} header. The User:JL-Bot/DOI pages will automatically divide into separate pages for every 1000. -- JLaTondre (talk) 14:07, 19 September 2020 (UTC)
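A rough sketch of the two rules in this thread: accepting prefixes up to 10.49999, and grouping prefixes into blocks of 1000. The lower bound of 10.1000 and all names here are hypothetical, and only simple numeric prefixes (no sub-divisions such as 10.1000.1) are handled.

```python
import re
from typing import Optional

PREFIX_RE = re.compile(r"^10\.(\d+)$")
MIN_REGISTRANT = 1000   # assumption, not stated in the thread
MAX_REGISTRANT = 49999  # the upper limit agreed above


def registrant_code(prefix: str) -> Optional[int]:
    """Return the numeric part of a prefix like '10.40123', or None."""
    m = PREFIX_RE.match(prefix.strip())
    return int(m.group(1)) if m else None


def is_valid_prefix(prefix: str) -> bool:
    code = registrant_code(prefix)
    return code is not None and MIN_REGISTRANT <= code <= MAX_REGISTRANT


def page_bucket(prefix: str) -> Optional[int]:
    """Start of the 1000-wide block the prefix would be listed under."""
    code = registrant_code(prefix)
    if code is None:
        return None
    return (code // 1000) * 1000
```

For example, 10.40123 is accepted and falls in the 40000 block, 10.1038 falls in the 1000 block, and 10.50000 is rejected.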
Weird duplicate
In WP:JCW/C45 there are two entries for "Compendium of Continuing Education in Dentistry". The first with 17 cites, the second with 5. Headbomb {t · c · p · b} 21:22, 19 September 2020 (UTC)
- This had me pretty confused, but figured out it was due to this. Can you have JCW-CleanerBot ignore Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia subpages? -- JLaTondre (talk) 21:48, 19 September 2020 (UTC)
Final }} parsing in DYK templates
In Wikipedia:WikiProject_Articles_for_creation/DYK#Did_you_know?_articles, there are a few entries with }} at the end. This seems to be a problem to do with parsing templates like this [6] (before = problem, after = putative fix). Headbomb {t · c · p · b} 06:06, 4 October 2020 (UTC)
- Added handling for that case. -- JLaTondre (talk) 13:00, 4 October 2020 (UTC)
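One common cause of a stray }} is matching a template with a non-greedy pattern that stops at the first }}, which belongs to a nested template. Counting {{ and }} pairs avoids that. This is a hypothetical sketch, not the bot's actual fix, and it does not try to handle triple-brace parameters.

```python
from typing import Optional


def extract_template(text: str, start: int) -> Optional[str]:
    """Return the whole template opening at `start`, or None if unbalanced."""
    if not text.startswith("{{", start):
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # The matching close for the outer template.
                return text[start:i]
        else:
            i += 1
    return None
```

With this approach, a blurb containing something like {{em|this}} is kept whole, and the outer template's closing braces are consumed instead of leaking into the output.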
Incorrect image continues to be added to the Puerto Rico portal
Hi,
This image, File:Cicatrices de flagellation sur un esclave.jpg, found on Commons, is of a slave in Louisiana, not of a slave in Puerto Rico. You can read the description at its source, the US Library of Congress, here.
I removed the image from Portal talk:Puerto Rico; however, JL-Bot keeps adding it back as a Puerto Rico featured picture.
I would appreciate help with this. --The Eloquent Peasant (talk) 13:09, 10 October 2020 (UTC)
- That's because File talk:Cicatrices de flagellation sur un esclave.jpg is tagged with {{WP Puerto Rico}} Headbomb {t · c · p · b} 15:21, 10 October 2020 (UTC)
- Ah. TY. --The Eloquent Peasant (talk) 16:20, 10 October 2020 (UTC)
gallery mode=packed
I came across this, which is something I think the bot should support. Headbomb {t · c · p · b} 20:18, 29 September 2020 (UTC)
- Are you saying that should be the default or an option? -- JLaTondre (talk) 22:00, 29 September 2020 (UTC)
- Well at least an option. But I think it works best as default too. Headbomb {t · c · p · b} 22:41, 29 September 2020 (UTC)
- I changed it to always use packed as it does look better. If anyone complains, I'll add an option. It will show up in tomorrow's run. -- JLaTondre (talk) 23:38, 2 October 2020 (UTC)
The choice of parameters (mode=packed heights=200px) makes the images wayyy too overwhelmingly large (it's the minimum height before the images are further scaled up to fill rows). Please leave out the height and let them follow the default size. --Paul_012 (talk) 10:54, 5 October 2020 (UTC)
- Okay, I will make the default no height and add an option to specify a height for projects that may want it. I probably won't be able to get to that until next week, though. -- JLaTondre (talk) 23:58, 5 October 2020 (UTC)
- Implemented. There is now a |gallery-heights= option to specify a desired height. If it is not specified, the default height is used. -- JLaTondre (talk) 18:58, 12 October 2020 (UTC)
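The resulting behaviour (always mode="packed", heights only when |gallery-heights= is given) can be sketched as a small builder. The function name and entry format are hypothetical; mode and heights are standard attributes of the MediaWiki gallery tag.

```python
from typing import Iterable, Optional


def build_gallery(entries: Iterable[str], heights: Optional[str] = None) -> str:
    """Build a packed <gallery>; add heights only if one was configured."""
    attrs = ['mode="packed"']
    if heights:  # omitted -> MediaWiki's default height applies
        attrs.append(f'heights="{heights}"')
    lines = [f"<gallery {' '.join(attrs)}>"]
    lines.extend(entries)  # e.g. "File:Example.jpg|caption"
    lines.append("</gallery>")
    return "\n".join(lines)
```

Leaving heights out by default avoids the oversized rows described above, while projects that want larger images can still opt in.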
Normalize '...that'
For the DYK blurbs of WP:RECOG, if the bot could normalize the leading ...that to ... that like this, that would be nice. Headbomb {t · c · p · b} 04:48, 14 October 2020 (UTC)
- Done. -- JLaTondre (talk) 02:06, 17 October 2020 (UTC)
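A minimal regex sketch of that normalization, assuming only a leading three-dot ellipsis needs handling (the pattern and function name are hypothetical):

```python
import re

# A leading "..." followed by optional whitespace and the whole word "that".
LEADING_THAT = re.compile(r"^\s*\.\.\.\s*that\b")


def normalize_that(blurb: str) -> str:
    """Rewrite a leading '...that' (any spacing) as '... that'."""
    return LEADING_THAT.sub("... that", blurb, count=1)
```

The word boundary after "that" keeps blurbs starting with a longer word (for example "...thatcher") unchanged.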
New dump?
It's been a while since the dump came out. Past runs have happened within 1-2 days of the dump being released, but it's been 3 days now. Any word on when you could run this next? Headbomb {t · c · p · b} 17:37, 24 October 2020 (UTC)
- Yup, my server had an upgrade which broke some dependencies. Then there was an issue processing the latest dump. It took a bit, but I believe I have everything resolved. It is saving the data now. Let me know if you see anything odd. -- JLaTondre (talk) 13:08, 25 October 2020 (UTC)
JCW GIGO tweaks
This fixed display issues on that page. Caused by this citation.
Also, [[Geo: Blah blah]] doesn't link, but [[:Geo: Blah blah]] does. [7]. See also VPT. Headbomb {t · c · p · b} 20:57, 26 October 2020 (UTC)
- Both cases now handled. -- JLaTondre (talk) 21:01, 30 October 2020 (UTC)
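A leading colon makes MediaWiki treat a link target as a plain page link. That fixes titles like "Geo: Blah blah", and it also stops a title such as "Category:..." from categorizing the page instead of linking. A hypothetical sketch that adds the colon whenever the target contains one:

```python
from typing import Optional


def wikilink(title: str, label: Optional[str] = None) -> str:
    """Build [[...]] with a leading colon whenever the target contains ':'."""
    target = title.strip()
    if ":" in target and not target.startswith(":"):
        target = ":" + target
    return f"[[{target}|{label}]]" if label else f"[[{target}]]"
```

Adding the colon to every colon-containing target is a deliberately blunt choice: it is harmless for ordinary mainspace titles, so the sketch does not need a list of prefixes to watch for.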
Is there a way to display FA and GA for this project even though it's part of Wikipedia:WikiProject United States? See project banner at Talk:Metropolitan Community Church of Washington, D.C. to see what I mean. The DC project was swallowed up by the US project a few years ago, but I'd like for only DC articles to be listed. APK whisper in my ear 00:05, 2 November 2020 (UTC)
- @APK:, see WP:RECOG; you can use |category=WikiProject District of Columbia articles. Headbomb {t · c · p · b} 01:26, 2 November 2020 (UTC)
- Ok thanks, let me see if I can figure this out. APK whisper in my ear 03:44, 2 November 2020 (UTC)
Category link fixes
See this diff. Headbomb {t · c · p · b} 19:43, 1 November 2020 (UTC)
- I wasn't expecting a publisher to be a category. Added a check for that. -- JLaTondre (talk) 23:53, 2 November 2020 (UTC)
Weird text in DYK blurb
Wikipedia:WikiProject Women in Red/DYK has
- 2009| (2009-12-01)
in it. I can't find where this is coming from. A parsing error? Headbomb {t · c · p · b} 19:50, 10 October 2020 (UTC)
- It came from Talk:Bessie Moses, which had {{dyktalk|1 December|2009|}}. The bot handled dyktalk without a blurb parameter (e.g. {{dyktalk|1 December|2009}}), but not an empty parameter. I put in a check for that. When there is no blurb (in either case), it uses "Bessie Moses (article's talk page missing blurb)". However, I also corrected the article's talk page so it now has the correct date and the blurb. -- JLaTondre (talk) 20:06, 12 October 2020 (UTC)
- Some more at Wikipedia:WikiProject Astronomy/Did you know: "nompage=Template:Did you know nominations/2020 AV2", caused by this. Headbomb {t · c · p · b} 22:23, 17 October 2020 (UTC)
- Done. -- JLaTondre (talk) 23:34, 21 October 2020 (UTC)
Another one in Wikipedia:WikiProject Women in Red/DYK and Wikipedia:WikiProject Anarchism/DYK
- ... that along with her business partners, philanthropist Sara Braun (pictured), one of the first businesswomen in Punta Arenas, Chile, was involved in the genocide of the Selk'nam people? |dyknom= Template:Did you know nominations/Sara Braun (2019-12-14)
- ... that the King's Police Medal was created to reward the gallantry of three police officers involved in the Tottenham Outrage in 1909?|views=2436 (2009-02-07)
due to this and this. Headbomb {t · c · p · b} 18:26, 31 October 2020 (UTC)
- When I update it for the multiple DYK entries, I will see if I can clean up the logic so it is generic to the extra params that can occur vs. having to program each one specifically. -- JLaTondre (talk) 00:44, 3 November 2020 (UTC)
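A generic approach, as suggested above, could split the template on top-level pipes, keep named parameters (|views=, |dyknom=, |nompage=, and so on) apart from positional ones, and treat an empty blurb like a missing one. This is a hypothetical sketch: it assumes the blurb is the third positional parameter, as in the examples in this thread, and that named parameter keys are simple identifiers.

```python
from typing import Dict, List, Tuple


def split_params(inner: str) -> List[str]:
    """Split on '|' characters that are not inside nested {{ }} or [[ ]]."""
    parts: List[str] = []
    buf: List[str] = []
    depth = 0
    i = 0
    while i < len(inner):
        two = inner[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            buf.append(two)
            i += 2
        elif two in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(two)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(inner[i])
            i += 1
    parts.append("".join(buf))
    return parts


def dyktalk_blurb(template: str, article: str) -> Tuple[str, Dict[str, str]]:
    inner = template.strip()[2:-2]    # drop the outer {{ }}
    fields = split_params(inner)[1:]  # drop the template name
    positional: List[str] = []
    named: Dict[str, str] = {}
    for field in fields:
        key, sep, value = field.partition("=")
        # Only identifier-like keys count as named parameters, so a blurb
        # that happens to contain "=" stays positional.
        if sep and key.strip().isidentifier():
            named[key.strip()] = value.strip()
        else:
            positional.append(field.strip())
    blurb = positional[2] if len(positional) > 2 else ""
    if not blurb:  # covers both a missing and an empty blurb parameter
        blurb = f"{article} (article's talk page missing blurb)"
    return blurb, named
```

Because every named parameter is collected the same way, new ones would not leak into the blurb text and would not need to be programmed individually.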
Weird blanking
We replaced the {{columns-list}} with divs to fix a template expansion issue. Did that cause bot issues? See [8] and [9]. Headbomb {t · c · p · b} 06:56, 2 November 2020 (UTC)
- Yes, it was using the columns-list to determine the start and the end. I changed it over to the <div> and updated Questionable.cfg/General and Questionable.cfg/Journals to that format as well, since they all need to be consistent. -- JLaTondre (talk) 00:52, 3 November 2020 (UTC)
Assistance requested
A follow-up to my post above: I created the subpages for Wikipedia:WikiProject District of Columbia (FA, FL, FP, GA, DYK) and I thought they would show up all together at Wikipedia:WikiProject District of Columbia/Recognized content. Basically, my question is how do I get all of the recognized content to show up on the main project page like here? APK whisper in my ear 04:57, 15 November 2020 (UTC)
- You likely want to use the category option in WP:RECOG, since {{WikiProject District of Columbia}} is not an actually-used template, i.e. |Category=WikiProject District of Columbia articles. Headbomb {t · c · p · b} 05:09, 15 November 2020 (UTC)
- Ok, thank you. APK whisper in my ear 05:43, 15 November 2020 (UTC)
Detection by topic in Template:Article history
Currently, WP:RECOG lists are populated by detecting templates on talk pages, mostly WikiProject banners. Would it be possible to make JL-Bot also detect the value passed as parameter |topic= to the template {{Article history}}? See the list of possible values at Wikipedia:WikiProject Good articles/Project quality task force § Good article topic values. —andrybak (talk) 08:29, 20 November 2020 (UTC)
- Categories can also be specified (see the "Project parameter" section at WP:RECOG). As it looks like those Article history parameters result in categories being added to the talk pages, you can use the resultant category (for example, "agriculture" results in Category:Agriculture, food and drink good articles). Parsing every single page with an Article history template would be prohibitive in terms of time. -- JLaTondre (talk) 11:46, 20 November 2020 (UTC)