User talk:Citation bot/Archive 13

This is an archive of past discussions about User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Don't remove final , from URL

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:48, 4 January 2019 (UTC)

What happens: It turns out the final comma in a URL is a valid character; it should not be removed
What should happen: [1] (this is what the bot removed)
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1165 AManWithNoPlan (talk) 18:53, 4 January 2019 (UTC)

Yeah, never modify URLs without testing that the URL works. This is what I have learned with WAYBACKMEDIC. It is continually finding crazy things in URLs that are not predictable. One cannot safely say a URL ending in a given set of characters should be changed or added to. The same goes for encoding schemes: they can be all over the place, such as %20 vs +, and there is no right way, even within the same URL. Standards are out the window these days; the only "right" URL is the one that works. -- GreenC 16:19, 5 January 2019 (UTC)

Upgrades Arxiv to Journal for no apparent reason

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:09, 5 January 2019 (UTC)

What happens: [2]
What should happen: [3]
We can't proceed until: Feedback from maintainers

See also [4]. Headbomb {t · c · p · b} 17:46, 5 January 2019 (UTC)

It gets updated because of the bibcode. We have a blacklist of bibcodes that are actually arXiv despite claiming to be journals. Obviously, you found a new liar to add. AManWithNoPlan (talk) 19:04, 5 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1167 AManWithNoPlan (talk) 19:33, 5 January 2019 (UTC)

Adding redundant duplicate alias of "work" parameter

Status: {{fixed}}
Reported by: DferDaisy (talk) 00:26, 6 January 2019 (UTC)

What happens: Bot changes "cite web" to "cite news" and adds a new "work" parameter, but the "website" parameter is already present. These two parameters are aliases; therefore, a redundant parameter error occurs.
What should happen: remove "website" parameter if "work" parameter is added.
Relevant diffs/links: Diff of North Norfolk (UK Parliament constituency) and diff of Lord Kitchener Wants You
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1172 Once implemented it should fix this. AManWithNoPlan (talk) 01:17, 6 January 2019 (UTC)

API: New feature request, run from links on page

Let's have something like

https://tools.wmflabs.org/citations/list.php?linksonpage=User:Headbomb/Sandbox5

This would be super useful. We could build lists of pages with crappy citations with AWB's database scanner or with a clever insource:// search (e.g. pages with raw GoogleBooks links, pages with raw DOI links, ...), then put the list of pages to be edited somewhere (e.g. User:Headbomb/Sandbox5), then tell the bot to run against those pages (follow redirects if they exist). Headbomb {t · c · p · b} 14:45, 22 August 2018 (UTC)

@Smith609: since you seem to be the one to ask about API features, how doable is this? Headbomb {t · c · p · b} 11:11, 24 August 2018 (UTC)

Does the new "run on multiple pages separated by pipes" functionality address this request? Martin (Smith609 – Talk) 07:46, 25 August 2018 (UTC)

@Smith609: not really. Those lists would have to be built and fed manually every time. It's OK for a one-time list, but the idea is that you could have a one-click way of running the bot on a list of links. Book:Canada would be a prime example (or cleanup-centric lists, like WP:JCW/J30, to fix a crap ton of capitalization mistakes in one click). If you could have something like https://tools.wmflabs.org/citations/list.php?linksonpage=Book:Canada, that would find all links on the page (likely direct links for simplicity) and run the bot on those pages, that would be great.

That is if you have [[Foobar|Barfoo]] somewhere on the page, get Foobar (follow redirects if there are any), and run the bot on that. Repeat for all other links it finds. Headbomb {t · c · p · b} 22:30, 26 August 2018 (UTC)

This is still something that would be incredibly useful. Headbomb {t · c · p · b} 15:02, 30 November 2018 (UTC)

For example: https://en.wikipedia.org/w/api.php?action=parse&prop=links&page=Chemistry&format=json Wikipedia can make the list for us. Would obviously need to remove talk and other namespaces AManWithNoPlan (talk) 23:10, 23 December 2018 (UTC)

@AManWithNoPlan: - while there would be uses where non-mainspace would be useful, I think restricting this to mainspace+draft would be best, at least for now. Headbomb {t · c · p · b} 23:24, 23 December 2018 (UTC)
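A minimal sketch of the filtering step discussed above, assuming the response has the shape of the legacy JSON format returned by action=parse&prop=links (each link carrying an "ns" number and the title under "*"). The function name, the sample data, and the choice to de-duplicate are illustrative assumptions, not the bot's actual code; 118 is the Draft namespace number on the English Wikipedia.

```python
MAINSPACE = 0
DRAFT = 118  # Draft namespace number on English Wikipedia


def pipe_list_from_parse_links(response, allowed_ns=(MAINSPACE, DRAFT)):
    """Return a pipe-separated list of linked titles in the allowed namespaces.

    `response` is assumed to be the decoded JSON of an
    action=parse&prop=links request (legacy format, title under "*").
    """
    links = response.get("parse", {}).get("links", [])
    titles = []
    seen = set()
    for link in links:
        title = link.get("*")
        if title is None or link.get("ns") not in allowed_ns:
            continue  # skip talk pages and every other namespace
        if title not in seen:  # keep first occurrence only
            seen.add(title)
            titles.append(title)
    return "|".join(titles)


# Hypothetical sample data, for illustration only.
sample = {
    "parse": {
        "title": "Chemistry",
        "links": [
            {"ns": 0, "exists": "", "*": "Alchemy"},
            {"ns": 1, "exists": "", "*": "Talk:Chemistry"},
            {"ns": 118, "exists": "", "*": "Draft:Example"},
            {"ns": 0, "exists": "", "*": "Alchemy"},
        ],
    }
}
print(pipe_list_from_parse_links(sample))
```

The result is in the same pipe-separated form that the "run on multiple pages" feature mentioned earlier accepts. Redirects are not followed here.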

This is basically a request that would allow any user to run a full-automated bot without needing WP:BRFA. Given this is a tool designed for manual watching of diffs, I wonder how wise it would be to turn the bot keys over. -- GreenC 16:27, 5 January 2019 (UTC)

Indeed. It is not even a category, so one could do this on a fashion article and find a link to quantum mechanics because the designer's uncle was a physics professor. AManWithNoPlan (talk) 16:48, 5 January 2019 (UTC)

Yeah I agree there's a concern there. While running on Book:Canada (and other books) is no different than running on a category, maybe build a whitelist of users that could use it in such a fashion on other pages? Or some other whitelisting (e.g. any page that start with "Book:", "Wikipedia:WikiProject ..." + specific pages "User:EXAMPLE/SANDBOX2"). Headbomb {t · c · p · b} 18:07, 5 January 2019 (UTC)

https://tools.wmflabs.org/citations/get_linked_pages.php?page= This will give you a list of all linked pages (we have a short blacklist to remove things like doi, isbn, etc.). This way a human has to do a little work and think about it rather than just yelling “git her done” and leaving the scene of the crime. Note that the extraneous HTML is removed in a pull that is not yet committed. AManWithNoPlan (talk) 19:20, 7 January 2019 (UTC)

Yeah, but that's not extremely useful. I know what pages are on Book:Canada [5] (or say WP:JCW/Sandbox [6]), the goal is to kick the bot into action once the list of pages to run on has been built, much like it does with a category. Headbomb {t · c · p · b} 20:58, 7 January 2019 (UTC)

It only takes a little copy and paste to make a pipe separated list. AManWithNoPlan (talk) 01:38, 8 January 2019 (UTC)

Which is extra work, for little reason, to have a piped list that no one knows what to do with. Headbomb {t · c · p · b} 01:40, 8 January 2019 (UTC)

{{fixed}} prints with pipes now. AManWithNoPlan (talk) 13:43, 8 January 2019 (UTC)

So now we have a piped list of things no one knows what to do with, and articles that still don't get edited by Citation bot. Headbomb {t · c · p · b} 13:57, 8 January 2019 (UTC)

Adding unrelated bibcode

Status: {{fixed}}
Reported by: (t) Josve05a (c) 06:12, 6 January 2019 (UTC)

What happens: bibcode to wrong article
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User%3AJosve05a%2Fcite-sandbox&diff=prev&oldid=877047801
We can't proceed until: Feedback from maintainers

The adsabs database suddenly seems to be more generous with matches. I have already submitted two fixes. https://github.com/ms609/citation-bot/pull/1174 https://github.com/ms609/citation-bot/pull/1169 AManWithNoPlan (talk) 14:59, 6 January 2019 (UTC)

Valid bibcode (book?) not expanded, but details added to a different citation in the same article

Status: {{fixed}}
Reported by: Lithopsian (talk) 21:59, 6 January 2019 (UTC)

What happens: When trying to expand all the citations in an article, one is not expanded. It is a journal that apparently has two bibcodes, 1982mcts.book.....H and 1982MSS...C03....0H. Although the non-book bibcode is in the template, the book bibcode is reported for the big query, then later the non-book bibcode is reported as not found (big query returning a different bibcode from the one submitted in the query?). The details for this bibcode are added to the next citation returned from the big query, together with some details from that bibcode (additional authors, etc.).
What should happen: Expand the citations from their own bibcodes.
Replication instructions: You can run the bot against User:Lithopsian/sandbox to see this in action
We can't proceed until: Feedback from maintainers

There is something wrong with that one bibcode: it redirects to another one. That makes us not expand it, since one check we do is to make sure the bibcode we get back is the one we sent out. This is unfixable, since we will not remove the double check. The second issue is that the bot currently rejects expansion of any book bibcodes, since that requires us to write code that we have not written yet. I might look into writing that code. AManWithNoPlan (talk) 22:53, 6 January 2019 (UTC)

The query changing the bibcode and then mixing the text is horrible. AManWithNoPlan (talk) 22:58, 6 January 2019 (UTC)

this is the evil bibcode: http://adsabs.harvard.edu/abs/1982MSS...C03....0H AManWithNoPlan (talk) 23:00, 6 January 2019 (UTC)

looks like we need to make sure we did not get a bibcode back that is new. AManWithNoPlan (talk) 23:03, 6 January 2019 (UTC)
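The double check described above (only trust a record whose bibcode is one we actually asked for) could be sketched roughly as follows. This is an illustration under assumptions, not the bot's code: the function name, the record layout (a dict with a "bibcode" key), and the placeholder titles are all hypothetical.

```python
def accept_bibcode_records(requested, records):
    """Split query results into accepted, rejected, and missing bibcodes.

    A record is accepted only if its bibcode is exactly one of the
    bibcodes we sent out; anything else is treated as a sign that the
    query result cannot be trusted for that entry.
    """
    wanted = set(requested)
    accepted = {}
    rejected = []
    for record in records:
        code = record.get("bibcode")
        if code in wanted:
            accepted[code] = record
        else:
            rejected.append(code)  # a bibcode we never asked for
    missing = [code for code in requested if code not in accepted]
    return accepted, rejected, missing


# Bibcodes are taken from this page; the record contents are placeholders.
requested = ["1982MSS...C03....0H", "1998astro.ph..2189L"]
records = [
    {"bibcode": "1982mcts.book.....H", "title": "placeholder"},
    {"bibcode": "1998astro.ph..2189L", "title": "placeholder"},
]
accepted, rejected, missing = accept_bibcode_records(requested, records)
print(sorted(accepted), rejected, missing)
```

Keying the accepted records by bibcode, rather than relying on the order in which results come back, is one way to avoid attaching one record's details to a different citation.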

https://github.com/ms609/citation-bot/pull/1178 Detect corrupt query, I hope. AManWithNoPlan (talk) 23:22, 6 January 2019 (UTC)

No citations are mangled now, at least not in that example. Bibcode 1982MSS...C03....0H is ignored. Book bibcodes are ignored, except that cite journal templates are changed to cite book templates. Lithopsian (talk) 14:58, 7 January 2019 (UTC)

1982MSS...C03....0H is defective and we will never expand that. AManWithNoPlan (talk) 20:10, 7 January 2019 (UTC)

Fails to convert hdl url

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:07, 7 January 2019 (UTC)

What happens: [7]
What should happen: [8] (minus the other tidying up I did)
We can't proceed until: Feedback from maintainers

Every handle resolver has to be added separately. https://github.com/ms609/citation-bot/pull/1181 AManWithNoPlan (talk) 18:48, 7 January 2019 (UTC)

New feature request: merge (subscription required) templates into CS1|2

Commonly seen:

Bovver boot:

{{cite news | url=https://www.questia.com/read/1G1-61177939 | title=Max hangs up his boots with £200m | work=[[The People]] | date=March 31, 1996 | accessdate=March 4, 2013 | author=Gunn, Cathy}}{{Subscription required|via=[[Questia Online Library]]}}

Anthony Chenevix-Trench:

{{cite news |last=Heffer|first=Simon|title= Beaten by Eton: The Land of Lost Content: The biography of Anthony Chenevix-Trench by Mark Peel |date=27 July 1996 |accessdate= 3 December 2012 |location =London |newspaper=[[Daily Mail]] {{Subscription required|via=[[Questia Online Library]]}}|url=https://www.questia.com/read/1G1-111427463}}

On both subscription status is noted with the {{subscription}} template, which can be inside or outside the CS1|2.

The better format would be:

Bovver boot:

{{cite news | url=https://www.questia.com/read/1G1-61177939 | title=Max hangs up his boots with £200m | work=[[The People]] | url-access=subscription | via = [[Questia Online Library]] | date=March 31, 1996 | accessdate=March 4, 2013 | author=Gunn, Cathy}}

Anthony Chenevix-Trench:

{{cite news |last=Heffer|first=Simon|title= Beaten by Eton: The Land of Lost Content: The biography of Anthony Chenevix-Trench by Mark Peel |date=27 July 1996 |accessdate= 3 December 2012 |location =London |newspaper=[[Daily Mail]] |url=https://www.questia.com/read/1G1-111427463 | url-access=subscription | via = [[Questia Online Library]] }}

The {{subscription}} is replaced with |url-access= and if there is a |via= argument, with a |via= in the CS1|2. The |subscription= template goes by many names. -- GreenC 19:30, 7 January 2019 (UTC)

We would also want to do this only if there was one cite template in the ref tag, since one might be applying this to more than one cite template. Given that this is not easily done within the bot’s code, it might be best to make a Bot request. AManWithNoPlan (talk) 20:02, 7 January 2019 (UTC)

I thought about making a bot, but I'm not sure it would pass COSMETIC. Understood that the matching up is tricky. Will keep the idea in mind. --GreenC 20:23, 7 January 2019 (UTC)

Converting data into inline data within a template is really useful in so many ways. AManWithNoPlan (talk) 20:29, 7 January 2019 (UTC)

I agree. Started a discussion Template talk:Subscription required#Why are we using this template with CS1|2_templates?. Maybe it will need to be an RfC to 1) make the conversions and 2) change the docs to only use in free-form citations not CS1|2. -- GreenC 20:38, 7 January 2019 (UTC)

Is this the same as User talk:Citation bot/Archive 11#Remove "subscription required" or replace with parameter? (t) Josve05a (c) 20:38, 7 January 2019 (UTC)

yes it is. AManWithNoPlan (talk) 20:41, 7 January 2019 (UTC)

What can be said about the advantages of converting? -- GreenC 20:44, 7 January 2019 (UTC)

I will close this item as {{wontfix}} and have moved a link to the discussion area above. AManWithNoPlan (talk) 17:02, 8 January 2019 (UTC)

Remove format=pdf and variants when URLs end in .pdf

If you have something like

{{cite web |url=http://www.example.com/asdf.pdf |title=title}}, giving
"title" (PDF).
{{cite journal |url=http://www.example.com/asdf.pdf |title=title}}, giving
"title" (PDF). {{cite journal}}: Cite journal requires |journal= (help)

Citation templates automatically append (PDF) next to the link. So there's no point in having

{{cite journal |url=http://www.example.com/asdf.pdf |title=title |format=PDF}}, giving
"title" (PDF). {{cite journal}}: Cite journal requires |journal= (help)

So if you find |format=PDF or similar (e.g. |format=pdf / |format=Portable Document Format / |format=pdf), remove it as pointless. Headbomb {t · c · p · b} 17:41, 5 January 2019 (UTC)

I think |format=pdf exists in case the URL does not have an apparent ".pdf", so this suggestion would only be done when the URL has a ".pdf". But I wonder if there is any other reason for using |format=pdf? -- GreenC 18:22, 5 January 2019 (UTC)

I find those rather pointless personally, but the above request was for when URLs end in PDF. I'll update the header. Headbomb {t · c · p · b} 18:37, 5 January 2019 (UTC)
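As a rough illustration of the request as narrowed above (drop |format= only when it says PDF and the URL itself ends in .pdf), here is a sketch that works on an already-parsed set of template parameters. The function name, the dict representation, and the set of recognised values are assumptions made for the example; the bot's real handling may differ.

```python
from urllib.parse import urlsplit

# Values treated as "just says PDF"; an assumed list for illustration.
REDUNDANT_PDF_FORMATS = {"pdf", "portable document format"}


def drop_redundant_pdf_format(params):
    """Return a copy of params without |format= when it only repeats a .pdf URL."""
    result = dict(params)
    fmt = result.get("format", "").strip().lower()
    # Look at the path only, so a query string does not hide or fake the suffix.
    path = urlsplit(result.get("url", "")).path.lower()
    if fmt in REDUNDANT_PDF_FORMATS and path.endswith(".pdf"):
        del result["format"]
    return result


cite = {"url": "http://www.example.com/asdf.pdf", "title": "title", "format": "PDF"}
print(drop_redundant_pdf_format(cite))
```

When the URL has no visible ".pdf", the parameter is left alone, which matches the case GreenC raises above.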

I agree. To make sure the removal doesn't introduce an unknown problem, maybe some other reason for it to exist, I posted a question Help_talk:Citation_Style_1#Removing_format=pdf_when_the_URL_ends_in_".pdf". -- GreenC 18:56, 5 January 2019 (UTC)

Flag to archive {{notabug}}. Moving link above. AManWithNoPlan (talk) 14:03, 7 January 2019 (UTC)

De-flag, it's been confirmed redundant and useless. Headbomb {t · c · p · b} 15:05, 7 January 2019 (UTC)

that was fast. I was expecting to hear back in a week or three. AManWithNoPlan (talk) 15:18, 7 January 2019 (UTC)

According to Xover: "An URL ending in ".pdf" can (and not infrequently does) return something other than a PDF." Trappist also brought up the concern that other wiki-languages don't support the PDF icon unless there is format=pdf, thus when they copy cites from enwiki they lose this meta information. Those are the two concerns that came up. -- GreenC 15:50, 7 January 2019 (UTC)

Yeah, Headbomb's description there about the result of the discussion is clearly biased or otherwise misleading in its intent. That discussion has not completed at this time. --Izno (talk) 15:51, 7 January 2019 (UTC)

a) If a url ending in .pdf returns anything but a PDF, then |format=PDF will STILL be displayed. b) This is the English Wikipedia. Unlike |language=English other wikis can easily implement automatic PDF detection, and would be better off doing so. Headbomb {t · c · p · b} 16:44, 7 January 2019 (UTC)

Isn't (a) a reason to remove automatic detection of the PDF format from the module? If that's your intent, best be off to argue for that instead. --Izno (talk) 17:07, 7 January 2019 (UTC)

No, the opposite. The only time the landing page will not be a PDF is if the PDF is behind a paywall. Headbomb {t · c · p · b} 18:28, 7 January 2019 (UTC)

Auto-detection is useful even if not always accurate (heuristics noted by Xover). -- GreenC 19:14, 7 January 2019 (UTC)

This is the wrong bot for the initial cleanup. Something else needs to fix this and then we can play whack a mole on new ones. Assuming this is a good idea of course. AManWithNoPlan (talk) 19:17, 7 January 2019 (UTC)

There wouldn't be any 'initial cleanup' really, it's a cosmetic cleanup, so that's akin to removing |postscript=. or |url=<PMC-URL>. It simplifies the edit window and makes references easier/more consistent to edit. Headbomb {t · c · p · b} 21:03, 7 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1190 AManWithNoPlan (talk) 22:18, 8 January 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 14:47, 9 January 2019 (UTC)

Don't add journal= to citations with bibcode with a '.book' in them

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:14, 7 January 2019 (UTC)

What happens: [9]
What should happen: [10]
We can't proceed until: Feedback from maintainers

Something with the bibcode database has gone wonky suddenly. Adding lots of data integrity checks. Obviously more needs done. AManWithNoPlan (talk) 19:08, 7 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1186 AManWithNoPlan (talk) 18:13, 8 January 2019 (UTC)

Does not remove stray dot at the end of pp

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 02:24, 8 January 2019 (UTC)

What happens: [11]
What should happen: [12]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1188 AManWithNoPlan (talk) 18:19, 8 January 2019 (UTC)

Book Reviews added to book citations

Status: {{fixed}}
Reported by: Hawkeye7 (discuss) 11:10, 7 January 2019 (UTC)

What happens: Book reviews added to book citations
What should happen: Bot should not have made any changes at all
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Quebec_Agreement&type=revision&diff=877221463&oldid=877207078
Replication instructions: No idea
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1187 AManWithNoPlan (talk) 18:13, 8 January 2019 (UTC)

Timeout at Deim_Zubeir

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:12, 9 January 2019 (UTC)

We can't proceed until: Feedback from maintainers

It’s an internal php bug. Work around: https://github.com/ms609/citation-bot/pull/1193 AManWithNoPlan (talk) 22:14, 9 January 2019 (UTC)

Fails at Russian passport

Status: new bug
Reported by: Headbomb {t · c · p · b} 17:00, 10 January 2019 (UTC)

What happens: Fails at Russian passport
What should happen: Should not fail
We can't proceed until: Feedback from maintainers

{{wontfix}} at this time. It does run, just too slowly. AManWithNoPlan (talk) 22:04, 10 January 2019 (UTC)

Deal with both url and chapter-url

Status: {{fixed}}
Reported by: (t) Josve05a (c) 22:16, 26 December 2018 (UTC)

What should happen: Remove the |url= in favor of the |chapter-url= doi, or at least add the |chapter-url= doi as |doi= instead of the |url= doi.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&diff=875483488&oldid=875483473
We can't proceed until: Feedback from maintainers

I will have to think about this and all the possible combinations AManWithNoPlan (talk) 22:37, 26 December 2018 (UTC)

https://github.com/ms609/citation-bot/pull/1206 AManWithNoPlan (talk) 00:37, 11 January 2019 (UTC)

2001gpm..book.....L

Status: {{fixed}}
Reported by: (t) Josve05a (c) 01:44, 11 January 2019 (UTC)

What happens: Bot adds |bibcode=2001gpm..book.....L and changes {{cite journal}} to {{cite book}} for journal article.
We can't proceed until: Feedback from maintainers

Well that's not a bug. Bibcode:2001gpm..book.....L is a book. Headbomb {t · c · p · b} 01:53, 11 January 2019 (UTC)

Yes, but it seems to be adding it to all {{cite journal}}'s with |journal=Genetics, which is a bug. Link to old edit. I saw the script trying to make this change on a page a few moments before I reported this as well, so it is still doing it. (t) Josve05a (c) 02:13, 11 January 2019 (UTC)

That is just spiffy. Might have to block that bibcode explicitly. I will investigate later tonight. Probably will need to search for it and remove it wherever it is. AManWithNoPlan (talk) 02:45, 11 January 2019 (UTC)

Let us all take a moment to ponder Headbomb being wrong about something. This is a rare event. Please observe a moment of silence. 🤣😁😂 AManWithNoPlan (talk) 03:58, 11 January 2019 (UTC)

Well perhaps if the initial report had included a diff... Headbomb {t · c · p · b} 06:20, 11 January 2019 (UTC)

I cleaned up the current uses, btw. The only thing they had in common is that they were all citations for various articles of Genetics. Headbomb {t · c · p · b} 06:24, 11 January 2019 (UTC)

The cause is that the journal Genetics is not indexed, but this one book has Journal=Genetics set in its record. Thus, any search for journal=genetics gets a hit. AManWithNoPlan (talk) 06:30, 11 January 2019 (UTC)

Fix: https://github.com/ms609/citation-bot/pull/1208 AManWithNoPlan (talk) 06:30, 11 January 2019 (UTC)

Wrongly upgrades arxiv, again

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 06:34, 11 January 2019 (UTC)

What happens: Wrongly 'Upgrades'

{{cite arXiv |author=Limin Lu |date=1998 |title=The Metal Contents of Very Low Column Density Lyman-alpha Clouds: Implications for the Origin of Heavy Elements in the Intergalactic Medium |eprint=astro-ph/9802189 |display-authors=etal}}</ref>

to {{cite journal |author=Limin Lu |date=1998 |title=The Metal Contents of Very Low Column Density Lyman-alpha Clouds: Implications for the Origin of Heavy Elements in the Intergalactic Medium |arxiv=astro-ph/9802189 |display-authors=etal|bibcode=1998astro.ph..2189L }}</ref>

Relevant diffs/links: [13]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1210 AManWithNoPlan (talk) 06:55, 11 January 2019 (UTC)

Removes URL

Status: {{notabug}}
Reported by: ♦ J. Johnson (JJ) (talk) 00:39, 12 January 2019 (UTC)

What happens: Removes the URL.
What should happen: Not gratuitously remove URLs.
Relevant diffs/links: this edit.
We can't proceed until: Feedback from maintainers

That's not a bug, that url is redundant with the DOI. Headbomb {t · c · p · b} 00:47, 12 January 2019 (UTC)

I beg to differ. The url is an alternate way to the source, independent of the doi. Who said we must not have both? ♦ J. Johnson (JJ) (talk) 01:03, 12 January 2019 (UTC)

The style guides and template documentation strongly discourage the use of urls unless they link to a 100% free full copy. Also, URLs that duplicate other identifiers are discouraged even if free. One reason is that with a doi you know you are going to a publisher, whereas a link is without context. AManWithNoPlan (talk) 01:12, 12 January 2019 (UTC)

Bot breaks URL in pages field of citation template by changing hyphen to en dash in hidden URL

Status: {{fixed}}
Reported by: Biogeographist (talk) 16:14, 10 January 2019 (UTC)

What happens: Bot changes hyphen to en dash in a URL (in this case a Google Books URL), which breaks the URL: e.g., changes this working URL to this broken URL
What should happen: Bot should not change hyphens in URLs in pages field
Relevant diffs/links: See the books.google.com URL that was changed (mangled) in this diff: Special:Diff/877378326
Replication instructions: Put this working URL in the pages field of a citation template (e.g. 107) and run the bot
We can't proceed until: Feedback from maintainers

This bug was previously reported at User talk:Citation bot/Archive 7 § Don't change urls and User talk:Citation bot/Archive 7 § Bot breaks URL in pages field of citation template by changing hyphen to en dash in URL but apparently was not completely fixed.

This bug may occur in this case because the link is a protocol-relative URL, which is a deprecated link format on Wikipedia. In such cases, citation bot should update the link format instead of breaking the URL with the unfortunate hyphen/dash exchange. Biogeographist (talk) 16:14, 10 January 2019 (UTC)

URLs should almost never be modified unless it can issue a GET to verify the new URL works, or in known cases of URL changes. -- GreenC 16:21, 10 January 2019 (UTC)

fascinating how a url can be hiding within non url text. Surprising that it took so long for this bug to be reported. AManWithNoPlan (talk) 17:11, 10 January 2019 (UTC)

Ah a PRURL inside an incorrectly placed square-bracket - gigo. -- GreenC 17:35, 10 January 2019 (UTC)

Side bar: people often talk about old-crusty-unreadable code. They say things like: we need to replace this Fortran with C/C++/Java/Go/etc. Then they do that and discover that the old code was unreadable because 90% of the code was error/exception handling. The same is true of the Citation Bot: if the template were always used right and they did not have six different names for the exact same parameter, then the bot would be 75% smaller. This is GIGO, but I think we can prevent the GO half. AManWithNoPlan (talk) 17:48, 10 January 2019 (UTC)

Yes for sure. An infinite tail of exceptions and edge cases -- GreenC 18:52, 10 January 2019 (UTC)

New code will ignore PRURLs once it is git pulled in, and will add the https: when the PRURL is the very first characters of the page. AManWithNoPlan (talk) 00:25, 11 January 2019 (UTC)
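To illustrate the general idea in this thread (never touch hyphens in a pages value that hides a link), here is a deliberately conservative sketch. It is a simplification and not the fix in the pull request: instead of recognising protocol-relative URLs specifically, it leaves alone any value containing "//" or a square bracket. The function name and the patterns are assumptions.

```python
import re

# Anything that smells like a link: "//" (covers http://, https:// and
# protocol-relative URLs) or an opening square bracket of an external link.
LINK_HINT = re.compile(r"//|\[")
# A hyphen between two digits, with optional surrounding spaces.
PAGE_RANGE = re.compile(r"(?<=\d)\s*-\s*(?=\d)")


def normalize_pages(value):
    """Turn hyphens in plain page ranges into en dashes; leave linked values alone."""
    if LINK_HINT.search(value):
        return value  # never rewrite characters inside a URL
    return PAGE_RANGE.sub("\u2013", value)


print(normalize_pages("107-110"))
print(normalize_pages("[//books.google.com/books?id=a1-2b&pg=PA107 107]"))
```

The Google Books identifier in the second call is made up for the example. Skipping the whole value is cruder than repairing the link format, but it cannot break a working URL.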

https://github.com/ms609/citation-bot/pull/1212 this needed too. AManWithNoPlan (talk) 17:38, 11 January 2019 (UTC)

Fails to edit/finish on List of gravitationally rounded objects of the Solar System

Status: new bug
Reported by: Headbomb {t · c · p · b} 06:39, 11 January 2019 (UTC)

We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1215 AManWithNoPlan (talk) 17:39, 11 January 2019 (UTC)

Thank you for reporting these. AManWithNoPlan (talk) 02:03, 12 January 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 14:46, 16 January 2019 (UTC)

Fails to edit PageRank

Status: new bug
Reported by: Headbomb {t · c · p · b} 18:51, 13 January 2019 (UTC)

What happens: The bot tells me to 'Consult APIs to expand templates' for some weird reason
What should happen: Bot should edit the page
We can't proceed until: Feedback from maintainers

We got it fixed so now it fails on all pages 🙄 AManWithNoPlan (talk) 05:44, 16 January 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 14:46, 16 January 2019 (UTC)

Time out on 2016 Turkish coup d'état attempt

Status: new bug
Reported by: Redalert2fan (talk) 12:44, 16 January 2019 (UTC)

What happens: Bot times out on 2016 Turkish coup d'état attempt.
We can't proceed until: Feedback from maintainers

{{wontfix}} There are so many links, and so many of them block us or time out, that the bot takes a very long time. It does eventually finish, if you (and your web browser) will let it. Probably best to run section by section. AManWithNoPlan (talk) 16:49, 16 January 2019 (UTC)

Thanks for looking at it anyway, I'll try it section by section. Redalert2fan (talk) 16:54, 16 January 2019 (UTC)

Time out on List of Flashpoint episodes

Status: new bug
Reported by: Redalert2fan (talk) 13:42, 16 January 2019 (UTC)

What happens: Bot times out on List of Flashpoint episodes with: " ! Operation timed out after 45000 milliseconds with 0 bytes received For URL: http://www.bbm.ca/_documents/top_30_tv_programs_english/nat01052009.pdf ! Operation timed out after 45000 milliseconds with 0 bytes received For URL: http://www.bbm.ca/_documents/top_30_tv_programs_english/nat01122009.pdf "
What should happen: There are more links to check on the page so the bot should continue.
We can't proceed until: Feedback from maintainers

Running it in the debugger, I find that these are mostly PDF files, which have no usable metadata. Once this pull is implemented https://github.com/ms609/citation-bot/pull/1229/ the bot will have a lot more sites on its "don't try too hard" list. AManWithNoPlan (talk) 17:00, 16 January 2019 (UTC)

{{wontfix}} AManWithNoPlan (talk) 17:00, 16 January 2019 (UTC)

Invalid last1 and first1

Status: new bug
Reported by: Redalert2fan (talk) 18:04, 16 January 2019 (UTC)

What happens: bot adds "last1=Correspondent and first1=Patrick Kingsley Migration"
What should happen: Probably last1=Kingsley and first1=Patrick
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User%3ARedalert2fan%2Fsandbox4&diff=prev&oldid=878741788
We can't proceed until: Feedback from maintainers

There also is "first=SPIEGEL ONLINE, Hamburg|last=Germany" on the page already which also does not seem to be correct, however this was not added by the bot.

yeah, we try not to do too much fixing existing bad data. {{wontfix}} AManWithNoPlan (talk) 18:16, 16 January 2019 (UTC)

Fails to add bibcode

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:12, 13 January 2019 (UTC)

What happens: [14]
What should happen: [15]
We can't proceed until: Feedback from maintainers

They have changed the format to be longer. AManWithNoPlan (talk) 19:45, 13 January 2019 (UTC)

It's still 19 characters... ? Headbomb {t · c · p · b} 20:01, 13 January 2019 (UTC)

counting on an iPhone is not as easy as I thought AManWithNoPlan (talk) 20:17, 13 January 2019 (UTC)

I will look at it after the bibcode searches stabilize AManWithNoPlan (talk) 00:35, 14 January 2019 (UTC)

{{wontfix}} the doi search fails to return anything. They need to update their data files. AManWithNoPlan (talk) 20:19, 16 January 2019 (UTC)

The arxiv id does, however. Headbomb {t · c · p · b} 21:21, 16 January 2019 (UTC)

Why is this "2nd rate information"? Bibcode bot and others will add it. Headbomb {t · c · p · b} 21:32, 16 January 2019 (UTC)

Once we have a doi to search with, we do not search adsabs using arXiv. If the bibcode does not know about the doi, then it is outdated information. AManWithNoPlan (talk) 21:36, 16 January 2019 (UTC)

Not really no, there's a slew of citations, mostly in mathematics, that never get anything but an arxiv bibcode. Headbomb {t · c · p · b} 21:37, 16 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1231 AManWithNoPlan (talk) 00:30, 17 January 2019 (UTC)

Failed to pickup another bibcode

Status: new bug
Reported by: Headbomb {t · c · p · b} 18:02, 17 January 2019 (UTC)

What should happen: [16]
Relevant diffs/links: [17]
We can't proceed until: Feedback from maintainers

The bibcode title does not match very well, so we reject it. Perhaps we are too picky. AManWithNoPlan (talk) 18:16, 17 January 2019 (UTC)

at the very least I should combine the two title checking codes into a function call and remove dashes before doing the compare since bibcodeland seems to eat em dashes and leave an empty plate of white space in its place. AManWithNoPlan (talk) 19:28, 17 January 2019 (UTC)
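A single shared comparison function along the lines suggested above might look like the sketch below. It is an assumption-laden illustration, not what the pull request does: the function names, the set of dash characters, and the example titles are hypothetical. Dashes are replaced by a space and runs of whitespace are collapsed, so a title whose em dash was "eaten" and one that still has it compare equal.

```python
import re
import unicodedata

# ASCII hyphen, the Unicode hyphen/dash block U+2010..U+2015, and minus sign.
DASHES = re.compile(r"[\u2010-\u2015\u2212-]")


def normalize_title(title):
    """Reduce a title to a form suitable for a forgiving equality check."""
    text = unicodedata.normalize("NFKC", title)
    text = DASHES.sub(" ", text)       # dashes become spaces...
    text = re.sub(r"\s+", " ", text)   # ...and whitespace runs collapse
    return text.strip().casefold()


def titles_match(ours, theirs):
    """True when two titles differ only by case, dashes, or spacing."""
    return normalize_title(ours) == normalize_title(theirs)


# Hypothetical titles, for illustration only.
print(titles_match("Dark matter\u2014a review", "Dark matter  a review"))
print(titles_match("Dark matter", "Dark energy"))
```

This only forgives dash and spacing differences; titles that differ in wording still fail the comparison.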

Isn't a doi query enough? I never found any wrong result when querying ADSABS via doi. Headbomb {t · c · p · b} 20:05, 17 January 2019 (UTC)

Trust me, some of the way off partial matches we get are nuts and we have to deal with GIGO on our end too. AManWithNoPlan (talk) 21:02, 17 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1233 more forgiving title compare. AManWithNoPlan (talk) 02:45, 18 January 2019 (UTC)

Mostly {{fixed}}, but this bibcode is still too different of a title to match. AManWithNoPlan (talk) 18:28, 18 January 2019 (UTC)

Cite Journal

Why does the bot remove publisher and location from the "Cite journal" template? Especially for magazines that have been published for a long time, these things change and may perhaps be of interest? Mr.choppers | ✎ 04:20, 19 January 2019 (UTC)

Please see the discussion links above and join in. One might ask why the publisher information is almost always wrong. You might also ask why people use cite journal for non-journals such as magazines. AManWithNoPlan (talk) 04:43, 19 January 2019 (UTC)

May be of interest is not a worthwhile reason - the citation is for a reference in an article and not intended for a treatise on the magazine itself. If such information is useful, then please wikilink the magazine name and create a nice article for it. AManWithNoPlan (talk) 04:46, 19 January 2019 (UTC)

Simply put, because the information is near useless and because no style guide recommends it. Headbomb {t · c · p · b} 06:07, 19 January 2019 (UTC)

Please consider contributing to this discussion. --Izno (talk) 15:14, 19 January 2019 (UTC)

If it is useless, then why do the parameters exist? I only found out about the existence of "cite magazine" a little while ago, hence the occasional reappearance of "cite journal." Mr.choppers | ✎ 16:34, 19 January 2019 (UTC)

they exist because all the citation templates are based upon the same core code and core documentation. So, there are lots of useless parameters. AManWithNoPlan (talk) 16:45, 19 January 2019 (UTC)

Flagging for archiving since links exist above {{notabug}}. The documentation is lacking considering the publisher location removal has been standard for a decade.

Fails to expand doi

Status: new bug
Reported by: Headbomb {t · c · p · b} 04:07, 21 January 2019 (UTC)

What happens: [18]
What should happen: [19]
We can't proceed until: Feedback from maintainers

{{notabug}} tell them to publish metadata. Seriously it is just a doi.org url. AManWithNoPlan (talk) 04:46, 21 January 2019 (UTC)

these non-crossref dois usually have second rate meta data at doi.org, but this one has nothing. AManWithNoPlan (talk) 04:48, 21 January 2019 (UTC)

Can you take-over/merge reFill?

The maintainer of reFill is looking to pass the torch: Wikipedia:Village_pump_(technical)#reFill_is_looking_for_a_maintainer. Is the functionality of reFill already part of Citation bot? I know this tool is very popular though it has a long list of bugs to be worked out and the code base is PHP. -- GreenC 13:16, 8 January 2019 (UTC)

We run our own Citoid installation. He uses Wikipedia’s install. The Wikipedia install would have to be willing to allow us to hit them much more aggressively than their policy allows, but that would make it easier for us. We do nothing with combining equivalent references. AManWithNoPlan (talk) 13:32, 8 January 2019 (UTC)

The Wikipedia Citoid is better than ours, so reFill does a better job than ours, which is why I suggest that they whitelist us. AManWithNoPlan (talk) 13:48, 8 January 2019 (UTC)

I can't imagine the Wikimedia Foundation opposing such a usage of their Citoid instance. Nemo 21:25, 8 January 2019 (UTC)

I have looked at the reFill code base and it appears to not use the Citoid instance, at least not for everything. That is one reason it seems to handle international stuff much better. AManWithNoPlan (talk) 00:19, 11 January 2019 (UTC)

https://tools.wmflabs.org/refill-api perhaps we use them AManWithNoPlan (talk) 23:16, 14 January 2019 (UTC)

{{notabug}} seems like others are taking it over and a 2.0 version is moving fast. AManWithNoPlan (talk) 16:36, 21 January 2019 (UTC)

Converting bare links to cite journal

I know that some users are tirelessly working on converting bare links to journal articles into {{cite journal}} calls (which then citation bot can clean up). What are your preferred ways? Do you have regular expressions or other aids to share for the purpose? I see that a simplistic regex search for DOI URLs in bare links, like insource:http insource:/\[http[^ ]+10\.[0-9]{4,5}\/[^ ]+ /, finds several thousands of pages and I'm not sure what's the best way to help. Nemo 18:07, 16 January 2019 (UTC)

I usually just search for something like insource:/\>https\:\/\/doi\.org\/10/> or search for specific publisher links and try to "fix all" from that domain. (t) Josve05a (c) 18:20, 16 January 2019 (UTC)

And then you make edits like this by manually adding the basic cite journal call and using the citation expander to do the rest? Nemo 21:27, 16 January 2019 (UTC)

On that edit I only used the citation expander. The bot/tool can convert bare refs (with only URL) to the proper cite template, so no need to add basic cite journal fields manually. (t) Josve05a (c) 15:35, 18 January 2019 (UTC)

Here are 5000+ examples for whoever is interested: phabricator:P8007. Nemo 14:49, 18 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1236 Any thoughts on this? AManWithNoPlan (talk) 05:49, 20 January 2019 (UTC)

{{fixed}} bot now does more AManWithNoPlan (talk) 16:35, 21 January 2019 (UTC)
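For anyone who wants to try the regex idea from this thread offline (for example against a dump), here is a rough Python sketch. It adapts the insource pattern quoted above; the names `BARE_DOI_LINK` and `find_bare_doi_links` are hypothetical, and the pattern is deliberately as simplistic as the original search.

```python
import re

# Adapted from the insource regex above: an external link in square brackets
# whose URL contains something shaped like a DOI (10.NNNN/suffix).
BARE_DOI_LINK = re.compile(
    r"\[(https?://[^\s\]]*?(10\.[0-9]{4,5}/[^\s\]]+))[\s\]]"
)


def find_bare_doi_links(wikitext: str) -> list:
    """Return (url, doi) pairs for bracketed external links that embed a DOI."""
    return [(m.group(1), m.group(2)) for m in BARE_DOI_LINK.finditer(wikitext)]
```

The extracted DOI is only a candidate: publisher URLs often append extra path segments after the DOI, so each hit would still need checking before being turned into a {{cite journal}} call.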

Fails to add issue?

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 08:36, 18 January 2019 (UTC)

What happens: [20]
What should happen: [21] (forget the name changes, just see that the issue number is added)
We can't proceed until: Feedback from maintainers

10.1016/j.agee.2010.07.017 AManWithNoPlan (talk) 18:43, 18 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1238 we still won’t add issue of zero or one AManWithNoPlan (talk) 03:56, 19 January 2019 (UTC)

DOI glitch?

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:32, 13 January 2019 (UTC)

What happens: [22]
What should happen: Not that?
We can't proceed until: Feedback from maintainers

This is a bit of a garbage DOI (someone at science made doi:10.1126/science.10.1126/SCIENCE.291.5501.24 the doi instead of doi:10.1126/SCIENCE.291.5501.24 like a sane person would), but it's a valid one nonetheless. Headbomb {t · c · p · b} 17:32, 13 January 2019 (UTC)

WOW! AManWithNoPlan (talk) 20:18, 13 January 2019 (UTC)

WOW! even? Headbomb {t · c · p · b} 21:55, 13 January 2019 (UTC)

not that wowing https://github.com/ms609/citation-bot/pull/1225 AManWithNoPlan (talk) 22:32, 13 January 2019 (UTC)

Takes forever to run

Is it just me, or has the bot been considerably slower for about a week? We're talking 30 minutes + to run on articles. Sometimes several hours. Headbomb {t · c · p · b} 03:24, 21 January 2019 (UTC)

It’s just you. 😁🤣😂😯. Actually we seem to have gained popularity and load. AManWithNoPlan (talk) 03:26, 21 January 2019 (UTC)

Well, is there a way to get more server resources? Or a new server? Headbomb {t · c · p · b} 03:27, 21 January 2019 (UTC)

I was wondering the same thing earlier today when I could not fully submit patches since they could not test themselves. AManWithNoPlan (talk) 03:30, 21 January 2019 (UTC)

I'm not sure, but other tools experience 500 errors due to a buildup of connections and work around the problem by periodically doing a "webservice restart". It seems kubernetes doesn't yet support increasing parallelism. Nemo 11:56, 21 January 2019 (UTC)

I can confirm getting a lot of 500 errors. Headbomb {t · c · p · b} 12:09, 21 January 2019 (UTC)

There may be a related discussion about Cyberbot at WP:BOTN. --Izno (talk) 13:11, 21 January 2019 (UTC)

Seems to be resolved. Maybe it's temporary though. Headbomb {t · c · p · b} 05:32, 22 January 2019 (UTC)

all evidence points to it being a toolbar problem. {{notabug}} AManWithNoPlan (talk) 13:29, 22 January 2019 (UTC)

Do not automatically add Citeseerx

Status: {{notabug}}
Reported by: David Eppstein (talk) 16:52, 7 November 2018 (UTC)

What happens: Citeseerx links automatically added in violation of WP:COPYLINK and WP:ELNEVER
What should happen: These links can sometimes be ok, but they are often a violation of publisher copyright, so they can only be added if citeseer traces their provenance back to an author copy or a publisher-licensed copy. This needs to be checked by hand. Citation bot should never add such links automatically. There is currently a similar thread about Zenodo at WP:ANI likely to lead to a topic ban from modifying citations for the user incautiously adding such links. Do we want such a ban to be given to Citation bot? The edit is shown as "user activated" but is listed as being made by the bot and there is no responsibility assigned to a specific user for this bad edit.
Relevant diffs/links: Special:Diff/867705073
We can't proceed until: Feedback from maintainers

Users are always responsible for the edits of the bot, since they are the ones that asked the bot to make the edit in the first place, so nothing is automatically added. The best way to deal with (the very small number of) copyvios on CiteSeerX is to contact them to take down the offending file (and possibly put a comment in the citeseerx parameter such as |citeseerx=, although the CiteSeerX page contains more than just the file and the metadata it gives is useful). Headbomb {t · c · p · b} 16:59, 7 November 2018 (UTC)

The number of copyvios is not small, because citeseerx copies all sorts of copies of papers — often copies made available for some course by someone else – that are neither author copies nor licensed from the publisher. They may be fair use for a course but that doesn't make them fair use for citeseerx and for us. And if the edit cannot be attributed to the specific user who caused it (and that user convinced or prevented from continuing to make bad edits) or if the process does not involve the user specifically vetting the edits that are made, with a big warning about COPYLINK, then it should not be happening at all. —David Eppstein (talk) 17:23, 7 November 2018 (UTC)

since we do not link to the PDF directly, does that make it okay? honest question about how close to the illegal copy do we need to be in order to be evil. AManWithNoPlan (talk) 18:00, 7 November 2018 (UTC)

I doubt it. We're linking to a site whose only purpose is to provide the link. WP:ELNEVER seems unambiguous: "If there is reason to believe that a website has a copy of a work in violation of its copyright, do not link to it." —David Eppstein (talk) 18:07, 7 November 2018 (UTC)

So, slightly better, but not better enough. AManWithNoPlan (talk) 18:20, 7 November 2018 (UTC)

They have a takedown link on each page now, and they seem to be within the law as an NSF site http://vondranlegal.com/what-to-do-when-the-federal-government-infringes-your-copyright/ AManWithNoPlan (talk) 15:40, 22 January 2019 (UTC)

A funny case of sovereign immunity used to dismiss a copyright violation case brought by a photographer against a state university in USA: Indiana 1:16-cv-02463-TWP-DML and similarly in Kentucky. And Ohio, Indiana, Florida (more elaborately, with consideration of "established state procedure to deprive of property" and due process), Michigan, Michigan again. Nemo 18:09, 22 January 2019 (UTC)

goes double for people. The Pope and the queen of England are both exempt from all criminal prosecution worldwide. They have sovereign immunity at home and diplomatic immunity everywhere else. AManWithNoPlan (talk) 21:26, 22 January 2019 (UTC)

Why is cite book preferred to cite web for online “books”?

Why is this bot constantly changing cite web to cite book here? I am using the online version of this dictionary, not the paper version. Peacemaker67 (click to talk to me) 23:04, 22 January 2019 (UTC)

When we query the website, it gives us publisher information and says “I am a book online”. Many of the journals/books/patents/etc people reference are through websites. The issue of which template is preferable is another issue. AManWithNoPlan (talk) 23:26, 22 January 2019 (UTC)

The problem is, the online version doesn't give page numbers in the paper edition, and I am not citing a book, I'm citing a web page. I've never seen the book. I don't know why a bot is overriding legitimate editor choice of template. To me it seems a perverse outcome. Peacemaker67 (click to talk to me) 23:49, 22 January 2019 (UTC)

See User:Citation bot/use#... the bot made a mistake?, third example. Headbomb {t · c · p · b} 02:35, 23 January 2019 (UTC)

Thanks Headbomb! Peacemaker67 (click to talk to me) 02:55, 23 January 2019 (UTC)

I am shocked that works. I forgot I added support for comments in the template type a while ago. I will blame being 35000 feet up in the air. AManWithNoPlan (talk) 03:15, 23 January 2019 (UTC)

{{fixed}} flag for archiving AManWithNoPlan (talk) 05:09, 23 January 2019 (UTC)

Fails to remove location in cite journal

Status: new bug
Reported by: Headbomb {t · c · p · b} 18:47, 22 January 2019 (UTC)

What should happen: [23]
Relevant diffs/links: [24]
We can't proceed until: Feedback from maintainers

It converts the place to location after the removal of location occurs. AManWithNoPlan (talk) 19:00, 22 January 2019 (UTC)

I am prone to leave code as it is. Otherwise we start looping over stuff again and again just for a few obscure edge cases. AManWithNoPlan (talk) 21:31, 22 January 2019 (UTC)

{{wontfix}} not a high priority. Requires location and publication-place to both be set. AManWithNoPlan (talk) 21:21, 23 January 2019 (UTC)

Two issues with book chapter on JSTOR

https://en.wikipedia.org/w/index.php?title=User%3AJosve05a%2Fcite-sandbox&diff=prev&oldid=879725824

The link is converted from bare to a |doi= when it should have been added as a |jstor= (as well/instead).
{{cite book}} with |chapter= should be used, not {{cite journal}}

(t) Josve05a (c) 00:25, 23 January 2019 (UTC)

number one causes number two. AManWithNoPlan (talk) 01:32, 23 January 2019 (UTC)

much enhanced jstor support added. More coming. {{fixed}} AManWithNoPlan (talk) 15:05, 24 January 2019 (UTC)

Expand bare doi templates

<ref>{{doi|10.1111/jep.12752}}</ref> should be treated as |doi=10.1111/jep.12752 (t) Josve05a (c) 00:34, 23 January 2019 (UTC)

That could be generalized to other identifiers too. Headbomb {t · c · p · b} 14:48, 23 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1255 AManWithNoPlan (talk) 18:48, 24 January 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 19:36, 24 January 2019 (UTC)
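The requested behaviour can be illustrated with a small sketch. This is not taken from the linked pull request: `BARE_DOI_REF` and `expand_bare_doi_refs` are hypothetical names, and rewriting to {{cite journal}} is an assumption made only so the example has a concrete target template.

```python
import re

# A <ref> whose only content is a {{doi|...}} template.
BARE_DOI_REF = re.compile(
    r"<ref>\s*\{\{\s*doi\s*\|\s*(10\.[^|}\s]+)\s*\}\}\s*</ref>",
    re.IGNORECASE,
)


def expand_bare_doi_refs(wikitext: str) -> str:
    """Rewrite <ref>{{doi|X}}</ref> so that X becomes a |doi= parameter."""
    # Assumption: cite journal is used as the placeholder template here.
    return BARE_DOI_REF.sub(r"<ref>{{cite journal |doi=\1}}</ref>", wikitext)
```

The same pattern could be generalized to other identifier templates, as suggested above, by swapping the template name and the parameter it maps to.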

Bare converting

Meanwhile, I've checked the result of the recent bare ref conversion change and I've not found any mistake, only good edits. Special:Diff/879655266, Special:Diff/879653934, Special:Diff/879649478, Special:Diff/879639416, Special:Diff/879626624, Special:Diff/879617148, Special:Diff/879616120, Special:Diff/879615613, Special:Diff/879614147, Special:Diff/879613874, Special:Diff/879611681, Special:Diff/879611148, Special:Diff/879609896, Special:Diff/879601520, Special:Diff/879598740, Special:Diff/879592444, Special:Diff/879590812, Special:Diff/879581261, Special:Diff/879574435, Special:Diff/879568947, Special:Diff/879566233. Nemo 17:32, 22 January 2019 (UTC)

Things like this are fine, but edits like this or this are... iffy and can create WP:CITEVAR issues. Headbomb {t · c · p · b} 18:12, 22 January 2019 (UTC)

That said, I do love the feature, but I would restrict it with an API call / additional checkbox in [25] so that usage is intentional, and users are warned to only use this on articles they plan to fully cleanup citations after the bot. Headbomb {t · c · p · b} 18:22, 22 January 2019 (UTC)

Perhaps only do it if there are at least two citation templates on the page already? AManWithNoPlan (talk) 18:32, 22 January 2019 (UTC)

Still iffy. The first one is a bare link, but when it tries to reformat a manual citation, you'll get in trouble and some will demand heads on pikes. An 'advanced' checkbox would probably be OK, but by default this is likely too risky. Headbomb {t · c · p · b} 18:36, 22 January 2019 (UTC)

At least in Special:PermaLink/879639416, however, the citation ends up using a style consistent with the pre-existing one and all the others, which is why the specific case seemed fine to me. Isn't it? As for the general case, there ought to be a way to check whether the existing references use an inconsistent style or a (non-)style falling outside the realm of "where Wikipedia does not mandate". Nemo 19:03, 22 January 2019 (UTC)

"With the pre-existing ones and all the others?" Hardly so. Headbomb {t · c · p · b} 20:07, 22 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1250 best I can do on a phone in an airport AManWithNoPlan (talk) 21:02, 22 January 2019 (UTC)

that pull is now active {{fixed}} AManWithNoPlan (talk) 13:49, 25 January 2019 (UTC)

Please remove this jstor proxy

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 18:25, 24 January 2019 (UTC)

What happens: {{cite journal |last1=Benedict |first1=Ruth |title=Reviewed Work: An Apache Life-Way: The Economic, Social, and Religious Institutions of the Chiricahua Indians by Morris E. Opler |journal=American Anthropologist |series=New Series |date=October–December 1942 |volume=44 |issue=4, Part 1 |pages=692–693 |url=https://www-jstor-org.rp.nla.gov.au/stable/663315 |accessdate=17 January 2019 }}
What should happen: {{cite journal |last1=Benedict |first1=Ruth |title=Reviewed Work: An Apache Life-Way: The Economic, Social, and Religious Institutions of the Chiricahua Indians by Morris E. Opler |journal=American Anthropologist |series=New Series |date=October–December 1942 |volume=44 |issue=4, Part 1 |pages=692–693 |url=https://www-jstor-org.rp.nla.gov.au/stable/663315 |accessdate=17 January 2019 }}
We can't proceed until: Feedback from maintainers

They did not include proxy in their url, annoying. AManWithNoPlan (talk) 18:46, 24 January 2019 (UTC)

Will add code to see www-jstor-org.some-stuff/stable/1234 AManWithNoPlan (talk) 18:47, 24 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1258 AManWithNoPlan (talk) 22:41, 24 January 2019 (UTC)
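As an illustration of the "www-jstor-org.some-stuff/stable/1234" idea, a minimal sketch follows. The names `JSTOR_PROXY` and `jstor_id_from_proxy` are hypothetical and the pattern is an assumption about what such proxied URLs look like, based only on the example in this report.

```python
import re
from typing import Optional

# Proxied JSTOR hosts of the form www-jstor-org.<proxy domain>/stable/<id>.
JSTOR_PROXY = re.compile(
    r"^https?://www-jstor-org\.[^/\s]+/stable/([^\s?#]+)",
    re.IGNORECASE,
)


def jstor_id_from_proxy(url: str) -> Optional[str]:
    """Return the JSTOR stable id if the URL is a hyphenated-host proxy link."""
    match = JSTOR_PROXY.match(url)
    return match.group(1) if match else None
```

With the id in hand, the proxied URL could be replaced by a plain |jstor= parameter or the unproxied stable URL.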

Final Ed

Status: {{fixed}} submitted
Reported by: Headbomb {t · c · p · b} 03:30, 25 January 2019 (UTC)

What happens: [Arch Dis Child Fetal Neonatal ed]
What should happen: [Arch Dis Child Fetal Neonatal Ed]
Relevant diffs/links: [26]
We can't proceed until: Feedback from maintainers

I don’t remember why Ed is “ed”. Might just add the string as a whole. AManWithNoPlan (talk) 13:50, 25 January 2019 (UTC)

because of 2nd ed.? (t) Josve05a (c) 14:10, 25 January 2019 (UTC)

that’s right. AManWithNoPlan (talk) 15:07, 25 January 2019 (UTC)

It's also a preposition in some Latin languages I think. Italian maybe. Headbomb {t · c · p · b} 20:22, 25 January 2019 (UTC)

In English ED is something else... 🤣😲😂🙄 AManWithNoPlan (talk) 20:25, 25 January 2019 (UTC)

What's so funny about ED? Headbomb {t · c · p · b} 20:39, 25 January 2019 (UTC)

Wrong date format

Status: new bug
Reported by: PamD 07:08, 27 January 2019 (UTC)

What happens: bot added date to a citation using deprecated format 2014-05-30 instead of 30 May 2014.
Relevant diffs/links: Special:Diff/880375039
We can't proceed until: Feedback from maintainers

Deprecated? In what parallel universe? https://xkcd.com/1179/ Nemo 10:22, 27 January 2019 (UTC)

🤣😂😁 {{notabug}} AManWithNoPlan (talk) 14:03, 27 January 2019 (UTC)

seriously, that page is flagged to demand the use of the date format that the bot used. I wish everyone would use yyyy-mm-dd for computer stuff. In my writing I use 7 MAY 2001 format. AManWithNoPlan (talk) 14:08, 27 January 2019 (UTC)

Publisher removed

Status: new bug
Reported by: MB 15:36, 27 January 2019 (UTC)

What happens: publisher fields removed, I don't see why, they are not redundant
Relevant diffs/links: [27]
We can't proceed until: Feedback from maintainers

@MB: You will be interested in this current discussion, I imagine. --Izno (talk) 15:56, 27 January 2019 (UTC)

As is often the case, the publisher of that journal has changed multiple times. AManWithNoPlan (talk) 17:49, 27 January 2019 (UTC)

@MB: I should note that there was consensus for over a decade to always remove publishers and recently this consensus is being challenged. I only note this since multiple people incorrectly believe that this is a new feature of the Bot. AManWithNoPlan (talk) 18:00, 27 January 2019 (UTC)

Subscription site / bad title

Status: {{fixed}}
Reported by: Redalert2fan (talk) 19:39, 26 January 2019 (UTC)

What happens: title= "Subscribe to read" is added, which is not a valid title
What should happen: skip when title = "Subscribe to read" / do nothing
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Hyundai_Merchant_Marine&type=revision&diff=880315328&oldid=877907681
We can't proceed until: Feedback from maintainers

The correct title is actually on the reference page, it can be found after "Subscribe to the FT to read:" in this case its: "Hanjin bankruptcy brings chaos but no capacity cut". Not sure if its feasible/possible to make the bot search for this. It seems to be like this for all Financial Times pages (ft.com) so preventing all links to that site from being edited by the bot is also a possibility. Redalert2fan (talk) 19:39, 26 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1261 AManWithNoPlan (talk) 20:11, 26 January 2019 (UTC)
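Both options from the report (skip the placeholder title, or recover the headline that follows "Subscribe to the FT to read:") can be sketched as below. This does not claim to reflect what the linked pull request does; `PAYWALL_TITLES`, `FT_PREFIX` and `usable_title` are hypothetical names.

```python
from typing import Optional

# Titles that are paywall placeholders rather than real headlines (assumed list).
PAYWALL_TITLES = {"subscribe to read"}
FT_PREFIX = "Subscribe to the FT to read:"


def usable_title(page_title: str, page_text: str = "") -> Optional[str]:
    """Return a title worth adding to the citation, or None to leave it alone."""
    cleaned = page_title.strip()
    if cleaned.lower() not in PAYWALL_TITLES:
        return cleaned
    # Placeholder title: look for the real headline after the FT prefix.
    index = page_text.find(FT_PREFIX)
    if index == -1:
        return None
    rest = page_text[index + len(FT_PREFIX):].strip()
    headline = rest.splitlines()[0].strip() if rest else ""
    return headline or None
```

Returning None corresponds to the "do nothing" outcome requested in the report.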

Invalid ISO dates added

Status: {{fixed}}
Reported by: Keith D (talk) 15:21, 27 January 2019 (UTC)

What happens: Adds an ISO-style date without leading zeros.
What should happen: ISO dates should always have leading zeros supplied.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Sidoarjo_mud_flow&diff=880374680&oldid=878960216
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1262 when we know, we will pad now (after this on wikipedia of course). AManWithNoPlan (talk) 18:56, 27 January 2019 (UTC)
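Padding of this kind is simple to sketch. The function name `pad_iso_date` is hypothetical, and this version only pads year-month-day strings; it makes no attempt to validate that the month and day are in range.

```python
import re


def pad_iso_date(date: str) -> str:
    """Zero-pad the month and day of a yyyy-m-d style date; leave anything else unchanged."""
    match = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", date.strip())
    if not match:
        return date
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"
```

Dates in other formats, such as "30 May 2014", pass through untouched.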

Bot does not fully report action reasons

Status: {{wontfix}}
Reported by: RobDuch (talk) 20:50, 28 January 2019 (UTC)

What happens: Citation bot removed both an access date and complete URL from a "Cite journal" template with note "Removed accessdate with no specified URL."
What should happen: Nothing, as there was both an access date and URL in the cite template
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Caretaker_(military)&diff=880593420&oldid=879672200
We can't proceed until: Feedback from maintainers

See bug report template. Both an access date and a complete URL were removed by Citation Bot from a "Cite journal" template. RobDuch (talk) 20:50, 28 January 2019 (UTC)

The url removal is described merely as parameters removed. The reason, which the person using the bot would see, is that the url is removed because it is redundant with the DOI. AManWithNoPlan (talk) 21:11, 28 January 2019 (UTC)

The description of what is done is always hard to write, since it is summarizing possibly 40 changed templates with 100 changes in one line. That is impossible to get right every time. AManWithNoPlan (talk) 22:06, 28 January 2019 (UTC)

More invalid dates added

Status: {{fixed}}
Reported by: Keith D (talk) 22:06, 28 January 2019 (UTC)

What happens: adds invalid dates in this case |date=1899 1899-1985
What should happen: Only add dates that are valid and comply with the MOS.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Doris_Holmes_Blake&diff=880651882&oldid=840011180
https://en.wikipedia.org/w/index.php?title=E._Yale_Dawson&diff=880651603&oldid=826309884

https://en.wikipedia.org/w/index.php?title=Helmut_Karl_Buechner&diff=880655437&oldid=837624460
https://en.wikipedia.org/w/index.php?title=Wallace_Roy_Ernst&diff=880655720&oldid=802573936

We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1264 AManWithNoPlan (talk) 01:18, 29 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1265 also AManWithNoPlan (talk) 17:36, 29 January 2019 (UTC)

CrossRef gives bad last=&Na? Please reject

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 05:31, 29 January 2019 (UTC)

What happens: [28]
What should happen: ?
We can't proceed until: Feedback from maintainers

This is a possible placeholder / shorthand for no authors or N/A. Maybe. Headbomb {t · c · p · b} 05:31, 29 January 2019 (UTC)

That makes sense. I will look at it. AManWithNoPlan (talk) 05:52, 29 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1270 AManWithNoPlan (talk) 17:19, 29 January 2019 (UTC)

Odd edits

Citation bot is making odd changes to references like this where it converts a {{cite journal}} to a {{cite book}} (when the reference in question very much is a journal, not a book) and removes valid publisher information. See also here where the bot simply removed parameters with no discernible reason. Can anyone explain why the bot is doing this? Parsecboy (talk) 12:40, 29 January 2019 (UTC)

The first edit is a bit strange, an alleged journal with an ISBN. Worldcat and Google Books seem to indicate that it is a book in a series rather than a typical journal. The bot might be able to be coded to avoid doing what it did, since context is important.

Removing the publisher parameter from journal citations is a long-standing feature. I'm not saying I support it, just explaining. See this RFC for more information. – Jonesey95 (talk) 14:44, 29 January 2019 (UTC)

So, the bot removed the publisher since it is a journal and changed the type since it is a book. So, is it a bournal or jook? I am only 90% joking. The distinction is not always clear between a series of books and a journal. AManWithNoPlan (talk) 15:28, 29 January 2019 (UTC)

{{notabug}} data from databases is not clear. AManWithNoPlan (talk) 14:01, 30 January 2019 (UTC)

New feature coming - more DOIs

There are ten different DOI providers. We have always supported Crossref. We added more recently. Now even more are coming. We also are adding tests for the ones that don’t work so we know if they suddenly start working and can check for bugs. Who knew that movies had dois? And no, we don’t expand the black panther marvel movie doi even with the new code. https://github.com/ms609/citation-bot/pull/1253 AManWithNoPlan (talk) 18:18, 26 January 2019 (UTC)

{{fixed}} and running great. AManWithNoPlan (talk) 14:48, 31 January 2019 (UTC)

Citation template changes

Status: {{fixed}}
Reported by: — TAnthony^Talk 18:38, 30 January 2019 (UTC)

What happens: {{cite web}} is incorrectly changed to {{cite book}} in two Kirkus Reviews citations; this is never a good idea because the parameters in each are intentionally styled differently. For example, |title= in {{cite book}} italicizes, but |title= in {{cite web}} does not, because the title of a web article should not be automatically italicized in its entirety.
What should happen: {{cite web}} template should be left alone
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Ian_McDonald_(British_author)&diff=880956973&oldid=866979281
We can't proceed until: Feedback from maintainers

Looks like this issue is similar/related to the previously reported bug where {{cite web}} was changed to {{cite journal}}. What are the criteria with which this bot is changing citation templates from one to another? I think we can assume that most of these templates have been specifically chosen by editors, what is the bot supposed to be "fixing"? Thanks.— TAnthony^Talk 18:43, 30 January 2019 (UTC)

the website tells us that it is a book. I will look into it. Honestly editors generally don’t put much thought into which one they choose. AManWithNoPlan (talk) 18:48, 30 January 2019 (UTC)

Well they are online book reviews, not books. And I've found that even sloppy editors can see the difference between cite web and cite book, I do a lot of citation cleanup and have almost never had to change a template like this.— TAnthony^Talk 18:51, 30 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1274 should catch a lot of them once active. AManWithNoPlan (talk) 21:54, 30 January 2019 (UTC)

Washington, D.c

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 02:59, 31 January 2019 (UTC)

What happens: [Washington, D.c]
What should happen: [Washington, D.C.]
Relevant diffs/links: [29]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1275 Will not remove trailing period, if there is another period in the last word. AManWithNoPlan (talk) 03:31, 31 January 2019 (UTC)

It should never remove the trailing period in a journal... too much WP:CONTEXT, for cases like Int. J. Med. Sci. or something. Headbomb {t · c · p · b} 20:56, 31 January 2019 (UTC)
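The rule described for the pull request ("will not remove trailing period, if there is another period in the last word") can be written out as a short sketch. The function name `strip_trailing_period` is hypothetical and this is a reading of the one-line description above, not the bot's code.

```python
def strip_trailing_period(value: str) -> str:
    """Drop a final period unless the last word contains another period (e.g. 'D.C.')."""
    text = value.rstrip()
    if not text.endswith("."):
        return text
    last_word = text.split()[-1]
    # 'D.C.' has an inner period, so the trailing one is part of an abbreviation.
    if "." in last_word[:-1]:
        return text
    return text[:-1]
```

Note that under this rule "Int. J. Med. Sci." still loses its final period, because the last word "Sci." has no inner period; that is the journal-abbreviation case raised in the reply above.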

Citoid discus

Citoid usage discussion on MediaWiki.org

{{wontfix}} — flag to archive

URLs containing an ISSN-DOI

Status: {{fixed}}
Reported by: Randykitty (talk) 10:06, 29 January 2019 (UTC)

What happens: See here. In this case these were correct "cite web" links, but because the URLs were the same ISSN-DOI links that posed problems earlier, the bot changes this to "cite journal" links. The URL still redirects to the correct place, but the DOI doesn't. It incorrectly replaces the "work" field with a "journal" field (there is no journal with the title "Wiley Online Library"...).
What should happen: In cases like this, all the bot needs to do is either to leave the reference as is, or replace the ISSN-DOI URL with the URL that it now redirects to (in this case https://onlinelibrary.wiley.com/page/journal/10990739/homepage/EditorialBoard.html).
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Applied_Organometallic_Chemistry&curid=11763255&diff=880710379&oldid=876944349
We can't proceed until: Feedback from maintainers

The DOI is valid and points to the correct journal, but you are right that these ISSN-only DOIs are problematic and should probably be 100% ignored. AManWithNoPlan (talk) 16:57, 29 January 2019 (UTC)

Code to simply not ever add those. https://github.com/ms609/citation-bot/pull/1269 AManWithNoPlan (talk) 17:02, 29 January 2019 (UTC)

The bot adds the doi because of the ISSN in the URL. However, the doi goes to the journal main page, even if the URL was pointing to another page (e.g. the listing of the editorial board). What was being referenced here was a page on the journal's website, not an article published in the journal, so "cite web" was correct and "cite journal" is not. Note that the ISSN-containing URL has been abandoned by Wiley and pages have gotten new URLs that don't contain the ISSN. The old URLs are still functional, they are redirected to the new (non-ISSN) URL. Ideally, the bot would replace the old URL with the new one, but I have no idea how easy/difficult that is. If it's too hard, the bot should leave these instances alone. --Randykitty (talk) 17:07, 29 January 2019 (UTC)

Sorry, but this is not fixed. I just reverted the above diff and ran the bot again, with almost the same result, except that there are now erroneous DOIs and it still is changed to "cite journal"... --Randykitty (talk) 15:15, 30 January 2019 (UTC)

The perennial problem with GIGO: there is always another code path AManWithNoPlan (talk) 15:42, 30 January 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1273 AManWithNoPlan (talk) 21:56, 30 January 2019 (UTC)

Much better now. These too will help:

https://github.com/ms609/citation-bot/pull/1279
https://github.com/ms609/citation-bot/pull/1277
https://github.com/ms609/citation-bot/pull/1278
https://github.com/ms609/citation-bot/pull/1280

So many small improvements for such a rare problem. AManWithNoPlan (talk) 17:52, 31 January 2019 (UTC)

I have looked at those github links, but must admit my ignorance here and have no idea what all that means. Meanwhile, the bot is still doing this ([30]). I've corrected a few by hand, but that's quite tedious. This is such a wonderful tool and I really appreciate all the work and effort of you guys to keep this running, so I feel really bad to keep pestering you about this... --Randykitty (talk) 11:46, 2 February 2019 (UTC)

Until they are merged, they are not live yet. AManWithNoPlan (talk) 13:59, 2 February 2019 (UTC)

Remove URL if DOI resolves to the same place

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 01:24, 15 January 2019 (UTC)

What should happen: [31]
We can't proceed until: Feedback from maintainers

Actually this could apply to any URLs that resolve to the same place as the DOI. Headbomb {t · c · p · b} 01:24, 15 January 2019 (UTC)

Indeed: according to https://www.crossref.org/blog/urls-and-dois-a-complicated-relationship/ , resolving the URL is needed for over a hundred publishers. A simple HTTP request with cookies enabled, plus some custom HTML parsing for the most frequent DOI prefixes, would go a long way. Nemo 14:14, 15 January 2019 (UTC)

thoughts on this approach https://github.com/ms609/citation-bot/pull/1260 AManWithNoPlan (talk) 19:07, 26 January 2019 (UTC)

What is 'this approach', exactly? Headbomb {t · c · p · b} 19:37, 26 January 2019 (UTC)

The linked code. I will convert to English. Drop url if:

Citation is complete 
The doi is not an ISSN-only doi (points to article not journal)
The url hostname is on the list of canonical publishers
The url does not contain 'pdf', 'image', 'plate', 'figure', or 'picture'
The doi resolves to something

AManWithNoPlan (talk) 18:41, 27 January 2019 (UTC)
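The five conditions above can be sketched as a single predicate. This is only an illustration in Python, not the code in the pull request: the host list, the function name, and the idea of passing the three checks in as ready-made booleans are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the bot's list of canonical publisher hosts.
CANONICAL_PUBLISHER_HOSTS = {"link.springer.com", "onlinelibrary.wiley.com"}

# Substrings suggesting the URL points at a file or figure, not the article.
BLOCKING_WORDS = ("pdf", "image", "plate", "figure", "picture")


def should_drop_url(url, doi, citation_complete, doi_is_issn_only, doi_resolves):
    """Return True only if every condition from the list above holds."""
    if not citation_complete:
        return False
    if not doi or doi_is_issn_only:
        return False
    host = (urlparse(url).hostname or "").lower()
    if host not in CANONICAL_PUBLISHER_HOSTS:
        return False
    if any(word in url.lower() for word in BLOCKING_WORDS):
        return False
    return bool(doi_resolves)
```

Any single failing condition keeps the URL, so the sketch errs on the side of leaving the citation alone.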

The title is slightly misleading because this code doesn't check at all whether there is a match, it just relies on whoever has previously compiled the template to have verified and stated an identity between the URL and the DOI. You could argue that if they didn't it's just GIGO (example) and that this assumption works in the large majority of cases but I'd be curious if it's 99 %, 95 % or 80 % or whatever. Maybe I'll run some regex on the dumps so that whoever wants can check a sample of URLs.

Speaking of which, it may be helpful to use a slightly different constant than CANONICAL_PUBLISHER_URLS, where several domains are unlikely to be the target of a DOI: for instance link.springer.com receives nearly all DOI redirects, while www.springer.com is more likely to contain journal descriptions where the URL patterns can get tricky. Nemo 20:02, 27 January 2019 (UTC)

I have changed the pull to now also check if the doi url matches the url in the template and also if it matches what the url redirects to when actually polled. AManWithNoPlan (talk) 01:20, 29 January 2019 (UTC)
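A rough Python sketch of that kind of comparison follows. It is an assumption-laden illustration rather than the pull request's logic: the function names are made up, the network lookups are left to the caller, and the choice to ignore the scheme, a leading "www." and a trailing slash is a guess at what a "match" might reasonably mean.

```python
from urllib.parse import urlparse


def normalize(url):
    """Reduce a URL to (host, path, query) for a loose comparison.

    Assumption: scheme, a leading 'www.' and a trailing slash do not matter.
    """
    parts = urlparse(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/"), parts.query


def url_matches_doi(template_url, doi_target, redirected_url=None):
    """True if the template URL, or the page it redirects to, is the DOI target.

    doi_target and redirected_url are expected to be looked up by the caller.
    """
    target = normalize(doi_target)
    if normalize(template_url) == target:
        return True
    return redirected_url is not None and normalize(redirected_url) == target
```

Anything that does not match after this light normalisation is treated as a different page, so the URL would be kept.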

Please try to detect open access database errors

Status: {{fixed}}
Reported by: David Eppstein (talk) 17:21, 6 February 2019 (UTC)

What happens: Citation bot adds a CiteSeerX link to a paper with the same title but a different (overlapping) author set and far different publication year. The authors and date of the paper are listed correctly on CiteSeerX but Citation bot fails to detect the inconsistency.
What should happen: Citation bot detects the inconsistent authors and year, doesn't consider the papers to be the same, and doesn't add the link
Relevant diffs/links: Special:Diff/882038136
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1294 Will raise the bar. AManWithNoPlan (talk) 05:01, 7 February 2019 (UTC)
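As an illustration of what a higher bar could look like, here is a small Python sketch that accepts a candidate record only when title, author surnames and year all agree. The dictionary layout, the function name and the one-year tolerance are assumptions for the example, not what the pull request does.

```python
def same_paper(citation, candidate, max_year_gap=1):
    """Accept the candidate only if title, surnames and year all agree.

    Both arguments are dicts with 'title', 'surnames' and 'year' keys
    (a hypothetical layout chosen for this sketch).
    """
    def norm(text):
        # Case-insensitive, whitespace-insensitive comparison key.
        return " ".join(text.casefold().split())

    if norm(citation["title"]) != norm(candidate["title"]):
        return False
    cite_authors = {norm(name) for name in citation["surnames"]}
    cand_authors = {norm(name) for name in candidate["surnames"]}
    if cite_authors != cand_authors:
        return False
    return abs(citation["year"] - candidate["year"]) <= max_year_gap
```

With a check like this, a same-title record with an overlapping but different author set, or a far different year, would be rejected and no link added.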

Better publisher handling

https://github.com/ms609/citation-bot/pull/1286 documenting improvements {{fixed}}

Redlink title-link

Status: new bug
Reported by: David Eppstein (talk) 22:06, 7 February 2019 (UTC)

What happens: Citation bot adds title-link parameter on the title of a conference when that conference has no Wikipedia article to link to
What should happen: Only use title-link to create new links to existing articles
Relevant diffs/links: Special:Diff/882232314
We can't proceed until: Feedback from maintainers

I'm not convinced the citation in question belongs in the article at all, but that's beside the point. —David Eppstein (talk) 22:06, 7 February 2019 (UTC)

We do not add a new title link, we just convert the inline link to the superior |title= plus |title-link=. AManWithNoPlan (talk) 22:24, 7 February 2019 (UTC)

Ok, thanks for the clarification. That change seems harmless enough to me. —David Eppstein (talk) 22:29, 7 February 2019 (UTC)

It is really hard to see in the diff {{notabug}}. AManWithNoPlan (talk) 22:56, 7 February 2019 (UTC)

Changing publisher of a book series into a journal

Status: {{fixed}}
Reported by: David Eppstein (talk) 21:21, 7 February 2019 (UTC)

What happens: In citation template with contribution+title+series+publisher parameters, incorrectly changes publisher= to journal=
What should happen: contribution+title+series+journal is not a valid combination of parameters. Citation bot should recognize that it is converting a valid citation template into an invalid one and not do it, no matter what its source's metadata might say. GIGO is not an acceptable excuse for taking garbage from elsewhere and creating more of it here when it wasn't here already.
Relevant diffs/links: Special:Diff/882190191 (incidentally, at least one and possibly both of the CiteSeerX links added in the same diff appear to fail WP:ELNEVER)
We can't proceed until: Feedback from maintainers

That is annoying that their meta-data has the publisher listed as journal. I will investigate. AManWithNoPlan (talk) 21:30, 7 February 2019 (UTC)

Annoying metadata elsewhere or not, the bot should never turn valid citations on this site into invalid ones. —David Eppstein (talk) 21:35, 7 February 2019 (UTC)
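One way to picture the rule being asked for is a guard that compares the template before and after an edit and discards the edit if it turns a valid template into an invalid one. The Python sketch below is hypothetical: the function names are invented, and the validity test is a simplification generalised from the combination reported here (a chapter-like parameter together with a periodical-like one).

```python
# Assumption for this sketch: a chapter-like parameter combined with a
# periodical-like parameter is treated as an invalid combination.
CHAPTER_LIKE = {"chapter", "contribution"}
PERIODICAL_LIKE = {"journal", "work"}


def is_valid_combination(params):
    """Check only the one kind of clash described in this report."""
    names = {name for name, value in params.items() if value}
    return not (names & CHAPTER_LIKE and names & PERIODICAL_LIKE)


def apply_if_still_valid(before, after):
    """Keep the original parameters if the edit would break a valid template."""
    if is_valid_combination(before) and not is_valid_combination(after):
        return before
    return after
```

The guard does not care where the new metadata came from; it only refuses to make things worse than they were.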

Fix one: https://github.com/ms609/citation-bot/pull/1303 AManWithNoPlan (talk) 21:36, 7 February 2019 (UTC)

Fix two: https://github.com/ms609/citation-bot/pull/1304 AManWithNoPlan (talk) 21:41, 7 February 2019 (UTC)

arXiv vs eprint

Status: {{fixed}}
Reported by: David Eppstein (talk) 21:34, 7 February 2019 (UTC)

What happens: Replaces the documented and standard parameter arxiv=, in a citation for which the arXiv link is a courtesy link rather than the main publication venue of the reference, with the undocumented and obsolete parameters eprint= and class=
What should happen: Only use documented citation parameters
Relevant diffs/links: Special:Diff/882187390
We can't proceed until: Feedback from maintainers

According to the documentation, the bot's actions are correct. {{cite arxiv}} is an odd beast that does things its own way. AManWithNoPlan (talk) 21:42, 7 February 2019 (UTC)

Converting |arxiv= to |eprint= could probably be removed at this point, since that dates back to a time where |arxiv= was not supported. The addition of |class= to a cite arxiv is fine though. Headbomb {t · c · p · b} 22:32, 7 February 2019 (UTC)

- While the conversion is technically correct, it is just one more pointless change to tick people off -- or at least confuse. Also, if the citation ever gets upgraded to {{cite journal}} we have to convert it back. AManWithNoPlan (talk) 22:55, 7 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1306 AManWithNoPlan (talk) 23:01, 7 February 2019 (UTC)

When expanding preprint into conference paper, deletes the paper title

Status: {{fixed}}
Reported by: David Eppstein (talk) 21:47, 7 February 2019 (UTC)

What happens: Citation turns correctly-titled arXiv preprint into conference proceedings paper missing its title
What should happen: restore paper title as contribution parameter of citation template
Relevant diffs/links: Special:Diff/882201706
We can't proceed until: Feedback from maintainers

I don't understand how this one happened. Citation bot did correctly find a publication matching the arXiv preprint. To do so, it must have matched title and authors, because that's the only information in common between the arXiv preprint and the published version. When I ask for bibtex metadata from doi.org, I get

@incollection{Grier_2013,
	doi = {10.1007/978-3-642-39206-1_42},
	url = {https://doi.org/10.1007%2F978-3-642-39206-1_42},
	year = 2013,
	publisher = {Springer Berlin Heidelberg},
	pages = {497--503},
	author = {Daniel Grier},
	title = {Deciding the Winner of an Arbitrary Finite Poset Game Is {PSPACE}-Complete},
	booktitle = {Automata, Languages, and Programming}
}

which does correctly include the title of the paper (but not the series). So the information was obviously there. But Citation bot chose to remove it. —David Eppstein (talk) 21:47, 7 February 2019 (UTC)

Two points, we check DOIs in this order: 1. CrossRef 2. dx.doi.org JSON (not bibtex) 3. Zotero on the website itself (yuck!). So, your information is doubly irrelevant, it is not the dx.doi.org JSON, and we use CrossRef. We get this: AManWithNoPlan (talk) 22:02, 7 February 2019 (UTC)

<isbn type="print">978-3-642-39205-4</isbn>
<isbn type="electronic">978-3-642-39206-1</isbn>
<issn type="print">0302-9743</issn>
<issn type="electronic">1611-3349</issn>
<series_title>Lecture Notes in Computer Science</series_title>
<volume_title>Automata, Languages, and Programming</volume_title>
<volume>7965</volume>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>Daniel</given_name>
<surname>Grier</surname>
</contributor>
</contributors>
<component_number>Chapter 42</component_number>
<year media_type="print">2013</year>
<first_page>497</first_page>
<last_page>503</last_page>
<doi type="book_content">10.1007/978-3-642-39206-1_42</doi>
<publication_type>full_text</publication_type>
<article_title>
Deciding the Winner of an Arbitrary Finite Poset Game Is PSPACE-Complete
</article_title>

Time to dig through the CrossRef parsing code AManWithNoPlan (talk) 22:04, 7 February 2019 (UTC)
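For readers following along, the record above does carry the paper title in article_title. A minimal Python sketch of pulling the relevant fields out of such a record is shown below; it is not the bot's parser. The record root element is made up so that a trimmed copy of the fragment parses on its own, and the mapping of article_title to the contribution parameter follows the request in this report rather than any confirmed behaviour.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record quoted above, wrapped in a made-up <record> root
# so that it is well-formed XML by itself.
SAMPLE = """<record>
<series_title>Lecture Notes in Computer Science</series_title>
<volume_title>Automata, Languages, and Programming</volume_title>
<volume>7965</volume>
<article_title>
Deciding the Winner of an Arbitrary Finite Poset Game Is PSPACE-Complete
</article_title>
</record>"""


def book_chapter_fields(xml_text):
    """Map the CrossRef-style tags onto citation parameters (assumed mapping)."""
    root = ET.fromstring(xml_text)

    def text_of(tag):
        node = root.find(tag)
        if node is None or not node.text:
            return None
        # Collapse the line breaks and padding around the element text.
        return " ".join(node.text.split())

    return {
        "contribution": text_of("article_title"),
        "title": text_of("volume_title"),
        "series": text_of("series_title"),
        "volume": text_of("volume"),
    }
```

Note that the article title in the record is surrounded by line breaks, so whitespace has to be collapsed before the value is usable.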

This should fix it. I included a test too. https://github.com/ms609/citation-bot/pull/1305 AManWithNoPlan (talk) 22:22, 7 February 2019 (UTC)

Removes access date when there is no urls

Status: {{notabug}}
Reported by: Micromesistius (talk) 02:17, 9 February 2019 (UTC)

What happens: Access date should not be removed from citations to IUCN Red List assessments. Assessments get updated, and it is useful to know when an editor has checked that the information present in Wikipedia is the most updated available. A 2004 assessment recently accessed is likely to be up-to-date, whereas one with ancient access date is more likely to need an update.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Cophixalus_tagulensis&curid=12398551&diff=882429619&oldid=857233552
We can't proceed until: Feedback from maintainers

Get a better url. DOIs do not have access dates. AManWithNoPlan (talk) 02:23, 9 February 2019 (UTC)

The assessment date is clear. '2004'. No need for an accessdate. Change |date=2004 to |date=30 April 2004 to be more specific. Headbomb {t · c · p · b} 02:50, 9 February 2019 (UTC)

More ISSN DOI

Not sure if this is still happening: [32]. Nemo 10:10, 10 February 2019 (UTC) {{fixed}}

better etal handling

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 04:10, 7 February 2019 (UTC)

What happens: If |(author|first|last)\d?=et\s*al\.?, replace with |display-authors=etal. Similar for |display-editors=
What should happen: [33]
We can't proceed until: Feedback from maintainers

See the actual test in Module:Citation/CS1 at local function name_has_etal (name, etal, nocat).
The naive suggested implementation above can cause duplicate parameters (as in display-authors is already set and/or happens to be set to the exact number of authors in the list i.e. author1, 2, and display-authors=2 is set), or it can cross over into pages listed in Category:CS1 maint: display-authors. You can find some of the former in the contribution history there. I would say this is a bit context sensitive, which is why it's not an error at this time. Trappist the monk might have an opinion. --Izno (talk) 04:22, 7 February 2019 (UTC)

E.g. this one. Here is another fixed GIGO of a different sort i.e. the name separators. You also need or want to catch italics, which I've had a few to do. --Izno (talk) 04:30, 7 February 2019 (UTC)

Then for editors, there's also the 2 or 3 different ways to use the parameters. This one has internal numbers. A different one may have external numbers with/without dash i.e. |editorfirst1. --Izno (talk) 04:33, 7 February 2019 (UTC)

Consider the regex above to be pseudocode for the general idea, rather than finalized solution. Headbomb {t · c · p · b} 04:36, 7 February 2019 (UTC)
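To make the general idea and the duplicate-parameter pitfall concrete, here is a hedged Python sketch that works on an already-parsed dictionary of template parameters. The function name and the dictionary representation are assumptions for the example; it covers only the author parameters and deliberately does nothing when display-authors is already set.

```python
import re

# Matches values such as "et al", "et al.", "etal", optionally in wiki italics.
ET_AL = re.compile(r"^'{0,5}\s*et\s*al\.?\s*'{0,5}$", re.IGNORECASE)
# Author-name parameters with an optional trailing number.
NAME_PARAM = re.compile(r"^(author|first|last)\d*$")


def move_etal(params):
    """Return a copy of params with an 'et al.' name replaced by display-authors=etal."""
    result = dict(params)
    hits = [key for key, value in params.items()
            if NAME_PARAM.match(key) and ET_AL.match(value.strip())]
    if not hits:
        return result
    if "display-authors" in params:
        # Avoid creating a duplicate or conflicting parameter; leave for a human.
        return result
    for key in hits:
        del result[key]
    result["display-authors"] = "etal"
    return result
```

Editors, differently numbered parameter styles and free-text |authors= lists would each need their own handling, which is the context sensitivity described above.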

Sure, just there's some pitfalls to be aware of.

Also, one other thing I've been doing in the run is taking care of uses of |authors= where I see it, which are often used in combination. --Izno (talk) 04:43, 7 February 2019 (UTC)

And then there's dumb garbage like this. --Izno (talk) 05:18, 7 February 2019 (UTC)

Perhaps this is better done as a separate task and not this BOT? AManWithNoPlan (talk) 17:48, 7 February 2019 (UTC)

Well GIGO can be handled by a different bot/AWB thing, but cases similar to the ones I linked in the diff should be able to be handled by this bot relatively easily. Headbomb {t · c · p · b} 18:09, 7 February 2019 (UTC)

This will handle the simplest cases: https://github.com/ms609/citation-bot/pull/1302 AManWithNoPlan (talk) 21:22, 7 February 2019 (UTC)

Labour / Le Travail

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 20:47, 8 February 2019 (UTC)

What happens: Labour / le Travail
What should happen: Labour / Le Travail
Relevant diffs/links: [34]
We can't proceed until: Feedback from maintainers

In general '/' should be treated the same way as ':' is. Headbomb {t · c · p · b} 20:47, 8 February 2019 (UTC)
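A small Python sketch of that rule: a word that follows ':' or '/' is capitalised like the first word of the title, while short words elsewhere are lowercased. The word list and the function name are made up for the example and are far smaller than any real rule set.

```python
# Tiny hypothetical sample of words that are lowercased inside a title.
SMALL_WORDS = {"le", "la", "de", "of", "the", "and"}


def title_case(title):
    """Lowercase small words, except at the start or right after ':' or '/'."""
    out = []
    start_of_segment = True
    for word in title.split():
        if start_of_segment and word[0].isalpha():
            out.append(word[:1].upper() + word[1:])
        elif word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word)  # other words are left exactly as found
        # A word ending in ':' or '/' (or the bare separator) opens a new segment.
        start_of_segment = word.endswith((":", "/"))
    return " ".join(out)
```

With both separators handled the same way, "Le" after the slash survives, just as a word after a colon would.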

https://github.com/ms609/citation-bot/pull/1314 AManWithNoPlan (talk) 21:44, 8 February 2019 (UTC)

Bad title

Status: examples {{fixed}}
Reported by: Redalert2fan (talk) 22:25, 8 February 2019 (UTC)

What happens: title= Archived copy is changed in to title= "Zoeken in over NA na een 404" which makes no sense in Dutch, it literally translates to "Search in over NA after a 404"
What should happen: no change should be made
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Kingdom_of_the_Netherlands&curid=18949613&diff=882412987&oldid=881997900
We can't proceed until: Feedback from maintainers

The "dead" page contains "Deze pagina is niet gevonden", which means "this page was not found", while the archived copy is a pdf which does not seem to contain a specific title (other than the file name). Redalert2fan (talk) 22:25, 8 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1315/files will fix these specific examples once merged. And this is why lazy webservers that simply say "page not found" but do not set the error code are a bad idea. AManWithNoPlan (talk) 23:06, 8 February 2019 (UTC)

I think I found one more for you, exactly the same style of problem. This time it's in Vietnamese diff. title= "Bao phu nu - Đọc báo phụ nữ Việt Nam online tin tức mới nhất 24h" is added. I don't speak Vietnamese but according to google translate this means "Bao phu nu - Read newspaper Vietnamese women online latest news 24h" which seems like a title for the whole website and not for the specific article. By looking at the link a correct title should be something like "Bao-Trung-Quoc-noi-ve-may-bay-tuan-tieu-M28-cua-Viet-Nam". Thanks Redalert2fan (talk) 19:45, 9 February 2019 (UTC)

Fix line feeds in titles

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 01:20, 9 February 2019 (UTC)

What happens: [35]
What should happen: [36]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1317 AManWithNoPlan (talk) 01:31, 9 February 2019 (UTC)

Just make sure they're not start/end of line. Headbomb {t · c · p · b} 01:45, 9 February 2019 (UTC)
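A possible shape for this, sketched in Python with a made-up function name: strip the title first so that line feeds at the start or end simply disappear, then turn each remaining line feed (with any padding around it) into a single space.

```python
import re


def clean_title(title):
    """Replace internal line feeds with one space; drop leading/trailing ones."""
    # strip() first, so edge line feeds vanish instead of becoming spaces.
    return re.sub(r"\s*[\r\n]+\s*", " ", title.strip())
```

Doing the strip before the substitution is what keeps a stray space from appearing at either end of the title.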

Miscapitalized journal

Status: {{fixed}}
Reported by: David Eppstein (talk) 06:52, 9 February 2019 (UTC)

What happens: "J. SIAM" (correct) changed to "J. Siam" (incorrect)
What should happen: If you're going to make your usual excuse of "can't fix it because some other web site somewhere has bad metadata" then every edit needs to have the source of the metadata clearly identified so that the garbage can be traced back to its source. In this case, I checked the (JSON) metadata from doi.org and got "Journal of the Society for Industrial and Applied Mathematics" so that's not where it comes from.
Relevant diffs/links: Special:Diff/882439896
We can't proceed until: Feedback from maintainers

Bad metadata for this is so common that we actually have a whole list of capitalization rules and exceptions. In fact it is so bad that we don't trust the metadata and change the capitalization after we get it. AManWithNoPlan (talk) 14:11, 9 February 2019 (UTC)
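The exception-list approach can be pictured with a short Python sketch. The set of acronyms and the function name below are invented for the example; the real list of rules and exceptions is much longer and is not reproduced here.

```python
# Hypothetical sample of words that must stay fully uppercase.
UPPERCASE_WORDS = {"siam", "aiaa", "ieee"}


def fix_journal_case(name):
    """Restore known acronyms after generic capitalisation has been applied."""
    fixed = []
    for word in name.split():
        core = word.strip(".,")  # compare without trailing punctuation
        if core.lower() in UPPERCASE_WORDS:
            fixed.append(word.replace(core, core.upper()))
        else:
            fixed.append(word)
    return " ".join(fixed)
```

A lookup like this only helps for acronyms that are already on the list, which is why new reports such as this one keep extending it.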

https://github.com/ms609/citation-bot/pull/1318 AManWithNoPlan (talk) 15:23, 9 February 2019 (UTC)

Titles from russianplanes.net

Status: {{notabug}} sadly.
Reported by: Redalert2fan (talk) 20:51, 9 February 2019 (UTC)

What happens: title= ✈ russianplanes.net ✈ наша авиация is added. ("наша авиация" means something like "our aircraft")
What should happen: https://en.wikipedia.org/w/index.php?title=Sukhoi_Superjet_100&diff=next&oldid=882541347
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Sukhoi_Superjet_100&diff=prev&oldid=882541347
We can't proceed until: Feedback from maintainers

Note: this happens with all aircraft pages like this from russianplanes.net. Thanks, Redalert2fan (talk) 20:56, 9 February 2019 (UTC)

Cannot fix. Tell Russian to give proper titles. AManWithNoPlan (talk) 23:14, 9 February 2019 (UTC)

I'll have to learn Russian then! haha. On a serious note would it be an option to block the title from being added? Redalert2fan (talk) 23:22, 9 February 2019 (UTC)

It is better than no title, so I am not sure. It is technically the correct title. AManWithNoPlan (talk) 23:28, 9 February 2019 (UTC)

Capitalization: AIAA Journal

Status: {{fixed}}
Reported by: Redalert2fan (talk) 21:57, 9 February 2019 (UTC)

What happens: Aiaa Journal is added
What should happen: AIAA Journal
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Wright_Flyer&curid=1045608&diff=882550736&oldid=882423908
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1318 AManWithNoPlan (talk) 23:15, 9 February 2019 (UTC)

Alternative ID used as page number

Status: new bug
Reported by: Nemo 19:25, 10 February 2019 (UTC)

What happens: Page "1-5" is replaced with "89017", which on http://api.crossref.org/works/10.1155/2007/89017 is listed as "alternative-id". True, the page range only tells about the number of pages as all the articles on this (digital-only) volume apparently start from p. 1, cf. http://api.crossref.org/works/10.1155/2007/17315
What should happen: Leave the page number alone?
Relevant diffs/links: special:diff/882688292
We can't proceed until: Feedback from maintainers

The full page number is 89017-1–89017-5. So which is more useful? AManWithNoPlan (talk) 00:50, 11 February 2019 (UTC)

we use open url API since it allows bots. AManWithNoPlan (talk) 02:05, 11 February 2019 (UTC)

it's pubmed actually. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?tool=DOIbot&email=martins@gmail.com&db=pubmed&id=18317533 AManWithNoPlan (talk) 18:01, 11 February 2019 (UTC)

Well, it's not wrong and PubMed has its reasons I guess, so it could be left as is. Nemo 18:15, 11 February 2019 (UTC)

we generally leave as is, but bogus preprint page ranges with one as the first are too common. {{notabug}}. AManWithNoPlan (talk) 18:18, 11 February 2019 (UTC)

Incorrect date

Status: {{fixed}}
Reported by: Redalert2fan (talk) 19:33, 9 February 2019 (UTC)

What happens: date= 2019-10-03 is added which is not correct (and in the future).
What should happen: Correct article date is 03 October, 2016.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=PAC_P-750_XSTOL&type=revision&diff=882532903&oldid=849926223
We can't proceed until: Feedback from maintainers

I cannot reproduce it. Very odd. AManWithNoPlan (talk) 23:28, 9 February 2019 (UTC)

Quite interesting, I also tried running it again and for the link it gave "Operation timed out after 10001 milliseconds with 0 bytes received" but in that case last time it probably didn't time out. Thanks for taking a look. Redalert2fan (talk) 23:35, 9 February 2019 (UTC)

no wonder I couldn't reproduce it, the bot timed out. AManWithNoPlan (talk) 00:21, 10 February 2019 (UTC)

Could always have a "future date" date where anything 2 days in the future doesn't get added. Headbomb {t · c · p · b} 00:28, 10 February 2019 (UTC)

lots of magazines and journals have future dates. Might be best to put off adding them. AManWithNoPlan (talk) 01:15, 10 February 2019 (UTC)
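The suggested guard is easy to sketch in Python. The function name and the two-day default are taken from the suggestion above as illustration only; as noted in the reply, cover dates in the future are legitimate for many periodicals, so a check like this would hold back some correct dates too.

```python
from datetime import date, timedelta


def is_plausible_date(candidate, today=None, grace_days=2):
    """Return False for dates more than grace_days after today."""
    today = today or date.today()
    return candidate <= today + timedelta(days=grace_days)
```

For example, on 9 February 2019 a proposed date of 3 October 2019 would be held back, while 3 October 2016 would pass.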

Holy crud strtotime('3 October, 2016') gives that date!!!! AManWithNoPlan (talk) 20:11, 11 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1329 AManWithNoPlan (talk) 20:34, 11 February 2019 (UTC)

Capital om

Status: {{fixed}}
Reported by: Nemo 10:35, 10 February 2019 (UTC)

What happens: Danish om (about) gets capitalised
What should happen: Maybe leave it alone, or maybe it doesn't matter?
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Paratypothorax&diff=prev&oldid=882610848
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1330 AManWithNoPlan (talk) 23:59, 11 February 2019 (UTC)

Removes accessdate for citations with chapterurl

Status: {{fixed}}
Reported by: Sounder Bruce 23:01, 10 February 2019 (UTC)

What happens: The bot removes accessdates from citations that use chapterurl instead of the standard url. The parameter can be used as a standalone, especially when citing things like legislative texts (as my example shows). This bug was previously reported in 2015, but was withdrawn.
Relevant diffs/links: Special:Diff/882657806
We can't proceed until: Feedback from maintainers

I wonder when that broke? AManWithNoPlan (talk) 00:43, 11 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1324 AManWithNoPlan (talk) 02:05, 11 February 2019 (UTC)

Adds No Authorship Indicated

Status: {{fixed}}
Reported by: Nemo 12:18, 11 February 2019 (UTC)

What happens: "No Authorship Indicated" is added to last1=
What should happen: Ignore as bad data.
Relevant diffs/links: special:diff/882801301
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/1326 AManWithNoPlan (talk) 17:05, 11 February 2019 (UTC)

Broken links to www3.interscience.wiley.com

I noticed we have some 1000 links to www3.interscience.wiley.com/cgi-bin/ which seem to all give an HTTP 403 error. Do they work for anyone? Should they be removed? Is it a job for a bot? For this bot or some other? Nemo 09:35, 8 February 2019 (UTC)

I just checked them from a computer that has a subscription to a multitude of journals. They do not work and they should be removed. AManWithNoPlan (talk) 15:55, 8 February 2019 (UTC)

{{wontfix}} by this bot. Some other bot should grab them all. Verify they are dead and then remove. AManWithNoPlan (talk) 17:01, 11 February 2019 (UTC)

Don't change page numbers to reflect entire range of article

Status: new bug
Reported by: Umimmak (talk) 21:27, 13 February 2019 (UTC)

What happens: The bot changes the page numbers to reflect the pages on which the entire article can be found, overwriting any preexisting page numbers which direct the reader just to the relevant pages of said article.
What should happen: The citation should continue just displaying pages in the cited source containing the information that supports the article text. to quote Help:Citation Style 1#Pages, or A range of pages in the source that supports the content. to quote Template:Cite journal.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Cartwrightia&oldid=883110969
We can't proceed until: Feedback from maintainers

This was already partly addressed, perhaps a regression? Some more complicated example which may be useful for additional unit testing: [37]. Nemo 22:21, 13 February 2019 (UTC)

This has never been addressed. Addressing it has been discussed and the code is written, but it is not deployed to wikipedia. AManWithNoPlan (talk) 00:08, 14 February 2019 (UTC)

Flag for archiving : {{Duplicate Issue}}

Not a bug! You appear to be referring to the cite/citation |pages= parameter, which is supposed to be a range, as appropriate for the full citation. And not the in-source specifier of where specific material is to be found, which is appropriate for individual (and multiple) short-cites within the article.

I suspect your complaint stems from this edit, which replaced things like "|pages= 64, 66, 70" and "|pages= 396, 422" with "pages= 55–76" and "pages= 381–429". This is an instance of the perennial trying to "reuse a citation" with "named-refs". The problem is that while the "<ref name=" construction can make a note appear in more than one point in the text, it is still just one note applied to multiple, and usually differing, instances. The proper solution is to use short-cites (such as done with the {{harv}} family of templates), which can be individually customized.

The problem here is you don't want to lose the specific page information. Which I think is legitimate. The proper way to preserve that information is put them into short-cites. But that can't be done in the bot, as the correct page number to use at each point in the text is indeterminable. E.g., one of the examples above has three page numbers, and appears in two places. Correct assignment of those page numbers requires comparison of the text with the source at each location. Until someone comes along to do that, I would like to suggest the following: that the incorrect page "range" being replaced be preserved as a comment. Also: we should have a maintenance category for such misplaced in-source specifiers. ♦ J. Johnson (JJ) (talk) 00:23, 14 February 2019 (UTC)

It could easily be a bug, in the (not infrequent) case that the doi goes to a collection of smaller articles and the citation goes to an individual one of those smaller articles. For instance, some journals publish collections of book reviews under a single doi, but each review within that collection has its own smaller page range and its own author. Example:

Colley, Susan Jane (May 2013), "Review of The Manga Guide to Linear Algebra", Book Reviews, The College Mathematics Journal, 44 (3): 244–247, doi:10.4169/college.math.j.44.3.241, JSTOR 10.4169/college.math.j.44.3.241

I would be quite annoyed if I found Citation bot "fixing" these by expanding the page range to the whole book review column given by the metadata for the doi (pp. 241–247 in this example).

Also, putting detailed page information into short-cites only works for citation styles that use both short-cites and long-cites. Because our citation templates are unable to handle it, my usual solution for citing specific material within a longer journal paper is to write it out in untemplated text after the template. —David Eppstein (talk) 00:40, 14 February 2019 (UTC)

dead discussion

should publisher be removed – discussion about the above discussion

{{fixed}} - discussion above archives, so archive our link to it

not directly related discussion

merging subscription needed into cite templates

{{notabug}} looks like they have it all under control.

Again converts good combination of parameters to bad combination

Status: {{fixed}}
Reported by: David Eppstein (talk) 18:04, 10 February 2019 (UTC)

What happens: The bot converts a citation template with |title=/|work= parameters (where the |title= is a conference paper and the |work= is the proceedings title) to |chapter=/|title=/|work= (moving paper title to |chapter= and conference proceedings title to |title= but leaving |work= in place). The original |title=/|work= is not the best coding but is a valid combination of parameters. The changed |chapter=/|title=/|work= is an invalid combination, the citation template complains about it, and in addition it fails to display the chapter.
What should happen: Citation bot should never convert a template with a valid combination of parameters to a template with an invalid combination of parameters.
Relevant diffs/links: Special:Diff/882648141
We can't proceed until: Feedback from maintainers

CS2 sucks. I think I have a solution, I can work on. AManWithNoPlan (talk) 02:29, 11 February 2019 (UTC)

I'm sure nothing would be different if the template also had |mode=cs1. So it's not the style, but the all-in-one template parameterization that you're complaining about. But that has its advantages, too: for instance, that way you don't have quite as much of a problem with people using cite journal for conference papers. —David Eppstein (talk) 03:13, 11 February 2019 (UTC)

Yeah, CS1 encourages people to do wrong, CS2 encourages templates to guess wrong. AManWithNoPlan (talk) 17:02, 11 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1327 (I also already have added code to detect this specific instance (ie. it detects that Proc. === Proceedings)) AManWithNoPlan (talk) 17:57, 11 February 2019 (UTC)

url parameter removed

Status: {{notabug}}
Reported by: Dan Bloch (talk) 19:41, 13 February 2019 (UTC)

What happens: This change removed the "url=http://www.sciencemag.org/content/305/5683/503.full" parameter from a citation. From the comment it isn't clear that this was intentional or why it was done.
We can't proceed until: Feedback from maintainers

Urls that match the DOI are removed. AManWithNoPlan (talk) 21:23, 13 February 2019 (UTC)

url with "&" character in search query (books.google.com)

Status: new bug
Reported by: MarMi wiki (talk) 19:33, 14 February 2019 (UTC)

What happens: ...&q=%22House+&+garden%22+computer+Sutherland+1966&dq=... is trimmed to:
...&q=%22House+&dq=...
What should happen: q= shouldn't be trimmed
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Draft%3AECHO_IV&diff=prev&oldid=883327724
We can't proceed until: Feedback from maintainers

This is probably a Pale Moon browser fault which apparently doesn't encode the url properly. On SeaMonkey "&" is encoded as %26, and entering the full url with unencoded "&" trims it just like the bot did. (It apparently was a temporary browser glitch, because after testing in Pale Moon, url was properly encoded too) Cause found: automatic cite in Visual Editor decodes %26 in q= to "&" (VisualEditor/Feedback). --MarMi wiki (talk) 19:53, 14 February 2019 (UTC)
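The effect is easy to reproduce with any standard query-string parser. The Python sketch below uses a shortened version of the query from the report (the dq value is a placeholder, and the helper name is made up): an unencoded "&" ends the q= value early, while %26 keeps it intact.

```python
from urllib.parse import parse_qs, quote_plus

# Shortened query in the style of the report; "dq=x" is a placeholder value.
raw = "q=%22House+&+garden%22+computer&dq=x"
encoded = "q=" + quote_plus('"House & garden" computer') + "&dq=x"


def search_term(query):
    """Return the decoded value of q=, or an empty string if it is absent."""
    return parse_qs(query).get("q", [""])[0]
```

Here search_term(raw) stops at the stray "&", whereas search_term(encoded) returns the whole phrase, which is consistent with the trimming being a consequence of the unencoded character rather than a choice made by the bot.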

Thank you for following up. {{notabug}} AManWithNoPlan (talk) 20:03, 14 February 2019 (UTC)

I wouldn't mind if everything except id= and pg= were trimmed from Google Books links, but I think others disagree. Presumably, because this is a subject of editor disagreement, it shouldn't be overridden by the bot making a choice on what to trim. —David Eppstein (talk) 23:14, 14 February 2019 (UTC)

journal = Methods in Molecular Biology (Clifton, N.j.)

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 01:32, 9 February 2019 (UTC)

What happens: converts

Flanagan, JamesM. (2015-01-01). "Epigenome-Wide Association Studies (EWAS): Past, Present, and Future". Cancer Epigenetics. Methods in Molecular Biology. Springer New York. pp. 51–63. ISBN 978-1-4939-1803-4. {{cite book}}: |access-date= requires |url= (help); External link in |chapterurl= (help); Unknown parameter |chapterurl= ignored (|chapter-url= suggested) (help); Unknown parameter |editors= ignored (|editor= suggested) (help)

to

Flanagan, JamesM. (2015-01-01). "Epigenome-Wide Association Studies (EWAS): Past, Present, and Future". Cancer Epigenetics. Methods in Molecular Biology. Vol. 1238. Springer New York. pp. 51–63. doi:10.1007/978-1-4939-1804-1_3. ISBN 978-1-4939-1803-4. PMID 25421654. {{cite book}}: |journal= ignored (help); Unknown parameter |editors= ignored (|editor= suggested) (help)

What should happen: *Flanagan, JamesM. (2015-01-01). "Epigenome-Wide Association Studies (EWAS): Past, Present, and Future". Cancer Epigenetics. Methods in Molecular Biology. Vol. 1238. Springer New York. pp. 51–63. doi:10.1007/978-1-4939-1804-1_3. ISBN 978-1-4939-1803-4. PMID 25421654. {{cite book}}: Unknown parameter |editors= ignored (|editor= suggested) (help)
Relevant diffs/links: [38]
We can't proceed until: Feedback from maintainers

Most likely not fixable, will look at meta data AManWithNoPlan (talk) 02:34, 9 February 2019 (UTC)

You could specify an exception for that journal/series. It's really really common, and I need to cleanup about 30-40 conversions from some weird |journal=Methods in Molecular Biology → |series=Methods in Molecular Biology → |journal=Methods in Molecular Biology (Clifton, N.j) + |series=Methods in Molecular Biology → |journal= + |series=Methods in Molecular Biology cycle per dump. Headbomb {t · c · p · b} 02:55, 9 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1332 AManWithNoPlan (talk) 20:07, 12 February 2019 (UTC)

Incorrect Publisher Removed

Status: {{fixed}} won’t do it unless there is an identifier, i.e. only when the publisher is not needed
Reported by: RexxS (talk) 01:16, 10 February 2019 (UTC)

What happens: publisher removed without consensus
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Mouth-to-mouth_resuscitation&diff=882567370&oldid=880586917
We can't proceed until: Feedback from maintainers

That’s an interesting question. What should be done when a decade-old consensus is challenged? Should we stop and wait, or what? I don’t know. AManWithNoPlan (talk) 01:26, 10 February 2019 (UTC)

We know about that discussion. AManWithNoPlan (talk) 01:39, 10 February 2019 (UTC)

Your example is funny since the publisher removed is wrong 😁🤣😲🤯 AManWithNoPlan (talk) 01:40, 10 February 2019 (UTC)

It almost always is, because journals get purchased and repurchased over their history. Sure, the Journal of Foo may be published by the Foo Society today, but in 2 years it might get published by Elsevier. Which means that all instances of Foo Society would need to be changed to Elsevier. This is one of the many reasons why it's completely pointless to include publisher information, against the advice of every style guide out there. Headbomb {t · c · p · b} 04:19, 10 February 2019 (UTC)

@AManWithNoPlan: My ability to assume good faith is stretched pretty much to the limit by the behaviour surrounding CitationBot recently (not yours in particular), but since you asked an apparently sincere question I'll make an effort to answer it in kind.

First of all, a consensus is not a consensus if you can't link to it. CitationBot has no bot authorization for removing these parameters, and there is no community discussion supporting removing them. That means that what you have is not a consensus, but mere absence of challenge. And I didn't challenge it back in 2009 because I had no idea CitationBot existed and never saw it edit: if I did I would have challenged it then. The argument that this behaviour has consensus is thus extremely weak. Lack of objections ("implied consensus") is the very weakest form of consensus to begin with, and lack of objection due to obscurity weakens it yet further. It is sufficient to support that CitationBot's behaviour over that time was in good faith, but not sufficient to lean on when objections became evident.

In addition, the long standing and strong consensus on Wikipedia, exemplified in BRD and CON etc., is that when any consensus (both strong and weak) is challenged, the status quo prevails until a new consensus is reached. But note what status quo means in this context: article content should remain the way it was and changing it is considered edit-warring, pointy, gaming, and generally disruptive behaviour. This is why I say that Smith609, Kaldari, and you are actually at peril of sanctions here! Once such edits are challenged, all edits should cease until consensus is reached! And in this case, not only are the edits challenged, but the first close of the RfC concluded that the consensus was against making these changes. Under these circumstances, the only constructive and collegial and respectful (of consensus, I mean) thing to do is to disable this function (or rule or module or however it's implemented) until the question is resolved. You should have done that the second the RfC was launched and waited for consensus to emerge, but if it didn't become clear to you sooner it certainly should have at the first close. It's always possible that consensus will turn out in your favour (unlikely at this point, yes, but by no means impossible), in which case you can re-enable the function afterwards and now with an actual consensus to back it up.

I'll add that if it is accurate that there's been a significant uptick in removals recently (after the start of the RfC, or, worse, after the first close that indicated consensus was against you) that would actually constitute using automated editing to enforce your preference against consensus and would have to end up at the drama boards. I really really hope that isn't the case, because the project never wins when that happens (at best we just limit the damage).

But that's why I say my ability to assume good faith is stretched to the breaking point where CitationBot is concerned: at every single crossroads its proponents make the choice concomitant with "What can I get away with?" and "How can I furthest advance my preference in spite of those pesky other editors?" and "I know better than those other editors that whine and complain.". I have so far seen not a single instance where the choice indicated any kind of respect for other editors or community consensus processes. It doesn't even matter if the community is wrong, by whatever standard you choose to apply: consensus and cooperation and respect for others' opinions is the fundament of how Wikipedia functions.

So apologies for the wall of text, but I really want CitationBot to succeed, because the state of citations on the project is shockingly bad and in desperate need of improvement. But not at the expense of fundamental pillars of the project. And all this, currently, over optional parameters that do no harm, even when used incorrectly, and are required in relatively few instances; and merely because they offend the sensibilities of a few (that is, the case against is essentially a style issue, much like whether commas or full stops separate datums in citations). Strident advocacy may appear to lead to "success", for CitationBot, in the short term; but in the long term it pretty much only leads to disruption, drama, and more loss of editors that we cannot afford. Please reconsider your (collective) priorities and mode of interaction with the wider community: I would love to be a cheerleader for CitationBot, but absent at least some measure of humility towards the community, that just cannot be. --Xover (talk) 08:32, 10 February 2019 (UTC)

The bot is user-activated. If you don't want the bot to remove publishers because of a misguided belief that this information belongs there, don't use the bot. Or put a comment in the publisher field. Headbomb {t · c · p · b} 08:55, 10 February 2019 (UTC)

Case in point. --Xover (talk) 09:46, 10 February 2019 (UTC)

"Consensus needs a link" is a common fallacy, to the point Wikipedia:Consensus#Achieving consensus disproves it in the first sentence: «Editors usually reach consensus as a natural process [...] Consensus is a normal and usually implicit and invisible process» (cf. 2009).

Personally I wish this feature wasn't there, because I think very few people care about it either way, but I accept that it's been there for a long while for a reason. As for the rest, maybe the more discussions there are the more popular a tool becomes (and vice versa)? Nemo 11:24, 10 February 2019 (UTC)

Just asserting that something is a fallacy does not make it so. That certain forms of consensus can be presumed from "implied consensus" does not mean all consensus must be implied or even that all consensus can be implied. And this is the second time I've had to ask you to refrain from strawman arguments: I even acknowledge implied consensus in the message you presumably read since you're replying to it, and explain why "implied consensus" is not sufficient foundation for mass automated edits against explicit consensus. Even the very policy you cite (selectively) explains that an implied consensus does not hold once challenged: at which point you're supposed to engage in consensus building before editing further. --Xover (talk) 15:49, 10 February 2019 (UTC)

I don’t have a strong opinion; I am here to code. Wow! That’s a lot of explanation! My one opinion is that people should remove publisher and location (which are almost always wrong, sadly) and wikilink to a page about the journal, creating it if needed: a permanent fix that makes Wikipedia better and everyone happy. I just find it funny that pretty much everyone who complains is pointing to journals with incorrect publishers listed or journals so obscure that even that information won’t help much. AManWithNoPlan (talk) 14:06, 10 February 2019 (UTC)

My apologies: I misunderstood the intent of your previous comment. Since it was phrased as a question and accompanied by a direct indication that you lacked knowledge, I took it to mean that you were soliciting answers to the apparent question. In light of your more recent comment I realize that was not the case. I shall bother you no further with either information or attempts to engage in constructive dialogue. --Xover (talk) 15:49, 10 February 2019 (UTC)

It was a real question. The "Wow!" was a real response of being impressed. AManWithNoPlan (talk) 17:30, 10 February 2019 (UTC)

Removing only when there's a unique identifier (https://github.com/ms609/citation-bot/pull/1323) seems a good way to address everyone's concerns. Nemo 10:20, 11 February 2019 (UTC)
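As a rough sketch of the rule described above (remove the publisher only when the citation carries a unique identifier), the following shows the idea. It is not the code from the linked pull request; the function name, the dict representation of a citation, and the list of identifier parameters are all assumptions made for illustration.

```python
# Assumed set of identifier parameters; the actual list used by the bot
# is not stated in this thread.
IDENTIFIER_PARAMS = ("doi", "pmid", "pmc", "jstor", "bibcode")


def drop_redundant_publisher(params: dict) -> dict:
    """Return a copy of params without |publisher= if an identifier exists.

    Citations with no identifier are returned unchanged, since the
    publisher may then be the only way to locate the source.
    """
    has_identifier = any(params.get(key, "").strip() for key in IDENTIFIER_PARAMS)
    if not has_identifier:
        return dict(params)
    return {key: value for key, value in params.items() if key != "publisher"}
```

A citation with only |journal= and |publisher= would keep its publisher; one that also has a non-empty |doi= would lose it.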

I agree that's a workable way forward. Headbomb {t · c · p · b} 00:13, 14 February 2019 (UTC)

Another bad title

Status: {{fixed}}
Reported by: Redalert2fan (talk) 18:59, 13 February 2019 (UTC)

What happens: title= "Account Suspended" is added
What should happen: In this case probably nothing, but it might be a good idea to add that title to the blacklist.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=GI_(band)&diff=prev&oldid=883172964
We can't proceed until: Feedback from maintainers

I have fixed this specific link with IABot and added the correct title myself. Redalert2fan (talk) 18:59, 13 February 2019 (UTC)

Returning HTTP 200 for what's in effect a deleted website is nasty. Is this a temporary state? Nemo 19:22, 13 February 2019 (UTC)

According to this tweet [39] on 22 July 2018 the company decided to stop their activities. So this is a permanent state. Trying any link to any page from japakomusic.com redirects to http://japakomusic.com/cgi-sys/suspendedpage.cgi . Redalert2fan (talk) 20:56, 14 February 2019 (UTC)
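A title blacklist of the kind suggested in this report could look something like the sketch below. The names `BAD_TITLES` and `is_bad_title` are hypothetical, the blacklist holds only the one title reported here, and the URL check is an assumption based on the redirect target mentioned above; the bot's real list and logic are not shown in this thread.

```python
# Hypothetical blacklist; "Account Suspended" is the only title
# reported in this thread.
BAD_TITLES = {"account suspended"}

# Path of the suspension page that the site above redirects to.
SUSPENDED_PATH = "/cgi-sys/suspendedpage.cgi"


def is_bad_title(title: str, final_url: str = "") -> bool:
    """Return True if a fetched page title should not be added.

    final_url is the URL after following redirects, if the caller knows it.
    """
    if title.strip().casefold() in BAD_TITLES:
        return True
    return final_url.rstrip("/").endswith(SUSPENDED_PATH)
```

Checking the final URL as well as the title would catch suspended pages even when the host returns HTTP 200 with a differently worded title.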

Cosmeticbot issue

Status: {{wontfix}} and thankfully rare
Reported by: EdChem (talk) 14:17, 15 February 2019 (UTC)

What happens: In this diff, the bot reports altering the title of a journal citation but (as far as I can see) changes a double space into a single space. Isn't this the kind of cosmetic change that is supposed to be avoided as a stand-alone edit under WP:COSMETICBOT?
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Tert-Butyloxycarbonyl_protecting_group&curid=4914370&diff=883449806&oldid=881434891
We can't proceed until: Feedback from maintainers

Note, also, in an edit the bot made earlier this month, it altered the same citation but without changing the spacing... so I'm not sure why it made the change as a separate edit a couple of weeks later. EdChem (talk) 14:21, 15 February 2019 (UTC)

The Bot makes some mostly cosmetic changes and some very important changes. On rare occasions the changes are all cosmetic. Since this is very rare, we do not track the changes in order to skip the edit when all of them are cosmetic. AManWithNoPlan (talk) 14:50, 15 February 2019 (UTC)

The Bot does white space normalization. There are quite a few white space characters that we convert to spaces, and the last step is combining multiple spaces into one so that the wiki text matches the rendering. AManWithNoPlan (talk) 14:57, 15 February 2019 (UTC)
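The two steps described above (convert unusual white space characters to plain spaces, then combine runs of spaces) can be sketched as follows. This is a minimal illustration, not the bot's code: the function name and the particular set of characters in `EXOTIC_SPACES` are assumptions, since the thread does not list which characters the bot converts.

```python
import re

# Assumed sample of white space characters: no-break space, thin space,
# hair space, narrow no-break space, ideographic space.
EXOTIC_SPACES = "\u00a0\u2009\u200a\u202f\u3000"

_TO_PLAIN_SPACE = str.maketrans({ch: " " for ch in EXOTIC_SPACES})


def normalize_spaces(text: str) -> str:
    """Convert exotic spaces to regular spaces, then merge runs of spaces."""
    text = text.translate(_TO_PLAIN_SPACE)
    return re.sub(r" {2,}", " ", text)
```

Only space characters are merged here; newlines and tabs are left alone so that the layout of the template in the wiki text is not disturbed.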

Agreed that this should be avoided on its own when it's just regular spacing if possible, but at the same time, the coding complexity for it might be too much. Normalizing other spacing (like converting invisible non-breaking spaces to regular spaces) has enough advantages to do it on its own though. Headbomb {t · c · p · b} 17:30, 15 February 2019 (UTC)
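For what it is worth, one simple way to detect an edit that changes nothing but spacing is to compare the two versions after collapsing all white space. The sketch below is only an illustration of that idea under the assumption that the old and new wiki text are both available as strings; `is_whitespace_only_change` is a hypothetical name, and nothing here reflects how the bot is actually structured.

```python
import re


def _collapse(text: str) -> str:
    """Reduce every run of white space to a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def is_whitespace_only_change(old: str, new: str) -> bool:
    """Return True if old and new differ, but only in their white space."""
    return old != new and _collapse(old) == _collapse(new)
```

Note that Python's `\s` also matches the no-break space in `str` patterns, so this check would treat the conversion of invisible non-breaking spaces as cosmetic too; a real check would need to exempt those if they are to be normalized on their own.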

It is only 99 44⁄100% cosmetic since it improves the editor's view of the page by making the editable text more in line with what is displayed. Humor intended. AManWithNoPlan (talk) 19:40, 15 February 2019 (UTC)

Bot does not detect bad wiki code

Status: {{fixed}}
Reported by: Nemo 23:02, 16 February 2019 (UTC)

What happens: Special:Diff/883684491
We can't proceed until: Feedback from maintainers

Really hard to see in that diff, but I think this will do it. At the very least, it will crank down the greediness. https://github.com/ms609/citation-bot/pull/1343 AManWithNoPlan (talk)

I do not know if that is fixable. <ref>{{cite book|last=Berridge|first=Vanessa|title=The Princess's Garden: Royal Intrigue and the Untold Story of Kew|year=2015|publisher=Amberley Publishing Limited|url=https://books.google.com/books?id=NhpzCgAAQBAJ&pg=PT21|page=21]</ref> Note that the template does not END! AManWithNoPlan (talk) 23:26, 16 February 2019 (UTC)

Fixed on page. https://en.wikipedia.org/w/index.php?title=Frederick_the_Great&type=revision&diff=883687734&oldid=883684916 AManWithNoPlan (talk) 23:27, 16 February 2019 (UTC)

Ah, right. In some of those regular expressions we could add the newline to the excluded characters (I would hope no DOI includes a newline!) but a broken template call is a broken template call... Nemo 23:32, 16 February 2019 (UTC)
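A broken template call like the one quoted above can at least be detected before any parameters are touched, by checking that the double braces balance. The sketch below is a hedged illustration; `has_unbalanced_templates` is a hypothetical helper and not part of the bot.

```python
import re


def has_unbalanced_templates(wikitext: str) -> bool:
    """Return True if "{{" and "}}" do not pair up in wikitext."""
    depth = 0
    for token in re.findall(r"\{\{|\}\}", wikitext):
        if token == "{{":
            depth += 1
        else:
            depth -= 1
            if depth < 0:
                return True  # a close with no matching open
    return depth != 0  # an open that is never closed


# The reference quoted in this thread: the template ends with "]" and never closes.
broken = ("<ref>{{cite book|last=Berridge|first=Vanessa|title=The Princess's Garden: "
          "Royal Intrigue and the Untold Story of Kew|year=2015|publisher=Amberley "
          "Publishing Limited|url=https://books.google.com/books?id=NhpzCgAAQBAJ&pg=PT21"
          "|page=21]</ref>")
```

Running the check on the text between `<ref>` and `</ref>` would flag this citation, so it could be skipped rather than letting a greedy pattern run on into the following text.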