User talk:Citation bot/Archive 4
This is an archive of past discussions with User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Request for examples and test cases
I'm continuing work on author-handling. Having good examples to work from will help me handle tricky and special cases. If you have citations that have been problematic in the past or which you think would make good test cases, please either drop a link to the diff + line number here or copy the to-be-fixed citation to the sandbox I've been using on testwiki: User:Fhocutt (WMF)/Sandbox. Thank you all for the input and suggestions so far, and any resources you can offer here. --Fhocutt (WMF) (talk) 21:42, 15 September 2015 (UTC)
- See above for sample bug reports related to author names:
- author= converted to authors= and author=
- Bot unnecessarily adding last2, last3, last4, ... parameters
- Bot added |first1= when |first= was already present
- Bot found |first9=LH et al. and added |author10=and others and |displayauthors=9
- Bot used "author4=and others" in place of real author #4 on a 7-author reference
- duplicated last name
- Butchered author names
- Deprecated parameter |author-separator= added
- |display-authors=9 no longer necessary for exactly nine authors
- Bot creates CS1 errors when attempting to parse authors= parameter containing many names
- Remove display-authors=etal when inserting all the remaining authors
- Let us know if you need additional feedback and testing. – Jonesey95 (talk) 18:36, 16 September 2015 (UTC)
Thank you! I've added the examples above to my testwiki sandbox.
Please test the tool now. It should not modify authors when author name-related parameters exist, including the new vauthors. However, it should fetch and expand author data when available if there are no existing parameters. You can help by reporting bugs here or at https://phabricator.wikimedia.org/T111891.
Known issues:
- Will still modify editors, regardless of whether editor name parameters are present. Does this need to be fixed for the tool/bot to be used?
It should convert curved quotes to "'" in fetched author data, but I don't have any references to serve as a test case for this. If you do, please leave them here or in my testwiki sandbox. --Fhocutt (WMF) (talk) 01:03, 18 September 2015 (UTC)
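The two rules described in this post (leave authors alone when any author-name parameter is already filled, otherwise fill them from fetched data with curved quotes straightened) can be sketched as below. This is a hypothetical illustration, not the bot's code: the function names, the `[last, first]` input shape and the exact list of parameter names treated as "author data" are all assumptions. Editors are deliberately not covered, matching the known issue above.

```php
<?php
// Hypothetical sketch, not the bot's code.
// Assumption: these parameter names (optionally numbered) count as existing author data.
const AUTHOR_PARAM = '/^(author|authors|vauthors|coauthors?|last|first|surname|given)\d*$/i';

function has_author_params(array $params): bool {
    foreach ($params as $name => $value) {
        // Empty parameters do not count as existing data.
        if (trim((string) $value) !== '' && preg_match(AUTHOR_PARAM, $name)) {
            return true;
        }
    }
    return false;
}

// Replace curved single quotes (U+2018, U+2019) with the ASCII apostrophe.
function straighten_quotes(string $name): string {
    return str_replace(["\u{2018}", "\u{2019}"], "'", $name);
}

// $fetched is a list of [last, first] pairs from an external source (assumed shape).
function maybe_add_authors(array $params, array $fetched): array {
    if (has_author_params($params)) {
        return $params; // never touch names a human already entered
    }
    foreach ($fetched as $i => [$last, $first]) {
        $n = $i + 1;
        $params["last{$n}"] = straighten_quotes($last);
        $params["first{$n}"] = straighten_quotes($first);
    }
    return $params;
}
```

Checking for existing parameters before fetching keeps the rule simple: the bot either writes a complete author list or none at all, which avoids the half-filled `|last2=` cases reported above.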
This is a good candidate for starting to add automated tests to the bot's codebase. You can help by commenting here or on the Phabricator task with examples of citations with strange formatting and edge cases--spaces in strange places, multiline parameters or values, and similar. The idea here is to have a better way to make sure that the bot continues to parse template parameters and values correctly, even when changes are made to the code. Your help is appreciated. --Fhocutt (WMF) (talk) 03:41, 3 October 2015 (UTC)
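Automated tests of this kind need a parameter splitter to exercise. The following is a minimal, hypothetical sketch (not the bot's parser; the function name is an assumption) that splits the text between `{{` and `}}` on top-level pipes, ignores pipes inside nested `{{...}}` and `[[...]]`, and trims stray spaces and line breaks. HTML comments are intentionally not handled, which is exactly the kind of edge case such tests should catch.

```php
<?php
// Hypothetical sketch: returns [template name, name => value map].
// Pipes inside nested {{...}} or [[...]] are not treated as separators.
// HTML comments are NOT handled here.
function split_template_params(string $inner): array {
    $parts = [];
    $current = '';
    $depth = 0;
    $len = strlen($inner);
    for ($i = 0; $i < $len; $i++) {
        $pair = substr($inner, $i, 2);
        if ($pair === '{{' || $pair === '[[') {
            $depth++;
            $current .= $pair;
            $i++; // skip the second bracket
        } elseif (($pair === '}}' || $pair === ']]') && $depth > 0) {
            $depth--;
            $current .= $pair;
            $i++;
        } elseif ($inner[$i] === '|' && $depth === 0) {
            $parts[] = $current; // top-level separator
            $current = '';
        } else {
            $current .= $inner[$i];
        }
    }
    $parts[] = $current;

    $params = [];
    foreach (array_slice($parts, 1) as $part) { // $parts[0] is the template name
        $eq = strpos($part, '=');
        if ($eq === false) {
            continue; // positional parameter: ignored in this sketch
        }
        // trim() strips spaces, tabs and newlines around names and values
        $params[trim(substr($part, 0, $eq))] = trim(substr($part, $eq + 1));
    }
    return [trim($parts[0]), $params];
}
```

Only the first `=` splits name from value, so URLs with query strings survive intact.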
- You can start with some of the bug reports on this page:
- Bot should add more than four editors
- Issue & Number
- Comments cause trouble
- Bot 579 added doi-broken-date when doi-inactive-date was already present
- |display-authors=9 no longer necessary for exactly nine authors
- Removes accessdate for no-URL citations inside of nowiki tags
- Edit of talk page
- Hyphens to dashes problem
- Duplicating jstor
- Citation unrendered because of syntax error
- Have fun! – Jonesey95 (talk) 13:37, 3 October 2015 (UTC)
- Thanks! Most of those weren't touched by this part of the code, but I added a couple of them as examples to make sure the part I was modifying didn't change them. On the strange duplicate parameter issue when comments are present, the current version of the bot doesn't do that, at least on testwiki: https://test.wikipedia.org/w/index.php?title=User%3AFhocutt_%28WMF%29%2FCitation_bot_test&type=revision&diff=243602&oldid=243601 . --Fhocutt (WMF) (talk) 22:57, 9 October 2015 (UTC)
{{notabug}}
Doesn't expand cite journal from pmid
- Status
- Not a bug
- Reported by
- RoadTrain (talk) 17:06, 31 May 2016 (UTC)
- Relevant diffs/links
- CYP4F12
- Replication instructions
- Click 'Citations' and the bot won't expand a cite journal that has only a pmid. It works on other pages I have used it on.
- We can't proceed until
- Agreement on the best solution
That's probably because these refs were inside the {{PBB_Summary}} template. Another user has already filled them in.--RoadTrain (talk) 22:02, 31 May 2016 (UTC)
- Strike the probably, it's definitely because of that. Next time you can simply remove the opening and closing {{}} of the template, let the bot run, and put them back before saving (I tried exactly that and it works). Imo you can mark the issue as fixed: teaching the bot how to work inside templates would probably be considered a new feature, and they don't accept new feature requests. But it's your choice, of course. Ihaveacatonmydesk (talk) 22:16, 31 May 2016 (UTC)
- This is a duplicate of the comments bug. This has nothing to do with {{PBB_Summary}}. Flagging as {{notabug}} so that the bot will archive this duplicate bug AManWithNoPlan (talk) 15:13, 21 July 2016 (UTC)
Bot unnecessarily adding last2, last3, last4, ... parameters
- Status
- unresolved ongoing bug
- Reported by
- Boghog (talk) 19:48, 25 October 2014 (UTC)
- Type of bug
- Deleterious
- What happens
- When the full author list is stored in |author=, the bot adds |last2=, |last3=, |last4=, ... without the corresponding |first2=, |first3=, |first4=, ...
- What should happen
- If |author= contains the full author list, then the bot should not add |last2=, |last3=, |last4=, ... parameters
- Relevant diffs/links
- diff, diff, diff, diff, diff, diff, ...
- We can't proceed until
- Bot operator's feedback on what is feasible
- Requested action from maintainer
- If |author= contains a complete author list, do not unnecessarily add |last2=, |last3=, |last4=, ...
This is essentially the same bug that was previously reported here, but it is still occurring. Boghog (talk) 19:48, 25 October 2014 (UTC)
I believe that the bug described here is a duplicate of one described above. I have found that e-mailing the bot's maintainer is more effective than posting here at eliciting a response to requests perceived as urgent. In the meantime, the undo link is always available to you, and there are instructions for blocking the bot from specific articles displayed on the bot's user page. – Jonesey95 (talk) 00:01, 28 October 2014 (UTC)
Workaround based on {{vcite2 journal}}
As a follow-up to the above discussion, a new {{vcite2 journal}} template with an optional
Handling multiple authors
@Boghog, Materialscientist, and Ryan Kaldari (WMF): I've been looking into the way the bot handles and expands multiple authors. The main issues seem to come from an odd choice to reassign several parameters (including authors and coauthor(s)) to author2, which I have temporarily fixed. There are also some hiccups when expanding "et al."--for some formattings of author lists, the list of names is not recognized as a list, so it thinks the list is a single author and fetches the rest of the author names because it looks like there are missing parameters. My questions:
Flagging as {{notabug}}, since it seems to be resolved now and is no longer doing this. AManWithNoPlan (talk) 15:42, 9 August 2016 (UTC)
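The list-recognition problem described in the "Handling multiple authors" thread (a whole author list being read as one name, so the bot thinks authors are missing) can be illustrated with a small heuristic. This is a hypothetical sketch, not the bot's logic: the function name and the rules are assumptions. A trailing "et al." is stripped and reported; `;` separates names when present; otherwise a comma-separated value counts as a list only if every piece looks like a Vancouver-style "Surname INITIALS" entry, so "Smith, John" stays a single author.

```php
<?php
// Hypothetical heuristic, not the bot's code.
function split_author_list(string $value): array {
    $etal = false;
    $value = trim($value);
    // "et al." must follow a separator or whitespace, so "Bennet" is not clipped.
    if (preg_match('/(?:[,;]\s*|\s+)et\.? al\.?$/i', $value, $m)) {
        $etal = true;
        $value = trim(substr($value, 0, -strlen($m[0])));
    }
    if (strpos($value, ';') !== false) {
        $names = array_map('trim', explode(';', $value));
    } else {
        $pieces = array_map('trim', explode(',', $value));
        $vancouver = count($pieces) > 1;
        foreach ($pieces as $piece) {
            // "Surname INITIALS", e.g. "Jones AB"
            if (!preg_match('/^\S.*\s[A-Z]{1,3}$/u', $piece)) {
                $vancouver = false;
                break;
            }
        }
        $names = $vancouver ? $pieces : [$value];
    }
    return ['names' => array_values(array_filter($names, 'strlen')), 'etal' => $etal];
}
```

When `etal` is true, a caller would know the list is intentionally incomplete rather than treating it as a single name with missing co-authors.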
Where the heck is the current source code?
The main page lists two repositories, and a Google search finds others from other people for unknown reasons (we can call those suppositories instead of repositories). Both repositories seem to have been updated in the last year. AManWithNoPlan (talk) 18:23, 12 August 2016 (UTC)
- The source code is at https://github.com/ms609/citation-bot. Kaldari (talk) 00:31, 13 August 2016 (UTC)
{{notabug}}
Bot should add more than four editors and add displayeditors=29 if there are exactly 4 editors
- Status
- new bug / feature request (two related features in one request)
- Reported by
- – Jonesey95 (talk) 23:49, 21 September 2013 (UTC)
- Type of bug
- Improvement
- What happens
- Bot limits editors to four first names and four last names.
- What should happen
- Bot should retrieve all editors and add "displayeditors=29" parameter if there are exactly four editors.
- Replication instructions
- Run the citation expander on a citation that has four editors listed but more than four editors in the original work.
- We can't proceed until
- Bot operator's feedback on what is feasible
- Requested action from maintainer
- Remove the four-editor limit from the bot code and add "displayeditors=29" to citations with exactly four editors.
The bot should add "displayeditors=29" if there are exactly four editors to avoid the Lua error described for exactly 9 authors above. – Jonesey95 (talk) 23:49, 21 September 2013 (UTC)
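The requested behaviour (no four-editor cap, plus "displayeditors=29" when there are exactly four) can be sketched as follows. This is a hypothetical illustration: the function name, the `[last, first]` input shape and the choice of `editorN-last`/`editorN-first` parameter names are assumptions; the value 29 is taken from the request.

```php
<?php
// Hypothetical sketch, not the bot's code.
// $editors is a list of [last, first] pairs, in order.
function editor_params(array $editors): array {
    $params = [];
    foreach ($editors as $i => [$last, $first]) {
        $n = $i + 1;
        $params["editor{$n}-last"] = $last;
        $params["editor{$n}-first"] = $first;
    }
    if (count($editors) === 4) {
        // Value taken from the feature request above.
        $params['displayeditors'] = '29';
    }
    return $params;
}
```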
Is this still a bug? AManWithNoPlan (talk) 20:43, 6 August 2016 (UTC)
- It is fixed, but in DOItools.php, after this line:
"editor4", "editor4-author", "editor4-first", "editor4-link",
add these lines
"editor5", "editor5-author", "editor5-first", "editor5-link",
"editor6", "editor6-author", "editor6-first", "editor6-link",
"editor7", "editor7-author", "editor7-first", "editor7-link",
"editor8", "editor8-author", "editor8-first", "editor8-link",
"editor9", "editor9-author", "editor9-first", "editor9-link",
"editor10", "editor10-author", "editor10-first", "editor10-link",
"editor11", "editor11-author", "editor11-first", "editor11-link",
"editor12", "editor12-author", "editor12-first", "editor12-link",
"editor13", "editor13-author", "editor13-first", "editor13-link",
"editor14", "editor14-author", "editor14-first", "editor14-link",
and so on AManWithNoPlan (talk) 03:54, 7 August 2016 (UTC)
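Rather than extending the list by hand ("and so on"), the names could be generated in a loop. A minimal sketch, assuming the surrounding code in DOItools.php accepts a flat array of parameter names (for example via `array_merge()`); the function name and the upper bound are illustrative assumptions.

```php
<?php
// Hypothetical helper: editor1 ... editor$max with the same suffixes as the list above.
function editor_parameter_names(int $max): array {
    $names = [];
    for ($n = 1; $n <= $max; $n++) {
        foreach (['', '-author', '-first', '-link'] as $suffix) {
            $names[] = "editor{$n}{$suffix}";
        }
    }
    return $names;
}
```

Raising the limit then means changing one number instead of adding four strings per editor.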
{{resolved}} It has been fixed for a long time. AManWithNoPlan (talk) 15:36, 12 October 2016 (UTC)
unnecessary addition of |DUPLICATE_page= parameter
- Status
- new bug
- Reported by
- Trappist the monk (talk) 12:11, 26 June 2016 (UTC)
- Type of bug
- Inconvenience: Humans must clean up after the bot
- What happens
- |DUPLICATE_page= causes Module:Citation/CS1 to display a redundant error message
- What should happen
- When a template has both |page= and |pages=, the bot should do nothing
- Relevant diffs/links
- Dinophysis norvegica
- We can't proceed until
- Agreement on the best solution
Without
with
—Trappist the monk (talk) 12:11, 26 June 2016 (UTC)
- Re "arbitrarily...": The bot always tags the first duplicated parameter, which is the one that is not displayed in the citation. That preserves the rendered citation while adding the error message. There is no way that the bot could choose the "right" parameter to mark as a duplicate. – Jonesey95 (talk) 14:27, 30 June 2016 (UTC)
- I disagree that this is necessarily a problem. These duplicate parameters can be hard to find in long articles, and the bot's conversion of one of the parameters to "DUPLICATE_parameter" makes the error stand out and helps editors who would otherwise not notice the errors find them and fix them. If there are any developers reading this page, I would like them to work on other bug fixes and feature requests before tackling this one. – Jonesey95 (talk) 14:43, 28 June 2016 (UTC)
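The tagging rule described in this thread (the repeated parameter that is not displayed gets renamed, so the rendered citation is unchanged) can be sketched as below. MediaWiki uses the last value when a template argument is repeated, so this hypothetical sketch renames every occurrence except the last. Only identical names count as duplicates, so `page` and `pages` are left alone. The function name and the `[name, value]` pair input are assumptions.

```php
<?php
// Hypothetical sketch: $pairs is an ordered list of [name, value] pairs.
function mark_duplicates(array $pairs): array {
    // Remember where each name last occurs; that occurrence is the rendered one.
    $last_index = [];
    foreach ($pairs as $i => [$name, $value]) {
        $last_index[$name] = $i;
    }
    $out = [];
    foreach ($pairs as $i => [$name, $value]) {
        $out[] = [$last_index[$name] === $i ? $name : "DUPLICATE_{$name}", $value];
    }
    return $out;
}
```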
{{notabug}} AManWithNoPlan (talk) 15:31, 12 October 2016 (UTC)
Citation_bot puts '# # # comment placeholder # # #'
- Status
- new bug
- Reported by
- Wikid77 (talk) 22:20, 2 August 2016 (UTC)
- Type of bug
- Inconvenience
- What happens
- Citation_bot inserts text "|# # # citation bot : comment placeholder 0 # # #journal =" (as text generated inside the {cite_journal} parameters).
- Relevant diffs/links
- Diff from 07:17, 10 July 2016, in page "State Shinto": https://en.wikipedia.org/w/index.php?title=State_Shinto&diff=729148482&oldid=726231213
- Replication instructions
- (unsure)
- We can't proceed until
- Agreement on the best solution
Citation_bot is duplicating the parameter "journal=" in a {cite journal} that contains the HTML comment "<!--xxx-->", inserting the text "|# # # citation bot : comment placeholder 0 # # #journal =" inside the {cite_journal} parameters. This bug was reported six months earlier, on 5 February 2016, when the bot botched the same page; see: dif5594. -Wikid77 (talk) 22:20, revised 22:36, 2 August 2016 (UTC)
- Strange. This bug has come back. AManWithNoPlan (talk) 03:58, 3 August 2016 (UTC)
- Is this not a special case of #Comments cause trouble. --Izno (talk) 13:10, 3 August 2016 (UTC)
- I think this is a different bug, previously reported, unresolved, and archived for some reason. – Jonesey95 (talk) 13:25, 3 August 2016 (UTC)
- It is clearly a related bug. Obviously there is some code that tries to avoid the comment bug by encoding comments like this and then fails to decode them. AManWithNoPlan (talk) 15:10, 3 August 2016 (UTC)
This is all coming from this code AManWithNoPlan (talk) 19:56, 7 August 2016 (UTC):
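class Comment extends Item {
  // Text substituted for each <!-- ... --> while the page is processed; %s is the comment's index.
  const placeholder_text = '# # # Citation bot : comment placeholder %s # # #';
  // Note: ".*" is greedy, so with the "s" modifier one match can run from the first "<!--" to the last "-->".
  const regexp = '~<!--.*-->~us';
  const treat_identical_separately = FALSE;
  // Keep the comment verbatim so it can be restored unchanged.
  public function parse_text($text) {
    $this->rawtext = $text;
  }
  public function parsed_text() {
    return $this->rawtext;
  }
}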
Note that the case of the placeholder text above ("Citation bot") does not match the text the bot left behind in this bug ("citation bot"). The code that fails is in objects.php AManWithNoPlan (talk) 20:13, 7 August 2016 (UTC):
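protected function replace_object ($objects) {
  $i = count($objects);
  if ($objects) {
    // Walk the objects from last to first; --$i yields each object's placeholder index.
    foreach (array_reverse($objects) as $obj) {
      // str_replace() is case-sensitive: a placeholder whose case was changed is not found.
      $this->text = str_replace(sprintf($obj::placeholder_text, --$i), $obj->parsed_text(), $this->text);
    }
  }
}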
Note that str_replace() is case-sensitive. In situations like this bug, where the case of the text was changed (by Title Case or similar), the placeholder is no longer found and the replacement fails. The solution is:
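protected function replace_object ($objects) {
  $i = count($objects);
  if ($objects) {
    // Walk the objects from last to first; --$i yields each object's placeholder index.
    foreach (array_reverse($objects) as $obj) {
      // str_ireplace() matches the placeholder regardless of case.
      $this->text = str_ireplace(sprintf($obj::placeholder_text, --$i), $obj->parsed_text(), $this->text);
    }
  }
}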
Also, in public function write() in objects.php, after this code:
if ($my_page->lastrevid != $this->lastrevid) {
echo "\n ! Possible edit conflict detected. Aborting.";
return FALSE;
}
add this code
// stripos() can return 0 for a match at the start, so compare strictly against false.
if (stripos($this->text, "Citation bot : comment placeholder") !== false) {
  echo "\n ! Comment placeholder left escaped. Aborting.";
  return FALSE;
}
This will make sure that a page with a leftover placeholder is never saved again. Of course, the bot will then fail to work on such pages, so the real solution is to make sure that every escaped comment is un-escaped. AManWithNoPlan (talk) 03:45, 7 August 2016 (UTC)
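The leftover-placeholder guard is easy to get subtly wrong in PHP, because `stripos()` returns the match position, which can be 0, and `0 != false` evaluates to false. A minimal sketch of the check as a standalone function (the name is hypothetical) using the strict comparison. In this particular check the searched string is normally preceded by "# # # ", so a match at offset 0 is unlikely, but the strict form is the safe idiom.

```php
<?php
// Hypothetical helper: true if any comment placeholder survived, in any letter case.
function has_leftover_placeholder(string $text): bool {
    return stripos($text, 'Citation bot : comment placeholder') !== false;
}
```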
{{Resolved}} AManWithNoPlan (talk) 14:35, 12 October 2016 (UTC)
Google dates are not in a standard format
- Status
- Feature Request
- Reported by
- Keith D (talk) 00:14, 19 August 2016 (UTC)
- Type of bug
- Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
- What happens
- Added invalidly formatted date to cite
- Relevant diffs/links
- https://en.wikipedia.org/w/index.php?title=Umrah&diff=735160730&oldid=733744115
- We can't proceed until
- Agreement on the best solution
- Requested action from maintainer
- Change the bot so that it does not add date entries that trigger an error condition. It should use an en dash, not a hyphen, to join date parts such as 2015–16. In this case, however, the value is not a pair of consecutive years and should be translated to November 2009.
Google has the date as "2009.11". The bot changes dots to dashes, which is an improvement over what Google gives it. This is a minor variant of the "Google Books data is rubbish" bug. AManWithNoPlan (talk) 03:14, 19 August 2016 (UTC)
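The conversion requested in this report ("2009.11" becomes "November 2009") can be sketched as follows. A hypothetical helper, not the bot's code: it only handles the exact "YYYY.MM" shape and returns anything else unchanged, so it never writes a value that would trigger a new date error.

```php
<?php
// Hypothetical helper: "YYYY.MM" -> "Month YYYY"; other input is returned as-is.
function google_date_to_cs1(string $raw): string {
    $months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December'];
    if (preg_match('/^(\d{4})\.(\d{1,2})$/', trim($raw), $m)) {
        $month = (int) $m[2];
        if ($month >= 1 && $month <= 12) {
            return $months[$month - 1] . ' ' . $m[1];
        }
    }
    return $raw;
}
```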
{{resolved}} It seems to do the right thing now. AManWithNoPlan (talk) 15:33, 12 October 2016 (UTC)
Stop Citation_bot adding DUPLICATE_xxx
This bot should be STOPPED until it can be fixed, as it still adds unneeded "DUPLICATE_title" (etc.) even though there is the "Category:Pages using duplicate arguments in template calls" (in cites, infoboxes), and still treats lone parameters as if duplicate when cite contains an HTML comment "<!-- -->" with no duplicate keywords. Meanwhile, the flooding of cite categories hides other pages with real overlooked cite errors, such as vandalism to cite parameters, tracked in category:
- "Category:Pages with citations using unsupported parameters" (where "DUPLICATE_xx" appear)
Because of the flooding of that unsupported-parameter category by Citation_bot, it took 5 days to fix a vandalized cite page (among 120 listed), which could encourage vandals to hack more pages which can remain botched for 5 days. A flooded category often can prolong errors for months/years in semi-major pages (re: "The Band Perry" listed down under "T"), because cite errors are mainly fixed by wp:wikignomes clearing all pages from a cite-error category, where typical editors almost never fix 90% of red-error cite problems. Stop Citation_bot. -Wikid77 (talk) 14:58, 30 September 2016 (UTC)
- The fixed code is on GitHub (I know, since I wrote the fix). Someone with the necessary access needs to deploy it to Wikipedia. AManWithNoPlan (talk) 15:05, 30 September 2016 (UTC)
- Wikid77, there is no "flooding." 100 articles is not flooding the "unsupported parameter" category, which currently contains just 18 pages. I fix articles in that category most days, as do other editors.
- The articles with duplicate parameters have errors. They are just being moved from one error category (that is widely ignored, and to which your vandalism example applies much more accurately) to another error category that is regularly cleaned out, and flagging the errors in red makes them easier to find and fix by searching quickly for the string "duplicate". Also, there are only about 1,800 articles left in the duplicate parameter category, so even if they were all moved into the unsupported parameter category, it wouldn't take that long to fix them. – Jonesey95 (talk) 15:47, 30 September 2016 (UTC)
Thanks, Jonesey95, for helping to fix those hundreds of pages in the unsupported category. Because it then contained only 19 pages, I was able to fix the numerous recent hack edits to popular U.S. TV star "Estelle Getty" within 3 hours, after User:Citation_bot had recently linked over 250 pages into that category:
- "Category:Pages with citations using unsupported parameters" (where "DUPLICATE_xx" appear)
For many editors, fixing those hundreds of pages for parameters "DUPLICATE_xxx" is very tedious because the linked url+titles or dates or publisher must be verified by downloading source pages or PDF documents or googling printed books and scanning for title/date markings to ensure the duplicate is not the original, or in some cases both dates or titles must be fixed, unlike a simple parameter spelling error, such as "tittle=" as "title=" or "frist2=" as "first2=" etc. Hence, the generated cite errors for DUPLICATE_xx are often much harder to fix (and users have complained), plus Citation_bot leaves other duplicate parameters in the same pages and does not solve all the duplication problems, just obscures the unsupported-parameters category by 6x as many pages with complex errors often 10-times harder to fix, as effectively flooding the category by a 60x-heavier workload (when fixed properly). Meanwhile, after fixing several hundred duplicate parameters, I have found almost no vandalism (or other parameter errors) in pages with duplicates, but 1-in-10 misspelled, unsupported parameters seem to be caused by severe hack edits affecting other sections of a page. The largest amount of hacked cites are in unsupported parameters, not in duplicate parameters often caused by a 2nd date in ISO format, a 2nd (sub)title, an alternate URL, a 2nd publisher agency, or a nearby valid author/date also called "title". Citation_bot is obscuring simple fixes by escalating complex duplication issues into the wrong, smaller category. -Wikid77 (talk) 07:25, 4 October 2016 (UTC)
- Your rationale makes no sense. The error exists regardless; either we can surface it easily for editors, or not. I agree with Jonesey in this regard. --Izno (talk) 11:10, 4 October 2016 (UTC)
- You are partially incorrect. Until the new source code on GitHub is deployed to the Wikipedia servers, the bot will continue to wrongly add DUPLICATE. Those are the real problem. AManWithNoPlan (talk) 13:36, 4 October 2016 (UTC)
- @AManWithNoPlan: Jonesey and I are saying it's a feature and not a bug. --Izno (talk) 13:42, 4 October 2016 (UTC)
- I agree that it is a great feature, but sometimes, because of comments, it adds DUPLICATE when there is no duplicate. AManWithNoPlan (talk) 14:10, 4 October 2016 (UTC)
{{notabug}} AManWithNoPlan (talk) 14:35, 12 October 2016 (UTC)