Module talk:Citation/CS1/Archive 3
This is an archive of past discussions about Module:Citation. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Issues in cite conference and corresponding citation
The {{citation}} template
- {{citation | last1 = Alon | first1 = N. | author1-link = Noga Alon | last2 = Caro | first2 = Y. | contribution = On the number of subgraphs of prescribed type of planar graphs with a given number of vertices | editor1-last = Rosenfeld | editor1-first = M. | editor2-last = Zaks | editor2-first = J. | isbn = 978-0-444-86571-7 | mr = 0791009 | pages = 25–36 | publisher = Elsevier | series = Annals of Discrete Mathematics 20, North-Holland Mathematical Studies 87 | title = Convexity and Graph Theory: proceedings of the Conference on Convexity and Graph Theory, Israel, March 1981 | year = 1984}}
produces
- Alon, N.; Caro, Y. (1984), "On the number of subgraphs of prescribed type of planar graphs with a given number of vertices", in Rosenfeld, M.; Zaks, J. (eds.), Convexity and Graph Theory: proceedings of the Conference on Convexity and Graph Theory, Israel, March 1981, Annals of Discrete Mathematics 20, North-Holland Mathematical Studies 87, Elsevier, pp. 25–36, ISBN 978-0-444-86571-7, MR 0791009
When I try {{citation/lua}} instead, as
- {{citation/lua | last1 = Alon | first1 = N. | author1-link = Noga Alon | last2 = Caro | first2 = Y. | contribution = On the number of subgraphs of prescribed type of planar graphs with a given number of vertices | editor1-last = Rosenfeld | editor1-first = M. | editor2-last = Zaks | editor2-first = J. | isbn = 978-0-444-86571-7 | mr = 0791009 | pages = 25–36 | publisher = Elsevier | series = Annals of Discrete Mathematics 20, North-Holland Mathematical Studies 87 | title = Convexity and Graph Theory: proceedings of the Conference on Convexity and Graph Theory, Israel, March 1981 | year = 1984}}
I see two severe problems: the |contribution= (the title of the paper) is entirely missing, and the semicolon separating the editor names is missing. As more minor issues, although you have fixed many of the dots already (thanks!), there is still one between the page numbers and ISBN, and the commas after the publisher and between the ISBN and MR are missing. By the way, the {{cite conference}} version of this citation,
- Alon, N.; Caro, Y. (1984). "On the number of subgraphs of prescribed type of planar graphs with a given number of vertices". In Rosenfeld, M.; Zaks, J. (eds.). Convexity and Graph Theory: proceedings of the Conference on Convexity and Graph Theory, Israel, March 1981. Annals of Discrete Mathematics 20, North-Holland Mathematical Studies 87. Elsevier. pp. 25–36. ISBN 978-0-444-86571-7. MR 0791009.
has long had a problem where the period after the editor initial is doubled. There doesn't seem to currently be a {{cite conference/lua}} template, but when that one gets done I hope that it can fix this old bug. —David Eppstein (talk) 00:36, 21 February 2013 (UTC)
- Lua can quickly avoid double-dots: I plan to fix the double ".." problems. The Lua language has a quick substring function as "string.sub(name,-1,-1)" to extract the final character from a string, check against dot "." and skip adding another dot. This will also be available to fix the original cite templates. -Wikid77 00:48, 21 February 2013 (UTC)
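The check described in the reply above can be sketched in a few lines of Lua. This is only an illustration, not code from Module:Citation/CS1: the helper name `add_terminator` is hypothetical, and it assumes the terminator is a single-byte character.

```lua
-- Hypothetical helper (not taken from the module): append a terminating
-- character only when the string does not already end with it.
local function add_terminator(name, dot)
    dot = dot or "."
    -- string.sub(name, -1, -1) extracts the final character (byte).
    if name == "" or string.sub(name, -1, -1) == dot then
        return name
    end
    return name .. dot
end

print(add_terminator("Zaks, J."))  -- already ends in a dot, left unchanged
print(add_terminator("Elsevier"))  -- gains a single trailing dot
```

The same test could be applied before each separator is appended, which is where the doubled period after an editor initial would otherwise arise.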
Multi-phase transition to Lua cites
Bumping thread for 30 days. Allen3 talk 10:43, 24 March 2013 (UTC)
With all the corrections people have already submitted, some Lua cites are very close to being released. I suggest a multi-phase transition in different weeks for the 23 cite templates, to focus on 5 major cite templates (for web, book, news, journal & {citation} ) with one minor cite template, {cite_encyclopedia}, to start as a small pre-release:
- test general parameters as "text-book" cases
- test several example articles (India, United States, Canada, Germany, Japan, etc.)
- transition Template:Cite_encyclopedia to Lua, as a small start (only 62,000 articles)
- transition Template:Citation to Lua (wide use, but only 93,000 pages)
- transition Template:Cite_news to Lua (mostly pop-culture, 385,000 articles)
- transition Template:Cite_journal to Lua (complex science/document parameters, 275,000 pages)
- transition Template:Cite_web to Lua (majority of cites, 1.3 million pages)
- transition Template:Cite_book to Lua (2nd largest, 460,000 pages)
- transition Template:Cite_video to Lua (minor, in 9,000 pages)
- transition other cite templates to use Lua
At any point, the transition phases can be reverted, or delayed, to handle whatever issues are found. The multi-phase plan is a balance between conservative delays and wide-scale impact. The most-used template, {cite_web} in 1.3 million pages, will be released in the middle phases, after tests which have smaller impacts. Meanwhile, because {cite_news} is used mostly for pop-culture articles, the parameters are often simple, while many users will be editing articles which use {cite_news} and report any unusual cite formats. Also, because {cite_news} is a major cite template, it will be a good "stress test" to having Lua used in several hundred thousand articles (385,000), before transitioning {cite_web} as used in 1.3 million pages. -Wikid77 (talk) 16:37, 22 February 2013 (UTC)
- Looks like a good list. FYI: {{cite video}} is now {{cite AV media}}, and supports all the features of {{cite sign}}. --— Gadget850 (Ed) talk 16:58, 22 February 2013 (UTC)
- Yes, a good list. It might be an idea to try to inform the wider community of what's happening, at the village pump or Signpost. --Salix (talk): 17:17, 22 February 2013 (UTC)
- Transition plan announced at PUMPTECH: I have worded an announcement to emphasize the benefits of using Lua-based CS1 cites, even if not "perfect" yet:
- The Lua-cite advantages will outweigh the risks of slight format differences, which can be fixed later. -Wikid77 (talk) 19:31, 22 February 2013 (UTC)
- All seems to be progressing steadily, some very good work being done here. One question: as each problem is found and corrected, are we updating a page of testcases to show that the Lua output is equivalent to (or, where agreed, better than) the current citation core? I would be interested to add a selection of sample citations to it. Thanks Rjwilmsi 22:19, 22 February 2013 (UTC)
- Preparing long-term testcases: We did not have a naming structure to compare side-by-side testcases, but creating "/old" versions of each cite template will allow long-term comparisons. These testcases could quickly become a nightmare, as a "cottage industry" of thousands of parameter combinations, so I have waited until now. See more below: #CS1 comparison testcases. -Wikid77 (talk) 01:03, 23 February 2013 (UTC)
- Release delayed 3 weeks until 17 March 2013: Due to several trivial problems, the release of {cite_encyclopedia/lua} was delayed for over 3 weeks. Perhaps most debilitating, the excessive limitation of the Lua timeout, a mere 10 seconds compared to the 60-second allowance for markup-based parameters, made Lua unusable for cite templates in major articles, due to the risk of entire cites being stored as "Script error" when the file servers were extremely slow. In rare cases, some Lua functions can run over 65% slower, so a 7-second Lua run could stretch beyond 11 seconds. To patch the severe Lua time limitation (with a "band-aid"), the Lua timing was changed to omit time elapsed when parsing the parameter templates, to enable formatting of hundreds of citations; however, the 10-second timeout still limits Lua to only partial analysis of large article pages. There was also a complete inability to use Lua templates when generating PDF output. Other trivial problems involved the shifted position of multiple parameters, again providing evidence for the need to just hand-write citation footnotes, where the Lua-based cites have become yet the next level of "much ado about nothing" in excessive formatting of footnotes. However, in related tests, the 200-variable-name limit in Lua functions was confirmed, so there is another limit to rambling additions of parameter names, where they cannot be given separate variable names inside a single Lua function, unless limited to within 200 possible names. -Wikid77 (talk) 04:32, 19 March 2013 (UTC)
- Release of {cite_journal} as Lua on 23 March 2013: After numerous discussions about the position of the "editor=" parameter, which was left as "In Editor" for now, adjusting some minor options, and creation of the related testcases page, {cite_journal} was transitioned to use Lua on 23 March 2013 at 1am. After several hours, about 114,000 more articles were auto-delinked from the markup-based helper Template:Citation/core. A specific article, "Lyme disease" was timed to edit-preview within 9 seconds (formerly 22+ seconds) using 189 {cite_journal} and 6 {cite_news}, with similar reformat times for other major medical articles. -Wikid77 (talk) 16:36, 24 March 2013 (UTC)
CS1 comparison testcases
We can have several pages of testcases for wp:CS1 cites, including:
- wp:CS1/test_parameters - list of cites to show each parameter
- wp:CS1/test_basics - list of cites to show basic, typical examples
- wp:CS1/test_problems - list of cites which test known problems/fixes
The massive complexity of the 430+ parameters in the wp:CS1 cite templates requires a large set of testcases, to provide some assurance of handling the astronomically huge set of endless combinations of rampant variations of parameter names. The testcases will provide a basic "sanity test" of the overall functionality, because the testing of all possible parameter groups would exceed the age of the universe, several times over. This is a typical case of combinatorial explosion: "the cite templates can be rewritten within 1 year with Lua script, but would require 90 billion years to completely test". The possible count of testcases starts with 430 factorial (430! ~= 2.2946e+947), or zillions of parameter combinations, where setting "first=" to blank might erase "author=x".
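The magnitude quoted for 430! can be checked without big-number support by summing logarithms. The sketch below only verifies that arithmetic; whether a factorial is the right way to count testcase combinations is a separate question that it does not address.

```lua
-- Sum log10(k) for k = 2..430 to get log10(430!) without overflowing a double.
local log10_factorial = 0
for k = 2, 430 do
    log10_factorial = log10_factorial + math.log(k) / math.log(10)
end

-- Split into a power-of-ten exponent and a leading mantissa.
local exponent = math.floor(log10_factorial)
local mantissa = 10 ^ (log10_factorial - exponent)

print(string.format("430! ~= %.4fe+%d", mantissa, exponent))
```

The result should agree with the 2.2946e+947 figure given in the paragraph above.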
Comparing the related templates: For each new Lua-based template named with "/lua" then the original markup-based template will have a permanent copy as "/old" to compare the side-by-side results, even after the Lua versions are installed with the current template names. For example:
- Template:Cite_web - {cite_web} live, whether markup or switched to Lua-based
- Template:Cite_web/lua - the Lua-based version of {cite_web}
- Template:Cite_web/old - the markup-based version of {cite_web} as the old copy
- Template:Cite_book/lua - the Lua-based version of {cite_book}
- Template:Cite_book/old - the markup-based version of {cite_book} as the old copy
- Template:Cite_journal/lua - the Lua-based version of {cite_journal}
- Template:Cite_journal/old - the markup-based version of {cite_journal} as copied
Again, the focus must be on confirming just the general parameters, with occasional variant spellings; otherwise, there would quickly be hundreds of thousands of parameter combinations. However, without some form of sanity check, then the complexity of the CS1 cites would become impossible to handle. -Wikid77 (talk) 01:03/06:21, 23 February 2013 (UTC)
Auto-detecting singular page
The Lua Module:Citation/CS1 checks "pages=" for a singular page number. The equivalent quick, markup-based detection of singular page in "pages=" can be seen in Template:Cite_web/sandbox4, which includes the quick logic, "{{#ifexpr:{{{pages|528-32}}}00000 < 1 |pp.|p.}}" to treat a hyphen range as "pp." after testing for #iferror with alpha/dash pages "45-A" or such. The full rapid algorithm begins with "|At =" in /sandbox4, which has been tested to spot singular pages. Each {cite_*} can set the page number. If there are problems, we can discuss alterations here, and also update the Lua script to use the similar altered logic.
- {{cite encyclopedia/lua |title=Test n-dash |pages=6– |date=8 March 2013}}
- {{cite encyclopedia/lua |title=Test n only |pages=655 |date=8 March 2013}}
- {{cite encyclopedia/lua |title=Test n-m|pages=6–55|date=8 March 2013}}
I have timed the quick page-detection #iferror/#ifexpr markup as running over 500 per second, which only executes with "pages=xx" as non-empty, and so the expected extra runtime would be less than 1/2500 second per cite, where only 20% of cites use "pages=xx" (based on a sample of 3,000 articles). Where used, 50% (half) of those cites have pages=nn as singular, so among 50,000 then 25,000 articles would auto-detect singular pages=nn. The other options, "page=" or "at=" are not affected. To force plural, use "at=pp.4" or such. -Wikid77 (talk) 23:15/11:36, 8 March 2013 (UTC)
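For discussion, here is a minimal Lua sketch of singular-page detection. It is not the logic in Module:Citation/CS1 or in /sandbox4: the function name `page_prefix` is hypothetical, and the rule (plural only when a hyphen, en dash, or comma sits between two digits) is an assumption chosen to keep the example short.

```lua
-- Hypothetical helper: choose "p." or "pp." from the text of |pages=.
-- Assumption: a value is plural only when a hyphen, en dash, or comma
-- separates two digits; anything else is treated as a single page.
local function page_prefix(pages)
    -- Normalize the UTF-8 en dash to an ASCII hyphen first.
    local normalized = string.gsub(pages, "–", "-")
    if string.find(normalized, "%d%s*[%-,]%s*%d") then
        return "pp."
    end
    return "p."
end

for _, value in ipairs({ "655", "6–55", "6–", "45-A", "528-32" }) do
    print(value, page_prefix(value))
end
```

Under this assumed rule an open-ended value such as "6–" and an alphanumeric value such as "45-A" both fall back to "p."; whether that is the desired outcome for those cases is exactly the kind of alteration that could be discussed here.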
Volume bolding
Wikitext | {{cite encyclopedia
|
---|---|
Live | LAST1, FIRST1; LAST2, FIRST2 (YEAR). "TITLE". In EDITOR (ed.). ENCYCLOPEDIA. Vol. VOLUME (EDITION ed.). LOCATION: PUBLISHER. pp. PAGES. ID. Retrieved 2006-07-02. {{cite encyclopedia}} : |volume= has extra text (help); Check date values in: |year= (help)CS1 maint: numeric names: authors list (link) CS1 maint: year (link)
|
Sandbox | LAST1, FIRST1; LAST2, FIRST2 (YEAR). "TITLE". In EDITOR (ed.). ENCYCLOPEDIA. Vol. VOLUME (EDITION ed.). LOCATION: PUBLISHER. pp. PAGES. ID. Retrieved 2006-07-02. {{cite encyclopedia}} : |volume= has extra text (help); Check date values in: |year= (help)CS1 maint: numeric names: authors list (link) CS1 maint: year (link)
|
No bolding on the volume? |
Is it intentional to remove the bolding on the volume of an encyclopedia? Dragons flight (talk) 03:41, 12 March 2013 (UTC)
- Looks like only volume numbers four characters or less are bolded. --— Gadget850 (Ed) talk 15:26, 12 March 2013 (UTC)
Wikitext | {{cite encyclopedia
|
---|---|
Live | LAST1, FIRST1; LAST2, FIRST2 (YEAR). "TITLE". In EDITOR (ed.). ENCYCLOPEDIA. Vol. 1234 (EDITION ed.). LOCATION: PUBLISHER. pp. PAGES. ID. Retrieved 2006-07-02. {{cite encyclopedia}} : Check date values in: |year= (help)CS1 maint: numeric names: authors list (link) CS1 maint: year (link)
|
Sandbox | LAST1, FIRST1; LAST2, FIRST2 (YEAR). "TITLE". In EDITOR (ed.). ENCYCLOPEDIA. Vol. 1234 (EDITION ed.). LOCATION: PUBLISHER. pp. PAGES. ID. Retrieved 2006-07-02. {{cite encyclopedia}} : Check date values in: |year= (help)CS1 maint: numeric names: authors list (link) CS1 maint: year (link)
|
- Intentional non-bolded longer volume names: For years, there had been suggestions to unbold the volume name when using a volume-name title, and so beyond 4-character length, it inserts a dot and omits the prior bolding, "Volume III: Garrish to Nominal" because the bolded name had appeared too garish, too excessive, in many current articles. In fact, the unbolded volume was requested, again, on 21 February 2013, in the above thread "#series/volume/publisher order". For the markup-based templates, a rapid {padleft} can be used to detect and unbold beyond 5-character volume names. -Wikid77 11:19, 13 March 2013 (UTC)
- How was the 4-character limit derived? I see your objective, but I don't think this gives the right answer for |volume=XXVIII or |volume=55–56 for journal cites. Rjwilmsi 15:26, 13 March 2013 (UTC)
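The length rule under discussion, and the two values raised as counterexamples, can be illustrated with a short Lua sketch. The function name and constant are hypothetical, and the sketch models only the bold/unbold decision, not the inserted dot mentioned above.

```lua
-- Hypothetical sketch of the length rule described in this thread.
local MAX_BOLD_LENGTH = 4

local function format_volume(volume)
    -- '#' counts bytes, so a UTF-8 en dash counts as 3; inside Scribunto,
    -- mw.ustring.len would count characters instead.
    if #volume <= MAX_BOLD_LENGTH then
        return "'''" .. volume .. "'''"
    end
    return volume
end

for _, v in ipairs({ "12", "1234", "XXVIII", "55–56" }) do
    print(v, format_volume(v))
end
```

Both "XXVIII" and "55–56" exceed four characters, so a pure length test leaves them unbolded even though they are ordinary volume numbers.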
Editor problem
Wikitext | {{cite book
|
---|---|
Live | Playfair, Major-General I.S.O.; Stitt, Commander G.M.S; Molony, Brigadier C.J.C.; Toomer, Air Vice-Marshal S.E. (2004) [1st. pub. HMSO:1954]. Butler, J.R.M (ed.). Mediterranean and Middle East Volume I: The Early Successes Against Italy (to May 1941). History of the Second World War, United Kingdom Military Series. Uckfield, UK: Naval & Military Press. ISBN 1-845740-65-3. {{cite book}} : Unknown parameter |lastauthoramp= ignored (|name-list-style= suggested) (help)
|
Sandbox | Playfair, Major-General I.S.O.; Stitt, Commander G.M.S; Molony, Brigadier C.J.C.; Toomer, Air Vice-Marshal S.E. (2004) [1st. pub. HMSO:1954]. Butler, J.R.M (ed.). Mediterranean and Middle East Volume I: The Early Successes Against Italy (to May 1941). History of the Second World War, United Kingdom Military Series. Uckfield, UK: Naval & Military Press. ISBN 1-845740-65-3. {{cite book}} : Unknown parameter |lastauthoramp= ignored (|name-list-style= suggested) (help)
|
Incorrect labeling on the editor |
The Lua version replaces the "X ed." editor marker with a nonsensical "In X" expression. Dragons flight (talk) 03:49, 12 March 2013 (UTC)
- Document collections use "In Editor" format: Some users have preferred the format as "In Editor" rather than "Editor, ed." and so that is why it has been displayed. Because wp:CS1 style is a hodge-podge of cite styles, the Lua module was originally written to use a few styles for all citations, rather than mimic each of the prior 23 {cite_*} fork templates. -Wikid77 11:19, 13 March 2013 (UTC)
- PS. There is also a change to the author list, where the old version had an ampersand. Dragons flight (talk) 03:54, 12 March 2013 (UTC)
Changes in page / date handling for cite news
Wikitext | {{cite news
|
---|---|
Live | "Auction Record for an Original 'Alice'". The New York Times. 11 December 1998. p. B30. |
Sandbox | "Auction Record for an Original 'Alice'". The New York Times. 11 December 1998. p. B30. |
This is a case where the new version is different, but not necessarily wrong (i.e. both approaches seem basically reasonable). The label on the page number and the placement of the publication date appear to have changed in the handling of cite news. I assume this was probably intentional, since it seems like too large a change to be accidental. However, I tried skimming this page and didn't find any discussion of this, so I thought I would highlight it. Dragons flight (talk) 15:47, 12 March 2013 (UTC)
- That's a change to cite news, but it doesn't seem to discuss the page and date rearranging. The example given doesn't use the agency or location fields. Dragons flight (talk) 15:55, 12 March 2013 (UTC)
- My bad. Not sure how I connected this. --— Gadget850 (Ed) talk 17:02, 12 March 2013 (UTC)
- Reset page format as "p." for Cite_news: There had been an overuse of the colon ":" page format, and so I changed when config.CitationClass is "news" to use the p./pp. page-number format. -Wikid77 (talk) 10:13, 13 March 2013 (UTC)
Test cases
Are we, through the current process, developing a (near-) comprehensive suite of test cases? Should they be captured and documented for future use? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:35, 12 March 2013 (UTC)
- Expanding representative sets of testcases: The goal is to expand the various pages, of numerous testcases, as issues are noted in importance, such as testcase essay "wp:CS1/test_problems". See above: "#CS1 comparison testcases". The tactic has been to view the pages during a run-preview when editing the Lua Module:Citation/CS1, so the testcases need to be kept limited, at first, so the pages are not too large to view during a run-preview. Because there are potentially unlimited billions of billions of parameter combinations, I expect the testcases to be expanded for years. The complete testing of parameters would exceed the age of the universe, many times over as a combinatorial explosion of parameter choices. -Wikid77 10:13, 13 March 2013 (UTC)
cite web error checking
Wikitext | {{cite web
|
---|---|
Live | {{cite web}} : Empty citation (help)
|
Sandbox | {{cite web}} : Empty citation (help)
|
Wikitext | {{cite web
|
---|---|
Live | http://www.foo.com/. {{cite web}} : Missing or empty |title= (help)
|
Sandbox | http://www.foo.com/. {{cite web}} : Missing or empty |title= (help)
|
Are we intentionally dropping the error condition? Dragons flight (talk) 18:55, 12 March 2013 (UTC)
Parameter "contribution=" acts as a title: The current Lua Module:Citation/CS1 has allowed omission of title, and then "contribution=" can be used instead, for any type of citation. Because all cite forks, in Lua, are handled by the one Lua module, then there has been little distinction between the allowable parameters.
Wikitext | {{cite book
|
---|---|
Live | J. Doe (1245 BC). "Song of Solomon". Moses papyrus, Inc. {{cite book}} : |work= ignored (help); Check date values in: |date= (help); Missing or empty |title= (help)
|
Sandbox | J. Doe (1245 BC). "Song of Solomon". Moses papyrus, Inc. {{cite book}} : |work= ignored (help); Check date values in: |date= (help); Missing or empty |title= (help)
|
Wikitext | {{cite web
|
---|---|
Live | J. Doe (1245 BC). Bible Texts Collection. Moses papyrus, Inc. {{cite web}} : |contribution= ignored (help); Check date values in: |date= (help); Missing or empty |title= (help); Missing or empty |url= (help)
|
Sandbox | J. Doe (1245 BC). Bible Texts Collection. Moses papyrus, Inc. {{cite web}} : |contribution= ignored (help); Check date values in: |date= (help); Missing or empty |title= (help); Missing or empty |url= (help)
|
Perhaps when the configuration class is "web" then the title should be considered mandatory, but I am not sure what is needed. -Wikid77 (talk) 02:26, 13 March 2013 (UTC)
Behavior of italics in title
Wikitext | {{cite journal
|
---|---|
Live | Phipps, A. G. (2000). "Japanese use of Beni-tengu-dake (Amanita muscaria) and the efficacy of traditional detoxification methods". Florida International University, Miami, Florida. {{cite journal}} : Cite journal requires |journal= (help); Unknown parameter |coauthors= ignored (|author= suggested) (help)
|
Sandbox | Phipps, A. G. (2000). "Japanese use of Beni-tengu-dake (Amanita muscaria) and the efficacy of traditional detoxification methods". Florida International University, Miami, Florida. {{cite journal}} : Cite journal requires |journal= (help); Unknown parameter |coauthors= ignored (|author= suggested) (help)
|
The italics in title |
Wikitext | {{cite web
|
---|---|
Live | Tulloss RE (2012). "Amanita muscaria var. persicina Dav. T. Jenkins". Studies in the Genus Amanita Pers. (Agaricales, Fungi). Retrieved 2013-02-21. {{cite web}} : Italic or bold markup not allowed in: |work= (help); Unknown parameter |coauthors= ignored (|author= suggested) (help)
|
Sandbox | Tulloss RE (2012). "Amanita muscaria var. persicina Dav. T. Jenkins". Studies in the Genus Amanita Pers. (Agaricales, Fungi). Retrieved 2013-02-21. {{cite web}} : Italic or bold markup not allowed in: |work= (help); Unknown parameter |coauthors= ignored (|author= suggested) (help)
|
The italics in title |
When the title contains text that is already italicized, the results appear to be a little strange and unpredictable in Lua. Dragons flight (talk) 12:40, 13 March 2013 (UTC)
- I reproduced the existing version by switching the uses of <i>...</i> to wikistyle ''...''. This matches the prior behavior when fed arguments that included italics, though it probably also opens us up to similar bad results if the arguments are wrapped in single-quotes or similar odd edge cases. I think using the quotes is "better", but I'd be open to other options. As it was expressions like <i>...''Bob''...</i> were being mangled by html tidy. Dragons flight (talk) 17:17, 13 March 2013 (UTC)
- Allows either internal double-tic or lead/tail apostrophe: The quick Lua check for apostrophes still allows title " 'Tis the season" in {cite_book/lua}, as: {{cite_book/lua |title='Tis the Season |date=December 1939}}, so that still works. -Wikid77 (talk) 19:00, 14 March 2013 (UTC)
- There is an instance of <b> - would that cause a similar issue? --— Gadget850 (Ed) talk 23:03, 13 March 2013 (UTC)
Cite comparison tool
I have created Module:CiteConversionTest. It is a simple tool that allows one to select an article, pull out its citations, and see the before and after conversion results side by side.
Try (in your personal sandbox):
{{#invoke:CiteConversionTest | test | France }}
To avoid time outs it will only show the first 90 citations, but even so, one can easily test a large number of citations very quickly. Dragons flight (talk) 15:09, 13 March 2013 (UTC)
Missing options, strange dots
Wikitext | {{cite news
|
---|---|
Live | . BurgerBusiness. 2012-01-25 http://www.burgerbusiness.com/?p=9168. {{cite news}} : Missing or empty |title= (help)
|
Sandbox | . BurgerBusiness. 2012-01-25 http://www.burgerbusiness.com/?p=9168. {{cite news}} : Missing or empty |title= (help)
|
Some dots to do something about. Or, you know, require that the title is not blank. Dragons flight (talk) 04:27, 14 March 2013 (UTC)
Cite_web/lua still runs 85 per second
I have, today, verified the {cite_web/lua} still runs at 85 cites per second (compared to {cite_web} at 20/sec.), even with extra checks for rare formatting. The fastest Lua-based templates run ~180 per second, and I wonder if there are some tricks to faster Lua-based cites, especially for {cite_web/lua} which might use a separate Module:Citation/web if it could be rewritten much faster. This is a long-term issue, but something to inspect while reading the Lua module for other changes. -Wikid77 (talk) 19:00, 14 March 2013 (UTC)
- Is there any kind of profiler available to find out where the time bottleneck is? Otherwise this is going to be very difficult to find and improve. —David Eppstein (talk) 20:10, 14 March 2013 (UTC)
- Could you repeat your test? I think I've improved one of the bottlenecks for reasons having to do with how we interact with the frame. Dragons flight (talk) 20:44, 14 March 2013 (UTC)
- Now repeatedly 115/second or 35% faster: The same test of 500 {cite_web/lua}, with 8 typical parameters, now runs under 4.4 seconds rather than 6 seconds, as ~115 cites per second. I ran the current and the prior tests more than 15 times each, to reveal a clear pattern of the lowest times. The test repeated a typical moderate citation 500 times:
- Parameters: {{cite_web/lua |title=My Title |url=http://www.google.com |last=Doe |first=John |date=1 May 1956 |publisher=Acme |location=London |pages=45}}
- Results: {{cite web/lua |title=My Title |url=http://www.google.com |last=Doe |first=John |date=1 May 1956 |publisher=Acme |location=London |pages=45}}
- I tend to average the shortest times, rather than just use the "luckiest" low time (which was 4.3 seconds for 500 cites). Anyway, for a large article of 350 cites, that allows a run of "3.04" seconds to format those 350. So who could complain about waiting 3 seconds for citations. Excellent. -Wikid77 23:41, 14 March 2013 (UTC)
- I imagine loads of people could still complain ;). But seriously, this is very good news and thank you to all involved in working on the templates. Rjwilmsi 08:17, 15 March 2013 (UTC)
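The figures in this thread can be re-derived with a few lines of arithmetic. The inputs below are the rounded timings quoted above (500 cites, 6 seconds before, 4.4 seconds after), so the outputs are approximate.

```lua
-- Rounded figures taken from the posts above.
local cites, before_s, after_s = 500, 6.0, 4.4

local rate_before = cites / before_s                -- cites per second before
local rate_after  = cites / after_s                 -- cites per second after
local speedup_pct = (before_s / after_s - 1) * 100  -- throughput gain in percent
local time_350    = 350 / 115                       -- using the quoted 115/sec rate

print(string.format("%.0f -> %.0f cites/sec (+%.0f%%)",
    rate_before, rate_after, speedup_pct))
print(string.format("350 cites at 115/sec: %.2f s", time_350))
```

With exactly 4.4 seconds the rate comes out just under 114 per second; the quoted ~115 per second corresponds to runs slightly below 4.4 seconds.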
Cite encyclopedia tests
I've created a test page devoted just to {{cite encyclopedia}} at Module talk:Citation/CS1/test/encylopedia.
I would like to propose we fix the errors presently shown there, as well as any others people can add to it, and then move forward with deploying the Lua version of {{cite encyclopedia}} as a moderate scale test case as previously suggested by Wikid77. Dragons flight (talk) 18:23, 16 March 2013 (UTC)
- Looks like you fixed issues that have been outstanding for a long time. Good work. --— Gadget850 (Ed) talk 21:59, 16 March 2013 (UTC)
- I've addressed all of the {{cite encyclopedia}} errors that I know about. So it might make sense to do this moderate scale testing fairly soon. Dragons flight (talk) 00:32, 17 March 2013 (UTC)
- I've gone ahead and deployed the cite encyclopedia test case. This affects 60000 pages or so. Dragons flight (talk) 03:47, 17 March 2013 (UTC)
- What about the PDF rendering issue? --— Gadget850 (Ed) talk 05:10, 17 March 2013 (UTC)
- They have "closed" that bug as of a few days ago. It should be the case that Lua templates and regular templates render the same way in all (or nearly all) cases, which I've tested with a number of examples. In particular, I've tried test pages where all citations were converted to Lua, and found that the output was the same. That part is good, so PDF rendering isn't a special concern for Lua developers. Now for the bad news. The "fix" applied to the PDF issue broke a ton of other stuff, leading to a new bug (bugzilla:46115). So the PDF process is still broken, but it is now broken for pages irrespective of whether or not they use Lua. Personally, I think a "fix" that breaks other things is not really a fix at all, but the WMF is aware of the issue and doesn't seem inclined to roll anything back. So, that's the current situation, PDF rendering is still broken for a variety of cases and in a variety of ways, but it generally works the same whether or not Lua is involved, so we might as well move forward with Lua. Dragons flight (talk) 05:31, 17 March 2013 (UTC)
Optimization suggestions
After looking over the code here, I have a few suggestions:
- I see Module:Mw's escape function runs 7 gsubs. It would probably be faster to use just one gsub with a table of replacement values, something like in here. Same goes for anything else that does multiple chained gsubs like that.
- Considering the strings passed to nowiki() are likely to be very short, it should be faster to steal the gsub commands from here instead of calling frame:preprocess. Never mind that the use of frame:preprocess there is b0rken if the argument happens to contain "|" or the like.
- Instead of safeforitalics, you could just use <i>...</i> for italic formatting. Unless the ability to turn off the italics in the submitted arg using random '' is important.
- I don't know if something vaguely like this would be faster, but it would remove the restriction that duplicate_char must be ASCII.
    function safejoin( tbl, duplicate_char )
        --[[
        Note: string functions are safe here, since multibyte characters aren't used with
        quantifiers or in brackets in the patterns
        ]]
        local esc_char = duplicate_char:gsub( '%p', '%%%0' )
        local comp1 = '^' .. esc_char
        local comp2 = '^%b<>' .. esc_char -- Do we care about multiple tags at the start of a value?
        local pat = esc_char .. "[ ']*$"
        local repl = {
            [duplicate_char] = '',
            [duplicate_char .. ' '] = '',
            [duplicate_char .. "''"] = "''",
        }
        local str = '';
        for _, value in ipairs( tbl ) do
            if value and value ~= '' then
                if value:match( comp1 ) or value:match( comp2 ) then
                    str = str:gsub( pat, repl ) .. value
                else
                    str = str .. value
                end
            end
        end
        return str;
    end
- Even though it no longer counts against the Lua time limit, it does still take time to parse each argument from
frame.args
. If an argument isn't going to be used, best to not access it in the first place. - There is a non-zero cost for Lua to parse functions in the module, even if they aren't used. So I'd suggest killing all the functions for other templates from this module. Stick them in another module, or just forget them if they're going to be faster in ordinary wikitext with parser functions (like {{refend}}).
- And for that matter, it's silly to call these tag-generating functions with constant name and params. Instead of
z.mw.text.tag({name="span", contents=result, params={class="smallcaps", style="font-variant:small-caps;"}})
, just do'<span class="smallcaps" style="font-variant:small-caps">' .. result .. '</span>'
.- It's only marginally less silly the two times you don't pass entirely-constant values; since this is such a performance-critical module, just copy
z.mw.text.escape()
into this module (but improve it as I suggested above) and then concatenate the strings directly.
- It's only marginally less silly the two times you don't pass entirely-constant values; since this is such a performance-critical module, just copy
- And then you can kill both require calls.
- For that matter, you might actually notice some improvement by putting local gsub = string.gsub at the top of the module and using gsub rather than string.gsub, and the same for the rest of the string functions you use. Local variable access is faster than global variable access in Lua.
Hope this helps. Anomie⚔ 05:39, 17 March 2013 (UTC)
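The last two suggestions above (direct concatenation for constant tags, and caching string functions in locals) can be sketched in plain Lua as follows. This is only an illustration: the helper names smallcaps and squeeze and the sample input are hypothetical and are not taken from the module.

```lua
-- Cache the library function in a local once, at the top of the module.
-- A local is read directly; string.gsub costs a global lookup plus a field lookup.
local gsub = string.gsub

-- Hypothetical helper: build the constant-attribute span by concatenation
-- instead of calling a generic tag-building function.
local function smallcaps( result )
    return '<span class="smallcaps" style="font-variant:small-caps">' .. result .. '</span>'
end

-- Hypothetical helper using the cached local: collapse runs of spaces.
local function squeeze( s )
    return ( gsub( s, ' +', ' ' ) )  -- parentheses drop gsub's second return value
end

print( smallcaps( squeeze( 'Alon,   N.' ) ) )
```

Whether either change is measurable in practice would have to be checked by benchmarking the real module.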
- Thanks Anomie. I agree that optimization is important here, though we are already 80% faster than the conventional templates, so I don't think optimization necessarily needs to hold us up. Several of your suggestions seem like good ones, and I'll try to benchmark them later. A couple of your suggestions I know won't help, and I'll mention them now. We can't use <i> for italics, as interacting with titles that include preexisting quote italics (e.g. ''...'') is a design requirement. There is a section earlier on this page mentioning that. I don't think we should spend much time (or performance) worrying if duplicate_char is UTF8. That would only happen if the user supplied a UTF8 separation character (which I've never seen an example of), and even then the failure mode isn't particularly bad. Regarding the use of arguments, we load all the arguments into a local variable at the very beginning (see z.citation). Given that citations are usually called with 3-10 arguments but can test for the existence of several dozen, we found that preloading all arguments locally in one step was about 30% faster than testing them one by one, also there aren't many opportunities for branch exclusion since anything included in the template is almost always used (and hence must be parsed). If I understand correctly, the module and its requirements are only parsed once per page render (rather than once per #invoke), and I think that overhead is quite small compared to the time spent rendering citations. Nevertheless I am also inclined to look at whether we can benefit from killing the require statements and splitting off code not directly related to citations. Dragons flight (talk) 16:26, 17 March 2013 (UTC)
- I thought that <i> versus '' might be intended. If everything is used if supplied, then yeah you may as well load things like that; the advantage would only come if there are parameters that are only used with one style. If you have time, please benchmark the safejoin thing, I'd be interested to know which is faster.
- As for the module contents, they are reloaded and reparsed on each #invoke (the code is in common/Hooks.php, ScribuntoHooks::invokeHook). Same goes for require (engines/LuaCommon/LuaCommon.php, Scribunto_LuaEngine::loadPackage; note that each loaded module gets its own package.loaded so requires aren't cached in that way either). That's why we have mw.loadData(), it specifically avoids that by loading and caching the submodule in the outer sandbox that isn't recreated for each #invoke. Anomie⚔ 18:39, 17 March 2013 (UTC)
- Your version of safejoin appears to give identical output but is about 6 times slower than my version on a suite of test inputs. Neither version contributes all that much to the overall runtime though. Dragons flight (talk) 22:14, 17 March 2013 (UTC)
- 6 times slower? Wow, that's disappointing. Anomie⚔ 22:57, 17 March 2013 (UTC)
- You are definitely right about the gsubs. I had not appreciated that option before. Dragons flight (talk) 17:16, 17 March 2013 (UTC)
- I've gone ahead and split all the other template support back to Module:Citation, so this just handles WP:CS1 {{citation}} styling. I've also killed the require statements so that this piece of code stands on its own. That said, in testing it is not obvious that those changes actually make a statistically significant difference. I can see a difference with converting the nowiki code and some of the other things, so it should be a little faster now, but only a little. Dragons flight (talk) 21:37, 17 March 2013 (UTC)
- Cite_web/lua now 125/second, +9% faster: Before reading the above changes, I had simply rerun the prior 500-cite benchmarking tests (of 8 major parameters) another 15 times, which now clocked at 500 in 4 seconds (average low time; lowest was 3.6 sec. or 139/sec). Using just the average of lowest times, as 4 seconds, that gives 125/second. That is fast enough for now, and specifically, it is now ~10x faster than old {cite_web} which had COinS metadata. -Wikid77 (talk) 00:20, 18 March 2013 (UTC)
- Re italics: There is a proposal to remove the current CSS styling for <cite> to allow it to default to italics. This might be a better way to go if the proposal is implemented. --— Gadget850 (Ed) talk 13:50, 18 March 2013 (UTC)
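The throughput figures quoted in this thread are total calls divided by elapsed time. A rough way to take such a reading in plain Lua is sketched below; bench is a hypothetical helper built on os.clock, not the method used for the measurements reported above, which were taken on rendered wiki pages.

```lua
-- Sketch only: bench is a hypothetical helper, not part of any module.
-- Runs fn n times and returns elapsed CPU seconds and calls per second.
local function bench( fn, n )
    local start = os.clock()
    for _ = 1, n do
        fn()
    end
    local elapsed = os.clock() - start
    -- Guard against a zero reading on very fast runs.
    local rate = elapsed > 0 and n / elapsed or math.huge
    return elapsed, rate
end

-- The arithmetic behind the quoted rates: 500 citations in 4 s and in 3.6 s.
print( string.format( '%.0f per second', 500 / 4 ) )
print( string.format( '%.0f per second', 500 / 3.6 ) )

-- Example call with a trivial workload.
local elapsed, rate = bench( function() return ( 'a.b' ):gsub( '%.', '' ) end, 10000 )
print( elapsed, rate )
```

Single readings from a timer like this are noisy, so repeating the run and comparing the lowest times, as done above, is a reasonable habit.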
Transition Phase-4: Cite_news to use Lua
I think we are ready for the first large roll-out of Lua-based cites, into the major pop-culture articles which heavily repeat {cite_news}, because they are news-trendy topics sourced to recent news websites. With articles of such popularity, any rare cite problems are likely to be noted by someone, somewhere, who knows how to complain about cite templates. The Lua-based {cite_news} (similar to {cite_web} usage) will also make edit-preview 2x-3x faster for many pop-culture articles.
- Run: {{#invoke:CiteConversionTest|test|Costa Concordia}}
Let's progress to {cite_news} as the next phase, with the now ultra-fast {cite_news/lua} to run 6x faster than current {cite_news/old}. -Wikid77 (talk) 00:20, 18 March 2013 (UTC)
- News is a fine place to go. There is one case I know of where we have differences where I'm not sure if it is intentional or not:
Wikitext | {{cite news
|
---|---|
Live | "Auction Record for an Original 'Alice'". The New York Times. 11 December 1998. p. B30. |
Sandbox | "Auction Record for an Original 'Alice'". The New York Times. 11 December 1998. p. B30. |
- We should probably decide whether we want to fix that or leave it. Also, it would be nice to have an example page Module talk:Citation/CS1/test/news like Module talk:Citation/CS1/test/encyclopedia to catch the major forms and provide a reference in case of future regressions. Dragons flight (talk) 03:41, 18 March 2013 (UTC)
- Leave it. The only template that is supposed to use a different page format is journal. And the blank postscript is a fix that has been requested before. --— Gadget850 (Ed) talk 10:40, 18 March 2013 (UTC)
Testing Cite_news and quoted title
Unlike {cite_book}, the template {cite_news} should put quotation marks around the parameter "title=" and not use italics there. Placement of other parameters has been shifted, slightly different from {cite_news/old}, and the date comes before the page number, as typical for a newspaper article, where date determines which day's paper and then page number is next. All parameters for {cite_news}:
Wikitext | {{cite journal
|
---|---|
Live | Author (Date) [Origyear]. "Test Cite_news Parameters". Department. Newspaper Name (Type). Series (in Language). Volume (Issue). Others (Evening ed.). Place: Publisher: page. arXiv:ArXiv. ASIN ASIN. Bibcode:Bibcode. doi:10.DOI. ISBN Isbn. ISSN Issn. JFM JFM. JSTOR Jstor. LCCN LCCN. MR MR. OCLC OCLC. OSTI OSTI. PMC PMC. PMID PMID. RFC RFC. SSRN SSRN. Zbl ZBL. Id. Archived from the original (Format) on Archivedate. Retrieved Accessdate – via Via. Quote {{cite journal}} : |author= has generic name (help); |issue= has extra text (help); |page= has extra text (help); |volume= has extra text (help); Check |arxiv= value (help); Check |asin-tld= value (help); Check |asin= value (help); Check |bibcode= length (help); Check |doi= value (help); Check |isbn= value: invalid character (help); Check |issn= value (help); Check |jfm= value (help); Check |jstor= value (help); Check |lccn= value (help); Check |mr= value (help); Check |oclc= value (help); Check |osti= value (help); Check |pmc= value (help); Check |pmid= value (help); Check |rfc= value (help); Check |ssrn= value (help); Check |zbl= value (help); Check date values in: |accessdate= , |date= , and |archivedate= (help); Invalid |ref=harv (help); More than one of |pages= , |at= , and |page= specified (help); Unknown parameter |agency= ignored (help); Unknown parameter |coauthor= ignored (|author= suggested) (help); Unknown parameter |deadurl= ignored (|url-status= suggested) (help); Unknown parameter |doi_inactivedate= ignored (help); Unknown parameter |laydate= ignored (help); Unknown parameter |laysource= ignored (help); Unknown parameter |laysummary= ignored (help); Unknown parameter |subscription= ignored (|url-access= suggested) (help); Unknown parameter |titlelink= ignored (|title-link= suggested) (help); Unknown parameter |trans_title= ignored (|trans-title= suggested) (help); Unknown parameter |transcript= ignored (help); Unknown parameter |transcripturl= ignored (help)CS1 maint: unrecognized language (link)
|
Sandbox | Author (Date) [Origyear]. "Test Cite_news Parameters". Department. Newspaper Name (Type). Series (in Language). Volume (Issue). Others (Evening ed.). Place: Publisher: page. arXiv:ArXiv. ASIN ASIN. Bibcode:Bibcode. doi:10.DOI. ISBN Isbn. ISSN Issn. JFM JFM. JSTOR Jstor. LCCN LCCN. MR MR. OCLC OCLC. OSTI OSTI. PMC PMC. PMID PMID. RFC RFC. SSRN SSRN. Zbl ZBL. Id. Archived from the original (Format) on Archivedate. Retrieved Accessdate – via Via. Quote {{cite journal}} : |author= has generic name (help); |issue= has extra text (help); |page= has extra text (help); |volume= has extra text (help); Check |arxiv= value (help); Check |asin-tld= value (help); Check |asin= value (help); Check |bibcode= length (help); Check |doi= value (help); Check |isbn= value: invalid character (help); Check |issn= value (help); Check |jfm= value (help); Check |jstor= value (help); Check |lccn= value (help); Check |mr= value (help); Check |oclc= value (help); Check |osti= value (help); Check |pmc= value (help); Check |pmid= value (help); Check |rfc= value (help); Check |ssrn= value (help); Check |zbl= value (help); Check date values in: |accessdate= , |date= , and |archivedate= (help); Invalid |ref=harv (help); More than one of |pages= , |at= , and |page= specified (help); Unknown parameter |agency= ignored (help); Unknown parameter |coauthor= ignored (|author= suggested) (help); Unknown parameter |deadurl= ignored (|url-status= suggested) (help); Unknown parameter |doi_inactivedate= ignored (help); Unknown parameter |laydate= ignored (help); Unknown parameter |laysource= ignored (help); Unknown parameter |laysummary= ignored (help); Unknown parameter |subscription= ignored (|url-access= suggested) (help); Unknown parameter |titlelink= ignored (|title-link= suggested) (help); Unknown parameter |trans_title= ignored (|trans-title= suggested) (help); Unknown parameter |transcript= ignored (help); Unknown parameter |transcripturl= ignored (help)CS1 maint: unrecognized language (link)
|
The correct placement for date/page is for date to precede the page number (as done in Lua). Some newspapers have a morning or evening "edition=" for the same date, which has been added. Some newspapers, such as The New York Times, have a release date which is the prior day, and that could be shown as "origyear=printed 17 March 2013" to follow the regular date. The article title is always quoted ("Title"). In the old format, the "page=x" overrides the "pages=z" option and should be the same in Lua (in articles where both page/pages are specified), as coded in sandbox Module:Citation/CS1/sandbox where "page=x" overrides "pages=z" in 3 places. -Wikid77 (talk) 05:53/07:06, 18 March 2013 (UTC)
- {{cite journal}} (and possibly others) require that pages= override page=, which is how it was set up before. We need to accommodate which one wins depending on the citation mode. Dragons flight (talk) 08:37, 18 March 2013 (UTC)
- For all CS1 templates the order of precedence is page, pages, at. For cite journal, if work, journal, newspaper, magazine or periodical is defined then p. or pp. are not included. --— Gadget850 (Ed) talk 11:10, 18 March 2013 (UTC)
- Test of Cite_journal/old page+pages: The old order has a confusing mix for "page=x" to sometimes override "pages=z" as in the following:
- Try Cite_journal title: {{cite journal/old|title=Journal title|date=May 1996|page=pageX|pages=pagesZ}}
- Title jou*/old result: Journal title. May 1996. p. pageX.
- Title sandbox result: "Journal title". May 1996: pageX. {{cite journal}}: |page= has extra text (help); Cite journal requires |journal= (help); More than one of |pages= and |page= specified (help)
- Try Cite_journal periodical: {{cite journal/old |periodical=Periodical |title=Journal title|date=June 2001 |page=pageX|pages=pagesZ}}
- Periodical jou*/old result: "Journal title". Periodical: pageX. June 2001.
- Periodical sandbox result: "Journal title". Periodical: pageX. June 2001. {{cite journal}}: |page= has extra text (help); More than one of |pages= and |page= specified (help)
- For that reason, the Lua had been showing all options (all 3: page, pages, at), to display whatever has been used. I had noticed days ago that the Lua was acting reversed on {cite_news} page-option precedence, but neglected to mention that, and I apologize for not noting the difference last week. With 430 options, it is difficult to match all combinations. It was fine when all three appeared (showing all: page, pages, at) because at least "page=" and "pages=" would always show their data, and I left "at=" as a separate option. People have used both together as "pages=340 pages total" and "page=27" or such where only "p.27" would show. Hence, if we go back to display all, then I think that would be best, to show when people use both page/pages. However, for {cite_journal} we need an extra check to suppress the date when "periodical" and append the date afterward. It is a rat's nest, but I think we can match the date order in Citation/CS1/sandbox. We are very close to matching them. -Wikid77 (talk) 13:40/13:54, 18 March 2013 (UTC)
- That was a bug in the old cite journal, and I have fixed it. The precedence is page, pages, at. --— Gadget850 (Ed) talk 15:10, 18 March 2013 (UTC)
- We can of course code it that way, though I do worry about the amount of breakage by flipping the existing behavior of cite journal. I would like to include an error behavior in the case of multiple page specs. Initially this can be a hidden category to figure out how many cases of multiple usage exist, and allow cleanup of existing problems. Is it correct to understand that including more than one of pages=, page= and at= should always be considered erroneous behavior? I've seen rather a large number of examples of this in the wild. Dragons flight (talk) 16:07, 18 March 2013 (UTC)
- If only one of the three parameters is defined, then nothing is broken. If page and pages are both defined, then it is usually because the editor is using pages for the total number of pages in the work, which has no place in the citation and if it doesn't show, then no problem. Regardless, I have made the fix to the old template, so we might get some complaints, but I doubt it. --— Gadget850 (Ed) talk 18:24, 18 March 2013 (UTC)
- I've added a trapping category: Category:References with multiple page specifications. Right now, it is of course limited to {{cite encyclopedia}} uses. Dragons flight (talk) 19:45, 18 March 2013 (UTC)
- This seems like a good way to handle it to me. That way we can find and fix problems with misused parameters without making a visible mess and confusing the readers. —David Eppstein (talk) 19:55, 18 March 2013 (UTC)
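The precedence rule stated in this thread (page, then pages, then at) together with the idea of flagging multiple page specifications can be sketched in plain Lua. This is a minimal illustration under stated assumptions: select_pages and is_set are hypothetical names, and the actual module code may be organized quite differently.

```lua
-- Sketch only: select_pages and is_set are hypothetical names,
-- not the functions used in Module:Citation/CS1.
local function is_set( v )
    return v ~= nil and v ~= ''
end

local function select_pages( args )
    local chosen, count = nil, 0
    -- Order of precedence stated in the discussion: page, pages, at.
    for _, name in ipairs( { 'page', 'pages', 'at' } ) do
        if is_set( args[name] ) then
            count = count + 1
            if not chosen then
                chosen = name
            end
        end
    end
    -- The third result reports multiple specifications, so a caller could
    -- add a hidden tracking category rather than show a visible error.
    return chosen, chosen and args[chosen] or nil, count > 1
end

local name, value, multiple = select_pages( { page = '27', pages = '340 pages total' } )
print( name, value, multiple )
```

Returning a flag instead of raising an error keeps the rendered citation unchanged for readers while still letting the misused parameters be found and cleaned up.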
Cite news installed
I've gone ahead and installed the Lua version of {{cite news}}. Dragons flight (talk) 14:23, 19 March 2013 (UTC)