Template talk:Lang/Archive 7

This is an archive of past discussions about Template:Lang. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Runtime error

Line 1068 of Module:Lang is giving a runtime error at Ḍād. The problem may be a mistake in the template:

{{transl|صكةةشضar|DIN|[[tsade|ṣād]]}} → [[[tsade|ṣād]]] Error: {{Transliteration}}: unrecognized language / script code: صكةةشضar (help)

However, it would be desirable to tweak the line in question so an error message is shown. The problem is that the match in the following line returns nil so :lower() fails.

    args.code = args[1]:match ('^%a%a%a?%a?'):lower();

Johnuniq (talk) 07:01, 18 January 2018 (UTC)

Thanks for that, fixed.

—Trappist the monk (talk) 10:28, 18 January 2018 (UTC)
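The failure mode described in this thread can be sketched as follows. This is a Python model of the logic, not the module's Lua, and the function name `language_code` is hypothetical. It shows the guard that was missing: test whether the pattern matched before lowercasing the result, so that a malformed first argument can be reported instead of raising a runtime error.

```python
import re

# Same shape as the Lua pattern '^%a%a%a?%a?': two to four leading ASCII letters.
CODE_PREFIX = re.compile(r"[A-Za-z]{2,4}")


def language_code(arg):
    """Return the lowercased leading letters of arg, or None if there are none."""
    found = CODE_PREFIX.match(arg)
    if found is None:
        # The unguarded version would call a method on this missing match,
        # which is the analogue of calling :lower() on nil in Lua.
        return None
    return found.group(0).lower()
```

A caller that receives `None` can then emit an "unrecognized language / script code" message of the kind shown above rather than failing.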

template:lang-orv

While I was away on wikibreak, Editor Iryna Harpy reverted {{lang-orv}} to an older version and left, what is to me anyway, an inscrutable edit summary:

Restore prior undiscussed changes resoring "Old Russian Language" rather than the [[[WP:COMMONNAME]] "Old East Slavic".

The template, as it renders now, displays:

{{langx|orv|place holder text}}

Old East Slavic: place holder text

The module-supported version renders:

{{#invoke:lang|lang_xx_inherit|code=orv|place holder text}}

Old East Slavic: place holder text

Both of Old East Slavic language and Old Russian language, the links produced by the two versions of the template, are redirects to Old East Slavic which identifies Old Russian as a synonym. It is not clear to me how the appeal to WP:COMMONNAME applies to the revert. Because both 'Old East Slavic' and 'Old Russian' labels in the above examples link to the same article and are considered synonymous, and because WP:COMMONNAME applies to article titles, not to links (piped or not piped), it would seem WP:COMMONNAME does not apply here.

orv is an ISO 639-3 language code that is assigned to the language name Old Russian; see the ISO 639-3 custodian's website at code orv. This same code and name combination can be found in the IANA language-subtag-registry file. Consequently, {{lang-orv}} should be using the correct name as defined by the code's custodian.

In the reverting editor's edit summary the claim is made that changes to {{lang-orv}} were made without discussion. That may or may not be true for Editor Maczkopeti's edit and is marginally true for my subsequent edits to the extent that {{lang-orv}}, one of some 600ish {{lang-??}} templates, was not specifically discussed on this page before I made the initial edit. The edit summary for that edit, though, does refer to the discussions on this page. For this reason, I believe that the undiscussed changes claim is flawed.

Further, Editor Maczkopeti's version and the module-supported version of the template both supported transliteration and translation while the version apparently preferred by Editor Iryna Harpy does not. If a different label is desired for the template rendering, that functionality is available from the module-supported version of the template. The |label= parameter is described in the documentation associated with module-supported {{lang-??}} templates (see for example {{lang-de}}).

—Trappist the monk (talk) 23:59, 30 January 2018 (UTC)

It looks like there are people willing to die on this hill. See Talk:Old East Slavic. Per the discussions there, it may be best to reinstate the template to get the new features, but override the ISO's language name, as we have done for other languages. I'd be interested to hear from Iryna Harpy. – Jonesey95 (talk) 02:04, 31 January 2018 (UTC)

The label is something that does get displayed in article text, and usually in quite a prominent place, so it definitely does make a difference. We could leave aside the historic unreliability of at least one of the ISO code custodians (they are better these days but I'm not sure to what extent they've managed to catch up). But even then if it was as simple as following them, we probably wouldn't have that much of a need for our language naming conventions and we definitely wouldn't be spending so much time debating the titles of language articles. The |label= parameter is fine, but it should be used for cases where the specific context necessitates a label different from the default. I would think that it goes without saying that the default label should match whatever has gained consensus to be the article's name here. – Uanfala (talk) 03:01, 31 January 2018 (UTC)

WP:NCLANG applies to article titles; {{lang-orv}} applies to independent strings of text that may occur anywhere. Because both Old East Slavic and Old Russian are synonymous, the labeling choice becomes one of context. Because templates cannot know the context in which they are used, in cases such as this one, where we have determined that both language names are more-or-less equivalent (equivalency through redirection), one name must be selected for the default label. Because the ISO 639-3 custodian has assigned Old Russian to code orv, which choice is supported by the IANA custodian, that is probably the best choice that can be made by the template. We are not here to set standards nor to right wrongs; if a better language name exists, the subject should be taken up with the custodian and ISO to change the assigned name or to create a new code/name pair.

Because templates cannot know context and because a default label must be chosen, the module-supported templates provide |label= so that editors may choose to use the default label, no label, or some other label of their own devising. As the live template stands now, those editorial choices are prohibited.

—Trappist the monk (talk) 13:03, 31 January 2018 (UTC)

You seem to have chosen not to pay attention to what Jonesey and I wrote above. What the default label for a language should be is not a separate question from its article title. If the community has taken the trouble to discuss and decide the most appropriate name for a language, then ignoring that consensus and virtually forcing upon it the choice of one external "authority" (which may or may not be reliable in the particular case) is frankly delusional. – Uanfala (talk) 15:17, 31 January 2018 (UTC)

No, no, not true. I did read and, I think, understood but disagreed with some of what you wrote. I agree that the label is prominently displayed and so matters in the context in which it is used. Ethnologue is not an ISO 639-3 custodian though its parent, SIL International, is. Whatever Ethnologue has to say about code/name pairing may be dismissed in favor of the official pairing defined by SIL International – no doubt there is some internal collaboration between SIL International and Ethnologue. Articles at en.wiki may be given whatever name the community chooses pursuant to the restrictions set down in WP:NCLANG. But, those article-title restrictions do not apply to templates any more than they apply to piped links (if they did then piped links like this: [[Old East Slavic|Old Russian language]] would not be allowed). The single template mentioned at WP:NCLANG is {{infobox ethnonym}}; the {{lang-??}} and {{lang}} templates are not. We agree that |label= should be used when context dictates that the default label isn't suitable. We disagree about the default label / article title match. We can have only one article with a specific title and we can have only one default label. Redirects resolve mismatches. No one is being forced to do anything; the |label= parameter is implemented specifically to allow editors to choose template labels for themselves. I do not think that using a default label that is taken from an international standard is delusional. I do think that the delusional comment is inappropriate because it simply labels me and does not advance your position.

—Trappist the monk (talk) 16:56, 31 January 2018 (UTC)

Is there a difference between this situation and that of {{Lang-mla}}? I ask this as a straightforward question – not to imply or argue that we should add to our list of overrides, but because I often fail to perceive a legitimate difference. – Jonesey95 (talk) 19:59, 31 January 2018 (UTC)

Yes, I think that there is. {{lang-mla}} produces a label to Tamambo language because Malo language is a disambiguation page (the {{lang-??}} templates should not link labels to dab pages). At the time that I added mla to Module:lang/data, mla was being misused for Medieval Latin (in Module:Language/data/wp languages). Also, at the time |label= was just an entry on the future features wishlist. The correct fix, I think, is to make Malo language a redirect, improve the hatnote at Tamambo language, and add a hatnote to Melo language after which we remove the mla entry at Module:lang/data so that {{lang-mla}} renders a label linked to Malo language which redirects to Tamambo language. Right?

—Trappist the monk (talk) 20:50, 31 January 2018 (UTC)

Malo language is now a redirect; its previous content added to Malo, a dab page; Tamambo language hatnote tweaked; hatnote added to Melo language; mla stricken from Module:lang/data.

—Trappist the monk (talk) 13:13, 1 February 2018 (UTC)

(edit conflict) I don't know the specifics of this case, but judging from the Glottolog entry (linked from the infobox of the article Tamambo language) and the titles in the bibliography displayed there, Mambo is found in the old literature while Tamambo is the name used in more recent sources. If Ethnologue (this is the part of SIL that deals with the language codes) still has it as Mambo, then it's most likely a case of no-one having bothered to push them to change it yet. – Uanfala (talk) 20:53, 31 January 2018 (UTC)

I was not labelling you as delusional, Trappist the monk. I do see your comments so far as sensible (and your work on the module has been brilliant), but I really can't think of a more suitable word to describe the expectation that it's OK to override the hard-won community consensus in individual cases and force the use of a single external authority. The whole point of the lang-xx templates is for it to be easy to change the language label if it's decided (via an RM at the language article) that the name previously used for that language is problematic (because of, for example, being stigmatising, or out of step with the current literature). The obvious practice so far has been to keep individual labels in sync with article titles. The lang module hasn't quite departed from it yet as it's these article titles that it uses via Module:Language/data/wp languages. This isn't being updated anymore and if you wish your module to do something else, it's unlikely that people will try to stop you outright: simply over time more lang-xx templates will switch away from using it like lang-orv has done. If you do wish to stop them from doing so, or if you'd like to have community consensus behind your vision, you should start an RfC at WT:WikiProject Languages (though this is extremely unlikely to pass). I do not wish to be arguing these points any more. – Uanfala (talk) 20:53, 31 January 2018 (UTC)

I guess I must take issue with the notion that the label provided by {{lang-orv}} was a hard-won community consensus. While that may be true of the Old East Slavic article title, it does not appear to be true of the template. Upon its creation the template used 'Old Russian' as a label. With this edit it briefly changed to 'Old Ruthenian' but was reverted 9.5 hours later. A pair of edits approximately 1 year 9 months later briefly restored 'Old Ruthenian' but three minutes later that was changed (by the same editor [ping]) to 'Old East Slavic'. Looking at that editor's contributions around the time of the edits does not show any activity on a talk page related to 'Old East Slavic'. Similarly, the editor has not contributed to Old East Slavic or to its talk page or to Talk:Old Ruthenian language/Archive 1. So, looks like a drive-by edit that stuck until I changed the template with this edit. The template itself has no talk page so if there were any controversy about the label that the template provided, that discussion has occurred elsewhere.

—Trappist the monk (talk) 00:48, 1 February 2018 (UTC)

I honestly don't have the strength to argue the case for overriding this template at the moment but, to make it clear, Uanfala, myself, and countless other editors specialising in Slavistics are up to date on contemporary research in the field. Please check Dƶoxar's editing history again, and you'll find that s/he is a prodigious editor in all areas of Slavonics, ergo it is unfair to call his/her edit a 'drive by' change in any shape or form. The template appears in dozens of articles (probably hundreds, but I haven't the energy to run a check on how many exactly), and your revert has reinstated the anachronistic 'Old Russian language' to each and every instance of its use contrary to what is now the mainstream usage of 'Old East Slavic'. As already noted, this dated nomenclature appears in prominent positions within articles: infoboxes and the leads of articles every time it is used. This is misleading to the reader. Not only does it perpetuate misinformation regarding the nature of the medieval language in question, it contradicts content within the articles themselves based on WP:RS. The fact that this is formulated as a cover-all template does not mean that the formula is written in stone and should not be amended (or even bypassed) where appropriate... and this is a case where it is appropriate. This does not detract from the template as being generically useful, but the modifications follow the spirit of Wikipedia. It isn't even an IAR issue as it can be demonstrated that Old East Slavic has been the contemporary norm for a few decades. If this is to be disputed (or even dispute-worthy), this template talk page is too obscure a venue as most editors are unaware of the intricacies of coding and are unlikely to even find the template page regulating the output. As has been suggested, should it be deemed appropriate, I believe that WikiProject Languages is the apt venue. I'd also suggest that WP:ETHNIC be made aware of any potential RfC.
--Iryna Harpy (talk) 03:56, 3 February 2018 (UTC)

{{langx|orv|placeholder text}} → Old East Slavic: placeholder text

—Trappist the monk (talk) 12:21, 3 February 2018 (UTC)

Character references

{{Unicode chart Cuneiform}} is currently displaying its Cuneiform characters italicized, because it uses numeric character entities ({{lang|akk|&#x12000;}}), and those are composed of characters within the range U+32–U+127, which the module considers to be Latin.

The problem could be fixed by entering |italic=no into the 896 or so {{lang}} templates in the chart, but it would be more robust to change the module and run mw.text.decode on the text before doing mw.ustring.gsub on it to determine the script. Then, in the above instance the module would see the character 𒀀 rather than the ASCII character reference. Then any further cases in which someone has chosen to use character references will be dealt with.

If decodeNamedEntities is enabled, it would probably add some amount of memory because a PHP table will be loaded for every template instance. If that is a problem, the function could be modified to use that flag only if it's needed: that is, if there are character references that match the pattern "&[a-zA-Z0-9]-;" as opposed to "&#[0-9]-;", "&#[Xx]%x-;". — Eru·tuon 10:06, 2 February 2018 (UTC)
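The suggestion above can be sketched as follows. This is a Python model, not the module's Lua: `html.unescape` stands in for `mw.text.decode`, and the helper names and the 32..127 range check are simplified assumptions rather than the module's actual script test. It shows why an undecoded numeric reference passes for Latin text, and how named references could be detected separately from numeric ones.

```python
import html
import re

# Named references such as &eacute; (as opposed to &#233; or &#xE9;).
NAMED_REFERENCE = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def has_named_reference(text):
    """True if text contains at least one named character reference."""
    return NAMED_REFERENCE.search(text) is not None


def is_all_ascii_range(text):
    """Crude stand-in for a Latin check: every code point is in 32..127."""
    return all(32 <= ord(ch) <= 127 for ch in text)


def looks_latin(text, decode_first):
    """Apply the range check, optionally decoding character references first."""
    if decode_first:
        text = html.unescape(text)
    return is_all_ascii_range(text)
```

Without decoding, the reference for U+12000 is nine ASCII characters and so looks Latin; with decoding, the check sees a single Cuneiform code point and does not.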

All I see is little square boxes. I've implemented your suggestion in the sandbox. Dropping a zero from the Cuneiform produces a character that displays for me and tests the sandbox:

{{lang|akk|ሀ}} → ሀ

{{lang/sandbox|akk|ሀ}} → ሀ

{{lang/sandbox|es|casa}} → casa

Someone else will have to say that the fix works for the Cuneiform:

𒀀 ← {{lang|akk|𒀀}}

𒀀 ← {{lang/sandbox|akk|𒀀}}

For me, the shapes of the little boxes are slightly different so that suggests that the fix does work.

—Trappist the monk (talk) 12:28, 2 February 2018 (UTC)

I see italicized and unitalicized cuneiform characters above. I think it worked. – Jonesey95 (talk) 14:02, 2 February 2018 (UTC)

I get squares, using Opera 36. It will depend upon the fonts installed on your computer, and will probably depend upon your browser as well - my signature shows a red rose when using Firefox, but a square when using Opera, all other things (operating system, fonts, Wikipedia skin, user CSS/javascript, other user prefs) being equal. --Redrose64 🌹 (talk) 15:55, 2 February 2018 (UTC)

The sandbox version looks correct now! Thanks.

If you want to see the characters, you can download Noto Sans Cuneiform. You might also have to add CSS tied to language attributes (imperfect), if your browser or operating system doesn't automatically connect the font to the Unicode range. (I have Windows 10, and apparently it supports these characters with Segoe UI Historic by default – that's kind of neat.) I myself do nothing with cuneiform, but like to be able to see the characters. — Eru·tuon 21:09, 2 February 2018 (UTC)

live module updated.

—Trappist the monk (talk) 12:27, 3 February 2018 (UTC)

ko-Kore vs. ko-Hani

@Trappist the monk: Template {{lang}} has broken some articles. (see Wikipedia talk:WikiProject Korea#List of townships in South Korea) Could you help me with this? Thanks. --Garam (talk) 08:36, 11 February 2018 (UTC)

I saw the mention of this problem at the WikiProject page. It seems the template didn't like ISO 15924 code Kore ("Korean (alias for Hangul + Han)") anymore. To resolve the immediate error on List of townships in South Korea I replaced it with Hani (since in this case all the entries affected were hanja), though this solution won't be applicable in other cases. 59.149.124.29 (talk) 09:37, 11 February 2018 (UTC)

{{lang}} is not broken. That error message:

{{lang|nocat=yes|ko-Kore|梧城面}} → [梧城面] Error: {{Lang}}: script: kore not supported for code: ko (help)

arises because IANA have determined that the ISO 15924 script code Kore should not be used with language code ko. I presume that IANA believe that writing ko-Kore is just as unnecessary as writing en-Latn. This suggests to me that the correct fix may be to simply remove the script subtag, though, IP editor's fix may be just as valid.

—Trappist the monk (talk) 10:45, 11 February 2018 (UTC)
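The check behind that error message can be sketched as follows, in Python rather than the module's Lua. The table is a small excerpt of Suppress-Script values chosen for illustration, and the function name `check_tag` is hypothetical; only the wording of the error string is taken from the message quoted above.

```python
# Small illustrative excerpt; the IANA registry records a Suppress-Script
# value for many more languages than the handful shown here.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ko": "Kore", "ru": "Cyrl"}


def check_tag(tag):
    """Return an error string for a redundant script subtag, else None."""
    parts = tag.split("-")
    code = parts[0].lower()
    script = parts[1].lower() if len(parts) > 1 else None
    suppressed = SUPPRESS_SCRIPT.get(code)
    if script and suppressed and script == suppressed.lower():
        return "script: {} not supported for code: {}".format(script, code)
    return None
```

Under this sketch `ko-Kore` is rejected for the same reason `en-Latn` would be, while `ko-Hani` and plain `ko` pass, which matches both fixes suggested in this thread.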

Erde, singe

It looks like something changed for better regarding things that should be italic, and in Latin even for those that should not be. For German, however, I still get italic, which is nonsense for "Erde, singe", for the title and for all quotations, at least if we still follow that what is in quotation marks doesn't need italics. (I didn't use the template in the article.) Is there a way to not automatically italicise the output? Or a plan to change that soon? 4 days to do repair on Der 100. Psalm, where at least the title shows properly. - (Forgive me, the answer is probably on this page, but I have no time to search.) --Gerda Arendt (talk) 15:46, 4 January 2018 (UTC)

Remember that templates cannot see that which lies outside of the bounding {{ and }}. For "{{lang|de|[[Erde, singe]]}}", {{lang}} sees the language specifier, de, and the wikilinked text, [[Erde, singe]]. Remember also that the general rule for non-English, latn-script text used in the English Wikipedia, is to italicize. Titles of minor works, proper names, etc, are exceptions to that general rule. The template is intended to render the general-rule case.

In another conversation on this page I suggested that we could create a minor-title language wrapper-template that would take the same two parameters as {{lang}}, would create correct html, and visually render a quoted text in normal font style. That offer was dismissed in the conversation but, lacking a better solution, I have not withdrawn the offer.

I do not know what you mean by 4 days to do repair on Der 100. Psalm, where at least the title shows properly. According to the article history, there have been no changes since this edit on 29 December 2016 by InternetArchiveBot.

—Trappist the monk (talk) 16:09, 4 January 2018 (UTC)

Gerda Arendt, I can't follow what Trappist is doing and mostly don't know what he is talking about. But if you need help fixing the display of a specific template on a specific page I can probably handle that, so "any time at all, all you have to do is call". Justlettersandnumbers (talk) 17:25, 4 January 2018 (UTC)

What is it that you don't understand? If ever you don't understand something that I have written, tell me, perhaps I can find a better form of explanation.

—Trappist the monk (talk) 13:35, 5 January 2018 (UTC)

"Erde, singe" is the title of a "minor work", thus the exception you mention, Trappist the monk. Justlettersandnumbers, can you make that possible? Just title and infobox of that article. --Gerda Arendt (talk) 18:14, 4 January 2018 (UTC)

Related to this discussion, see §latn script inside <poem>...</poem> tags.

—Trappist the monk (talk) 13:35, 5 January 2018 (UTC)

I went ahead and fixed one article manually, but I wrote several hundred. Will the template eventually be back to no forced italics? Can a bot do the work? --Gerda Arendt (talk) 18:47, 23 January 2018 (UTC)

I can't see how a bot could fix wrongly applied italics when {{Lang}} is used, nor how the template code itself could fix it. Either the template gets restored to its previous functionality (show markup as specified by the editor) or hundreds (thousands?) of articles need to be corrected manually – or MOS:ITALIC and MOS:NOITALIC get rewritten. -- Michael Bednarek (talk) 00:19, 26 January 2018 (UTC)

As was explained above, those are not the only two choices. And yes, a bot request could be filed to make a specific adjustment to a few hundred pages. – Jonesey95 (talk) 02:37, 26 January 2018 (UTC)

I do not anticipate that the template will change much from its current functionality.

I am perplexed by your fix to "Traum durch die Dämmerung". In §Poem, before the fix, all of the {{lang}} templates in the last three sentences are wrapped in italic markup (an indication that all of the German text enclosed by the templates should be rendered in italic font):

The third line of the poem describes the walk to meet the woman in first person, after detailing meadows, twilight, the sun and the stars: ''{{lang|de|Nun geh ich zu der schönsten Frau}}'' (Now I go to the most beautiful woman). The subject notes that he is not in a rush: ''{{lang|de|Ich gehe nicht schnell}}'' (I do not go fast). She is not described, but their relationship imagined as a ''{{lang|de|weiches, sammtenes Band}}'' (soft, velvety band), drawing him to ''{{lang|de|der Liebe Land}}'' (the love land), reaching a state of ''{{lang|de|mildes blaues Licht}}'' (mild blue light).

You then fixed the article so that all but one of these templates use |italic=no yet all are still wrapped in italic markup:

The third line of the poem describes the walk to meet the woman in first person, after detailing meadows, twilight, the sun and the stars: ''{{lang|de|Nun geh ich zu der schönsten Frau|italic=no}}'' (Now I go to the most beautiful woman). The subject notes that he is not in a rush: ''{{lang|de|Ich gehe nicht schnell|italic=no}}'' (I do not go fast). She is not described, but their relationship imagined as a ''{{lang|de|weiches, sammtenes Band|italic=no}}'' (soft, velvety band), drawing him to ''{{lang|de|der Liebe Land}}'' (the love land), reaching a state of ''{{lang|de|mildes blaues Licht|italic=no}}'' (mild blue light).

What is the purpose of that? |italic=no unconditionally renders the German text in upright font:

''{{lang|de|mildes blaues Licht|italic=no}}'' → mildes blaues Licht

If you want to control the font with external markup, use |italic=unset, which setting disables template-provided styling:

''{{lang|de|mildes blaues Licht|italic=unset}}'' → mildes blaues Licht

{{lang|de|mildes blaues Licht|italic=unset}} → mildes blaues Licht

See the template documentation at {{lang}}.

—Trappist the monk (talk) 00:41, 31 January 2018 (UTC)
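The three cases discussed in this comment can be summarised in a small Python model. It is only a sketch of the described outcomes for Latn-script text, not the module's code; the function name `rendered_style` and its parameters are hypothetical.

```python
def rendered_style(italic=None, outer_italic=False):
    """Model the final font style of Latn-script text inside {{lang}}.

    italic       -- value of |italic= ("no", "unset"), or None when omitted
    outer_italic -- True when the template call is wrapped in ''...''
    """
    if italic == "no":
        # The template forces upright text regardless of surrounding markup.
        return "upright"
    if italic == "unset":
        # The template applies no styling, so the surrounding markup decides.
        return "italic" if outer_italic else "upright"
    if italic is None:
        # General-rule default for non-English Latn-script text.
        return "italic"
    raise ValueError("value not covered by this sketch: {!r}".format(italic))
```

Read this way, wrapping a template that carries |italic=no in italic markup has no visible effect, which is the point being made about the fix to the article.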

Can that please be the only time that you point out in this detail that yes, I make mistakes? I am sure I made more when I installed the "italic=unset" in Bach cantata. 126 occurrences of "lang", took me more than an hour. --Gerda Arendt (talk) 14:52, 1 February 2018 (UTC)

The automatic italics is also spoiling the "this user speaks [language]" userboxes, such as Template:User fr-2, Template:User ro-2, Template:User de-1, and Template:User es-1, amongst many, many others. It would be nice if there were some way to non-italicize these without having to hunt down and edit each and every last one of them individually. — Rich wales (no relation to Jimbo) 22:31, 1 March 2018 (UTC)

Those userboxes are the old method - if you use the #babel: parser function (which has been around since at least 2013), as in {{#babel:fr-2|ro-2|de-1|es-1}}, they display properly. --Redrose64 🌹 (talk) 23:16, 1 March 2018 (UTC)

OK, thanks. — Rich wales (no relation to Jimbo) 18:19, 3 March 2018 (UTC)

Missing "translit." and "lit." smalltext in grc-gre, other issues

The {{lang-grc-gre}} template does not display the wikilinked "translit." and "lit." in smalltext, nor the commas separating them, unlike {{langx|grc}} and {{langx|el}}. To demonstrate, compare the following:

{{langx|grc|σοφῐ́ᾱ|sophíā|wisdom}} renders to Ancient Greek: σοφῐ́ᾱ, romanized: sophíā, lit. 'wisdom'.
{{langx|el|σοφία|sophía|wisdom}} renders to Greek: σοφία, romanized: sophía, lit. 'wisdom'.
{{lang-grc-gre|σοφία|sophía|wisdom}} renders to Greek: σοφία, translit. sophía, lit. "wisdom".

As a related note, some or all (I didn't check them all) of the more specific templates appear not to conform to the aforementioned wikilinking in "transl.", such as the following:

{{langx|grc-x-classic|σοφῐ́ᾱ|sophíā|wisdom}} renders to Classical Greek: σοφῐ́ᾱ, romanized: sophíā, lit. 'wisdom'.
{{langx|grc-x-hellen|σοφῐ́ᾱ|sophíā|wisdom}} renders to Hellenistic Greek: σοφῐ́ᾱ, romanized: sophíā, lit. 'wisdom'.

I hope someone either fixes these issues or informs me why they should not be changed. I would have posted this on Template talk:Lang-grc-gre, but given how obscure that talkpage is (it's still redlinked!), I decided it would probably receive more attention here. Thanks. ―Nøkkenbuer (talk • contribs) 21:32, 20 March 2018 (UTC)

The grc-gre portion of the name {{lang-grc-gre}} is non-standard so is not supported by Module:Lang. That template doesn't create the translit. and lit. static text in its renderings; it may never have done – a search through the template's history would answer that question. It is my opinion that {{lang-grc-gre}} should go away because the text that it wraps could be any one of several Greek languages; that is ok for the browser/screen reader reason for the template but doesn't serve the human reader very well.

The specific Greek-language templates don't link translit. because Module:Lang requires the existence of the related articles: Romanization of Classical Greek, Romanization of Hellenistic Greek, etc. When and if those articles are created, the specific Greek-language templates will link translit. in the rendering.

—Trappist the monk (talk) 22:10, 20 March 2018 (UTC)

Personally, I'm fine with deleting {{lang-grc-gre}}, especially since I have noticed that most of its usage involves the names of ancient Greeks despite the script including diacritics and spellings (and subsequent transliterations) which have since been changed or lost in later Greek (especially Modern Greek). I only use it because I have seen it used in such instances, and thus I'm trying to adhere to whatever semblance of implicit consensus appears to exist on the matter. I would much rather use a more specific template, though.

Should I even bother continuing my usage of the template? Or just drop it (which I would prefer)? A bit more boldly, would you be interested in trying to delete the template at WP:TFD (or support my attempt)? Or, given the template's usage and transclusion count, it's not worth bothering because it will be dead on arrival?

By the way, thanks for the reply and explanation about the unlinked transl. in the other templates, Trappist the monk. ―Nøkkenbuer (talk • contribs) 00:29, 21 March 2018 (UTC)

I will support you if you decide to tfd {{lang-grc-gre}}. If I had Greek I'd also help with fixing its transclusions but alas, I do not know how to distinguish one form of Greek from another.

—Trappist the monk (talk) 10:53, 21 March 2018 (UTC)

Implement shorthand parameters

This needs to use smarter negative detection, to support shorthand answers. Should also support a shorthand parameter name:

{{lang|de|Ich Bin Ein Auslander|italics=no}} → Ich Bin Ein Auslander
{{lang|de|Ich Bin Ein Auslander|italics=n}} → Ich Bin Ein Auslander
{{lang|de|Ich Bin Ein Auslander|italic=no}} → Ich Bin Ein Auslander
{{lang|de|Ich Bin Ein Auslander|italic=n}} → Ich Bin Ein Auslander
{{lang|de|Ich Bin Ein Auslander|i=no}} → Ich Bin Ein Auslander
{{lang|de|Ich Bin Ein Auslander|i=n}} → Ich Bin Ein Auslander

Similarly, |cat=n and |c=n should produce the same output as the unintuitive |nocat=y.

— SMcCandlish ☏ ¢ 😼 20:21, 22 March 2018 (UTC)
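The proposal above could look something like the following Python sketch. It models the requested behaviour only; nothing here is taken from the module, and the alias tables and the function name `normalize` are hypothetical.

```python
NEGATIVE = {"no", "n"}
AFFIRMATIVE = {"yes", "y"}
ITALIC_ALIASES = ("italic", "italics", "i")
CAT_ALIASES = ("cat", "c")


def normalize(params):
    """Map shorthand parameter names and values onto canonical ones."""
    out = {}
    for name in ITALIC_ALIASES:
        if name in params:
            value = params[name].strip().lower()
            # "n" and "no" both mean no italics; other values pass through.
            out["italic"] = "no" if value in NEGATIVE else value
            break
    for name in CAT_ALIASES:
        if name in params and params[name].strip().lower() in NEGATIVE:
            # cat=n / c=n are treated as the equivalent of nocat=y.
            out["nocat"] = "yes"
    if params.get("nocat", "").strip().lower() in AFFIRMATIVE:
        out["nocat"] = "yes"
    return out
```

With a mapping like this, all six |italic= spellings listed above would reduce to the same canonical setting before any rendering decision is made.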

tool tip for {{lang}}

Unlike the renderings of the {{lang-??}} templates, the rendering of {{lang}} does not inherently give readers any indication of the rendered text's language. I have implemented a tool tip for {{lang}} so that floating the pointer over the text will show the language in a tool tip:

{{lang|ru-latn|tûndra}} → tûndra

tûndra

—Trappist the monk (talk) 11:04, 30 March 2018 (UTC)

italic markup error messaging

I have restored the italic markup error messaging to {{lang}}. That functionality was disabled while most {{lang-??}} called the {{lang}} template with hard-coded italic markup. Now, most {{lang-??}} templates call Module:lang without hard-coded italic markup. Before making this change, I spent the better part of a week or more with an AWB script fixing the obvious {{lang}} template instances in article space that included italic markup – some 9000+ articles. But, Cirrus search is imperfect so, now that the error messages are enabled, Category:Lang and lang-xx template errors is refilling (I had also emptied it).

—Trappist the monk (talk) 14:37, 14 April 2018 (UTC)

Please take a look at Fire and Rain (novel), which uses {{Infobox name module}}. I played around in the template's sandbox, but didn't get very far.

I also adjusted {{Infobox book}}, which should remove a few hundred pages from the error category and prevent a few hundred more from ever getting in there. – Jonesey95 (talk) 15:57, 15 April 2018 (UTC)

The problem with that use of {{infobox name module}} is that this:

{{infobox name module|traditional={{linktext|煙雨|濛濛}}|simplified={{linktext|烟雨|濛濛}}|pinyin=Yānyǔ méngméng|translation=Misty Rain}}

produces all of this:

<tr><th scope="row" class="infobox-label">Literally</th><td class="infobox-data">''Misty Rain''</td></tr>

and all of that feeds the {{{2|}}} parameter in {{lang}} in {{infobox book}}. Note where the opening bracket of the error message is.

So, here we have: Chinese traditional script, Chinese simplified script, Chinese Latin (pinyin) script, italicized English translation (which is the cause of the error message), and miscellaneous English html and wiki markup; three scripts and two languages, both identified as Chinese. Much of that wiki and html markup does not belong inside {{lang}} ever.

Editors at Fire and Rain (novel) should pick one language and one script for the {{infobox book}} |title_orig= parameter.

With regard to the change that you made to {{infobox book}}: I thought of making that same change but chose not to because MOS:FOREIGNITALIC. The correct solution is, I think, to always apply italics (|italic=yes) except when |orig_lang_code= specifies a CJK language (unless the |title_orig= is using Latn script). You can see why I have elected to defer acting on a fix to {{infobox book}}.

—Trappist the monk (talk) 17:15, 15 April 2018 (UTC)

It looks like I was wise not to try to hammer a "fix" into Infobox name module. Ugly.

At that MOS page, I see titles of major works that should be italicized are italicized in Latin, Greek, Cyrillic, and Hebrew scripts. From my perusal of uses of Infobox book, it appears to me that we are following that guidance. What am I missing? As you know, I can be pretty clever, but I am not always smart, and I miss stuff. – Jonesey95 (talk) 17:43, 15 April 2018 (UTC)

And now for the rest of the story:

...(but not in Chinese, Japanese, or Korean). (references omitted)

And, {{lang}} only applies automatic italics if all characters in {{{2|}}} are Latn-script characters (except when code is en) so Greek, Cyrillic, and Hebrew scripts are not auto-italicized.

MOS:FOREIGNITALIC does not, it appears, address the question of Arabic, Devanagari, Thai, ... scripts unless the intended meaning of the whole statement (your quoted part plus my quoted part) is that all language scripts except CJK are to be italicized when used as major work titles.

—Trappist the monk (talk) 18:09, 15 April 2018 (UTC)

IANA lists

19 different Chinese language codes (excluding csl (Chinese Sign Language)):
zh, cdo (Min Dong), cjy (Jinyu), cmn (Mandarin), cpi (Chinese Pidgin English), cpx (Pu-Xian), czh (Huizhou), czo (Min Zhong), gan (Gan), hak (Hakka), hsn (Xiang), ltc (Late Middle), lzh (Literary), mnp (Min Bei), nan (Min Nan), och (Old), wuu (Wu), yue (Yue, Cantonese), zhx (Chinese (family))
3 Japanese language codes (excluding jsl (Japanese Sign Language)):
ja, jpx (Japanese (family)), ojp (Old)
3 Korean language codes (excluding kvk (Korean Sign Language)):
ko, okm (Middle Korean (10th-16th cent.)), oko (Old Korean (3rd-9th cent.))

—Trappist the monk (talk) 18:09, 15 April 2018 (UTC)

I have created Module:In lang as a place for code bits and pieces related to Module:lang. The only code in it is the function set_italics() which should enable {{Infobox book}}, {{infobox document}} and the like to correctly set italics for non-English titles:

{{#invoke:In lang|set_italics|zh|烟雨}} → should be no

{{#invoke:In lang|set_italics|zh|Yānyǔ méngméng}} → should be yes (pinyin)

{{#invoke:In lang|set_italics|en|Test}} → should be yes

{{#invoke:In lang|set_italics|ru|тундра}} → should be yes

Usage is:

{{lang|zh|烟雨|italics={{#invoke:In lang|set_italics|zh|烟雨}}}} → 烟雨

{{lang|zh|Yānyǔ méngméng|italics={{#invoke:In lang|set_italics|zh|Yānyǔ méngméng}}}} → Yānyǔ méngméng

—Trappist the monk (talk) 22:27, 15 April 2018 (UTC)
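
The rule described above can be sketched outside the wiki. The following Python is only an illustration and is not the code in Module:In lang: the function names are made up for the example, the set of codes is simply the IANA list quoted earlier, and "Latin" is approximated by checking that each letter's Unicode character name starts with LATIN.

```python
import unicodedata

# Assumption for illustration: the Chinese, Japanese, and Korean codes
# from the IANA list quoted above are the ones that suppress italics.
CJK_CODES = {
    "zh", "cdo", "cjy", "cmn", "cpi", "cpx", "czh", "czo", "gan", "hak",
    "hsn", "ltc", "lzh", "mnp", "nan", "och", "wuu", "yue", "zhx",
    "ja", "jpx", "ojp",
    "ko", "okm", "oko",
}


def looks_latin(text):
    """Rough test: every letter has a Unicode name beginning with LATIN.

    Non-letters (spaces, punctuation, combining marks) are ignored.
    """
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


def set_italics(code, text):
    """Return 'no' for CJK text that is not romanized, otherwise 'yes'."""
    if code in CJK_CODES and not looks_latin(text):
        return "no"
    return "yes"


print(set_italics("zh", "烟雨"))            # CJK code, Han characters
print(set_italics("zh", "Yānyǔ méngméng"))  # CJK code, pinyin
print(set_italics("ru", "тундра"))          # non-CJK code
```

With these assumptions the four cases listed above come out as no, yes, yes, yes.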

I have discovered and, I think, fixed an italic detection bug in Module:Lang. The italic markup in this:

{{lang|it|''Le stragi nascoste. L'armadio della vergogna...''}}

should have been detected and flagged as an error. The detection mechanism was defeated by the single apostrophe in L'armadio. The new version correctly detects the markup:

[Le stragi nascoste. L'armadio della vergogna...] Error: {{Lang}}: text has italic markup (help)

A common markup error not detected by the previous version is the case where the markup starts or ends inside the template with the matching markup outside:

{{lang|it|Le stragi nascoste. ''L'armadio della vergogna...}}'' – markup begins inside, ends outside. This is now detected:

[Le stragi nascoste. L'armadio della vergogna...] Error: {{Lang}}: text has italic markup (help)

—Trappist the monk (talk) 13:19, 17 April 2018 (UTC)
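
For readers who want to see why a lone apostrophe need not defeat detection, here is a minimal Python sketch. It is not the pattern used in Module:Lang; it assumes the usual wikitext reading in which a run of exactly two apostrophes, or of five or more, switches italics, while runs of one, three, or four do not. The function name is hypothetical.

```python
import re


def has_italic_markup(text):
    """Return True if any run of apostrophes in text would toggle italics.

    Assumed reading of wikitext quotes: 2 = italic, 3 = bold,
    4 = apostrophe + bold, 5 or more = bold italic (plus stray apostrophes).
    """
    for run in re.findall(r"'+", text):
        if len(run) == 2 or len(run) >= 5:
            return True
    return False


# Markup wraps the whole text; the apostrophe in L'armadio is a run of one.
print(has_italic_markup("''Le stragi nascoste. L'armadio della vergogna...''"))
# Markup opens inside the template and closes outside it.
print(has_italic_markup("Le stragi nascoste. ''L'armadio della vergogna..."))
# Only an ordinary apostrophe.
print(has_italic_markup("L'armadio della vergogna"))
```

Because each run is measured separately, unbalanced markup (one opening pair with no closing pair) is caught just as easily as the balanced case.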

Question on markup of romanized CJK, etc.

So very much has changed with this template system (with not all the documentation being updated yet, or even all templates in the series being patched to use the new code), I want to just ask outright:

Is there now a canonical way to markup something like ''[[dōjin]]'' as romanized Japanese, and if so, is this method likely to remain stable for the foreseeable future? — SMcCandlish ☏ ¢ 😼 18:35, 1 June 2018 (UTC)

{{lang|ja-Latn|[[dōjin]]}} → [[dōjin]] → dōjin

{{langx|ja|script=Latn|[[dōjin]]}} → [[[dōjin]]] Error: {{Langx}}: invalid parameter: |script= (help)

—Trappist the monk (talk) 21:29, 1 June 2018 (UTC)

Nice. — SMcCandlish ☏ ¢ 😼 12:23, 5 June 2018 (UTC)

ISO codes not working: hant, hans

The codes for Traditional Chinese and Simplified Chinese – hant and hans, respectively – are making the {{lang}} template cough up a big red error. — SMcCandlish ☏ ¢ 😼 12:22, 5 June 2018 (UTC)

Where? These seem to work:

{{lang|zh-hant|正體字/繁體字}} → 正體字/繁體字

{{lang|zh-hans|简化字}} → 简化字

{{lang-zh|script=hant|正體字/繁體字}} → Chinese: 正體字/繁體字

{{lang-zh|script=hans|简化字}} → Chinese: 简化字

—Trappist the monk (talk) 12:32, 5 June 2018 (UTC)

Ah, I was doing {{lang|hant}}; the codes provided at the articles didn't have the zh- part. Probably need revision for clarity. — SMcCandlish ☏ ¢ 😼 12:36, 5 June 2018 (UTC)

Legitimate italics inside

Another example of this template's attempts to be too clever by half is when the foreign language text is unitalicised on the whole, but contains an italic element. Consider the |quote=-part of {{Cite book}}:

|quote=[... ein neues Opernprojekt in Angriff: Das Käthchen von Heilbronn, nach Heinrich von Kleists gleichnamigem Drama.] Error: {{Lang}}: text has italic markup (help)

At the moment, the only solution is to remove {{Lang}} altogether in such situations, right? -- Michael Bednarek (talk) 01:43, 16 April 2018 (UTC)

|quote=

{{Lang|de|... ''ein neues Opernprojekt in Angriff: ''Das Käthchen von Heilbronn'', nach Heinrich von Kleists gleichnamigem Drama''.|italic=unset}}

|quote=... ein neues Opernprojekt in Angriff: Das Käthchen von Heilbronn, nach Heinrich von Kleists gleichnamigem Drama.

It's in the documentation.

—Trappist the monk (talk) 02:50, 16 April 2018 (UTC)

Trappist the monk: I just came here to ask the exact same question. There is no explanation of |italic=unset in the documentation. Curly "JFC" Turkey 🍁 ¡gobble! 10:54, 27 May 2018 (UTC)

For {{lang}}, are you looking somewhere other than here and here? Many of the {{lang-??}} templates also have similar tables (see Template:lang-fr#Parameters for example).

—Trappist the monk (talk) 11:13, 27 May 2018 (UTC)

You kindly go over articles making changes, why not that one also? I fixed Ernst Pepping, where you fixed duplicate markup, but left two glaring red error messages. --Gerda Arendt (talk) 14:33, 17 April 2018 (UTC)

Because the script is intended to pluck off the low hanging fruit? It does not cannot should not 'fix' everything that editors write; sometimes what they write is correct; sometimes not. The script is a simple-minded machine; it does not know if the editor correctly placed the italic markup in:

{{lang|la|[[Missa Dona nobis pacem|Missa ''Dona nobis pacem'']]}}

so that template got skipped. Had the template been written:

{{lang|la|[[Missa Dona nobis pacem|''Missa Dona nobis pacem'']]}}

it would know that the italic markup is extraneous and would have removed it.

—Trappist the monk (talk) 14:54, 17 April 2018 (UTC)
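
The distinction drawn above (strip italic markup only when it wraps the whole text or the whole link label, otherwise skip) might look something like this in Python. This is a guess at the idea, not the AWB script itself; the function name and the regular expression are invented for the example.

```python
import re

# [[target|label]] with nothing before or after it.
PIPED_LINK = re.compile(r"^\[\[([^\[\]|]+)\|(.+)\]\]$")


def strip_whole_italics(text):
    """Remove '' markup only when it encloses the entire text or link label.

    Returns the cleaned text, or None when the case is ambiguous and
    should be left for a human editor.
    """
    m = PIPED_LINK.match(text)
    label = m.group(2) if m else text
    core = label[2:-2]
    wrapped = (
        len(label) > 4
        and label.startswith("''")
        and label.endswith("''")
        and "''" not in core            # no further markup inside
        and not core.startswith("'")    # not bold (''') at the start
        and not core.endswith("'")      # not bold (''') at the end
    )
    if not wrapped:
        return None
    return f"[[{m.group(1)}|{core}]]" if m else core


print(strip_whole_italics("[[Missa Dona nobis pacem|''Missa Dona nobis pacem'']]"))
print(strip_whole_italics("[[Missa Dona nobis pacem|Missa ''Dona nobis pacem'']]"))
```

The first call returns the link without the markup; the second returns None, which corresponds to the template being skipped.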

How about that script assuming in good faith that the italic markup is correct? When I used the lang-template so far, I always placed italics "outside" when they were for the complete phrase, but inside when not (Bach works or this). I noticed that "unset" is now added to Bach's works, - thank you! One thing less I have to do. There are still all the hymns and poems that shouldn't be italic, and now Michael Bednarek said movement titles also shouldn't be italic, see BWV 54. --Gerda Arendt (talk) 16:02, 17 April 2018 (UTC)

Of the nearly ten thousand articles that the script has touched, many many included markup that to a human is obviously wrong. Because the script is mindless, it cannot necessarily distinguish that which humans see as the obviously incorrect from that which humans see as the obviously correct. It is better that the script ignores that which it cannot correct. The BWV 'fixes' were/are something that is easy for the script to fix because many/most of those template instances follow a common form. Any template instances that do not follow the common form have been ignored. Do not blame the script for doing anything wrong when it removed the italic markup from the movement titles in BWV 54. Someone thought it correct that movement titles should be italicized (contrary to the current version of MOS:MUSIC#Classical music titles) else they would not have added the italic markup around those templates. I have written this before: the script is not, cannot be, smart enough to know that a {{lang}} template wraps a minor title so it cannot shall not 'correct' templates in that way.

—Trappist the monk (talk) 16:50, 17 April 2018 (UTC)

Sorry about my English. I didn't "blame" the script for BWV 54. I just learned today (after 7 years of only following examples, not guidelines) about movement titles not italic, which adds to my workload, not the script's. - Back to AGF: Missa and then some italic name might be also a "common form", same for symphony with a name before or after. --Gerda Arendt (talk) 17:47, 17 April 2018 (UTC)

This search for 'missa' at the start of {{lang}} {{{2}}} provides a handful of results (28 when I did the initial search); a lot of them with external italic markup. Running the same search with 'symphony' in place of 'missa' yielded zero results.

—Trappist the monk (talk) 18:20, 17 April 2018 (UTC)

Thanks for looking. Pierre de la Rue: several of these "Missa + some name" (which was the common way to name masses at the time), but firstly, they failed to keep Missa straight (as generic), and secondly, they failed to use the lang-template. --Gerda Arendt (talk) 18:27, 17 April 2018 (UTC)

possible alternate solution

It occurred to me that it might be a handy thing to give the module the ability to do what I did in my first reply to the parent topic above. Module:lang/sandbox now supports |italic=invert so that this:

|quote={{Lang/sandbox|de|... ein neues Opernprojekt in Angriff: ''Das Käthchen von Heilbronn'', nach Heinrich von Kleists gleichnamigem Drama.|italic=invert}}

renders like this:

|quote=... ein neues Opernprojekt in Angriff: Das Käthchen von Heilbronn, nach Heinrich von Kleists gleichnamigem Drama.

I suspect that for many editors it is easier to identify and markup the bits of text that are emphasized than it is to markup the whole to get inverted emphasis.

Keep or discard?

—Trappist the monk (talk) 10:26, 3 May 2018 (UTC)
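
To make the effect of |italic=invert concrete, here is a small Python sketch that renders '' markup to HTML with and without inversion. It is an illustration only, not how Module:lang implements the parameter; it ignores bold markup and assumes the '' pairs are balanced. The function name is hypothetical.

```python
def render_italics(text, invert=False):
    """Render wikitext '' markup as <i> tags.

    Segments between '' markers alternate between upright and italic.
    With invert=True the alternation starts in italic, so the marked
    segments become the upright ones.
    """
    out = []
    for i, part in enumerate(text.split("''")):
        if not part:
            continue
        italic = (i % 2 == 1) != invert
        out.append(f"<i>{part}</i>" if italic else part)
    return "".join(out)


quote = "... in Angriff: ''Das Käthchen von Heilbronn'', nach Kleist."
print(render_italics(quote))               # only the title is italic
print(render_italics(quote, invert=True))  # everything except the title is italic
```

The editor marks up the same short span in both cases; only the starting state changes.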

Keep, and thanks for the thinking, but I won't use it. I see it in every DYK credit, and it looks confusing. - Sometimes I think we overestimate our readers' familiarity with our rather complex rules for italics. When something is obviously a quote, why italics on top? --Gerda Arendt (talk) 10:55, 3 May 2018 (UTC)

How can you see it in every DYK credit? |italic=invert is a new parameter only available in {{lang/sandbox}} and the handful of {{lang-??/sandbox}} templates.

When something is obviously a quote, why italics on top? Because MOS:FOREIGNITALIC and Wikipedia:Manual of Style#Foreign-language quotations. If you have an issue with the rule then the place to take that up is at WT:MOSTEXT or WT:MOS.

—Trappist the monk (talk) 11:33, 3 May 2018 (UTC)

There having been no objections, implemented in the live module.

—Trappist the monk (talk) 12:05, 27 May 2018 (UTC)

Asking for a flag to disable this "feature"

If this "Italic Campaign" is going to stay with us, please just give us a flag to tell the template "I know what I am doing, shut up and let me use markup". I am currently trying to use the lang template for Hittite.[1] Hittite, depending on the transliteration scheme, uses mixed italic and non-italic to distinguish Sumerograms from Akkadian loans within Hittite text. You simply cannot let the template police this usage of italics, it needs to be entered manually. If nothing else, please at least disable any italics related stuff if the language is Hittite, Ancient Egyptian, Akkadian or Sumerian. --dab (𒁳) 06:18, 12 June 2018 (UTC)

|italic=unset:

{{lang|hit-latn|mixed ''italic'' and normal font ''text''|italic=unset}}

mixed ''italic'' and normal font ''text''

mixed italic and normal font text

{{langx|hit|script=latn|mixed ''italic'' and normal font ''text''|italic=unset}}

[mixed italic and normal font text] Error: {{Langx}}: invalid parameter: |script= (help)

It's in the documentation. Does |italic=unset not work for you?

—Trappist the monk (talk) 09:49, 12 June 2018 (UTC)

Gurmukhī script code

I just tried marking some Punjabi as being in Gurmukhī script and I got an error. pa-Guru should be accepted as a legitimate IETF language tag; we shouldn't assume Gurmukhī is an implicit default:

{{lang|pa|मतलब}} → मतलब

{{lang|pa-Guru|मतलब}} → [मतलब] Error: {{Lang}}: script: guru not supported for code: pa (help)

Thanks. — OwenBlacker (talk; please {{ping}} me in replies) 09:57, 17 June 2018 (UTC)

From language-subtag-registry file:

Type: language
Subtag: pa
Description: Panjabi
Description: Punjabi
Added: 2005-10-16
Suppress-Script: Guru

See BCP 47 - Tags for Identifying Languages §3.1.9 for the definition of the Suppress-Script record. Module:Lang emits an error message for pa-Guru because IANA have chosen to suppress that script subtag for the Punjabi language tag. The only subtag that affects how {{lang}} renders a language text is Latn; this is to comply with MOS. No other script subtags affect how {{lang}} renders the language text. Because the browsers and screen readers depend upon correctly formatted lang= attributes, {{lang}} does not create a non-compliant ... subtag.

—Trappist the monk (talk) 10:43, 17 June 2018 (UTC)
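
The Suppress-Script check can be sketched as follows. This Python is a simplified illustration, not Module:Lang's code: the table holds only the one record quoted above, the function name is hypothetical, and the message text is copied from the error shown in this thread.

```python
# Only the record quoted above; the real registry has many more entries.
SUPPRESS_SCRIPT = {"pa": "Guru"}


def check_tag(tag):
    """Return an error string if the tag names a suppressed script, else None.

    Simplified parsing: the first subtag is the language code and a
    four-letter second subtag is treated as the script.
    """
    subtags = tag.split("-")
    code = subtags[0].lower()
    script = None
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1].lower()
    suppressed = SUPPRESS_SCRIPT.get(code)
    if script and suppressed and script == suppressed.lower():
        return f"script: {script} not supported for code: {code}"
    return None


print(check_tag("pa-Guru"))  # the suppressed combination
print(check_tag("pa"))       # bare language code is fine
```

A script subtag that is not the suppressed one for its language passes through unchanged in this sketch.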

@Trappist the monk: Oh of course facepalms. Thank you; that's really obvious now it's been pointed out to me (again). Thank you. — OwenBlacker (talk; please {{ping}} me in replies) 13:47, 29 June 2018 (UTC)

How to use the template with mixed words?

I was trying to use the template {{lang|es}} with the word "Guatemala" in [[Guatemala City]] but how can I use it without breaking the link and given that the word "City" is not Spanish? Can it be used in [[El Rodeo, Escuintla|El Rodeo]] and [[Escuintla Department|Escuintla]] and if so, how? Thinker78 (talk) 03:16, 6 June 2018 (UTC) Edited 03:31, 6 June 2018 (UTC)

How about [[Guatemala City|{{lang|es|italic=no|Guatemala}} City]] → Guatemala City? – Jonesey95 (talk) 03:35, 6 June 2018 (UTC)

@Thinker78: Jonesey95 has the right solution (and sometimes you have to add |nocat=y, because the template won't render right in an image caption, etc., without it). But that's not appropriate in this case because a) Guatemala is a name assimilated into English (with not very close to the Spanish pronunciation), and b) "Guatemala City" isn't a Spanish construction (like Ciudad Guatemala), nor a mixed-language one (like "Munich (München in German)"), it's simply English, albeit with a name borrowed straight from Spanish without spelling alteration. Similarly, we do not apply such markup to proper names of people and other things just because they didn't originate in English; the lead at Enrique Peña Nieto does not begin with '''{{lang|es|italic=unset|Enrique Peña Nieto}}''' ... – and it shouldn't. — SMcCandlish ☏ ¢ 😼 14:37, 29 June 2018 (UTC)

PS: In fairness, some might actually argue that Enrique Peña Nieto should have the markup, since it might lead to more correct pronunciation in a screen reader (though this still wouldn't affect the "Guatemala City" case). That idea would surely need a broad RfC, and be backed up with proof that it actually works. Even then, I think the proposal would fail, because it's not an accessibility requirement but an enhancement, and it would notably complicate the wikimarkup. I don't think the editorship at large would go for it. We're not even using <dfn>Enrique Peña Nieto</dfn> for title markup in lead's opening sentence yet (though of course if we moved to such semantic markup, we'd do it with a short-named template). — SMcCandlish ☏ ¢ 😼 15:37, 29 June 2018 (UTC)

Italicisation of Halkomelem

Moved from Module talk:Language#Italicisation of Halkomelem.

The language Halkomelem uses Americanist phonetic notation as a writing system, which is based on the Latin script. But {{lang|hur|...}} doesn't italicise text. I've not quite worked out where I'd need to edit things anymore and I'm pretty sure I don't have the right permissions (Template editor, I assume?), so could someone please cause hur to be italicised? — OwenBlacker (talk; please {{ping}} me in replies) 23:02, 17 May 2018 (UTC)

@Trappist the monk and Eru·tuon: Any thoughts on this one, btw? — OwenBlacker (talk; please {{ping}} me in replies) 13:56, 29 June 2018 (UTC)

This search found these (some had |italic=yes, others had external italic wikimarkup – both removed here):

lá:yelhpꭥ
Quw̓utsun̓ – uses U+0313 COMBINING COMMA ABOVE, a combining diacritical marks character
Hul̓q̓umín̓um̓ / hən̓q̓əmin̓əm̓ – these use U+0259 LATIN SMALL LETTER SCHWA, an IPA extension character
xʷməθkʷəy̓əm – uses U+0259, U+0313; U+03B8 GREEK SMALL LETTER THETA, a Greek and Coptic character; U+02B7 MODIFIER LETTER SMALL W, a spacing modifier letter character
hən̓q̓əmin̓əm – uses U+0259, U+0313;
Hul’q’umi’num’/Halq'eméyle/hən̓q̓əmin̓əm – uses U+0259, U+0313; U+2019 'RIGHT SINGLE QUOTATION MARK', a general punctuation character
sc̓əwaθən məsteyəxʷ – uses U+0259, U+0313, U+03B8, U+02B7
c̓əsnaʔəm – uses U+0259, U+0313; U+0294 LATIN LETTER GLOTTAL STOP, an IPA extension character

Halkomelem may be based on the Latin script but clearly it uses characters that are not in the Latin script.

The {{lang}} auto-italics function is conservative. It will only italicize the enclosed text when all of the characters of that text are members of the Latn script plus the very common characters mdash, ndash, and guillemets.

There are several options for italicizing Halkomelem text:

{{lang|hur-Latn|Quw̓utsun̓}} → Quw̓utsun̓

{{lang|hur|Quw̓utsun̓|italic=yes}} → Quw̓utsun̓

''{{lang|hur|Quw̓utsun̓}}'' → Quw̓utsun̓

{{lang-hur}} italicizes by default so there is another option:

{{langx|hur|Quw̓utsun̓}} → Halkomelem: Quw̓utsun̓ → {{langx|hur|Quw̓utsun̓|label=none}} → Quw̓utsun̓

—Trappist the monk (talk) 15:09, 29 June 2018 (UTC)

Something like the getScript function from Module:Language/scripts would help in many cases. It looks up the Unicode scripts for the characters in a string and returns the one script that is not ignorable (if there is only one). Zinh (Inherited) and Zyyy (Common) are the most common scripts that need to be ignored. To show how this works, this is the script breakdown of the examples above. Latn (7), Zyyy (1) means the example has 7 Latn characters and 1 Zyyy character.

Script error: No such module "Language/scripts".

Most have only Latin (Latn) and various ignorable scripts beginning in Z, so getScript would return Latn. So if the function were used, the only examples that would need manual italicization would be xʷməθkʷəy̓əm and sc̓əwaθən məsteyəxʷ because they contain a Greek (Grek) character. (Perhaps that could be handled by returning the "majority" unignorable script, but that might have unintended consequences in some cases.)

The current strategy is to add some characters with ignorable script codes, such as em- and en-dash, to the Latin script pattern. But there are a lot of these characters, so it would be more robust to use full script detection. — Eru·tuon 17:06, 29 June 2018 (UTC)

Methinks there's a bug in your code or I don't understand why certain Latn characters are 'ignorable' (I'm sure I don't understand inherit). My example above:

{{lang|hur|lá:yelhp}} → lá:yelhp

correctly italicized the text because all characters in it are members of the Unicode Latn character set. If I remove the colon from the example text and apply showScripts, this:

{{#invoke:Language/scripts|showScripts|lá:yelhp|láyelhp}}

Script error: No such module "Language/scripts".

This is an astonishing result.

—Trappist the monk (talk) 17:31, 29 June 2018 (UTC)

If there aren't a lot of languages for which this "not quite pure Latin alphabet" issue comes up, it might be most expedient to just make hard-coded exceptions for them. They're "Latin enough" that MOS:FOREIGN will want them italicized, but it looks like the back-end detection is doing something weird. — SMcCandlish ☏ ¢ 😼 17:51, 29 June 2018 (UTC)

@Trappist the monk: The colon (:) isn't actually a member of the Unicode Latin script (Latn). It's a Common-script character (Zyyy). I should clarify: above I was not showing the output of getScript, but rather of a function that counts the number of codepoints in a string that belong to each script. getScript takes that output, removes ignorable scripts, and if there is only one script left, returns that script. So it would return Latn for lá:yelhp, but would return nothing for xʷməθkʷəy̓əm because it contains the one pesky Greek character θ along with all the Latin characters. I suppose I should create a clearer demonstration of that process. — Eru·tuon 18:12, 29 June 2018 (UTC)

For an explanation of Inherited (Zinh) and Common (Zyyy), see Unicode® Standard Annex #24, Unicode Script Property, section 2.4. In Hul̓q̓umín̓um̓ / hən̓q̓əmin̓əm̓, the slash and space are Common characters and the combining comma is Inherited. In text formatted by {{lang}}, probably many of the Common characters encountered will be punctuation and spacing characters, and most of the Inherited characters will be diacritics. — Eru·tuon 19:56, 29 June 2018 (UTC)
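
The count-then-filter process described above can be imitated in Python. This is only an approximation, not the code in Module:Language/scripts: Python's standard library has no Script property, so the sketch guesses the script from the character name and general category (combining marks count as Zinh, unnamed code points as Zzzz, names starting with LATIN or GREEK as Latn or Grek, everything else as Zyyy). A few real Latn characters such as ʷ end up as Zyyy under this guess, which does not change the outcome because Zyyy is ignored anyway.

```python
import unicodedata
from collections import Counter

IGNORABLE = {"Zinh", "Zyyy", "Zzzz"}


def rough_script(ch):
    """Guess a script code from the character name (approximation only)."""
    name = unicodedata.name(ch, "")
    if not name:
        return "Zzzz"
    if unicodedata.category(ch).startswith("M"):
        return "Zinh"
    if name.startswith("LATIN"):
        return "Latn"
    if name.startswith("GREEK"):
        return "Grek"
    return "Zyyy"


def count_scripts(text):
    """Count the characters of text per (guessed) script."""
    return Counter(rough_script(ch) for ch in text)


def get_script(text):
    """Return the single non-ignorable script, or None if there is not exactly one."""
    scripts = set(count_scripts(text)) - IGNORABLE
    return scripts.pop() if len(scripts) == 1 else None


print(count_scripts("l\u00e1:yelhp"))  # lá:yelhp
print(get_script("l\u00e1:yelhp"))
print(get_script("x\u02b7m\u0259\u03b8k\u02b7\u0259y\u0313\u0259m"))  # xʷməθkʷəy̓əm
```

The colon is counted but ignored, so the first word resolves to Latn, while the Greek theta in the second word leaves two candidate scripts and no result.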

I've created an is_Latin function in Module:Unicode data. (Not sure whether to name it that or is_Latn.) It returns true if all characters in a string belong either to the Latn script, or to the not-really-a-script scripts Zinh (Inherited), Zyyy (Common), Zzzz (Uncoded), which notably include punctuation, spaces, combining, and unassigned characters. For examples, see Module talk:Unicode data/testcases.

Except for U+AB65 ꭥ GREEK LETTER SMALL CAPITAL OMEGA, any character recognized as Latin by the current function (is_latn) will be recognized by the new function as well. The new function will allow additional characters that are used in Latin-script text, such as the combining comma (Zinh) in the Halkomelem examples above, and the interpunct (Zyyy) found, for instance, in Catalan col·legi.

These are the characters matched by the pattern used by the current function, and the Unicode scripts that they belong to:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗȘșȚțȜȝȞȟȠȡȢȣȤȥȦȧȨȩȪȫȬȭȮȯȰȱȲȳȴȵȶȷȸȹȺȻȼȽȾȿɀɁɂɃɄɅɆɇɈɉɊɋɌɍɎɏḀḁḂḃḄḅḆḇḈḉḊḋḌḍḎḏḐḑḒḓḔḕḖḗḘḙḚḛḜḝḞḟḠḡḢḣḤḥḦḧḨḩḪḫḬḭḮḯḰḱḲḳḴḵḶḷḸḹḺḻḼḽḾḿṀṁṂṃṄṅṆṇṈṉṊṋṌṍṎṏṐṑṒṓṔṕṖṗṘṙṚṛṜṝṞṟṠṡṢṣṤṥṦṧṨṩṪṫṬṭṮṯṰṱṲṳṴṵṶṷṸṹṺṻṼṽṾṿẀẁẂẃẄẅẆẇẈẉẊẋẌẍẎẏẐẑẒẓẔẕẖẗẘẙẚẛẜẝẞẟẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾếỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợỤụỦủỨứỪừỬửỮữỰựỲỳỴỵỶỷỸỹỺỻỼỽỾỿⱠⱡⱢⱣⱤⱥⱦⱧⱨⱩⱪⱫⱬⱭⱮⱯⱰⱱⱲⱳⱴⱵⱶⱷⱸⱹⱺⱻⱼⱽⱾⱿ꜠꜡ꜢꜣꜤꜥꜦꜧꜨꜩꜪꜫꜬꜭꜮꜯꜰꜱꜲꜳꜴꜵꜶꜷꜸꜹꜺꜻꜼꜽꜾꜿꝀꝁꝂꝃꝄꝅꝆꝇꝈꝉꝊꝋꝌꝍꝎꝏꝐꝑꝒꝓꝔꝕꝖꝗꝘꝙꝚꝛꝜꝝꝞꝟꝠꝡꝢꝣꝤꝥꝦꝧꝨꝩꝪꝫꝬꝭꝮꝯꝰꝱꝲꝳꝴꝵꝶꝷꝸꝹꝺꝻꝼꝽꝾꝿꞀꞁꞂꞃꞄꞅꞆꞇꞈ꞉꞊ꞋꞌꞍꞎꞏꞐꞑꞒꞓꞔꞕꞖꞗꞘꞙꞚꞛꞜꞝꞞꞟꞠꞡꞢꞣꞤꞥꞦꞧꞨꞩꞪꞫꞬꞭꞮꞯꞰꞱꞲꞳꞴꞵꞶꞷꞸꞹꞺꞻꞼꞽꞾꞿꟀꟁꟂꟃꟄꟅꟆꟇꟈꟉꟊꟋꟌꟍ꟎꟏Ꟑꟑ꟒ꟓ꟔ꟕꟖꟗꟘꟙꟚꟛꟜ꟝꟞꟟꟠꟡꟢꟣꟤꟥꟦꟧꟨꟩꟪꟫꟬꟭꟮꟯꟰꟱ꟲꟳꟴꟵꟶꟷꟸꟹꟺꟻꟼꟽꟾꟿꬰꬱꬲꬳꬴꬵꬶꬷꬸꬹꬺꬻꬼꬽꬾꬿꭀꭁꭂꭃꭄꭅꭆꭇꭈꭉꭊꭋꭌꭍꭎꭏꭐꭑꭒꭓꭔꭕꭖꭗꭘꭙꭚ꭛ꭜꭝꭞꭟꭠꭡꭢꭣꭤꭥꭦꭧꭨꭩ꭪꭫꭬꭭꭮꭯ﬀﬁﬂﬃﬄﬅﬆ！＂＃＄％＆＇（）＊＋，－．／０１２３４５６７８９：；＜＝＞？＠ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ［＼–—«»

Latn (1022), Zyyy (105), Zzzz (29), Grek (1)

Besides the lone Grek character, U+AB65 ꭥ GREEK LETTER SMALL CAPITAL OMEGA, the characters here belong to scripts that are accepted as Latin by the new function. — Eru·tuon 22:07, 30 June 2018 (UTC)

Why does U+AB65 fail when it is part of Latin Extended-E?

I'm inclined to try your code in the live version of the module, and without objection, in a day or two, I'll do that.

—Trappist the monk (talk) 12:06, 1 July 2018 (UTC)

Because not all codepoints in a block whose name contains a script name belong to that script. Latin Extended-E contains Latn (52), Zzzz (10), Zyyy (1), Grek (1) according to my module function. Basic Latin contains Zyyy (76), Latn (52), so it's not majority Latin, though Latin is the most common actual script, as Zyyy is a special script code (i.e., not really a script). See User:Erutuon/Unicode for the scripts in some other blocks. — Eru·tuon 19:45, 1 July 2018 (UTC)

For curiosity's sake, this is the number of officially Latin characters per block for those blocks that contain them:

Basic Latin (U+0000-U+007F): 52 – ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef...
Latin-1 Supplement (U+0080-U+00FF): 64 – ªºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ...
Latin Extended-A (U+0100-U+017F): 128 (all) – ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğ...
Latin Extended-B (U+0180-U+024F): 208 (all) – ƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟ...
IPA Extensions (U+0250-U+02AF): 96 (all) – ɐɑɒɓɔɕɖɗɘəɚɛɜɝɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯ...
Spacing Modifier Letters (U+02B0-U+02FF): 14 – ʰʱʲʳʴʵʶʷʸˠˡˢˣˤ
Phonetic Extensions (U+1D00-U+1D7F): 111 – ᴀᴁᴂᴃᴄᴅᴆᴇᴈᴉᴊᴋᴌᴍᴎᴏᴐᴑᴒᴓᴔᴕᴖᴗᴘᴙᴚᴛᴜᴝᴞᴟ...
Phonetic Extensions Supplement (U+1D80-U+1DBF): 63 – ᶀᶁᶂᶃᶄᶅᶆᶇᶈᶉᶊᶋᶌᶍᶎᶏᶐᶑᶒᶓᶔᶕᶖᶗᶘᶙᶚᶛᶜᶝᶞᶟ...
Latin Extended Additional (U+1E00-U+1EFF): 256 (all) – ḀḁḂḃḄḅḆḇḈḉḊḋḌḍḎḏḐḑḒḓḔḕḖḗḘḙḚḛḜḝḞḟ...
Superscripts and Subscripts (U+2070-U+209F): 15 – ⁱⁿₐₑₒₓₔₕₖₗₘₙₚₛₜ
Letterlike Symbols (U+2100-U+214F): 4 – KÅℲⅎ
Number Forms (U+2150-U+218F): 41 – ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ...
Latin Extended-C (U+2C60-U+2C7F): 32 (all) – ⱠⱡⱢⱣⱤⱥⱦⱧⱨⱩⱪⱫⱬⱭⱮⱯⱰⱱⱲⱳⱴⱵⱶⱷⱸⱹⱺⱻⱼⱽⱾⱿ
Latin Extended-D (U+A720-U+A7FF): 158 – ꜢꜣꜤꜥꜦꜧꜨꜩꜪꜫꜬꜭꜮꜯꜰꜱꜲꜳꜴꜵꜶꜷꜸꜹꜺꜻꜼꜽꜾꜿꝀꝁ...
Latin Extended-E (U+AB30-U+AB6F): 52 – ꬰꬱꬲꬳꬴꬵꬶꬷꬸꬹꬺꬻꬼꬽꬾꬿꭀꭁꭂꭃꭄꭅꭆꭇꭈꭉꭊꭋꭌꭍꭎꭏ...
Alphabetic Presentation Forms (U+FB00-U+FB4F): 7 – ﬀﬁﬂﬃﬄﬅﬆ
Halfwidth and Fullwidth Forms (U+FF00-U+FFEF): 52 – ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆ...

Some pretty odd things are classified as Latin: IPA characters, other phonetic symbols, Egyptological symbols (Ꜣ), rarely-used Roman numerals (Ⅳ), and a whole lot of stuff that I know nothing about.

In Basic Latin, only A-Z and a-z are classified as Latin, which adds up to 52 (26 × 2). Everything else in the block is Zyyy. — Eru·tuon 19:37, 3 July 2018 (UTC)
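
The Basic Latin figure is easy to check by hand. The Python below counts code points in a range whose Unicode name starts with LATIN. That is a stand-in for the real Script property, which the standard library does not expose, so it should not be expected to reproduce every number in the list above (for example, ª and º are Latn by script but their names do not start with LATIN). The function name is made up for the example.

```python
import unicodedata


def latin_named_in_block(start, end):
    """Count code points in [start, end] whose name begins with LATIN."""
    return sum(
        1
        for cp in range(start, end + 1)
        if unicodedata.name(chr(cp), "").startswith("LATIN")
    )


# Basic Latin: only A-Z and a-z qualify, 26 x 2 = 52.
print(latin_named_in_block(0x0000, 0x007F))
```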

Shorter way to turn off italicization

|italics=unset takes a lot of space in articles with lots of tables, like Proto-Italic language. At the moment there are 876 instances of |italic=unset or |italics=unset (13 to 14 bytes per template) there, for a total of at least 11388 bytes.

There are two tactics for saving space that I can think of. Adding the alias |i= and changing |italic[s]=unset to |i=unset would bring this down to 7008 bytes (that is, 8 bytes per template). Creating a template that doesn't add italics (like {{lang|..|..}}) could save more. For instance, if we named such a template {{langu}}, it would save 12 to 13 bytes per template over {{lang}} with |italic[s]=unset. That would complicate maintenance somewhat by adding yet another template name.

A draft of a way of including |i= and checking that at most one italics parameter has a non-empty value is in Module:Lang/sandbox. — Eru·tuon 19:13, 8 July 2018 (UTC)
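
The byte counts above can be reproduced with a few lines of Python. The instance count of 876 is taken from the comment; everything else is just the length of each parameter string.

```python
INSTANCES = 876  # count quoted above for Proto-Italic language


def total_bytes(param):
    """Bytes used by one parameter string repeated across all instances."""
    return INSTANCES * len(param.encode("utf-8"))


print(total_bytes("|italic=unset"))   # lower bound for the current wikitext
print(total_bytes("|i=unset"))        # with the proposed short alias

# A {{langu}} template adds one byte to the name and drops the parameter.
print(len("|italic=unset") - 1, len("|italics=unset") - 1)
```

The first two lines give 11388 and 7008, and the last gives the 12 to 13 bytes saved per template.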

A third method is to create templates to help generate these tables, or module functions. For instance, I created module functions to make it easier to add language-tagging and other stuff to tables of doublets on Wiktionary: see wikt:Category:Lists of doublets for examples. That takes more work, though. — Eru·tuon

I don't think that I support |i=. Short single-character parameter names may be fine for those who have knowledge of the template, for those who have read the documentation; not so much for those who haven't, or won't. For them, the parameter |i=unset is pretty much inscrutable. At least with |italic=unset, the editor knows that the unset value has something to do with how the template renders its display.

A template that specifically sets |italic=unset might be the better solution if space is critical. I'm not sure that space is critical. I get that it is an extra parameter to set so for large quantities of template that all have the same settings, perhaps a template {{itcxp}} which internally might look like this:

{{#invoke:lang|lang|italic=unset|itc-x-proto|text={{{1}}}}}

so you write:

{{itcxp|text}}

And, of course, the argument that I made against |i= can be applied here: what the hell does itcxp mean? We have the space, there is no real need for cryptic template and parameter names other than editor convenience.

—Trappist the monk (talk) 00:02, 9 July 2018 (UTC)

I'm used to short parameter names (as well as short template names) because a lot of them are used very often on Wiktionary. As an example, this is how you'd create an overly verbose link to the nonexistent anchor #Ancient_Greek-noun on wikt:λόγος with annotations (transliteration, gender, translation, part of speech) in parentheses following it: {{m|grc|λόγος|id=noun|tr=lógos|g=m|t=speech, reason|pos=noun}}. The parameter names are cryptic, but having short parameter names makes the content of the parameters more salient when you are reading the wikitext. This particular set of parameters is used in many templates on Wiktionary. All that is not to say that Wikipedia should be like Wiktionary, but to give some context. (Wikipedia has a lot more and more varied templates, so I suppose short parameter names have to be used sparingly, if at all.)

|i= seems clear to me, because it reminds me of the HTML tag that adds italics. And at least at the moment, the only parameter names that start with i are |italic= and |italics=, so maybe it wouldn't be hard to guess that it is an abbreviation of one of those.

While writing {{wikt-lang}}, I first added |italics=no to turn off italics (equivalent to {{lang}}'s |italics=unset), but eventually added the shortcut |i=-, because I realized that in some articles italicization would need to be turned off so often that the long parameter name would be distracting. (I was starting work on List of Latin and Greek words commonly used in systematic names when I added the shorter parameter alias. Other examples are Germanic strong verb and Help:IPA/Estonian and Finnish.) Using the hyphen to signify removal of italics was inspired by a parameter on Wiktionary, |tr=-, which turns off automatic transliteration in many templates. I don't know how obvious the meaning would be to the average Wikipedia editor. — Eru·tuon 05:00, 9 July 2018 (UTC)

There's not anything wrong with short parameter names as aliases to the longer ones; we can just have a bot convert them to the long ones, or add it to WP:GENFIXES. Lots and lots of templates have shortcut parameter aliases. — SMcCandlish ☏ ¢ 😼 02:38, 9 July 2018 (UTC)

Well, short parameter names help with typing (though I have mostly been using find-and-replace in my recent edits adding {{lang}}), but I envisioned them as a way to make the wikitext easier to read. Having a bot replace them with the longer ones would defeat that purpose. I guess it wouldn't hurt to lengthen parameter names in most articles, but to use the shorter parameter names in articles that have an exorbitant number of them. (Maybe that would be hard to enforce.) — Eru·tuon 05:00, 9 July 2018 (UTC)

Yeah, en.WP and en.Wikt have different editorial models and communities. There's no equivalent of WP:BITE at Wiktionary that I know of, it's harder to write Wiktionary well (its wikisource is much more formulaic, and more like computer code than prose), and at en.WP we have a long tradition of clarifying (and usually lengthening) cryptic template goop toward plain English. E.g., there is already a bot that converts template shortcuts like {{cn}} to the actual template names ({{citation needed}} in that case). The shortened forms are for convenience of insertion. But that's no reason not to have them in the first place. — SMcCandlish ☏ ¢ 😼 05:17, 9 July 2018 (UTC)

Thanks, that helps clarify things. So a short parameter name wouldn't be the way to achieve conciser wikitext on Wikipedia, but a separate template might be okay. — Eru·tuon 18:33, 9 July 2018 (UTC)

Can't help thinking that it would have been better to leave {{lang}} as it was and create a separate template for when italics are wanted. For "my" articles, they are not wanted in the majority of cases. --Gerda Arendt (talk) 18:46, 9 July 2018 (UTC)

ps: just look at a random historic version, full of error messages because of the change, [2]. --Gerda Arendt (talk) 18:50, 9 July 2018 (UTC)

In this edit of this talk page (the whole archived conversation), and again in this edit (the whole archived conversation), I suggested a minor-title template as a {{lang}} wrapper that specifies |italic=unset. The first was rejected by another editor and you ignored the second. I have not withdrawn the offer; it remains on the table, and you need only take it up.

—Trappist the monk (talk) 19:15, 9 July 2018 (UTC)

Much in the discussions is way above the language I'd understand (minor-title template? wrapper?). I hardly have the time to write the articles I want to, and then people die whose articles are unreferenced, and I am busy for hours with unplanned stuff. Sorry to be of little help here. --Gerda Arendt (talk) 20:53, 9 July 2018 (UTC)

It does sound like a good idea, Trappist. — SMcCandlish ☏ ¢ 😼 21:32, 9 July 2018 (UTC)

Those suggestions would not have avoided the problems that Gerda illustrated. Gerda's suggestion to create a separate template for very different functionality and behaviour remains valid, although overrun by history. -- Michael Bednarek (talk) 01:54, 10 July 2018 (UTC)

What? A separate template for very different functionality and behaviour when suggested by Editor Gerda Arendt is valid and acceptable, yet my minor-title template suggestion, which was and is a separate template for very different functionality and behaviour, is not acceptable?

—Trappist the monk (talk) 12:49, 10 July 2018 (UTC)