Module talk:Lang-zh/Archive 3

From Wikipedia, the free encyclopedia

Language tagging for pinyin again

I'm looking again at this, see #Language tagging for pinyin and some of the discussions in #Module rewrite. This wasn't done at the time as first no-one was maintaining the template, then when porting the template it was important not to introduce changes that might be considered bugs so I left it as it was in the pre-module template. Now seems a good time to revisit it.

So I've added it to the sandbox. Note this may introduce no visual change as it's at the HTML level. You might find it useful to use something like my CSS to colour in foreign languages as this makes the change very apparent. I've also rearranged the testcases so it's easier to compare the current and sandbox versions side by side.

The tags I used are 'zh-Latn' for all of them except Bopomofo which uses 'zh-Bopo'. This is based on the documentation here and the lists here. I'm not at all sure of the Bopomofo one but it's not Latin and I can't find anything else.--JohnBlackburnewordsdeeds 01:19, 8 May 2014 (UTC)

A further note. Both 'zh-Latn' and 'zh-Bopo' are based on a two letter language + a four letter script. In #Module rewrite 'zh-latin' was suggested but I'm pretty sure that's wrong as scripts should be four characters. It explains why when I tried it I was seeing Chinese fonts - the browser was ignoring the non-standard 'latin', just seeing 'zh' so thinking it was Chinese.--JohnBlackburnewordsdeeds 02:39, 8 May 2014 (UTC)
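The point about 'Latn' versus 'latin' can be checked by shape alone. The following is a minimal sketch, in Python purely for illustration, that classifies a subtag following the primary language by its length and character set, based on the BCP 47 syntax rules. The function name `subtag_shape` is hypothetical, and the check does not consult the registry, so a well-shaped subtag may still be unregistered.

```python
import re

def subtag_shape(subtag: str) -> str:
    """Classify a non-initial subtag by shape only (no registry lookup)."""
    if re.fullmatch(r"[A-Za-z]{4}", subtag):
        return "script"    # four letters, e.g. Latn, Bopo, Hant
    if re.fullmatch(r"[A-Za-z]{3}", subtag):
        return "extlang"   # three letters, e.g. yue after zh
    if re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtag):
        return "region"    # two letters or three digits, e.g. TW
    if re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtag):
        return "variant"   # e.g. pinyin, jyutping, wadegile
    return "other"

for s in ["Latn", "latin", "Bopo", "pinyin"]:
    print(s, subtag_shape(s))
```

Under these rules 'latin' has five letters, so it can never be read as a script subtag; at best it has the shape of a variant, and no such variant is discussed here.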

You are right, it should be Latn not Latin, I think that was my error before. I think 'zh-Bopo' is also correct. Note that j, cy should not be 'zh-latn'. The Cantonese should be marked as 'yue' as per w3c list. Rincewind42 (talk) 08:50, 8 May 2014 (UTC)

I've changed both to 'yue', but it's unclear to me whether that or something else is best:

  • You can make the case for 'zh-Latn' as both are cases of Chinese Romanisation.
  • 'yue' is being used now in the sandbox and doesn't cause any problems but that doesn't mean it's being recognised, it may just be treated as unknown: Gwong²zau¹
  • WP elsewhere uses 'zh-yue', e.g. in the barely used {{ISO 639 name zh-yue}} (which gives 'Yue Chinese'), but {{lang|zh-yue|Gwong²zau¹}} has a similar problem to zh-latin, the browser seems not to recognise it and interprets it as just zh so uses a Chinese font: [Gwong²zau¹] Error: {{Lang}}: unrecognized language tag: zh-yue (help).
  • Looking at the yue entry on that list, 'yue' as a language on its own gives 'zh' as the macrolanguage, so maybe adding a script tag to one or both describes it better
    • 'yue-Latn': Gwong²zau¹
    • 'zh-yue-Latn': [Gwong²zau¹] Error: {{Lang}}: unrecognized language tag: zh-yue-Latn (help)

As you can tell I'm struggling here to work out what makes sense. I've tried looking at it in different browsers but the only one that looks at all different is Chrome, which uses a different font for traditional Chinese. But except where noted all Romanisations look the same in Chrome, Firefox and Safari, all use the same font as the surrounding text.--JohnBlackburnewordsdeeds 21:28, 8 May 2014 (UTC)

I mused over this too for some time. The IANA website describes 'zh-yue' as 'redundant' and 'Preferred-Value: yue' but also describes 'zh-Hans' and 'zh-Hant' as 'redundant' with 'Preferred-Value: zh' too so I'm not quite sure what redundant means in this setting.
Note that there are several different places where 'yue' appears. There is an 'extlang: yue prefix: zh' which would give 'zh-yue' for use on Cantonese. In addition there is a 'Type: variant Subtag: jyutping prefix: yue'. My reading of this is that the j field should be marked up as 'zh-yue-jyutping'. I suppose if you follow the logic of 'language-extlang-script-region-variant-extension-privateuse' then 'zh-yue-latin-jyutping' would be the full and precise code, though there are redundancies in there.
As for testing on a browser, I'm not sure if all browsers get this right. There may be bugs in some/all browsers that mess you up even if your code is correct. There is a 'Type: variant Subtag: pinyin Prefix: zh-Latn Prefix: bo-Latn' so if we are precise, then pinyin would be 'zh-latn-pinyin' and the Wade-Giles 'zh-latn-wadegile'. Rincewind42 (talk) 08:30, 9 May 2014 (UTC)

Found this: Choosing a Language Tag which has a lot of useful notes on how to use the w3c list. e.g.

For example, although BCP 47 explains that zh (the macrolanguage subtag for Chinese) doesn't actually specify which of the many, sometimes mutually unintelligible, dialects of Chinese is actually meant by this subtag, in practice convention overwhelmingly associates the macrolanguage subtag with the predominant language among the encompassed subtags - in this case, cmn (Mandarin Chinese). If your application identified Mandarin Chinese in the past using the language tag zh-CN (Chinese as used in Mainland China), or even just zh, you can continue to use zh in this way. Using cmn or cmn-CN may cause serious compatibility problems if the software or users expect a tag such as zh. If, on the other hand, you are using zh to refer to another Chinese dialect such as Hakka, you should use the language subtag hak instead.

So 'yue' is better than zh-yue or zh-anything for Cantonese/Yue.

The script section says they can be used "when the script adds some useful distinguishing information" which I guess applies here. 'yue-Latn' is Cantonese written with Latin characters, i.e. a Romanisation of Cantonese. This distinguishes it from 'yue-Hant' which might be used for written Cantonese (except almost all written Chinese is written vernacular Chinese where 'zh-Hans' or 'zh-Hant' is appropriate), and Latn emphasises to any browser paying attention that it's in Roman/Latin script.

By the same reasoning (especially as it's the example they use) Pe̍h-ōe-jī should probably be 'hak-Latn', or 'hak'.

Finally there are variants. Both pinyin and jyutping are classed as variants. Following the rules given for them (and pinyin is given as an example) 'zh-Latn-pinyin' and 'yue-jyutping' are both correct. You can also find 'zh-Latn-wadegile' in the list. But that means 'Latn' isn't needed with jyutping.

I'll try adding all that to the sandbox version, to see if the output is interpreted any differently by browsers.--JohnBlackburnewordsdeeds 12:38, 9 May 2014 (UTC)

It makes no visual difference in all three browsers I can test with which is good and as expected, but if anyone has a screenreader, browser or custom CSS that distinguishes between them it should help: the tags now are almost entirely different.--JohnBlackburnewordsdeeds 12:51, 9 May 2014 (UTC)


Languages

Here's the table from the sandbox, posted to get some feedback. All of these can be seen working in the testcases (though in all browsers tested they look the same unless you're using CSS to highlight languages).

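-- Maps each template parameter to the language tag applied to its output.
local ISOlang = {
	["c"] = "zh",                -- Chinese, simplified or traditional
	["t"] = "zh-Hant",           -- traditional Chinese
	["s"] = "zh-Hans",           -- simplified Chinese
	["p"] = "zh-Latn-pinyin",    -- Hanyu Pinyin
	["tp"] = "zh-Latn",          -- Tongyong Pinyin (no registered variant)
	["w"] = "zh-Latn-wadegile",  -- Wade-Giles
	["j"] = "yue-jyutping",      -- Jyutping
	["cy"] = "yue",              -- Cantonese Yale
	["poj"] = "hak",             -- Pe̍h-ōe-jī
	["zhu"] = "zh-Bopo",         -- Bopomofo
}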

The first three I think are obvious: the general 'Chinese of some sort' (simplified or traditional) and the two for the particular ways of writing.

The documentation (Q&A and registry) says that 'cmn' or 'zh' can be used for Mandarin Chinese. 'zh' seems more common and consistent with other usage here so I've used it. The documentation actually spells out how to do pinyin and Wade Giles so I'm pretty sure of those. The only one I'm unsure of is tp, Tongyong Pinyin. It could be the same as pinyin, 'zh-Latn-pinyin', but perhaps that should be reserved for pinyin. Adding a region code 'zh-Latn-TW-pinyin' is another possibility except Taiwan no longer uses it so that might be confusing. 'zh-Latn' is certainly accurate, it's a Romanisation of Mandarin Chinese.

Chinese varieties other than Mandarin should not use zh but their own code. So 'yue' for Cantonese/Yue, 'hak' for Hakka. 'yue-jyutping' is again spelled out in the documentation, so I'm pretty sure of it. As it's not 'yue-Latn-jyutping' then 'yue-Latn' doesn't seem right/necessary, so I've used just 'yue' for Cantonese Yale and 'hak' similarly for Pe̍h-ōe-jī. Finally Bopomofo is also spelled out in the documentation.

So the ones I'm unsure of are tp, cy and poj; they all seem incomplete but it's unclear what else to add to them, the documentation seems silent on this. I welcome any suggestions and thoughts on these and the other tags.--JohnBlackburnewordsdeeds 16:32, 12 May 2014 (UTC)

I have double checked through the ISO 15924 and ISO 639 codes and I am quite happy with all of these. Tongyong Pinyin doesn't seem to have a registered variant. Not all variants have codes, only major or common ones do, so there isn't a problem here really. It shouldn't be tagged as pinyin because that is reserved for Hanyu Pinyin. Adding the TW for Taiwan would be acceptable here as I don't think Tongyong is used anywhere else. It still doesn't completely differentiate Tongyong from Hanyu Pinyin but it does mark it as a variant Latin transliteration. Looking through the various RFC discussions I found a similar query asking how to tag '[[1]] script'. The answer given was either to request the addition of a new variant to the codes or to use one of two systems to create a private code, for example 'zh-Latn-TW-x-tongyong'. Here the x- marks a private code. While using a private code doesn't have any real benefit as nobody external to us will understand it, the usage does produce a more semantic representation in defining Tongyong as "not Hanyu Pinyin". In the end, I think what we have is sufficient and a major improvement on what was before. Rincewind42 (talk) 06:56, 14 May 2014 (UTC)
I think the '-x' only makes sense in a partially closed/private system. I.e. if you had a database storing language records and your own software to query it, private code would let you add information such as -tongyong, which your software would then understand, prefixed by -x so if for any reason you shared the data other software would ignore your private data. That doesn't apply here, so there's no benefit.
It's very possible the list of tags will be expanded in future to include further tags for Chinese variants and Romanisations; the registry includes entries added as recently as last month. So we should keep an eye on it and update the module if necessary, or if any other information comes to light such as how to better support particular browsers or other software.--JohnBlackburnewordsdeeds 12:27, 14 May 2014 (UTC)
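To make the private-use idea concrete, here is a minimal sketch, in Python purely for illustration, of how software that does not know the private part could separate it from the rest of a tag such as 'zh-Latn-TW-x-tongyong'. The function name `split_private_use` is hypothetical and the sketch does no validation beyond locating the 'x' singleton.

```python
def split_private_use(tag: str):
    """Split a language tag into (public subtags, private-use subtags)."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")  # everything after the first 'x' is private
        return subtags[:i], subtags[i + 1:]
    return subtags, []

public, private = split_private_use("zh-Latn-TW-x-tongyong")
print(public)   # the part other software can interpret
print(private)  # the part only the tag's author understands
```

Software outside the closed system would act only on the public part, which is why the private suffix adds nothing for general readers of a page.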

Literary Chinese v Vernacular Chinese

Is written vernacular Mandarin being treated separately from written Classical Chinese? (ISO:lzh) -- 65.94.171.126 (talk) 07:04, 18 May 2014 (UTC)

There's no support for Classical Chinese in the template at the moment. Any examples of pages where Classical Chinese is used and where the template might be useful?--JohnBlackburnewordsdeeds 07:37, 18 May 2014 (UTC)
I should add (having had a quick look for examples) that there probably are few if any instances of Classical Chinese which would benefit from this template. For isolated instances of Chinese text, i.e. without other script systems and Romanisations, the {{lang}} template can be used, e.g. {{lang|lzh|日}} gives 日. It's the first time I've come across or used the 'lzh' tag so I suspect it's fairly rare, and should only be used for Classical Chinese to distinguish it from normal, i.e. vernacular, Chinese, as browser support may be limited. But it's certainly valid: lzh is a valid language tag.--JohnBlackburnewordsdeeds 07:47, 18 May 2014 (UTC)
Is there any use in a fine-grain tagging for individual words and names? It might be useful for phrases and larger units, though. Kanguole 09:12, 18 May 2014 (UTC)
It's the standard, and for good reason. A page has a language, specified in the header of the page. In the case of en.wp that's English. So foreign words should be tagged as foreign otherwise they'll be treated as English. There are many benefits to this.
Display is perhaps the main one: browsers can and do use the language information to determine how languages are displayed. Even if they don't users can do so for themselves using custom CSS. Someone studying Chinese for example might want to show simplified and traditional Chinese in different colours, to make it easier to tell them apart. Spell checkers and screen readers need to know the language of the text they're processing. Search engines often search by languages. Other tools and software can use the information. On Wikipedia pages containing non-English languages are categorised based on it, via this and other templates.
You might think most of this could be done based on the text, but that only goes so far, as Chinese like English has many uses, including in other languages like Japanese. And it's the short phrases and single characters which are the most likely to be ambiguous; the example I gave earlier of 日 is a good example of this, and there are many more.--JohnBlackburnewordsdeeds 10:03, 18 May 2014 (UTC)
What I meant was, is there any point in tagging a word like 日 as lzh or written vernacular rather than just zh? Kanguole 10:58, 18 May 2014 (UTC)
I see. That's surely an editorial choice, and was sort of what I was asking for examples of. Further I think we only need to add lzh to the template if it will be used alongside other sorts of Chinese and Romanisations. E.g. If there were more than a few {{zh|c=...|lzh=...}}. If the Classical/Literary Chinese examples exist on their own then {{lang|lzh|...}} is simple and works already.--JohnBlackburnewordsdeeds 11:13, 18 May 2014 (UTC)

Font?

I notice that there have been a few changes made here and there recently, but what's going on with the font right now? This probably has something to do with the new language codes being applied to the pinyin field. The pinyin field now uses a serif font similar to what you usually see on Chinese websites (and the Chinese Wikipedia). I think this isn't quite aesthetically appealing, though; think about it, the paragraph text is written entirely in a sans serif typeface, and then all of a sudden an italicized serif font appears out of nowhere. I think things that aren't broken shouldn't be changed, and the font worked fine before. --benlisquareTCE 05:44, 15 May 2014 (UTC)

P.S. I'm viewing these pages in Mozilla Firefox using default Windows 7 fonts. I haven't tested how these pages look using FOSS fonts yet, I'll give that a try once I get access to my Linux machine tonight. Currently, however, reading this (uneven heights) and this (non-uniform letter thickness) really makes me feel itchy and uncomfortable. --benlisquareTCE 05:49, 15 May 2014 (UTC)

(edit conflict) I have the same issue as benlisquare. Makes the text hard to read and looks amateurish. Can we have the old font back please?  Philg88 talk 05:57, 15 May 2014 (UTC)

There has been no 'font' change. The template doesn't and didn't specify which fonts should be used. What has changed is the template now correctly specifies the language of all of its output, not just Chinese characters as before.
So before pinyin would be output as plain, italicised text, i.e.
''zhōngguó'' to give zhōngguó
Now it applies a language tag to it, i.e.
''<span lang="zh-Latn-pinyin" xml:lang="zh-Latn-pinyin">zhōngguó</span>'' to give zhōngguó
It's not specifying any fonts, just correctly describing the language – zh-Latn-pinyin is the correct tag to use for pinyin. Browsers are able to recognise this and display it differently. Other software like screen readers can now handle it properly. It's a feature, not a bug, initially requested in #Language tagging for pinyin.
If your browsers are displaying it differently there will be a reason. Custom CSS is one (I use this to colour all non-English text) but I can't see that either of you are doing anything like this. It may be a browser setting or some interaction between that and the operating system. I tested this myself with four or five browsers (Safari, Chrome, Opera, Firefox and the mobile version of Safari) and only saw a difference due to my custom CSS, no font difference. I also don't see anything like what is described on zh.wp; I just see sans fonts there for Chinese and English, with no custom CSS or other settings changes.--JohnBlackburnewordsdeeds 11:21, 15 May 2014 (UTC)
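The before and after output described above can be reproduced with a short sketch, written in Python purely for illustration (the module itself is Lua). The function name `lang_span` and its `italic` parameter are hypothetical; the markup it builds is the span quoted in the comment above, wrapped in wikitext italics.

```python
from html import escape

def lang_span(text: str, tag: str, italic: bool = True) -> str:
    """Wrap text in a span carrying lang and xml:lang, optionally italicised."""
    safe_tag = escape(tag)  # escape() also escapes quotes by default
    span = f'<span lang="{safe_tag}" xml:lang="{safe_tag}">{escape(text)}</span>'
    # '' is the wikitext italic marker
    return f"''{span}''" if italic else span

print(lang_span("zhōngguó", "zh-Latn-pinyin"))
```

Only the attributes change between the old and new output; no font is named anywhere, so any font difference comes from how the browser reacts to the tag.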
Yes, must be Firefox picking up the zh-Latn-pinyin tag. Come to think of it, this is how all latin text is displayed at zh wiki so that must be the default encoding over there.  Philg88 talk 12:56, 15 May 2014 (UTC)
From here, it also appears to be a Firefox thing; the same thing doesn't happen with Chrome or IE. That said, Firefox does form a significant market share, so if something can be done about this, it would probably be constructive for readers. --benlisquareTCE 13:00, 15 May 2014 (UTC)
@Benlisquare: I added this to my vector.css "span[lang|=zh] { color: purple; font-family: arial; font-size: 18px }", which you can mess around with if it helps.  Philg88 talk 13:29, 15 May 2014 (UTC)
The pressing issue isn't exactly me though, since I could probably live with an odd-looking display of fonts. The majority of Firefox-using readers won't be able to tweak their vector.css, however, and this is what I was originally thinking about. --benlisquareTCE 14:39, 15 May 2014 (UTC)

Which version of Firefox are you seeing this with? I wonder if it's an older version, as I'd expect recent versions to be fully compliant with the standard.

One problem encountered in testing is if you specify 'zh' without 'Latn' it's interpreted as Chinese, even if there's additional information such as 'pinyin'. E.g. the following:

<span lang="zh-pinyin" xml:lang="zh-pinyin">zhōngguó</span> gives zhōngguó

which looks odd to me: not serif but the font is something like Helvetica Neue, and it seems to be using the same font as for simplified Chinese characters, STHeiti.

Using 'Latn', giving a full tag 'zh-Latn-pinyin', fixes this and is the recommended way to describe pinyin. But older software may not be recognising this and may just be seeing the 'zh' and so using Chinese fonts.--JohnBlackburnewordsdeeds 13:52, 15 May 2014 (UTC)
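The "just seeing the 'zh'" behaviour resembles how the CSS selector `[lang|=zh]` quoted earlier in this thread matches: a value matches if it is exactly the wanted prefix or starts with that prefix followed by a hyphen. Here is a minimal sketch of that rule, in Python purely for illustration. The function name `dash_match` is hypothetical, the comparison is assumed to be case-insensitive, and whether any browser's font selection follows the same logic is an assumption that this discussion does not establish.

```python
def dash_match(lang_value: str, wanted: str) -> bool:
    """Mimic the CSS [lang|=wanted] rule: exact match or 'wanted-' prefix."""
    value, wanted = lang_value.lower(), wanted.lower()
    return value == wanted or value.startswith(wanted + "-")

for tag in ["zh", "zh-pinyin", "zh-Latn-pinyin", "yue-jyutping"]:
    print(tag, dash_match(tag, "zh"), dash_match(tag, "zh-Latn"))
```

Software that matches only on 'zh' treats 'zh-pinyin' and 'zh-Latn-pinyin' alike; only software that also looks at the script subtag can tell that the second is Latin text.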

I'm using Firefox 29.0.1, which is the latest.  Philg88 talk 14:09, 15 May 2014 (UTC)
Firefox 29.0.1 here. When I'm logged in, I'm using the Monobook skin, but that doesn't seem to have anything to do with this issue, since if I log out, the pinyin appears exactly the same whilst in the Vector skin. All of the font display settings in Firefox are default, they've never been tinkered. It probably doesn't matter, but I'm running Windows 7 with a Japanese system locale. --benlisquareTCE 14:39, 15 May 2014 (UTC)
Using Firefox 29.0.1 myself, with Mac OS 10.9.2. You don't need to log out to test other skins: go to the testcases, Template:Zh/testcases, and expand the [show] for 'More information and options'. It has links to view the testcases with the various skins. This can be used with any page, but testcase pages supply the links for you. Note that since the module has been updated from the sandbox there's no difference between the two sides at the moment. I doubt the skin makes any difference; I had a look at their CSS and could see nothing related to language. Also the differences between them are much smaller than they used to be: I remember there being skins with serif body fonts and much more variation in styles but they've been removed.--JohnBlackburnewordsdeeds 14:50, 15 May 2014 (UTC)
I think the font in question is one that's bundled with Microsoft Windows, so you might be seeing something different to me. I can't recall the exact name of the font right now, but pretty much every single Chinese website (Sina.com, People's Daily, Xinhua, Chinese Wikipedia, etc) has alphanumeric characters display in it, at least in Mozilla Firefox and Google Chrome. (I think Internet Explorer 11 uses some MSHeiTi gothic font instead) As an example, here's People's Daily.

Now, as far as I know, this font does not properly support Latin letters with diacritics (i.e. French, Spanish, and Hanyu Pinyin), which means that Firefox grabs some other font to display those individual letters only. This is why some letters in pinyin are displayed with a different height when in normal style, and end up having different thicknesses when in italic style. --benlisquareTCE 15:39, 15 May 2014 (UTC)

And if you want to compare the default fonts that Internet Explorer 11 and Mozilla Firefox choose to use, here's the Chinese Wikipedia front page on Vector skin: left is IE11, right is FF. --benlisquareTCE 15:45, 15 May 2014 (UTC)

@benlisquare you said "I think things that aren't broken shouldn't be changed, and the font worked fine before." Maybe the font worked fine but the page was not fine. Tagging the language isn't just for loading the correct font but also for accessibility purposes. Screen readers, used by blind people, can use the language tags to assist them in pronouncing the word correctly.

I don't use Windows but rather Ubuntu Linux. I checked the display in both Chrome and Firefox and in Firefox I turned on tools/web developer/inspector/fonts to see what font was loaded where. For the English and pinyin it is loading the same font – "DejaVu Sans". For the Chinese characters Firefox loads "AR PL UKai CN". I have a Chinese Windows 7 PC my wife uses so I will check that tonight and compare. It looks like this is a bug in the Windows version of Firefox. It might be visually fixed by adding a font-family style to the span output by the template to ensure a sans-serif font is loaded. Rincewind42 (talk) 04:59, 16 May 2014 (UTC)

Oh right, now I remember what the name of the font was - it's SimSun. Right now, Firefox renders all the pinyin in SimSun, anything within the lang-ja parameter of the {{nihongo}} template as MS PGothic, anything monospace as Courier New, and everything else in Arial. Previously, the pinyin would also be rendered in Arial, which properly supported the diacritics.

In case this information is relevant, simplified Chinese is being displayed in SimSun, traditional Chinese is displayed in PMingLiU (which also cannot properly display pinyin), and if anyone attempts to write pinyin in Georgia Italic (see #Abbreviating the writing systems section above), the diacritic letters are displayed in Times New Roman Italic. --benlisquareTCE 06:27, 16 May 2014 (UTC)

I checked on my wife's PC running Windows 7 (in Chinese mode) with IE11, the latest Firefox and Chrome. For some reason her version of Chrome uses a horrible Courier-like font for all of Wikipedia but doesn't vary the font for the pinyin. IE11 displays correctly too. However, in Firefox I did duplicate the bug that benlisquare is seeing. The pinyin is rendered in SimSun, the same as the simplified Chinese, which is not correct. In addition the Tongyong Pinyin and Wade-Giles are also displayed in SimSun.
This is a known bug in Windows Firefox only and really is Firefox's bug not ours. A report of the bug has existed on Bugzilla since 2009 but doesn't seem to have had attention from developers. Rincewind42 (talk) 15:07, 16 May 2014 (UTC)

Italicisation

I also believe that the now default italicisation of pinyin renders the article in a weird and inconsistent fashion, and would like to see it de-italicised. I don't see why |p= has italics as none of the other transliteration systems, like |j= and |w=, are italicised in the template. The browser issue is sidetracking the discussion as to the merits of italicisation. -- Ohc ¡digame! 03:17, 16 May 2014 (UTC)

Pinyin has always been italicised within {{zh}}, it's been like that since the very beginning. It just appears different now because on Firefox, the italics look strange with the new font. --benlisquareTCE 03:28, 16 May 2014 (UTC)
Yes; I can't find any indication of it in the archives so it's been italicised for a very long time. It's been raised before that it's inconsistent that only pinyin and not the other romanisations (tp, w, j, cy, poj) are italicised, and I agree. It's an easy thing to add so I've added it to the sandbox version for testing. Have a look at the testcases to see it. It would be as easy to remove it from all if that works better, but italics I think conforms to the style guide for non-English text, including foreign languages with accents like French, Italian.--JohnBlackburnewordsdeeds 04:14, 16 May 2014 (UTC)
The pinyin is in italics because Wikipedia:MOS#Foreign words and Wikipedia:MOS/Text formatting#Foreign terms as well as Wikipedia:MOS#No common usage in English says that all non-English words using the Latin script, other than proper nouns, like place names, or loanwords, should be in italics. Technically the Wade-Giles etc. should also be in italics. Doing so helps the text scan as you can differentiate the words from the labels clearly when |links=no is applied. Rincewind42 (talk) 04:45, 16 May 2014 (UTC)
I agree that the other romanizations should also be italicized per WP:MOS#Foreign words. Kanguole 06:32, 16 May 2014 (UTC)
non-English words using the Latin script, other than proper nouns, like place names, or loanwords, should be in italics – well, pinyin usage here on en.wp is almost exclusively for proper nouns, so there's an argument for not italicising these at all. Now that it looks pretty awful with a different font, I think it's a convenient time to switch out of using them. -- Ohc ¡digame! 07:05, 16 May 2014 (UTC)
They're not being used as proper names in a sentence, but being supplied as foreign words in parentheses, so it seems entirely reasonable to distinguish them. It also seems to be common practice (e.g. Mexico City, Munich, Cardiff, Brussels, Istanbul, Manila, Tokyo, etc). Kanguole 11:13, 16 May 2014 (UTC)
I've noticed that {{Chinese}} doesn't seem to have the language tags for Chinese romanisations. Was the intention of this change only to modify {{zh}} and nothing else? --benlisquareTCE 06:46, 16 May 2014 (UTC)
It's one of the reasons I prefer using {{Chinese}} for the lead. The other being the lower amount of clutter. -- Ohc ¡digame! 07:05, 16 May 2014 (UTC)
While many of the words used in the zh template are proper nouns, a very large number of them are common nouns. For example, on the page jiaozi the zh template is used 9 times and not one of them is for a proper noun; more so, Ancient Chinese coinage has 165 instances of the template of which only a couple are for proper nouns. Besides which, as Kanguole stated, the English word is given outside the parentheses, italicised or not as required; the transliterations inside the parentheses should be italicised. If you disagree with italics on foreign words in general, then you should raise that issue on Wikipedia talk:MOS rather than here. Rincewind42 (talk) 14:06, 16 May 2014 (UTC)
{{Infobox Chinese}} should use language tags for the same reasons as here; looking at it now the Chinese characters are tagged but none of the Romanisations seem to be. It's not been converted to Lua so the code here won't work but it could be done by wrapping {{lang}} templates with the right language tags around each bit of output that needs it. It doesn't need italics though; they're only needed for 'isolated' foreign words which I take to mean those inline with the text, as here.--JohnBlackburnewordsdeeds 16:59, 16 May 2014 (UTC)

I've added an extra option ital=no to the sandbox, to disable italics on the various Romanisations. This is based on the discussion at Module talk:Zh/Archive 1#Pinyin italicization, though it doesn't just apply to pinyin now. This is so the template can be used where italics are inappropriate: in tables or in text which is already italics and where foreign words should be upright. In such cases it might be easier to use {{lang|zh-Latn-pinyin|...}} etc. but for completeness (and as it was a simple fix) it's now in the sandbox version. I've updated the testcases with examples of it, copied below for reference.

--JohnBlackburnewordsdeeds 03:09, 19 May 2014 (UTC)

I'm getting an odd effect with that ital=no when using it within italic text. Look at the following:
Here if I add ital=no then it is in italics and without ital=no it is not italics. Instead of turning on/off italics as expected, we are reversing the existing setting in an unexpected way.
In addition I am worried about the usefulness of this feature and the potential for abuse/misuse. The above example, where the zh template is nested inside italics, should never exist. Are there any examples of this use in the wild? Also foreign words within tables aren't exempt from italics so this shouldn't be used there. Even when we are dealing with proper nouns, which aren't in italics, the contents of the zh template aren't that sentence's word but rather fall under MOS:ITAL/Words as words and so would be in italics even if they were English words. This is the difference between 'using' a word and 'mentioning' a word. So we correctly have, "The Premier of China was Zhou Enlai ({{zh|s=周恩来|p=Zhōu Ēnlái}})." Nobody would argue that Zhou Enlai is anything other than a proper noun in the Chinese language. However, before the brackets Zhou Enlai is used in the sentence as a proper noun, while within the brackets Zhou Enlai is mentioned as a word: a word as a word, not a name, in this sentence.
In short, I can't see any good use for this addition and if there are rare occasions where the text shouldn't be in italics then the editor can always use {{noitalic}} to fix that single instance. Rincewind42 (talk) 06:52, 19 May 2014 (UTC)
I don't think making italics an option is helpful. In the linked discussion, I see one editor saying pinyin should never be italicized and four saying it always should be. Making it a personal preference is a recipe for chaos. This template shouldn't be used inside italics anyway, as we have a guideline against italicizing characters. Kanguole 07:12, 19 May 2014 (UTC)
It's how it was before. I effectively removed that option when I converted the template to Lua, because it wasn't in the documentation which I was using for guidance. I didn't try reverse engineering the template from its parser code which was particularly dense and complex, I instead looked at the documentation to see what it should do, then looked at the actual output to ensure they matched. I missed the italics option as it wasn't documented at Template:Zh/doc and none of the tests I did depended on it.
And looking at the talk page not only was it an option but it defaulted to on: here's the change that added it. The parameter syntax is different so I should change the module to match, in case it's being used already. The difference here is not just pinyin but all the Romanisations can be italicised, so the option should apply to all of them not just pinyin.
It's only an option; editors don't have to use it and the vast majority won't. Not including it would make little difference to how pinyin's displayed in articles as if editors find the template won't do what they want they won't use it, and may just enter the pinyin without a template, so it doesn't get the other benefits of the template such as proper language tagging. The template can't support every possible formatting option but this is a straightforward and logical addition.--JohnBlackburnewordsdeeds 17:45, 19 May 2014 (UTC)
On the examples given of using it in a larger span/para of italic text it should never be used like that, as Chinese characters should never be italicised. But the pinyin is actually working correctly with the default italicisation. What's happening is the italics around the pinyin are cancelling out the outer italics. Compare e.g.
  • ''Italic test ''zhōngguó'' test'', which displays "Italic test" and "test" in italics with zhōngguó upright
The template is effectively doing the same as the wikitext sample above, with the inner pairs of quotes around zhōngguó making it non-italic. But it's not correct as the Chinese is italicised and there's no easy fix for that. In such cases editors should use other approaches such as {{lang}} templates.--JohnBlackburnewordsdeeds 17:58, 19 May 2014 (UTC)
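The cancelling effect follows from the way a pair of apostrophes toggles italics rather than switching them on. Here is a minimal sketch of that toggling, in Python purely for illustration. The function name `italic_runs` is hypothetical, and it handles only the two-apostrophe marker, not the bold or bold-italic markers that the real MediaWiki parser also processes.

```python
def italic_runs(wikitext: str):
    """Return (text, is_italic) runs, flipping the state at each '' marker."""
    runs = []
    italic = False
    for piece in wikitext.split("''"):
        if piece:
            runs.append((piece, italic))
        italic = not italic  # every '' marker flips the current state
    return runs

for text, italic in italic_runs("''Italic test ''zhōngguó'' test''"):
    print(repr(text), "italic" if italic else "upright")
```

Because the template's own markers sit inside an already italic span, they switch italics off for the pinyin and back on afterwards, which is the reversed behaviour reported above.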
Yes, it was there before, but no-one knew it was there and no-one used it. Presumably you intend to document the parameters that you implement, so this will be a major change. I don't think that it's helpful to provide an option not to follow the standard style. If this parameter is advertised and some people start using it, we'll have a mess. Kanguole 16:31, 21 May 2014 (UTC)

noital

I've changed it from ital=no to noital=yes, to match the earlier usage, and updated the examples above so they still work after the change, both mine and Rincewind42's.--JohnBlackburnewordsdeeds 18:11, 19 May 2014 (UTC)

Just for the record I dislike the choice of words noital=yes versus ital=no, as typing 'yes' to disable something is counter-intuitive and we have increased the amount of typing by three letters with no gain. If you look at the initial discussion it was ital=no on [21 October 2009] and changed to noital=yes on December 14, 2009. Before October, and after December, 2009 the default was to have italics on pinyin within this template. I have searched Wikipedia as much as practical and I can't find any example of noital=yes or ital=no/yes anywhere in article namespace, so I don't think we need worry about breaking anything should we make changes to this function. Rincewind42 (talk) 14:44, 21 May 2014 (UTC)
You're right. I wasn't sure if you could search on params but I can search on 'poj' and find uses of it within the template, but 'noital' turns up nothing, except the instances in the template, tests, and a few unrelated uses. That's what happens when a feature is added by stealth I guess. I too think ital=no is better, not for length (it will be typed very few times) but as it's more logical and for consistency with the other switch-like parameters. I'll change it back.--JohnBlackburnewordsdeeds 15:24, 21 May 2014 (UTC)
I've just exported all 28,534 articles that use this template. None of them use |noital= or |ital=. Since it's unused, I think it would be best not to put this feature into the new implementation, for the reasons given above. Kanguole 16:15, 21 May 2014 (UTC)

Latn problem

There's a really big problem here when it comes to using any of the Asian language HTML codes and their romanizations in that when you use zh-Latn or ja-Latn or ko-Latn it forces browsers to encode the text as if it was zh, ja, or ko and draw on the fonts used to display those languages rather than the default font. Right now, all pinyin parsed through this module for users of Firefox is being parsed as SimSun (at least on my end).—Ryūlóng (琉竜) 08:50, 17 May 2014 (UTC)

It's something happening for all Windows users of Mozilla Firefox, including myself and a few others who've made mentions earlier. I personally feel that SimSun looks really ugly for alphanumeric characters, so something should probably be done about it. I'm not going to switch browsers just for this, but it is quite annoying to see SimSun in the middle of a paragraph otherwise written entirely in Arial. --benlisquareTCE 08:57, 17 May 2014 (UTC)
Try MS Mincho/Gothic or Meiryo or whatever I did that screwed up the fonts when I copied them from my old XP to my Vista machine. I can't believe that we have to explain this to people who happen to use other browsers and they wonder why you shouldn't encode the alphanumeric characters as CJK even if in their mind "It's not English so it should be tagged as Japanese" (like every time I spoke with Pigsonthewing when he made mono no aware.Ryūlóng (琉竜) 09:31, 17 May 2014 (UTC)
The way it should be working is that browsers should be recognising the 'Latn' part and so rendering as Latin, i.e. European, text. The Latn subtag has been around for nine years so isn't new, and was designed precisely for this, so browsers would know it was Chinese (or Japanese or Korean) but Romanised so does not need to use Chinese/Japanese/Korean fonts.
Few users seem to be experiencing problems, judging by the number of reports, which suggests it's not a widespread problem, so doesn't need a WP-wide fix, such as adding CSS for zh-Latn and other Romanisations. Not that special CSS should be used anyway; it should depend on users; their choice of browser, their browser and OS settings. If switching browser or changing settings doesn't work/isn't possible you can supply your own CSS. Here's what I have in User:JohnBlackburne/common.css, to highlight all foreign languages but pick out Chinese (characters) and Chinese Romanisation (i.e. pinyin) in different colours.
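    span[lang] { color: green; }             /* any span carrying a lang attribute */
    span[lang|="zh"] { color: teal; }        /* Chinese: zh and any zh-... tag */
    span[lang|="zh-Latn"] { color: olive; }  /* Romanised Chinese; later rule, so it wins over teal */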
The same technique can be used to set the font. See e.g. Template:Lang/doc#Applying styles for examples.
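As a rough sketch of the font variant mentioned above, a user stylesheet could name a font for Romanised Chinese directly. The font list here is only a placeholder and not a recommendation, and whether the rule takes effect depends on the browser and its font settings.

```css
/* Hypothetical user CSS: the font names are placeholders. */
/* Romanised Chinese (zh-Latn, zh-Latn-pinyin, ...): ask for a named Latin font */
span[lang|="zh-Latn"] {
    font-family: Arial, Helvetica, sans-serif;
}
```

The `|=` selector matches the exact value `zh-Latn` as well as any value that starts with `zh-Latn-`, so a single rule covers the longer tags too.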
Also it seems to only be a problem for some people using Firefox on Windows. If so, i.e. if it's just one browser on one platform, it's far more likely to get fixed if a popular site such as Wikipedia is properly using 'zh-Latn' and 'zh-Latn-pinyin' tags. They have the documentation, lots of examples now to test with, and other browsers (including Firefox on Mac) to compare it with.--JohnBlackburnewordsdeeds 15:13, 17 May 2014 (UTC)
The problem might get fixed at Mozilla's end, but based on my past knowledge, bug-related changes are worked on very slowly by the Firefox dev team. It took them something like two years to get anti-aliasing for zoom-out image previews to work properly after it was first reported to them, and they seem more keen on making unnecessary changes like giving tabs round edges like the recent 29.0.0 update. I'm not counting on any rapid changes to fix the zh-Latn-pinyin issue. --benlisquareTCE 19:22, 17 May 2014 (UTC)
It's not just a Windows problem: I am able to reproduce it in Firefox on a Mac, after messing around a lot with the settings. In particular under the "Content" tab on the panel under the "Advanced..." button, I first un-click the "Allow pages to choose their own fonts,...". This on its own changes the formatting completely, so it starts using a serif font for body text for example, ignoring the fonts in WP's CSS.
Second I choose 'Simplified Chinese' from the first popup then set the font to something other than their defaults. E.g. setting proportional to 'serif' and serif to 'AppleGothic' breaks pinyin rendering as described above.
The obvious fix is don't uncheck 'Allow pages to choose their own fonts', or don't change the defaults for Simplified Chinese rendering. But people may have set these for very good reason, or they may be set by the browser or OS in some way. Either way they won't be aware of it and even if this info's mentioned somewhere most people won't find it and act on it, so it's hardly a general fix.
But finding I could reproduce it motivated me to download the code and see if I could find the problem in it, which I've done and fixed it in my copy (actually the Nightly build which is I think version 31). I may eventually be able to submit it as a patch - it's an almost trivial fix but it takes forever to do anything with this large code base - almost an hour to build, hours to clone it in git.--JohnBlackburnewordsdeeds 01:44, 18 May 2014 (UTC)
Finally after a long checkout/clone, after sitting through multiple builds and working out basic git usage I made a patch, which I've submitted as an attachment to the bug, 485179. It's a really simple fix, so there's no reason why it can't be done quickly. It's also there for anyone else who wants to get it and build a working version themselves, though that's not recommended unless you've some experience building similar large projects.--JohnBlackburnewordsdeeds 07:35, 18 May 2014 (UTC)
Aside from the reasons JohnBlackburne mentioned, such as allowing the browser to load the correct fonts and custom displays, correct language tags are important for accessibility for disabled users, for example a blind person using a screen reader. When the screen reader tries to pronounce the text on the page, it looks to the language tags to help it get the right pronunciation. Two words with the same spelling may be pronounced quite differently in another language. For example, consider the English 'hotel' and the French 'hotel'. Likewise imagine a screen reader trying to pronounce the Chinese word xi ('west') but using English pronunciation. It will make a complete hash of it, perhaps making a sound like in 'xenon gas' or a hard sound like in 'Mexico'. You may think that isn't very important because it affects very few people and not all screen readers are multilingual anyway; however, including such accessibility items, along with alt tags on images, may be considered a legal requirement.
It also makes the text more semantic and so easier for search engines to index and provide relevant results. A neat thing, in addition to the colouring trick that JohnBlackburne mentioned: you can also hide some elements using custom CSS. If you only know simplified and don't want to see traditional script, or if you can read Chinese and don't want the pinyin etc., or if you don't read Chinese and only want the pinyin, then you can use the custom CSS to hide the parts you don't want. This would leave the page clean and uncluttered, thus easier to read.
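A minimal sketch of the hiding idea, assuming the template tags traditional characters as zh-Hant and Romanisations as zh-Latn (the zh-Hant tag is an assumption here; check the page's HTML for the tags actually emitted). Use one rule or the other depending on what you want hidden.

```css
/* Hypothetical user CSS; the tag values are assumptions, see above. */
/* Hide traditional-script text */
span[lang|="zh-Hant"] { display: none; }

/* Or: hide pinyin and other Romanisations */
span[lang|="zh-Latn"] { display: none; }
```

This only hides the tagged spans themselves. Any label text the template prints outside those spans would still be shown.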
I checked the display on Firefox, Chrome and the Wikipedia app, on my Android tablet and phone and they display correctly. Has anyone checked Safari on a Mac, ipad or iphone? Rincewind42 (talk) 14:56, 18 May 2014 (UTC)
My test browsers have been Chrome, Firefox, Safari and Opera on a Mac, IE 7 on Windows Vista and Mobile Safari on an iPod 4 (iOS 6). They all worked fine; the only noticeable difference is two (Chrome and Opera I think) use different fonts for traditional and simplified Chinese, but the pinyin and other Romanisations all look the same (Firefox only started displaying problems when I deliberately changed the settings to make it do so, and I've since fixed that in my copy of Firefox and uploaded my fix to their bug tracker).--JohnBlackburnewordsdeeds 15:04, 18 May 2014 (UTC)

Performance comparison

Is there any significant difference in the performance of using {{zh|s=沈阳|labels=no}} versus {{lang|zh-hans|沈阳}} in terms of page load speed or server load? Visually they are identical but the zh version is 2 characters longer typing than the lang version. Other than extra typing is there any reason to prefer one over the other? Rincewind42 (talk) 07:04, 19 May 2014 (UTC)

I don't know. This template could be faster as it's been converted to Lua but I really don't know. lang behind the scenes is quite a complex template: it has to be to handle all the hundreds of languages. It's also in the process of being converted to Lua, but that's a major task so is taking time. But the difference is certainly marginal: when working on the module I used Mandarin Chinese profanity to test it, as it has hundreds of templates. I saw a small speedup but not enough that it would bother anyone.
I wouldn't worry about it. Use whichever makes sense. I tend to use lang for single instances of Chinese or pinyin as it's shorter and simpler. In general performance is not something editors should worry about; that's for the people running the servers, who can actually study where the bottlenecks are and come up with solutions whether it's hardware or software such as optimisations to the source code. Very occasionally pages hit one of the limits of the software and get added to a list or tracking category, such as those here: Special:TrackingCategories. If this happens to a page you're working on then it probably needs addressing but otherwise don't worry about performance.--JohnBlackburnewordsdeeds 19:52, 21 May 2014 (UTC)

Italics summary

I've broken this out into a new section to summarise things. Based on the above discussions, surveys of existing use, etc. I've undone the ital=no option as unnecessary. It's been in for years, except with noital=yes to activate it, and never used. This is largely as it was added but not documented so the only way to discover it was via a note on the talk page or a careful examination of the template code. But the fact it was never used, or requested since, suggests there's no need for it. It could be added at a later date if there's demand for it but that's a separate discussion.

So the changes in the sandbox are just the additional italicisations for the other romanisations, for the reasons given above: consistency, to distinguish them from the labels, and to agree with MOS:FOREIGN. The other option to make it consistent would be to remove italics from pinyin but that would be far more disruptive a change, with pinyin being far more common than the other Romanisations, and does not agree with the style guidelines.

Does that seem OK? I'm keen to get this merged into the main module, for the reasons above and as it arose from the language tagging so is in a way part of that. Get this in and I'd consider all the fields working properly - properly formatted, properly tagged with the appropriate language tag.--JohnBlackburnewordsdeeds 19:34, 21 May 2014 (UTC)


Please update the module from the sandbox to introduce the additional italicisations as summarised above. The main discussion is at #Italicisation, where there was broad agreement with only one objection which was addressed, before getting sidetracked with the ital=no/noital=yes discussion.--JohnBlackburnewordsdeeds 17:42, 23 May 2014 (UTC)

Done Jackmcbarn (talk) 18:07, 23 May 2014 (UTC)