User:Hyperborean/Proposal:Dialect Tags
This is where PizzaMargherita and I can collaborate on a quick rewrite of the Tagging proposal.
Status: only minor changes have been made to the original discussion. Hyperborean 16:26, 9 June 2006 (UTC)
An archived discussion of this proposal is here (not placed anywhere yet).
Dialect Tags for Pages
Read this first
This section summarises a proposal for handling national varieties of English in WP. Please read this first, then discuss below. Calls for votes will be made later. Please be certain to read the QA section. Many objections that might first leap to mind are not as valid as you might guess.
The Problems
If you are reading this, you are probably familiar with some of the following issues
- 1. Spelling inconsistencies within and across articles. This means
- 1.1. WP looks unprofessional
- 1.2. Some people "trip" on alternative spellings
- 2. Resources are wasted "correcting" spellings
- 3. Resources are wasted "correcting back" spellings (either after some time, and so on forever, or immediately, potentially starting arguments or edit wars)
- 4. Resources are wasted arguing over, and trying to interpret the current guidelines, which are failing to solve the above problems.
If you are not convinced, load this page and this page, and search for "spell". Now that is what I call a waste of resources (human and mechanical). More examples available on request.
- 5. Bad blood is generated where it needn't be.
The proposal
A series of templates will be defined, in the form {{en:humour}}. Once that is done, editors will write variant words in the following way
I have no sense of {{en:humour}} because I'm a pizza.
A "locale" setting will be added to user profiles, which allows them to specify their preferred variety. Based on that setting, the above will be rendered as
I have no sense of humour because I'm a pizza.
for a "UK" user and as
I have no sense of humor because I'm a pizza.
for a "US" user. That way, everyone gets to read WP articles with their favourite spelling style. It will be like having two (or more) WP editions, that are always automatically in sync—i.e., unlike the editions for languages other than English, they will not be affected by forking.
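The rendering step described above could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: the `VARIANTS` table, the `render` function and the locale names "UK" and "US" are assumptions made for the example.

```python
import re

# Hypothetical variant table: tag word -> spelling for each locale.
VARIANTS = {
    "humour": {"UK": "humour", "US": "humor"},
    "colour": {"UK": "colour", "US": "color"},
}

# Matches tags of the form {{en:word}} as written in the proposal.
TAG = re.compile(r"\{\{en:([^{}|]+)\}\}")


def render(wikitext: str, locale: str) -> str:
    """Replace each {{en:word}} tag with the spelling for `locale`."""

    def substitute(match: re.Match) -> str:
        word = match.group(1)
        spellings = VARIANTS.get(word)
        if spellings is None:
            # Unknown tag: show the word exactly as the editor typed it.
            return word
        # Unknown locale: also fall back to the word as typed.
        return spellings.get(locale, word)

    return TAG.sub(substitute, wikitext)


source = "I have no sense of {{en:humour}} because I'm a pizza."
print(render(source, "UK"))
print(render(source, "US"))
```

Untagged text passes through untouched, so an article with no tags renders exactly as it does today.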
The dynamics of the proposal
With this mechanism in place, a casual reader is very unlikely to change a variant word to their preferred spelling: as soon as they see the template they will think twice, possibly learn about yet another benefit of being logged in, and, if they care enough about these things, propagate the idiom to other articles and other words. People would learn by example, and WP would gradually converge from its current state to a superior equilibrium.
This mechanism can be extended easily to other varieties of the English language either immediately or when/if the need arises.
There would be a single correct equilibrium, one that pleases everyone and that every well-intentioned Wikipedian will actively help to reach. This contrasts with the current chaotic situation, in which opposing forces try to pull the spelling to "their" side and WP never reaches an equilibrium.
Using such templates would be a guideline. As with many guidelines, some editors (most, I would say) will be unaware of it, and write "naturally". Note that this will make articles strictly no worse than the current situation. Then the anal guy comes around and spots the oh-my-god-horrible "misspelling". Being anal, he or she is aware of the guidelines, and therefore corrects it accordingly. Everybody happy, the end.
Again, if you forget to stick to the rules (genuinely or deliberately), it's not a problem, because most people won't notice or won't care, while the anal guys above will correct it immediately to the correct version. And they will do it only once.
Open issues (food for thought)
Q: How would the title of an article be handled?
A: Good point. Not worse than it is now, but ideas are welcome.
Other features
This mechanism could be useful for other languages (e.g. Portuguese and Spanish) that have similar spelling idiosyncrasies.
Also it may turn out handy for other similar issues that some people feel strongly about, such as spelling of "God/G-d/god".
QA
Note: Although many people clearly oppose this proposal without having read this section, to be fair some of the points below have actually been introduced after specific comments from opposing editors.
Q1: {{en:humour}} or {{en:humor}}?
A1: Since this code will be seen only by editors, it doesn't really matter. We could allow both to mean the same thing. If you fear that people will continue edit wars on {{en:humour}} vs {{en:humor}}, which I think is very unlikely and anyway much less important, then we can apply the current spelling rules to the tags (e.g. UK spelling in UK articles, leave the original spelling, etc).
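One way to let both tag spellings mean the same thing is to index every spelling of a word, so that either can serve as the tag name. The sketch below is a Python illustration only; `GROUPS`, `INDEX` and `resolve` are names invented for the example.

```python
# Hypothetical variant groups: all spellings in a group are interchangeable.
GROUPS = [
    {"UK": "humour", "US": "humor"},
    {"UK": "colour", "US": "color"},
]

# Index every spelling so that either one can be used as the tag name.
INDEX = {spelling: group for group in GROUPS for spelling in group.values()}


def resolve(tag_word: str, locale: str) -> str:
    """Return the spelling of `tag_word` for `locale`, whichever variant was typed."""
    group = INDEX.get(tag_word)
    if group is None:
        return tag_word  # not a known variant word: leave it alone
    return group.get(locale, tag_word)
```

With this table, `resolve("humour", "US")` and `resolve("humor", "US")` both give "humor", so the choice of spelling inside the tag has no effect on what readers see.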
Q2: How do we avoid the "UK Labor Party"?
A2: When talking about the UK Labour Party, you would write it as is, like you do now. Manually "correcting" it to "Labor Party", or writing it as "{{en:Labour}} Party", would clearly be a mistake.
Q3: What happens to users that are not logged in or that haven't set the locale preference?
A3: Users' locale can be determined using their IP address.
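A possible fallback order is sketched below in Python. It assumes the country has already been derived from the IP address by some lookup that is not shown; the `COUNTRY_TO_LOCALE` table and the `choose_locale` function are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from ISO country code to spelling locale.
COUNTRY_TO_LOCALE = {"GB": "UK", "US": "US"}


def choose_locale(preference: Optional[str], country: Optional[str]) -> Optional[str]:
    """Pick a spelling locale: explicit preference first, then country.

    `country` is assumed to come from an IP-to-country lookup (not shown).
    Returns None when nothing is known, meaning "show words as written".
    """
    if preference:
        return preference
    if country is not None:
        return COUNTRY_TO_LOCALE.get(country.upper())
    return None
```

Returning `None` for unknown readers keeps the page as the editors wrote it, which is no worse than the current situation.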
Q4: Can we use automation (bots, automatic rendering without explicit tagging, etc) to get to the equilibrium faster?
A4: I don't think it's a good idea. Instances like "Labour Party" would be messed up, and I think it's more important not to break what's currently correct than automating the changes. I think trying to automate the transition is missing the point made in the "dynamics" section.
Finally, consider words like "license" (verb) vs "licence" (noun) in the UK dialect. How do you make that distinction automatically? (Thanks to NFH for the argument.)
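The two failure modes mentioned in A4 can be reproduced with a deliberately naive converter. The Python functions below are illustrative only and are not proposed for use.

```python
import re


def naive_uk_to_us(text: str) -> str:
    """Blindly respell labour/Labour, ignoring whether it is a proper noun."""
    return re.sub(r"\b([Ll])abour\b", r"\1abor", text)


def naive_us_to_uk(text: str) -> str:
    """Blindly respell every "license", ignoring noun versus verb."""
    return text.replace("license", "licence")


# The party name is corrupted along with the common noun.
print(naive_uk_to_us("The Labour Party courted organised labour."))

# UK usage wants "a licence to license software"; the verb is wrongly changed too.
print(naive_us_to_uk("a license to license software"))
```

Explicit tagging avoids both problems because a human decides, word by word, what is a variant spelling and what is not.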
Q5: Wouldn't this make Wikitext totally unreadable and a chore to edit?
A5: No. Please explain how this
I have a poor sense of {{en:humor}} because I'm a pizza.
is so much more unreadable and difficult to edit than this
I have a poor sense of humor&mdash;I'm a pizza after all. (P. Margherita)
In other words, let's assume that &mdash; didn't exist, and someone in the W3C came up and said: hey, wouldn't it be great if we added &mdash; to HTML? Would you oppose the move because it would make HTML totally unreadable and a chore to edit?
Q5.1: Would this make Wikitext more readable and easier to edit?
A5.1: Of course not. It will be strictly less readable. However, in my opinion, it will be only slightly less readable. After all, the syntax of the proposal is among the simplest instances of templates, which make regular appearance in our articles.
Anyway, ponder this. How many new editors would be deterred by such a {{en:monstrosity}} to the extent that they would choose not to contribute? Because this is what really matters, isn't it?
As for ease of editing, how many keystrokes and mouse clicks would be wasted in the average editing session? Keep in mind that one need not use the templates when editing. (Although I don't think this will change the estimate much.)
Q6: I can read all flavours of English just fine. Isn't this proposal useless?
A6: No. See "The Problems" subsection above.
Q7: Isn't this gonna put strain on our servers?
A7: No. All that is required is parsing the templates, looking up a user preference and looking up the right spelling in a static map in memory. This is no more complicated than the date format feature. Compare this to the all-important "Skin" feature.
Q8: Isn't this gonna take a long time to implement?
A8: This is beside the point. We are not asking you to do it, simply to express your view on the proposal. If there is an agreement that this would be a welcome feature, it will be prioritised.
Q9: What about other grammatical/punctuation variations?
A9: I think we should stick to spellings to start with, for the following reasons:
- It's much simpler
- It would be much easier to convince people that this is a good idea
In other words, I think that anything more complicated than "downtown" vs. "city centre" should not be done just yet.
Q10: I'd still like to read about US topics in US spelling and UK topics in UK spelling.
A10: We can leave local spellings throughout a local article as per current guidelines. Or we can add a {{context:UK}} or {{context:US}} at the top of "local" articles/sections and users can choose to let this template override the localisation templates. This setting could be turned on by default.
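The override could work along these lines. This is a Python sketch under stated assumptions: the `{{context:UK}}` syntax comes from the answer above, while `effective_locale` and the `honour_context` flag are names invented for the example.

```python
import re

# Matches a context tag such as {{context:UK}} anywhere in the article.
CONTEXT = re.compile(r"\{\{context:(\w+)\}\}")


def effective_locale(wikitext: str, user_locale: str, honour_context: bool = True) -> str:
    """Return the article's context locale if present and honoured, else the user's."""
    match = CONTEXT.search(wikitext)
    if honour_context and match:
        return match.group(1)
    return user_locale
```

A US reader who leaves the setting on would see a `{{context:UK}}` article in UK spelling; with `honour_context=False` the same reader gets US spelling everywhere.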
Q11: My idea is along the same lines, but it's better.
A11: Excellent, we'd love to hear about it! The proposal above is only tentative, and some variants are already being discussed (e.g. MOIO vs MABIO below). But don't forget to vote!
Q12: Do you realise how many tags must be put in place?
A12: Yes, but that doesn't need to happen overnight, and in fact it doesn't need to happen at all. Let's assume that &mdash; didn't exist, and someone in the W3C came up and said: hey, wouldn't it be great if we added &mdash; to HTML? Would you oppose the move because of the sheer amount of dashes on the web that would need to change?
Q13: Some words have two possible spellings in a specific locale—e.g. dialog/dialogue in US. Wouldn't people fight over which one to choose for that locale?
A13: It would be very easy to allow users to specify personal settings for each word, that would override their locale of choice.
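Per-word overrides amount to checking a small personal table before the locale table. The Python sketch below is illustrative; `LOCALE_SPELLINGS` and `spelling_for` are hypothetical names and the entries are examples only.

```python
# Hypothetical locale tables, keyed by tag word.
LOCALE_SPELLINGS = {
    "US": {"dialogue": "dialog", "colour": "color"},
    "UK": {"dialogue": "dialogue", "colour": "colour"},
}


def spelling_for(word: str, locale: str, overrides: dict) -> str:
    """Personal per-word overrides win; otherwise use the locale's spelling."""
    if word in overrides:
        return overrides[word]
    return LOCALE_SPELLINGS.get(locale, {}).get(word, word)


# A US reader who nevertheless prefers "dialogue":
my_overrides = {"dialogue": "dialogue"}
print(spelling_for("dialogue", "US", my_overrides))
print(spelling_for("colour", "US", my_overrides))
```

The locale default for a contested word then matters much less, since anyone who disagrees with it can change it for themselves.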
Q14: But I like inconsistencies!
A14: Interesting. Well, that's easy to achieve, we can provide an en:random option among the various locales.
PizzaMargherita 09:09, 18 February 2006 (UTC)
Votes on tagging pages proposal
"A feature like this would be welcome in the English Wikipedia."
- Strongly agree - Er... yes, I agree with myself. :) My view is summarised in the Read this first above. PizzaMargherita 14:34, 8 January 2006 (UTC)
- Agree. It's inclusive, recognises dialectic boundaries, avoids some of the ambiguity of the current guidelines and is harmless if it ends up hardly being used, as it isn't a necessary syntax. I also don't think it will be as ugly in the markup as some people seem to predict; one reason for that being that I guess it will rarely crop up (e.g. I can't see any parts of this paragraph where it could be used), and another reason being that it's reasonably slim-line. --Splidje 11:00, 17 January 2006 (UTC)
- Agree I like the idea, but only if the implementation at least has some kind of automatic tagging, possibly in the form of a checkbox when editing a page. I am also in favour of exploring an alternate implementation (MOIO, see below) further. Jared Grainger 18:29, 8 January 2006 (UTC)
- Strongly disagree. The current system works okay, and this proposal, if implemented, would be a huge server resource hog. BlankVerse 16:11, 8 January 2006 (UTC)
- Mildly oppose. Thank you for putting so much effort into explaining your proposal. I can understand your reasoning, but I disagree. Here is why I oppose the proposal:
- I don't like the idea of splitting Wikipedia into two (or more) viewing modes. There is only one English language and there should only be one Wikipedia. Wikipedia is an international project and having different spellings coexist in that project gives it an international flavour.
- There are many spelling variations that can be used in both British and American English (dialog(ue), travel(l)er, realize(-ise), fetus/foetus, per cent or percent, theatre/theater and so on...) What about Canadian spelling? In order to make everyone happy, you'd have to devise a system that enables every user to create his own individual system of spelling.
- There would never be consensus about the "default view". I'd guess that more than 90% of all Wikipedia users are not logged in. What will they see? Using tags will shift the current spelling controversies to a higher level of abstraction. Now, people argue about what spellings to use in an article. Using tags, they will argue about how these tags will be interpreted.
- Assuming (on average) five spelling variants per article, roughly five million tags must be put into place! Did you realize that? I think resources should rather be used to write new and improve already existing articles. I know that many people care about spelling, but all in all, it's a minor issue.
- Spelling is just the tip of the iceberg of linguistic variations. What about punctuation, grammar, lexical differences?
- There is one rule concerning spelling that has gained general consensus: Articles related to a certain English-speaking country should bear that country's spelling. For instance: London -> UK spelling. There would be strong resistance among the editors of such articles to changing all British spellings like behaviour to {{en:behavior}} just in order to allow a small minority of users to read the article in their favourite spelling. Nobbie 13:20, 10 January 2006 (UTC)
- Oppose. Burden on resources and on editors wading through the edit screen, for minimal benefit. Problems going beyond spelling, such as "she is in hospital" and the like. Too many options where more than one spelling is used within one geographical region, even if another region doesn't use one of them. Would create more arguments than it avoids. Gene Nygaard 16:45, 10 January 2006 (UTC)
- Oppose the manual tagging method. It's a really nice concept, but not worth the effort. I've not seen spelling partisans creating problems, and in my few months here, less than half a dozen well-meaning editors who honestly thought they were correcting typos. I strongly oppose the automated method, because there is no excuse for knowingly introducing errors where none previously existed. NickelShoe 01:01, 11 January 2006 (UTC)
- Oppose, but very cool idea, I'd be interested in the programming. But... I actually like the Wikipedia spelling chaos, you learn a lot about the English language this way (not just spelling). I think Wikipedia makes many people realize how variegated (<- get it?) the English language is! NeutralLang 20:46, 23 January 2006 (UTC)
  Support - I've changed my mind. I'm not sure if it can be done (it's an awfully complex task...), but I've thought about it and in principle I agree that it's a good idea. Sorry about my first remark, I didn't give much thought to it... NeutralLang 21:22, 31 January 2006 (UTC)
- Strongly oppose makes editing far harder, with not nearly enough gain to counterbalance this. DES (talk) 18:12, 30 January 2006 (UTC)
- Support. This is great — it's a proper solution, not some feeble compromise. Solutions are brilliant once you find them, compromises degrade Wikipedia. Syntax can be thought about, but I have faith that that's already been done in depth and the chosen is good. I don't believe this will have an adverse effect on editing. There will only be improvement during the transition period — a move away from consistency is not needed in this particular issue in order to gain consistency. Neonumbers 05:52, 31 January 2006 (UTC)
- Mildly oppose. This seems like a good effort at coming up with a solution, but I think the inability to address the grammar and usage differences will result in rather odd rendering. I'm thinking about a sentence like "The {{en:UK}} Labour government were dissatisfied with organized {{en:labour}}'s efforts to protect the {{en:dole}} of pensioners in hospital" being rendered as, "The U.K. Labour government were dissatisfied with organized labor's efforts to protect the unemployment insurance of pensioners in hospital." That's a hodgepodge of American and Commonwealth English that kind of makes my head spin (and incidentally reads perfectly correctly in one dialect and has a non-sequitur in the other). At least now, my mind can click into "reading American" or "reading Commonwealth" and keep going. I hope that the default (which I will certainly leave turned on if this is implemented) is "leave the spelling as-is"—which of course will probably cause the edit disputes to continue as people change {{en:color}} to {{en:colour}} and back... What's the mapping for {{en:rubber}} going to be, by the way? --TreyHarris 07:58, 13 February 2006 (UTC)
- Oppose. This creates a large workload for Wikimedia developers, template maintainers, and article editors, but would solve only a very minor problem. Spelling errors used to really disrupt the flow of my reading, but after years of browsing Usenet and the WWW, I don't even notice spelling mistakes any more. National spelling variations are even less noticeable (with a few exceptions like curb/kerb, tire/tyre). I am sure the large majority of Wikipedia's users don't care and don't notice, so why should we add so much to our workload for so little gain? Readers aren't going to set up an account and log in so they can see tweaked spellings. As long as each article has consistent spelling (of the national variety appropriate to the subject, if applicable), that's good enough. Indefatigable 16:35, 13 February 2006 (UTC)
- Strongly support. I think getting a solution to the spelling problem would move us a long way toward the goal of a consistent professional style. I'd like to take issue with a lot of the contrived examples where poor substitutions might occur. Once we have broken the back of the problem (spelling) we can start to ask editors to be careful in which words they choose where the spelling isn't the issue. In other words, we can ask American editors to use "apple juice" where they would say "cider" and Commonwealth editors to say "alcoholic cider". We can make compromises on a lot of the other issues as well and find a middle ground. I suspect that the difficulty with finding a middle ground is primarily because of the black-and-white us-versus-them nature of spelling differences. Another example: Americans say either named for or named after, in New Zealand it's exclusively named for. Since Americans have a choice, they can choose the one that applies most internationally. Ben Arnold 00:48, 15 February 2006 (UTC)
- Oppose. Nice idea, but I think it would have a negative effect by perpetuating people's beliefs in the 'rightness' of their own spelling, and causing them to 'correct' 'misspellings' on untagged pages. I think it is better if people can learn that other spellings exist. British books published in America are always 'translated' into American spellings and idioms, so Americans often see mistakes when they encounter 'untranslated' British writing. In contrast, American books published in Britain are not 'translated', so Brits usually have a better awareness of the different conventions of different countries. I think the British approach is better as it makes you more aware of diversity. The Singing Badger 18:23, 16 February 2006 (UTC)
- Strongly support. At first I liked the idea of having "British English for articles about Britain, American English for articles about the U.S.", but that only provides guidance in a minority of articles. Obviously we would not need to go out and tag every occurrence of a problem word in every article, but at least this would silence the endless debates on articles like "color" and "grey". My only concerns are, again, for article titles, and for non-logged-in users who still might care. I also think that, if we implement this, a personalization feature is absolutely necessary. Lesgles (talk) 21:05, 18 February 2006 (UTC)
- Strongly oppose - Needless waste of editor time and overcomplication of code. Wasn't this site an easy-to-edit wiki at one time? — Omegatron 21:01, 21 February 2006 (UTC)
- Strongly oppose per Omegatron. My constitution is strong enough with being exposed to the occasional Americanism. Markyour words 16:50, 22 February 2006 (UTC)
- Strongly Agree Brilliant idea. Problems like "in hospital" vs. "in the hospital" can of course be solved via a similar mechanism. Thanks for being so smart! I was actually about to offer an alternative to the Dialect Problem: the creation of a "Global Dialect" standard. Less wacky than it might sound. But your idea is a better solution. (Actually, we could create a global dialect and have that be another localizable setting.) BrianinStockholm 19:48, 22 February 2006 (UTC). 18:41, 2006 February 22 (UTC)
- Support Great idea so long as it's restricted to spelling. I wouldn't want to see attempts to substitute different words between dialects in such a way, that would change the tone of articles too much. Nick 22:55, 24 February 2006 (UTC)
- Weak oppose There would be benefits to such a feature; however, I believe the costs far outweigh them: editor time (already consumed by regular editing and dialectic changes), overcomplicated coding, server limitations, and perhaps overcomplicated editing. This would likely result in a different sort of dialectic melange than currently. And all of this presupposes that there's something wrong with the status quo, with editors judiciously discussing and implementing dialectic renditions – this is a wiki, after all? And if guidelines are insufficient for some reason, they should be massaged or accepted. This seems a make-work project and, in effect, is a Wp version of the Quebec "tongue troopers". E Pluribus Anthony | talk | 10:22, 20 March 2006 (UTC)
- Support. I'm a little worried about making things over-technical, but of course editors won't have to use the templates; they can be added by others (just as now we have to correct additions to articles in the "wrong" variety of English). If editors already judiciously discussed these matters, I'd be more inclined to a weak oppose, but all too often supporters of one variety simply blindly insist on it. --Mel Etitis (Μελ Ετητης) 11:53, 25 March 2006 (UTC)
- Strongly Support. This would greatly improve the readability and user-friendliness of the encyclopaedia. It would allow the resource to acknowledge both the wishes of Commonwealth and American speakers and cater to both of them simply and invisibly. Also consider the possibility of language variations done by DEFAULT on all terms except those in quotes or with a special "not subject to dialect" tag, which would make the editors' jobs easier, there being generally less of the latter than of the former in articles. It is a beautiful solution to an otherwise endless, circular and bitter discussion based on such trivial matters as habit and education. For the titles, I propose that if a user searches for "colour" or types in the Colour address of the link, it would show Colour as the title and colour in the article, whereas if they'd searched the other form, the other form would be presented in the title and article. A little "alternative spelling:" tag can be put in where "redirected from:" is now for those articles, so viewers know what's going on. Arrenlex 06:19, 4 April 2006 (UTC)
- Strongly Support. In short, I strongly support this for the reasons given by PizzaMargherita and many other supporters here. An encyclopaedia is about knowledge. In its present format, Wikipedia presents knowledge but propagates confusion over spellings. It would be far better to refer readers to articles on spelling diversity in the English language rather than risk spreading spelling inconsistencies in the audience. Alias Flood 05:16, 6 April 2006 (UTC)
- Strongly Support. I thought of exactly such a system for WP a few months ago myself, but never mentioned it. I'm pleased that someone else has come up with the same idea. A lot of English people strongly object, when reading their own national language, to having American spelling imposed on them (and I am aware that many Americans can be equally upset by the inverse), and this tagging would go some way to address this issue. In this way, if an editor objects to the spelling of a particular word, they can amend it with a tag, thus making everyone happy - no more revert wars over spelling. I'm against using bots, as you only have to look at words like "license" (verb) vs "licence" (noun) where US English uses "license" for both the verb and the noun. It would be difficult for bots to make such a distinction between different parts of speech. NFH 16:17, 17 April 2006 (UTC)
- Support. I don't think this would be any bigger a deal than having dates customised to suit local preferences, which is what already happens. --Susurrus 07:26, 13 May 2006 (UTC)
- Strongly Support. This would not only prevent a lot of pointless edit wars and discussion, but would make Wikipedia easier to read for everyone. As to The Singing Badger's comment, US books are not translated into British English because that is changing the author's original work. It seems strange that it does not work like that both ways. Since Wikipedia articles cannot be seen in the same literary light, and are anyway the work of multiple authors, I think it's better to make the translation. Mojo-chan 22:15, 14 May 2006 (UTC)
Tagging pages - Discussion
Please note, the top part of these comments precedes the proposal above. PizzaMargherita 11:25, 9 January 2006 (UTC)
Is it feasible to create some templates to tag every page (by en-GB, en-AU, en-GB-oed, en-US, en-CA, ...)? It would be more convenient for editors, and pages could be kept more consistent. - Cheung1304 19:27, 3 Mar 2005 (UTC)
- There seems to be a discussion about tags on the talk page of Wikipedia:Manual_of_Style Nobbie 13:48, 4 Mar 2005 (UTC)
Proposed templates: Template:BrE, Template:AmE, Template:CaE, etc. Cheung1304 03:45, 9 Mar 2005 (UTC)
- We shouldn't mark articles as only being editable by Britons, Americans, Canadians, etc. This is a very divisive proposal that would only add to Wikistress and edit wars, jguk 23:36, 22 Mar 2005 (UTC)
- This isn't the right approach. Articles should avoid containing editorial information; that's what the Talk: pages are for. Another approach would be to use HTML comments <!-- comments -->, but I'm not sure it's necessary. — Matt Crypto 11:47, 23 Mar 2005 (UTC)
- A comment saying <!-- This page is written using (country) English. Do not change the spelling or style to that of another dialect. --> would be sufficient. For example, there could be comments in the coding for the pages on Chicago, Illinois and White House warning people that changing the text on those pages from American English to British English is a no-no. --/ɛvɪs/ 20:16, Mar 23, 2005 (UTC)
- To use Kenneth Williams' last words, what's the bloody point? jguk 20:36, 23 Mar 2005 (UTC)
- <rant>Because people like you think we all come from England. Some of us however, LIVE IN THE WESTERN HEMISPHERE. You don't need to overrun this site with british words that the general public cannot understand. </rant> 209.2.60.75 20:12, 3 October 2005 (UTC)
Maybe you'd be interested to know that I also live in the Western hemisphere? :) jguk 20:33, 3 October 2005 (UTC)
- Hmmm, I think only one country in the western hemisphere uses US spelling, the rest (that have English as an official language) use English. I'd be worried about any general public that reads so poorly that it cannot understand English spelling, but would understand US spelling. Pete.Hurd 20:42, 3 October 2005 (UTC)
- I haven't seen any Canadian Tyre stores. Have you? It's a little more complicated than U.S. and UK, and parts of the UK also have strange things not shared by all of the UK. Gene Nygaard 05:08, 8 December 2005 (UTC)
This is quite an interesting problem, as it is pretty much totally about style. If dialects are/could be classified as above - en-GB, en-AU etc. - then we have a set of defined dialects. The Wikipedia approach to whole different languages is to have a whole separate set of articles for each language (meaning differing content on Welsh pages to English pages; so someone in Wales could be looking at a totally different article to their neighbour. Good thing? - separate issue). It would be overkill to apply this same approach to different dialects within a language, as the only differences are spelling, grammar and phrasing. Therefore, there could be a tag to give a section of text alternatives in each dialect, e.g. [ [ dialect:en-GB|The colours he favoured were considered humourous to her.|en-US|The colors he favored were considered humorous to her. ] ]. Of course this also requires the ability for a reader to select their preferred dialect; does anyone know if such information is already available in locales etc.? I suppose if a user's language is en and their country is GB then it could be assumed their dialect choice is en-GB. What do people think? Is there a place where ideas such as this can be put forward? --Splidje 12:57, 7 December 2005 (UTC)
- I think this makes a lot of sense. See my proposal here and here.
- Nobody really liked it, but then again nobody gave any good reasons why we shouldn't do it.
- In essence: this
I have a poor sense of {{EN::humor}} because I'm a pizza.
- would render as this
I have a poor sense of humor because I'm a pizza.
- or this
I have a poor sense of humour because I'm a pizza.
- depending on user preferences. If we want to get fancy, we could even allow personalised dialects, i.e. one may set up their own Pizza dialect where template "humor" renders as "humour" (UK) but "color" renders as "color" (US). PizzaMargherita 21:03, 7 December 2005 (UTC)
- I don't think it's even worth striving for. I'd sooner put up with some variation in spelling rather than having the additional complications in editing, in reading the edit screen, and especially all the clutter that will show up on my watchlist and on recent changes as everybody rushes to add all those silly tags. Plus the strong likelihood of several different robots running amok because they have been poorly designed in an attempt to automate that. Gene Nygaard 05:08, 8 December 2005 (UTC)
- All of these points have been addressed in the discussions linked above, which I encourage you to read.
- If clutter in your watchlist is your primary concern, fear not, for that could only decrease. This would be a consequence of the existence of a stable equilibrium. In fact one of the issues that this proposal is trying to address is edit wars on spelling.
- As for complications in reading and editing wikitext, do you really think that this
I have a poor sense of {{EN::humor}} because I'm a pizza.
- is any more complicated than this?
I have a poor sense of humor&mdash;I'm a pizza after all. (P. Margherita)
- Finally, I don't think there's any need for robots. Everything is explained in the links. PizzaMargherita 22:49, 8 December 2005 (UTC)
I (DerekP) think this would be a waste of time because the use of alternative spellings for English language words is not an issue that is causing real problems in the world; except to people who fuss over non-important issues. If we have tagging, then every word should have to be tagged just in case there is an alternative that the author doesn't know about yet.
{{EN::I}} {{EN::have}} {{EN::a}} {{EN::poor}} {{EN::sense}} {{EN::of}} {{EN::humor}} {{EN::because}} {{EN::I'm}} {{EN::a}} {{EN::pizza}}.
In short, why bother wasting time and effort over small issues before the big issues are dealt with?
And by the way, should that be
{{EN:humor}}
or
{{EN:humour}}
or either?
DerekP 02:12, 15 December 2005 (UTC)
- "the use of alternative spellings for English language words is not an issue that is causing real problems in the world." - I agree that war and hunger are more serious problems, and I accept that they should be dealt with first, but you can't deny that spelling diversity is wasting a lot of time and resources ("correcting", "correcting back", discussing, etc), and that spellings are inconsistent across articles, often within articles.
- "every word should have to be tagged" - No, it shouldn't. I'll try to summarise and clarify the proposal in a standalone section.
- "should that be
{{EN:humor}}
or
{{EN:humour}}
or either?" - Strange that spelling inconsistencies in articles are not a problem, but the convention adopted in the template itself (seen only by editors) is. OK, in that case my answer is that both templates will be introduced and will yield exactly the same result. PizzaMargherita 17:54, 23 December 2005 (UTC)
- Strongly disagree. The current system works okay, and this proposal, if implemented, would be a huge server resource hog. BlankVerse 16:11, 8 January 2006 (UTC)
- Ok, so you are saying that: 1. the problems I have listed exist only in my head and 2. Q and A number 7 are utter rubbish. Fair enough, but can you please explain why? Thanks. PizzaMargherita 16:32, 8 January 2006 (UTC)
- I don't think the MABIO method would take much in the way of resources. There would only be a handful of additional tags per article and the dictionary is unlikely to change very often, meaning that it can be cached or inlined into the code. The MOIO method would be somewhat inefficient if improperly implemented (i.e. scanning every word every time the article is viewed), but all that needs to be done is to scan the text when it is saved and mark all the dialect words. Then it is comparable to MABIO in efficiency. This also brings another possible implementation idea to mind: put a checkbox that automatically scans the article and tags dialect words. Jared Grainger 17:59, 8 January 2006 (UTC)
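The save-time scan described in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `VARIANTS` dictionary, the function name `tag_on_save`, and the choice to skip capitalised words are all hypothetical and not part of the proposal or of MediaWiki.

```python
import re

# Hypothetical dialect dictionary: each known variant maps to one canonical tag name.
VARIANTS = {
    "humor": "humour",
    "humour": "humour",
    "color": "colour",
    "colour": "colour",
}

# Match an existing {{en:...}} tag first, otherwise a plain run of letters.
TOKEN = re.compile(r"\{\{en:[^}]*\}\}|[A-Za-z]+")


def tag_on_save(wikitext):
    """Wrap known dialect words in {{en:...}} tags once, when the page is saved."""
    def replace(match):
        token = match.group(0)
        if token.startswith("{{"):
            return token  # already tagged, so a second save changes nothing
        # Exact-case lookup: capitalised words (e.g. proper names) are left alone.
        canonical = VARIANTS.get(token)
        return "{{en:" + canonical + "}}" if canonical else token
    return TOKEN.sub(replace, wikitext)
```

Because the work happens once per save rather than once per view, the per-view cost would reduce to expanding the tags that are already in the text.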
In response to Splidje's original question: yes this would be complete overkill, a waste of programmers' time, would complicate wikitext and become a big inconvenience for all editors. Articles will end up in a worse mix of English dialects, as they will end up half-dialectified, while most editors try to work around, ignore, or remove the dialect tags littering them.
If you really want to work on the software, it's probably best to develop an extension to Mediawiki, and discuss it there. I will oppose adding anything like this to English Wikipedia.
Unobtrusively tagging articles as suggested at the very top of this discussion might be a good idea. But why don't we all go improve some articles instead of generating more words about this proposal? —Michael Z. 2006-01-8 18:28 Z
- "complete overkill" - see QA7, please articulate your argument if you disagree. Any technical feasibility study is more than welcome at this stage, whatever its outcome. On the other hand, opposing the proposal simply by saying "It won't work" or "If it passes I'm not gonna comply" is not very constructive.
- "a waste of programmers' time" - see QA8
- "would complicate wikitext and become a big inconvenience for all editors" - see QA5
- "Articles will end up in a worse mix of English dialects, as they will end up half-dialectified" - No, that is the current state. Please explain how the situation will be worse. Thanks. PizzaMargherita 18:47, 8 January 2006 (UTC)
- Complete overkill in that it complicates wikitext. I can't look at an article and just type an addition, without risking it becoming a mix of dialects when someone views it. There are already too many different whiz-bang templates cluttering wikitext, without adding this one which has to be mixed in everywhere. Who's going to take on the task of patrolling Wikipedia, searching out "tire tread" vs. "I tire easily" and adding the English dialect templates to the right ones? —Michael Z. 2006-01-8 22:52 Z
- I must insist: you can't tell me that the proposed templates will make the wikitext (which only editors see) more unreadable than it is already. Is "&nbsp;" readable? Maybe not, but sometimes using it is the one and only right thing to do. We should write what we mean: "{en:humour}" means "humour—this word is written in two ways depending on the locale", whereas "Labour Party" means "Labour Party".
- "I can't look at an article and just type an addition, without risking it becoming a mix of dialects when someone views it." - I agree, but the same can be said about the present situation, which is strictly worse than the proposed one.
- "Who's going to take on the task of patrolling Wikipedia, searching out "tire tread" vs. "I tire easily" and adding the English dialect templates to the right ones?" - Please note that there is no need to actively and exhaustively perform this task. Please read the dynamics of the proposal. The answer to your question is: it will be the same people who are creating these problems in the first place that will do that, along with every good Wikipedian that bumps into such an occurrence. Much in the same way you correct a typo in a random article when you are reading it. I think that a question like "Who's going to patrol Wikipedia?" could have been justified in 2001, but now it sounds a bit silly. PizzaMargherita 23:45, 8 January 2006 (UTC)
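To make the "tire tread" versus "I tire easily" point concrete, here is a minimal sketch of how manually placed tags might be expanded per locale. The `SPELLINGS` table and the `render` function are hypothetical names invented for this illustration; the proposal itself does not specify an implementation.

```python
import re

# Hypothetical table: canonical tag name -> spelling for each locale.
SPELLINGS = {
    "tyre": {"UK": "tyre", "US": "tire"},
    "humour": {"UK": "humour", "US": "humor"},
}

TAG = re.compile(r"\{\{en:([^}]+)\}\}")


def render(wikitext, locale):
    """Expand each {{en:word}} tag to the spelling for the reader's locale."""
    def expand(match):
        word = match.group(1)
        # Unknown words or locales fall back to the word as the editor typed it.
        return SPELLINGS.get(word, {}).get(locale, word)
    return TAG.sub(expand, wikitext)


# Only the tagged occurrence varies; the untagged verb "tire" is never touched.
us_text = render("The {{en:tyre}} tread is worn, and I tire easily.", "US")
uk_text = render("The {{en:tyre}} tread is worn, and I tire easily.", "UK")
```

Under this reading, the ambiguity is resolved by whoever places the tag, not by the software.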
- Mildly oppose. Thank you for putting so much effort into explaining your proposal. I can understand your reasoning, but I disagree. Here is why I oppose the proposal:
- I don't like the idea of splitting Wikipedia into two (or more) viewing modes. There is only one English language and there should only be one Wikipedia. Wikipedia is an international project and having different spellings coexist in that project gives it an international flavour.
- There are many spelling variations that can be used in both British and American English (dialog(ue), travel(l)er, realize(-ise), fetus/foetus, per cent or percent, theatre/theater and so on...) What about Canadian spelling? In order to make everyone happy, you'd have to devise a system that enables every user to create his own individual system of spelling.
- There would never be consensus about the "default view". I'd guess that more than 90% of all Wikipedia users are not logged in. What will they see? Using tags will shift the current spelling controversies to a higher level of abstraction. Now, people argue about what spellings to use in an article. Using tags, they will argue about how these tags will be interpreted.
- Assuming (on average) five spelling variants per article, roughly five million tags must be put into place! Did you realize that? I think resources should rather be used to write new and improve already existing articles. I know that many people care about spelling, but all in all, it's a minor issue.
- Spelling is just the tip of the iceberg of linguistic variations. What about punctuation, grammar, lexical differences?
- There is one rule concerning spelling that has gained general consensus: Articles related to a certain English-speaking country should bear that country's spelling. For instance: London -> UK spelling. There would be strong resistance among the editors of such articles to changing all British spellings like behaviour to {en:behavior} just in order to allow a small minority of users to read the article in their favourite spelling. Nobbie 13:20, 10 January 2006 (UTC)
- Thanks for your civil feedback. Here are my replies:
- One language, one Wikipedia. Many dialects, many editions. The content stays automatically in sync. Having different spellings coexist in the same article or across articles makes it inconsistent and prone to endless loops of spell-fixes.
- Canadian spelling, Indian spelling, Nigerian spelling, all catered for. It would be very easy to give the user the possibility to have personal settings for each word that would override their locale of choice.
- See QA3. IP addresses. Easy. Now I'm sure somebody is gonna come up and say: "But wouldn't this be violating the constitutional rights of people who use anonymous proxies?"
- Five million tags must be put into place, but as I pointed out
- it doesn't need to happen overnight
- it doesn't need to happen at all.
- "It's a minor issue." - I agree, but some people don't feel that way, and they are the ones who will put in the five million tags. See dynamics section.
- 5. See QA9. It doesn't make sense to reject a solution only because it's partial, unless of course you can offer a solution to a more general problem. This is a valid solution to a part of the problem, so I don't accept this argument.
- 6. See QA10.
- PizzaMargherita 00:52, 14 January 2006 (UTC)
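The idea of personal per-word settings that override the chosen locale can be sketched as a simple layered lookup. Everything named here (`LOCALES`, `spelling_table`, the sample words) is an assumption made for illustration only.

```python
from collections import ChainMap

# Hypothetical locale tables: canonical tag name -> preferred spelling.
LOCALES = {
    "UK": {"colour": "colour", "realise": "realise"},
    "US": {"colour": "color", "realise": "realize"},
}


def spelling_table(locale, overrides=None):
    """Layer a reader's per-word overrides on top of their chosen locale."""
    # ChainMap searches the first mapping first, so overrides win.
    return ChainMap(overrides or {}, LOCALES[locale])


# A reader who picks the UK locale but prefers the -ize ending for one word.
table = spelling_table("UK", {"realise": "realize"})
```

With this layering, a mixed preference such as "colour" plus "realize" needs no extra locale of its own.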
- Oppose. Burden on resources and on editors wading through the edit screen, for minimal benefit. Problems going beyond spelling, such as "she is in hospital" and the like. Too many options where more than one spelling is used within one geographical region, even if another region doesn't use one of them. Would create more arguments than it avoids. Gene Nygaard 16:45, 10 January 2006 (UTC)
- Thanks for your comments.
- "Burden on resources" - Please be more specific. Storage? The current situation is worse, because there is no clear equilibrium to which to converge, and so spelling styles can (and do) cycle, see The Problems section. Server cycles? See QA7. Developers? See QA8.
- "and on editors" - See QA5.
- "Problems going beyond spelling, such as "she is in hospital" and the like." - See QA9.
- "Too many options where more than one spelling is used within one geographical region" - See above. I'll integrate this and my answer in the QA section.
- PizzaMargherita 01:20, 14 January 2006 (UTC)
- Oppose the manual tagging method. It's a really nice concept, but not worth the effort. I've not seen spelling partisans creating problems [...] NickelShoe 01:01, 11 January 2006 (UTC)
- Thanks for your comments.
- "It's a really nice concept, but not worth the effort." - Sorry, poor argument. See QA8.
- "I've not seen spelling partisans creating problems" - Wanna laugh? Load this page and this page, and search for "spell". Now that is what I call a waste of resources (human and mechanical).
- PizzaMargherita 01:20, 14 January 2006 (UTC)
- Strongly oppose makes editing far harder, with not nearly enough gain to counterbalance this. DES (talk) 18:12, 30 January 2006 (UTC)
- Thanks for your comments. Please see QA5. PizzaMargherita 08:01, 17 February 2006 (UTC)
- Mildly oppose. This seems like a good effort at coming up with a solution, but I think the inability to address the grammar and usage differences will result in rather odd rendering. I'm thinking about a sentence like "The {{en:UK}} Labour government were dissatisfied with organized {{en:labour}}'s efforts to protect the {{en:dole}} of pensioners in hospital" being rendered as, "The U.K. Labour government were dissatisfied with organized labor's efforts to protect the unemployment insurance of pensioners in hospital." That's a hodgepodge of American and Commonwealth English that kind of makes my head spin (and incidentally reads perfectly correctly in one dialect and has a non-sequitur in the other). At least now, my mind can click into "reading American" or "reading Commonwealth" and keep going. I hope that the default (which I will certainly leave turned on if this is implemented) is "leave the spelling as-is"—which of course will probably cause the edit disputes to continue as people change {{en:color}} to {{en:colour}} and back... What's the mapping for {{en:rubber}} going to be, by the way? --TreyHarris 07:58, 13 February 2006 (UTC)
- Thanks for your comments. Please see QA10.
- "It will probably cause the edit disputes to continue as people change {{en:color}} to {{en:colour}} and back..."
- This will probably not happen in my opinion, only a few vandals will bother. Tell you what, for the tags we can adopt the rules that we currently have for text. I'll add this to QA1.
- Even if it does happen, WP will be in a strictly better shape than it is now. I maintain, in a much better shape. PizzaMargherita 08:01, 17 February 2006 (UTC)
- Oppose. This creates a large workload for Wikimedia developers, template maintainers, and article editors, but would solve only a very minor problem. Spelling errors used to really disrupt the flow of my reading, but after years of browsing Usenet and the WWW, I don't even notice spelling mistakes any more. National spelling variations are even less noticeable (with a few exceptions like curb/kerb, tire/tyre). I am sure the large majority of Wikipedia's users don't care and don't notice, so why should we add so much to our workload for so little gain? Readers aren't going to set up an account and log in so they can see tweaked spellings. As long as each article has consistent spelling (of the national variety appropriate to the subject, if applicable), that's good enough. Indefatigable 16:35, 13 February 2006 (UTC)
- Thanks for your comments.
- "This creates a large workload for Wikimedia developers". Please see QA8.
- "I am sure the large majority of Wikipedia's users don't care and don't notice". What makes you think that?
- "Readers aren't going to set up an account and log in so they can see tweaked spellings." That is not necessary. See QA3 (IP addresses). PizzaMargherita 08:01, 17 February 2006 (UTC)
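A rough sketch of how a default could be chosen for readers who are not logged in, following the IP-address idea mentioned above. It assumes the country code has already been derived from the IP address by some other means; `COUNTRY_TO_LOCALE`, `resolve_locale`, and the `"as-written"` fallback are hypothetical.

```python
# Hypothetical mapping from ISO 3166 country code to a spelling locale.
COUNTRY_TO_LOCALE = {"GB": "UK", "US": "US", "CA": "CA", "AU": "AU"}


def resolve_locale(preference=None, country_code=None, fallback="as-written"):
    """Pick a spelling locale: explicit preference, then country, then fallback."""
    if preference:
        return preference  # a logged-in reader's own setting always wins
    if country_code in COUNTRY_TO_LOCALE:
        return COUNTRY_TO_LOCALE[country_code]
    # Unknown origin (for example an anonymising proxy): leave spellings as typed.
    return fallback
```

The fallback value is the contested part: it decides what readers see when nothing is known about them.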
- Oppose. Nice idea, but I think it would have a negative effect by perpetuating people's beliefs in the 'rightness' of their own spelling, and causing them to 'correct' 'misspellings' on untagged pages. I think it is better if people can learn that other spellings exist. British books published in America are always 'translated' into American spellings and idioms, so Americans often see mistakes when they encounter 'untranslated' British writing. In contrast, American books published in Britain are not 'translated', so Brits usually have a better awareness of the different conventions of different countries. I think the British approach is better as it makes you more aware of diversity. The Singing Badger 18:23, 16 February 2006 (UTC)
- Thanks for your comments.
- "causing them to 'correct' 'misspellings' on untagged pages." That's what's happening now. With this mechanism in place there finally is a correct (note no quotes) way to write these words, so when that happens, instead of engaging in an edit war, you would simply use a neutral tag.
- "I think it is better if people can learn that other spellings exist." This proposal does allow that. Please see QA10 and QA14, which I just added. PizzaMargherita 09:23, 18 February 2006 (UTC)
- Strongly oppose - Needless waste of editor time and overcomplication of code. Wasn't this site an easy-to-edit wiki at one time? — Omegatron 21:01, 21 February 2006 (UTC)
- Thanks for your comments, see QA5. PizzaMargherita 18:19, 25 February 2006 (UTC)
User Markyour words, for some reason, doesn't want me to quote his vote. (1) (2). However, I would like to reply to it, if I may.
- Thanks for your comments, see QA5. PizzaMargherita 18:19, 25 February 2006 (UTC)
- Weak oppose There would be benefits to such a feature; however, I believe the costs far outweigh them: editor time (already consumed by regular editing and dialectic changes), overcomplicated coding, server limitations, and perhaps overcomplicated editing. This would likely result in a different sort of dialectic melange than currently. And all of this presupposes that there's something wrong with the status quo, with editors judiciously discussing and implementing dialectic renditions – this is a wiki, after all? And if guidelines are insufficient for some reason, they should be massaged or accepted. This seems a make-work project and, in effect, is a Wp version of the Quebec "tongue troopers". E Pluribus Anthony | talk | 10:22, 20 March 2006 (UTC)
- Thanks for your comments.
- "editor time (already consumed by regular editing and dialectic changes)"—I'm not sure I understand what you mean. If you mean that with the proposal editors will have to waste time changing local spellings to the neutral counterpart, I invite you to read QA12 and the dynamics of the proposal.
- "overcomplicated coding"—I'm not sure I understand what you mean. But see QA8.
- "server limitations"—I'm not sure I understand what you mean. Anyway just in case see QA7.
- "overcomplicated editing"—I think I understand this one. See QA5.
- "And all of this presupposes that there's something wrong with the status quo"—There are quite a few problems with the status quo, see the problems section.
- "editors judiciously discussing and implementing dialectic renditions"—...and never reaching a consensus. Every other topic in the MoS talk page is a dispute about "dialectic renditions", and the current guidelines are in a shambles.
- "And if guidelines are insufficient for some reason, they should be massaged or accepted"—Good. A lot of people are clearly not accepting them and continue to "correct" the spellings. If you have a "massaging" proposal I'd be interested to hear it.
- "This seems a make-work project and, in effect, is a Wp version of the Quebec "tongue troopers"."—I invite you to read the dynamics of the proposal. The way I see it, the spelling bigots are the "tongue troopers", and with this proposal, they will be part of the solution.
- PizzaMargherita 22:10, 20 March 2006 (UTC)
- Strongly support. At first I liked the idea of having "British English for articles about Britain, American English for articles about the U.S.", but that only provides guidance in a minority of articles. Obviously we would not need to go out and tag every occurrence of a problem word in every article, but at least this would silence the endless debates on articles like "color" and "grey". My only concerns are, again, for article titles, and for non-logged-in users who still might care. I also think that, if we implement this, a personalization feature is absolutely necessary. Lesgles (talk) 21:05, 18 February 2006 (UTC)
- The rules currently in place would silence the endless debates just fine if people followed them. Implementing this proposal would cause a host of problems of its own. — Omegatron 02:59, 6 April 2006 (UTC)
- Hi. My conception of this proposal is that it wouldn't cause as many problems as one might think. Here is how I imagine it would work: many editors would simply not bother with the tags, which is completely natural. But in pages where a spelling dispute came up, this would provide an easy solution, which would solve the dispute forever and leave the discussion page clear of page-long arguments over which spelling is better. Spelling is a much more divisive issue than it would first appear. When I am looking at a Wikipedia article and I see "colour", I think "Why don't they use the more logical spelling? Even Fowler agrees with me!" I know that many Canadians, Britons, etc., when I use the spelling "color" in an article, probably think "Why did those Americans have to change the good old spelling? It was fine as it is!" Simple rules just do not seem to work in this respect. Lesgles (talk) 03:27, 7 April 2006 (UTC)
- Exactly. The rules work fine. The problem is with people who think that their variant of English is "correct" and refuse to follow the rules. — Omegatron 05:20, 7 April 2006 (UTC)
- No, you were right the first time. The rules would work fine. (Actually they are provably inconsistent and confusing, so I would question that as well.) A rule or a law should not be judged by how well it would work if people followed it, but by how well it does solve the problem. In this respect, the current rules are a failure. PizzaMargherita 06:42, 7 April 2006 (UTC)
- Precisely. The rules most certainly do not "work fine" at all. There is an obvious history to this problem: many of the very first editors (this can almost be regarded as Wikipedia's "pre-history") were Americans. Initially, then, there was American spelling where it might not have been appropriate. Now, there are far more Commonwealth/European editors, and many of them are rampantly changing spellings to Commonwealth spellings where it is clearly not appropriate. Those, like myself, who believe in "mild English spelling reform" (this includes not just Americans) become less motivated to create content when they know, for example, that a Web page about China or Israel is going to be changed to Commonwealth spelling (because of increasing anti-U.S. sentiment or whatever it is that is driving these people to change spellings inappropriately). BrianinStockholm 16:20, 10 April 2006 (UTC)
- Strongly Support. This would greatly improve the readability and user-friendliness of the encyclopaedia. It would allow the resource to acknowledge both the wishes of Commonwealth and American speakers and cater to both of them simply and invisibly. Also consider the possibility of language variations done by DEFAULT on all terms except those in quotes or with a special "not subject to dialect" tag, which would make the editors' jobs easier, there being generally fewer of the latter than of the former in articles. It is a beautiful solution to an otherwise endless, circular and bitter discussion based on such trivial matters as habit and education. For the titles, I propose that if a user searches for "colour" or types in the Colour address of the link, it would show Colour as the title and colour in the article, whereas if they'd searched the other form, the other form would be presented in the title and article. A little "alternative spelling:" tag can be put in where "redirected from:" is now for those articles, so viewers know what's going on. Arrenlex 06:19, 4 April 2006 (UTC)
- {{en:marking}} {{en:up}} {{en:text}} {{en:as}} {{en:proposed}} {{en:would}} {{en:be}} {{en:neither}} {{en:simple}} {{en:nor}} {{en:invisible}} — Omegatron 02:59, 6 April 2006 (UTC)
- If you really think that this is what wikitext is going to look like, then your vote is based on a complete misunderstanding of the proposal—or you are being deliberately misleading. I have some overdue comments to add, and questions to answer, which will hopefully clarify this. Also please let's limit inline comments and use the discussion section below. Thanks. PizzaMargherita 06:31, 7 April 2006 (UTC)
- Arrenlex, thanks for your comments.
- "consider the possibility of language variations done by DEFAULT on all terms except those in quotes or [...]"—Please see the MOIO (Mark Only Invariant Occurrences) variant below and QA4 (new formulation).
- As for the titles, the question is not just "what should we do?" (whose answer, I agree, is pretty much what you suggested), but also how to achieve it. Consider that titles already present some idiosyncrasies, so I don't expect this to be a trivial problem. PizzaMargherita 21:31, 18 April 2006 (UTC)
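For comparison, the MOIO (Mark Only Invariant Occurrences) variant mentioned above might behave something like the following sketch: every dictionary word is converted by default, except text in double quotes or inside an explicit invariant marker. The marker syntax `{{noen:...}}`, the `US_SPELLING` table, and `render_moio` are all invented for this illustration and are not defined anywhere in the proposal.

```python
import re

# Hypothetical conversion table for one target locale.
US_SPELLING = {"colour": "color", "labour": "labor"}

# Order matters: invariant marker, then a double-quoted span, then a plain word.
TOKEN = re.compile(r'\{\{noen:([^}]*)\}\}|"[^"]*"|[A-Za-z]+')


def render_moio(wikitext, table):
    """Convert every known word unless it is quoted or marked invariant."""
    def convert(match):
        if match.group(1) is not None:
            return match.group(1)  # marked invariant: emit the contents untouched
        token = match.group(0)
        if token.startswith('"'):
            return token  # quoted text is never converted
        return table.get(token, token)
    return TOKEN.sub(convert, wikitext)


converted = render_moio(
    'He wrote "colour" in {{noen:colour}} ink, a nice colour.', US_SPELLING
)
```

The trade-off relative to manual tagging is that editors mark the exceptions instead of the variants, and untagged ambiguous words are converted whether or not that is right.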
Q5
- Q5: Wouldn't this make Wikitext totally unreadable and a chore to edit?
- A5: Please explain how this...
- The response A5 is a non sequitur which doesn't address the question of wikitext clutter at all:
- It doesn't answer 'yes' or 'no', but asks a rhetorical question in response.
- It ignores the fact that &mdash; and even &nbsp; can be entered and edited as literal text. Example, entered directly from my standard Mac keyboard: "I have a poor sense of humor—I'm a pizza after all. (P. Margherita)".
- It makes the ridiculous assumption that wikitext is as complex as HTML code. Wikitext is simple text editing for non-experts, while HTML is a structured markup meant for machine processing, whose usage is guided by a complex specification. Here's an example illustrating why holding up HTML is an irrelevant strawman argument:
- a moderately complex wikitext fragment, from AAA:
== Entertainment ==
;Arts
* [[Aces of ANSI Art]] an organized body of artists dedicated to creating ANSI art, 1989–1991
* [[Adult album alternative]], a radio format
* [[Against All Authority]] (''-AAA-''), an American DIY ska-punk band
- Equivalent HTML code:
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=AAA&action=edit&section=2" title="Edit section: Entertainment">edit</a>]</div>
<p><a name="Entertainment" id="Entertainment"></a></p>
<h2>Entertainment</h2>
<dl>
<dt>Arts</dt>
</dl>
<ul>
<li><a href="/wiki/Aces_of_ANSI_Art" title="Aces of ANSI Art">Aces of ANSI Art</a> an organized body of artists dedicated to creating ANSI art, 1989–1991</li>
<li><a href="/wiki/Adult_album_alternative" title="Adult album alternative">Adult album alternative</a>, a radio format</li>
<li><a href="/wiki/Against_All_Authority" title="Against All Authority">Against All Authority</a> (<i>-AAA-</i>), an American DIY ska-punk band</li>
- Gee, do you think anyone would get uptight about adding — to the HTML spec? —Michael Z. 2006-03-24 23:11 Z
- Fixed.
- Not true for nbsp, or at least my browser does not understand it and it splits the two words. Anyway, I don't believe that the majority of people enter mdash as a literal. Do you?
- That's funny, because &nbsp; appears four times in your signature. Anyway, your argument, if anything, strengthens my point, because if HTML is much harder to read and write/remember than the templates I'm proposing (and I couldn't agree more), then surely we should get rid of all HTML from the articles, which I don't see anybody campaigning for.
- Please keep your cool. Thanks. PizzaMargherita 07:53, 25 March 2006 (UTC)
- 1. Well, you added "no", but haven't explained how this would not be the case. In fact, adding so much more markup to words would make the text less readable, and more of a chore to edit.
- 2. Literal non-breaking spaces are preserved in the database, and returned in the editing field. As far as I can tell, neither literal non-breaking spaces nor the typed-out &nbsp; entity currently makes it into an article's HTML text (I first typed the &nbsp;s into my sig before en Wikipedia supported Unicode, and as far as I remember, they never worked, anyway). I left them there because after Unicode was first adopted, Unicode non-breaking spaces used to be broken by editing in a few browsers, but I think Wikipedia's interface has fixed that now.
- I have no idea if your browser supports the literal Unicode non-breaking space in UTF-8 pages (Safari does).
- In fact, I do type non-breaking spaces ( ), en dashes ( – ), em dashes ( —, shift-alt-hyphen on the Mac), typographic quotation marks ( “...” ‘...’ ) and apostrophes ( ’ ) from the keyboard when it seems appropriate to enter them. I'm sure the majority don't, but there's a set of easy-to-remember keyboard combinations for any Mac user who would like to.
- 3. This does not strengthen your point. Adding more markup in the text would necessarily make readability/editability of wikitext suffer; it would make it more like HTML.
- I think I'm pretty cool about this, but I think the response to Q5 still makes a demonstrably incorrect statement, or at least asks a question carefully chosen to avoid addressing the issue. At best, you could claim that the reduced readability/editability of wikitext would be an acceptable trade-off for the benefits of the proposed extension. It is certainly false that readability/editability would remain exactly the same or be improved.
- A fair question would be:
- Q5: Wouldn't this reduce the readability of wikitext, and make editing more work?
- You ask me to explain why the answer to Q5 (as it is formulated) is negative. I must say that it is so obvious to me that words fail me. I appreciate that "the burden is on me to change the status quo", but I admit defeat, I cannot break it down any more than this.
- But maybe you can help me see the light. Can you please come up with a scenario whereby new editors would be deterred by such a {{en:monstrosity}} to the extent that they would choose not to contribute? Or could you please estimate how many keystrokes and mouse clicks would be wasted in the average editing session? Keep in mind that one need not use the templates when editing. (Although I don't think this will change your estimate much.)
- I would like to point out that the syntax of the proposal is among the simplest instances of templates, which make regular appearances in our articles. And in fact it is very similar in its looks to wikilinks or external links. I venture to say that these templates probably look even simpler. Of course I would never dare compare them to hyperlinks in usefulness, but do you think that wikilinks are "totally unreadable and a chore to edit"? On the contrary, I'm pretty sure that most newbies immediately grasp their meaning and behaviour without the need of reading one line of documentation. And even if they don't, even if they fuck up real bad, we all know that more expert editors can quickly fix their mess, so they can learn by example.
- As for HTML, I recognise that I didn't do my homework properly. I just checked, and my browser doesn't render &nbsp; correctly either(!) So you are absolutely right about that one. But I don't think that this affects the central point in the slightest. Let's see if I can explain myself more clearly. Correct me if I'm wrong, but I think you accept that a) WP articles contain HTML code, b) that HTML code is generally less readable than the syntax of the proposal, and c) that nobody is considering the unreadability and unmanageability of HTML code a reason good enough to get rid of it in WP articles. It follows that we should not oppose this proposal on the grounds that it's unreadable. Unless of course, one thinks that the inconvenience in editing outweighs the benefits. If this is your opinion I respect it, but I, and many others, remain to be convinced. PizzaMargherita 22:39, 27 March 2006 (UTC)
- I simply thought that the response to Q5 was inadequate, along with comments elsewhere saying that all objections to the proposal had been addressed and accusing critics of ignoring that. Here I am, not ignoring it.
- The response to Q5 completely ignores that there is some downside or trade-off as part of the proposal. The proposal could be taken more seriously if 1) such a trade-off were acknowledged by the proponents rather than merely shrugged off, and 2) if there were one or more concrete demonstrations of its effects, rather than lots of unsubstantiated assertions.
- Some ideas for demonstrations:
- Pick ten random articles, and count the words and phrases, and number of occurrences of each, which would be enclosed in language tags.
- Pick a short article with as many "international" English words as possible, and mark up the wikitext, demonstrating how much or little clutter the markup would add, and giving an indication as to whether the choice of terms would be controversial or not.
- List a dozen short articles which would require no language tags at all, demonstrating that in many cases there would be no effect.
- This proposal is currently a lot of unsubstantiated opinions about vapourware. Why not bring it up to the level of real vapourware? —Michael Z. 2006-03-28 17:45 Z
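The first suggested demonstration (counting the words in an article that would need a tag) is easy to approximate. The sketch below assumes a small hand-made word list, `DIALECT_WORDS`, standing in for a real dictionary of variant spellings; both the list and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample of words that have more than one national spelling.
DIALECT_WORDS = {"color", "colour", "humor", "humour", "theater", "theatre"}


def count_candidates(article_text):
    """Count occurrences of each word that would be wrapped in a dialect tag."""
    words = re.findall(r"[A-Za-z]+", article_text.lower())
    return Counter(w for w in words if w in DIALECT_WORDS)


sample = "Colour theory: the colour of the theatre curtain."
counts = count_candidates(sample)
```

Dividing the total of `counts` by the number of words in the article would give a rough measure of how much markup the proposal adds to that page.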
- I've finally got around to adding an additional question, Q5.1, which explicitly acknowledges the existence of a trade-off.
- Please note however that the original Q5 was not "carefully chosen to avoid addressing the issue". On the contrary, it is almost a word-by-word quotation of a critique that had been put forward (and addressed) some time ago. In that sense, I still think that the answer is very adequate. As is my claim that most critics (you possibly being the only exception) despite having had all their concerns addressed, don't seem to be prepared to continue the discussion on the merits and shortcomings of the proposal.
- Thank you for the suggestions you make. I think you are right, it's time to get something more concrete done, also given that there are now a few people that have expressed interest. Please bear with me (volunteers welcome). PizzaMargherita 21:00, 18 April 2006 (UTC)
Preferences
I, for one, will concern myself with how it looks to the thousands of readers who are not logged in, who do not have preferences set--and will argue if someone wakes this sleeping dog up by pointing it out to me by making such an addition of the markup, where I might let it slide otherwise (actually, I often don't notice varieties of English spellings any more). And, I do like a good argument now and then. Gene Nygaard 17:43, 26 March 2006 (UTC)
- Please see QA3. Now you're going to tell me that we'll have problems with the myriads of readers behind an anonymising proxy, right? PizzaMargherita 18:38, 26 March 2006 (UTC)
- For one thing, that isn't what Q3 said up until a few minutes ago. Now much of this discussion on this page is going to be misleading, because it is based on what Q3 and A3 used to be.
- Second, even if you could determine users' locales accurately, that doesn't guarantee you can determine their personal preferences therefrom. Gene Nygaard 19:17, 26 March 2006 (UTC)
- As anybody can see, the previous version of QA3 was merely a diluted form of its current version (as the edit comment suggests), and anyway it still countered your argument. At any rate, as I have specified at the top of the QA section, this proposal is constantly changing, also (but not in this case) thanks to the constructive criticism of opposing users like yourself. It is not my intention to mislead anyone, and I trust that neither is yours.
- With your second point, what you are basically saying is that the members of expatriate communities that do not hold an account would not be able to read articles in their native (or otherwise favourite) dialect. Well, first of all, I challenge you to prove that it would be a worse situation than the current one, where one simply doesn't have that choice. And secondly, if they don't use an account, they won't be able to set any other preference either. And I don't see a lot of people without an account complaining about the fact that they can't have, for example, a watchlist. We'll simply add this feature to the list of benefits of having an account.
- Thanks for your feedback, please keep it coming. In particular, you may want to follow up my other replies to your comments above. PizzaMargherita 20:32, 26 March 2006 (UTC)
MABIO vs MOIO
In this subsection I have moved discussions about an alternative implementation. PizzaMargherita 11:25, 9 January 2006 (UTC)
- Ok, I suppose a central database of dialectic versions of words would eliminate redundancy. However, there are two opposite approaches to referencing this database. Either the tagging is done on every word that is to be rendered by the "dialect engine", or the tagging is done on every word to be skipped by the engine - e.g.
... was a member of the UK {{nodialect:Labour}} party.
The former approach allows for a more efficient rendering algorithm which just has to be called on finding a tag, but means people have to tag every instance of every word in every article (redundancy); the latter approach means quite a simple job for editors, as all they have to do is exclude the odd word, but it means the algorithm needs to sweep through every word in an article and look up matches in the database. --Splidje 10:10, 13 December 2005 (UTC)
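The two approaches described above can be compared in a minimal Python sketch. Everything here is hypothetical: the tiny `VARIANTS` table stands in for the central database, the function names are invented for illustration, and the `{{en:...}}` and `{{nodialect:...}}` patterns simply follow the examples on this page. It is not a proposal for how MediaWiki would actually do it.

```python
import re

# Hypothetical central database: one entry per word, one spelling per locale.
VARIANTS = {
    "humour": {"UK": "humour", "US": "humor"},
    "labour": {"UK": "labour", "US": "labor"},
}

# Reverse index so that any known spelling leads back to its entry.
BY_SPELLING = {
    spelling.lower(): forms
    for forms in VARIANTS.values()
    for spelling in forms.values()
}

TAG = re.compile(r"\{\{en:(\w+)\}\}")
NODIALECT = re.compile(r"\{\{nodialect:(\w+)\}\}")
WORD = re.compile(r"[A-Za-z]+")


def match_case(original, replacement):
    """Keep a leading capital if the original word had one."""
    return replacement.capitalize() if original[:1].isupper() else replacement


def lookup(word, locale):
    """Return the locale's spelling of a word, or the word itself if unknown."""
    forms = BY_SPELLING.get(word.lower())
    return match_case(word, forms[locale]) if forms else word


def render_tagged(text, locale):
    """Former approach: the database is consulted only where a tag is found."""
    return TAG.sub(lambda m: lookup(m.group(1), locale), text)


def render_sweep(text, locale):
    """Latter approach: every word is looked up unless marked nodialect."""
    out, pos = [], 0
    for m in NODIALECT.finditer(text):
        out.append(WORD.sub(lambda w: lookup(w.group(0), locale), text[pos:m.start()]))
        out.append(m.group(1))  # excluded word is emitted unchanged
        pos = m.end()
    out.append(WORD.sub(lambda w: lookup(w.group(0), locale), text[pos:]))
    return "".join(out)
```

With the sweep, an untagged "UK Labour party" comes out as "UK Labor party" for a US reader, which is the failure case the exclusion tag exists to prevent.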
- I see what you mean. I prefer the former approach, because I think the latter is more difficult to implement and it's less explicit in what it does. I don't think the former is such a burden for editors. Also, the latter wouldn't prevent somebody who is unaware of the mechanism from changing one spelling to another, with no net effect on the article. I mean, with the latter approach, writing "colour" or "color" doesn't change the end result (provided one has set the locale in the preferences), and so there is still no equilibrium to speak of.
- Anyway I think now the real problem is to get enough people to agree that some form of tagging for dialects would be welcome. Then we can worry about the details. PizzaMargherita 12:06, 13 December 2005 (UTC)
- True. How does one go about doing that? Does wikipedia / mediawiki (this is a property of the software) have a mechanism for rallying support behind something? --Splidje 13:33, 13 December 2005 (UTC)
Regarding the discussion about two possible methods of implementation, I believe that only marking words that should have their spelling "forced" is preferable because it is much simpler for the editors.
Advantages:
- Every article is automatically affected.
- No need to worry about marking words with variable spelling.
- Editors can use their preferred spelling without worrying about it. They can write colour or color when they edit but it will appear in the dialect of the user when it is displayed.
- In fact, this makes the feature mostly transparent as each editor edits and views in his/her preferred dialect. Many of them won't even realize it! This is especially good for new editors. It also reduces conflict.
- The spelling in an article will always be consistent.
- Only the most die-hard "spelling partisan" will go to the trouble of explicitly forcing every word to be in his/her preferred spelling.
Disadvantages:
- Some words that should only appear using a particular spelling would be inadvertently changed, although I'd imagine most of these words would be capitalized or tagged to begin with and could be automatically ignored.
- This fact might go unnoticed by some editors because of the transparent conversion.
An option to show spelling variations in the preview would be useful for catching any problems with this system.
Disadvantages of the other method (explicitly marking words with variable spelling):
- There are many spelling variations and keeping track of and marking them all is a chore. The editing process becomes much more complicated.
- Many people will not even be aware of such a feature, or simply won't bother marking their words.
- Because of the above two points, many articles will still have inconsistent spelling.
- Spelling partisans won't cooperate anyway.
- Every article needs to be updated. This is a monumental task.
Advantages:
- Where it matters (i.e. when a word MUST be spelled a particular way) the spelling in articles stays the same unless someone manually edits them.
Finally, some more thoughts on the subject in general: Although writing about U.S. subjects in American English (for example) might seem like a good standard for selecting a spelling dialect, it really doesn't benefit the reader. Most people would prefer to read words in the way they are accustomed to seeing them, regardless of the subject matter. The advantage of having selectable dialects is that every reader gets to view the page as they prefer to see it. It also allows people to read an article using unusual (for them) spelling in case they feel it makes the article more colourful or they are interested in what the differences are. I don't think it's that difficult to implement and automatic dialect selection based on IP address is easy. Jared Grainger 06:38, 8 January 2006 (UTC)
- Hi Jared, thanks for your thoughts. I agree with many things you say, but I think you missed an important point. The "mark-only-invariant-occurrences" (MOIO) implementation (e.g. UK {nodialect:Labour} Party) does not lend itself to a gradual introduction. I.e. when it's introduced, a lot of pages that do not have the "nodialect" tagging will instantly change from correct to broken. Ok, on the other hand a lot of the articles will instantly change from "debatable" to "consistent", but I don't think it's worth it, and it would be a drastic change that, like with robots, I think it's better to avoid. With the other "mark-all-but-invariant-occurrences" (MABIO), the WP will naturally evolve from the current state to a better one. And as I think you mentioned, in the MOIO scenario, if a UK editor writes "Labour party" without tags, s/he doesn't realise that it's a problem, because that's what he would read back. However, all non-UK readers would see "Labor Party", which is not just debatable, it is plain wrong.
- Anyway, as I said, what this proposal needs now is support. Once we have that, we can start looking into the implementation in more detail. PizzaMargherita 11:57, 8 January 2006 (UTC)
- As I said above, the software could be set to ignore words in certain styles (e.g. capitalized, tagged, italics) which would probably take care of 99% of the problems. The multi-dialect preview option would also help in this regard. I believe the advantages of MOIO clearly outweigh those of MABIO if the main problem of MOIO (words that must be invariant, which is also MABIO's main advantage) can be resolved. Would you agree on that point?
- Additional research into words that must be invariant would be useful.
- Regardless of how it is implemented, I am in favour of some way of converting dialects. Jared Grainger 17:59, 8 January 2006 (UTC)
- As I said, I'll be happy to continue this debate on the particulars of the implementation once we have a general agreement that 1. we have a problem and 2. it's a problem worth solving. I think some people are in denial right now. Once again thanks for your ideas, keep them coming. PizzaMargherita 19:11, 8 January 2006 (UTC)
- Well then, perhaps you should start another poll directly above your original one. Something like "Show your support for or against the displaying of pages in the user's dialect, assuming the details can be worked out before it is implemented." That would probably get more people to vote since it leaves the technical issues aside. Then they can proceed to the more specific poll if they feel they understand the technical issues well enough. Jared Grainger 19:32, 8 January 2006 (UTC)
In response to Splidje's original question: yes this would be complete overkill, a waste of programmers' time, would complicate wikitext and become a big inconvenience for all editors. Articles will end up in a worse mix of English dialects, as they will end up half-dialectified, while most editors try to work around, ignore, or remove the dialect tags littering them.
If you really want to work on the software, it's probably best to develop an extension to Mediawiki, and discuss it there. I will oppose adding anything like this to English Wikipedia.
Unobtrusively tagging articles as suggested at the very top of this discussion might be a good idea. But why don't we all go improve some articles instead of generating more words about this proposal? —Michael Z. 2006-01-8 18:28 Z
- This is what I expected to hear eventually. The programmers have spoken and they don't want to bother with it. Oh well, no use discussing it any further I guess....
- However, you didn't comment on the "transparent" MOIO method. What are your thoughts about that implementation method (besides the fact that you don't like the idea in general), Mr. Z programmer? Jared Grainger 19:00, 8 January 2006 (UTC)
- Am I Mr. Z programmer? If by transparent, you mean that all text gets automatically converted, that would be a very bad idea. How does the server determine reliably whether some text is a direct quotation or a proper name, or not? Single quotation marks, double quotation marks, and italics can all be used to mark quotations, and for other uses. How does the server know which sense of "tire" or "curb" is being used? —Michael Z. 2006-01-8 22:52 Z
- I think Michael hit the nail on the head here, it looks like any automation is potentially troublesome due to homographs. PizzaMargherita 23:45, 8 January 2006 (UTC)
Michael, you might not have read what I (Jared Grainger) wrote about this above. The idea is that the software ignores anything "special," such as capitalized words, italics, words in quotes, etc. which would probably be correct the vast majority of the time.
The few problems that arise with homonyms can be corrected with tags whenever someone spots them and the editing page could optionally display dialect words and their alternatives to quickly check this when editing old stuff or writing new stuff. True, a few errors will crop up in older articles but they will be edited away with time and it's not like the articles out there are flawless anyway.
Your example about tire/tyre being used as a verb would make an interesting case study. How many times is tire ambiguously used as a verb in WP? I'd guess that most articles are written in the past tense, so they'd be tired but tyred isn't a real word. Some made up examples:
- He grew tired of the war...
- ...was quoted as saying "this tires me."
Neither of those would cause a problem with software that was properly written because tired wouldn't be in its dictionary and it would notice that tires is surrounded by quote marks.
To test this further, I searched wikipedia with Google for "tire" and scanned the summary text of the first 100 matches for possible problems. Here are some direct excerpts:
- Tire irons are also...
- The Goodyear Tire & Rubber Company...
- Canadian Tire is a Canadian retail...
- Tyre (tire) characteristics...
- ...such as tire (tiri-) [an article on Elvish]
- ...when the eye's photoreceptors, primarily those known as cone cells, "tire" from the over stimulation...
- ...words like tire and jail ... [talking about differences between dialects]
Note the Google summaries don't include italics so there were one or two that looked like they would have caused a problem but were actually italicized in the article.
- tire was only used as a verb once, and it was in quotes
- Most editors correctly italicized words when referring to the word itself (e.g. tyre is spelled tire in the U.S.)
- Tire was always capitalized when used as the name of a company
The only problem I saw was when the word was at the beginning of a sentence (and therefore capitalized) but again, software could easily handle this situation as there's no proper name composed of the single word Tire (i.e. it's always "Tire Kingdom" or something like that). Therefore, dialect words that are capitalized at the beginning of a sentence and are not followed by another capitalized word (e.g. Tire chains are a...) would be treated as a simple dialect word.
In conclusion, out of 100 articles, I only saw one instance where my software design would have made a mistake: "tyre (tire)." Regardless of the user's dialect settings, this would be simple to spot and correct. They'll see "tyre (tyre)" or "tire (tire)" and realize that it's an old article and will remove the (tyre)/(tire) part. A tag isn't even required in this case.
I also did a quick check on tires before posting this and the results were similar. I only checked the first 50, but I only found one problem phrase: "tires ('tyres' in the UK)."
So there is some data that supports my belief that simple software rules will work almost all of the time and that the few mistakes that slip through will be easily spotted and corrected with a single tag.
Can anyone find a word that is commonly used in WP in situations that would be erroneous depending on the dialect? Remember, it doesn't count if...
- it has special formatting
- it is part of a quote
- it meets the criteria I mentioned above for words at the beginning of a sentence.
...because my software design would skip such words.
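The skip rules listed above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the three-word `US_TO_UK` dictionary is made up, `convert` is a hypothetical name, and "special formatting" is narrowed to wikitext italics (`''...''`) and double-quoted text. Capitalized words are left alone unless they open a sentence and are not followed by another capitalized word.

```python
import re

# Hypothetical dictionary; "tired" is deliberately absent, as in the discussion.
US_TO_UK = {"tire": "tyre", "tires": "tyres", "color": "colour"}

# Spans that are never touched: ''italic'' wikitext and "double-quoted" text.
PROTECTED = re.compile("''.*?''" + "|" + '"[^"]*"')
WORD = re.compile(r"[A-Za-z]+")


def convert(text, mapping):
    """Convert dictionary words, skipping the 'special' cases."""
    protected = [m.span() for m in PROTECTED.finditer(text)]

    def is_protected(pos):
        return any(start <= pos < end for start, end in protected)

    def repl(m):
        word = m.group(0)
        target = mapping.get(word.lower())
        if target is None or is_protected(m.start()):
            return word
        if word[0].islower():
            return target
        # Capitalized word: convert only at the start of a sentence and
        # only when the following word is not capitalized as well.
        before = text[:m.start()].rstrip()
        starts_sentence = before == "" or before[-1] in ".!?"
        nxt = re.match(r"\s+([A-Za-z]+)", text[m.end():])
        next_is_capital = bool(nxt) and nxt.group(1)[0].isupper()
        if starts_sentence and not next_is_capital:
            return target.capitalize()
        return word

    return WORD.sub(repl, text)
```

Run on the excerpts above, this sketch leaves the company names, the quoted verb and the italicized mentions alone, converts "Tire irons" at the start of a sentence, and reproduces the one reported mistake by turning "Tyre (tire)" into "Tyre (tyre)".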
I get the impression that there are a lot of closed minds involved in this debate, but I always try to keep an open mind. So if anyone can find a major flaw in my idea that cannot be easily remedied with simple software I will be the first to abandon it. Jared Grainger 01:07, 9 January 2006 (UTC)
- I'm mildly in favour of automation, but only as an optional editing tool for new material, and in a MABIO framework. In other words, as I think you said, when one is editing we can have a checkbox saying "scan the diff for variant words and mark them for me if they don't have special formatting", and in the preview they could be shown marked, or something. So if I write "Joe did some labour for the Labour Party", and I choose to check the checkbox (i.e. nothing automatic is happening behind my back, which I would be against), then the preview would show "Joe did some {en:labour} for the Labour Party". On the other hand, if you write "Labor has been underpaid." and use the tool, the tool mis-identifies this occurrence as an invariant. At that point, the only option for the editor is to explicitly type 7 more characters: "{en:Labor} has been underpaid." Even so, frankly I'm not convinced that saving typing 7 characters/occurrence is worth the effort and if it deserves to clutter the edit page with a third checkbox. PizzaMargherita 07:56, 9 January 2006 (UTC)
- First of all, you apparently aren't reading what I write because I wrote a whole paragraph explaining why capitalized words at the beginning of a sentence are easy to fix in nearly every case.
- Secondly, you're talking about automation but only in your framework (MABIO) from your POV and then you start disparaging it, leading to reader confusion about my method (MOIO).
- Thirdly, "clutter the edit page with a third checkbox???" I'm sorry, but I find this ridiculous. I offered this idea as a compromise of sorts because I believe your view has some merit, even though I find it inferior to mine.
- BTW, thanks for taking care of all the chores of moving stuff around and reorganizing. Jared Grainger 20:45, 9 January 2006 (UTC)
- "the vast majority of the time"
- "True, a few errors will crop up"
- "out of 100 articles . . . I only saw one instance"
- "I only found one problem phrase"
- "will work almost all of the time"
- That just doesn't seem good enough to me. The software you envision will work right, say, 99.9% of the time. You also say that it "ignores anything 'special,'", so in some percentage of cases it will not change a dialectic word when it ought to (to name one example, when quotation marks indicate a translation instead of a quotation). So the software would change the dialect of articles, with some failures and some false positives. Plus some articles will be only partly tagged for dialect. Plus others will be untagged. Plus wikitext gets peppered with a new template, and to set these correctly editors have to be familiar with several English dialects and understand the complex logic by which the template fails.
- I think maybe you're confusing the two methods. My method (christened MOIO by Pizza) doesn't use much in the way of tags or templates except when necessary to force a word to be ignored by the automatic conversion. (Internal "tags" can be used for processing efficiency but this is completely transparent to the editors/readers.) In my research, as shown above, NO tags would have been required because the only "mistake" it would have made was when someone was attempting to show the difference in dialects (e.g. some wrote "tyre (tire)"). Such phrases are obsolete with dialect conversion and can be removed. Any kind of tags or special editing considerations would be very rarely necessary in my design. Jared Grainger 20:07, 9 January 2006 (UTC)
- Wikipedia's English gets less consistent. Editing plain text becomes more complex. Editors get to argue about what is correct U.S. English, and U.K. English, and Canadian English, and Indian English . . . What benefit do we get for this cost?
- Well, I disagree that consistency would decrease and that editing plain text would become more complex. With transparent conversion the user writes plain text (simple) and the reader sees it in his dialect (consistent, unlike now where there are many different dialects and sometimes mixtures). In the event of a dialectical error, which should be rare in my estimation, someone adds a tag to force the word to be ignored by the dialect engine. I'm sure you'll disagree, but I can't help that.
- However, I don't understand why you think this would cause more arguments about correct forms of English. Could you please explain, preferably with an example of some sort?
What benefit do we get for this cost?
- Well, here are the benefits as I perceive them:
- Almost a complete end to conflicts over spelling
- Consistent spelling over all of WP, not just consistent within an article.
- More comfortable reading for users
- Articles look more professional (some people think dialects of other countries are "crude").
- People are less likely to be "turned off" by "misspelled" words
- The ability to view an article in a different spelling dialect for fun or educational purposes. I can learn about many different spelling dialects just by switching the page. Maybe I'm planning a trip or moving to another country and want to get used to alternate spellings.
- And the disadvantages:
- Additional resources required
- Errors will be introduced into articles
- All other disadvantages (complexity, etc.) stem from these errors. If the number of errors is small enough, and they can be handled easily enough, then I believe the advantages outweigh the disadvantages. Jared Grainger 20:07, 9 January 2006 (UTC)
- Let me put this another way: I'm Canadian. For every Canadian editor dialectifying articles, there are probably ten writing them. For every Canadian editor writing, there are probably twenty writing articles using other English dialects. The dialectifier can tag maybe 0.01% of all articles? Cost of added dialect tags in running text: more than zero. Benefit to me: practically zero. Resulting cost/benefit ratio of this scheme: practically infinite.
- Or another way: you say your software idea will almost never change the sense of a sentence. That's not good enough. I'd rather read an article which is simply written in New Zealand English, than one that's 99.99% reliably machine-converted to Canadian English. I don't want the server changing the sense of a sentence or paragraph ever!
- Okay, that's fair enough. Clearly our opinions are irreconcilable. I believe that consistency, reducing conflicts over spelling dialects and making reading more comfortable is worth the possibility of a few errors.
- I would like to point out that spelling variations rarely affect the meaning and even when it does happen it's almost inconceivable that the meaning would really be changed. The result might be technically incorrect, but should still be understandable. I believe that far more errors in grammar, spelling, and meaning are produced by humans in the course of writing an article than would be produced by transparent conversion of spelling dialects. But again, that's just my opinion and obviously you disagree.
- But here's something different: I don't know of any words that are likely to cause serious problems for my software design, but what do you think about a limited scope version for "very safe" words? I highly doubt there would be any problems with words like capitalisation and capitalization. Would this meet your expectations of 100% accuracy? Jared Grainger 20:07, 9 January 2006 (UTC)
- Or another: can somebody point to an example of such software that works well? —Michael Z. 2006-01-9 08:03 Z
- Would that change your mind? There may be programs out there or a simple prototype program could be written. Would you be in favour of transparent conversions if a demo that "works well" could be found/produced? What about you, Pizza? Jared Grainger 20:07, 9 January 2006 (UTC)
- I'm increasingly of the idea that any attempt to automate tagging (be it with robots or MOIO) is not only risky, but it's also missing the nature of the problems we are trying to solve. As I see it, the overall goal is not to make WP spelling consistent overnight. The goal is to set a point of equilibrium to which everybody can help converge (if it makes any sense at all to speak of equilibrium in WP), at the same time settling all dialect disputes instantly. PizzaMargherita 12:08, 9 January 2006 (UTC)
- I already discussed the advantages and disadvantages of the two methods above so I won't reiterate them here. Your idea isn't bad, and it would definitely help somewhat, but I don't think it's the best solution and it could backfire.
- You aim for gradual acceptance, but what if your MABIO tags are introduced and subsequently become unpopular? A year later they might even become "deprecated" and discouraged from use, only provided for backwards compatibility.
- With my method, the articles don't even need to be "unrolled" if MOIO falls into disfavour; only the article rendering software needs to be disabled because all spelling conversions are done automatically and the original text with the original spelling is still there! Only the rare invariant tags would remain in a few articles. Jared Grainger 20:07, 9 January 2006 (UTC)
- When discussing efficiency, which I think would be another problem (real or perceived) for MOIO, you suggested:
- The MOIO method would be somewhat inefficient if improperly implemented (i.e. scanning every word every time the article is viewed), but all that needs to be done is to scan the text when it is saved and mark all the dialect words.
- Now, this suggestion is what I would call computer-aided MABIO, which I don't oppose. On the other hand, I agree with Michael that any demonstrably fallible automation (which pure MOIO, your research tells us, clearly is) should not happen, or at least not unbeknownst to the editor - hence the optional checkbox.
- As I explained when replying to Michael, the possibility of adding "tags" for efficiency is transparent and completely internal so that not every word needs to be checked when converting dialects purely for efficiency reasons. No human would ever see those "tags."
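One way to picture these internal "tags" is as a list of character positions computed once at save time, so that viewing a page never rescans every word. The Python sketch below is an assumption-laden illustration, not a description of any existing software: `FORMS`, `index_on_save` and `render` are hypothetical names, and capitalization handling is left out to keep it short.

```python
import re

# Hypothetical dictionary: any known spelling -> its form in each locale.
FORMS = {
    "colour": {"UK": "colour", "US": "color"},
    "color": {"UK": "colour", "US": "color"},
}
WORD = re.compile(r"[A-Za-z]+")


def index_on_save(text):
    """Run once when the article is saved: record where dialect words sit."""
    return [
        (m.start(), m.end())
        for m in WORD.finditer(text)
        if m.group(0).lower() in FORMS
    ]


def render(text, spans, locale):
    """Run on every view: only the recorded spans are looked up."""
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start])
        out.append(FORMS[text[start:end].lower()][locale])
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

The stored wikitext is never modified; the span list lives alongside it and would simply be recomputed on the next save.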
- As for either implementation "falling into disfavour", MOIO would be affected in the same way as MABIO, because as you say you would be left with invariant tags. For both implementations, however, the "problem" can be immediately resolved with robots, because the wikitext at that point is tagged, and there can be no replacement errors. PizzaMargherita 20:39, 9 January 2006 (UTC)
- You are correct.
This brings to mind another advantage of my method: it can be easily deployed on a "trial basis" covering as few or as many articles as desired and just as quickly unrolled.
Imagine this scenario: one day a new item appears in everyone's preferences allowing them to test the new "transparent spelling dialect conversion" feature, along with a wiki page explaining it. People who don't want to use it don't have to change their preferences as it will be disabled by default and unregistered users won't even have the option. Those people still see articles in the dialect in which they were originally written. Comments are gathered over time and the community gives their feedback on the new system.
The beautiful part is that it is all done in software and doesn't affect the original articles. Rolling it back would be instant (just disable the option in users' preferences) and no work would be lost because people didn't have to manually tag articles in the first place, as with the MABIO method. OK, the effort required to create a handful of invariant-word tags would be wasted but that's nothing compared to losing thousands of tags on every dialect word that the MABIO method requires if it needs to be rolled back.
How can you deny that this is a good idea? Especially when you add in all the other advantages/disadvantages I wrote about earlier. These points can be added to my list:
MOIO
- Disabled by default
- Covers every article automatically
- More consensus/community friendly. Everyone has a chance to try it and comment about it
- Passive, people can continue to edit as normal without participating in the system.
MABIO
- No way to let users "try before they buy" -- tags must be implemented first and many articles must be changed before people have a chance to sample the system.
- Active participation required on the part of all editors or the results get in a muddle.
- If it fails, everyone who spent their time tagging dialect words will feel cheated.
If after reading the above, the two other people involved in this discussion continue to insist that "provably fallible software is bad regardless of the success rate" and "we need to gradually reach equilibrium" and no one else joins in the discussion then I will give up. Jared Grainger 21:38, 9 January 2006 (UTC)
Conditional text to support primary and secondary languages
There has been controversy over how national variations of English spelling should be used in Wikipedia. Despite the manual of style guidelines, contributors continue to engage in edit wars. I was wondering why the Wikipedia developers don't implement some kind of sub-language support in the form of templates. The concept of sub-language support has been used for quite some time in Microsoft Windows and in Java resource management. It shouldn't be too hard to add the support to the server when the HTML is generated and get this issue over with for good. The users can specify what their first and second choice of sublanguage is. The rule of thumb is to always use a matched preferred spelling; if no match, try second preference; if no match again, always fall back to the US English because it is the most widely used spelling on the Internet right now. For example, the following wikipedia text will be shown differently according to user preference:
{en_US:Armor|en_UK:Armour} is {en_US:spelled|en_UK:spelt} armor in the US and armour in the UK.
For a US user, with preference set to US/US, it shows the first matched US English
Armor is spelled armor in the US and armour in the UK.
For a UK user, with preference set to UK/UK, it shows the first matched UK English
Armour is spelt armor in the US and armour in the UK.
For a Canadian user, with preference set to Canadian/UK, it shows the second matched UK English
Armour is spelt armor in the US and armour in the UK.
For a New Zealand user, with preference set to New Zealand/Australia, it shows the default US English
Armor is spelled armor in the US and armour in the UK.
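The fallback rule above (first preference, then second preference, then US English) could be sketched roughly as follows. This is only an illustration in Python, not anything MediaWiki provides: the `render` function, the `TAG` pattern, and the use of `en_CA`, `en_NZ` and `en_AU` as codes for the Canadian, New Zealand and Australian preferences are all assumptions made for the example.

```python
import re

# Hypothetical inline syntax from the example above: {en_US:Armor|en_UK:Armour}
TAG = re.compile(r"\{([^{}]+)\}")

def render(text, first, second, fallback="en_US"):
    """Replace each tag with the first-preference spelling, else the
    second-preference spelling, else the fallback variety (US English)."""
    def pick(match):
        parts = match.group(1).split("|")
        if not all(":" in part for part in parts):
            return match.group(0)  # not a spelling tag: leave it untouched
        variants = dict(part.split(":", 1) for part in parts)
        for code in (first, second, fallback):
            if code in variants:
                return variants[code]
        return match.group(0)  # no usable variant: leave the tag untouched
    return TAG.sub(pick, text)

text = ("{en_US:Armor|en_UK:Armour} is {en_US:spelled|en_UK:spelt} "
        "armor in the US and armour in the UK.")
print(render(text, "en_CA", "en_UK"))  # Canadian/UK user: second preference matches
print(render(text, "en_NZ", "en_AU"))  # New Zealand/Australia user: US default
```

Because the variants are stored in the tag itself, the lookup needs nothing beyond the wikitext and the two user preferences.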
I understand this suggestion may open a new can of worms, because many articles may double in size as a result of the multiple conditional spelling variations. However, this usage is purely optional. I believe only the most controversial topics would receive multi-lingual treatment from contributors; after all, not every article has become a spelling edit battleground. So I guess it will not be as bad as the edit war situation. Any comments?
My alternative suggestion is not quite the same as the tagging proposal at the top of this talk page, because there is no need to look up the variations in a multi-lingual dictionary or database, i.e., no surprises if the database is mismanaged or corrupted, and no additional maintenance tasks. The different spellings are stored in-line in the template and are under the total control of the contributors. They can be maintained just like any other content on the page.
In my example, I only showed the use of sublanguages; I didn't show the use of the primary language. One example would be {en:fjord|en_NZ:fiord}, where the default spelling is specified along with the only exception. In this example, all the common spellings are lumped under the primary "en" code, and only the New Zealand exception is added under its own sublanguage code "en_NZ". Kowloonese 23:21, 13 April 2006 (UTC)
- Hi Kowloonese, thanks for your ideas. Funnily enough, that's how my proposal started. As you say, it does not require DB lookups—although I don't quite understand how WP would work out that Canadian English maps to Armour/spelt in your example. However, I think that the potential explosion of wikitext, when taking into account the various sublanguages (the ones you mention are not the complete list), would make it more difficult to gain widespread acceptance. I mean, some people at the moment are (in my opinion unreasonably) opposing the "tagging" proposal because it's a "burden [...] on editors wading through the edit screen", it "makes editing far harder", it's a "large workload for [...] article editors", a "needless waste of editor time", and "overcomplicated editing".
- As for which dialect should be the default, I suspect your solution (US) would meet strong criticism. Consider that most readers do not have an account, let alone their preferences set. See QA3 for my suggestion. PizzaMargherita 08:35, 19 April 2006 (UTC)
- In my proposal, each user can have first and second preference. In my Canadian example, the user set Canada/UK as his preferences, hence he got the second match for UK English because Canadian English was not available in the template.
- I guess it is quite obvious that US English is the Internet English, outnumbering other variants by a big margin, though that does not imply US English is the proper English; that depends on opinion. But if you need to pick a default, Internet English would be the best choice because of its current widespread usage.
- In my proposal, the editors have the option to specify the primary spelling, which will be the default even for people who do not log in. e.g.
{en:Labour|en_US:Labor} Party should be {en:spelled|en_UK:spelt} Labour Party because it is a British entity.
- US users will see
Labor Party should be spelled Labour Party because it is a British entity.
- UK users will see
Labour Party should be spelt Labour Party because it is a British entity.
- users who are not logged in will see
Labour Party should be spelled Labour Party because it is a British entity.
- To minimize the wiki-text explosion, perhaps a more compact syntax can be used, e.g.
{en:Labour|US,NZ,CA:Labor}
- The most common spelling is specified as the primary and default (en) spelling; only the exceptions are listed by their country codes. Another option is to tie most countries to either the UK or the US spelling group. The two major spelling groups will address most of the common situations, hence the text explosion would be limited to 2X in most cases. Other exceptions can still be handled by listing each country separately as shown above. Actually, the intention of my proposal is not to make every single article multi-lingual; that would guarantee the text explosion. The choice of one spelling over another should still follow the current guidelines and use the native spelling, e.g. a British topic article should use UK spelling, a US topic article should use US spelling, a New Zealand topic article should use NZ spelling, etc. Only when an edit war starts on certain common topics would the template provide a way to resolve the conflict. The extent of the text explosion should be proportional to the extent of edit wars; otherwise everything should stay status quo. Kowloonese 20:16, 20 April 2006 (UTC)
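To make the primary-spelling and compact-syntax ideas in this thread concrete, here is a rough Python sketch. It assumes, purely for illustration, that `en_US:Labor` and `US:Labor` mean the same thing, that a user's preferences are an ordered list of country codes, and that a reader who is not logged in has an empty list. The names `parse`, `render` and `TAG` are hypothetical.

```python
import re

# Hypothetical tag syntax from the thread: {en:Labour|US,NZ,CA:Labor}
TAG = re.compile(r"\{([^{}]+)\}")

def parse(body):
    """Map 'en' (the primary spelling) and each listed country code to a word."""
    table = {}
    for part in body.split("|"):
        codes, sep, word = part.partition(":")
        if not sep:
            return None  # not a spelling tag
        for code in codes.split(","):
            code = code.strip()
            # Treat 'en_US' and the compact form 'US' as the same key.
            table[code[3:] if code.startswith("en_") else code] = word
    return table

def render(text, prefs=()):
    """Use the first matching preference, else the primary 'en' spelling."""
    def pick(match):
        table = parse(match.group(1))
        if table is None:
            return match.group(0)
        for code in (*prefs, "en"):
            if code in table:
                return table[code]
        return match.group(0)  # no primary spelling given: leave the tag as is
    return TAG.sub(pick, text)

print(render("{en:Labour|US,NZ,CA:Labor} Party", ["NZ"]))  # listed exception
print(render("{en:Labour|US,NZ,CA:Labor} Party"))          # not logged in: primary
```

With this shape, only the exceptions cost extra wikitext, which is the point of listing everything else under the primary `en` code.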
Tagging pages: Default spellings in non-English speaking countries
If a non-logged-in user comes to Wikipedia from a country (determined by IP) whose primary language is not English, what should the default spelling be? I would suggest International English, which is mostly based on British English except that it uses -ize instead of -ise (both are actually acceptable in BE, but -ise is taught in schools).
- Good point, I never thought of that. Your suggestion seems very sensible. PizzaMargherita 05:24, 15 May 2006 (UTC)
- For the record, "International English" has so many different meanings that it's useless. The most common spelling system used among non-native English speakers is American. This is mostly because of (non-Hong Kong) China, Japan, the Koreas, South America, Russia, and, generally, Eastern Europe. But most (though not the vast majority) of Western Europeans use British spelling right now (though this changes partly depending on how much they believe British foreign policy is better than American foreign policy), as do nearly all of Africa and all of England's former colonies. It's actually not far from 50-50, but the majority use American spelling. (As for the actual language used -- that is, pronunciation and vocabulary -- "International English" is overwhelmingly American.) I would recommend having the default be American English, not simply because it's more widely used internationally, but because it's more suited to non-native speakers. After all, this was the whole point of Ben Franklin's and Noah Webster's reforms. When you, for example, take away the -ous from rigorous, you should get the spelling of the noun, rigor, which you do with American (and Shakespearean) spelling. And when adding -ing, you should double the final consonant only when the last syllable is stressed. This helps people with pronunciation. Etc., etc. (Note, though: British punctuation is better suited for international use than American. Wiki policy got that right!) BrianinStockholm 07:45, 26 May 2006 (UTC)
- Detractors point out that the Franklin/Webster reforms had no respect for etymology, which could indeed help foreign learners. Rationality is in the eye of the beholder. Your comments about International English having no clear definition (making it useless for our purpose) are noted. Thanks. PizzaMargherita 09:15, 26 May 2006 (UTC)
- I didn't precisely say it was rational, but I agree that a lot of Webster's suggestions were absurd. Fortunately, most of the absurd suggestions were not taken up by Americans, and most spelling reform advocates agree that current American spelling (combined with current British punctuation) is the best option for "mild" spelling reform. But, anyway, I don't mean to be polemical, just wanted to correct the other person's anglo-centric error. As for a concrete suggestion, perhaps we could do a Google search for usage using the site:X switch to help figure this out? For example, searching /color site:se/ and /colour site:se/, and comparing the results? (Though /color site:mx/ would of course skew the results towards American English! So we'd have to be careful. With Spanish-speaking countries we'd have to use other words -- "center"/"centre", for ex.) Again, thanks for your efforts with this proposal. I think it's a great idea! BrianinStockholm 09:39, 26 May 2006 (UTC)
- Ok, thanks. So how about this: we keep Google hits as a general guideline for what would anyway be a manual process. And we will probably not want to create a profile for each country, we just associate IP areas to existing varieties of English (e.g. Mexico -> American English, France -> British English), as opposed to a more fine-grained scheme: (Mexico, en:foetus)->fetus, (France, en:diarrhoea)->diarrhoea. PizzaMargherita 11:16, 26 May 2006 (UTC)
- Excellent idea. There should probably also be a little box somewhere indicating what the default preferences are for a particular user (IP address), with the option of changing the default. Having the option of changing the default without someone actually logging in might be difficult, but if it's so important for someone living in Mexico to read Wikipedia articles in British English, or for someone in France to read them in American English (or Canadian or whatever), forcing them to log in/establish an account doesn't seem like asking too much. BrianinStockholm 12:08, 26 May 2006 (UTC)
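The coarse-grained scheme discussed in this section (associate each IP area with an existing variety, and let a logged-in preference override it) might look something like the Python sketch below. Only the Mexico and France entries come from the thread; the table name, the two-letter country codes, the variety codes and the function name are assumptions for illustration, and the step of turning an IP address into a country is assumed to happen elsewhere. Since the thread does not settle what to show for unmapped countries, the sketch simply returns `None` for them.

```python
# Manually maintained table, as suggested above; Google hit counts would
# only be a guideline for filling it in. Entries beyond MX and FR are
# deliberately left out because the thread does not specify them.
COUNTRY_TO_VARIETY = {
    "MX": "en_US",  # Mexico -> American English
    "FR": "en_UK",  # France -> British English
}

def default_variety(country_code, account_preference=None):
    """Return the variety to render for a reader.

    A logged-in reader's own preference always wins; otherwise the
    country derived from the IP address selects an existing variety.
    Returns None when the country has no entry in the table.
    """
    if account_preference is not None:
        return account_preference
    return COUNTRY_TO_VARIETY.get(country_code.upper())

print(default_variety("mx"))           # IP-based default
print(default_variety("FR", "en_US"))  # account preference overrides the default
```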