Talk:Mapping of Unicode characters
Article creation
I added the summary/categorized table of the UCS as I said I would on the UCS discussion page. I think if anyone feels the table should be narrowed, the decimal start and end could be omitted without much loss of readability. As I said in edit summaries and over at the UCS discussion page, I'd like to also add links from within the table to sections of the mapping of unicode characters article and other articles too. I think each of the broad categories (lettered A through N) should be discussed in this article. Then links from each script-block could go to the article on the script or to an article on the script in unicode/ucs.
So I plan to add the following sections to this article:
- Scripts (Modern and Ancient)
- Phonetics
- Unified Diacritics
- Unified Punctuation
- Symbols
- Numerals
- Musical Notation
- CJK and Unihan
- Compatibility characters (legacy and others) and normalization
- Control characters, format characters and variation selectors
- Surrogates
- Private Use Code Points
Anyone else is welcome to jump in on these tasks. --Indexheavy 01:16, 25 April 2007 (UTC)
On the charge of editorializing
I can understand how you might read it that way in isolation. I'm not trying to editorialize so much as help make the distinction between semantic characters and glyphs clearer. Many people cite it as a mantra, but don't necessarily understand it (now I'm editorializing). The point of this section (and of what I plan to add to the linked main article) is to show that UCS lists the characters according to their glyph names. Meanwhile Unicode adds alias names that try to get at the phoneme semantics. Right now it's a hybrid that helps serve as an excellent example of this distinction so often cited (numerals too, though less so). I hope that makes it clearer what I'm trying to do there. In the past many of these articles have simply been long lists of Unicode characters (at one point there was a single article devoted to every character). I didn't find that very encyclopedic. I think here at wikipedia we serve readers better by expositing and providing examples and fleshing out these categories and expositing on some of the idiosyncratic characters (like the phoneme characters). --Indexheavy 02:55, 30 April 2007 (UTC)
- BTW, perhaps I'm not understanding correctly what you thought was editorializing. Please respond here to clarify. Indexheavy 02:56, 30 April 2007 (UTC)
The problem with the Unicode consortium is that they seem to think their character names are self-explanatory. Except in some cases where they for some reason or other feel disposed to add a gloss. This is a problem (the recent addition of the cuneiform range really drove this home to the point of ridicule), and should be duly discussed, citing notable sources. But so far it is your choice to give such weight to the character name. A name is just that: a unique tag for a codepoint. The actual reason for encoding a character is buried in proposals somewhere. Thus, for a sourced discussion of why a character was encoded and not another, you have to dig up these proposals, study them, and quote from them. Just drawing your own conclusions from the names in the character charts is not helpful and violates WP:OR and specifically WP:SYN. dab (𒁳) 11:43, 2 May 2007 (UTC)
- For some reason I missed your comment here until now. I wasn't ignoring you, I just didn't see it. I'm sure there are all sorts of interesting stories, behind the scenes disputes and whatnot surrounding the Unicode and UCS. I'm not trying to write about that (nor do I have any expertise or sources on it). I'm trying to write from the Unicode Standard and the other publications of the Unicode consortium on their rendition of the "mapping of unicode characters". You're accusing me of violating WP:OR, yet I say again, I'm the only one who has added a reference to this article. I understand I could use some more specific references, but it's quite disingenuous to accuse me of OR when not a single reference existed for this article until I began my edits. Secondly, on the charge of violating WP:SYN, I'm drawing only from the Unicode Standard (which is what I'm most familiar with) and not synthesizing from multiple sources as the policy outlines. I'm also not trying to advance a position. Perhaps if you told me what position you fear I'm advancing we could clear the air and I could try to avoid that misperception as I draft and redraft my material. Indexheavy 09:59, 9 May 2007 (UTC)
Indexheavy
Indexheavy, before you continue "overhauling" this article, may I ask you to cite your sources. Your "semantic phonemes" and "semantic characters" etc., while well-meant, simply add to the confusion (as I argued here). You want to "help make the distinction between semantic characters and glyphs clearer". I appreciate the thought, but at present you are not exactly helping. First of all, show that your usage of "semantic character" (as opposed to simple "character") is in any way endorsed by Unicode. Unless you do that, I'm afraid we'll have to deep revert to April 25. thanks. dab (𒁳) 11:37, 2 May 2007 (UTC)
- Just to help you understand where my terminology comes from, here's a useful quote from The Unicode Standard 5.0 (p15): "The Unicode Standard draws a distinction between characters, which are the smallest components of written language that have semantic value, and glyphs, which represent the shapes that characters can have when they are rendered or displayed". For Unicode (especially in contrast to ISO and the UCS without Unicode), many of the compatibility characters (like the Arabic initial, isolated, medial and final forms) are redundant. They are character encoding forms and not simply the character as "the smallest components of language that have semantic value" but rather characters that encode a specific abstract glyph for another character. Anything could be encoded as a character. For example, one could designate that code point U+E0FFA will be the letter 'g' from Linotype's Times Roman font version 2.3 released in 1992 (the dingbat characters are a similar example acknowledged by the Unicode Standard). However, these are not examples of semantic characters, but rather characters that encode glyphs. Unicode’s approach in contrast involves moving the handling of these forms/variants to smart font technology and smart text rendering. These are distinctions made in the Unicode Standard: distinctions I'm trying to explain to a general reader in an encyclopedic manner. I feel a bit like I'm taking shots in the dark here. I'm having a hard time understanding how you read the Unicode Standard. But I'm trying to find ways to begin the conversation. Please let me know how you might rephrase some of my prose. In doing that we might start to understand the different readings. Indexheavy 11:17, 9 May 2007 (UTC)
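- As a rough illustration of the character/glyph distinction discussed above, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the four Arabic presentation forms of BEH (U+FE8F to U+FE92) as the example, and the variable names are only illustrative; the output reflects whatever Unicode Character Database version ships with the Python in use.

```python
import unicodedata

# Four compatibility characters, each encoding one positional glyph of BEH
# (isolated, final, initial, medial); picked here purely as an example.
presentation_forms = "\uFE8F\uFE90\uFE91\uFE92"

for ch in presentation_forms:
    # decomposition() gives a tag such as "<initial>" plus the base code point
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", unicodedata.decomposition(ch))

# Compatibility normalization (NFKC) folds all four glyph-like characters
# back to the single character U+0628 ARABIC LETTER BEH.
folded = unicodedata.normalize("NFKC", presentation_forms)
print([f"U+{ord(ch):04X}" for ch in folded])
```

After normalization only the one base letter remains, and choosing the positional shape is left to the font and text renderer.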
Longest page on the English Wikipedia
According to this: special:longpages, this page is the longest page on the English Wikipedia, at 688,000 bytes. Either this number is bogus, or this page will take a very long time to load on a low-speed link.
It appears that the HTML table was generated by a word processor. Please consider using a Wiki table or at least a better HTML editor. Thanks. -Arch dude 23:26, 3 May 2007 (UTC)
- Please lend a hand in improving the table. The conversion to a wikitable might help, but it's largely a false efficiency. The wikitable still needs to be converted to an HTML table when it's delivered, so everything gained in the compact wikitable syntax is lost upon delivery (keep in mind the size of the article in that list is the storage size, not necessarily the delivery size; when the table's delivered the "_" and "|" characters are replaced with complete "<tr></tr>" and "<td></td>" syntax). The wikitable gains other efficiencies by simply disallowing much of the HTML table semantics. The table was largely generated by hand (not by a word-processor). Unfortunately, many of the styles had to be added in-line because Wikipedia doesn't support embedded or linked stylesheets for table styling (which would make the total size considerably smaller). If anyone wants to reduce the size of the table, the styles could probably be handled in some other way (I'm not familiar enough with wiki styling conventions). Also it could be reduced by removing the tooltips, but I think they're quite helpful. Finally, it might make sense to move many of the table details off to separate articles once they're created. Then simply a summary table of the individual tables could appear on this page. So in summary:
- converting to a Wikitable (not much gained)
- Changing the styling (borders, cell horizontal and vertical alignments) to another syntax
- removing or shortening tooltips (these are repeated for every cell with a lengthy phrase)
- breaking table out into the separate related articles (I'll probably do this once I stabilize the table and finish the detailed articles).
- My goal here was to create a drill-down type group of articles, where one could start at this article and see how the various Unicode Planes and Blocks were grouped together and then follow through to see more detail on each block/script/character general category. So moving the tables to other articles would be consistent with that drill-down approach. Indexheavy 02:26, 4 May 2007 (UTC)
I shortened the titles (tooltips) considerably. I also removed most of the inline styles on the table cells. It still doesn't quite look the way I want it to, but it's readable and it looks decent (oh if only the wikimedia software developers would enter the 21st century). The class and title attributes could probably be eliminated completely if we need to make it smaller. However, the steps I already took get us out of the top 15 articles so maybe we're off the radar now. I do think that breaking the detailed tables off into separate articles makes a lot of sense, so this article could be reduced substantially that way (in time anyway). 04:10, 4 May 2007 (UTC)
Thanks for considering all of the options. You are clearly on top of the situation. If you intend to subdivide the article eventually, may I recommend that you avoid all of the intermediate steps? There are no rules, and I'm just another editor with an opinion, but as you point out, many of the gains are either trivial or bogus. The big win occurs when you split the article. I therefore propose that we live with it as it is until you are prepared to split it. I am not competent to help much. Best of luck on this, and keep up the good work! -Arch dude 13:23, 4 May 2007 (UTC)
Phonetic characters
The section on phonetic characters makes very idiosyncratic use of the word 'phoneme'. I can kind of guess what is meant, but I think that phone would be more appropriate. A phoneme is a language-particular unit, which is defined by its opposition to other phonemes. As such, a phoneme is a logical unit, which has no direct relation to the physical world. One can basically represent a phoneme by any string one likes best (although for mnemonic reasons, certain strings are of course better than others). A phone on the other hand is observable in the physical world and does have acoustic properties. IPA characters are used to refer to phones. Their representation is not arbitrary. This is probably what is intended by 'common phoneme semantics', and I suggest that this be renamed to 'underlying acoustic properties' or something like that. Jasy jatere 08:44, 10 May 2007 (UTC)
- I see nothing wrong with the change you propose making: though I'm having trouble seeing how it fits with the phoneme article you link to. For example, would you say that a "bilabial plosive" was a phoneme or a phone? It is that type of semantics the passage refers to.
- There's a second distinction you seem to be making that I'm not clear on too: that between the "strings" used to represent a phoneme, and the "characters" used to refer to phones. Could you provide some examples of what you mean there? Just to provide some clarification from the computing end, in relation to Unicode a string is an ordered collection (an array or list) of characters (or graphemes). On the other hand characters are the "smallest components of written language". So perhaps you were using strings and characters somewhat interchangeably, but I thought maybe there was another distinction there that I wasn't comprehending. Indexheavy 19:42, 10 May 2007 (UTC)
- One other distinction that I should add to help facilitate communication across these disciplinary boundaries. In relation to the definitions of string and character that I describe above and the distinctions you seem to be making, a glyph (or glyphs) is (are) the visual representation of a character (or character combination). Basically it is the picture or image that text software uses to visually display the character(s). So, for example, the same glyph as that used for Latin small letter 'p' (i.e., a picture of a small Latin letter 'p') might be used as a glyph for the character 'bilabial voiced plosive' (hypothetically speaking, since there is not a character named 'bilabial voiced plosive' though there are similarly named characters). In this case a bilabial plosive would be a single character. In contrast, a character set could encode a character 'bilabial' a character 'voiced' and a character 'plosive'. In this case, the character combination 'bilabial' + 'plosive' + 'voiced' might be represented by a glyph that was identical (or similar) to the Latin small letter 'p'. In this case three characters map to a single glyph. Another approach a character set might take would be to not encode characters for a phonetic writing system at all. Instead, the phonetic writing system would pick and choose characters from other writing systems within the character set based on the glyphs typically used for those characters and make use of those for phonetic writing system. To me this would be the equivalent of not encoding a Greek writing system and instead write the Greek language by borrowing characters with similar looking glyphs from Latin, Cyrillic and mathematical symbols. In many ways we have this very same situation with phonetic writing systems. Indexheavy 20:02, 10 May 2007 (UTC)
Looks like this content has since been removed from the article. -- Beland (talk) 17:27, 4 March 2014 (UTC)
2^20 + 2^16 ?
The article's first sentence states the total number of code points:
- 1,114,112 = 2^20 + 2^16 or 17 × 2^16
The second explanation is easy to visualize: 2^16 thingies per plane times 17 planes. But I find the first one confusing. I know it's correct (2^20 = 2^4 × 2^16 = 16 × 2^16), but why obscure the number this way? Where does that variant come from? --193.99.145.162 17:04, 27 June 2007 (UTC)
- I assume it's just there to give an order of magnitude to the number of code points available. -- Beland (talk) 17:28, 4 March 2014 (UTC)
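- For anyone who wants to check the arithmetic, here is a small Python sketch. One possible reading of the 2^20 + 2^16 form, offered as an assumption rather than as what the article's authors intended, is that it mirrors UTF-16: 2^16 code points in the Basic Multilingual Plane plus the 2^20 supplementary code points reachable through surrogate pairs. The constant names below are illustrative only.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16          # 65,536 code points in each plane

total = PLANES * CODE_POINTS_PER_PLANE   # the "17 x 2^16" form

# UTF-16 surrogate ranges: 1,024 high surrogates and 1,024 low surrogates.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1
supplementary = HIGH_SURROGATES * LOW_SURROGATES   # pairs address 2^20 code points

bmp = 2 ** 16                            # plane 0, addressed without surrogates

print(total, hex(total))                 # 1114112 0x110000
print(bmp + supplementary == total)      # True: the "2^20 + 2^16" form
```

Both forms give the same count; the first groups the code points by how UTF-16 reaches them, the second by plane.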
Lepcha?
The article claims that the Lepcha script (1C00-1C4F) is part of Unicode 5.0. It isn't. (Preceding unsigned comment added by MBisanz (talk • contribs) 23:33, 22 November 2007)
- I guess that claim has since been removed. -- Beland (talk) 17:12, 4 March 2014 (UTC)
Splitting
I'm proposing that this article be split into 5 sub articles, along the first 5 main entries on the TOC. Mbisanz (talk) 23:33, 22 November 2007 (UTC)
- OK, it's been about 12 days and no comments, so I'm gonna begin to split it later tonight. Remember, it can always be rolled back if it turns out this is counter to consensus. Mbisanz (talk) 03:55, 5 December 2007 (UTC)
- I just noticed that the article split resulted in several "Big - see Large / Large - see Big" type situations. The main article refers to the five sub-articles as "main article" and three of them return the favor (the other two seem to have been orphaned). I do not have the expertise to tell exactly what happened, but I read through WP:SS and I am pretty sure this is SNAFU. Cawifre (talk) 21:00, 26 May 2008 (UTC)
WOE's a code point?
"Unicode’s Universal Character Set potentially supports over 1 million (1,114,112 = 2^20 + 2^16 or 17 × 2^16, hexadecimal 110000) code points". Hey, you guys, this is an encyclopedia, intended mainly for non-specialists. So what's with this unnecessary, snotty, patronising jargon? "Potentially Supports"???? You mean, that without this character set, these "code points" would potentially fall over? And "code point"? How about actually saying what this means? Currently, the message is "if you don't know what \"code point\" means, you're too stupid to be reading this article". You could reformulate this in courteous English so easily: "Unicode's Universal Character Set could code over 1 million different characters". AMackenzie (talk) 13:29, 13 September 2008 (UTC)
- I tried to integrate your suggestions into the intro. -- Beland (talk) 17:08, 4 March 2014 (UTC)
More IndexHeavy: Unicode imposing requirements on fonts?
I have reverted the recent edits by IndexHeavy. (S)he seems to have invented a notion of Unicode imposing requirements on fonts, and appears to want to propagate this invention through Wikipedia. There is no literature to support this notion. (S)he seems to wish to promote Apple Chancery as a particularly "compliant" font, when there are plenty of other examples of fonts that do what (s)he illustrates with verbiage such as "Most fonts fall short in rendering fractions as required by Unicode. Apple Chancery however is an exception." "Most" is a weasel word, as are all of the generalizations promulgated by these edits. Further, the job of fraction rendering is not specifically that of a font, but rather is some fluid division of labor between a font and the text layout engine. Despite this (s)he writes, "Many fonts fail to render number digits adjacent to the Fraction Slash with the prescribed Unicode treatment", as if the onus is the font's alone and as if Unicode "requires" this of fonts. To bolster this theory, this same editor, in a direct quote from the Unicode specification, left out the last sentence of the specification, which states that fractional rendering is not a requirement. (S)he then proceeded to insert a narrative of verbiage censuring fonts for not meeting the Unicode "requirements". None of this agenda is supported by the literature, and the promotional text is not encyclopædic. Strebe (talk) 01:16, 25 February 2009 (UTC)
- Strebe, this appears to be a continuation of a fight you began in response to some questions posed on the OpenType talk page. I did not leave the quoted sentence out in order to imply anything. I simply thought and continue to think it is not relevant (however, I left that edit in since it is fine to have it quoted). I don't really understand the defensive posture you've taken. Perhaps you could explain what you think is at stake in your contention that fonts are not required to display Unicode text correctly. I really don't see what is at stake here. I don't care about Apple Chancery, I was simply looking for a font that followed Unicode's advice. If you want to produce images with a font demonstrating the same capabilities then by all means do so. I think I found another font on my system to do the same rendering. I can also produce an image of a rendering that is clearly in error compared to Unicode norms ("most" is not a weasel word when it is not referring to a reference).
- In our previous exchange you seemed to be only motivated to instigate a fight (I know there are people who come to wikipedia for no other reason). However, in your edits you have also undone edits and removed important information that contributes to the article. I appreciate the improvements you've made to my prose, but there are certainly deep cuts you've made that omit important information. You restored the word 'vulgar' into this section where its meaning is clearly not used appropriately. You remove information about the rendering of fractions which is not contested. And so on. So I'm restoring those edits and suggest you edit more carefully when you know there are valid disagreements here. Indexheavy (talk) 02:01, 25 February 2009 (UTC)
IndexHeavy, I don't care about your belief in "fights" or your fishing around of my beliefs in that regard. That's not relevant and I won't be lured into it. What is relevant is that you are promoting an invention of yours in Wikipedia that has no basis in the literature, no basis in prevailing practice, and misleads readers about the scope of Unicode. Your invention is that Unicode requires this or that of fonts. Meanwhile, Unicode's specification makes no demands of fonts. Quoting from your new or newly restored edits, this verbiage promotes your theory:
- Most fonts fall short in rendering fractions as required by Unicode.
- Many fonts fail to render number digits adjacent to the Fraction Slash with the prescribed Unicode treatment
If you wish to make a correct statement, you must replace "Most fonts" with "Most text layout software environments," or some such, but even that is not encyclopædic and not relevant because "most" is a weasel word, unsupported by citations, and because the whole enterprise is not important to readers' needs. Simply noting that the Unicode recommendation is only spottily implemented suffices. Rubbing it in with piles of text and annotations on images simply makes it look like the editor is obsessed with fraction slash, is self-righteous, and is patronizing in praising Apple Chancery alone amongst all the sinners. It is fine to note the font as being Apple Chancery as an example of the Unicode behavior in action. But no educational service is rendered in contrasting it against the dubious "most fonts". Again, not encyclopædic; not educational; not relevant.
Prescribed is an exaggeration. Unicode recommends the behavior, not "prescribes" it.
This is unencyclopædic verbiage:
However, with fully conforming Unicode text imaging systems, users get rich typographic quality fractions simply by inserting the special fraction slash character even within plain text.
It reads like a brochure. It adds nothing to the factual statement from the Unicode specification and can be deleted without a reader missing anything useful.
Again, I am reverting the edits. I appreciate your interest in this matter, and I am sure you are sincere, but this is an encyclopædia we're working on. It needs to be terse, relevant, and neutral. Strebe (talk) 02:35, 25 February 2009 (UTC)
- Strebe, I welcome you to reword my writing, but you're making changes that are removing important relevant facts. The sentence you quote about fractions in plain text is the main point I meant to convey. That is that Unicode supports rendering of numeric fractions (as opposed to symbolic fractions) in plain text. I don't think such facts belong only in a brochure; they are also encyclopedic information relevant to encyclopedia readers. I already said I don't care about Apple Chancery except that it is an example of a font that renders correctly. I could produce another image of a font rendering incorrectly. Or if you prefer to suppress the information that it is Apple Chancery then that is fine with me too (readers could learn that on the image details page if they cared about that level of detail). Or as I already said select whatever font you like to make the demonstration. If you have a font called Strebe go ahead and use that. I'm not trying to undermine your business here. I'm trying to write an encyclopedia article that lets readers understand Unicode and, in particular, these special characters.
- If you're concerned this reads like a Unicode brochure, then start a criticism section to explain the missteps of Unicode with respect to fractions and fraction imaging. I'm going to try to reword this to address your concerns, but again it appears you're just continuing the fight you started on the OpenType page where you lob accusations and where you responded to my polite queries with grandstanding. Indexheavy (talk) 03:03, 25 February 2009 (UTC)
Strebe, your edits are also introducing misinformation into the article. This image is not representing different glyphs, but is rendering different characters. Either you don't understand the topic of Unicode sufficiently, you're making a mistake or you are engaging in some elaborate vandalism. I want to assume good faith, but such misinformation activities do not help support that assumption. Indexheavy (talk) 03:11, 25 February 2009 (UTC)
- No misinformation. You have chosen to interpret the image as rendering different characters. While it may be true that you generated those images in the way you describe, it is also true that I can generate identical images in the way I describe. Hence the caption I wrote is accurate. I advise a little more circumspection over publicly slinging accusations of mistakes or vandalism (or fight-picking, for that matter)... and especially accusations of misinformation, given how single-mindedly you have promoted, and continue to promote, the idea that Unicode requires or even recommends anything at all about font behavior. Meanwhile your captions do not result in much differentiation of purpose for the two images — which makes one of them superfluous and therefore vulnerable to deletion – and do not expose the reader to the interesting and useful fact that the same Unicode string can be interpreted two different ways depending on the font instructions and text engine behavior. The first image is excellent for that purpose.
- Could you please step back and get some perspective on this? It's looking like an obsession. Pointing out "Note how every consecutive digit before and after the fraction slash—not just the first—is rendered as the numerator and denominator respectively," means nothing to the typical reader because the typical reader is not aware of the contrast you're attempting to portray in the limited space of a caption. It just looks abrupt, out of context, and like a molehill has become a mountain. It's trivia. It's not the purpose of Wikipedia to exhaustively catalogue transient things and behaviors — or worse, exhaustively catalogue certain things certain people fixate on while ignoring vast piles of other things certain people do not fixate on. Given the coverage of the rest of the material in this article, any images at all are a luxury. Still, wisely captioned, they're interesting and relevant. As for the rest, surely it suffices to quote what Unicode says about the behavior and then comment that support for Unicode's recommendation is spotty. With all these edits you are distorting the editorial depth of the article, drawing attention to a very small behavior while the rest of the article generally remains terse and encyclopædic. Strebe (talk) 04:39, 25 February 2009 (UTC)
Strebe, you argue that I'm fixated on fraction slash, however I have been responsible for much of the material in this article as well as much of the material moved by others and me to other supporting articles. So from my perspective the fraction slash has been one small part of a larger project to write encyclopedic materials that explain to an average user what Unicode is and does. From my perspective, you are the one who has fixated on this, as well as twisted the questions I've posed to you earlier and statements I've made about this issue. I stand by my contention that you are out to pick a fight and invite anyone to read through these exchanges to verify that. Even your edit summaries seem more focussed on slinging insults than summarizing edits (even this section heading looks like a personal attack to me).
The huge difference you're making out of what Unicode recommends for a "text rendering system" or a "font" is very much splitting hairs as far as the average reader is concerned. While I'm happy to have you raise such fine-grained differences, your sweeping claims that I've invented a "fiction" because I didn't use the exact phrasing you would have used further suggest to me you are not being sincere here. It looks more to me like you picked a fight due to my question on the OpenType talk page and now you've tracked me down here to pick another fight. You've made some great edits to the article, but the insults (such as calling my writing fiction) are unnecessary and not proper wiki etiquette.
Considering how little space the fraction slash section takes in this article, it is hard for me to imagine that the description of the fraction slash behavior is out of scale for the whole project of describing Unicode (and specifically the mapping of Unicode characters). The description is on par with the other special purpose characters I've described and considering this character is one of the few non-script-specific grapheme characters that has special behavior, a full description seems in line. I find it hard to imagine how it has led to such a dispute on wikipedia. If you had done your rendition on this topic here, I would not hold the fraction slash to be so sacred as to sweep in here and lob insults at you and delete your prose. I would say to myself, "OK, that might not be the specifics of the fraction slash (or Unicode in general) I would have focussed on, but I'm not obsessed enough with the fraction slash character to pick a fight over it". So I would ask why you haven't been able to detach yourself enough from this topic to make the same judgment.
The complaints you raise over these few sentences are not convincing. For example, these are not necessarily transient behaviors. These are issues that have been around over 15 years and who's to say they're going away soon. It helps to understand what Unicode is about whereas the common implementations of Unicode may lead readers to think this issue has not been dealt with by the specification itself. I'm trying to make it clear where I'm coming from on this topic. I have asked you what your thoughts are, but I find it hard to comprehend how you can care so insistently that this information – conveyed in so few sentences – needs suppression. Your focus on glyphs may be a big indication of the misunderstanding. Unicode is focussed on plain text encoding of characters (and not on encoding glyphs). So it is an important facet of the fraction slash character that it is capable of a complete typographical rendering of fractions within plain text. Looked at in that context, this is a particularly important character (even more so than many of the other special purpose characters). Indexheavy (talk) 06:32, 25 February 2009 (UTC)
- The huge difference you're making out of what Unicode recommends for a "text rendering system" or a "font" is very much splitting hairs as far as the average reader is concerned. In other words, you believe it is fair? useful? didactic? encyclopædic? to state or imply that a font does not conform to Unicode's requirements or suggestions because it is only a small fiction and because you personally have decided that the "average reader" would not know the difference? How did you ever convince yourself that's an appropriate policy for an encyclopædia? A quote from an observer: "=S *shakes her head* It should at least be written so that it doesn't lie! That's not exactly a high standard. =P" Well, that pretty much sums it up.
- I've been quite clear here. The primary problem with the edits in the fraction slash section is the injection of your invention of Unicode compliance of fonts. I'm sorry you're having a hard time following my objection. I do not know how to make it clearer, and no, once again, I will not be baited by all your innuendos. If you go back to the OpenType page, it's pretty clear where the conversation fell apart: right where you invented Unicode compliance, and that's where you first accused me of picking a fight. I didn't pick a fight. I called you on a matter of clear fact. Either cite it, or get rid of it. Strebe (talk) 07:41, 25 February 2009 (UTC)
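- To make the plain-text side of this dispute concrete, here is a minimal Python sketch using the standard `unicodedata` module. It only shows which code points are stored; whether a digit run around U+2044 is drawn as a typographic fraction depends on the font and text layout engine, which this snippet does not and cannot test. The variable names are illustrative assumptions.

```python
import unicodedata

precomposed = "\u00BD"                 # a single "symbolic" fraction character
composed = "1" + "\u2044" + "2"        # digit, FRACTION SLASH, digit
arbitrary = "22" + "\u2044" + "7"      # a fraction with no precomposed character

print(unicodedata.name(precomposed))   # VULGAR FRACTION ONE HALF
print(unicodedata.name("\u2044"))      # FRACTION SLASH
print(len(precomposed), len(composed), len(arbitrary))   # 1 3 4

# Compatibility normalization maps the precomposed character to the
# digit + fraction slash + digit sequence.
print(unicodedata.normalize("NFKC", precomposed) == composed)   # True
```

The same stored string can therefore be displayed either as a built fraction or as plain digits with a slash, depending on the rendering environment.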
About Private Use Area character properties
Regarding the edit, [[1]], the Unicode standard does recommend a Corporate Use Subarea and an End-User Subarea, but they are not properties of characters; nor are they bounded on one side as regions. One cannot look at a PUA code point and say it is corporate use or end-user use; one can only say it might be, and even then Unicode does not require the distinction. From the specification introduction:
In addition to character codes and names, other information is crucial to ensure legible text: a character’s case, directionality, and alphabetic properties must be well defined. The Unicode Standard defines these and other semantic values, and it includes application data such as case mapping tables and character property tables as part of the Unicode Character Database. Character properties define a character’s identity and behavior; they ensure consistency in the processing and interchange of Unicode data.
“Character properties” precisely describes what PUA does not assign. Strebe (talk) 02:21, 2 October 2010 (UTC)
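- As a hedged illustration, the Python sketch below queries the Unicode Character Database bundled with Python for the two ends of the BMP Private Use Area. It assumes nothing beyond the standard `unicodedata` module; the variable names are illustrative. What it shows is that the database carries only a generic default category for these code points, with no name and nothing that marks one as corporate use or end-user use.

```python
import unicodedata

# First and last code points of the BMP Private Use Area (U+E000..U+F8FF).
first_pua, last_pua = "\uE000", "\uF8FF"

categories = []
for ch in (first_pua, last_pua):
    category = unicodedata.category(ch)         # general category from the database
    name = unicodedata.name(ch, "<no name>")    # fall back when no name is assigned
    categories.append(category)
    print(f"U+{ord(ch):04X}", category, name)
```

Both ends report the same private-use category, so any corporate versus end-user meaning exists only by agreement outside the database.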
Merging "UCS characters"
The separate article "Universal Character Set characters" apparently was intended to describe the same thing – mapping of UCS characters – from the PoV of ISO/IEC, not Unicode Consortium. Even so, it now contains virtually the same information as "Mapping of Unicode characters". Incnis Mrsi (talk) 18:12, 15 February 2011 (UTC)
- Agree. I long to read a well written paragraph about the diff & overlap of these two. Certainly there should not be two articles. -DePiep (talk) 21:14, 15 February 2011 (UTC)
- Disagree. Unicode characters have several properties that are not recognized by ISO/IEC 10646, and hence the UCS. This article describes those properties, the UCS characters article does not, and should not. VanIsaacWScontribs 23:29, 11 February 2012 (UTC)
- The UCS characters article? Why article, not redirect to Mapping of Unicode characters? Of course, there are Unicode properties unrecognized by ISO/IEC, but the mapping is nonetheless the same. Is the PoV of ISO/IEC notable enough to be described in a separate article? Incnis Mrsi (talk) 08:28, 12 February 2012 (UTC)
@Beland: when you threw the stuff from here onto “Universal Character Set characters”, you neglected to clarify that categories pertain to Unicode, not the ISO definition. Fix it, please. Incnis Mrsi (talk) 18:36, 4 March 2014 (UTC)
- @Incnis Mrsi: Should be clarified now. The entire merge has landed, so feel free to tweak anything you're still unsatisfied with. It was rather complicated combining the overlapping parts, so I may have done some things slightly wrong. -- Beland (talk) 18:45, 4 March 2014 (UTC)