Talk:Plane (Unicode)/Archive 1

This is an archive of past discussions about Plane (Unicode). Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Typo in "Supplementary Special-purpose Plane" ??

The section "Supplementary Special-purpose Plane" includes the line:

Variation Selectors Supplement (0E0100–E01EF)

That zero in front of the first hex number sure looks wrong to me, but I honestly don't know enough about this topic to know if it serves some actual purpose. Would someone better informed please fix it if it's wrong, or say why it's right?

[1] 2010 edit. -DePiep (talk) 13:10, 7 October 2021 (UTC)

Old Hungarian

Turoslangos is playing games here. Neither UTC nor WG2 will accept Old Hungarian into the BMP. There isn't room, and neither is there justification for encoding it there. -- Evertype·✆ 21:04, 7 November 2008 (UTC)

Private Use Area planes for social networks

I've been finding HTML documents with glyphs for Facebook, Twitter, etc. as Unicode characters in the Private Use Area planes. This requires a custom font. Any references on this? --John Nagle (talk) 20:46, 30 April 2013 (UTC)

As the definition goes: anyone can publish or use a character definition in PUA space (example: I may have a PUA character to mail to my spouse to say X, and only we two know. We don't see the font, but the char number is enough for us to meet). If FB or TWI does so, it is up to them to provide the font, and to make it work publicly. If they can't get that right, the reader will see the wrong character. Like in the old days: question marks at best.

Actually, is that so? Examples by FB or TWI? It could be that users/companies are using PUAs (writing on FB or TWI), but then the issue is with these users. -DePiep (talk) 21:08, 30 April 2013 (UTC)

UTF-8 "designed for 2^21 bits"

The UTF-8 coding scheme was designed when Unicode was still contemplating a 31-bit space. It was not "designed" for a limit of 2^21 codepoints, and was eventually restricted to a much smaller number anyway (0x10FFFF). Elphion (talk) 01:13, 3 October 2016 (UTC)
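To illustrate the point about the 31-bit design, here is a minimal Python sketch. The table of byte lengths reflects the original six-byte UTF-8 layout as I understand it; the names `ORIGINAL_UTF8_LIMITS` and `original_utf8_length` are made up for this example and do not come from any library.

```python
# Largest value representable at each byte length in the original
# UTF-8 design, which covered a 31-bit space.
ORIGINAL_UTF8_LIMITS = [
    (1, 0x7F),        # 7 payload bits
    (2, 0x7FF),       # 11 payload bits
    (3, 0xFFFF),      # 16 payload bits
    (4, 0x1FFFFF),    # 21 payload bits
    (5, 0x3FFFFFF),   # 26 payload bits
    (6, 0x7FFFFFFF),  # 31 payload bits
]

UNICODE_MAX = 0x10FFFF  # cap of the Unicode codespace


def original_utf8_length(value: int) -> int:
    """Bytes needed for `value` under the original 31-bit scheme."""
    if value < 0:
        raise ValueError("negative value")
    for length, limit in ORIGINAL_UTF8_LIMITS:
        if value <= limit:
            return length
    raise ValueError("value does not fit in 31 bits")
```

The four-byte form happens to carry 21 payload bits, which is probably where the "2^21" figure comes from, but U+10FFFF sits well below the top of that 21-bit range.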

Why would Unicode modernize a code space by making it smaller? 108.71.123.25 (talk) 16:05, 5 October 2016 (UTC)

Because otherwise the parties could not agree on a standard. Too many manufacturers were already heavily invested in 16-bit characters. UTF-16 was the compromise that allowed the standard to go forward. When eventually we run out of space (and we will, though computing technology will have changed a lot by the time that happens), larger spaces will be introduced. But they will not be "Unicode". -- Elphion (talk) 16:18, 5 October 2016 (UTC)

But 0x00E00000 to 0x00FFFFFF and 0x60000000 to 0x7FFFFFFF were assigned! And my flip phone runs an operating system that uses a 32 bit code space. 108.71.123.25 (talk) 16:21, 5 October 2016 (UTC)

(see below -- Elphion (talk) 16:23, 5 October 2016 (UTC))

When I can enter text on my flip phone, a character map is shown with a code point above it. It highlights the space and displays 0x00000020 at the top. This implies that it uses a 32 bit space. 108.71.123.25 (talk) 16:27, 5 October 2016 (UTC)

Plane 16 and "20-bit limit"

Obviously, Plane 16 (100000-10FFFF) is a 21-bit entity (why they crashed thru to Plane 16 with 3-13 unused seems rather inelegant here, but I'm not a Unicode expert). I can, however, decipher hexadecimal. I have no idea how to "improve" ("correct"?) this, but it needs to be done. Grndrush (talk) 17:18, 3 January 2009 (UTC)

I was about to say much the same. Is the answer to call it a 17-plane limit and ignore the bit-question? Alternatively one could explain that the 20-bit limit is a matter of the address space defined by the available surrogate pairs, and thus defines the number of planes available beyond the BMP. (If I have understood aright…) Ian Spackman (talk) 00:11, 28 July 2009 (UTC)

21 bits is just an outcome; it is not a preset limit. Here we go. The BMP uses the full 16 bits (hhhh): 0000-FFFF, 65,536 numbers. (With a plane prefix that is 00hhhh, so Plane=0.) In this plane, 1024 high surrogates and 1024 low surrogates are defined, at D800-DBFF and DC00-DFFF. Surrogates must be used in pairs (one high, then one low) to point to a character, so they can identify exactly 1024x1024 ~1M points. Together a pair hhhh_high.hhhh_low takes 32 bits. So the 1M points lie within the range D800.DC00 - DBFF.DFFF (but not every value in that range is a valid pair).

In comes UTF-16. UTF-16 recalculates these 32bit numbers 1:1 into the range 10000-10FFFF_hex, starting right after plane 0 (at FFFF+1), and exactly filled with the ~1M points, creating planes 1 to 16_dec (=the final 10_hex). Now there is no unused number any more, and the whole range can be identified with 21 bits.

So because there are 1024x1024 surrogate pairs defined, the UTF-16 recalculated numbers fit within 21 bits. A plane 17 starting at 10FFFF+1=110000 would still fit in 21 bits, but it cannot be recalculated to a high-low 32bit pair, because all 1024x1024 pairs are already used up by planes 1 to 16.

Nowadays the U+hhhhhh notation is used commonly. -DePiep (talk) 17:13, 6 October 2010 (UTC)
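The recalculation described above can be sketched in Python as follows. This is only an illustration of the arithmetic; the function names `to_surrogates` and `from_surrogates` are invented here and are not part of any standard API.

```python
from typing import Tuple

HIGH_START, LOW_START = 0xD800, 0xDC00  # starts of the two surrogate ranges
FIRST_SUPPLEMENTARY = 0x10000           # first code point after the BMP
LAST_CODE_POINT = 0x10FFFF              # last code point of plane 16


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not FIRST_SUPPLEMENTARY <= cp <= LAST_CODE_POINT:
        raise ValueError("not a supplementary code point")
    offset = cp - FIRST_SUPPLEMENTARY   # 20-bit value, 0..0xFFFFF
    high = HIGH_START + (offset >> 10)  # top 10 bits
    low = LOW_START + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return FIRST_SUPPLEMENTARY + ((high - HIGH_START) << 10) + (low - LOW_START)
```

The first pair D800.DC00 maps to 10000 and the last pair DBFF.DFFF maps to 10FFFF, so the 1024x1024 pairs cover planes 1 to 16 with nothing left over for a plane 17.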

0xHHHHHHHH 108.71.120.43 (talk) 20:50, 10 October 2016 (UTC)

0x00E00000 to 0x00FFFFFF/0x60000000 to 0x7FFFFFFF

Some operating systems still have these as private use areas. 108.71.123.25 (talk) 16:07, 5 October 2016 (UTC)

But those are not Unicode planes, the subject of this article. The Unicode standard sets a maximum of 17 planes. There is nothing to stop people from storing other values in 32 bits, but that's not Unicode. -- Elphion (talk) 16:13, 5 October 2016 (UTC)

Universal Character Set still has this. Some operating systems still have these. My flip phone has one such operating system that uses UTF-32/UCS-4, and it shows an 8 digit code point. 108.71.123.25 (talk) 16:17, 5 October 2016 (UTC)

No, UCS was revised to agree with Unicode, for consistency. Whatever your flip phone uses is not Unicode, and not UCS-4, no matter how it might be labeled. -- Elphion (talk) 16:21, 5 October 2016 (UTC)

When I can enter text, it displays 0x00000021 and highlights the space. This 8 digit code point means that it is a 32 bit code space. 108.71.123.25 (talk) 16:25, 5 October 2016 (UTC)

As I said, nothing prevents a programmer from storing arbitrary values in 32 bits. That doesn't make them Unicode, which has a very precise and well-documented definition that caps the space at U+10FFFF. The number of leading zeroes shown in the display doesn't alter that. Added: If in fact your phone uses values above U+10FFFF, it was programmed to use a non-standard extension of Unicode, which (since Unicode is capped) is reasonably safe, in the sense that those private characters will never be assigned conflicting Unicode values. But the programmer would have no expectation that the non-standard values would be understood beyond the phone's universe. Such a message sent to another phone from a different manufacturer (or a different revision level) likely won't display as intended. -- Elphion (talk) 16:57, 5 October 2016 (UTC)

I scrolled through the characters. The map starts at 0x00000020 and ends at 0x0002FA1D. 108.71.123.25 (talk) 17:41, 5 October 2016 (UTC)

Regardless of what encoding scheme your phone uses, your changes will be reverted because they contradict the actual Unicode Standard and that's what this article is about. See chapter 2.4 of the Standard:

In the Unicode Standard, the codespace consists of the integers from 0 to 10FFFF, comprising 1,114,112 code points available for assigning the repertoire of abstract characters.

Anything outside of that codespace isn't Unicode and isn't relevant to this article. DRMcCreedy (talk) 18:02, 5 October 2016 (UTC)
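For anyone who wants to check the figure quoted from the Standard, a small Python sketch follows. The constant and function names are hypothetical and chosen only for this example; `chr` is Python's built-in, which rejects values above 0x10FFFF.

```python
PLANE_SIZE = 0x10000       # 65,536 code points per plane
PLANE_COUNT = 17           # planes 0 through 16
MAX_CODE_POINT = 0x10FFFF  # last code point in the codespace

CODESPACE_SIZE = PLANE_SIZE * PLANE_COUNT  # 17 * 65,536


def is_unicode_code_point(value: int) -> bool:
    """True if `value` lies inside the codespace 0..10FFFF."""
    return 0 <= value <= MAX_CODE_POINT


def python_accepts(value: int) -> bool:
    """True if Python's built-in chr() accepts `value` as a code point."""
    try:
        chr(value)
    except (ValueError, OverflowError):
        return False
    return True
```

With these definitions `CODESPACE_SIZE` works out to 1,114,112, matching the quoted text, and a value such as 0x60000000 fails both checks.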

In the "Help" display for entering characters, it says "...to select the UTF-32/UCS-4 character..." 108.66.233.59 (talk) 18:04, 5 October 2016 (UTC)

And, according to your own experiment, it does not go beyond the Unicode space: it stops at U+2FA1D, which is well below U+10FFFF. So although your phone is using a 32-bit display (or a 31-bit display, it's hard to tell when the highest digit is 0), it is only dealing in characters within the Unicode space. -- Elphion (talk) 18:21, 5 October 2016 (UTC)
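A rough Python sketch of that reasoning, assuming only that the plane number is the code point divided by 0x10000; the helper names `parse_hex` and `plane_of` are invented for illustration.

```python
def parse_hex(text: str) -> int:
    """Parse a hex string; leading zeros and an optional 0x prefix are ignored."""
    return int(text, 16)


def plane_of(code_point: int) -> int:
    """Return the plane (0..16) that contains `code_point`."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    return code_point >> 16  # each plane holds 0x10000 code points
```

Here `plane_of(parse_hex("0x0002FA1D"))` gives plane 2, so the last entry in the phone's map is still inside the standard range, and the eight-digit display adds only leading zeros.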

It's a 32 bit display. 108.66.233.59 (talk) 18:34, 5 October 2016 (UTC)

Also, if I click this link on my flip phone, and I enter a number higher than 0x0010FFFF, it displays a box with the code point in it. For example, if I enter 0x60000000, it displays this:

+----+
|6000|
|0000|
+----+

108.71.120.43 (talk) 22:29, 10 October 2016 (UTC)

And that shows nothing, except that your cellphone and the internet app do not screen out non-standard input. Neither your cellphone nor the app at unicodelookup.com constitutes a WP:RS. As we have all been telling you, the standard is quite clear: there are no valid code points above U+10FFFF. -- Elphion (talk) 22:54, 10 October 2016 (UTC)

If I click the same link on my Windows computer or Android phone, the site says "undefined", which is what it is. In my assessment there is too little unused space in the BMP to extend UTF-16 to 6 bytes. Otherwise a new type of high surrogate could be allocated as the first of three 16-bit words.--BIL (talk) 14:59, 8 January 2017 (UTC)