CJK Compatibility Ideographs

From Wikipedia, the free encyclopedia
CJK Compatibility Ideographs
Range: U+F900..U+FAFF (512 code points)
Plane: BMP
Scripts: Han
Assigned: 472 code points
Unused: 40 reserved code points
Source standards: KS X 1001, Big5, IBM 32, JIS X 0213, ARIB STD-B24, KPS 10721-2000
Unicode version history:
  • 1.0.1 (1992): 302 (+302)
  • 3.2 (2002): 361 (+59)
  • 4.1 (2005): 467 (+106)
  • 5.2 (2009): 470 (+3)
  • 6.1 (2012): 472 (+2)
Unicode documentation: Code chart | Web page
Note: [1][2]
The range was initially part of the Private Use Area in Unicode 1.0.0[3] and was removed from it in Unicode 1.0.1.

CJK Compatibility Ideographs is a Unicode block created mainly to hold Han characters that established character encodings had encoded in more than one location. Each such duplicate receives a code point here, in addition to the character's assignment in CJK Unified Ideographs, so that text can round-trip between Unicode and those encodings without losing the distinction. However, the block also contains 12 unified ideographs sourced from IBM's Japanese character sets.
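As a minimal sketch of why the duplicates need code points of their own, the following Python snippet uses the standard `unicodedata` module on U+F900, a character in the KS X 1001 range of this block, and U+8C48, the unified ideograph it is canonically equivalent to. The pair is chosen only for illustration. The two code points compare unequal, so a converter can map each back to its own legacy position; Unicode normalisation, however, folds the compatibility ideograph onto the unified one, so text that must round-trip would typically be kept unnormalised.

```python
import unicodedata

# U+F900 lies in the KS X 1001 range of this block; U+8C48 is the unified
# ideograph it is canonically equivalent to (pair chosen for illustration).
compat = "\uF900"
unified = "\u8C48"

# Distinct code points: a converter can tell them apart and map each
# back to its own position in the legacy encoding.
print(compat != unified)  # True

# The compatibility ideograph carries a singleton canonical decomposition...
print(unicodedata.decomposition(compat))  # 8C48

# ...so normalisation (NFC shown here) replaces it with the unified
# ideograph, and the legacy distinction is lost.
print(unicodedata.normalize("NFC", compat) == unified)  # True
```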

The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5] These sequences specify the desired glyph variant for a given Unicode character.
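Structurally, an ideographic variation sequence is a base character followed by one of the variation selectors VS17–VS256 (U+E0100–U+E01EF). The Python sketch below pairs each character with a following selector in that range; `split_ivs` is a hypothetical helper, not a library function. It checks only the selector's code point range and does not consult the IVD, so the example sequence shows the structure only and is not claimed to be registered.

```python
from typing import List, Optional, Tuple

# Variation selectors VS17..VS256 (U+E0100..U+E01EF) are the selectors
# used by ideographic variation sequences.
IVS_SELECTORS = range(0xE0100, 0xE01F0)

def split_ivs(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with the ideographic variation selector that
    immediately follows it, or None if there is none.

    Only the selector's code point range is checked; the IVD is not
    consulted, so this cannot tell whether a sequence is registered.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(text):
        base = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt and ord(nxt) in IVS_SELECTORS:
            pairs.append((base, nxt))  # base + selector form one sequence
            i += 2
        else:
            pairs.append((base, None))  # plain character, no selector
            i += 1
    return pairs

# U+FA0E followed by VS17, then a plain U+8C48 (structure only).
for base, selector in split_ivs("\uFA0E\U000E0100\u8C48"):
    tag = f"U+{ord(selector):04X}" if selector else "no selector"
    print(f"U+{ord(base):04X}", tag)
```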

Character sources

Sources for the original collection of CJK Compatibility Ideographs include:

  • South Korean KS X 1001 (U+F900–U+FA0B, 268 characters)
  • Taiwanese Big5 (U+FA0C–U+FA0D, 2 characters)
  • "IBM 32": 32 Japanese characters from IBM (U+FA0E–U+FA2D; see below)

In ensuing versions of the standard, more characters have been added to the block from:

  • South Korean KS X 1001 (U+FA2E–U+FA2F, 2 characters)
  • Japanese JIS X 0213 (U+FA30–U+FA6A, 59 characters)
  • Japanese ARIB STD-B24 (U+FA6B–U+FA6D, 3 characters)
  • North Korean KPS 10721-2000 (U+FA70–U+FAD9, 106 characters)
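The range lists above can be turned into a simple lookup. In this Python sketch, `SOURCES` and `source_of` are hypothetical names; the table holds only the block-level ranges given above, so it reports one source per range rather than any per-character mapping data. Summing the range sizes also recovers the block's 472 assigned code points.

```python
from typing import List, Optional, Tuple

# Hypothetical lookup table built from the source lists above:
# (first code point, last code point, source).
SOURCES: List[Tuple[int, int, str]] = [
    (0xF900, 0xFA0B, "KS X 1001"),
    (0xFA0C, 0xFA0D, "Big5"),
    (0xFA0E, 0xFA2D, "IBM 32"),
    (0xFA2E, 0xFA2F, "KS X 1001"),
    (0xFA30, 0xFA6A, "JIS X 0213"),
    (0xFA6B, 0xFA6D, "ARIB STD-B24"),
    (0xFA70, 0xFAD9, "KPS 10721-2000"),
]

def source_of(cp: int) -> Optional[str]:
    """Return the source listed for the range containing cp, or None."""
    for first, last, source in SOURCES:
        if first <= cp <= last:
            return source
    return None  # reserved code point, or outside the block

# The ranges together cover the block's assigned code points.
assigned = sum(last - first + 1 for first, last, _ in SOURCES)
print(assigned)  # 472
print(source_of(0xFA20), source_of(0xFA6E))  # IBM 32 None
```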

The "IBM 32" characters

IBM Japanese double-byte EBCDIC includes several kanji that do not exist in, or do not round-trip from, JIS X 0208. These were included as gaiji in extensions to Shift JIS and EUC-JP from IBM (e.g. code page 942), NEC, the Open Software Foundation, and Microsoft (e.g. Windows code page 932). However, they were not used as a source for the original Unified Repertoire and Ordering (URO). Instead, the 32 IBM extension kanji that had not already entered the URO from other sources were encoded in the CJK Compatibility Ideographs block, in the range U+FA0E–U+FA2D.

Of these 32 characters:

  • 19 are unifiable with characters in the URO, and are therefore compatibility ideographs in the strict sense.
  • One (U+FA20 CJK COMPATIBILITY IDEOGRAPH-FA20) is a kyūjitai form of a kokuji whose extended shinjitai form exists in the URO (U+8612 CJK UNIFIED IDEOGRAPH-8612). Both are hyōgai kanji and are variants of the jinmeiyō kanji U+8429 CJK UNIFIED IDEOGRAPH-8429 (i.e. Kummerowia). U+FA20 was nevertheless assigned a normalisation to U+8612, even though the 龜 and 亀 components, while both forms of radical 213, are not usually considered unifiable.[6]
  • The remaining 12 are kokuji that are actually unified ideographs: they have the Unified_Ideograph property and do not change upon normalisation. Despite their placement in the CJK Compatibility Ideographs block and their algorithmically generated character names beginning with "CJK COMPATIBILITY IDEOGRAPH", they do not duplicate any character in the original CJK Unified Ideographs block.[7][8] Eleven of the 12 have no duplicate anywhere in Unicode, while U+FA23 CJK COMPATIBILITY IDEOGRAPH-FA23 was later unintentionally duplicated in CJK Unified Ideographs Extension B as U+27EAF 𧺯 CJK UNIFIED IDEOGRAPH-27EAF. They are as follows:
      • U+FA0E CJK COMPATIBILITY IDEOGRAPH-FA0E
      • U+FA0F CJK COMPATIBILITY IDEOGRAPH-FA0F
      • U+FA11 CJK COMPATIBILITY IDEOGRAPH-FA11
      • U+FA13 CJK COMPATIBILITY IDEOGRAPH-FA13
      • U+FA14 CJK COMPATIBILITY IDEOGRAPH-FA14
      • U+FA1F CJK COMPATIBILITY IDEOGRAPH-FA1F
      • U+FA21 CJK COMPATIBILITY IDEOGRAPH-FA21
      • U+FA23 CJK COMPATIBILITY IDEOGRAPH-FA23
      • U+FA24 CJK COMPATIBILITY IDEOGRAPH-FA24
      • U+FA27 CJK COMPATIBILITY IDEOGRAPH-FA27
      • U+FA28 CJK COMPATIBILITY IDEOGRAPH-FA28
      • U+FA29 CJK COMPATIBILITY IDEOGRAPH-FA29
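Python's `unicodedata` module does not expose the Unified_Ideograph property, but the other criterion given above, that these 12 characters do not change upon normalisation, can be checked with it. The sketch below scans the assigned code points of the block and keeps those that NFC leaves unchanged; `unified_in_block` is a hypothetical helper. Because all 12 characters date from Unicode 1.0.1, the result should not depend on the Unicode version bundled with Python. The last line contrasts this with U+FA20, which normalises to U+8612.

```python
import unicodedata

def unified_in_block(first: int = 0xF900, last: int = 0xFAFF) -> list:
    """Return assigned code points in [first, last] that NFC leaves unchanged."""
    found = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":  # skip unassigned code points
            continue
        if unicodedata.normalize("NFC", ch) == ch:  # no canonical mapping away
            found.append(cp)
    return found

UNIFIED_12 = unified_in_block()
print(len(UNIFIED_12), " ".join(f"U+{cp:04X}" for cp in UNIFIED_12))

# Contrast: U+FA20 is assigned a normalisation to U+8612.
print(unicodedata.normalize("NFC", "\uFA20") == "\u8612")  # True
```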

Block

CJK Compatibility Ideographs[1][2][3]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+F90x to U+FAFx, columns 0 to F]
Notes
1. As of Unicode version 16.0.
2. Grey areas indicate unassigned code points.
3. Yellow areas indicate the 12 unified CJK characters encoded in this block.
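The grey (unassigned) cells of the chart can be located with Python's `unicodedata` module. The sketch below groups code points of general category Cn into contiguous runs; `unassigned_runs` is a hypothetical helper. The result depends on the Unicode version reported by `unicodedata.unidata_version`; for version 6.1 or later it should give the block's 40 reserved code points as two runs, U+FA6E..U+FA6F and U+FADA..U+FAFF.

```python
import unicodedata

def unassigned_runs(first: int, last: int) -> list:
    """Group unassigned (category Cn) code points in [first, last] into (start, end) runs."""
    runs = []
    for cp in range(first, last + 1):
        if unicodedata.category(chr(cp)) != "Cn":
            continue  # assigned code point: not a grey cell
        if runs and runs[-1][1] == cp - 1:
            runs[-1] = (runs[-1][0], cp)  # extend the current run
        else:
            runs.append((cp, cp))  # start a new run
    return runs

print(unicodedata.unidata_version)
for start, end in unassigned_runs(0xF900, 0xFAFF):
    print(f"U+{start:04X}..U+{end:04X}")
```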

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Compatibility Ideographs block:

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "3.5: Private Use Area" (PDF). The Unicode Standard, Version 1.0, Volume 1. Unicode Consortium. 1991. pp. 118–119. ISBN 0-201-56788-1.
  4. "Ideographic Variation Database". Unicode Consortium.
  5. "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
  6. Ideographic Research Group (2024-11-19). "UCS Ideograph Non-Unifiable Component Variations Summary List (NUCV)". UCV & NUCV Lists (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2746.
  7. "PropList.txt". Unicode Consortium.
  8. Freytag, Asmus; McGowan, Rick; Whistler, Ken (2021-06-14). "Known Anomalies in Unicode Character Names". Unicode Consortium. Unicode Technical Note #27. These 12 characters are unified CJK ideographs, not compatibility ideographs, despite their names.