Template talk:Character encodings

Encoding vs. TES

HZ is a transfer encoding syntax (TES; see UTR17) for GB2312, not a character encoding proper. Nor is it a national standard. If it is kept in this template at all, it should be in the misc section.

Similarly, UTF-7 is also a TES, not a UTF (despite the name). So I was thinking of removing UTF-7 from this template. It's included in the "Table Unicode" template, and I think that is enough.

/keka (talk) 08:40, 21 July 2009 (UTC)
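To illustrate the point above about HZ and UTF-7 being transfer syntaxes layered on an existing character set, here is a minimal Python sketch. It assumes the standard-library codecs `gb2312`, `hz` and `utf-7`; the sample strings are arbitrary choices for illustration and do not come from the discussion.

```python
# Sketch: HZ carries GB2312 code points in a 7-bit-safe wrapper.
text = "中文"  # arbitrary sample text (an assumption for illustration)

gb = text.encode("gb2312")  # two bytes per character, high bit set
hz = text.encode("hz")      # same code points between ~{ and ~}

# Drop the ~{ and ~} shift sequences, then restore the high bit.
inner = hz[2:-2]
restored = bytes(b | 0x80 for b in inner)

print(gb.hex(), hz, restored == gb)

# UTF-7 likewise re-expresses Unicode text using ASCII bytes only.
u7 = "é".encode("utf-7")
print(u7, all(b < 0x80 for b in u7))
```

In both cases the character repertoire is unchanged; only the byte-level wrapping differs, which is the distinction the comment draws.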

Grouping

I've tried to group certain encodings in a "logical" way. For instance, even if the GOST standard is/was a national standard, it covers 4-, 5-, and 6-bit character encodings, which are not used in modern computers, so it is among the "misc" items. Likewise, HKSCS is near Big5 and CP950 since they are so closely related, and so on.

keka (talk) 08:59, 25 July 2009 (UTC)

The Big5-HKSCS encoding is not really supported by Windows. Windows 950 should not be considered HKSCS compatible by default. Windows Vista only supports the Unicode characters of Big5-HKSCS. Microsoft HKSCS -- Preceding unsigned comment added by 69.110.13.196 (talk) 04:57, 26 July 2009 (UTC)

newline

UTF-8, read that article please. It is not a "single character" (like horizontal tabulation, backspace, etc.); it is one of the encoding problems related to line separation. Incnis Mrsi (talk) 09:06, 15 March 2010 (UTC)
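A short Python sketch of why "newline" is a family of conventions rather than one character. It relies on the standard `str.splitlines` method, which recognises several line separators; the sample string is an arbitrary illustration, not something from the article under discussion.

```python
# Sketch: several different code sequences all act as a line break.
sample = "a\nb\r\nc\rd\x85e\u2028f"  # LF, CR LF, CR, NEL, LINE SEPARATOR

lines = sample.splitlines()
print(lines)

# The same logical break takes one to three bytes in UTF-8,
# depending on which convention is used.
separators = [("LF", "\n"), ("CR LF", "\r\n"), ("NEL", "\x85"), ("LS", "\u2028")]
for name, sep in separators:
    print(name, sep.encode("utf-8").hex())
```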

Missing codepages

I notice that a few code pages are missing, namely the following:

Code page 708 (Arabic ASMO);

Code page 851 (Greek III);

Code page 853 (Latin III);

Code page 868 (IBM Persian);

Code page 934 (MS-DOS Korean);

Code page 938 (MS-DOS Taiwanese);

Code page 999 (Yugoslavian ASCII-7).

I have the Korean edition of MS-DOS 6.2, which uses code page 934. It and code page 938 are also referenced in the MS-DOS 6.22 COUNTRY.TXT file.

MS-DOS code page 999 seems to be the code page version of the Yugoslavian ASCII-7 character set, which was commonly used, especially in Croatia and Slovenia, before the advent of code page 852. One notable user of it is the software of the Slovenian programming company SAOP.

Code page 708 is referenced in Windows. As for 851, 853, and 868, I've seen specifications of them on Google. - 94.140.73.150 (talk) 16:15, 22 August 2010 (UTC)

1259, 1260, 1262-1269

What are these Windows Codepages? What is CP0028?

Proposed changes

The design of this template is getting more and more complete, but a few things could be done to make it clearer. Here are some suggestions:

Make a clear distinction between “character encoding methods”, “character sets” and “code pages”.
The term “code page” is used mainly by IBM and Microsoft; very few other manufacturers or organizations use it. The so-called “Miscellaneous code pages” are not code pages. Perhaps a better name would be “Miscellaneous character sets”.
EUC, ISO/IEC 2022 and HZ are not character sets. They are encoding methods (schemes) used to encode character sets, namely the JIS, KS X, GB and CNS character sets.
The same goes for all the UTFs, which are encoding schemes for the ISO 10646 character set.
The left column is already arranged by platform. That could be expanded, and some character sets included in the “Platform specific” section could be moved to the “right” place:
1. Adobe: Adobe Standard, Adobe Latin 1, Adobe Symbols, etc.
2. DEC: DEC Multinational, DEC Turkish, DEC Greek, DEC Cyrillic, DEC Hebrew, DEC/8/ASMO, DEC Technical, DEC Kanji, DEC Korean, DEC Hanzi, DEC Hanyu, etc.
3. Data General: Data General International, Data General Turkish, Data General Arabic, Data General Kana, Data General Symbols, etc.
4. Hewlett-Packard: HP Roman-8, HP Turkish-8, HP East-8, HP Greek-8, HP Cyrillic-8, HP Hebrew-8, HP Arabic-8, HP Thai-8, HP Japan-15, HP Korea-15, HP PRC-15, HP ROC-15, HP Math-8, etc.
5. LaTeX: T1 (Cork Encoding), T2A, T2B, T2C, T3, T4, T5, etc.
6. ISO: ISO is not a platform in itself, but some platforms (for instance, UNIX) are designed to work following the ISO standards. Also, many character sets that are not specific to any platform are designed following the ISO standards. For the sake of convenience, perhaps we could consider ISO as a “platform”.
“Acorn” is not a character set but rather a manufacturer (as are IBM or Apple). Perhaps, a better name would be “RISC OS character set”.
Is it worthwhile to have an entry called “National standards”? Of course, some Governments or some Official National Bodies have defined their national standards. But, after that, the manufacturers or organizations have implemented them or some variations of them. And in some cases it was the opposite, some Governments or some Official National Bodies have adopted existing standards as their national standard. But that list, as it is, is a mixed bag and rather incomplete. Here is what I have found out so far:

Country	7-bit standard	8-bit standard	Multibyte standard	16-bit standard	Notes
Arab countries	ASMO 449	ASMO 708
Armenia	AST 34.005:1997	AST 34.002:1997			Commonly called ArmSCII; AST 34.002:1997 defines two variants: ArmSCII-8 for ISO environment; ArmSCII-8a for DOS and Macintosh environment.
Bangladesh		BSD 1520:1995 BSD 1520:2000 BSD 1520:2011			BSD 1520:1995 was not approved; BSD 1520:2011 is the same as the Bengali (Unicode block) but assigned to the upper part of an 8-bit character set; commonly called BSCII.
Brazil		NBR-9614:1986 NBR-9614:1991			Commonly called BraSCII.
Canada	CSA Z243.4 1985 alt.11 CSA Z243.4 1985 alt.12				ISO 646-CA.
China	GB 1988 - 1980		GB 2312-80 GB 18030-2000 GB 18030-2005		GB 1988 - 1980 = ISO 646-CN.
Croatia		HRN I.B1.013:1988
Cuba	NC 99-10 - 1981				ISO 646-CU.
Czechoslovakia		ČSN 36 91 03			Nearly identical to ISO Latin-2.
Denmark	DS 2089-1974				Not an official part of ISO 646 series.
Estonia		EVS 8:1993			EVS 8:1993 has defined 3 “tables”: table 3.1 for ISO environment; table 3.2 for EBCDIC; table 3.3 for DOS.
Finland	SFS 4017				ISO 646-FI; identical to Swedish Standard SEN 850200 b.
France	NF Z 62-010 - 1973 NF Z 62-010 - 1982				ISO 646-FR.
Georgia		SSP 18.1:1998			Commonly known as Geostd8; the more popular GeoSCII is not the national standard.
Federal Republic of Germany	DIN 66003				ISO 646-DE.
Greece	ELOT 927	ELOT 928
Hungary	MSZ 77953				ISO 646-HU.
India	IS 13194:1991	IS 13194:1991			IS 13194:1991 defines several character sets: EA-ISCII for 7-bit environment; ISCII for ISO environment; PC-ISCII for DOS.
International	ISO 646-1973 IRV			ISO 10646
Iran	ISIRI 2900	ISIRI 3342			ISIRI 2900 is glyph-based; ISIRI 3342 is character-based.
Ireland	IS 433 - 1996				Not an official part of ISO 646 series.
Israel	SI 960	SI 1311:1988 SI 1311:1998 SI 1311:2002			The International Register number went on changing (IR 138 >> IR 198 >> IR 234) as the Standards Institute of Israel went on updating the character set, but ISO kept the name as ISO 8859-8.
Italy	UNI 0204 - 1970				ISO 646-IT.
Japan	JIS C 6220-1969 JIS C 6220-1976		JIS C 6226-1978 JIS C 6226-1983 JIS X 0208:1990 JIS X 0212:1990 JIS X 0213:2000 JIS X 0213:2004		JIS C 6220 (Roman version, not Katakana version) = ISO 646-JP.
Kazakhstan		ST RK 920:91 ST RK 1048:2002			ST RK 920:91 is for DOS; ST RK 1048:2002 is for Windows.
North Korea			KPS 9566-97
South Korea	KS C 5636 KS X 1003 - 1989		KSC 5601-1987 KS C 5601-1992		KS C 5636 is not an official part of ISO 646 series.
Latvia		RST 1040-90 LVS 8-92			RST 1040-90 is commonly known as Code Page 866-Latvian.
Lithuania		RST 1093-89 RST 1095-89 LST 1282:1993 LST 1283:1993 LST 1284:1993 LST 1590-1 LST 1590-2 LST 1590-3
Malta	?¹	MSA ISO 8859-3?²			¹ There is a character set commonly referred to as ISO 646-MT (not an official part of the ISO 646 series), but I don’t know if it has been defined as a Maltese official standard; ² The MSA has included all the ISO 8859 series among their standards; however, I haven’t seen any document saying specifically that MSA ISO 8859-3 is the national standard.
Norway	NS 4551-1 NS 4551-2				ISO 646-NO.
Poland	BN-74/3101-01	PN-T-42118:1993			BN-74/3101-01 is not an official part of ISO 646 series.
Romania		SR 14111:1998
Soviet Union	GOST 13052-74	GOST 19768-74 GOST 19768-87			GOST 13052-74 is commonly known as KOI-7; GOST 19768-74 is commonly known as KOI-8; check whether they have been superseded as Russian standards.
Sri Lanka		SLS 1134:1990 SLS 1134:1996 SLS 1134:2004			SLS 1134:1990 was not approved; SLS 1134:2004 is the same as the Sinhala (Unicode block) but assigned to the upper part of an 8-bit character set; commonly called SlaSCII.
Sweden	SEN 850200 b SEN 850200 c				ISO 646-SE. SEN 850200 b is identical to Finnish Standard SFS 4017.
Taiwan	CNS 5205-1996		CNS 11643-1992		CNS 5205-1996 is not an official part of the ISO 646 series; the more popular Big5 is not the national standard.
Thailand		TIS 620-2529 TIS 620-2533
Turkey		TS-5881:1988
United States	ANSI X3.4 - 1968				Commonly called ASCII; ISO 646-US.
United Kingdom	BSI 4730				ISO 646-GB.
Vietnam		TCVN 5712-1:1993 TCVN 5712-2:1993 TCVN 5712-3:1993	TCVN 6056:1995	TCVN 6909:2001	TCVN 5712 is also referred to as VSCII; the more popular VISCII is not the national standard; TCVN 6056 is for the Chữ Nôm script.
Yugoslavia	JUS I.B1.002 JUS I.B1.003 JUS I.B1.004	JUS I.B1.013			In Croatia, JUS I.B1.013 was superseded by the HRN I.B1.013:1988 standard; check whether these standards were followed in the other countries of former Yugoslavia; JUS I.B1.002 = ISO 646-YU.

As can be seen, putting all the national standards in the template would be cumbersome. Perhaps it would be better if, in each article about a character set, we put the clear statement “It is the national standard of (country), called (name or code).”

I would like to hear some feedback before making any changes.

Code Page Guy (talk) 16:39, 4 March 2017 (UTC)
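The distinction drawn above between a character set and the schemes that encode it can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard-library codecs `euc_jp`, `iso2022_jp`, `shift_jis` and the UTF codecs, and the sample character is an arbitrary choice.

```python
# Sketch: one coded character, several encoding schemes.
ch = "あ"  # arbitrary sample character (an assumption for illustration)

# The same JIS X 0208 character under three different encoding schemes.
jis = {codec: ch.encode(codec) for codec in ("euc_jp", "iso2022_jp", "shift_jis")}
for codec, data in jis.items():
    print(codec, data.hex(" "))

# The same ISO 10646 code point under three UTF encoding schemes.
utf = {codec: ch.encode(codec) for codec in ("utf-8", "utf-16-be", "utf-32-be")}
for codec, data in utf.items():
    print(codec, data.hex(" "))
```

The byte sequences differ in every row, yet each group refers to one and the same entry in its character set, which is why the schemes and the sets might be listed separately.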

Update Apple 1 link & more

Please update the Apple 1 link to point to Apple_I#External_links.

The article the link points to now has been deleted. -- Preceding unsigned comment added by 84.82.12.118 (talk) 21:35, 30 November 2019 (UTC)

There's a draft of the Apple III character set at Draft:Apple III character set but it will never survive by itself. Consider merging all old Apple sets into one article, source it well, and write up some of the history about them, otherwise it will all just get deleted and you might as well remove them from the infobox now.

The Amstrad link should probably point to Amstrad CP/M Plus character set.

The Apple Sabine link should be removed and that article should be deleted.

The only reference to Elwro Junior is here: List of ZX Spectrum clones#Elwro_800_Junior. Currently the link points to an article about Polish spelling. I'm actually not sure if the Elwro Junior has its own character set; it may just be the same as the ZX Spectrum's character set.

The Mattel Aquarius character set article will not survive on its own; I recommend merging it into the Aquarius article.

The Minitel character set article has been deleted. Either remove it from the infobox, or put the character set in the Minitel article.

The OricSCII article has also been deleted. Put the character set in Oric or remove it from the infobox.

The Sega SC-3000 character set article should probably be deleted. Games at the time tended to use sprites and tiles and the meaning / appearance of a given code would be determined by whatever was in sprite ROM.

The Teletext character set will probably get deleted soon, as will Videotex character set.

Semi-protected edit request on 9 July 2020


The leading word "IBM" and the trailing word "emulations" should not be in this list. These terms don't make any sense next to the words Apple, Adobe, etc. Following are the lines to change - just remove IBM and emulations from each:

IBM Apple Macintosh emulations
IBM Adobe emulations
IBM DEC emulations
IBM HP emulations

66.210.61.254 (talk) 14:41, 9 July 2020 (UTC)

Not done: please provide reliable sources that support the change you want to be made. Eggishorn (talk) (contrib) 17:00, 9 July 2020 (UTC)

I don't know of sources, I'm sorry, for the things to be changed are plain: the term "IBM" doesn't precede Apple - why would it? The term "emulations" doesn't follow Apple, why would it? Are you aware of the character sets used in those machines? They aren't emulations of any IBM anything. The terms are unfortunately free of meaning. I didn't know this would be an unusual request. Sorry to have bothered you. -- Preceding unsigned comment added by 66.210.61.254 (talk) 17:05, 9 July 2020 (UTC)

The phrase "IBM Apple Macintosh emulations" means emulations of Apple Macintosh, as used by IBM; it does not mean emulations of IBM.

The Apple encodings are listed by their actual names under the MacOS code pages ("scripts") heading already. The IBM Apple Macintosh emulations heading is listing the code page numbers assigned by IBM to the Apple encodings, e.g. Mac OS Roman is numbered 1275 by IBM (see [1]). These numbers are only used by IBM or by things associated with IBM (e.g. software running under IBM products, or possibly ICU, which started off as an IBM project): for example, Microsoft assigns the same encoding (Mac OS Roman) the completely different code page number 10000 (see [2]; I'm not entirely sure why these are not also listed).

-- HarJIT (talk) 17:57, 9 July 2020 (UTC)[reply]
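The vendor-specific numbering described above can be pictured as a lookup table in which different (vendor, number) pairs resolve to the same encoding. The following Python sketch is hypothetical: the table `VENDOR_CODE_PAGES` and the helper `resolve` are illustrative names, only the two numbers cited in the comment are included, and `mac_roman` is Python's standard-library codec name for Mac OS Roman.

```python
import codecs

# Hypothetical table: vendor-assigned numbers for one and the same encoding.
# Only the two assignments mentioned in the comment above are listed.
VENDOR_CODE_PAGES = {
    ("IBM", 1275): "mac_roman",
    ("Microsoft", 10000): "mac_roman",
}

def resolve(vendor: str, number: int) -> codecs.CodecInfo:
    """Return the codec registered for a vendor's code page number."""
    return codecs.lookup(VENDOR_CODE_PAGES[(vendor, number)])

ibm = resolve("IBM", 1275)
ms = resolve("Microsoft", 10000)
print(ibm.name, ms.name, ibm.name == ms.name)
```

A table keyed on the pair rather than on the bare number avoids implying that a number means the same thing across vendors.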

Semi-protected edit request on 19 April 2022


The "Symbol" link in the Platform Specific section links to a general Symbol page. Shouldn't it be linked to Symbol_(typeface) instead? 68.9.24.237 (talk) 08:28, 19 April 2022 (UTC)

Done ScottishFinnishRadish (talk) 11:11, 19 April 2022 (UTC)