Talk:EBCDIC/Archive 1


EBCD

The article on the IBM 5100 says EBCD was used in the IBM 2741. --Gbleem 05:30, 1 April 2006 (UTC)

No, the 2741 used a 6-bit character code (plus shift-up and shift-down control codes). The communications controller (an IBM 270x-series device) interpreted the up/down shift codes and converted them into a seventh bit to store in memory, and generated up-down shift codes from the seventh bit on output. This seven-bit code was generally converted to/from EBCDIC by software. The 6-bit code was an encoding of the Selectric's typeball tilt amount (2 bits) and rotation (4 bits). In models where an ordinary office typeball was used, the resulting character code was called "correspondence code". Other models used a code closer to computational BCD, achieved mostly by using a typeball with the characters arranged differently. (This latter form required more logic circuitry to translate keyboard inputs appropriately.) 209.245.22.27 (talk) 18:07, 28 November 2007 (UTC)

Addition Request

The ISO/IEC 8859 article has a nice table showing the various parts. It would be nice to have a similar table showing the EBCDIC variants. One could then see at a glance where they were the same and where they were different. Such a table should have (at a minimum) CCSIDs 037, 285, and 500.

Agreed. The most common code page used in the U.S. was 037, but this has been replaced in recent years by 1047 (at least on S/390 systems running Linux). — Loadmaster 23:28, 13 November 2006 (UTC)

Query

Is 5A not the exclamation mark? —Preceding unsigned comment added by 194.81.254.10 (talk) 02:01, 2 November 2007 (UTC)

It seems that the code page table was wrong. Row 8, with the letters a-i, needed to be moved one column to the left, that is, a=81, b=82 and so on. —Preceding unsigned comment added by 88.131.23.18 (talk) 16:53, 17 December 2007 (UTC). Fixed. JoeBackward (talk) 03:31, 9 January 2008 (UTC)

Support

I would guess that the word "support", as in "the computer supports EBCDIC", was originally marketspeak. It implies that the use of EBCDIC is a desirable option instead of a requirement. --Gbleem 22:15, 31 August 2006 (UTC)

The IBM S/360 had an "ASCII/EBCDIC" bit in the program status word "register", supposedly to control what zone nibbles were created by the zoned-decimal conversion opcodes. I think the theory was that instead of generating "F0 F0" (which is EBCDIC "00"), it would generate "30 30" (which is ASCII "00"). This control bit was removed in later versions of the hardware. — Loadmaster 23:26, 13 November 2006 (UTC)
Actually, the S/360 would generate zone values of "50 50" for ASCII zeroes, because IBM assumed (was hoping) that the industry would accept extending 7-bit ASCII to 8 bits by shifting the first 3 bits to the left and inserting a duplicate of the high-order bit into the fourth bit position (from the left). The logic of packed-decimal arithmetic in some instructions, such as "Edit", depended on the notion that the "sign digit" would have a value that fell beyond the 0-9 range. (A full discussion of this might be interesting content for Wikipedia, but not in the EBCDIC article.) In the end, IBM apparently decided that the best way to support ASCII was to use EBCDIC internally and then convert character data to ASCII by use of the TR (Translate) instruction. The bit in the PSW (Program Status Word) assigned to specifying "ASCII" mode was re-assigned in S/370 to control "Extended Control" mode. This was safe because IBM never created an operating system that set the ASCII bit to 1, and setting the bit could only be done by privileged (i.e. OS) code. -- RPH 12:29, 27 June 2007 (UTC)
Correction: Actually IBM's concept of an 8-bit version of ASCII (or USASCII, as it was known later in the life of the System/360) was more complex, as described in the System/360 Principles of Operation. IBM had proposed an 8-bit extension of (US)ASCII by applying a mapping transform in which the three high-order bits of the byte were taken from the first two bits of the ASCII character code, followed by the high-order bit, repeated. This had the effect of "stretching" the ASCII code points across the 0-255 range. For example, the numeric values would be mapped to hex 50-59 instead of 30-39. IBM apparently hoped that this arrangement would be accepted by the committee, because it would avoid architectural problems with the implementation of packed-decimal instructions. For example, the "Edit" (ED) and "Edit-and-Mark" (EDMK) instructions used character values of 20 and 21 as "digit select" and "significance start" characters, but that wouldn't work properly if the space were still mapped to the hex 20 code point. Under IBM's re-mapping, the value of a Space character would be hex 40 (the same as in EBCDIC). Since the standards committee never agreed to IBM's 8-bit mapping, IBM dropped the "ASCII-mode" bit in the Program Status Word in the following generation of processors, replacing the bit with one that indicated "extended control mode". ASCII would be supported by using the Translate instruction upon input and output. This information would be an interesting historical background for both EBCDIC and the System/360, although probably in a separate article. RPH 20:40, 10 September 2007 (UTC) (reedit: RPH 21:30, 23 October 2007 (UTC)) There is a discussion of the ASCII-bit in Note 2 of the IBM System/360 Wikipedia article and its support of the proposed "Decimal ASCII" 80-column punched card that was rejected by the user community.
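
To make the TR-based conversion mentioned above concrete, here is a minimal C sketch of the same table-driven technique; the table is only a hand-filled fragment (space, digits, and A-C from code page 037) for illustration, not a complete or authoritative mapping.

    #include <stdio.h>
    #include <stddef.h>

    /* Table-driven translation, as the S/360 TR instruction does in
     * hardware: each input byte indexes a 256-byte table. Only a few
     * entries are filled in here; a real table defines all 256. */
    static unsigned char ebcdic_to_ascii[256];

    static void init_table(void) {
        ebcdic_to_ascii[0x40] = ' ';                    /* EBCDIC space */
        for (int i = 0; i < 10; i++)                    /* '0'..'9' = 0xF0..0xF9 */
            ebcdic_to_ascii[0xF0 + i] = (unsigned char)('0' + i);
        ebcdic_to_ascii[0xC1] = 'A';                    /* 'A'..'C' = 0xC1..0xC3 */
        ebcdic_to_ascii[0xC2] = 'B';
        ebcdic_to_ascii[0xC3] = 'C';
    }

    /* The core of TR: translate a buffer in place through the table. */
    static void translate(unsigned char *buf, size_t len,
                          const unsigned char *table) {
        for (size_t i = 0; i < len; i++)
            buf[i] = table[buf[i]];
    }

    int main(void) {
        unsigned char rec[] = { 0xC1, 0xC2, 0xC3, 0x40, 0xF1, 0xF2 };
        init_table();
        translate(rec, sizeof rec, ebcdic_to_ascii);
        printf("%.*s\n", (int)sizeof rec, (char *)rec);  /* "ABC 12" */
        return 0;
    }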

Pronunciation?

How is EBCDIC pronounced? Eb-ka-dic? --Dgies 18:10, 3 November 2006 (UTC)

The jargon file gives "eb-see-dic" together with two less euphonic variants; but I'd really like an "official pronunciation" added into the article. --tyomitch 03:49, 7 November 2006 (UTC)
Most mainframe programmers I've heard (in the U.S.) pronounce it "eb'-se-dik". — Loadmaster 23:23, 13 November 2006 (UTC)
I've heard "eb-see-dic" a lot too. --Memming (talk) 21:27, 5 May 2008 (UTC)
Yes, "eb-see-dic" or "eb-sa-dic" is what most people say. 210.48.101.106 (talk) 22:07, 17 May 2009 (UTC)
I tried to add this and got reverted. Oh well 24.223.133.37 (talk) 13:57, 28 June 2009 (UTC)
It has been a while since I've had to work with it, but it was always pronounced "eb-sa-dic" then. Surv1v4l1st (Talk|Contribs) 04:27, 30 November 2009 (UTC)
When I worked for IBM Havant (UK) in the early '70s, it was Eb-ka-dic; but of course, it may have changed since then. Or then again, it may be a US/UK thing. Paul Magnussen (talk) 01:50, 1 December 2016 (UTC)
It was "EB-se-dik" in Australia in the 1980s, at least among the Burroughs people.210.84.12.126 (talk) 11:20, 2 December 2016 (UTC)

Relation to Hollerith Code

I have read that EBCDIC is a descendant of Hollerith Code (e.g. http://www.columbia.edu/acis/history/census-tabulator.html). Unless this is not accurate, it should be mentioned. (Even if it is false, that should be mentioned, since it is out there.) —überRegenbogen 12:26, 24 February 2007 (UTC)

I added a mention of "Extended Hollerith" as the card code that corresponds to EBCDIC in the S/360+ systems. Much more could be said on this topic, such as including a code chart that demonstrates the logical nature of the mapping between EBCDIC and the extended card code. Such a chart is found in the IBM S/360 Principles of Operation manual and many other publications from that period. Actually, such a chart, showing EBCDIC in its original form, would be more instructive than the somewhat disingenuous inclusion of one of the National Language Support extensions to EBCDIC, apparently to make the point that EBCDIC is a chaotic mess, even though extensions to ASCII for this purpose have had essentially the same effect on that code as well. -- RPH 12:42, 27 June 2007 (UTC)
I removed the international characters from the chart, leaving only the common EBCDIC characters. I also shaded the invariant code points, which represent the same characters in all EBCDIC variants. (Corrections welcome, of course.) — Loadmaster (talk) 19:14, 12 April 2008 (UTC)

Usage of EBCDIC

All IBM mainframe peripherals and operating systems (except Linux on zSeries or iSeries) use EBCDIC as their inherent encoding but software can translate to and from other encodings.

At exactly which places is EBCDIC used within IBM products? I can only think of EBCDIC being used as a text file encoding, but with Unicode even that usage is obsolete. --Abdull 09:51, 7 June 2007 (UTC)

As far as I know, every IBM mainframe still uses EBCDIC as its primary character set. Which means that that every mainframe disk file (dataset), data storage tape, or CD contains text data in EBCDIC form. Can you name any IBM mainframe systems that actually use Unicode? — Loadmaster (talk) 19:19, 12 April 2008 (UTC)
It is still used in financial transaction processing for debit networks. --Mirell (talk) 15:30, 14 April 2008 (UTC)
WebSphere, which is a fairly standard sort of Web server, of course uses UTF-8 extensively. Db2 (an RDBMS) uses UTF-8 internally for nearly everything other than raw table data. There is a component of the operating system called "Unicode Services" which is an "everything to everything" translation engine and of course supports the eponymous Unicode. The fact that IBM called it that rather than "Character Translation Services" would seem to indicate their mental commitment to Unicode. Charlesm20 (talk) 15:37, 17 June 2019 (UTC)

Redundant

Under the "Criticism and humor" section, isn't "Another popular complaint is that the EBCDIC alphabetic characters follow an archaic punch card encoding rather than a linear ordering like ASCII. " equivalent to the snippet from esr: "...such delights as non-contiguous letter sequences..."? --WayneMokane 22:36, 18 October 2007 (UTC)

EBCDIC niceties?

The example given, "while in EBCDIC there is one bit which indicates upper or lower case", is not valid, since the same applies to ASCII: the third most significant bit signifies lowercase. Anyone have a replacement, or is EBCDIC without niceties? --Luke-Jr (talk) 07:43, 23 March 2008 (UTC)

Yes. In its original form, that is, as an extension of Binary Coded Decimal Interchange Code, a six-bit IBM code, the addition of two high-order bits allowed the characters to be unfolded into four groups, or "quadrants", numbered 0 to 3. Quadrant 0 contained control codes, generally only used for terminals. Quadrant 1 contained the space and all "special" characters (punctuation marks and symbols). Quadrant 2 was for lower-case letters (rarely used in the 1960s, and not part of the original BCD code). Quadrant 3 contained the capital letters and numeric characters, in that order. Overall, this mapping put the characters into a good sorting order, while at the same time simplifying the logic circuits that performed the translation of the old 6-bit BCD into EBCDIC, into quadrants 1 and 3. Since BCDIC had no control characters or lower-case letters, this was done by setting the leftmost bit according to whether the character was alpha or numeric, with the second bit always 1. Early peripherals for the 360, such as the 1403 and 1443 printers, carried over from the previous generation of systems, worked without modification using the last 6 bits of the character code, although an extra-cost feature, called UCS (Universal Character Set), permitted use of the full 8 bits, to support lower-case and other characters. The old 7-track tapes (6 bits plus parity) written on earlier systems could be read and translated into EBCDIC by the tape control unit electronics. Most installations had mostly 9-track drives, and one or two 7-track drives for tape compatibility with the older IBM systems, which continued to be used alongside the 360s until they were phased out. So, EBCDIC was meant to be a transitional code, but conversion to ASCII turned out to be tougher than anticipated, made more difficult when IBM's proposal for an 8-bit mapping of ASCII was rejected by the standards committee, in which IBM had little voting power. This made the ASCII-mode bit in the 360's Program Status Word essentially useless. The ASCII mode was dropped in the System/370, and the bit became the "extended control mode" bit, enabling new 370 features such as virtual memory. Since the bit was always set to 0 by older operating systems, it became a handy way of enabling compatible operation of 360-based operating system code. RPH (talk) 14:49, 12 April 2008 (UTC)
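A compact restatement of that quadrant layout as a hedged C sketch (an illustration only; real EBCDIC leaves many code points in each quadrant unassigned):

    /* The quadrant is simply the top two bits of the EBCDIC byte:
     * 0 = control codes, 1 = space and specials,
     * 2 = lower-case letters, 3 = capitals and digits. */
    int ebcdic_quadrant(unsigned char c) {
        return c >> 6;   /* 0x05 (HT) -> 0, 0x40 (space) -> 1,
                            0x81 ('a') -> 2, 0xC1 ('A') -> 3 */
    }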
The "nicety" of turning one bit on or off to change case in character codes is nice for building case-insensitive sort keys for straight keyboard text data sets: you OR an ASCII character value with $20 to make text chars all lowercase or OR an EBCDIC character value with $40 to make text chars all uppercase. ORing figure (numeral, digit) values with $20 in ASCII ($30-$39 for figures 0-9) or $40 in EBCDIC ($F0-$F9 for figures 0-9) does not change the values of the figure codes. EBCDIC and ASCII are about as easy to work with as far as uppercase, lowercase and figures are concerned, but EBCDIC by grouping the common keyboard punctuation in a "quadrant" of $40-$7F was slightly easier to work with. (Programming other than sort/merge, the splits in the EBCDIC A-I J-R and S-Z code assignments dictated by conformity to the Hollerith punch card code makes "bumping" through the alphabet by incrementing the char code value more complicated in EBCDIC compared to ASCII.) In both EBCDIC and ASCII, typesetting systems map additional characters (accented letter, small caps, old style figures, inferior figures, superior figures, symbols) into unused corners of the $00-$FF binary matrix. The RCA GSD extended EBCDIC for PAGE-1 and the Postscript font ASCII assignments (to name two I have had to work with) require a translate table as the most effecient way to build a case insensitive sort key. At that remap stage, neither EBCDIC nor ASCII are nice and both become "necessary evils". Naaman Brown (talk) 13:34, 15 May 2009 (UTC)

Sort Merge EBCDIC Order versus ASCII

Sorting keyboarded text in EBCDIC you group (low to high):
control code values $00-$3F,
punctuation $40-$7F,
lowercase letters $81-$A9,
uppercase letters $C1-$E9,
numbers $F0-$F9;
sorting keyboarded text in ASCII you group (low to high):
control code values $00-$1F,
some punctuation $20-$2F,
numbers $30-$39,
some more punctuation $3A-$40,
uppercase letters $41-$5A,
some more punctuation $5B-$60,
lowercase letters $61-$7A,
some more punctuation $7B-$7F.
The superiority of EBCDIC over ASCII in sort/merge applications should be obvious. And punch cards are immune to EMP. Naaman Brown (talk) 22:49, 14 May 2009 (UTC)
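A minimal C illustration of those two orderings, comparing raw byte values for one letter of each case and one digit:

    #include <assert.h>

    int main(void) {
        /* EBCDIC: lower case < upper case < digits */
        assert(0x81 < 0xC1 && 0xC1 < 0xF0);   /* 'a' < 'A' < '0' */
        /* ASCII: digits < upper case < lower case, the reverse */
        assert('0' < 'A' && 'A' < 'a');       /* 0x30 < 0x41 < 0x61 */
        return 0;
    }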
Perhaps, but you should add a citation to back up your claims of "superiority". Otherwise it sounds too much like opinion. And besides, you're forgetting the punctuation and international characters codes interspersed among the English alphabetic characters, which vary from codepage to codepage. — Loadmaster (talk) 22:49, 19 May 2009 (UTC)
See the heading Criticism and Humor. My comment "punch cards are immune to electromagnetic pulse" was meant to be a clue. Seriously though the sort order advantage -- common punctuation, lowercase, uppercase, digits -- is minor, but ASCII having common punctuation between uppercase and lowercase was sometimes annoying. In sorting composition files for various projects at Kingsport Press, I usually had to build translated sort fields whether sorting EBCDIC or ASCII files because of the different mappings for accented letters: they both gave me problems. It is like selecting which is your least favorite of slimy green vegetables. Naaman Brown (talk) 04:05, 18 September 2009 (UTC)
Sorry, too much caffeine was consumed that day. I'm of the opinion that there is no widely used character set that provides a "correct" (or at least "what a reasonable person would expect") character collation sequence. — Loadmaster (talk) 23:21, 12 February 2010 (UTC)

IBM ASCII support citation request

Can someone find a citation to the following paragraph?

Interestingly, IBM was a chief proponent of the ASCII standardization committee. However, IBM did not have time to prepare ASCII peripherals (such as card punch machines) to ship with its System/360 computers, so the company settled on EBCDIC at the time. The System/360 became wildly successful, and thus so did EBCDIC.

How about Wiki itself?

The article on Bob Bemer mentions that he is commonly called the Father of ASCII, and he was an IBM employee at the time, and for the next quarter-century. And IBM sent him to the ASCII meetings, and paid his way, and allowed him to do his extensive work on ASCII during company time. —Preceding unsigned comment added by T-bonham (talkcontribs) 06:35, 12 March 2009 (UTC)

What confuses me is that it implies that EBCDIC was developed because IBM didn't have time to implement ASCII. If they had enough time to develop and implement EBCDIC, then surely they could have saved time by just implementing ASCII. Was it perhaps the case that IBM foresaw that the final ASCII spec would not be ready in time? — User:ACupOfCoffee@ 18:42, 16 November 2011 (UTC)

Spreadsheet format

Many EBCDIC files were tables of data without separating characters. For example, a row ends after every 100 characters, and each row is broken up into 20 five-character fields.

EBCDIC is a character encoding of course but what's the name of this table format? —Preceding unsigned comment added by 210.48.101.106 (talk) 06:24, 4 June 2009 (UTC)

It's not really a format; it's known as fixed-length records, in which a file holds zero or more records of a fixed size with no delimiting characters between them. The format itself is usually defined by some property in the system or through an accompanying file. In a lot of cases there is no file: what you know as a file is just an extract of part of a database, and nothing about this phenomenon is specific to EBCDIC. Jeffz1 (talk) 08:44, 4 June 2009 (UTC)
See Record-oriented filesystem, which explains fixed-length record files. — Loadmaster (talk) 15:49, 4 June 2009 (UTC)
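For the layout described in the question (100-byte records split into 20 five-byte fields), a minimal C sketch of reading such a file; the file name is hypothetical, and each field would still need EBCDIC-to-ASCII translation on a non-mainframe system.

    #include <stdio.h>
    #include <string.h>

    #define RECLEN 100   /* fixed record length from the example */
    #define FLDLEN 5     /* 20 fields of 5 characters each */

    int main(void) {
        unsigned char rec[RECLEN];
        FILE *f = fopen("data.ebc", "rb");   /* hypothetical name */
        if (!f) return 1;
        /* No delimiters: each fread() picks up exactly one record. */
        while (fread(rec, 1, RECLEN, f) == RECLEN) {
            for (int i = 0; i < RECLEN / FLDLEN; i++) {
                char field[FLDLEN + 1];
                memcpy(field, rec + i * FLDLEN, FLDLEN);
                field[FLDLEN] = '\0';
                /* ...translate from EBCDIC and process the field... */
            }
        }
        fclose(f);
        return 0;
    }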

Tables don't correspond

The table in the article is said to be "derived from" CCSID 500. The wording implies that all the characters shown are identical to CCSID 500, and the only difference is the omission of certain characters that aren't "basic English". In fact, though, the table differs from that given at EBCDIC 500 in several positions; for example, 4F, 5A, BA and BB (I don't know if that's a complete list). I think the reason for this, and the meaning of "derived from", should be explained better. 86.161.40.166 (talk) 20:32, 25 November 2009 (UTC).

It's not derived from CCSID 500. It's derived from CCSID 037, and I've updated the article to say that. Guy Harris (talk) 03:12, 25 July 2015 (UTC)

EBCDIC vs. ASCII

Rather than start an edit war, I'm going to ask for comments.

I changed this:

EBCDIC has no modern technical advantage over ASCII-based code pages such as the ISO-8859 series or Unicode. There are some technical niceties in each, e.g., ASCII and EBCDIC both have one bit which indicates upper or lower case. But there are some aspects of EBCDIC which make it much less pleasant to work with than ASCII, such as a non-contiguous alphabet.

to this:

EBCDIC has no technical advantage or disadvantage compared to ASCII-based code pages such as the ISO-8859 series or Unicode. There are some technical niceties in each, e.g., ASCII and EBCDIC both have one bit which indicates upper or lower case. Unlike ASCII the EBCDIC alphabet is non-contiguous and is interleaved with some non alphabetic characters. Data portability is hindered by the fact that EBCDIC lower case alphabetic characters are lower in the collating sequence than upper case, and numerics are higher than both— the exact opposite of ASCII.

and had it reverted with the comment:

Nonsense, the non-continguous alphabet is obviously a problem. The order of case & numbers is trivial in comparison)

It seems to me that there are a lot of rabid anti-EBCDICians, but I believe there is no real difference between code pages containing the same characters as long as the alphabetic and numeric characters sort correctly. The contiguity or non-contiguity of the alphabetic characters is immaterial, since code usually uses something like "isalpha" rather than comparing for >='A' and <='Z'. On the other hand, the difference in the way data sorts - I should have said "data interchange" rather than "data portability" - causes a lot of problems porting data between ASCII-based and EBCDIC-based systems, since programs often expect data to be sorted in a particular order.

This is a somewhat longish comment, but, as I said, I feel this article is slanted by people who dislike EBCDIC. What's the consensus? Peter Flass (talk) 12:52, 3 June 2012 (UTC)

I restored your edit because it changed the paragraph so that it presented both sides of the argument very well, and one of the core policies of Wikipedia is neutral point of view. In all my years working with EBCDIC and ASCII, I've never seen the non-contiguous alphabet as a problem, but I would be wrong to remove statements to the contrary from the article. Pages on tech subjects are gonna draw a lot of opinions, but I will oppose anyone who claims their opinion is the only "right" one. There is no such thing as "the one correct opinion"; such is a hallmark of fallacy. — UncleBubba T @ C ) 20:14, 3 June 2012 (UTC)
It seems that someone has changed the article text again, moving around some text and--conveniently--removing the no-technical-advantage wording. I don't know about you folks, but I'm going to insist this be discussed here before the article text is changed, which is--after all--the way things are supposed to work on Wikipedia.
Thoughts, anyone? — UncleBubba T @ C ) 03:24, 13 June 2012 (UTC)
Since the paragraph no longer talks in any way about any "technical advantage or disadvantage", except for the non-contiguous alphabet, which is already mentioned 3 times in the "criticism and humor" section, it seemed best to remove the totally meaningless opening sentence and the only comparison (which goes against EBCDIC and thus conflicts with the "no disadvantage" statement). The paragraph did discuss two unrelated subjects: I18N extensions to EBCDIC, and the fact[citation needed] that sorting order causes more difficulty for interoperating with ASCII than the code point differences. My edit was an attempt to sort this out.
I have to say that I find it pretty shocking and disgusting that comparing inanimate objects is considered NPOV. But if that is the case, it is best to say nothing. Spitzak (talk) 20:00, 13 June 2012 (UTC)

Control codes

What were the control codes for and why are some not in Unicode? — Preceding unsigned comment added by 82.139.81.0 (talk) 21:09, 14 June 2013 (UTC)

Half (32) of the control codes have exact ASCII equivalents. The other half were mostly control codes for IBM 3270 displays and other IBM hardware devices. A few code points were not assigned at all. — Loadmaster (talk) 21:15, 25 September 2013 (UTC)
Looks like they pretty much all have equivalents. IBM redefined "DC1", "DC2", etc., which are device-specific anyway, and maybe used some of the controls ("FS", "GS", "RS") in "unique" ways. Peter Flass (talk) 00:24, 26 September 2013 (UTC)

Criticism and humor

"programming a simple control loop to cycle through only the alphabetic characters is problematic." I think people are reaching for things to criticize EBCDIC for out of an anti-IBM bias. Sure you can't simply add one to 'I' and get 'J', but how "problematic" is it to code a table? No competent programmer would have any problems with this. Peter Flass (talk) 13:30, 19 September 2013 (UTC)

I decided to rewrite this sentence. The sequence of EBCDIC characters was only problematic to programmers used to ASCII and the C-ism 'I' + 1 == 'J'. I hope my version is neutral. Peter Flass (talk) 14:11, 19 September 2013 (UTC)
OK,I wrote this, but had it reverted:

Some programmers accustomed to ASCII were confused that adding one to the binary value of an EBCDIC character might produce an unexpected result

My feeling is that prior to ASCII most or all character sets didn't have contiguous alphabetics. BCD, the most widely-used encoding, was similar to EBCDIC, so EBCDIC would not have been problematic. Comments? Peter Flass (talk) 12:29, 21 September 2013 (UTC)
But for some reason the criticism that it was not standard is not suffixed with "Some programmers accustomed to ASCII were confused that this was not a standard", and the criticism that there were several versions is not suffixed with "Some programmers accustomed to ASCII were confused that there were several versions", so I see no reason to add this sentence either. It also reads like astroturfing (which is really weird, as the argument is 50 years old and will have no impact on IBM's income today). Spitzak (talk) 15:15, 22 September 2013 (UTC)
Reading above, it looks like you claim it is not a problem because "everybody uses isalpha()". This is FALSE. In 1970 or so everybody did "c >= 'A' && c <= 'Z'"; in fact it was a common problem that even lower-case letters failed. In the first C libraries isalpha() was a macro that did this (it was changed to a table-lookup macro to avoid the double evaluation of the argument, and only after much brow-beating, as using 256 bytes for a lookup table was considered a horrible waste of memory; quite often versions failed for bytes with the high bit set, as they used only 128-byte tables, because saving 128 bytes was considered very very important). In any case you can't go fishing for excuses, claiming that both old practice and modern practice somehow make your point, when at the time of the argument it was brain-dead obvious to any programmer which encoding was superior. Spitzak (talk) 15:23, 22 September 2013 (UTC)
Obviously we disagree ;-) I don't think calling something a problem because many programmers were sloppy or lazy is justified. In any case, my point is that ASCII and EBCDIC originated in roughly the same time period, after BCD had been in use for many years. No one going from BCD to EBCDIC would have been confused; only the programmers who had used ASCII first would see anything wrong with it. At any rate, I'll wait and see if anyone has anything to say on the talk page. Peter Flass (talk) 15:39, 22 September 2013 (UTC)
Stating that programmers who used ASCII were confused about the non-linear arrangement of EBCDIC is a retrofitting of history. I would guess that far more programmers moved from EBCDIC to ASCII machines in the 1970s (if they moved away from IBM at all) than those who were accustomed to ASCII (or any of the half-dozen or so other character sets in use during the 1950-1960s) moved to EBCDIC machines. In any case, the idiomatic ⟨c >= 'A' && c <= 'Z'⟩ was an invention of C on ASCII machines. I doubt that any similar idiom existed on any EBCDIC machine at the time. (IIRC, the white book mentions this character set difference.) Obviously, S/360 programmers had some means of testing for alphabetic characters other than a single range comparison. — Loadmaster (talk) 21:46, 24 September 2013 (UTC)
We learned to test whether the char is in range ('a'..'i','j'..'r','s'..'z') or not ... 178.19.213.10 (talk) 11:10, 6 August 2021 (UTC)
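That three-range test, spelled out in C with the EBCDIC code points (lower case shown; upper case follows the same pattern at 0xC1-0xC9, 0xD1-0xD9, and 0xE2-0xE9):

    /* EBCDIC lower-case letters occupy three separate ranges. */
    int ebcdic_islower(unsigned char c) {
        return (c >= 0x81 && c <= 0x89) ||   /* 'a'..'i' */
               (c >= 0x91 && c <= 0x99) ||   /* 'j'..'r' */
               (c >= 0xA2 && c <= 0xA9);     /* 's'..'z' */
    }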
A TRT instruction could give you a lot more information than just alpha/numeric. There were similar instructions on most EBCDIC machines. Peter Flass (talk) 13:54, 6 August 2021 (UTC)

8 bit vs. 16 bit

Comments in this article regarding limitations to character encoding are inaccurate. Modern systems support 16-bit EBCDIC encodings, known as MBCS and DBCS.

From: http://www-01.ibm.com/software/globalization/terminology/d.html#x2001652

double-byte character set (DBCS) A set of characters in which each character is represented by 2 bytes. These character sets are commonly used by national languages, such as Japanese and Chinese, that have more symbols than can be represented by a single byte. See also multibyte character set, single-byte character set. — Preceding unsigned comment added by 71.217.57.224 (talk) 12:07, 19 June 2014 (UTC)

Just as UTF-8 is distinct from ASCII, any new EBCDIC-derived character sets aren't EBCDIC. MBCS and DBCS could be mentioned if their relationship to EBCDIC is documented and sourced (the one provided here doesn't show that). But unless they are more widely used than I think (which would also have to be sourced), they may not be notable enough for an encyclopedia. A D Monroe III (talk) 15:14, 20 June 2014 (UTC)

EBCDIC variants (and mapping between ASCII)

I was just curious if all of ASCII mapped to EBCDIC (and vice versa) and looked up and found: https://support.microsoft.com/en-us/kb/216399

I didn't realize there were many variants (excluding the international "higher 8-bit" characters that would map to variants of EBCDIC).

About "The table below is based on CCSID 500" (doesn't quite match): "used in IBM mainframes" (Not IBM's version according to Microsoft.., but the HP's one.). comp.arch (talk) 19:39, 24 July 2015 (UTC)

As for whether ASCII maps to EBCDIC, the first edition of the System/360 Principles of Operation has an EBCDIC table in Appendix F, and EBCDIC, in that table, lacks, for example, the curly brackets. Then again, so does the table, in the same appendix, for some oddball "ASCII Extended To Eight Bits", so that may have been from the early days of ASCII.
Now there are a whole bunch of IBM flavors of EBCDIC, for various countries (as well as various extended ASCII character sets).
EBCDIC most definitely does not map to ASCII; ASCII has no "cent sign" or "not sign" characters.
As for Microsoft's "helpful" page:
I'm not certain why HP or AT&T get to have their own versions of EBCDIC.
AT&T's only operating system does have a dd command that, among other things, translated between said operating system's native character set, namely ASCII, to EBCDIC, but the original version had no comments explaining where the translation tables came from, nor did the V7 version. The V7 version appears to map '[' to 0x4A. The System III version at least has, mirabile dictu, a comment:
/* This is an ASCII to EBCDIC conversion table
from a proposed BTL standard April 16, 1979
*/
so maybe the "proposed BTL standard" is their own version of EBCDIC. Following that table is an atoibm table, so I guess that's "ASCII to IBM's EBCDIC".
HP's textbook on SPL, the implementation language for MPE, has an ASCII table in it, from which I infer that it was ASCII-based. The manual for the DOS III for the HP 1000 series speaks of ASCII, so I suspect all the HP 21xx/1000-series minicomputers also had ASCII-based OSes. HP's other operating system is based on the aforementioned AT&T operating system, so it's also ASCII-based.
So presumably the "HP" and "AT&T" variants came from some ASCII <-> EBCDIC translation offered by those vendors' OSes. Given that IBM was the creator of EBCDIC, I'd say their version wins, and any differences between it and what other vendors' translation tables generate are bugs in those vendors' translation tables or translation tables derived from older versions of EBCDIC or ASCII.
As for "Code page 500"/CCSID 500, i.e. EBCDIC 500, this version and this version of them, from IBM, are transposed (in the matrix-algebra sense) from the table on the EBCDIC 500 page, but both of them put [ at 4A and ] at 5A, which doesn't match what Microsoft says, so I'd vote for completely ignoring their list of "notable differences" for now. They claim to have gotten that encoding from a manual for the IBM 3780; bitsavers.org doesn't have a manual for the 3780, but it does have a manual for its predecessor, the 2780. That manual doesn't list punched card codes for [ or ] in EBCDIC; it does have codes for them in ASCII, and the "TRANSMISSION BIT PATTERN" for them appears to be the ASCII bit pattern, giving 5B and 5D, respectively. Perhaps the 3780 added EBCDIC support for [ and ], but, if not, perhaps whoever at Microsoft read the 3780 manual misinterpreted those codes as EBCDIC codes. Guy Harris (talk) 21:08, 24 July 2015 (UTC)
Microsoft's "helpful" page also referred to code page 037, and here's one of IBM's tables for that code page. That code page puts [ at 0xBA and ] at 0xBB, so the answer to "what are the code points for [ and ] in EBCDIC?" depends on what you mean by "EBCDIC", i.e. which particular IBM EBCDIC code page/coded character set/etc. Guy Harris (talk) 21:11, 24 July 2015 (UTC)
And the table on EBCDIC is based on EBCDIC 037, not EBCDIC 500; I've fixed the text to say that. As the article says, "Unassigned codes are typically filled with international or region-specific characters in the various EBCDIC code page variants, but the punctuation marks are often moved around as well; only the letters and numbers and space have the same assignments in all EBCDIC code pages." The square brackets, for example, don't have the same code points in 037, 500, and 1047. Guy Harris (talk) 22:34, 24 July 2015 (UTC)
A later edition of the S/360 Principles of Operation also has an EBCDIC table in Appendix F, but it differs in a few places, and is closer to EBCDIC 037 than the older version. The older version differed from 037 in the assignment of the cent-sign character; the newer version is a strict subset of 037. I suspect EBCDIC was revised before the final edition (those changes didn't affect the instruction set, so it's not as if it was a normative change to the Principles of Operation), and that the version in the later edition is the "correct" initial version of EBCDIC.
Of the EBCDIC code pages I could find under ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/ (files with names beginning with "CP" and ending in ".txt", and containing the string "EBCDIC" with a case-insensitive search), 17 give it the encoding 4A (the value in the later Principles of Operation), 33 give it the encoding B0, and one gives it the encoding B1, with no extremely obvious pattern. There's probably a "core EBCDIC", for all the code points that mean the same thing in all versions of EBCDIC (or perhaps just the single-byte versions), but it's even smaller than the EBCDIC in the Principles of Operation table - "cent sign" isn't in it. Guy Harris (talk) 07:40, 25 July 2015 (UTC)
Thanks for all this info, I noticed "Portability is hindered by a lack of many symbols commonly used in programming and in network communications, such as the curly braces." (and how it would be a problem for C..) but then the braces are in the table.. (and in UTF-EBCDIC..). At least in theory, as ASCII is 7-bit, it could map to EBCDIC (at least in a later variant), and that seems to be the case.. I didn't check.. comp.arch (talk) 16:39, 26 July 2015 (UTC)
Curly braces aren't in the A22-6821-7 version. They are, however, in the version in GA22-7000-4, an edition of the System/370 Principles of Operation; however, that version doesn't have square brackets, also used in a number of programming languages.
"At least in a later variant" is the key here. ASCII could fully map to EBCDIC only in versions of EBCDIC that actually fill in 95 of its code points with printable characters from the ASCII repertoire; character encoding A having fewer code points than character encoding B is not sufficient to allow encoding A to map to encoding B - encoding B actually has to use enough of those code points and put all the characters from encoding A there in order for that to work.
It also somewhat helps not to have multiple variants of encoding B that assign different code points to the same character, especially if that character is part of encoding A; unfortunately, that's exactly what EBCDIC has (note that the square brackets, along with some non-ASCII characters such as the cent sign, don't have standard code points in EBCDIC).
On the other hand, the curly braces weren't used in a lot of programming languages, so at least most of the languages initially supported on S/360 worked fine (FORTRAN IV, COBOL, PL/I); ALGOL 60 required (/ and /) to represent square brackets, but it didn't use curly braces. Guy Harris (talk) 18:31, 26 July 2015 (UTC)
I've asked for some clarification (on what "portability" means) and a citation (on the curly braces being "commonly used") for that quotation. They may be commonly used in C and derivatives, but, at the time ASCII and EBCDIC were originally developed, the C programming language didn't exist. Guy Harris (talk) 18:42, 26 July 2015 (UTC)

So the various versions of the Principles of Operation appear to have the letters, digits, space, ampersand, dash/hyphen/minus-sign, slash, period, left parenthesis, plus sign, exclamation point, dollar sign, asterisk, right parenthesis, semicolon, comma, percent sign, number sign (#), at sign, single-quote/apostrophe, and equals sign in the same places.

In all but the early System/360 Principles of Operation, cent sign, vertical bar, not sign, underscore, greater than, less than, colon, and double quote appear to be in the same places.

For the other characters:

  • Appendix F of A22-6821-0, the first edition of the System/360 Principles of Operation, has what appears to be a centered dot at 49, question mark at 4A, left arrow at 4C, something with a vertical line and three horizontal lines intersecting it at 4F, cent sign at 5F, something that might be an acute accent at 69, something that looks like two arcs at 6D, underscore at 6E, plus/minus at 6F, quotation mark (double quote) at 79, colon at 7A, a check mark at 7F, greater than at C0, less than at D0, and something with a vertical line and two horizontal lines intersecting it at E0. There's no obvious publication date on that edition.
  • Appendix F of A22-6821-6, a later edition of the S/360 Principles of Operation, moved the question mark to 6F, the cent sign to 4A, the underscore to 6D, the quotation mark (double quote) to 7F, the colon to 7A, greater than to 6E, and less than to 4C; put a vertical bar at 4F and a not sign at 5F; and got rid of the centered dot, the left arrow, the vertical-line-with-horizontal-bars characters, the two arcs, the plus/minus, and the accent. It dates back to January 1967. A22-6821-7, a still later edition, dating back to December 1967, adds some additional control characters, but no new printable characters.
  • GA22-7000-0, the first edition of the System/370 Principles of Operation, just describes differences between System/360 and System/370, and has no EBCDIC table. GA22-7000-4, a later edition of the S/370 Principles of Operation, is a more complete document, and has an EBCDIC table in Appendix H; it adds a broken vertical bar at 6A, a grave accent at 79, a tilde at A1, a left curly brace at C0, a right curly brace at D0, a backslash at E0, a "hook" at CC, a "fork" at C3, a "chair" at EC, and a "long vertical bar" at FA.
  • Appendix G of GA22-7000-10, a still later edition, removes the hook, the fork, the chair, and the long vertical bar, and adds a "required space" (non-breaking space?) at 41, a "numeric space" (digit-width space?) at E1, and a "soft hyphen" (hyphenation-point indicator?) at CA; those sound like word-processing characters. It speaks of a 94-character set, which "can be used with other systems; those systems may use codes, other than EBCDIC, which have 94 graphic characters", which probably means "ASCII" (although that table shows no square brackets), and which has "all graphic characters shown in the EBCDIC table". There are also an 88-character set, missing the vertical bar, grave accent, tilde, and left and right curly brackets; a 63-character set, also missing the lower-case letters; and a 62-character set, also missing the backslash. They say that 4A (cent sign), 4F (unbroken vertical bar), 5A (exclamation point), 5B (dollar sign), 5F (not sign), 6A (broken vertical bar), 79 (grave accent), 7B (number sign), 7C (at sign), A1 (tilde), C0 (left curly brace), D0 (right curly brace), and E0 (backslash) are defined as "Data Processing National Use Positions", meaning that they may be used for those characters in the U.S. English version of EBCDIC, but might be used for other characters in other national versions. They say that the rest are used for all languages using the Latin alphabet, but that "products designed for data-processing applications in a language which does not use a Latin-based alphabet support character sets meeting the particular requirements of that language", meaning that they could conceivably not even have all the Latin letters present with the same codes. Appendix H of SA22-7200-0, the first edition of the IBM Enterprise Systems Architecture/370 Principles of Operation, says the same thing.

I suspect EBCDIC was still evolving in the early days of S/360, and that the version in A22-6821-0 wasn't final. It became "final", in the sense of "nothing moves", somewhere between that version and A22-6821-6. After that, they added some control characters in A22-6821-7, some more characters between that and GA22-7000-10, and added some word-processingy characters and marked some positions as "national use" after that.

None of those fully support ASCII, as they lack square brackets. I'm not sure when they got added. Guy Harris (talk) 00:00, 27 July 2015 (UTC)

And if we slam the handle on the time machine to "Present", Appendix I of SA22-7832-10, which appears to be the most recent edition of the z/Architecture Principles of Operation, has an EBCDIC table with EBCDIC 037, along with "ISO-8", which means ISO 6429 for control codes plus ISO 8859-1 for graphical characters. Other EBCDIC code pages can, and do, reassign the "Data Processing National Use Positions" as they choose. At least 037 has left and right square brackets at BA and BB, respectively. Guy Harris (talk) 01:02, 27 July 2015 (UTC)
There is clearly much history here! I can understand portability of C not being an issue prior to it being invented! I was only pointing out that the table didn't support the text in the article (for braces, not the letters being non-contiguous). I can go with having a current version, rather than THE first one, of EBCDIC (this one or the most popular variant in the article). I just think the text should then say something like "braces were, at the time, among the code points missing from EBCDIC that were present in ASCII". No need to involve C in it. You get very used to them "always being there", but they've been used prior to C! comp.arch (talk) 13:22, 28 July 2015 (UTC)

If someone believes that the ASCII/EBCDIC conversion tables from the POSIX standard are not OK, please send a note.... check here: [1] Schily (talk) 14:19, 28 July 2015 (UTC)

Until I saw the chart linked to, I had no opinion on the ASCII/EBCDIC conversion tables from the POSIX standard, as I had not looked at them; I'd instead looked at the tables in some historical versions of UNIX that predated POSIX.
Looking at those tables:
  • Neither table looks like the first edition of the System/360 Principles of Operation table, which is a Good Thing as that table appears to be a very early version of EBCDIC changed before the later versions of the S/360 Principles of Operation. For example, question mark maps to 6F in EBCDIC.
  • The "ASCII to EBCDIC" table maps ASCII 0176 = 0x7E to EBCDIC 0137 = 0x5F, and the "ASCII to IBM" table maps ASCII 0136 = 0x5E to EBCDIC 0137 = 0x5F. ASCII 0176 is tilde, and ASCII 0136 is ^; EBCDIC 0x5F is not-sign, so those appear to be attempts at approximate mappings. GA22-7000-4, a later edition of the S/370 Principles of Operation has a real live tilde at 0xA1.
  • Both tables include the OCR characters that are in GA22-7000-4, with the correct code points, so either those tables aren't 100% based on any EBCDIC from IBM or there were intermediate versions from IBM with the OCR characters but not the tilde. The OCR characters disappeared in GA22-7000-10, a still later edition of the System/370 Principles of Operation.
  • Both tables map ASCII square brackets to EBCDIC code points, but there don't appear to have been any square brackets even as of the 1980's edition of the System/370 Principles of Operation.
What the Single UNIX Specification says in the rationale for dd is:
Standard EBCDIC does not have the characters '[' and ']'. The values used in the table are taken from a common print train that does contain them. Other than those characters, the print train values are not filled in, but appear to provide some of the motivation for the historical choice of translations reflected here.
The Standard EBCDIC table provides a 1:1 translation for all 256 bytes.
The IBM EBCDIC table does not provide such a translation. The marked cells in the tables differ in such a way that:
EBCDIC 0112 ( '¢' ) and 0152 (broken pipe) do not appear in the table.
EBCDIC 0137 ( '¬' ) translates to/from ASCII 0236 ( '^' ). In the standard table, EBCDIC 0232 (no graphic) is used.
EBCDIC 0241 ( '˜' ) translates to/from ASCII 0176 ( '˜' ). In the standard table, EBCDIC 0137 ( '¬' ) is used.
0255 ( '[' ) and 0275 ( ']' ) appear twice, once in the same place as for the standard table and once in place of 0112 ( '¢' ) and 0241 ( '˜' ).
In net result:
EBCDIC 0275 ( ']' ) displaced EBCDIC 0241 ( '˜' ) in cell 0345.
That displaced EBCDIC 0137 ( '¬' ) in cell 0176.
That displaced EBCDIC 0232 (no graphic) in cell 0136.
That replaced EBCDIC 0152 (broken pipe) in cell 0313.
EBCDIC 0255 ( '[' ) replaced EBCDIC 0112 ( '¢' ).
This translation, however, reflects historical practice that (ASCII) '˜' and '¬' were often mapped to each other, as were '[' and '¢'; and ']' and (EBCDIC) '˜'.
so this table doesn't reflect any actual standard EBCDIC; it reflects, instead, standard EBCDIC plus an extension that some print train offered on some IBM printer plus some historical practice. The word "historical" suggests that these tables aren't supposed to reflect Real EBCDIC, but are, instead, supposed to reflect Real Historical UNIX dd Command Behavior, and they probably wouldn't want to change them to reflect Real EBCDIC.
So don't use conv=ascii, conv=ebcdic, or conv=ibm, on any UN*X, SUS-compliant or not, to do anything other than conversion according to the tables; in particular, don't assume it'll do the right thing with Real EBCDIC. If you want to translate between ASCII and EBCDIC on UN*X, either use the International Components for Unicode if you have them on your system, with the appropriate EBCDIC code page, or pick the appropriate code page and do the translation yourself. Guy Harris (talk) 21:30, 28 July 2015 (UTC)
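As one concrete way to "pick the appropriate code page and do the translation yourself" on a UN*X system, here is a hedged C sketch using iconv(3); "IBM037" is the converter name glibc accepts, and other platforms may spell it differently (check iconv -l).

    #include <iconv.h>
    #include <stdio.h>

    int main(void) {
        unsigned char ebcdic[] = { 0xC8, 0x85, 0x93, 0x93, 0x96 }; /* "Hello" in 037 */
        char out[32];
        char *in_p = (char *)ebcdic, *out_p = out;
        size_t in_left = sizeof ebcdic, out_left = sizeof out;

        iconv_t cd = iconv_open("ASCII", "IBM037");  /* to, from */
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }
        if (iconv(cd, &in_p, &in_left, &out_p, &out_left) == (size_t)-1) {
            perror("iconv");
            return 1;
        }
        iconv_close(cd);
        printf("%.*s\n", (int)(out_p - out), out);   /* prints "Hello" */
        return 0;
    }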
This is a lot of information.... The AT&T and POSIX tables are based on an important rule: they are invertible, and dd conv=ascii | dd conv=ebcdic operates as a no-op. The absence of square brackets may be the reason for the Pascal (. .) surrogate, but I cannot remember a machine that did not support square brackets in the 1970s, so it seems that this addition was rather common. Schily (talk) 10:05, 30 July 2015 (UTC)
IBM Code Page 00037 can be exactly mapped to and from ISO/IEC 8859-1 Latin-1, an 8-bit superset of ASCII. The only left-over codes are control characters which have no direct ASCII equivalents, and which can be arbitrarily assigned to unused 8-bit (non-ASCII) codes. — Loadmaster (talk) 23:01, 30 July 2015 (UTC)

EBCDIC was rational in its time

When Mr. Raymond speaks of "early" hackers and programmers hating EBCDIC, he was not thinking early enough. That makes both the criticism and the humor ring a little hollow. There were good reasons for EBCDIC circa 1964.

When IBM broke with the then-universal 6-bit character coding and invented the "byte" circa 1964, cheap ROM technology did not exist. That levied two requirements. It was absolutely necessary to accomplish the conversion to the new 8-bit character codes from the 7-track half-inch tape format in which everyone had invested heavily, and the conversion from the IBM "Hollerith" code, in a minimal number of "nor" gates constructed of discrete transistors. A tape drive already cost about $60,000 in 1964 dollars, and a card reader quite a bit more. Nobody wanted to pay a few thousand more for unnecessarily complicated character-translation logic.

EBCDIC was a very rational response to the technology of its time. It continues because major stakeholders who had invested in EBCDIC databases cannot justify the cost of changing to, successively, ASCII, then UTF-8, and then who-knows-what-else-is-coming-down-the-pike. Those who laugh at it only show how new they are to computing. — Preceding unsigned comment added by Pantagnosis (talkcontribs) 00:38, 29 February 2016 (UTC)

ASCII and 8-bit bytes date from a lot earlier than 1964. Spitzak (talk) 07:00, 29 February 2016 (UTC)
ASCII was a 7-bit code for quite a while. When was ASCII-8 introduced? Peter Flass (talk) 14:14, 1 March 2016 (UTC)
I did not say anything about 8-bit extensions to ASCII. What I meant was "8-bit bytes are earlier than 1964" and "ASCII is from earlier than 1964". Spitzak (talk) 22:07, 4 April 2016 (UTC)
ASCII still is (only) a 7-bit encoding, first published as a standard in 1963. Extensions to ASCII, most notably the ISO 8859-n and the ISO 2022 code sets, were invented starting in the late 1970s to deal with European and Japanese character sets. 16-bit Unicode was invented to replace the explosion of disparate 8-bit encodings that had proliferated in the computing world by the 1990s. — Loadmaster 00:10, 2 March 2016 (UTC)
(And Unicode was expanded to a bit under 21 bits in 1996.) Guy Harris (talk) 18:35, 4 April 2016 (UTC)