Talk:EBCDIC
This is the talk page for discussing improvements to the EBCDIC article. This is not a forum for general discussion of the article's subject.
This article is rated Start-class on Wikipedia's content assessment scale.
Punched-card photo
I would not say that the punched-card photo illustrates EBCDIC. Punched-card code is a separate encoding of the same characters. Peter Flass (talk) 00:47, 22 September 2016 (UTC)
- I second this and nominate the punch card image for removal. The card is clearly illustrating a 12-bit encoding system, and the image is presented right next to a paragraph saying EBCDIC is an 8-bit encoding.
- Further, there doesn't seem to be any connection between the encoding shown on the punch card and that described in the article. Some examples:
- Character | Card encoding | EBCDIC encoding
- ----------+---------------+----------------
- 0 | 001000000000 | 11110000
- 1 | 000100000000 | 11110001
- 2 | 000010000000 | 11110010
- 3 | 000001000000 | 11110011
- A | 100100000000 | 11000001
- + | 100000001010 | 01001110
- . | 100001000010 | 01001011
- If 12-bit punch cards are somehow related to 8-bit EBCDIC, the article should have a section added to explain the mapping. If they're unrelated, the punch card image should be removed. Mike Schiraldi (talk) 19:11, 8 March 2023 (UTC)
- Change the lower 10 punches into the binary numbers 0000..1001, and if there is more than one, 'or' them together. This is always the lower 4 bits of the EBCDIC. The upper 3 punches are turned into the upper 4 bits: 1100, 1101, or 1110, and 1111 if all are off. Then some further logic gates flip some of the bits depending on others:
- Character | First step | EBCDIC encoding
- ----------+------------+----------------
- 0 | 1110 0000 | 1111 0000
- 1 | 1111 0001 | 1111 0001
- 2 | 1111 0010 | 1111 0010
- 3 | 1111 0011 | 1111 0011
- A | 1100 0001 | 1100 0001
- + | 1100 1110 | 0100 1110
- . | 1100 1011 | 0100 1011
- Spitzak (talk) 02:09, 9 March 2023 (UTC)
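- For anyone following along, the first step described above (before the further bit-flipping logic, which is not spelled out in this thread) can be sketched in Python. This is only an illustration under stated assumptions: the 12 card bits are read in the row order 12, 11, 0, 1..9, as the card-encoding table further up implies; the function name is made up; and the priority applied when more than one of the upper punches is present is a guess, since the comment does not cover that case.

```python
# Row order assumed from the card-encoding table above: 12, 11, 0, 1..9.
ROWS = ["12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def first_step(card_bits: str) -> int:
    """Map a 12-character punch pattern to the intermediate 8-bit value.

    Only the first step from the comment above is implemented; the later
    logic that flips some bits is deliberately left out.
    """
    if len(card_bits) != 12 or set(card_bits) - {"0", "1"}:
        raise ValueError("expected 12 characters of '0' or '1'")
    punched = {row for row, bit in zip(ROWS, card_bits) if bit == "1"}

    # Lower nibble: OR together the binary values of the punched digit rows.
    low = 0
    for digit in range(10):
        if str(digit) in punched:
            low |= digit

    # Upper nibble: chosen by the upper punches (priority order is an assumption).
    if "12" in punched:
        high = 0b1100
    elif "11" in punched:
        high = 0b1101
    elif "0" in punched:
        high = 0b1110
    else:
        high = 0b1111
    return (high << 4) | low


for label, bits in [("0", "001000000000"), ("A", "100100000000"), ("+", "100000001010")]:
    print(label, format(first_step(bits), "08b"))
```

This reproduces the middle column of the table above; getting from there to the final EBCDIC column needs the additional logic, which this sketch does not attempt.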
External links modified
Hello fellow Wikipedians,
I have just modified 2 external links on EBCDIC. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
- Added archive https://web.archive.org/web/20130526012525/http://www.trailing-edge.com/~bobbemer/P-BIT.HTM to http://www.trailing-edge.com/~bobbemer/P-BIT.HTM
- Added archive https://web.archive.org/web/20081224063219/http://www.iconv.com/asciiebcdic.htm to http://www.iconv.com/asciiebcdic.htm
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
An editor has reviewed this edit and fixed any errors that were found.
- If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
- If you found an error with any archives or the URLs themselves, you can fix them with this tool.
Cheers.—InternetArchiveBot (Report bug) 14:41, 15 September 2017 (UTC)
- The first of those is on Bemer's own web site, as linked to elsewhere on the page; I've fixed it to point there (and used {{cite web}} for it).
- The second of those doesn't work; machines often don't work when fetched from the Wayback Machine. I removed that link. Guy Harris (talk) 17:21, 15 September 2017 (UTC)
"Succeeded by UTF-16"
The infobox claims that EBCDIC was "succeeded by UTF-16". A citation has been requested, with the edit comment "add cn on succeeded by UTF-16 in infobox -- it is non-trivial to source (I have not found one yet); and i'm interested in more detail such as when it was succeeded by utf-16."
I'm curious in what fashion it was succeeded by UTF-16.
Deciding whether to store data as UTF-8 or UTF-16 on IBM's Web site says that DB2 supports either encoding, but that "COBOL and PL/I on z/OS use UTF-16 for Unicode data. Neither language supports UTF-8."
And they also say on the "Unicode on IBM i" page that "The IBM® i operating system provides support for Unicode.", although "Mapping of data" says "The IBM® i operating system uses the EBCDIC encoding scheme. However, not all clients attached to the system use an EBCDIC encoding scheme to store, retrieve, and process data. Therefore, some clients use Unicode as an exchange mechanism that is safe across all platforms."
So perhaps 1) IBM's adding support for Unicode even in their EBCDIC-based OSes (as opposed to their ASCII-and-extended-versions-thereof-based AIX, and as opposed to the similarly ASCII-and-extended-versions-thereof-based Linux on both IBM Z and IBM Power Systems) and 2) when they're not constrained to use UTF-8, they're choosing UTF-16.
However, are they completely switching to Unicode (in some encoding), so that, for example, a z/OS data set catalog can have entries in Unicode? Can system services that take character strings as arguments take Unicode strings (in some encoding) - in particular, for services that formerly took EBCDIC strings, are there versions that take Unicode strings?
And are they uniformly using UTF-16 as the encoding, except for cases where they're constrained to use UTF-8 (such as for most Internet protocols, and for whatever they call their UNIX environments on IBM i and z/OS these days)? Guy Harris (talk) 03:11, 16 December 2018 (UTC)
- Absolutely not. I can assure you 100% that the primary character set of IBM z/OS is still EBCDIC and is likely to remain so for the foreseeable future. It is so basic to the environment that I am having trouble finding a reference to cite -- it seems to be just kind of assumed, but I assure you that working in z/OS every day I work primarily in EBCDIC. Charlesm20 (talk) 13:43, 17 June 2019 (UTC)
- Here is something of a reference: [1] Charlesm20 (talk) 13:54, 17 June 2019 (UTC)
- Somebody removed the claim that it was succeeded by UTF-16 in this edit. I've seen nothing to indicate that we should put the claim back, and your comments further support its removal. Guy Harris (talk) 17:13, 17 June 2019 (UTC)
Question regarding invariant character set
Is not colon (EBCDIC 7a, ASCII 3a) part of the EBCDIC invariant character set? It is included here https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_71/nls/rbagsinvariantcharset.htm which would seem to me to be authoritative.
Apologies if I am not doing this edit correctly. It's about my third edit in about 20 years.
Charlesm20 (talk) 22:39, 13 June 2019 (UTC)
- That looks like a pretty authoritative reference; it should probably be used to update the table and be added as a citation. Spitzak (talk) 02:36, 14 June 2019 (UTC)
- Heck, the EBCDIC chart in Appendix F of Form A22-6821-0, IBM System/360 Principles of Operation - no date, but the "-0" suggests this is the first edition - has 0x7a for colon, so that dates back a while. I'm guessing that only the empty spaces in that chart are eligible to be different between different EBCDIC code pages. Guy Harris (talk) 03:47, 14 June 2019 (UTC)
- It does date back. So do I. I basically started my coding career with that manual. The date would be ~1964. I have been working with EBCDIC basically my entire career. No, some of those EBCDIC characters did not survive the sixties. And from bitter experience cent and exclamation point are DEFINITELY not invariant. When IBM discovered there were places that did not use the American alphabet they stole some of those spots for British pound signs and so forth -- that is how we ended up with variant EBCDIC. Charlesm20 (talk) 13:33, 17 June 2019 (UTC)
- The article says the invariant character set is "characters that should have the same assignments on all EBCDIC code pages" - emphasis theirs, not mine. I don't know whether an emphasized "should" means "should have, but don't have in practice" or what, so I'm not sure what should :-) be in the invariant character set by that definition. Should :-) we change the definition to "that have the same assignments", and either make sure it's true on all the code pages whose definitions we can find or find an IBM reference giving the invariant character set? Guy Harris (talk) 17:18, 17 June 2019 (UTC)
- If you remove every character that changes in any set called "EBCDIC", I think you will remove all the letters, so probably not. I like the idea of using the reference above to fix the colors in the table to match what that document claims is invariant. Spitzak (talk) 22:00, 18 June 2019 (UTC)
Lower-case letters, at least; EBCDIC 290 moves the lower-case letters, although it doesn't move the upper-case ones. But EBCDIC 290 does, at least, have colon as 0x7A. Guy Harris (talk) 23:45, 18 June 2019 (UTC)
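One way to check claims like these against the code pages that happen to ship with Python is to compare encodings across codecs. The following is a rough sketch, not an authoritative test: it assumes Python's `cp037` and `cp500` codecs are faithful to IBM's code pages 37 and 500, it covers only those two pages, and the helper name is made up.

```python
def invariant_chars(codecs, candidates):
    """Return the candidates that encode to the same byte in every codec."""
    result = []
    for ch in candidates:
        encodings = set()
        for codec in codecs:
            try:
                encodings.add(ch.encode(codec))
            except UnicodeEncodeError:
                # Not representable in this code page, so not invariant.
                encodings.add(None)
        if len(encodings) == 1 and None not in encodings:
            result.append(ch)
    return result


pages = ["cp037", "cp500"]
for ch in ":!":
    # Colon is 0x7a in both pages; the exclamation point differs between them.
    print(repr(ch), [ch.encode(page).hex() for page in pages])
print(invariant_chars(pages, ":!"))
```

Agreement between two code pages does not establish invariance across all of them; it only shows that a character such as the exclamation point is ruled out as soon as one pair disagrees.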
36-bit machines more likely to adopt ASCII than EBCDIC?
The article speaks of 36-bit machines having 5 7-bit ASCII characters per word where, presumably, they'd have to go with 4 8-bit characters per word if they used EBCDIC.
The PDP-10, running either TOPS-10 or TENEX, did store ASCII in that fashion (with TOPS-10, at least, also using a 6-bit ASCII-derived subset, SIXBIT). Multics on various 36-bit GE and Honeywell machines, however, stored 4 9-bit bytes per word, with ASCII characters in those bytes, and the UNIVAC 1100/2200 series running EXEC 8 through OS 2200 apparently also went with 9-bit ASCII or 6-bit FIELDATA.
So 5 7-bit ASCII characters per word wasn't universal with newer (post-ASCII) 36-bit machines (the older 36-bit machines, such as the IBM 700/7000 machines, didn't adopt ASCII because it didn't exist yet). DEC may have picked ASCII because it only required 7 bits, but also may have picked it because it was the character encoding used by Teletype models such as the Teletype Model 33, as those were often used as terminals. Guy Harris (talk) 03:21, 25 May 2020 (UTC)
- A good explanation. I think the PDP-10, as with other machines of that time, used ASCII where the interface to teletypes had become important. The bit-saving idea sounded very odd to me (even though as an assembler programmer of the time, we were always trying to save bits in memory and storage). Indeed, most machines of this period were using 6-bit bytes and 36-bit words, which is far more sensible SO LONG AS YOU ONLY WANT CAPITALS lol. Brian R Hunter (talk) 01:20, 26 May 2020 (UTC)
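The arithmetic behind the characters-per-word figures in this thread can be sketched as follows. The packing layout (five characters left-justified, with the lowest bit of the word left unused) is an assumption made for illustration, and the function names are made up.

```python
WORD_BITS = 36


def chars_per_word(char_bits: int) -> int:
    """How many whole characters of the given width fit in one 36-bit word."""
    return WORD_BITS // char_bits


def pack_ascii7(text: str) -> int:
    """Pack exactly five 7-bit ASCII characters into one 36-bit word."""
    if len(text) != 5:
        raise ValueError("expected exactly 5 characters")
    word = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit ASCII character")
        word = (word << 7) | code
    return word << 1  # 35 bits used; the lowest bit is left as 0


def unpack_ascii7(word: int) -> str:
    """Inverse of pack_ascii7."""
    word >>= 1
    chars = []
    for _ in range(5):
        chars.append(chr(word & 0x7F))
        word >>= 7
    return "".join(reversed(chars))


for width in (6, 7, 8, 9):
    print(width, "bits:", chars_per_word(width), "characters per word")
```

Under this arithmetic, 6-bit and 9-bit characters fill the word exactly, 7-bit characters leave one bit over, and 8-bit characters leave four.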
How is this a sentence fragment?
A {{sentence fragment}} tag was added to
IBM AIX running on the RS/6000 and its descendants including the IBM Power Systems, Linux running on IBM Z, and operating systems running on the IBM PC and its descendants use ASCII, as did AIX/370 and AIX/390 running on System/370 and System/390 mainframes.
but I see a subject, verb, and object in the first clause - the subject is "IBM AIX running on the RS/6000 and its descendants including the IBM Power Systems, Linux running on IBM Z, and operating systems running on the IBM PC and its descendants", i.e. a list of operating systems, the verb is "use", and the object is "ASCII" - and I see a subject, verb, and implied object in the second clause - the subject is "AIX/370 and AIX/390 running on System/370 and System/390 mainframes", the verb is "did" as in "did use", i.e. "used to use" (because those OSes are unlikely to still be used), and the implied object is "ASCII" again, as per "as".
So what makes any of that a sentence fragment? Guy Harris (talk) 20:23, 5 July 2021 (UTC)
- I can understand why the complainant thought so, because the sentence is so overloaded with verbiage that its structure is difficult to decipher. Fundamentally, ASCII is a software function so the hardware references are just a background noise that is drowning out the signal. How about
- Problem solved. --John Maynard Friedman (talk) 23:22, 5 July 2021 (UTC)
- Now we need to solve the "what about Linux?" problem. :-)
IBM AIX, Linux on IBM Z, Linux on Power, AIX/370 and AIX/390 all use ASCII.
- IBM no longer makes x86 boxes, so we can probably leave the PC operating systems out.
- Then again, "IBM AIX" includes more than just the one remaining AIX, so perhaps just
IBM AIX, Linux on IBM Z, and Linux on Power all use ASCII.
- Guy Harris (talk) 00:44, 6 July 2021 (UTC)
- Works for me, though it could be argued that Linux has always and only used ASCII or Unicode, so how is it relevant? IBM-World Rules, I suppose. --John Maynard Friedman (talk) 11:26, 6 July 2021 (UTC)
- It could also be argued that various versions of AIX have always and only used ASCII or various flavors of extended ASCII (including UTF-8), so how are they relevant?
- For the S/3x0 machines, UN*Xes (whether AIX, UTS, Linux, or the never-shipped(?) Solaris port) are exceptions, as most other OSes on them use EBCDIC, so that makes them equally relevant.
- For POWER/PowerPC/Power ISA machines, they didn't run any OSes using EBCDIC until the AS/400 switch from IMPI to PowerAS, and those were a separate line of machines from the RS/6000 line that ran a version of AIX; however, with the IBM Power Systems, the lines running AIX and Linux (IBM System p) and running OS/400 (IBM System i) were merged, so perhaps more relevant. Guy Harris (talk) 17:56, 6 July 2021 (UTC)
- Which I guess risks getting bogged down in hardware again because the question only arises because IBMers assume that if it is S360 or S370, it must be EBCDIC. Let's just go with your last text and tiptoe quietly away. --John Maynard Friedman (talk) 22:40, 6 July 2021 (UTC)
- Done. Do we need to mention UTF-8 as well? (Other flavors of extended ASCII are supported, but AIX and Linux probably mostly use UTF-8.) Guy Harris (talk) 02:39, 7 July 2021 (UTC)
- I think the reason the sentence was there and why this particular subset of non-EBCDIC-using systems was listed was because they are IBM products. The point is that IBM makes some stuff that does not use EBCDIC. I added what I hope is the minimal amount of text needed to indicate why anybody is listing these things. Spitzak (talk) 03:25, 7 July 2021 (UTC)
New character chart format
@Spitzak: I liked the old format for the character chart much better. I thought it was a lot more readable than the current version. Peter Flass (talk) 20:23, 17 November 2021 (UTC)
- Can you explain exactly how? The whole point of doing this is to make the tables readable and to match the style used elsewhere in Wikipedia. Spitzak (talk) 20:30, 17 November 2021 (UTC)
- I thought the colors made it more readable, and the stippling detracts from it. I know this table has been the subject of a lot of back-and-forth, but I thought it had been gotten into pretty good shape. Peter Flass (talk) 02:30, 18 November 2021 (UTC)
- The people doing the Unicode block charts were rather insistent on the dotted boxes; I tried a version without them first. Spitzak (talk) 02:54, 18 November 2021 (UTC)
- Yes, because the dotted boxes carry semantic information in Unicode charts (and aren't limited to control characters). I'm not convinced they have to be used for these character charts though. There's an argument for consistency between the two types of charts but another syntax (like parentheses in IBM's charts) could work. I would oppose removing them from the Unicode block charts, of course. DRMcCreedy (talk) 04:32, 18 November 2021 (UTC)
Humor
As a note, I'm not sure that "The bank argued in part that it could not comply because its computer system was only compatible with EBCDIC, which does not support umlauted letters." is, strictly speaking, correct. Or at least, while the bank may have argued that, it isn't true. The only OS in widespread use that uses EBCDIC these days is z/OS, and it absolutely supports umlauts. The bank may not have wanted to update its programs to use something more recent, but the computer system does support UTF characters, and ASCII, too. 173.62.118.144 (talk) 16:56, 2 May 2024 (UTC)
- You haven't considered the possibility that the bank may still be running its system on OS/360. --𝕁𝕄𝔽 (talk) 17:34, 2 May 2024 (UTC)
- I think IBM i 1) is in somewhat common use and 2) uses EBCDIC.
- But, yeah, the bank's argument seems fishy. The first reference for the case in question has some machine-translated text, and the translation mangles it, so I'll ask Google Translate to translate bits of the Dutch (Flemish?) original (Google Translate won't translate the PDF, so I'll do it by copying and pasting), and attempt to fix up obvious bogosities in the translation:
- The current application for managing customer data of Bank X was put into use in 1995 and still runs on an American-made mainframe system. This system only supported EBCDIC ("extended binary-coded decimal interchange code"). This is an 8-bit standard for storing letters and punctuation marks, developed in 1963-1964 by IBM for their mainframes and AS/400 computers. The code stems from the use of punch cards and had the following characters:
- {old EBCDIC code table}
- It is for this reason that all our customer names are stored in capital letters and there are no accented letters because the latter were not recognized by the system. Accented letters have since been added to EBCDIC, but this was not included in updates to the customer data application. In the near future, Bank X will be moving away from the current application, as well as from the mainframe system and this new environment will certainly be able to deal with letters with accents.
- So:
- They at least acknowledge that there are EBCDIC code pages that can support accented letters;
- The bank's application software wasn't modified to support them;
- so that's only a problem with EBCDIC in that the application originally was limited by the EBCDIC of the time when it was originally developed. EBCDIC itself was modified to handle them, but the app wasn't updated to support other code pages, so that is a problem with the developers of the application (the bank itself, or some organization from whom they got the application), not with EBCDIC or the OS. Guy Harris (talk) 22:14, 2 May 2024 (UTC)
- I rephrased it to put the blame on the bank's software and the fact that it only handles EBCDIC Classic, not that EBCDIC itself couldn't handle accented characters. Guy Harris (talk) 01:25, 3 May 2024 (UTC)
Does EBCDIC support accented letters or not?
EBCDIC § Code pages with Latin-1 character sets indicates that there are code pages that supported at least some "country-specific character repertoires" in the past and were extended to support ISO 8859-1, which suggests at least some level of support for accented letters.
EBCDIC § Humor claims that the response of the bank that was accused of a GDPR violation, because a customer's name couldn't include properly accented versions of some letters in the name, "included the fact that their system used EBCDIC, as well as that it did not support letters with diacritics (or lower case, for that matter)." The GDPRhub page about this case has a mangled machine translation of the original case information in Dutch (Flemish?). Google Translate won't translate the PDF if you hand it the URL, and, if you just copy and paste the text of the bank's response, it also mangles the translation (the phrase "De Bank X" appears to mess it up; I'm not sure why it can't translate it as "The Bank X", or just "Bank X" - the "X" is presumably to preserve the bank's anonymity, just as the references to the customer as "Y" are presumably meant to preserve their anonymity). The mangled translation says:
It is for this reason that all names of our customers are stored in capital letters and there no letters with accents are present because the latter were not recognized by the system. Letters with accents were added to EBCDIC in the meantime, but this became not included in customer data application updates. Letters with accents were added to EBCDIC in the meantime, but this became not included in customer data application updates.
I added a comment to the GDPRhub page giving the result of an attempt to improve the translation by using various tricks to work around Google Translate's problems. The fixed translation says
It is for this reason that all our customer names are stored in capital letters and there are no accented letters because the latter were not recognized by the system. Accented letters have since been added to EBCDIC, but this was not included in updates to the customer data application.
In both versions, the bank does note that EBCDIC now can support accented letters (probably by, in their case, choosing a code page that supports, at minimum, French accented letters), but that their application was not updated to allow that. The improved translation may make this clearer.
It appears that, having made a GDPRhub account, I could edit the machine translation myself. Not knowing Dutch (other than "de", "het", and words that are sufficiently close to the English equivalent :-)), I considered that to be above my pay grade :-), so I leave it up to Somebody Else there to fix it.
That also means that the page is a Wiki, so maybe it's not a WP:RS, and that the original page should be used as a reference - perhaps preferably by somebody who knows Dutch, so that, if "manually shoving it through Google Translate to try to get a non-mangled translation" is considered original research, they can post it as a reference (possibly with a translated quote). Guy Harris (talk) 09:55, 5 May 2024 (UTC)
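The distinction drawn above, between what an EBCDIC code page can represent and what an application chooses to store, can be illustrated with a small Python sketch. It rests on assumptions: Python's `cp037` and `cp500` codecs stand in for EBCDIC code pages with a Latin-1 repertoire, and `fold_to_unaccented_caps` is a purely hypothetical stand-in for an application that keeps only unaccented capitals; it is not a description of what the bank's software actually did.

```python
import unicodedata


def round_trips(text: str, codec: str) -> bool:
    """True if the text survives encoding to the code page and back."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


def fold_to_unaccented_caps(name: str) -> str:
    """Hypothetical application-level folding: drop diacritics, then upper-case."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


for page in ("cp037", "cp500"):
    print(page, round_trips("Müller", page))
print(fold_to_unaccented_caps("Müller"))
```

In this sketch the code pages themselves keep the accented letters intact; the loss only happens in the folding step, which is an application decision.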