User:Thunderbird2/The case against deprecation of IEC prefixes

The case against deprecation

Background information

Decimal
Value	Metric symbol	Metric name
1000	kB	kilobyte
1000²	MB	megabyte
1000³	GB	gigabyte
1000⁴	TB	terabyte
1000⁵	PB	petabyte
1000⁶	EB	exabyte
1000⁷	ZB	zettabyte
1000⁸	YB	yottabyte
1000⁹	RB	ronnabyte
1000¹⁰	QB	quettabyte

Binary
Value	IEC symbol	IEC name	Memory symbol	Memory name
1024	KiB	kibibyte	KB	kilobyte
1024²	MiB	mebibyte	MB	megabyte
1024³	GiB	gibibyte	GB	gigabyte
1024⁴	TiB	tebibyte	TB	terabyte
1024⁵	PiB	pebibyte	—	—
1024⁶	EiB	exbibyte	—	—
1024⁷	ZiB	zebibyte	—	—
1024⁸	YiB	yobibyte	—	—

Orders of magnitude of data

In most contexts the SI prefixes kilo-, mega- and giga- mean 1 thousand, 1 million and 1 (short scale) billion, respectively, as in one kilogram = one thousand grams, one megajoule = one million joules and one gigawatt = one billion watts. In symbols: 1 kg = 1,000 g; 1 MJ = 1,000,000 J; 1 GW = 1,000,000,000 W.
In computer science the units kilobyte, megabyte and gigabyte (symbols kB, MB and GB) were originally used in this standard decimal sense to mean 1,000, 1,000,000 and 1,000,000,000 bytes, respectively. In symbols: 1 kB = 1000 B; 1 MB = 1000² B; 1 GB = 1000³ B.
However, in modern use (and depending on the context), the same three symbols sometimes have a binary meaning. The binary definitions of these three symbols are 1 KB = 1024 B; 1 MB = 1024² B;^[1] 1 GB = 1024³ B. In this context it is customary to use an upper case "K" instead of the SI prefix "k", for kilo.
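As a minimal illustration of the ambiguity, the sketch below lists the byte counts that "1 MB" can denote: the decimal and binary meanings described above, plus the mixed meaning mentioned in footnote 1. The dictionary and function names are illustrative only and are not taken from any standard.

```python
# Three byte counts that the single symbol "MB" can stand for.
# The names and labels below are illustrative, not part of any standard.
MB_MEANINGS = {
    "decimal (1000^2)": 1000 ** 2,
    "binary (1024^2)": 1024 ** 2,
    "mixed (1000 x 1024)": 1000 * 1024,
}

def describe_mb():
    """Return one line per interpretation of '1 MB'."""
    return [f"1 MB, {label}: {value:,} B" for label, value in MB_MEANINGS.items()]

for line in describe_mb():
    print(line)
```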
The computer itself does not count bytes using binary prefixes; the practice of reporting memory, file and hard disk drive (HDD) sizes in this manner was adopted in the 1980s. As such, the use of binary prefixes is only a convention. The convention could have been altered at any time to agree with the SI prefixes, as was done in Apple's 2009 "Snow Leopard" release and in Ubuntu; nevertheless, it has persisted in much of the computer industry.^[2]
For many applications (primarily the storage capacity of hard disk drives and data rates for telecommunications), the decimal convention is retained, whereby one kilobit is exactly one thousand bits and one megabyte is exactly one million bytes.^[3]
Many WP articles use the same symbol (e.g. MB) with two different meanings, often switching between them within a paragraph or section, and sometimes within a single sentence. This dual use creates confusion and a corresponding need to disambiguate.
These ambiguous usages are common beyond Wikipedia and have led to litigation.
The problem grows with each successive prefix: tera- (1000⁴ vs 1024⁴), peta- (1000⁵ vs 1024⁵), and so on. The highest-value SI prefix for which a binary counterpart has been defined is yotta-, meaning 1000⁸. The corresponding binary prefix yobi- means 1024⁸ (≈1.21×10²⁴), which differs by 21 % from the conventional decimal interpretation of yotta-.
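The growth of the discrepancy can be checked with a few lines of arithmetic. The following is a sketch only; the function name and the list of prefix pairs are chosen here for illustration.

```python
# Relative excess of each binary prefix over its decimal counterpart.
# The list and function names are illustrative.
PREFIX_PAIRS = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi",
                "peta/pebi", "exa/exbi", "zetta/zebi", "yotta/yobi"]

def binary_excess_percent(power):
    """Percentage by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100

for power, name in enumerate(PREFIX_PAIRS, start=1):
    print(f"{name}: {binary_excess_percent(power):.1f} %")
```

On this calculation the excess rises from 2.4 % for kilo- to just under 21 % for yotta-, consistent with the figure quoted above.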
In December 1998, in an attempt to resolve the ambiguity, the International Electrotechnical Commission (IEC) introduced a new set of prefixes kibi-, mebi- and gibi- for the binary meanings, with symbols Ki-, Mi- and Gi-, so that 1 KiB (one kibibyte) = 1024 B, 1 MiB (one mebibyte) = 1024² B and 1 GiB (one gibibyte) = 1024³ B. In the IEC standard, the prefixes kilo-, mega- etc. are reserved for their original decimal meanings.
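To show how the two prefix families separate the meanings, here is a hypothetical helper that expresses a byte count with either SI (decimal) or IEC (binary) prefixes. The function name, its `binary` flag and the two-decimal rounding are assumptions made for this sketch, not part of the IEC standard.

```python
# Hypothetical helper: express a byte count with SI or IEC prefixes.
SI_SYMBOLS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
IEC_SYMBOLS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]

def format_bytes(n, binary=False):
    """Format n bytes using IEC (binary=True) or SI (binary=False) prefixes."""
    base = 1024 if binary else 1000
    symbols = IEC_SYMBOLS if binary else SI_SYMBOLS
    value = float(n)
    index = 0
    # Step up one prefix at a time until the value drops below the base
    # or the largest available prefix has been reached.
    while value >= base and index < len(symbols) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {symbols[index]}"

print(format_bytes(1024 ** 3, binary=True))   # 1.00 GiB
print(format_bytes(1024 ** 3, binary=False))  # 1.07 GB
```

The same quantity of 1024³ bytes appears as 1.00 GiB in IEC notation and as 1.07 GB in decimal notation, so the symbol alone tells the reader which multiple is meant.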
In March 2005, the IEC prefixes were adopted by the Institute of Electrical and Electronics Engineers (IEEE) after a two-year trial period.
The use of IEC prefixes has been approved by national and international standards bodies, including, in addition to IEC and IEEE, the International Bureau of Weights and Measures (the standards body responsible for the SI system of units), the European Committee for Electrotechnical Standardization (CENELEC) and the US National Institute of Standards and Technology.
The binary prefixes defined by the IEC are now incorporated in the International System of Quantities (ISQ).
The alternative (binary use of SI-like prefixes) is deprecated by the same standards bodies.
Use of IEC prefixes in popular literature is rare, making them unfamiliar to many readers. Their use in scientific publications increased from fewer than 15 per year on first introduction to about 200 per year in the early 2010s, and about 600 per year in the mid-2020s: 1999-2001 (ca. 40 hits); 2002-2004 (60 hits); 2005-2007 (190 hits); 2008-2010 (380 hits); 2011-2013 (710 hits); 2014-2016 (1050 hits); 2017-2019 (1330 hits); 2020-2022 (1510 hits); 2023-2025 (1240 hits to date).

Why Wikipedia should not deprecate the use of IEC prefixes

IEC prefixes are unambiguous, succinct, simple to use and simple to understand.
The use of IEC prefixes is endorsed by national and international standards bodies.
The use of one symbol (e.g. GB) to mean two different things in the same article creates confusion and ambiguity. Despite this ambiguity, there are many WP articles in which kilobyte, megabyte and/or gigabyte are used in this way. In this situation, the IEC prefixes provide an ideal disambiguation tool because they are unambiguous and succinct.
Deprecation (of IEC prefixes) increases the difficulty threshold for disambiguation, reducing the rate at which articles can be disambiguated by expert editors.
In turn, this reduces the total number of articles that can be further improved by less expert editors with footnotes etc. (assuming that there is consensus to do so).
Deprecation is interpreted by some editors as a justification for changing unambiguous units into ambiguous ones.
Removing IEC prefixes from articles, even when disambiguated with footnotes, destroys a part of the information that was there before, because it requires an expert to work out which footnote corresponds to which use in the article.
In the long term, the use of IEC prefixes would ultimately avoid the need to use the same symbol (e.g., MB) with two different meanings. This may sound like a pipe dream, but it could be implemented as a user preference, so that readers could choose between familiar (ambiguous) units and unfamiliar (unambiguous) ones.
The main argument for not using IEC prefixes is the unfamiliarity of, for example, the mebibyte (MiB) compared with the megabyte (MB). The unfamiliarity is not disputed, but is not relevant to disambiguation. The point is that disambiguation is rare and therefore all disambiguation methods are unfamiliar.
Alternative disambiguation methods are either cumbersome (i.e., exact numbers of bytes), difficult and time-consuming to implement in a manner that is clear to the reader (i.e., footnotes)^[4] or unlikely to be understood (i.e., exponentiation).

In conclusion, disambiguation is not easy, so it would be unwise to discard the simplest disambiguation tool at our disposal just because it is unfamiliar to some readers. The best disambiguation method has yet to be established, so it is premature to deprecate this one.

Footnotes

[1] MB even has a third meaning, equal to 1000 KiB or 1,024,000 B.
[2] Snow Leopard changes how file and drive sizes are calculated
[3] According to the LBA Count for IDE Hard Disk Drives Standard from the website of the International Disk Drive Equipment and Materials Association (IDEMA), there are 1,000,194,048 bytes (1,953,504 logical blocks × 512 bytes/logical block) per nominal gigabyte of hard drive storage.
[4] This problem is illustrated by Address space layout randomization, which includes the confusing disambiguation footnote "Transistorized memory, such as RAM and cache sizes (other than solid state disk devices such as USB drives, CompactFlash cards, and so on) as well as CD-based storage size are specified using binary meanings for K (1024¹), M (1024²), G (1024³), ..."
