
Talk:Comparison of archive formats

From Wikipedia, the free encyclopedia

Purpose (Statement of Intent)


This article has no defined purpose or statement of intent. It also duplicates, and competes with, the article List of archive formats. Request discussion on the purpose and intent of this sub-article to differentiate it from other sub-articles.

Improvements to be made


I visited the main article and noticed something that needs improving: the comparison table. The 'Introduced in' column needs updating so that sorting by this column orders the rows by date. Currently, because there is no standard format for the entries, it sorts by letters and other numbers. Ideally, the date should be the first thing in the column, with any other details after it. Also, when sorting by a column, some of the rows have duplicate column headers (Format, Filename Extension, Created by, etc.). These should be removed or excluded from the sort so they don't sit in the middle of the table after you have sorted by 'Introduced in', for example. Try sorting by the 'Introduced in' column and you will see what I mean. I am not knowledgeable enough on wiki code to perform these changes. Gicronin (talk) 11:00, 2 August 2011 (UTC)

Note: I modified the "introduced in" column entries so the year appears first. This makes sorting work better. Acorn (Nathan) (talk)

Rationale

  • ISO 9660 - It is not strictly an archive format, and is more commonly referred to as a file system, but it can be used for archiving, and that's why I included it. --Boborok 16:51, 26 March 2006 (UTC)
ISO-9660 is an archive format in the same way that FAT-16 is an archive format. It's a filesystem. ".iso" files are just raw block images, usually of a CD-ROM, and they may or may not contain an ISO-9660 filesystem (e.g. there are HFS CD-ROMs). It's worth listing on any page of archive formats for the simple reason that people encounter them frequently, but it's a little strange to refer to it as an archive format. Fadden0 (talk) 20:56, 30 March 2016 (UTC)

List vs. Comparison


There is nothing wrong with having two separate articles; however, they should not contain overlapping data points. At a high level, a list of archive formats would be just that: extension, major OS, possibly type of algorithm, multi-file vs. single-file format, split capabilities, etc.: the bare minimum to identify an archive format, so that a person trying to find a format based on some high-level criteria could find it. A comparison would provide meaningful comparisons between different formats: how does one compare to the other with regard to time, compression, etc.? What are the trade-offs between archive format XYZ and ABC when compressing benchmark data? How does format MNO compare to format JKL when they are fed numerical vs. alphabetic data? Etc. tendim 02:19, 5 October 2010 (UTC) --Preceding unsigned comment added by Tendim (talk • contribs)

Merge. These two articles are highly redundant. The "list" contains a great deal of comparison. - Frankie1969 (talk) 14:13, 17 September 2011 (UTC)
p.s. @Tendim, the type of comparisons you seek are highly implementation-dependent (version, architecture, algorithm, etc) and unlikely to be maintainable.
Merge. I was very surprised to see two articles on this. I think List should be merged into Comparison. -- BlindWanderer (talk) 18:09, 5 September 2015 (UTC)

archive FORMAT ???


How can we compare archive formats? That's impossible and very limited. Why not a larger article including:

  • filesystems with compression, such as FAT, ext2, squashfs, cramfs, cloop, ziofs
Because file systems are not archive formats. They have their place. tendim 02:22, 5 October 2010 (UTC)
  • compare the compression algorithms and their implementations (which can differ a lot; for example, bzip2 has several implementations with technical differences such as block size, yet they are compatible with the standard implementation)
Comparing compression algorithms would not be in line with archive formats. Archive formats are users of compression algorithms. tendim 02:22, 5 October 2010 (UTC)


But first we can present the methods of compression, because some, such as hard links, can't be included in a comparison, even though they are obviously a way to save space... --The preceding unsigned comment was added by 213.189.165.28 (talk • contribs) 14:03, 19 April 2006 (UTC).

Maybe we just need to find a new name for this page. --The preceding unsigned comment was added by 00 tux (talk • contribs) 14:21, 19 April 2006 (UTC).
Hard links have nothing to do with archiving. tendim 02:22, 5 October 2010 (UTC)
"How can you compare archive formats?" It's not only possible, but this page is an example of how it can be done. Guy Harris 21:54, 19 April 2006 (UTC)

File system comparison?


Why is that here, rather than on Comparison of file systems? Guy Harris 21:50, 19 April 2006 (UTC)

Purpose of this article


Alright, I spent a good two and a half hours trying to begin the process of cleaning this article up. Considering the subsequent reversion, I feel that it is necessary to address the purpose of this article.

First, I feel it is necessary to understand that an archive is a file that contains other files. This explicitly includes any formats that fulfill at least this purpose, which therefore means formats designed solely for archiving, for archiving and compression, and for software distribution/packaging. As such, I have reverted the article to my changes, omitting the compress-only formats.

I feel, however, that it is pertinent to leave those formats in place, because archive-only formats almost always use one of them. This is restricted, of course, to those formats that function as wrappers (bzip2, gzip, compress, rzip, etc.). A more detailed list of compression algorithms is needed, yes, but that does not mean this page cannot also include a brief mention of related formats; the archive-and-compress formats inherently mention a compression algorithm, anyway! --Kbolino 06:02, 5 May 2006 (UTC)

Yes, but this also means that such formats look more limited than they are in real life. For example, tar is almost exclusively used together with gzip/bzip2, and these compression formats add integrity checking (but the table says tar has partial integrity checking). --Crashie 18:06, 25 February 2007 (UTC)

Long filenames


We need very detailed information about how each archive format handles long filenames. Since these formats are used to transfer files, perhaps between very different OSes, what happens to the names? What if the name is not valid in the receiving system? What about FAT LFNs, which are really a dual-name system? Is only the LFN stored in the archive, and not the 8.3 name? Is the 8.3 name then regenerated by the receiving FAT system, and may it thus change? What if the archive is made on a Linux OS with native long name support, and there are two files that differ only in case, and the archive is received on a FAT LFN system that does not allow duplicate filenames that differ only in case? 69.87.203.23 23:36, 22 January 2007 (UTC)

This is a feature of the program used to process the archive format, not of the format itself. One extraction program might overwrite files with similar names, another might fail to extract them, yet another might rename them, and yet another might encode the extra information by adding extra characters according to some reversible rule. Thus it cannot go in this table. 90.184.187.71 (talk) 17:30, 29 September 2010 (UTC)
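
To illustrate the case-collision scenario raised above, here is a minimal sketch of how an extraction program could detect member names that differ only in case before writing them to a case-insensitive file system. The function name `case_collisions` and the sample member list are hypothetical, and `str.casefold()` is only an approximation of how a real FAT implementation compares names.

```python
from collections import defaultdict


def case_collisions(names):
    """Group archive member names that differ only in letter case.

    Hypothetical helper: str.casefold() approximates, but does not
    exactly reproduce, the comparison a FAT LFN system performs.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one spelling are collisions.
    return [group for group in groups.values() if len(group) > 1]


# Made-up member list, as it might come from an archive created on Linux.
members = ["README.txt", "readme.TXT", "src/main.c", "Makefile", "makefile"]
collisions = case_collisions(members)
print(collisions)
```

What happens next (overwrite, skip, rename) is a policy decision of the extraction program, which is the point made in the reply above.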

Technical limitations


The most important technical aspects are missing:

  • The largest possible archive file size
  • The maximum number of files contained in the archive
  • The longest possible filename contained in the archive
  • Other special features (like: file recovery records, storage of file attributes, etc.)

--Loh 12:18, 24 April 2007 (UTC)

Capabilities


Questions that should be answered include:

  • Is there archive-level metadata (e.g. ar but not tar)? Is there per-file metadata (both)?
    • Filesystem metadata includes, among others: usernames, groups, permissions and POSIX ACLs.
  • Is a hierarchical file structure supported (e.g. tar) as opposed to only base filenames (e.g. ar)? --Preceding unsigned comment added by 194.81.223.66 (talk) 10:37, 3 June 2010 (UTC)
  • Does the archive format support special (non-data) file types, such as symbolic links, FIFOs and sockets?
  • Are hard links supported?

Idea for new column


I've got an idea for a new column: how big is the archive if you pack a given amount of data into it (e.g. 100 MB)? People can then use the data to decide whether to use one format or another, depending on the packed size. --Stinkfly 19:33, 4 August 2007 (UTC)

Firstly, this depends on the compression algorithm used rather than the archive format. Some archive formats do not compress, and some archive formats (such as ZIP) support more than one compression algorithm. Therefore such information would be better off in an article about compression algorithms rather than archive formats. Secondly, even for a given algorithm, the compressibility depends heavily on the type of content used; no compression algorithm can reduce the size of all files (or else you could compress any file down to a single bit just by re-compressing over and over again), and for those files whose size it can reduce, the result depends heavily on the file. Even if you were to measure on the same file for every compression algorithm, it would be an unfair comparison, since different compression algorithms are designed for different types of files. mmj (talk) 06:15, 17 September 2008 (UTC)
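
The content-dependence described above can be seen with a small experiment. This is only a sketch using Python's standard `zlib` module (DEFLATE) as a stand-in for "a compression algorithm"; the sample inputs and the fixed seed are arbitrary choices made for illustration, not data from the article.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Highly repetitive input: compresses extremely well.
repetitive = b"abc" * 10_000
# Pseudo-random input: has no structure for DEFLATE to exploit.
noise = bytes(rng.getrandbits(8) for _ in range(10_000))

packed_repetitive = zlib.compress(repetitive)
packed_noise = zlib.compress(noise)

print(len(repetitive), "->", len(packed_repetitive))
print(len(noise), "->", len(packed_noise))  # slightly larger than the input
```

The same algorithm shrinks one input to a tiny fraction of its size and makes the other slightly bigger, so a single "packed size" figure per format would say little.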

Opening Improvements


Since this article is flagged for improvement, I added an introductory section with explanations of the table contents. I believe that the Integrity column should be renamed Error Detection and that the Recovery column should be renamed Error Correction. These are the proper technical terms. Before adding a bunch of new columns (max size, max files, filename size, special features, compressibility, etc.), I believe you'll need to fill in all the existing question marks; otherwise you will only add a fleet of new ones. Carl Gusler 22:55, 4 August 2007 (UTC)

I agree the Integrity and Recovery column names could use some clarification, but I'm not sure I'd say those are the proper technical terms. They're probably more precise. The phrase "integrity checking" is used pretty darn often, more than I personally hear "error detection", though they're of course two sides of the same coin. Since this is about archive formats (and not the behavior of any particular programs reading or writing them), another thing to consider is simply putting it in terms of what the format actually stores, e.g. a checksum. (In such a column you could see "MD5, per-file", "SHA-1, per-archive", etc.) Technically, whether or not that's used to detect errors (and at what resolution) is up to the software reading the format. What the format itself stores is important because it tells you the fundamental limits on what programs can do, and I argue it is more important for this comparison of formats because that info wouldn't appear in any other comparison. Leave discussion of what programs actually do to a comparison of archive programs. The first new columns added should be products of making the existing columns more precise (e.g. separating currently conflated aspects), in my opinion. --MilFlyboy (talk) 21:01, 8 October 2010 (UTC)

Todo


Somebody should mention the XAR format, since it's now officially being used by Apple (in their mkpg archives). --Preceding unsigned comment added by 76.90.68.218 (talk) 11:36, 19 November 2007 (UTC)

Encryption - per-file or per-archive


Some archive formats which support encryption do the encryption on a per-file basis, so that the file contents are encrypted but the directory of files, including the filenames, dates, sizes, positions within the archive etc, is not encrypted. Others encrypt the whole filesystem, so that the filenames, sizes, positions etc are encrypted. It would be nice if this list distinguished between the two, where this information is known. mmj (talk) 06:18, 17 September 2008 (UTC)

Split archive files


Should there be mention of which of these formats incorporate support for split archive files? This is not automatic. For example, split ZIP files created by one program often cannot be read by another, even though ZIP is apparently an open standard. I believe the same problem applies to other compression/archive formats as well. --Preceding unsigned comment added by Chris313 (talk • contribs) 02:22, 4 October 2008 (UTC)

popularity


We should have a ranking on popularity: what are the most popular archive formats, and by what margin? --Piotr Konieczny aka Prokonsul Piotrus| talk 21:40, 16 December 2008 (UTC)

I disagree. The popularity of an archive format is highly dependent on the operating system and time period. ZipDisk is one of the de-facto standards for Commodore file compression, but so is .d64.gz, but they both have their place and time. SIT is still incredibly popular on the Mac even though the DMG "format" would seem more popular because that is the flavour of the day for releasing installer packages. tendim 02:14, 5 October 2010 (UTC) —Preceding unsigned comment added by Tendim (talkcontribs)

Time storage


It would be interesting to compare how the different formats store – and restore – the various date-and-times of a file:

  • flavors: which times are stored (modification, creation, access...)
  • resolution: second? millisecond?
  • time zone: is the time zone stored? or is the time stored in UTC? What happens when the TZ of the target system is ≠ from the source system?
  • ...

--Arnauld (talk) 09:03, 29 September 2009 (UTC)

The "Unicode Filenames" is rather bogus


Whether a filename "is Unicode" depends on many things. I assume what was meant by "Unicode filenames" was "can store Unicode using the wide characters as done by Windows filesystems". Which is one way of doing it, don't get me wrong, but e.g. UTF-8 does the same thing differently, and it is more widely understood. --Preceding unsigned comment added by 24.147.238.142 (talk) 00:34, 31 March 2010 (UTC)

No, "Unicode Filenames" means: "Can store filenames that use any combination of Unicode characters, even those not in the current or some specific locale". Examples: .ARJ files can only store filenames that fall within the OEM/BIOS character set of the current locale, so the format gets a NO. .CAB files support UTF-8 encoded Unicode filenames up to 255 bytes long, so the format gets a YES. .MSI files are restricted to filenames in a Windows SBCS or DBCS character set, which must be the same for the entire package, so the format gets a NO. 90.184.187.71 (talk) 17:27, 29 September 2010 (UTC)
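
The distinction drawn above can be sketched as a simple encodability check. This is an illustration only: the helper `can_store` is hypothetical, code page 437 is used as one example of an OEM character set, and the 255-byte default is taken from the .CAB figure quoted above and applied to both encodings for simplicity.

```python
def can_store(name: str, encoding: str, max_bytes: int = 255) -> bool:
    """Return True if `name` can be written in `encoding` within `max_bytes`.

    Hypothetical helper; a real format has its own rules beyond this check.
    """
    try:
        encoded = name.encode(encoding)
    except UnicodeEncodeError:
        # At least one character does not exist in the target character set.
        return False
    return len(encoded) <= max_bytes


print(can_store("résumé.txt", "cp437"))  # é exists in code page 437
print(can_store("файл.txt", "cp437"))    # Cyrillic does not
print(can_store("файл.txt", "utf-8"))    # UTF-8 can encode any character
```

A locale-bound format behaves like the `cp437` rows, accepting some non-ASCII names and rejecting others, while a UTF-8 format is limited only by length.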

"Modification Date Resolution"


What is that? Wikipedia needs more info on it. --194.219.178.124 (talk) 22:00, 28 May 2010 (UTC)

This is how much the timestamps on files are rounded off before that information is stored in the archive (thus the rounding cannot be undone when extracting). Examples: Formats that store time as the number of seconds since some date have a resolution of 1s, as do formats that store year, month, day, hour, minute and second as individual numbers. Formats that store time in the classic FAT file system encoding have a resolution of 2s. Formats that store time with subsecond fractions up to some number of decimal places have a finer resolution, such as 1ms, 1 microsecond, 1 ns or even better.
For instance, if a file on an NTFS disk has a timestamp of "2010-10-29T17:35:53.1234567" but is compressed into a format with 1s resolution, the format will only record that the timestamp was approximately "2010-10-29T17:35:53"; a format with a resolution of 2s would only record that the timestamp was approximately "2010-10-29T17:35:54"; a format with a resolution of 1ms would record it as approximately "2010-10-29T17:35:53.123"; and one with a resolution of 1ns would record it as "2010-10-29T17:35:53.123456700". 90.184.187.71 (talk) 17:39, 29 September 2010 (UTC)
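
The worked example above can be reproduced with a few lines of integer arithmetic. This is a sketch under one assumption: values are rounded to the nearest multiple of the resolution, which matches the figures quoted above, whereas real archivers may instead truncate or round up. The helper names are hypothetical, and only the time of day is modelled.

```python
NS_PER_SECOND = 1_000_000_000


def to_ns(hour: int, minute: int, second: int, frac_ns: int) -> int:
    """Time of day as a count of nanoseconds since midnight."""
    return ((hour * 60 + minute) * 60 + second) * NS_PER_SECOND + frac_ns


def round_to_resolution(t_ns: int, resolution_ns: int) -> int:
    """Round to the nearest multiple of the resolution (an assumption)."""
    return (t_ns + resolution_ns // 2) // resolution_ns * resolution_ns


def fmt(t_ns: int) -> str:
    """Format nanoseconds since midnight as HH:MM:SS.nnnnnnnnn."""
    seconds, ns = divmod(t_ns, NS_PER_SECOND)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ns:09d}"


# 17:35:53.1234567, the NTFS timestamp from the example above.
original = to_ns(17, 35, 53, 123_456_700)

for label, resolution in [("1s", NS_PER_SECOND), ("2s", 2 * NS_PER_SECOND),
                          ("1ms", 1_000_000), ("1ns", 1)]:
    print(label, fmt(round_to_resolution(original, resolution)))
```

Once the rounded value is stored, the discarded digits are gone, so no extraction program can restore the original timestamp.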

Verifiability (need references)


This article needs verifiability. If a file format doesn't have a wikilink or reference, then it should be deleted. Please add a wikilink or references for the entries that are missing them! • Sbmeirow • Talk • 22:07, 13 October 2014 (UTC)

Merge request


I suggest that the best part of this article be merged with Comparison of file archivers. This article is about archiving algorithms; the other article covers that as well, though not everything from this one. The two together would be more complete, but at present they rather compete. Simon Grönlund (talk) 12:34, 18 November 2014 (UTC)

(1) If you propose it, then you need to propose the title of the new article, since the other article would no longer be just about file archive software. (2) You need to use the merge request template on both articles, ask the same question in both articles, and point to ONE discussion location. The answer is NO until you do this step. • Sbmeirow • Talk • 23:11, 18 November 2014 (UTC)
The current format comparison article is not useful as it does not really compare formats. I would prefer to see an enhanced article first, and then it may be obvious that this enhanced article should better be left separate. Schily (talk) 11:29, 19 November 2014 (UTC)

Adding the entry "Support for Alternate Data Stream" in the comparison table


I would like to see the entry "Support for NTFS's Alternate Data Stream" in the comparison table. As far as I know, WinRAR supports it, while most other archivers don't. -- Preceding unsigned comment added by 41.248.170.43 (talk) 07:47, 1 March 2015 (UTC)

There are lots of things that could be added, but the table would get pretty wide and hard to view. • Sbmeirow • Talk • 13:14, 1 March 2015 (UTC)
Let's add it. WHEN the table becomes too wide we can reformat it into two tables. -- BlindWanderer (talk) 18:07, 5 September 2015 (UTC)

If this were added, it definitely should not be named "alternative datastreams", as the implementation from Microsoft is just a subset of the extended attribute file feature that was added into Solaris in August 2001 after Microsoft tried to get their non-POSIX-compatible implementation into POSIX. Since extended attribute files are part of the NFSv4 standard, many OSes would need to implement them in order to become NFSv4 compliant. Schily (talk) 11:21, 2 March 2015 (UTC)

Request: separate page for package formats


Software packaging is a topic going far beyond archiving. Considering all its particular aspects deserves a separate Wiki article. The package format section here would be completely redundant to what the new[?] article has to say about software packaging.

My proposal is to make our package format section the starting point for a software packager comparison article [if no such article exists yet]. -- Preceding unsigned comment added by 195.146.229.12 (talk) 00:19, 4 June 2015 (UTC)

Self-Archiving with 7-zip


In the section of the article titled "Features", the self-extracting column indicates that 7-zip does not offer this feature. I cannot confirm myself that 7-zip has this feature, but it is listed on the 7-zip page of Wikipedia. See: https://en.wikipedia.org/wiki/7-Zip#Features . Maybe the table is referring to some other feature that is not exactly the same, but if so, it should at a minimum be made clearer. 207.253.195.17 (talk) 15:43, 23 September 2016 (UTC)


Proposed merge with List of archive formats


As per previous discussion, there is too much overlap here. The article Comparison of archive formats is more mature than List of archive formats. Given the possibility that one might accidentally be recreated, I would recommend merging and then creating a redirect from one of their names. Ethanpet113 (talk) 03:48, 25 January 2019 (UTC)

  • Merge. The "comparison" article does a poor job categorizing compress-only, archive-only, and compress-archive-combined formats, and it also has a "Features" section completely devoid of citations. I suggest using "List of archive formats" as the starting point, adding in info such as filename character set, and permissions and attributes (and perhaps other columns from Comparison of file systems), renaming it to "Comparison of -", then redirecting the "List" article. Dannyniu (talk) 12:48, 19 July 2019 (UTC)
  Merger complete. Klbrain (talk) 12:06, 8 February 2020 (UTC)