Talk:ISAM
This is the talk page for discussing improvements to the ISAM article. This is not a forum for general discussion of the article's subject.
This article is rated Start-class on Wikipedia's content assessment scale.
Merge with Isam
Regarding the proposed merge with Isam (redundant, inaccurately titled), please support/oppose and sign with ~~~~:
- Support Tvh2k 12:08, 26 May 2006 (UTC)
It should obviously be merged. In fact the entry "Isam" is improperly titled. The acronym should be all uppercase.
- Support for the same reason. RossPatterson 23:06, 26 May 2006 (UTC)
- Support for reasons above. Thalter 16:45, 27 July 2006 (UTC)
- Support for reasons above. equinexus 23:20, 30 July 2006 (PST)
- Support -- no-brainer 84.12.245.134 14:46, 10 August 2006 (UTC)
- Support obvious Thelem 22:53, 23 August 2006 (UTC)
Corrections to Basic Concepts
In ISAM systems, neither fields nor records have to be fixed length. This includes Btrieve. One can have variable-length fields (typically text strings) and, consequently, variable-length records.
Additionally, the indexing does not have to be a hash table. See Knuth (The Art of Computer Programming, any edition) for an in-depth discussion of trees, BTrees (I assume that is how Softcraft named its first product - as a sort of geeky pun), and B*Trees, which have much better properties for indexing millions of records than hash tables - such as no collisions.
My main point here is that ISAMs do not require fixed-length fields and records, nor do they require hash tables. There are many ways to skin this cat.
In particular, the user manual for Btrieve - and I used it in a production environment starting with version 2.xx or something until 6.15 running as an NLM on NetWare - discussed parent and child nodes, when it created such nodes, and the circumstances under which it would populate them with record addresses. Additionally, for I/O efficiency, Btrieve organized the data records in pages whose size (always a multiple of 512 bytes) was specified when the Btrieve file was first created.
As I recall, Knuth (?) states a properly constructed B*Tree can retrieve a single record from a table of 1 million records with only 3 disk reads...
RSzoc 12:21, 10 October 2007 (UTC)
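As a rough check on the "3 disk reads for 1 million records" recollection above, here is a small Python sketch. It assumes one disk read per tree level and a fan-out of about 100 keys per node; the fan-out is an illustrative assumption, not a figure taken from Knuth or from the Btrieve manual, and `btree_levels` is a hypothetical helper name.

```python
def btree_levels(num_records: int, fanout: int) -> int:
    """Smallest number of levels whose combined fan-out covers num_records."""
    levels = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fan-out of 100 keys per node: 100**3 == 1,000,000
print(btree_levels(1_000_000, 100))  # 3
# A much smaller assumed fan-out needs more levels, hence more reads
print(btree_levels(1_000_000, 10))   # 6
```

Under those assumptions the three-read figure is plausible; a real tree's fan-out depends on page size and key length.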
Too Much Sweep with no References
The opening paragraph states that ISAM forms the "basis of almost all databases, relational or non-relational". Unfortunately, there is no citation for such a sweeping statement, and knowing the database world somewhat, I find it hard to believe.
- I agree - for instance the article later says DB2 uses VSAM, and Informix (apart from SE) moved on to RSAM with the first release of Turbo/OnLine/DSA/Whatever-it's-called-now, nearly 20 years ago. Pterre 14:49, 25 October 2007 (UTC)
Why? Hmmm. Let's see. There are at least four different types of database: Network, Hierarchical, Indexed-Sequential, and Relational. Additionally, there are variations within each type. Except for Indexed-Sequential (ISAM), which had a specific implementation when originally developed by IBM that is not the same as other ISAM products, I don't know which, if any, of the other types of databases - and there are probably more now, such as object-oriented - use ISAM as a fundamental data structure.
There is also the distinction between a database - which can be a collection of tables - and a particular method for organizing records in a file. That is why, I imagine, the "AM" part of ISAM stands for "Access Method", and it's not "DB" for database. ISAM is a method to organize records within a table, not necessarily a database per se....
Anyway, either a source citation for the sweeping statement or eliminating the last sentence of the first paragraph would make this more accurate. RSzoc 00:36, 13 October 2007 (UTC)
Along the same lines, some vague sentence constructions could probably be tightened up and substantiated. Two specific examples (others exist):
- "Most databases now use..." makes one immediately ask, "when is now?"
- "...of the B-tree for this purpose" seems to refer to the restrictive relative clause of the previous sentence (...allows both sequential and keyed access to data), but that is not so clear.
(Miimno (talk) 17:00, 22 January 2014 (UTC))
Correction to the RMS shared File/Record Capability
From the article, "RMS, by default, allows only one reader or writer of a file because it does not provide a record locking mechanism and has no means of managing multiple writers to the same data record."
This is totally incorrect.
RMS on VAX/VMS was built on top of a lock manager (later a cluster-wide distributed lock manager) whose main purpose was to allow file and record sharing. See Wikipedia's own article on the VMS Distributed Lock Manager (http://en.wikipedia.org/wiki/Distributed_lock_manager).
RMS provided robust file- and record-level locking. It was common to have hundreds of users accessing a shared (typically indexed) file in a large VMS cluster in the days before relational databases. - Preceding unsigned comment added by 128.222.37.20 (talk) 15:54, 13 January 2009 (UTC)
Sequential Access?
Does "Indexed Sequential Access Method" imply "Sequential Access"?
IBM ISAM on tape was sequential access at the hardware level: was it sequential access at the logical level?
FoxPro and dBase used logical sequential-access methods on top of random access at the OS level and at the hardware level.
In the MS-DOS world, ISAM databases were called ISAM because of the logical methods they exposed.
Has this morphed somehow into using the label "ISAM" for any database system that uses OS-level record-locking or binary access, even though it is rare for any db system to use OS-level indexing or sequential access? 203.206.162.148 (talk) 03:22, 2 March 2009 (UTC)
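To make the "logical methods" distinction in this thread concrete, here is a toy Python sketch of what "indexed sequential" can mean at the logical level: one structure offering both keyed lookup and in-key-order reading from a given key. The class and method names are hypothetical, and the model is in-memory only; it does not reproduce IBM ISAM, FoxPro, or dBase.

```python
import bisect


class IndexedSequentialFile:
    """Toy in-memory model: records kept in key order plus a sorted key list."""

    def __init__(self, records):
        # records: iterable of (key, value) pairs with distinct keys
        self._rows = sorted(records)
        self._keys = [k for k, _ in self._rows]

    def read_key(self, key):
        """Keyed (random) access: return the value for an exact key, or None."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i][1]
        return None

    def read_from(self, start_key):
        """Sequential access: yield records in key order from start_key onward."""
        i = bisect.bisect_left(self._keys, start_key)
        yield from self._rows[i:]


f = IndexedSequentialFile([(30, "c"), (10, "a"), (20, "b"), (40, "d")])
print(f.read_key(20))         # b
print(list(f.read_from(25)))  # [(30, 'c'), (40, 'd')]
```

Whether the underlying storage is accessed sequentially or randomly is a separate question from whether the interface exposes these two operations.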
Origins
Is there any citable evidence for the assertion that ISAM was originally developed by IBM? Generally, this article appears to conflate the general principles of indexed sequential file organization with the specific detail of the IBM implementation. Mhkay (talk) 15:47, 22 March 2013 (UTC)
Not Just For Main Frames Anymore i.e. Great for Windows(tm) Servers and Windows(tm) PC's
On my Windows 7 Home Premium PC with NTFS file system... I use: (ISAM) indexed relational "text" flat file databases (many 128-GIG flat files, or larger, with 100's of millions of records each, or even billions each), indexed by persistent external binary PERL DBM database files of key/value pairs, tied to in-memory PERL program hash tables at run-time, where the KEY is 1 or more fields and/or partial fields contained within the flat file records, and the VALUE is the byte offset location of a record, used for setting the position of the file pointer before performing READ/WRITE operations upon a record in any of the relational flat files. Custom word-to-code key mapping of data for compression/encryption is utilized, or formula-based MIMEbase64. Database native Windows(tm) GUI user-interfaces are written in ActiveState ActivePerl for Windows 5.26.1 (the latest) for accessing the shared flat file system.
Supported are: Unique Primary Key, Unique Alternate Keys, and Alternate Keys with Duplicates (1000's of duplicates, or many more), Parent-Child 1-to-Many record relationships, and Look-ups to remove redundant data from the databases (e.g. look-up part-description via part-number, etc.).
Indexes are built for arbitrary querying of the database upon the data contained within the fields of records.
NOTE: A previous poster mentioned records not having to be fixed-length. That is correct. You can have variable-length records. Fixed-Length are best for READ/WRITE databases since you want the byte offsets of records/fields to remain reliable/constant for editing the records "in-place". Var-length are great for READ ONLY databases because you can store perhaps 4 times as many records in the same 128-GIG or 256-GIG flat file as you can fixed-length. I am not sure what the SIZE LIMIT is on flat files for which random access is supported. I have gone as far as 128-GIG flat files and the random access to any record is instantaneous.
Some folks may not have tried ISAM relational indexed flat file databases for quite some time. Before moving recently from ActiveState ActivePerl version 5.6.1 to version 5.26.1, my flat files could only support random access to 4-GIG because of the integer limitation for seeking to records from top (pos. 2-GIG) or bottom (neg. 2-GIG) of file. With the integer limitation removed with PERL 5.26.1, the sky is the limit perhaps. PERL 5.26.1 allows you to at least use positive byte offsets to 128-GIG relative from top-of-file. I am glad, because I never liked indexing from end-of-file backwards using a negative integer.
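The key-to-byte-offset scheme described above can be sketched roughly as follows. This is a Python illustration, not the poster's Perl: an in-memory dict stands in for the persistent DBM file, and the 32-byte record length, the `key|text` layout, the file name, and all function names are assumptions made for the example.

```python
import os
import tempfile

RECORD_LEN = 32  # assumed fixed record length in bytes, for illustration


def pack(key: str, text: str) -> bytes:
    """Format one fixed-length record: 'key|text' padded with spaces."""
    raw = f"{key}|{text}".encode("ascii")
    if len(raw) > RECORD_LEN:
        raise ValueError("record too long")
    return raw.ljust(RECORD_LEN)


def build_index(path: str) -> dict:
    """Scan the flat file once and map each key to its record's byte offset."""
    index = {}
    with open(path, "rb") as fh:
        offset = 0
        while True:
            rec = fh.read(RECORD_LEN)
            if len(rec) < RECORD_LEN:
                break
            key = rec.split(b"|", 1)[0].decode("ascii")
            index[key] = offset
            offset += RECORD_LEN
    return index


def read_record(path: str, index: dict, key: str) -> str:
    """Seek straight to the record's byte offset and read it."""
    with open(path, "rb") as fh:
        fh.seek(index[key])
        return fh.read(RECORD_LEN).decode("ascii").rstrip()


def update_record(path: str, index: dict, key: str, text: str) -> None:
    """Overwrite a record in place; offsets of other records do not move."""
    with open(path, "r+b") as fh:
        fh.seek(index[key])
        fh.write(pack(key, text))


path = os.path.join(tempfile.mkdtemp(), "parts.dat")
with open(path, "wb") as fh:
    for key, text in [("P100", "bolt"), ("P200", "nut"), ("P300", "washer")]:
        fh.write(pack(key, text))

index = build_index(path)
print(index["P300"])                     # 64
print(read_record(path, index, "P200"))  # P200|nut
update_record(path, index, "P200", "locknut")
print(read_record(path, index, "P300"))  # P300|washer
```

Because every record occupies exactly `RECORD_LEN` bytes, the in-place update leaves the stored offsets valid, which is the property the note above attributes to fixed-length records.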
One nice way to index a file, if the file is READ ONLY, is to sort the file into logical/physical groupings of records, then index only the first record in the group and store with the byte offset the number of records within that group. That way you can set the position of the file pointer to the first record in the group, then READ into memory a large block of contiguous fixed-length records (in one READ statement) and parse them into an array in memory for processing in a foreach loop: foreach $record (@RECORDS) {do some stuff...}.
I have built a demonstration database which has loaded to it 7500 copies of the KJV Bible (528 byte fixed-length records/verses, 31102 verses, 1189 chapters, 66 books / Bible copy) for a total of over 233 million verses/records in one flat file of size 114-GIG. I only index the first verse in each chapter (1189 chapters/Bible * 7500 Bible copies). I READ a whole chapter of verses into memory at a time, into an array.
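A rough Python sketch of the group-index idea described above (index only the first record of each group, store the record count with the offset, then fetch the whole group in one read). The 16-byte record length, the sample data, and the function names are assumptions for illustration; the post itself uses 528-byte records and Perl. The last lines simply redo the arithmetic from the figures quoted in the post.

```python
import io

RECORD_LEN = 16  # assumed small fixed record length; the post uses 528 bytes


def build_group_index(records):
    """records: list of (group, text) pairs already sorted by group.
    Returns (file_bytes, index), where index maps group -> (byte_offset, count)."""
    buf = io.BytesIO()
    index = {}
    for group, text in records:
        offset = buf.tell()
        buf.write(text.encode("ascii").ljust(RECORD_LEN))
        first_offset, count = index.get(group, (offset, 0))
        index[group] = (first_offset, count + 1)
    return buf.getvalue(), index


def read_group(fh, index, group):
    """Seek to the group's first record and read all of its records at once."""
    offset, count = index[group]
    fh.seek(offset)
    block = fh.read(count * RECORD_LEN)  # single read for the whole group
    return [block[i:i + RECORD_LEN].decode("ascii").rstrip()
            for i in range(0, len(block), RECORD_LEN)]


records = [("Gen 1", "v1"), ("Gen 1", "v2"), ("Gen 1", "v3"),
           ("Gen 2", "v1"), ("Gen 2", "v2")]
data, index = build_group_index(records)
fh = io.BytesIO(data)
print(index["Gen 2"])                  # (48, 2)
print(read_group(fh, index, "Gen 2"))  # ['v1', 'v2']

# Arithmetic from the figures quoted in the post
verses = 31102 * 7500         # records in the flat file
size_bytes = verses * 528     # 528-byte fixed-length records
index_entries = 1189 * 7500   # one index entry per chapter
print(verses, index_entries)          # 233265000 8917500
print(round(size_bytes / 2**30, 1))   # 114.7
```

With these numbers the index holds about 8.9 million entries for roughly 233 million records, which is the saving the grouping buys.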
My GUI database user-interface displays a single selected chapter of the Bible within a RichEdit widget. An entire Bible of Books and their Chapters are displayed in a TreeView widget (Books are parent nodes, and when expanded show the chapter child nodes) which the end-user selects from by double-clicking with the mouse upon a certain chapter. This is a great way to limit your end-users to small groups of records of a couple dozen or so.
[With SQL, an end-user may return 1000's of rows in a query. This slows down throughput greatly].
For Sequential scans of the databases, I limit my end-users to 1 copy of the Bible at a time (31102 verses/records), whichever copy they desire at that moment. I use REGULAR EXPRESSIONS to filter the 31102 verses per the user's selection criteria. Verses matching the criteria are displayed in a RichEdit widget, 1 chapter at a time. Search Word(s) matched are highlighted within verses. I employ the same TreeView widget, so that the user can pick a single Book/Chapter to view the matching verses. The end-user knows which Books/Chapters have matches because I employ a "Royal Blue Bible" bitmap image attached to Bible and Chapter nodes containing matches, whereas Books and Chapters containing no matches display a "Grayed-Out Bible" bitmap image. 66.97.144.8 (talk) 21:34, 8 May 2019 (UTC)
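The sequential scan with a regular-expression filter described above might look roughly like this in Python. It is a sketch only: square brackets stand in for the RichEdit highlighting, the set of chapter keys in the result stands in for the "blue" TreeView nodes, and `scan` is a hypothetical function name.

```python
import re


def scan(verses, pattern):
    """Sequentially filter (chapter, text) records; wrap matches in [brackets].
    Returns a dict mapping each chapter with matches to its marked-up verses."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = {}
    for chapter, text in verses:
        if rx.search(text):
            marked = rx.sub(lambda m: f"[{m.group(0)}]", text)
            hits.setdefault(chapter, []).append(marked)
    return hits


verses = [
    ("Gen 1", "In the beginning God created the heaven and the earth."),
    ("Gen 1", "And God said, Let there be light: and there was light."),
    ("Gen 2", "Thus the heavens and the earth were finished, and all the host of them."),
]
hits = scan(verses, r"\blight\b")
print(sorted(hits))      # ['Gen 1']
print(hits["Gen 1"][0])  # And God said, Let there be [light]: and there was [light].
```

Chapters absent from `hits` correspond to the grayed-out nodes in the description above.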
Origins of ISAM
In the 1960's NCR sold a mainframe known as the NCR-315. It had a data storage unit known as CRAM, which was an acronym for Card Random Access Memory. This had ISAM file software from day one. I don't have access to any NCR-315 manuals or other documentation. However I don't believe that IBM are the pioneers of ISAM files. NCR also manufactured passbook printers and they sold 315's to banks along with application software that allowed the bank to have an on-line, real time system for updating passbooks. This would not have been possible unless ISAM access was available.