Talk:Natural key

I moved the following from the article to here because it seems out of scope. These three paragraphs talk about a "well-designed SQL engine" storing natural keys efficiently, "file system SQL engines" -- of which MySQL is one -- as being inefficient, and natural key characteristics that also do not apply. Am I seeing something that's not there?

In a well-designed SQL engine, the values of a natural key appear only once in physical storage and they are referenced when they are used as foreign keys. The structural mechanisms for such referencing vary: Sybase SQL Anywhere uses pointer chains; the SAND engine uses compressed bit vectors.

Older SQL engines, which are based on file systems, physically repeat the value in the FOREIGN KEY column and depend on procedural mechanisms to keep them aligned. Treating each table as if it were a separate file, instead of implementing the schema as an interrelated whole, can make changing the structure of a key very expensive.
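As a rough illustration of the repeated-value case described above, the following sketch uses SQLite (through Python's standard sqlite3 module) purely as a convenient stand-in; it is not one of the engines named in the text. The table names, column names, and VIN-like strings are hypothetical. The key value is stored again in every referencing row, so changing it means rewriting each of those rows, done here with ON UPDATE CASCADE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: the natural key (vin) is copied into each child row.
conn.executescript("""
CREATE TABLE vehicle (
    vin TEXT PRIMARY KEY
);
CREATE TABLE repair (
    repair_id INTEGER PRIMARY KEY,
    vin TEXT NOT NULL REFERENCES vehicle(vin) ON UPDATE CASCADE
);
""")

conn.execute("INSERT INTO vehicle (vin) VALUES (?)", ("OLDVIN00000000001",))
conn.executemany(
    "INSERT INTO repair (vin) VALUES (?)",
    [("OLDVIN00000000001",)] * 3,
)

# Changing the key in the parent table forces every referencing row to change.
conn.execute(
    "UPDATE vehicle SET vin = ? WHERE vin = ?",
    ("NEWVIN00000000002", "OLDVIN00000000001"),
)

repair_vins = [row[0] for row in conn.execute("SELECT vin FROM repair ORDER BY repair_id")]
print(repair_vins)
```

With three referencing rows the cost is trivial; the point of the paragraph is that the same rewrite grows with the number of rows and tables that repeat the key.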

Natural keys need three characteristics.

Uniqueness. This is the definition of any key. It cannot be NULL because NULLs lack uniqueness.

Validation. In the relational model, attributes are scalar values drawn from a domain that has rules. For example, the value "2006-02-31" is outside the domain of valid dates. Check digits and regular expressions can guard against invalid key values.

Verification. The key has to identify a real entity whose existence can be verified. For example, does a particular VIN (Vehicle Identification Number) refer to a real vehicle?
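The validation step can be sketched as follows. This is a minimal example, not a complete validator: the function names are made up for illustration, and the check-digit rule shown is the one used for North American VINs (position 9), which does not apply to every VIN worldwide. It covers validation only; verification, meaning whether a real vehicle exists, needs an outside source.

```python
import re
from datetime import date

# 17 characters, excluding the letters I, O and Q.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")

# Transliteration of VIN characters to numbers for the check-digit sum.
_VALUES = {str(d): d for d in range(10)}
_VALUES.update(zip("ABCDEFGH", range(1, 9)))
_VALUES.update(zip("JKLMN", range(1, 6)))
_VALUES.update({"P": 7, "R": 9})
_VALUES.update(zip("STUVWXYZ", range(2, 10)))

# Position weights; position 9 (the check digit itself) has weight 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def is_valid_date(text):
    """Return True if text is an ISO date inside the domain of real dates."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def vin_check_digit_ok(vin):
    """Return True if vin matches the pattern and its check digit agrees."""
    if not VIN_PATTERN.fullmatch(vin):
        return False
    remainder = sum(_VALUES[c] * w for c, w in zip(vin, _WEIGHTS)) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(is_valid_date("2006-02-31"))              # False
print(vin_check_digit_ok("1M8GDM9AXKP042788"))  # True
print(vin_check_digit_ok("1M8GDM9A1KP042788"))  # False
```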

The preceding was submitted by Timhowardriley (talk) 18:35, 11 June 2008 (UTC).

The paragraph beginning "The main disadvantage of choosing a natural key" is confused and misleading. Jimgawn (talk) 15:03, 29 April 2010 (UTC)

——— The bulk of this article is incorrect. Consider the quote "The main advantage of a natural key over a surrogate key, which has no such logical relationship, is that it already exists; there is no need to add a new, artificial column to the schema."

There is no advantage or disadvantage "of a natural key". It is a logical property. There are advantages and disadvantages to the USE of a Natural Key for specific purposes. This article is discussing the use of a Natural Key *as a Primary Key in the author's favorite RDBMS*. In order to actually encourage people to understand what a Natural Key is, the bulk of the article should follow in the same vein as the first paragraph, and describe it in terms of the Relational Model. Consider http://en.wikipedia.org/wiki/Relational_model#Database and the quote "A candidate key is a unique identifier enforcing that no tuple will be duplicated; this would make the relation into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (which includes candidate keys) can be composite, that is, can be composed of several attributes."

The use of a Natural Key as a Primary Key in the author's favorite RDBMS is potentially worth discussing, but the way the article is written now implies that this is the *definition* of a Natural Key.

Bad example

US SSNs are a bad example, as duplicate SSNs do exist! If anything, that's an argument against the use of natural keys. Obviously if SSNs had started out in a relational database, this wouldn't have become a problem ;-) Ghiraddje 12:11, 24 October 2014 (UTC) — Preceding unsigned comment added by Ghiraddje (talk • contribs)
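To make the duplicate-SSN point concrete, here is a small sketch, again using SQLite through Python's sqlite3 module as an assumed stand-in. The table, the names, and the placeholder number are hypothetical. If the SSN is declared as the key, the database refuses the second real-world holder of the same number, so the second person cannot be recorded at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table that treats the SSN as the primary (natural) key.
conn.execute("CREATE TABLE person (ssn TEXT PRIMARY KEY, name TEXT NOT NULL)")

# "000-00-0000" is a placeholder, not a real SSN.
conn.execute("INSERT INTO person VALUES (?, ?)", ("000-00-0000", "First holder"))

try:
    # A second person who was issued the same number in the real world.
    conn.execute("INSERT INTO person VALUES (?, ?)", ("000-00-0000", "Second holder"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```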

I added a mention of the SSN example. Not necessarily "good" or bad in my opinion, but worthy of discussion anyway. Oradium (talk) 04:45, 25 September 2019 (UTC)