Talk:Data model/Archive 1
This is an archive of past discussions about Data model. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Evaluation/Goals for a Data Model
(proposed section)
General goals for data models (paraphrased from "Data Models" by Tsichritzis & Lochovsky, p. 206):
- A data model should be able to specify some of its own constructs (this measures the "descriptive power" of a particular data model).
- A small subset of a data model should be able to describe a much larger subset of that data model.
- This way properties of the data model subset can be verified and propagated to the full data model.
- Changes in the data model should be specifiable with the data model itself.
- Most importantly, if a data model can describe itself, it can also be used as a metamodel to describe other data models.
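As a rough illustration of the self-description goal above, here is a minimal sketch of a toy model whose small core describes its own constructs. It is not taken from Tsichritzis & Lochovsky; the names `EntityType`, `CORE`, `describe` and `conforms` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """A construct of the toy model: a named type with named attributes."""
    name: str
    # Maps attribute name -> name of the attribute's type.
    attributes: dict[str, str] = field(default_factory=dict)


# The small core subset (an assumption of this sketch): two entity types
# that are enough to describe entity types themselves.
CORE = {
    "EntityType": EntityType("EntityType", {"name": "String", "attributes": "Attribute*"}),
    "Attribute": EntityType("Attribute", {"name": "String", "type": "String"}),
}


def describe(entity: EntityType) -> list[dict]:
    """Express an entity type as plain records shaped like the core 'Attribute' type."""
    return [{"name": n, "type": t} for n, t in entity.attributes.items()]


def conforms(records: list[dict], schema: EntityType) -> bool:
    """Check that every record has exactly the attributes the schema declares."""
    return all(set(r) == set(schema.attributes) for r in records)


# The core describes itself: its own definitions are valid 'Attribute' records.
self_description_ok = all(
    conforms(describe(e), CORE["Attribute"]) for e in CORE.values()
)
```

Because the core can be written down in its own terms, the same `describe`/`conforms` pair can also be applied to other models (for example a hypothetical `Customer` entity type), which is the metamodel use mentioned in the last goal.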
Examples of Theories and Instances
Can someone please list a few names of existing "Data model theories"?
Remarks
I have the impression that the whole area of data model, database model and database management system might need to be made clearer and restructured. The practical use of these terms is often unclear and leads to improper usage.
Proposal:
Database model: everything related to the general design/concepts of how data can be structured (relational, hierarchical, network, object-oriented, ...)
Database Management Systems: technical aspects of the implementation of database models (e.g. languages like SQL, indexing, ...)
Data model: aspects of designing a specific application (Entity-Relationship Diagram, UML, ...)
Database: the implementation/instance of a data model
... --Udo Altmann 15:15, 17 Oct 2004 (UTC)
Wrong examples of two meanings?
I'm not sure I agree with the examples. They seem to imply that the relational model is a data model in the first meaning and that the ER model, FDM, UML and ORM all fall within the second meaning, but that would not be correct. All are examples of data models in the first meaning and in both models you can describe data models that are specific for a certain application. -- Jan Hidders 11:52, 18 Oct 2004 (UTC)
- It is correct, that the relational model is a database model. The ER model is part of design methods used to create a domain specific data model especially for relational databases. I'm not sure about the exact meaning of functional data model and object role model that the previous authors of this article had in mind since terminology depends on opinion leaders. I think it is better to make an extra section for design tools. --Udo Altmann 13:35, 18 Oct 2004 (UTC)
- ?? Where did I say that the RM is not a database model? The point is that the RM, FDM, ORM, UML class diagrams, et cetera are all data models in the same sense of the word, i.e., they are not application specific and give you a notation to model data. Of course the RM is closer to the physical level than the ERM and therefore less abstract, but they are definitely both data models. But the term is also often used to refer to a specific data model for a certain application, organization or business (as in "the data model of my company"), which is then written down in the notation of either ORM, UML, IDEF1X or whatever. From this point of view the ERM is actually a meta-model because it gives a model of how your application-specific models look. (But the terms meta-model and meta-modelling already have another well-established meaning.) So those should be the two meanings that are discussed on this page, but you seem to identify the first non-application-specific meaning of data model with that of "database model", which is not correct because not all data models are database models. The ORM and FDM data models are well-known and well-defined data models that can be readily found in the relevant scientific literature. Why you claim that the interpretation of these terms depends upon opinion leaders is a mystery to me. -- Jan Hidders 12:04, 19 Oct 2004 (UTC)
- A data model is not necessarily a database model. For instance, IDEF1x is most certainly a data modeling paradigm, but there is no database management system that implements it as the core representation of data in its database engine. Data modeling is, by its nature, an abstraction. We don't have to have a database engine that implements the data model paradigm for it to be considered a true, and useful, data modeling paradigm. KeyStroke 14:19, 2004 Oct 19 (UTC)
- Well I didn't say that Jan Hidders said that the RM is not a database model - I agreed with him. But nevertheless. I think it is important to distinguish between the model how a database management system works, the idea how data structures can be defined - relational, network, hierarchical - and the methods to design a specific model. As you say, ORM and UML are notations and so I would say is the ERM Method. So we have the methods in which (data) models can be defined - the meta-level (?) - and for example the concrete entity-relationship diagram (sometimes also called entity-relationship model; it is clear, a pure diagram does not contain all details about the data definitions of an application). But meta-level is not meant as "is-a" relationship, so I don't say that data models are database models and I don't say that concrete entity-relationship models are specializations of the entity-relationship model method. It's simply the relationship between methodology and applying a methodology. This is, what I wanted to clarify - hope you can agree. --Udo Altmann 14:30, 19 Oct 2004 (UTC)
- I think you are over-analyzing this. The "common vernacular" when referring to what we are talking about here is "data models". If I were to lay a diagram from ER-Win in front of an experienced data modeler and ask him/her "What is that?" the answer would be "a data model". The answer would not be "the result of applying the data modeling methodology". Remember, K.I.S.S. Let's not get overly academic about this. KeyStroke 15:37, 2004 Oct 19 (UTC)
- Sure. I just felt a little misunderstood and wanted to clear that point. The only point which we should discuss about is, whether the article should be improved and if yes how. In my opinion it is not too much academic. --Udo Altmann 16:13, 19 Oct 2004 (UTC)
- There was indeed a misunderstanding on my behalf and I think I now know what caused this, so allow me to overanalyse this even more. :-) There are in fact two important distinctions that need to be made when talking about data models and I thought Udo was talking about another one than he really was. The first distinction is the one between a data modelling paradigm (such as the ER model) and a concrete instance of that paradigm (such as a certain ERD). Both are often called "data models" and I thought this is what Udo meant because he talked about a "domain specific model" and "data of a specific application". Note, by the way, that this distinction is important enough for it to be discussed by Chris Date in his "An Introduction to Database Systems". But what Udo apparently meant was the distinction between more concrete data models (such as the RM) and more abstract data models (such as the ER model). Although I certainly agree that this distinction should be made, I also claim that it is important to realize that this is a gradual distinction and not a fundamental one. As a case in point look at the Functional Data Model (FDM) which was originally introduced as a database model but is in fact at a higher abstraction level than the RM and now is mostly used as just a data model. Another example is ORM (or NIAM which actually is a method and not just a data model) which is not just a notation but has a well-defined formal semantics and could easily be used as database model, and finally the database models that were proposed for object-oriented databases that were at the same time concrete (because database models) and abstract (because at a higher abstraction level than the RM). So, Udo, do you agree with this analysis of the situation and that this should be explained better in the article? --- Jan Hidders 17:25, 19 Oct 2004 (UTC)
- So I myself have to admit that I had somehow a too narrow definition of data model in my mind since I thought mainly of the data aspect (I think many people do so since this is the persistent aspect). In fact some methods have a specific part dedicated to data modelling. I think we could agree that the article has to make clear the distinction between the datamodelling methods/paradigms and the instances. References could be made to the most popular and well-known paradigms like relational, hierarchical, ... since these are the theoretical foundation of Database management systems (although theory often not completely implemented). In this sense they are models for databases (database models). I'm not sure, whether the version of September 12th is the better starting point or the current version. --Udo Altmann 20:29, 19 Oct 2004 (UTC)
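The distinction the thread converges on (a data modelling paradigm, a concrete instance of that paradigm, and the database that implements the instance) can be sketched in code. This is only an illustrative toy, not any participant's proposal; the classes `Entity` and `Relationship`, the `order_model` example and both helper functions are hypothetical names invented here.

```python
from dataclasses import dataclass


# Paradigm level: the constructs a (toy) entity-relationship notation offers.
@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple[str, ...]


@dataclass(frozen=True)
class Relationship:
    name: str
    source: str
    target: str


# Instance level: one application-specific data model written in that notation.
order_model = {
    "entities": [
        Entity("Customer", ("id", "name")),
        Entity("Order", ("id", "date")),
    ],
    "relationships": [Relationship("places", "Customer", "Order")],
}

# Database level: data stored according to the application-specific model.
rows = {
    "Customer": [{"id": 1, "name": "Ada"}],
    "Order": [{"id": 7, "date": "2004-10-19"}],
}


def relationship_ends_exist(model: dict) -> bool:
    """True if every relationship refers only to entities declared in the model."""
    names = {e.name for e in model["entities"]}
    return all(r.source in names and r.target in names for r in model["relationships"])


def rows_match_model(model: dict, data: dict) -> bool:
    """True if every stored row has exactly the attributes its entity declares."""
    declared = {e.name: set(e.attributes) for e in model["entities"]}
    return all(
        name in declared and all(set(row) == declared[name] for row in entity_rows)
        for name, entity_rows in data.items()
    )
```

In this sketch the two classes play the role of the paradigm, `order_model` is what a practitioner would point at and call "the data model", and `rows` corresponds to the database in Udo Altmann's proposal.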
Data Model and Data model
What about http://en.wikipedia.org/wiki/Data_Model and http://en.wikipedia.org/wiki/Data_model ? —The preceding unsigned comment was added by 82.200.65.190 (talk • contribs) 2006-03-23 10:11:42 (UTC)
- Merge. How the heck did that happen? Jon Awbrey 19:04, 23 March 2006 (UTC)
- Merge. "Data Model" was created by an anon who was working on another article and apparently didn't know naming conventions. The Rod (☎ Smith) 22:42, 23 March 2006 (UTC)
Marketing Material
The External Links section looks like an advertisement.
"Swiss based ECOFIN is a leading company providing financial solutions...."
152.119.41.164 13:19, 20 September 2006 (UTC) Greg
Commented out the ECOFIN content that looks like blatant advertising. Please provide authoritative reference or an explanation for why this should not be permanently deleted from the article. Dreftymac 05:50, 11 October 2006 (UTC)
from the Data Model article
A data model is a concrete representation of an information model. It represents the entities, properties, relationships and operations defined in an information model in a manner that allows actual instances of those entities to be managed, manipulated, stored, operated upon and verified.
-- phoebe 03:58, 22 November 2006 (UTC)
Check the article
Please check the article. I reversed some content due to vandalism. —The preceding unsigned comment was added by 194.117.36.2 (talk) 14:59, 28 February 2007 (UTC).
Merge datamodeling here in data model
This must seem like an odd merge proposal, because the current data model article here is no more than a stub. Now I am not proposing a full merge. I would like to:
- merge most content (about datamodels) in the datamodeling article here
- and specialize the datamodeling article really on data modelling
At the moment both articles seem to be all about datamodels, and I think this could be improved by specializing both articles more. -- Marcel Douwe Dekker (talk) 13:37, 7 October 2008 (UTC)
- I started merging both articles and have improved this article some more. For now three sections need to be created/improved: history, database models and data modeling. I think I will first try to improve those articles some more, before I get back here. -- Marcel Douwe Dekker (talk) 00:09, 8 October 2008 (UTC)
View model
I am adding corrections to the articles that are in support of my day job, and as the next few months I am going to be responsible for reconciling the terminology between two ISO standards, one of them being the ISO/IEC 42010 IEEE Recommended Practice for Architectural Descriptions of Software-Intensive Systems, I am willing to put some time into the models and viewpoints.
- I looked at the view model. My first impression was that this article is about the models of user interface, in other words, that it is related to the model-view-controller architecture pattern. I like the content of the article, with one exception. Do you think, the section about perspective and projection in maps is less directly related to models? It looked somewhat anomalous, it is only an analogy (maybe a useful one).
- I do not think, there is a view model in the same sense as e.g. a data model. The concept of a view belongs to a higher meta-level as it provides some organization to models.
- Here is what IEEE 42010 says:
- In the conceptual framework of this recommended practice, an architectural description is organized into one or more constituents called (architectural) views. Each view addresses one or more of the concerns of the system stakeholders. The term view is used to refer to the expression of a system’s architecture with respect to a particular viewpoint.
- NOTE—This recommended practice does not use terms such as functional architecture, physical architecture, and technical architecture, as are frequently used informally. In the conceptual framework of the recommended practice, the approximate equivalents of these informal terms would be functional view, physical view, and technical view, respectively.
- Other information, not contained in any constituent view, may appear in an AD, as a result of an organization's documentation practices. Examples of such information are the system overview, the system context, the system stakeholders and their key concerns, and the architectural rationale.
- A viewpoint establishes the conventions by which a view is created, depicted and analyzed. In this way, a view conforms to a viewpoint. The viewpoint determines the languages (including notations, model, or product types) to be used to describe the view, and any associated modeling methods or analysis techniques to be applied to these representations of the view. These languages and techniques are used to yield results relevant to the concerns addressed by the viewpoint.
- An architectural description selects one or more viewpoints for use. The selection of viewpoints typically will be based on consideration.
- What do you think?
Equilibrioception (talk) 03:43, 13 January 2009 (UTC)
- I have copied this question to the Talk:View model#View model page and have given a response over there. -- Marcel Douwe Dekker (talk) 21:12, 13 January 2009 (UTC)
Distinction between structure and function in relation to data
I suggest we separate models that define the structure of data (this article) from models that define process or function (see function model, process model, business model and enterprise model). I would also distinguish between the terms 'model' and 'diagram' or 'notation', following the ISO distinction between 'meaning' and 'expression'. Data Flow Diagram is a particular notation for a model that describes a function involving some transformation or movement of data. It complements data models, but is not a data model itself. Therefore I took the liberty to move the section on DFD to 'related models' and added the link to 'function model' in the introductory paragraph. Equilibrioception (talk) 22:32, 10 January 2009 (UTC)
- Thanks for the move. I had already realized this for this article, based on these arguments:
- A data model is defined as a model that describes how data is represented and accessed.
- And a Data flow diagram doesn't fit this definition.
- A Data flow diagram shows how data flows through an enterprise.
- The "flow of data" is a subject beside the "representation and access", so both models are complementary.
- I guess I agree with your other suggestions. I don't know if you noticed but I have been rewriting all of these five articles (data model, function model, process model, business model, enterprise model) recently, and the database model and view model and a lot more. In fact I created the function model and view model, and recreated the business process modelling and database model articles to get a clear division of "basic Software engineering models" in Wikipedia. One way or another these models all seem to come together in the Enterprise Architecture Frameworks.
- Back to this article, I noticed one other thing, and I don't know yet what to think of it. Now the Entity-relationship model is listed as a Database model. I wonder if it shouldn't be listed as a separate type of data model. What do you think?
- -- Marcel Douwe Dekker (talk) 22:55, 10 January 2009 (UTC)
- I just read some other text by William Olle (1996) here:
- The term "data model" has been the source of confusion. It is most widely used to refer to a model for a specific business area (order processing, insurance claims, airline seat reservations) prepared using to a data modelling technique. Unfortunately, the term "data model" was hijacked in the early seventies and used in the sense of "the network data model, the relational data model and the hierarchical data model"... This use of the term has been widely taught and causes confusion whenever one is in a group which needs to reference both interpretations...
- I wonder if this confusion is also in this article. Does, for example, the listed Geographic data model classify as a data model? And isn't the Data Structure Diagram a type of Entity-relationship model? -- Marcel Douwe Dekker (talk) 23:29, 10 January 2009 (UTC)
- I agree with your approach. I noticed the lists of articles at your user page (very impressive, by the way) - this can be very useful to provide an overview for the e.g. entire field of models, which can make a difference in the uniformity of coverage. I agree with you that there is some confusion in the usage of the concepts data model and even data structure. Equilibrioception (talk) 03:43, 13 January 2009 (UTC)
- My approach leaves me with a lot of unanswered questions, which gives me reason to proceed. I am not sure I understand your suggestions about provide "an overview for the e.g. entire field of models". However, this is or maybe has been one of my prime objectives, and this is why I started the scientific modeling article 3 years ago.
- Creating uniformity in a Wikipedia article is called "Wikification" here. Creating uniformity in the coverage of any subject in Wikipedia is another ballgame of creating, merging and deleting articles. Two months ago I for example proposed to merge the Logical data model, Logical schema and Semantic data model.
- I still wonder if the Entity-relationship model should be listed as Database model or not. What do you think?
- -- Marcel Douwe Dekker (talk) 23:10, 13 January 2009 (UTC)
- By "an overview for an entire field" I meant the list of related articles, such as ones at your user page, or the topics of the Wikipedia, like list of mathematics lists or topic overviews, for example topic outline of information science. Regarding the Entity-Relationship Model, in my opinion it is definitely a Data Model. I will add it to the list of "see also" links for Data Model.
- -- Equilibrioception (talk) 14:28, 25 January 2009 (UTC)
- Thanks, but I think you missed my question. At the moment the Entity-relationship model is already listed in the Types of data models/Database model, as the seventh item. My question is whether this is right or wrong.
- As I mentioned, in my opinion the Entity-Relationship model does belong to the Data Model article, because it represents a certain kind of a Data Model.
- -- Equilibrioception (talk) 16:07, 26 January 2009 (UTC)
- As to that overview of the field of modelling, the scientific modeling gives such an overview, or not? -- Marcel Douwe Dekker (talk) 23:17, 25 January 2009 (UTC)
- Yes, it does a very good job of providing an overview to the field. I'll make few suggestions at the Talk:Scientific modelling page. Given your interest in models and modeling, what are your thoughts about proposing a Wikiproject on Modeling as a subproject under Computing ?
- -- Equilibrioception (talk) 16:16, 26 January 2009 (UTC)
- Ok thanks. I don't know about such a subproject. I am more of an systems engineer, and that is why I initiated the WikiProject Systems. I created a separate field of scientific modelling within this project, because of my own interest. But there is not much more movement here. -- Marcel Douwe Dekker (talk) 22:32, 26 January 2009 (UTC)
Data Models in Telecommunications and Networking
Data Models, i.e. formal approaches to describe how data is represented and accessed, are one of the key topics for telecommunications and network protocols.
- One of the key International Standards in this area is ASN.1, defined in 1984 by ITU-T. ASN.1 is to my knowledge one of the first formal models of data. The emphasis of ASN.1 is efficient encoding and cross-platform interoperability, and it was the dominant standard in the pre-XML days. It is integrated in Z-series specification languages, standardized by ITU.
- the current article on ASN.1 is ok, but can be improved
- Another standard related to organization of data in the interoperability context is the CORBA IDL, part of the CORBA specification by OMG. CORBA IDL is the foundation for several OMG specifications.
- the current article on CORBA IDL is inadequate
- ISO has another standard related to data organization, called ISO/IEC 11404 Language-Independent Data Types
- A very important standard for data model interoperability is called OMG Common Warehouse Metamodel (CWM). This is also an ISO standard, known as ISO/IEC DIS 19504. It is part of the ISO Open Distributed Processing (ODP) Stack
- the current article on CWM is inadequate
All of the above standards address complex data types and references (aka foreign keys). Some fruit for thought.
-- Equilibrioception (talk) 04:31, 5 February 2009 (UTC)
- Interesting links. I wonder how this can be integrated into the article? At first I thought, you can add this whole section (without these secondary remarks) to the article in a new "Data models in Telecommunications and Networking" section. This paragraph could be added to the "types of data models" section, because it is about one type of "data model" and not about "related models".
- On the other hand these standards seem rather applied, and it seems to me there must be other standards like this in related fields as well. The current "types of data models" section is more about theoretical data models. So maybe with the texts about these applied standards we could create a separate "application" section...!? What do you think? -- Marcel Douwe Dekker (talk) 12:27, 5 February 2009 (UTC)
- There exists a useful categorization of data based on its use within an enterprise system:
- data at rest - aka persistent data in a database or in the data repository
- data in use - data that is being used by an information system, usually for the purpose of presenting it to the user, and
- data in transit - data in transactions, also data in events, messages, in the network packages, etc.
- This is a rather neat way of looking at data, isn't it? When it comes to information assurance, each of the above categories requires specific techniques. So, this categorization may be the way to categorize data models. Obviously my notes were specifically focused on the 'data in transit'. Now, 'data in use' is mostly about the data structures in programming languages. And it is related to the "user interface models".
- Let me trace the exact origins of this categorization (must be somewhere in one of the enterprise architecture framework, like TOGAF, UPDM, etc.)
- I found (only) two sources mentioning this
- Keith D. Willett (2008) in Information Assurance Architecture, page 159, in Chapter 8.3.3.2 Data State explains: "Data states are at rest, in transit, and in use. Data at rest is on a permanent storage medium... Data in transit refers to data traversing a network... Data in use is in the virtual storage..."
- Fred Cohen (2007) in IT Security Governance Guidebook with Security Program Metrics on CD-ROM, p.189 in a paragraph on the structure of information protection states: "Users tend to deal with data life cycles and information at rest, in use, and in transit, and are subject to organizational effects, mandates, and awareness..."
- From a Wikipedia point of view none of these concepts (data at rest, data in use, data in transit) seems that notable. An overview article like data model is not the place to introduce such new concepts. -- Marcel Douwe Dekker (talk) 08:46, 6 February 2009 (UTC)
- OK, it makes sense; these could be security-related concepts. In that case, they may not add much value to data models.
- However, there are quite a few uses of these concepts 'in the wild'. Nothing earth-shattering, I agree, but they may be worth considering in the Computer Security project.
- In particular, data at rest seems a common term.
- and so on.
- I guess you are right, these concepts are more data security and/or information security related. -- Marcel Douwe Dekker (talk) 19:33, 6 February 2009 (UTC)
Section "Method related models" removed
I removed the "Method related models" section, which stated:
- A lot of the existing data modeling methods, software development methodologies, and other modeling languages in the field of computer science have defined their own type of models.
I think this statement is confusing. -- Marcel Douwe Dekker (talk) 19:24, 8 March 2009 (UTC)
The "Zachman Framework" section removed
I removed the following section from the article
- In an alternative framework, called the Zachman Framework, a data model instance may be one of six kinds (according to John Zachman, 1987, 1992, 2005, 2007):
- a contextual data model (list) identifies entity classes (representing things of significance to the organization).
- a conceptual data model (semantics) defines the meaning of the things in an organization. This consists of relationships (assertions about associations between pairs of entity classes).
- a logical data model (schema) describes the logic representation of the properties without regard to a particular data manipulation technology. This consists of descriptions of the attributes (the role a data element plays in relation to the thing (entity) it represents).
- a physical data model (blueprint) describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.
- a data definition (configuration) This is the actual language coding of the database schema in the chosen development platform.
- a data instantiation holds the values of properties applied to the data in the schema.
- The significance of this approach, according to John Zachman, is that it allows the six perspectives to be relatively independent of each other and have different contributors, audiences and purposes. In each case, of course, the structures must remain consistent with the other model instances although the details change. The table/column structure may be different from a direct translation of the entity classes, relationships and attributes, but it must ultimately carry out the objectives of the contextual entity class structure and conceptual relationship structure. Zachman regards each perspective as a separate and distinct vantage point of the data; his view is not a methodology but rather a way of classifying the parts. However, development projects and software tools often proceed from the contextual list to the conceptual data model, followed by the logical data model. In later stages, when the data platform is known (whether it be database software or filing cabinets), this model may be translated into a physical data model followed by the data definition. When the database actually stores values and is operational, data manipulation can take place.
I think this section doesn't explain the Zachman Framework and its relation to data model, and data modeling. I also can't find the listing given here in the Zachman 1987 article.
-- Marcel Douwe Dekker (talk) 19:34, 8 March 2009 (UTC)
Renewed first image
The removal of the first image here reminded me that the image had to be redrawn and the caption had to be improved. So I did both and re-added the image (with the data modelling part highlighted), because I do think this image is particularly appropriate in that article. It gives an overview of both the data modelling process and the context of enterprise modelling. -- Marcel Douwe Dekker (talk) 13:02, 2 July 2009 (UTC)