Re: Surrogate Key vs Production Key

From: Noons <wizofoz2k_at_yahoo.com.au>
Date: 14 Oct 2004 01:55:49 -0700
Message-ID: <73e20c6c.0410140055.2690e121@posting.google.com>

"Mark C. Stock" <mcstockX_at_Xenquery .com> wrote in message news:<RYGdnZkHtp-rsfDcRVn-og_at_comcast.com>...
> big difference -- perhaps the entire reason for apparent disagreement
>
> you appear to sometimes use 'natural key' to refer to an 'intelligent key',
> which is quite different from a natural key structure, that is, a composite
> (multi-column) primary key or foreign key, and sometimes use it to refer to
> a multi-column composite key

I referred from the start to surrogate keys. Period. There is only one definition of them: system assigned and guaranteed unique. That means there is NO preassumed algorithm or meaning to the key: the system simply guarantees that it will be unique, whatever form it may be asked in. And Codd zap the designer that gives them ANY arbitrary meaning. That is a surrogate key and how it is defined everywhere.

Searches for both "natural" and "surrogate" keys return exactly what I'd expect in this context. If someone is talking about 'intelligent' keys now, whatever that may be, then it's probably a good idea to define them first. I certainly won't attempt to.

>
> you also appear to use the term 'surrogate key' as a synomym for
> 'system-assigned', which it is not.

No I don't.

> system-assigned key may very well be a component of a natural key structure
>

I doubt it.

> hopefully properly analyzed. That analysis will not happen if the data
> modeler simply grabs a data values that is currently used in the business as
> a unique identifier without determining how that value is established and if
> it needs to be decomposed, or if that value is simply used as an attribute
> and a system-assigned key is used as a surrogate key (which may not be a bad
> thing, the point is, the analysis of the business's current practices must
> be peformed)

Sure it must be performed! The use of a surrogate key does not mean abandoning normal analysis techniques, nor has it ever meant that. However, I also draw the line at attributing any inherent structure and meaning to a natural key, decomposed (!) or not.

> | Why? The conceptual model has NOTHING to do with implementation
> | details. Absolutely nothing.
> |
> |
>
> Data Modeling 101: the conceptual model precedes the implementation
> (referred to sometimes as the logical model or the physical model), and
> implementation details are compromises made to apply the conceptual model to
> a particular technology or environment. The relationship between the two
> models is essential to good design.

Exactly. And still, the conceptual model has NOTHING to do with the implementation details, like I said. I did not say that the implemnetation details had nothing to do with the conceptual model. That is a FUNDAMENTAL difference. One is dependent on the other, but that is not reversible.

> My comments were specific to the data modeling phase (conceptual model) but
> discovery does continue during initial implementation as well as ongoing use
> of the data. No analyst, and no data model, perfectly captures every nuance
> of a business at the first take -- new discoveries typically clarify earlier
> incorrect, arbitrary or misconceived ideas.

There is a recommendation then to use surrogate keys if I ever saw one. If you can't be 100% sure you have discovered all the nuances and possible relationships, how can you then define your tables with a potentialy complex relationship tree with natural keys? How can you now use PKs based on those natural keys in extensive FK/PK trees? What will happen to the FKs first time you need to change the PK because your analysis discovered a hidden natural key?

> application, not peak during the initial analysis. (I know that last comment
> leaves me open to criticism of doing insufficient analysis, but the careful
> reader will see that my point is that more will always be learned as the
> application is used and evolves, new observations and discoveries are not
> automatically incorrect).

There is no such criticism. No one can expect a prescient knowledge about a complex system of interactions. The maths alone to allow for that, would make most systems not viable economically. What is expected however is a flexible design that can be adapted with minimal impact to new discoveries about that system. And surrogate keys lend themselves perfectly to that.

>
> since i presented a hypothetical case, i covered two possible ways the
> business rule could be viewed by the sponsoring organization. in real life,

You see, this is where I disagree with all the arguments for the use of
natural keys. There is not ONE example put forward by their defenders that doesn't start with some assertion involving "poor", or "weak", or "incorrect" or words to that effect. Which immediately presumes that there should be an easy proof of the superiority of natural
keys.

Yet when I follow the logic exposed, WITHOUT exception, a fault crops up that makes the whole example inconsistent with the exclusive use of natural keys. And when I point this out, again invariably the reply is: it is only hypothetical.

Well, if even hypothetically no one can provide me with a concrete example that simply demonstrates: "here, you cannot do this with surrogates, that is why you should be using natural keys", I'm sorry but I remain unconvinced.

> i think it's quite concrete that using a column as an FK to a parent table
> and part of a multi-part FK to a sibling table maintains the sibling part of
> the relationship quite efficiently -- using surrogate keys would require
> additional code

No it most definitely does not! You see, once again a complete mis-representation of surrogate keys crops up. You have EXACTLY and PRECISELY the same code. If anything, a surrogate key will require a SINGLE column as a FK, not a complex and potentially extensive multi-part FK. If anything, it is PRECISELY the surrogate key that will yield the less complex code.

> and, more important, preserving this key structure in the conceptual model
> supports more thorough analysis of the business problem

Well, there is where we disagree: I don't think keys have anything to offer analysis. It's the other way around: analysis is supposed to provide the keys. If it can't in an effective manner, then it's time for surrogates.

> if the early model is 'test-driven', it becomes very apparent that if an
> employee has been assigned to department 10, and a vehicle 37 is assigned to
> department 14, the employee cannot ever be assigned to vehicle 37, since the
> assigned department does not match the vehicle department -- and that just
> requires FK constraints (conversely, if i try to reassign the employee to
> another department, i am prevented until the current vehicle assignment is
> removed)

Not really. The foreign key contraints cannot enforce that UNLESS you make the assumption that the dept_id part of dept_id||pers_id above (multi-part fk to employee) will be the same as the dept_id part of dept_veh_id||veh_id (multi-part fk to department_vehicle). And there is no referential integrity syntax anywhere that will allow you to explain that two part FK columns must be equal. The syntax for referential integrity only has to do with PKs and FKs. Check constraints are NOT part of referential integrity. If they were, then of course you could use a constraint to say that dept_id = dept_veh_id. But then you can do EXACTLY the same with surrogate keys:
dept_sur_id = dept_veh_sur_id.

>
> perhaps you'd care to provide an example of implementing this scenario with
> surrogate keys?

Just did. You cannot enforce the previous example using natural keys with just FK/PK relationships. You have to use a check constraint. And that is not exclusive to natural keys. ANY column cna be involved in a check constraint: those are data integrity constraints, as opposed to referential integrity ones.

> maybe we agree after all, to a point. but if you advocate always using
> surrogate keys for all tables, then you end up with a very broad, and
> arbitrary, divergence between the conceptual model and the implementation --
> and restrict yourself from employing some very useful data modeling
> techniques

I can't see why.

>
> that's the point -- but the reasons for the compromises should also be
> documented, and there should by no IFs about having a conceptual data model

100% agreed. Never said there wouldn't.

> thanks for the post -- i was having trouble waking up this morning and this
> got my mind moving.
>

:)
Welcome to my life! Received on Thu Oct 14 2004 - 03:55:49 CDT