Trying to define surrogates

From: JOG <jog_at_cs.nott.ac.uk>
Date: 17 Aug 2006 04:51:00 -0700
Message-ID: <1155815460.834114.124680_at_i3g2000cwc.googlegroups.com>


Bob Badour wrote:
> I disagree that the concept of surrogate vs. natural is useful.

Ok, I've had time to digest this now, and I have to say that I /do/ believe the distinction can be important, and I think your interpretation is slightly awry. Let me explain:

Bob Badour wrote:
> It is a surrogate for whatever a surrogate key is for. Think of any
> natural key. How is it not a surrogate?

OK, a surrogate is a substitute for something. That's agreed.

> My name is not me. It is an arbitrary identifier chosen by my parents.
> It is familiar because I was conditioned from an early age to respond to it.

I /strongly/ contest that your name is a surrogate. Your name is a 'label' applied to you; it is not a 'substitute' for you. I know it is a subtle distinction, but it is important. (n.b. these are not really my deductions but a regurgitation of the writings of William Kent, highly rated by people such as Date.)

> My SSN is not me. It is an arbitrary identifier chosen by the IRS to
> identify tax filings related to my income. It is familiar because I was
> given a little blue card with it inscribed, and I was instructed to
> transcribe it to a variety of documents.

Again, an SSN is a label applied to you, just in a different context, and not a substitute for you. The same goes for the other examples supplied.

> [snip]
> I am not suitably represented for machine processing.

Agreed, but the labels applied to you /are/ suitable for machine processing. Hence we don't need to provide any substitutes for them - they can go straight into propositions and ultimately the database.

However, there are some identifiers for which we do not have suitably formatted labels: attributes that are currently not easy to enter into a proposition, fingerprints for example. Some attributes might not be easily recordable even though we know they exist. These are attributes not suitable for machine processing.

Hence we 'substitute' that key with a different artificial key we have generated, to act as its representative. That is what surrogacy is.
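To make that concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the Fingerprint class, the surrogate_for function and the sample values are all made up for the example, and a real DBMS would generate its keys quite differently. A label such as an SSN goes straight into a proposition, whereas the fingerprint, having no convenient label, gets a generated key to stand in for it.

```python
import itertools


class Fingerprint:
    """Hypothetical stand-in for an identifying attribute that has no
    machine-friendly label we could write into a proposition."""


_next_key = itertools.count(1)   # source of generated (artificial) keys
_assigned = []                   # (thing, surrogate key) pairs handed out so far


def surrogate_for(thing):
    """Return the generated key that stands in for `thing`.

    The same object always gets the same key; a new object gets a new one.
    """
    for known, key in _assigned:
        if known is thing:
            return key
    key = next(_next_key)
    _assigned.append((thing, key))
    return key


# A label that is already suitable for machine processing is used directly.
# The SSN value here is invented for the example.
tax_filings = {("000-00-0001", 2005)}

# Fingerprints have no such label, so each one is represented by a surrogate.
left_thumb = Fingerprint()
right_thumb = Fingerprint()
found_at = {
    (surrogate_for(left_thumb), "scene A"),
    (surrogate_for(right_thumb), "scene B"),
}
```

Note that the generated integer is substituting for the unrecordable attribute, not for the person or thing itself.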

The blur comes in that this is only really useful at design time, because as soon as the attribute is used externally, it becomes a new natural key. Hence I agree that after the fact the distinction is not useful, but a priori it is important to /understand/ exactly what's going on as it eliminates any foolhardy temptation to try and hide such attributes and violate the information principle.

Received on Thu Aug 17 2006 - 13:51:00 CEST
