Re: Trying to define Surrogates
Date: Thu, 17 Aug 2006 16:39:10 GMT
Message-ID: <OY0Fg.50708$pu3.588455_at_ursa-nb00s0.nbnet.nb.ca>
JOG wrote:
> Bob Badour wrote:
>
>>JOG wrote:
>>
>>
>>>[apologies for the cross posting].
>>>
>>>Bob Badour wrote:
>>>
>>>
>>>>I disagree that the concept of surrogate vs. natural is useful.
>>>
>>>Ok, I've had time to digest this now, and I have to say that I /do/
>>>believe the distinction can be important, and I think your
>>>interpretation is slightly awry. Let me explain:
>>>
>>>Bob Badour wrote:
>>>
>>>
>>>>It is a surrogate for whatever a surrogate key is for. Think of any
>>>>natural key. How is it not a surrogate?
>>>
>>>o.k., a surrogate is a substitute for something. That's agreed.
>>>
>>>
>>>
>>>>My name is not me. It is an arbitrary identifier chosen by my parents.
>>>>It is familiar because I was conditioned from an early age to respond to it.
>>>
>>>I /strongly/ contest that your name is a surrogate. Your name is a
>>>'label' applied to you, it is not a 'substitute' for you. I know it is
>>>a subtle distinction but it is important. (n.b. these are not really my
>>>deductions but a regurgitation of the writings of William Kent, highly
>>>rated by people such as Date.)
>>
>>I respectfully suggest that no important distinction exists between a
>>label and a surrogate as used in the context of candidate keys.
>>
>>
>>
>>>>My SSN is not me. It is an arbitrary identifier chosen by the IRS to
>>>>identify tax filings related to my income. It is familiar because I was
>>>>given a little blue card with it inscribed, and I was instructed to
>>>>transcribe it to a variety of documents.
>>>
>>>Again an SSN is a label applied to you just in a different context, and
>>>not a substitute for you. Same for the other examples supplied.
>>>
>>>
>>>
>>>>[snip]
>>>>I am not suitably represented for machine processing.
>>>
>>>Agreed, but the labels applied to you /are/ suitable for machine
>>>processing. Hence we don't need to provide any substitutes for them -
>>>they can go straight into propositions and ultimately the database.
>>
>>And when we use the values of labels to identify the values of the
>>labels, no surrogacy is required. When we use the labels to identify me,
>>they stand as surrogates for me.
>>
>>Candidate keys are labels. Values are self-identifying and self-labelling.
>>
>>
>>
>>>However, there are some identifiers that we do not have suitably
>>>formatted labels for. Attributes that are currently not easy to enter
>>>into a proposition. Fingerprints for example.
>>
>>Does it matter whether the fingerprint has been scanned and digitized
>>and represented suitably for machine processing?
>
> To my mind yes. If the DBMS can handle the attribute's type, of course
> it requires no substitute domain, and hence has no need for surrogacy
> by my interpretation.
>
>>
>>>Some attributes might not
>>>be easily recordable even though we know they exist. These are
>>>attributes not suitable for machine processing.
>>
>>Define 'easily'. Fingerprints are recorded. Genomes are recorded.
>>Feature-length films are recorded. CT scans are recorded. What is too
>>difficult to record? The exact location and velocity of a sub-atomic
>>particle? The only reason we cannot record that is we cannot measure it
>>in the first place.
>>
>>
>>
>>>Hence we 'substitute' that key with a different artificial key we have
>>>generated, to act as its representative. That is what surrogacy is.
>>
>>How exactly does that differ from labelling? We have a fingerprint and
>>we label it 'Defense Exhibit 117' or we label it 532673294. We then use
>>the label to refer to the fingerprint.
>
> Ok, let me clarify:
>
> My name is an identifier for me.
> My fingerprint is an identifier for me.
>
> Say we don't have the ability to digitise the photos we have of
> fingerprints. Then we produce:
>
> 532673294 is an identifier for my fingerprint, which is an identifier for
> me.

Where does the ability to scan and digitize the fingerprint enter into it? How does the identifier get mapped to the fingerprint?

> The 2nd level of indirection in the last line indicates use of a
> representative for an attribute that existed naturally before the
> design of the database. It is not just that it wasn't 'familiar', it
> didn't exist at all - we have made the domain up specifically to
> facilitate the information modelling process. We have not just modelled
> the propositions, we have added to them.

We do that every time we create a candidate key. It isn't familiar until it is used, and then it is.

> That for me is the distinction made when I see the word surrogate in
> context of databases.

Your explanation doesn't seem very coherent to me. I suspect you imagine distinctions that don't exist.

>>This is merely a case of trading off simplicity and familiarity.
>>
>>>The blur comes in that this is only really useful at design time,
>>>because as soon as the attribute is used externally, it becomes a new
>>>natural key. Hence I agree that after the fact the distinction is not
>>>useful, but a priori it is important to /understand/ exactly what's
>>>going on as it eliminates any foolhardy temptation to try and hide such
>>>attributes and violate the information principle.
>>
>>I suggest it is more illuminating to /understand/ during design that one
>>is making a pragmatic design tradeoff among a handful of sometimes
>>conflicting design criteria: simplicity, familiarity, stability,
>>irreducibility.
>>
>>If one remembers that familiarity is a design criterion and why it is a
>>design criterion, one won't feel any temptation to hide anything.
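
For what it's worth, the labelling point can be sketched in a few lines. This is only an illustration under assumptions of my own: the table name `exhibit`, its column names, and the placeholder scan bytes are hypothetical and not taken from any real schema. It uses Python's standard sqlite3 module to declare the generated number, the familiar label, and the digitized fingerprint as three exposed candidate keys of the same relation, so either label can be used to refer to the fingerprint and nothing is hidden.

```python
import sqlite3

# In-memory database; the schema below is hypothetical and purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE exhibit (
        exhibit_no  INTEGER PRIMARY KEY,   -- generated label, e.g. 532673294
        court_label TEXT NOT NULL UNIQUE,  -- familiar label
        fingerprint BLOB NOT NULL UNIQUE   -- the digitized scan itself
    )
""")

# Placeholder bytes standing in for a scanned, digitized fingerprint.
scan = b"placeholder-fingerprint-scan"
conn.execute(
    "INSERT INTO exhibit VALUES (?, ?, ?)",
    (532673294, "Defense Exhibit 117", scan),
)

# Either label identifies the same fingerprint.
by_number = conn.execute(
    "SELECT fingerprint FROM exhibit WHERE exhibit_no = ?", (532673294,)
).fetchone()[0]
by_label = conn.execute(
    "SELECT fingerprint FROM exhibit WHERE court_label = ?",
    ("Defense Exhibit 117",),
).fetchone()[0]

# Each label is a candidate key, so reusing one for another scan is rejected.
try:
    conn.execute(
        "INSERT INTO exhibit VALUES (?, ?, ?)",
        (1, "Defense Exhibit 117", b"some-other-scan"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Which of the three keys gets used in other relations is then the trade-off among simplicity, familiarity, stability and irreducibility: the number is simple, the court label is familiar, and the scan is neither.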
Received on Thu Aug 17 2006 - 18:39:10 CEST