Re: ID field as logical address

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 8 Jun 2009 13:27:39 -0400
Message-ID: <bSbXl.31301$Ws1.6356_at_nlpi064.nbdc.sbc.com>


"Bernard Peek" <bap_at_shrdlu.com> wrote in message news:6ciFVPRahQLKFwtp_at_shrdlu.com...
> In message
> <b766171d-7b30-4251-8c46-b799946a8277_at_l28g2000vba.googlegroups.com>, JOG
> <jog_at_cs.nott.ac.uk> writes

>>On Jun 6, 5:50 am, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>>> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>> > On Jun 5, 2:10 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>>> >> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>> >> <snip>
>>> >> > At risk of repeating myself, S# has merely proven to be a bad key
>>> >> > again - it is clearly an unstable identifier for a supplier (just as
>>> >> > 'name' was in the 'divorcee' example). This is just another flawed
>>> >> > schema, not a problem with the RM.
>>>
>>> >> I didn't say it was a problem with the RM. I said it was a problem with
>>> >> Date and Darwen's notions that a database is a collection of relvars and
>>> >> that insert, update and delete are shortcuts for relational assignments.
>>>
>>> >> You're underscoring my point, by the way, which is that adopting those
>>> >> notions requires that every instance of every key be a permanent identifier
>>> >> for something in the Universe of Discourse
>>>
>>> > Yup, if you talk about something in a proposition use a stable
>>> > identifier for it. It's not just desirable, but essential. Use a nice
>>> > stable EMP# not a person's name. It is about integrity not
>>> > 'expressiveness'.
>>>
>>> It is not essential. Language terms can denote different things at
>>> different times. "The President of the United States" is Barack Hussein
>>> Obama now, but was George Walker Bush just five months ago.
>>
>>Those are not merely 'language terms'. The "President of the US" and
>>"Barack Obama" are different things, with different properties. The
>>fact that they currently happen to coincide is what is confusing you
>>(imo of course).
>>
>>It is /essential/ one knows which of those things one wants to keep
>>track of in order to pick a key that will be stable over time, and
>>hence construct a schema that will maintain integrity over time. If
>>you are concerned with the "person" then that should be the chosen
>>key, and their "post of office" will change over time. If you are
>>concerned with the "post of office" then that is the key, and the
>>"person" holding that position will change over time.
>
> Let's get metaphysical. There is an attribute of every unique object
> called "Identity." This is a non-numeric dimensionless constant. What
> makes this difficult to deal with is that there is no function that can be
> applied to {Identity} which returns a meaningful text string.
>

In fact, identity is a relation. The word you're looking for is haecceity, "thisness." The haecceity of something is essentially just that which distinguishes it from everything else. Haecceity can be embodied by a rigid designator, or by a rigid definite description, but it is in fact a separate property that depends upon neither. What can be said about haecceity is that everything that can be has one--one that is different from that of every other thing that can be.

> We therefore choose a range of surrogate keys which can be manipulated as
> text strings and to a greater or lesser extent map 1:1 to the Identity
> value.

Not exactly. We include just enough of the properties of each thing in the set being represented to distinguish one from another at any given point in time, which is all that is required to represent them in a relation or to reason about them using first-order predicate logic. Surrogates need only be introduced when some of those properties are not relevant to the problem at hand.

> In some cases we have natural keys where the mapping is enforced by the
> laws of physics. In other cases we issue an invented value to identify an
> object, and we attempt to maintain the 1:1 mapping by processes that take
> place outside the database. So we issue a National Insurance number and
> tell the person it identifies to remember on pain of dire consequences.
>
> In every case that I can think of the mapping is maintained by processes
> that are outside the database and outside the relational model. The
> relational model takes it as axiomatic that this mapping is somehow
> maintained. It does not deal with how it is maintained.

I have to disagree. Codd anticipated the need for key updates when he introduced the Relational Model to the world. He wrote:

        The totality of data in a data bank may be viewed as a
        collection of time-varying relations. These relations are of
        assorted degrees. As time progresses, each n-ary relation
        may be subject to insertion of additional n-tuples, deletion
        of existing ones, and alteration of components of any of its
        existing n-tuples.

        --A Relational Model of Data for Large Shared Data Banks,
        Communications of the ACM, June 1970, page 379.

Note that there is no differentiation between key and non-key components. In fact, such a differentiation would unnecessarily restrict the kinds of variations that can occur--especially when there is more than one key. Later on in the paper he referred directly to key updates:

        There are, of course, several possible ways in which a
        system can detect inconsistencies and respond to them.
        In one approach the system checks for possible inconsistency
        whenever an insertion, deletion, or key update occurs.

        --page 387.

If any prime attribute can be the target of an update, then, obviously, multiple instances of a key can map to the same thing but only at different times, and a single instance of a key can map to different things but only at different times.
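
To make the point about updates to prime attributes concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the Employee relation, its two candidate keys (emp_no and badge), and the update helper are all hypothetical and appear in neither Codd's paper nor this thread. The helper checks the key constraints whenever an update occurs, roughly in the spirit of the approach quoted from page 387.

```python
from dataclasses import dataclass, replace

# Hypothetical relation schema with two candidate keys, emp_no and badge,
# so both attributes are prime. None of these names come from the thread.
@dataclass(frozen=True)
class Employee:
    emp_no: int
    badge: str
    name: str

CANDIDATE_KEYS = (("emp_no",), ("badge",))

def satisfies_keys(relation):
    """Return True if no two tuples agree on any candidate key."""
    for key in CANDIDATE_KEYS:
        projected = [tuple(getattr(t, a) for a in key) for t in relation]
        if len(projected) != len(set(projected)):
            return False
    return True

def update(relation, predicate, **changes):
    """Return the successor relation value, rejecting key violations."""
    new_relation = frozenset(
        replace(t, **changes) if predicate(t) else t for t in relation
    )
    # A shrinking set means two tuples collapsed into one; treat it as a violation.
    if len(new_relation) != len(relation) or not satisfies_keys(new_relation):
        raise ValueError("update would violate a key constraint")
    return new_relation

state_1 = frozenset({
    Employee(1, "A-17", "Smith"),
    Employee(2, "B-04", "Jones"),
})

# Key update: Smith is reissued badge "C-99"; emp_no is untouched.
state_2 = update(state_1, lambda t: t.emp_no == 1, badge="C-99")

# A later key update hands the retired badge value to someone else.
state_3 = update(state_2, lambda t: t.emp_no == 2, badge="A-17")
```

In this sketch the badge value "A-17" picks out Smith in state_1 and Jones in state_3, while each state taken on its own satisfies both key constraints.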

> In this it is no different from any other branch of algebra. If an
> equation asserts that 3X=6 then we assume that all three values of X are
> identical and map on to the same value. In relational algebra it is
> assumed that whatever {Identity} maps to {Key} is always the same.

This statement is true (if one interprets "always" to mean "within the scope of any one algebraic expression"), but it is vacuously true, because expressions in both the relational algebra and the relational calculus apply to just one instantaneous state of the data bank at a time. So every appearance of an instance of a key /at one instantaneous state of the data bank/ maps to the same thing in the Universe. But that doesn't preclude an appearance of that instance of the key /at another instantaneous state of the data bank/ from mapping to a different thing.
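
A small Python sketch may help to picture the difference between evaluating an expression against one state and comparing two states. Everything here is an assumption made for illustration: each state is modelled as a set of (office, holder) tuples with office as the key, and the names before, after, holders and join_on_office are hypothetical. The data reuses the President example quoted earlier in the thread.

```python
# Each state is one complete relation value: a set of (office, holder) tuples.
# The key is the office. The variable names are illustrative only.
before = frozenset({("President of the United States", "George Walker Bush")})
after = frozenset({("President of the United States", "Barack Hussein Obama")})

def holders(state, office):
    """Restrict a single state to one key value and project the holder."""
    return {holder for (o, holder) in state if o == office}

def join_on_office(state):
    """Self-join a single state on the key, pairing the holders found."""
    return {(h1, h2) for (o1, h1) in state for (o2, h2) in state if o1 == o2}

key = "President of the United States"

# Within one state, every appearance of the key yields the same holder.
assert all(h1 == h2 for (h1, h2) in join_on_office(before))
assert all(h1 == h2 for (h1, h2) in join_on_office(after))

# Across states, the same key value denotes different people.
assert holders(before, key) != holders(after, key)
```

Both functions take exactly one state as their argument, which is the sense in which an algebraic expression never sees the mapping change.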

> What we are discussing in this thread is pathological conditions where we
> assume that the mapping may change over time. This is essentially the same
> problems as we would face if we tried to perform simple algebra when a
> variable can have multiple different values at the same time and in the
> same equation.

I don't think they are pathological conditions. The real pathology is the pervading implication in the posts here on CDT that every relation should, if at all possible, have just one unary key, and that there is always a choice of key. In fact there are myriad cases that require composite keys, multiple unary keys, both unary and composite keys, or even multiple overlapping composite keys, and much of the time key specifications arise as a consequence of normalization. If a relation schema has two keys, do you allow updates that target just one of the keys? If it has three, do you allow updates to all but one? If you do allow updates that target just one key, or all but one key, then why shouldn't you allow updates that target any or even all keys? Why artificially impose limitations on the model?

Received on Mon Jun 08 2009 - 19:27:39 CEST
