Re: ID field as logical address

From: <robur.6_at_gmail.com>
Date: Wed, 3 Jun 2009 04:43:26 -0700 (PDT)
Message-ID: <43a5ac21-08f1-4922-a457-61e77c8cc8eb_at_r13g2000vbr.googlegroups.com>


On May 31, 5:38 am, "Brian Selzer" <br..._at_selzer-software.com> wrote: ...
> If instances of a key do not
> permanently identify things in the Universe, then transition constraints
> that join the relation values before and after an assignment to a relvar
> would not be sound. For example, suppose that you have a relation, {L, F,
> Stat} Key {L, F}that starts out with the following value:
>
> {{L:Smith, F:Mary, Stat:Single},
> {L:Jones, F:Mary, Stat:Married}}
>
> and is then assigned the value,
>
> {{L:Smith, F:Mary, Stat:Divorced},
> {L:Jones, F:Mary, Stat:Married}}
>
> You might be wondering how someone who is Single can become Divorced?
> Should the assignment be rejected on that basis? Or is it possible that
> Mary Jones' maiden name is Smith, and that she became Divorced while at the
> same time the Mary Smith who was Single Married Robert Jones? It might also
> be that the Mary Smith who was Single moved out of the area while a
> different Mary Smith, who is Divorced, moved in.

A very good example indeed. However let’s imagine the following conversation:

-“Do you know that Mary Smith got divorced?”
-“It must be a mistake, Mary Smith has never been married!”
-“Well, actually it is about Mary Jones that married and became Mary
Smith. The Mary Smith you know changed his name to Jones before that.”

As can be seen an ambiguous statement can be confusing for humans as well (and requiring supplementary explanations). A database (loosely speaking) is just a collection of statements. The system is not supposed to know (or guess) the truth, but just to check the new statements for consistency with the existing statements in the database.

So in specified context a statement “Mary Smith got divorced” should be rejected due the transitional constraint. A correct sequence of statements is “Mary Smith changed his name to Jones”, “Mary Jones got married and changed his name to Smith” and “Mary Smith got divorced”. This is unambiguous for both database system and a human as well.

...
> If, on the other hand, insert,
> update and delete are not shortcuts, but rather primitive operations, then
> the introduction of the autogenerated IDs is not necessary. All that is
> needed is to correlate the tuples in a relation that are the target of an
> update with those in its successor. For example, the update
>
> {{L:Smith, F:Mary, Stat:Single, L':Smith, F':Mary, Stat':Divorced}}
>
> should clearly be rejected, but the update,
>
> {{L:Smith, F:Mary, Stat:Single, L':Jones, F':Mary, Stat':Married},
> {L:Jones, F:Mary, Stat:Married, L':Smith, F':Mary, Stat':Divorced}}
>

The relation variable changes its state from one set of statements to a different one. No one should care which tuple was replaced by which tuple. It is just a new set of statements consistent with the current state of the database. Just as in mathematics if you have a set A = {1,2,3} you cannot update 3 to become 4. You can however update the value of A by an assignment A = {1,2,4}. The effect might be the same but it is something different.

Now let’s consider a foreign key DEP {L, F} references EMP {L, F} and the following states:

EMP: {{L:Smith, F:Mary, Stat:Single}}
DEP: {{L:Smith, F:Mary, Dept:HR}}

An assignment

EMP := {{L:Jones, F:Mary, Stat:Single}}

will fail due the foreign key constraint. However the assignment can be rewritten as

EMP := {{L:Jones, F:Mary, Stat:Single}}, DEP := {{L:Jones, F:Mary, Dept:HR}}

A foreign key is not a tuple reference. It is just a prerequisite that the employee relation should contain a statement about Mary Jones before we can enter a statement about her in departments. This is a completely different thing from a department object that has a pointer to an employee object. And requires a different way of thinking about.

Another example:

Employees := {{L:Jones, F:Mary, Stat:Single}}, Departments := { }

is also a perfectly valid assignment. We cannot tell however if Mary Smith has been fired and Mary Jones was hired without being assigned to a department yet. Or maybe Mary Smith changed his name to Jones but she is not working for HR anymore and not yet assigned to another department.

However we cannot blame the relational model for that. It is the limited set of statements existing in the database that is limiting our knowledge. But that was a design decision. We can choose different set of keys, we can make more statements, or make our statements more precise (by adding more attributes) if we need.

Now back to the surrogate keys.

Surrogate keys give as for free stability and uniqueness. However is that it all what we need?

Suppose:

I tell you my name is Mary Jones.
I tell you my name is Mary Jones.

This is the same statement stated twice.

I tell you for the first time that my name is Mary Jones. I tell you for the seccond time that my name is Mary Jones.

Those are definitely different statements. If we really need the “first time”, “second time” information then it is ok. If we just need “name is Mary Jones” then we’re in trouble. (Perhaps someone can reformulate that in terms of predicates and variables, but I hope I was clear).

Regarding stability it is more a performance (hence physical) problem. If we “change” a value in one tuple we should “change” that value in all related tuples in the same assignment to prevent constraint violations.

Just my two cents. Hopefully someone with a deeper knowledge of relational theory will correct me if I made a mistake or used an excessively loose language. Received on Wed Jun 03 2009 - 13:43:26 CEST

Original text of this message