Re: Relation Schemata vs. Relation Variables

From: <brian_at_selzer-software.com>
Date: 23 Aug 2006 01:49:15 -0700
Message-ID: <1156322955.893483.55820_at_p79g2000cwp.googlegroups.com>


J M Davitt wrote:
> brian_at_selzer-software.com wrote:
> > JOG wrote:
> >
> >>Brian Selzer wrote:
> >>
> >>>"JOG" <jog_at_cs.nott.ac.uk> wrote in message
> >>>news:1156107137.284318.326750_at_m73g2000cwd.googlegroups.com...
> >>>
> >>>>Brian Selzer wrote:
> >>>>
> >>>>>In the context of an update, the predicate of a database along with the
> >>>>>current database state determines the set of all /possible states/ that
> >>>>>can
> >>>>>become current. Integrity rules, which are implicitly or explicity
> >>>>>specified as part of the database predicate, can be classified as either
> >>>>>state constraints or transition constraints. State constraints define
> >>>>>the
> >>>>>set of all consistent database states; transition constraints determine
> >>>>>whether or not a state change should be allowed. Given a set of
> >>>>>consistent
> >>>>>database states and the current state, one can derive a set of
> >>>>>transitions,
> >>>>>each containing what is different on a tuple by tuple basis between the
> >>>>>current state and a proposed state (any one of the consistent states). A
> >>>>>transition can be defined as a set of triples (r, t, t') where r is the
> >>>>>name
> >>>>>of a relation, t is a tuple from the current state, and t' is a tuple
> >>>>>from
> >>>>>the proposed state.
> >>>>
> >>>><argggghhhh/>.Sorry brian, but this still isn't right. It is illogical
> >>>>to talk about the transition of a tuple from one value to another, as
> >>>>though they were entities from the real world themselves. Look, say
> >>>>mathematically you are talking about a relation composed of three
> >>>>tuples:
> >>>>
> >>>>R := {x, y, z}
> >>>>
> >>>>x, y and z are not variables! They are aliases for values. I can't
> >>>>compare x, y an z with their future selves - they only have one value,
> >>>>today, tomorrow, for evermore.
> >>>>
> >>>
> >>>I think you're confusing attributes with tuples. Even if you're not, I
> >>>agree: tuples are values. If they *can* correspond, it is in the mind of
> >>>the designer of the database who defined the transition constraint, and thus
> >>>the fact that they *do* correspond must be conveyed by the user during the
> >>>update.
> >>
> >>I am not confusing anything. If you agree tuples are not variables,
> >>then you agree that tuples cannot 'change'. And by that one is saying
> >>that they /cannot/ have a transition. That's the logic, and its
> >>unavoidable - how can you argue against it?
> >>
> >
> >
> > Because facts are about things, and things /do/ change. If you want to
> > limit the set of possible database states to include only facts that
> > reflect alterations that /can/ occur, then you need to know what each
> > thing looked like at the end of the last change and what it looks like
> > in each consistent database state.

>
>

> Selzer has the ball. He moves up court. He stops, he
> looks... Selzer shoots! He misses! Wide by a mile!
> What kind of drugs is he *using?*"
>

Let's see.... Aspirin...Tylenol.... I think I need some now. My ISP's news server is on the fritz, so part of the following may appear to be a repost....

Can a candidate key value identify something beyond the scope of a single database state? It appears to be the general concensus that it can: choose a stable key...surrogates are sometimes necessary.... The fact is, key stability is a workaround because a candidate key value CANNOT identify something beyond the scope of a single database state. The stability of a key is irrelevant, because within the scope that its values can be used, it can identify a fact. Moreover, what it identifies depends on the context of a query or update. Consider a line of people waiting to be served at the bank, (Bob, Brian, Lynn, Susan, Mark, Lisa). Many different queries can be answered, for example: "Who's first in line?" "How many people are in front of Susan?" "Who's in front of Lisa?" "Is Joe in line?" In a relational database, you can model a line as a relation schema, LINE {PERSON, POSITION} where both PERSON and POSITION are candidate keys. Since PERSON is a candidate key, the answer to "Is Joe in line?" can be answered. Since POSITION is a candidate key, the answer to "Who's first in line?" can be answered. Once Bob has been served, the line changes, (Brian, Lynn, Susan, Mark, Lisa). The queries above can still be answered, but in this state the facts are all different. In the original state, Bob is first in line; in this state Brian is first in line. The value for POSITION in one state does not identify the same thing in the other. For example, the value "1" for POSITION in the first state identifies the tuple (Bob, 1); in the other it identifies the tuple (Brian, 1). It is clear that (Bob, 1) and (Brian, 1) are not identical. Similarly, the value "Brian" for PERSON in the first state identifies the tuple (Brian, 2), whereas in the other it identifies (Brian, 1). It is clear that (Brian, 1) and (Brian, 2) are not identical. Therefore, a candidate key value can only logically identify something in single database state. Furthermore, any given tuple can represent different things depending on the perspective of the observer. (Brian, 2) could represent Brian's position in the line in one context, or the person who's second in line in another. So, the meaning of the answer to a query depends on the key used to identify a fact. Similarly, the meaning of an update depends also on the key used to identify a fact. For example, if I issue the update,

UPDATE LINE SET PERSON = 'Joe' WHERE PERSON = 'Mark'

What I'm saying is, "Hey, wait a minute! That's not Mark, it's Joe." On the other hand, if I issue the update,

UPDATE LINE SET PERSON = 'Joe' WHERE POSITION = 5

What I'm saying is, "Joe is fifth in line, not whoever was there before."

The point is, by definition, a candidate key value can only be used to identify a tuple in a single database state. Using one beyond that scope implies that it has already been assigned meaning--that it identifies more than just a tuple. Meaning is something that should be left up to the designer of a database and it's users, not the data model. In order to separate meaning from the model, erroneous assumptions, such as, "A candidate key value must identify the same thing throughout an update." must be discarded. It *can* identify the same thing, if the designer so chooses, but that doesn't mean that it always does.

Because a state-transition involves more than one database state, a transition constraint is *always* concerned with what values mean, not just how relation values can change from one database state to another.  That's the rub: how do you divorce meaning from the model and still enforce transition constraints? At the same time, the model must provide a mechanism for defining transition constraints that is also not based on any erroneous assumptions, such as, "if a key value changes, then you're talking about different things." At a minimum, there must be a way to correlate the facts in one database state with those in the next. Facts are values and cannot change, but what they're about can, so in order to define a transition constraint, you must be sure that you're looking at facts from both states that are about the same thing. There are two ways to accomplish this.

A change marks a temporal boundary between two successive situations in one universe. Determining if a thing in the preceding situation is the same thing in the succeeding situation requires either knowledge of how a change can affect something or the ability to observe something as it changes. You must either know that the change will not affect at least one set of identifying properties of something, take note of them in the preceding situation and then compare them to the same set of identifying properties of something in the succeeding situation, or you must identify the thing in the preceding situation and then watch it as it passes through the temporal boundary into the succeeding situation. In other words, if you know that a set of identifying properties will not change, then given that set of identifying properties, you can identify the thing in both situations; or, if you can observe something as it changes, then once you've identified it, you know that it is the same thing in both situations. In the context of a database, a set of identifying properties that is not affected by the change may not be relevant to the discussion. If that is the case, then in order to enforce a transition constraint, you must either inject irrelevant information into the discussion, or you must be able to observe things throughout a transition, and then convey how that transition affected each thing individually.

As has been made clear in earlier posts, keys can change, so unless you introduce object identifiers or immutable surrogates, you cannot *always* enforce transition constraints based only on key values. (I percieve other problems with relations that have more than one candidate key, but I haven't drawn any conclusions.) My main objection is the injection of irrelevant information when it's not necessary. If a change can be observed, then it doesn't matter if keys change, you're still looking at the same thing. That's why I think that the update semantics of the model should be augmented to allow the user to specify how facts correlate if needed. This does not break the model, it just makes it possible to enforce transition constraints when keys can change. Received on Wed Aug 23 2006 - 10:49:15 CEST

Original text of this message