Re: A real world example

From: JOG <jog_at_cs.nott.ac.uk>
Date: 15 Aug 2006 16:44:10 -0700
Message-ID: <1155685450.575606.117120_at_b28g2000cwb.googlegroups.com>


Brian Selzer wrote:
> "JOG" <jog_at_cs.nott.ac.uk> wrote in message
> news:1155662126.381260.226850_at_m73g2000cwd.googlegroups.com...
> > Brian Selzer wrote:
> > [big snips for clarity]
> >> I agree that it is a violation of the information principle. I also
> >> agree
> >> that if there's no other way to distinguish things, that is, if there is
> >> no
> >> relevant natural key, such as for distinguishing individual quarters in a
> >> new roll, then they should definitely be exposed to end-users.
> >
> > fair enough. We are on the same page there then.
> >
> >> I don't agree that the lack of real world identification is the only
> >> reason to use
> >> surrogates;
> >
> > It is the only logical reason though. Anything else is a kludge.
> >
> >> I contend that the fact that that real world identification is
> >> not *guaranteed* to remain constant precipitates that need. If there IS
> >> a
> >> natural key, then as a best practice, the surrogate should not appear on
> >> any
> >> data entry screen or report. In that case, the surrogate is only needed
> >> to
> >> work around the limitations of the Relational Model, and since the users
> >> have another means of identify something, they don't have a need to know.
> >
> > This still makes no sense to me. In the real world if you have a value
> > that is insufficient to identify me over time, then it will be no good
> > for the database either.
> >
>
> I think that every natural key falls into this category, because they CAN
> change. It may not happen often. It may not happen at all, but the fact
> that it CAN happen is critical, because it shifts the responsibility for
> maintaining integrity during updates onto either implementational extensions
> or application programs. I think that that violates data independence,
> because it ties the database to a particular implementation or application.
>
> > For instance if I change my name, then _in the real world_ this is of
> > no use as identifying me when I was younger. It is an identifier for me
> > now, but not at all for me as an entity that stretches over my
> > lifetime. Same for my height, hair colour, age - they all change and
> > hence do NOT identify me as a temporal concept. And thats in the real
> > world, nothing to do with databases. Do you agree that?
> >
>
> Sure, I agree.
>
> > Now, there are things that will identify me over my lifetime (in this
> > case the universe of discourse is objects over a period of time), that
> > will remain consistent. THESE are the correct identifying keys to
> > choose if the lifetime of a changing entity is what one is concerned
> > with. I do not see how one can argue against this.
> >
> > Now sure, a surrogate can represent these perhaps unrecordable
> > identifiers. But if I want to refer to myself at any point in time, I
> > _need to know_ that surrogate (or something very intelligent in the
> > middle has to deduce it from me giving it lots of other information).
> > Either way the relational model must not hide it.
> >
>
> The Relational Model has no temporal component, with the exception of
> change. The Relational Model is value-based, meaning that any change
> involves replacing an existing database instance with a new database
> instance, or selecting a different database instance in favor of another.
> Only two database instances are important: those bounded by the change. And
> again, unless there's a way to correlate the propositions in one database
> instance to those in the next, you can't determine whether or not you're
> talking about the same thing in both database instances. This requires the
> use of implementation extensions or code in an application program to ensure
> that the succeeding database instance is consistent.

No, if you have chosen the correct key, none of that is necessary. Now, I reckon your probably saying that "any key can change", and you are right, and I'll address this further on.

>
> > Hence I contend to you Brian that this is nothing to do with the
> > relational model. Rather it is a mistake in the *universe of discourse*
> > that the schema designer has made. They mistakenly believed they were
> > identifying 'snapshot entities'. In reality they needed to identify 'an
> > entity over its lifetime'. The domain they were considering was
> > completely wrong, and hence the key they chose was completely
> > incorrect.
> >
>
> Sure it is. The definition of a key is not strong enough to identify
> something in a universe that can change. A key value does not directly
> identify something: it identifies a statement about something in a single
> relation value from a single database instance. Once you need to identify
> something in more than one database instance, such as during a change, the
> model falls apart and implementational extensions or code in application
> programs must pick up the slack.

Well for my standpoint, this I think highlights the mistake in your argument. First the definition of a key is a red herring because we are all agreed on what a key is. Rather it is picking an _appropriate_ key for the task at hand that is vital.

It is important to see that "Me now" is a completely different entity to "Me over my whole lifetime". The temporal issue is irrelevant, all that matters is to recognise they are just different entites. I know this is initially seems an obtuse philosophical point, but it has _real_ consequences for how to model those entities.

Something must remain constant to compare entities - something must identify them. If nothing remains constant the things being compared, by liebniz equality, are different things full stop. This is what mathematical logic is grounded in, we can't just avoid it. You seem to be saying it is possible that "every attribute of something has changed, yet it is still the same thing". Surely that's logical nonsense!

>
> > This is exactly the same as the basic philosophical question of change.
> > Are you the same person you were when you were 10? In one context the
> > answer is absolutely not, so an appropriate identifier is not DNA. In
> > another yes you are the same person, so DNA is perfect. It depends on
> > the domain you are considering, and one must pick the correct key
> > accordingly.

I worry you have glazed over this, but it highlights how the context information is more complex than we initially think, and we must understand that context to make the right key choices (especially if you want to make inter-relation value comparisons).

My DNA don't change over my lifetime, so its a good key for that conceptual entity (n.b. the representative label to record the DNA might change, but the value is the same.)

> >
> >
> >>
> >>
> >> Here's a simple example of what can happen:
> >>
> >> create table P
> >> (
> >> x int primary key,
> >> y int
> >> )
> >> create table F
> >> (
> >> x int primary key references P(x),
> >> z int
> >> )
> >> insert P (x, y) values (1, 1)
> >> insert P (x, y) values (2, 3)
> >> insert P (x, y) values (3, 2)
> >> insert F (x, z) values (1, 5)
> >> insert F (x, z) values (2, 4)
> >> select * from P join F on (p.x = F.x)
> >> update P
> >> set x = case x
> >> when 1 then 2
> >> when 2 then 1
> >> end
> >> where x in (1, 2)
> >> select * from P join F on (p.x = f.x)
> >>
> >> RESULTS:
> >>
> >> Before:
> >> x y x z
> >> ------ -------- ------- --------
> >> 1 1 1 5
> >> 2 3 2 4
> >>
> >> After:
> >> x y x z
> >> ------ ------- -------- --------
> >> 1 3 1 5
> >> 2 1 2 4
> >>
> >>
> >> Should the new values for x in P have cascaded into F? Assume that the
> >> system only has the before image and the after image (the first two
> >> columns
> >> in RESULTS) in order to complete the update. How can such a system
> >> differentiate between the above update and the following update?
> >>
> >> update P
> >> set y = case y
> >> when 1 then 3
> >> when 3 then 1
> >> end
> >> where x in (1, 2)
> >>
> >> Clearly first update affects a key, and consequently, the changes should
> >> cascade, but with the information available (both updates produce the
> >> exact
> >> same before and after images), the system cannot differentiate between
> >> the
> >> two updates; therefore, it cannot determine whether or not to cascade the
> >> changes. Updates within the Relational Model are are constrained in the
> >> same way as this hypothetical system. All that is available is the
> >> preceding instance and the succeeding instance, and if the only key can
> >> change, then there is no way to correlate tuples.
> >
> > I'll consider the example in due course. RL work to do in the meantime,
> > sigh. All best, Jim.
> >
>
> Another example is trying to write an update trigger in Sql Server to log
> changes to one table in another. The deleted and inserted pseudotables can
> be thought of as the preceding and succeeding database instances
> respectively. Unless you constrain updates to a single row, or reject
> updates that affect both key and non-key columns, or use a surrogate, you
> can't tell what changed to a particular row. In fact, it's possible for it
> to appear that nothing at all has changed, even though new key values have
> been propogated into referencing tables. (Sql Server does not faithfully
> implement the relational model. It can tell the difference between the two
> updates above, even though within a trigger on P, the deleted and inserted
> pseudotables would appear identical.)
Received on Wed Aug 16 2006 - 01:44:10 CEST

Original text of this message