Re: A real world example

From: J M Davitt <jdavitt_at_aeneas.net>
Date: Thu, 17 Aug 2006 03:22:45 GMT
Message-ID: <9iREg.76088$Eh1.13806_at_tornado.ohiordc.rr.com>


Brian Selzer wrote:

> "J M Davitt" <jdavitt_at_aeneas.net> wrote in message 
> news:qFPEg.76061$Eh1.21923_at_tornado.ohiordc.rr.com...
> 

>>Brian Selzer wrote:
>>
>>>"JOG" <jog_at_cs.nott.ac.uk> wrote in message
>>>news:1155685450.575606.117120_at_b28g2000cwb.googlegroups.com...
>>>
>>>
>>>>Brian Selzer wrote:
>>>>
>>>>
>>>>>"JOG" <jog_at_cs.nott.ac.uk> wrote in message
>>>>>news:1155662126.381260.226850_at_m73g2000cwd.googlegroups.com...
>>>>>
>>>>>
>>>>>>Brian Selzer wrote:
>>>>>>[big snips for clarity]
>>
>>[more snips for more clarity]
>>
>>
>>>Note: I categorically reject the notion that the application should issue
>>>key updates separately. If the system allows ad-hoc queries (and most
>>>do), then one could issue such a change, bypassing the application.
>>
>>I think the "bypassing the application" concern reveals
>>more of your perspective than everything else you've
>>written so far. The fact that you somehow see tension
>>between application requirements and data requirements
>>and seem to regard the application as more important
>>indicates shortsightedness.
>>
>>(The terms are mine and I apologize if my use of the
>>terms is a mischaracterization of Brian's thoughts.)
>>
> 
> 
> One argument that I've heard in the past is that key changes should be 
> segregated.  The only way to do that is in an application; hence the point. 
> As far as tension, it's there, but not in the way you've characterized.  I 
> think that the data requirements are more important: if the information 
> can't be relied upon, then it doesn't matter how good an application is.

Tell me, then: what was the point you were trying to make when you used the phrase "bypassing the application?"

>>>>It is important to see that "Me now" is a completely different entity
>>>>to "Me over my whole lifetime". The temporal issue is irrelevant, all
>>>>that matters is to recognise they are just different entites. I know
>>>>this is initially seems an obtuse philosophical point, but it has
>>>>_real_ consequences for how to model those entities.
>>>
>>>I see the difference, but I can't see how you can shrug off the temporal
>>>issue. They must have a common property because they're related. I
>>>would argue that "Me now" is part of "Me over my whole lifetime."
>>
>>If some sort of temporal continuity is important, the
>>design must include features that meet the requirement --
>>but some naively suggested surrogate key isn't the answer...
>>

> 
> If you've read the rest of my argument, then I ask you: how can you possibly 
> use a temporal attribute to make the tuples in the database instance 
> preceding a change correspond to tuples in the database instance succeeding 
> a change?
> 

>>[snip]
>>
>>
>>>In an earlier post, I used the terms, "individual" and "universal" to
>>>describe these categories of properties. Individual properties define
>>>the essence of something and remain constant throughout its lifetime.
>>>Universal properties depend on the state of the universe during that
>>>lifetime. Universal properties are the only ones that can change.
>>
>>..even if you proclaim that your made-up value is an
>>"individual property."
>>
>>All this...
>>
>>
>>>>>>>Here's a simple example of what can happen:
>>>>>>>
>>>>>>>create table P
>>>>>>>(
>>>>>>>x int primary key,
>>>>>>>y int
>>>>>>>)
>>>>>>>create table F
>>>>>>>(
>>>>>>>x int primary key references P(x),
>>>>>>>z int
>>>>>>>)
>>>>>>>insert P (x, y) values (1, 1)
>>>>>>>insert P (x, y) values (2, 3)
>>>>>>>insert P (x, y) values (3, 2)
>>>>>>>insert F (x, z) values (1, 5)
>>>>>>>insert F (x, z) values (2, 4)
>>>>>>>select * from P join F on (p.x = F.x)
>>>>>>>update P
>>>>>>>set x = case x
>>>>>>> when 1 then 2
>>>>>>> when 2 then 1
>>>>>>> end
>>>>>>>where x in (1, 2)
>>>>>>>select * from P join F on (p.x = f.x)
>>>>>>>
>>>>>>>RESULTS:
>>>>>>>
>>>>>>>Before:
>>>>>>>x y x z
>>>>>>>------ -------- ------- --------
>>>>>>>1 1 1 5
>>>>>>>2 3 2 4
>>>>>>>
>>>>>>>After:
>>>>>>>x y x z
>>>>>>>------ ------- -------- --------
>>>>>>>1 3 1 5
>>>>>>>2 1 2 4
>>>>>>>
>>>>>>>
>>>>>>>Should the new values for x in P have cascaded into F? Assume that
>>>>>>>the
>>>>>>>system only has the before image and the after image (the first two
>>>>>>>columns
>>>>>>>in RESULTS) in order to complete the update. How can such a system
>>>>>>>differentiate between the above update and the following update?
>>>>>>>
>>>>>>>update P
>>>>>>>set y = case y
>>>>>>> when 1 then 3
>>>>>>> when 3 then 1
>>>>>>> end
>>>>>>>where x in (1, 2)
>>>>>>>
>>>>>>>Clearly first update affects a key, and consequently, the changes
>>>>>>>should
>>>>>>>cascade, but with the information available (both updates produce the
>>>>>>>exact
>>>>>>>same before and after images), the system cannot differentiate between
>>>>>>>the
>>>>>>>two updates; therefore, it cannot determine whether or not to cascade
>>>>>>>the
>>>>>>>changes. Updates within the Relational Model are are constrained in
>>>>>>>the
>>>>>>>same way as this hypothetical system. All that is available is the
>>>>>>>preceding instance and the succeeding instance, and if the only key
>>>>>>>can
>>>>>>>change, then there is no way to correlate tuples.
>>
>>..doesn't describe a general case. Without specific
>>requirements, one can't say "Clearly...the changes
>>should cascade." Yes, "[u]dates within the Relational
>>Model" occur as you depicted -- but whether changes to
>>any value is appropriate or not has nothing to do with
>>relational theory. To me, it seems that you're trying
>>to invent requirements for the model because you can't
>>see how to fulfill them in your design.
> 
> 
> That's not the case at all.  I see exactly how to fulfill them.  The point 
> is: I shouldn't have to.  I should be able to define declarative constraints 
> to ensure the integrity of the data. That means not only within each 
> database instance, but also between two successive database instances 
> because the model allows a database to be updated and describes how it 
> happens, that is, updates are set-based.  Now, if an update involves more 
> than one tuple, and that update depends on the current database instance, 
> then there should be a way to correleate the tuples in the current database 
> instance with those in the proposed instance.  I've shown that candidate 
> keys as defined are not sufficient.  Can you think of another way?  I 
> shouldn't be able to compromise integrity with an ad-hoc update. 

Brian, it's not that key constraints are insufficient, it's that the key constraints you declared in the example are insufficient. As I've been saying all along: the design is lacking.

I first dealt with questions like this almost 20 years ago. The requirement was that "the system should know" that what is, today, for example, known as "American Airlines Flight 3" was previously known as "Pan American Flight 3" and before that "TransWold Airlines Flight 1."

Surrogates - i.e., system generated "uniquifiers" - were part of that solution -- but I know better, now. And I grant you: modern SQL products are hard pressed to meet the need. But, if you think about it, there's no reason to expect that changing a value constrained as a key value should be as simple as an SQL UPDATE and there's no reason to expect the underlying model to provide convenient support for something that would make some developers' lives easier. Thinking like that is part of the reason that commercial DBMSs are such a mess. Received on Thu Aug 17 2006 - 05:22:45 CEST

Original text of this message