Re: Relation Schemata vs. Relation Variables

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sat, 26 Aug 2006 19:34:47 GMT
Message-ID: <rn1Ig.12260$1f6.6985_at_newssvr27.news.prodigy.net>


"David Cressey" <dcressey_at_verizon.net> wrote in message news:c8YHg.45$8Q6.5_at_trndny01...
>
> "Brian Selzer" <brian_at_selzer-software.com> wrote in message
> news:vYJHg.17542$gY6.10049_at_newssvr11.news.prodigy.com...
>> >> > A small digression: all this reminds me of the (in)famous SQL
>> > construct:
>> >> >
>> >> > UPDATE .... WHERE CURRENT of <cursor>
>> >>
>> >> That's probably because it hasn't clicked yet that a transition is a
>> >> *set*
>> >> of triples, where each element represents only a distinct component of
>> >> the
>> >> overall difference between the current database state and a /possible
>> >> state/. The above construct involves something that passes over a
> result
>> >> set in a particular sequence. That's a totally different thing
>> > altogether.
>> >>
>> >
>> > I'm well aware that you are talking about a set of transitions. Your
>> > professional history of dealing with bozos has conditioned you into a
>> > habit
>> > of condescension that is out of place in this newsgroup.
>> >
>>
>> I ASS-U-MEd that you were confusing triples with transitions because of
> the
>> digression below. I was wrong, and I apologize. I did not intend to be
>> condescending.
>>
>
> OK. No harm done. The subject of UPDATE ... WHERE CURRENT is worth a
> side
> discussion here, because
> it blurs the distinction between content based addressing and location
> based
> addressing. The CURRENT row of a result table is like "the can of
> Campbell's chicken noodle soup that I have in my left hand", although at
> a
> different level of abstraction.

I see your point. On the other hand, since the set of triples in a transition describes the entire difference between two successive database states, it is not exactly the same thing. The idea of location based addressing doesn't have any place in the conceptual or logical models. What I'm trying to point out is that how tuples correlate during an update is something that either the database designer must specify, or that the user must specify, or both. The database designer specifies it with a system-generated surrogate. The user specifies it in the WHERE clause. I believe that if the model provided the capability for the user to specify which tuples will be affected, that is, how a specific set of tuples were selected by the user for updating, then a variety of transition constraints could be defined that cannot when only using a surrogate. For example, assume that you were asked to develop a set of job descriptions for every person on the payroll, so you create the following table with two candidate keys:

ASSIGNED_DUTIES {EMPLOYEE#, DUTY, POSITION#} The candidate keys are {EMPLOYEE#, DUTY} and {POSITION#, DUTY}

After conducting interviews of all of the employees and their supervisors, you end up with a complete set of duties and a list of positions from which you can create job descriptions. Over time, the duties of an employee may change, which has the effect of altering the duties for the position that the employee is filling. In this way, if someone quits, you know exactly what the company needs to fill the position. Now, consider the following updates:

(1) UPDATE ASSIGNED_DUTIES

        SET DUTY = 'Record labor using barcode terminal'
        WHERE EMPLOYEE# = 45
            DUTY = 'Fill out labor ticket'

(2) UPDATE ASSIGNED_DUTIES
        SET DUTY = 'Record labor using barcode terminal'
        WHERE POSITION# = 22
            DUTY = 'Fill out labor ticket'

(3) UPDATE ASSIGNED_DUTIES
        SET DUTY = 'Record labor using barcode terminal'
        WHERE DUTY = 'Fill out labor ticket'

The semantics of these updates are different, even though they may affect the same row. In (1) you're changing one of the duties of an employee; in (2) you're changing one of the duties for a position, and in (3) you're making a blanket change. Similarly, consider the following updates:

(4) UPDATE ASSIGNED_DUTIES

        SET EMPLOYEE# = 99
        WHERE EMPLOYEE# = 45

(5) UPDATE ASSIGNED_DUTIES
        SET EMPLOYEE# = 99
        WHERE POSITION# = 22

From the perspective of (4), you're assigning the duties that employee #45 had to employee #99, whereas in (5) you're assigning the duties for position #22 to employee #99. It's a subtle difference, I know, but that difference can easily be captured if you supply the attributes used to select each tuple in a transition. So instead of a set of triples, you could have a set of quadruples,

(r, s, t, t')

where r is a relation name, s is a set of attribute names that were used to select a tuple, t is the tuple selected from the current database state, and t' is the corresponding tuple from the proposed state.

If this information that is already supplied by the user could be made available, you could define transition constraints based not only on what is different on a tuple-by-tuple basis, but also on the perspective of the user when they issued the update. In the above example, for (1) you could require documentation from the employee's supervisor; whereas for (2) you could require documentation from the HR manager instead. I realize that this is pushing the limit a bit, and I really haven't given it a whole lot of thought, but I think it could be useful. This information is provided when a user issues an update, so why not make it possible to take advantage of it?

>
> I should mention in passing that I am NOT one of those purists in this
> forum who consider SQL so fallen as to be not worth the effort. On a
> practical level I'll settle for SQL until I learn something better. And
> I'm
> not in a hurry to learn something better. Later in this post, I'm going
> to
> discuss triggers that are fired on update, in order to get a more concrete
> handle on the abstraction that you are describing as a "transition".
>
>
>
> And, sometimes, expressing things in a
>> > What may not have clicked with you is that the "set of transitions from
>> > one
>> > state to another" describes, nearly exactly, the progress of a
> processor
>> > executing a computer program. Abstracting out all of the intermediate
>> > states, where some, but not all of the transitions have been carried
> out,
>> > is the essence of atomicity of transactions.
>> >
>>
>> I'm not sure that I understand what you're trying to say. Yes, there may
> be
>> intermediate states within a transaction, and from a logical standpoint,
>> it's not necessary that we know what those are.... Did I somehow imply
>> by
>> saying "from one state to another" (I didn't think I said that...maybe it
>> was in another thread) that there would be a series of successive
>> transitions between the current state and each possible state? That was
>> definitely not my intended meaning. What I did intend was that the user
>> specifies one and only one transition, which when applied to the current
>> database state yields one and only one possible database state. Just as
> the
>> set of all state constraints defines a set of consistent database states,
>> each of which may become current; the set of all transition constraints
>> defines a set of possible transitions, each of which by itself describes
> the
>> difference between the current database state and a possible database
> state.
>>
>
> The above is also worth a discussion of its own. Essentially the
> relational
> model (or at least the part of the model I'm familiar with) models the
> state
> of the database between transactions, not the intermediate states of the
> database during a transaction.
> During a transaction there is code operating on the client side and also
> code that's operating on the server side. Describing database based
> constraints that operate in that environment can get awfully complex.
>

I absolutely agree. I personally think that modifications should be cached until the end of a transaction and then submitted all at once in a single operation (a transition) wherein all constraints could be checked and enforced en mass. A transaction is supposed to be a single unit of work, so to me it doesn't make any sense to do otherwise.

> In that context I'm going to refer you to the discussions about tutorial D
> that surface in this forum. I've never fooled around with D, so I lack
> the
> specific concrete knowledge to comment on those discussions. But here's
> what I've gleaned from following the discussions.
>
> In a language like SQL, A transaction is a series of actions, starting
> with an action that requires a transaction, and ending with a COMMIT or a
> ROLLBACK. (Actually, DEC Rdb/VMS had a specific "start transaction"
> action, but I don't think that's standard SQL.) Control is passed back
> and
> forth between the client and the server in the middle of all of this.
>
> In D, if I understand correctly from what I've read in this forum, A
> transaction can (and should?) be expressed as a single message from the
> client to the server. Thus the entire transaction is atomic, not only
> with
> regard to the effect on the database, but also with regard to the client.
> I
> can't be sure of this, but I expect that this would simplify transition
> analysis a great deal.
> A model such as your "set of triples" could be subjected to a rigorous
> analysis in D, in a way that baffles me with regard to SQL. Or so I
> imagine. I await eagerly the input of those who know both SQL and D.
>
>
>
>
>> >
>> >> My thinking is that a user must assert which combination of component
>> >> differences applies to a particular change, since there can be more
> than
>> > one
>> >> combination for a /possible state/ and not all of those combinations
> may
>> > be
>> >> allowed.
>> >>
>> >> >
>> >> > End digression.
>> >> >
>> >> > Anyway, how does
>> >> >
>> >> > (r, t, t')
>> >> >
>> >> > differ from the pair of transitions?
>> >> >
>> >> > (r, t,empty)
>> >> > (r,empty,t')
>> >> >
>> >>
>> >> In the transition {(r, t, t')}, t and t' both /concern/ the same
>> >> thing;
>> >> however, in the transition {(r, t, empty), (r, empty, t')}, t and t'
>> >> /concern/ different things.
>> >>
>> >
>> > How do you know?
>> >
>>
>> In the one transition both tuples appear in the same triple, whereas in
> the
>> other, they appear in different triples. So if they appear in the same
>> triple, then they concern the same thing. If they don't appear in the
> same
>> triple, then either they don't concern the same thing, or it wasn't
>> important that they do. In a closed world, if you're not told that facts
>> concern the same thing, then you cannot prove that they do, so they
>> don't.
>>
>> > PS: in the above, I intended to say "the pair of transitions",
>> > rather
>> > than "the transition composed of the pair of triples".
>> > If you treat
>> >
>> > (r, t, empty)
>> > (r, empty, t')
>> >
>> > as a single transition composed of two triples, you're going to be in
>> > deeper water than you already are.
>> >
>>
>> Is this not what happens with relational assignment? The old values are
>> deleted and the new values are inserted.
>
> It depends on what you mean by the above. I only know the internals of
> one
> product (DEC Rdb/VMS) well enough to comment on the above. With regard to
> Oracle RDBMS, all I can do is trust the documentation and the engineers
> and
> hope for the best. I forget wat Oracle documentation says about this.
>
>

I wasn't referring to any specific implementation. Conceptually, relational assignment replaces the current relation value with a new relation value. Without immutable surrogates and a means of preventing key updates, one cannot assume that tuples in the new value correspond to tuples in the old one, so at the conceptual level, every tuple in the new relation value represents a different fact--even if all of its attribute values are identical to one that had already existed. The reason for this is that the facts represented in the current relation value concern things that existed at the instant that that value became current, and what is reflected in those facts depends on the situation that existed at that same point in time. The facts represented in the proposed relation value, on the other hand, originate from a later situation. It's as if there's an imaginary timestamp or transaction sequence number attached to each tuple in the current state, and another in the proposed state, so "=" (logical identity) will always return false. Therefore, it's necessary for the database designer or the user to tell the system how tuples correlate.

>
> Let me say a word about triggers, here. In SQL, a trigger can be fired
> up
> by an update. In fact, a trigger can even be specified as being fired
> up
> once for the entire update, or once for each row that is about to be
> updated. In the latter case,
> the prodecure invoked by the trigger has available two values within the
> context of a single row, namely the old value and the new value.
>
> In this context, your concept that the triple (r, t, t') refer to the
> same
> "thing" has a concrete meaning that I can wrap my mind around: the same
> "thing" refers to the same table row. However, I have the uneasy
> feeling that by "the same thing" you mean "the same thing" in the world
> of
> the subject matter. (Oops, meybe I ASS-U-ME too much). I await your
> shedding some light on this before proceeding.
>

I probably just answered this question in the above paragraph, but I'll put it another way. Because database states are instantaneous snapshots of the universe, transitions *always* concern changes that occur in the universe, not just differences between relation values or tuples.

> anyway, back to SQL triggers. Note that at the end of the update, but
> before the commit, the client is going to receive control. The trigger
> alert the client by generating an error. So this kind of trigger operates
> not merely in the context of a single transaction, but in the context of
> a
> single action within that transaction. As such, it provides a poor
> concrete simile for the kind of analysis you are attempting with the sets
> of
> triples.
>

Indeed. A row trigger allows changes to each row to be checked, but not in context with the entire set of differences. A statement trigger exhibits the same problems as relational assignment: you can't correlate the rows if keys can change. A transition on the other hand, presents all of the information at once. It's a simple thing to transform a transition into three sets of relations, one for new information, one for changing information, and one for information that is becoming history. These can then be compared with the current state and possibly the proposed state by using normal relational operators in order to determine whether or not the entire transition should be allowed. In the context of a trigger, you could have a set of deleted values, a set of changed values and a set of inserted values. (Of course, such a trigger would be limited in scope to just a single table.)

> By an amazing coincidence, the "old" and "new" values in the context of a
> trigger during an update suffers from the exact same mystery as the
> UPDATE
> .... WHERE CURRENT suffers from.
>
>
>
Received on Sat Aug 26 2006 - 21:34:47 CEST

Original text of this message