Re: A real world example

From: JOG <jog_at_cs.nott.ac.uk>
Date: 19 Aug 2006 19:03:03 -0700
Message-ID: <1156039383.926159.159950_at_m73g2000cwd.googlegroups.com>


Brian Selzer wrote:
> It's possible for something to have its appearance altered without altering its
> essence

Pish! There is no such thing as an 'essence'. See 'King Milindi's chariot' for a good explication.

> and it's also possible for something to be identified by a property
> that can change.

Nope. Not if you want to compare it over time (which is what you are talking about). Then it's not an identifying property at all. And that sort of consideration should be occuring way before the RM is applied.

> For example, consider a line of people at the bank. Both
> Person and Position are identifying properties. Assume that you're third in
> line, so Person is you, and Position is 3. When the guy at the head of the
> line leaves, your Position changes to 2. Now let's put that in the context
> of a database. You have a relation with candidate keys Person and Position.
> So the current instance might look something like
>
> {(Bob, 1), (Brian, 2), (You, 3)}.
>
> The proposed instance would look something like
>
> {(Brian, 1), (You, 2)}
>
> Even though Position is a candidate key in each situation and indirectly
> identifies

'Indirect identity'? There is no such distinction to be made.

> an entry in the queue, the value 2 from the current instance
> identifies the tuple containing You in the proposed instance, not the one
> containing Brian. This illustrates the difference in the frame of reference
> for a candidate key and that for an update, and Position is an example of an
> identifying property that can change.

Why not extend this? Perhaps brian changed his name to bob while he was waiting, and queueing positions are changed from numerical to alphabetical by the bank

rv1: { (Brian, 2) }
rv2: { (Bob, A) }

...and you want to automagically correlate these things? Rather than think in hindsight maybe the identifiers chosen for the entities concerned might have been a wee bit of a mistake? Does that not strike you as making more sense?

>
> >>> >> > Maybe, but from a functional standpoint, that operator is just a
> >>> >> > function (e.g. "subtract $500 from X), in which the balance is a
> >>> >> > free
> >>> >> > variable. Only in an imperative world does that involve "knowing"
> >>> >> > (referencing) the "previous" balance. Function application means
> >>> >> > there's no "query" of the value prior to the update.
> >>> >>
> >>> >> Not necessarily. For example, consider a sales order that can have
> >>> >> several
> >>> >> states, proposed, open, firm, shipped, received, closed, cancelled.
> >>> >> Assume
> >>> >> that the order stated is the normal set of state changes for the
> >>> >> order.
> >>> >> Now
> >>> >> consider that an order that cannot become proposed once it is firm,
> >>> >> it
> >>> >> cannot become received unless it has been shipped. It cannot become
> >>> >> closed
> >>> >> unless it has been received. Unless you define special operators to
> >>> >> deal
> >>> >> with the states, you need to know what the old value was in order to
> >>> >> maintain the consistency of the database throughout the update.
> >>> >
> >>> > Maybe the discrepancy hinges on the phrase "you need to know." I'd
> >>> > argue that no query is needed, merely constraints.
> >>>
> >>> What type of constraints? I don't understand how you could define a
> >>> constraint. Could you please show me?
> >>
> >> Sorry, my comment didn't directly address the issue, so I'll rephrase:
> >> given that an application can produce several different state
> >> transitions in a row, I'm not sure what value these state-transition
> >> constraints would have. For example, an application can issue two
> >> updates in a row: set order status to open, then immediately set to
> >> firm. Such state transition constraints seem to have no meaning at all
> >> if they don't reference other relations (e.g. you can't set the status
> >> to received if there's no corresponding fact regarding the receipt
> >> date/time), aren't they just "static" relation/database constraints,
> >> which can be enforced for every update?
> >>
>
> I see updates as database updates, so in order for the proposed instance to
> become current, all state constraints along with all transition constraints
> must be satisfied. If there are other relations involved, then the update
> must include any changes to them as well. Because dependencies between
> relations can be mutual, for example, circular inclusion dependencies, I
> think it's important to think in terms of "database assignment" instead of
> "relational assignment" when defining constraints. (I think Date calls it
> "multiple assignment," and for partitions of related relation variables, it
> often makes sense to think in terms of what will affect the smallest number
> of relations required, but I prefer to keep it simple.)
>
> >> I have to give this much more thought, but is this a case where
> >> syntactic sugar for state-transition constraints could simply
> >> transparently introduce "surrogate relations" to implement the state
> >> transition constraints as "static" database constraints?
> >>
>
> I don't think so. I think that a transition constraint must always involve
> two database instances, the current instance and the proposed instance. If
> you only rely on the proposed instance, then you're relying on the user to
> reassert the old values along with the new values. If they fail to do so,
> then they can circumvent the constraint. From another perspective, the
> number of all possible database instances that satisfy both the state
> constraints and the transition constraints can be smaller than the number of
> all possible database instances that satisfy the state constraints, even if
> you include "surrogate relations." In this case, possible instances that
> shouldn't be possible include those that do not include a tuple containing
> the old value. The "surrogate relations" solution shifts responsibility for
> enforcing the transition constraint onto the user.
>
> >>> >> >> More
> >>> >> >> importantly, the bank must be able to identify the account that is
> >>> >> >> about
> >>> >> >> to
> >>> >> >> change, and that identity must remain constant in both the
> >>> >> >> preceding
> >>> >> >> and
> >>> >> >> succeeding database instances.
> >>> >> >
> >>> >> > Why? As long as it can be identified via some query, what
> >>> >> > difference
> >>> >> > does it make? For example, if I make a database schema change and
> >>> >> > introduce a new key, with appropriate view changes to support old
> >>> >> > application code, is there some logical distinction? If the
> >>> >> > external
> >>> >> > queries all still produce the same results, excepting the specific
> >>> >> > values being updated, what does "identity" have to do with it?
> >>> >>
> >>> >> Because changes are set-based, and if the identity of the account can
> >>> >> change, then it's possible to update the wrong row, or to allow a
> >>> >> charge
> >>> >> to
> >>> >> clear that shouldn't be allowed.
> >>> >
> >>> > There is no "wrong row," only a set of propositions. The same
> >>> > possibility for human error would seem to be present in any update:
> >>> > that you might issue an update without knowing about a change made
> >>> > between the time you last loaded the page, and the time you pressed
> >>> > Save, and therefore could violate a constraint which you wouldn't
> >>> > violate if only the database were in the state you think it is (based
> >>> > on what's on the screen). This issue seems to be a particular variant.
> >>> >
> >>> >> > There is no "thing." These are propositions, or assertions if you
> >>> >> > like,
> >>> >> > nothing more. The only meaning is in the correlation of queries to
> >>> >> > external phenonema of interest.
> >>> >>
> >>> >> What are the propositions or assertions about? If they're about
> >>> >> values
> >>> >> then
> >>> >> they're just hot air. A database contains knowledge. Knowledge
> >>> >> about
> >>> >> what?
> >>> >> Scalar values? I don't think so.
> >>> >
> >>> > They're about what is in our heads - the application (business)
> >>> > domain.
> >>> > The database doesn't care about that; it's in crafting predicates and
> >>> > constraints that we tell the database as much as it needs to (or can)
> >>> > "know."
> >>> >
> >>> >> The relational model doesn't have a correct theoretical mechanism to
> >>> >> correlate tuples during updates. The scope of a key value's ability
> >>> >> to
> >>> >> identify a tuple is a single relation value from a single database
> >>> >> instance.
> >>> >> I think that the model is incomplete without such a mechanism,
> >>> >> because
> >>> >> there
> >>> >> are some constraints that cannot be enforced, and certain update
> >>> >> anomalies
> >>> >> can occur, as I've provided examples of in other posts.
> >>> >
> >>> > Since we're not talking about a machine that "really knows" the real
> >>> > world, I don't understand what sort of mechanism you have in mind -
> >>> > what is an example of a "correct theoretical mechanism"? The
> >>> > relational
> >>> > model already allows surrogate keys.
> >>>
> >>> But it does not require them. Nor does it define mutability constraints
> >>> in
> >>> conjunction with entity integrity. Nor does it define a tuple-level
> >>> assignment operator.
> >>
> >> Aren't tuple-level assignment operators unnecessary if you have the
> >> surrogate keys you seek?
> >>
> >> In any event, I'm not sure that mandatory surrogate keys solve more
> >> problems than they create. The ability to change the non-surrogate keys
> >> arbitrarily seems to indicate that The anything, given that they allow
> >> arbitrary combinations of the various keys. In other words, by
> >> definition the surrogate key is unrelated to anything else, and as such
> >> it seems the keys can be shuffled at will. In the case of a relation
> >> where there is only a single surrogate key (e.g. the tuples represent
> >> facts that are nearly indistinguishable, like events in a trace), it
> >> doesn't matter.
> >>
> >>> I think that the definition of the model should be
> >>> strong enough so that I can't break it.
> >>
> >> I think you really expect way too much from models, and I'm not sure
> >> this is a fracture any worse than the cure would produce. I think these
> >> are fundamental identity problems, not just ones particular to the
> >> relational model.
> >>
> >> At the risk of being accused of waving my hands, I'll quote from one of
> >> Bill Kent's papers, The Unsolvable Identity Problem
> >> (http://www.idealliance.org/papers/extreme/proceedings/html/2003/Kent01/EML2003Kent01.html):
> >>
> >> "Why is the identity problem unsolvable? To begin with, as just shown,
> >> we don't agree on what the identity problem is."
> >>
> >> "A general unified theory of identity is elusive. It probably doesn't
> >> exist. The main reasons:
> >> * The problem is not well defined.
> >> * There are theoretical and practical limitations to what can be
> >> achieved.
> >> * There are too many semantic issues.
> >> * There are too many domains. We can't achieve a consistent
> >> solution across all of them.
> >>
> >> But this quest for the Holy Grail is educational."
> >>
> >> "So what do we do? Cope, as we always do. If there is no ideal
> >> solution, we develop solutions that are good enough. The trouble is
> >> that what's good enough for you today isn't good enough for me
> >> tomorrow. We are forever doomed to compromise, extend, patch and rework
> >> to make our good enough solutions a little better. We'll never get it
> >> right. That's life.
> >>
> >> Human beings manage to cope somehow with imperfect identification
> >> schemes. Our computer systems might do no better than that."
> >>
> >> - erk
> >>
> >
> >
Received on Sun Aug 20 2006 - 04:03:03 CEST

Original text of this message