Re: A real world example

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sun, 20 Aug 2006 03:12:27 GMT
Message-ID: <vqQFg.9779$1f6.2753_at_newssvr27.news.prodigy.net>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:1156039383.926159.159950_at_m73g2000cwd.googlegroups.com...
> Brian Selzer wrote:
>> It's possible for something to have its appearance altered without
>> altering its
>> essence
>
> Pish! There is no such thing as an 'essence'. See 'King Milindi's
> chariot' for a good explication.
>
>> and it's also possible for something to be identified by a property
>> that can change.
>
> Nope. Not if you want to compare it over time (which is what you are
> talking about). Then it's not an identifying property at all. And that
> sort of consideration should be occuring way before the RM is applied.
>

So, what you're saying is, "Never use natural keys." Right?

>> For example, consider a line of people at the bank. Both
>> Person and Position are identifying properties. Assume that you're third
>> in
>> line, so Person is you, and Position is 3. When the guy at the head of
>> the
>> line leaves, your Position changes to 2. Now let's put that in the
>> context
>> of a database. You have a relation with candidate keys Person and
>> Position.
>> So the current instance might look something like
>>
>> {(Bob, 1), (Brian, 2), (You, 3)}.
>>
>> The proposed instance would look something like
>>
>> {(Brian, 1), (You, 2)}
>>
>> Even though Position is a candidate key in each situation and indirectly
>> identifies
>
> 'Indirect identity'? There is no such distinction to be made.
>

I didn't say indirect identity, I said indirectly identifies. The candidate key value identifies a fact which in turn identifies a thing. There is indeed a distinction.

>> an entry in the queue, the value 2 from the current instance
>> identifies the tuple containing You in the proposed instance, not the one
>> containing Brian. This illustrates the difference in the frame of
>> reference
>> for a candidate key and that for an update, and Position is an example of
>> an
>> identifying property that can change.
>
> Why not extend this? Perhaps brian changed his name to bob while he was
> waiting, and queueing positions are changed from numerical to
> alphabetical by the bank
>
> rv1: { (Brian, 2) }
> rv2: { (Bob, A) }
>
> ...and you want to automagically correlate these things? Rather than
> think in hindsight maybe the identifiers chosen for the entities
> concerned might have been a wee bit of a mistake? Does that not strike
> you as making more sense?
>

If a key can change, it will. It doesn't matter how stable it is. Choosing a stable key only reduces the probability that a change will occur or reduces the frequency of the changes. It does not eliminate the possibility. Your example above supports my argument. Imagine a very large database that is updated tens or hundreds of thousands of times a day. Now assume that the probability of a change ocurring is .01%. This means that at least once a day there's a possibility of corrupting the database. The point, even if you can't see it, is that it is not a matter of choosing a more stable key. No update should *ever* be able to violate or circumvent the database predicate. If an update *can* violate integrity rules, then either the data model is broken or the implementation is broken. If the definition of the model cannot prevent it, then the model is broken.

>>
>> >>> >> > Maybe, but from a functional standpoint, that operator is just a
>> >>> >> > function (e.g. "subtract $500 from X), in which the balance is a
>> >>> >> > free
>> >>> >> > variable. Only in an imperative world does that involve
>> >>> >> > "knowing"
>> >>> >> > (referencing) the "previous" balance. Function application means
>> >>> >> > there's no "query" of the value prior to the update.
>> >>> >>
>> >>> >> Not necessarily. For example, consider a sales order that can
>> >>> >> have
>> >>> >> several
>> >>> >> states, proposed, open, firm, shipped, received, closed,
>> >>> >> cancelled.
>> >>> >> Assume
>> >>> >> that the order stated is the normal set of state changes for the
>> >>> >> order.
>> >>> >> Now
>> >>> >> consider that an order that cannot become proposed once it is
>> >>> >> firm,
>> >>> >> it
>> >>> >> cannot become received unless it has been shipped. It cannot
>> >>> >> become
>> >>> >> closed
>> >>> >> unless it has been received. Unless you define special operators
>> >>> >> to
>> >>> >> deal
>> >>> >> with the states, you need to know what the old value was in order
>> >>> >> to
>> >>> >> maintain the consistency of the database throughout the update.
>> >>> >
>> >>> > Maybe the discrepancy hinges on the phrase "you need to know." I'd
>> >>> > argue that no query is needed, merely constraints.
>> >>>
>> >>> What type of constraints? I don't understand how you could define a
>> >>> constraint. Could you please show me?
>> >>
>> >> Sorry, my comment didn't directly address the issue, so I'll rephrase:
>> >> given that an application can produce several different state
>> >> transitions in a row, I'm not sure what value these state-transition
>> >> constraints would have. For example, an application can issue two
>> >> updates in a row: set order status to open, then immediately set to
>> >> firm. Such state transition constraints seem to have no meaning at all
>> >> if they don't reference other relations (e.g. you can't set the status
>> >> to received if there's no corresponding fact regarding the receipt
>> >> date/time), aren't they just "static" relation/database constraints,
>> >> which can be enforced for every update?
>> >>
>>
>> I see updates as database updates, so in order for the proposed instance
>> to
>> become current, all state constraints along with all transition
>> constraints
>> must be satisfied. If there are other relations involved, then the
>> update
>> must include any changes to them as well. Because dependencies between
>> relations can be mutual, for example, circular inclusion dependencies, I
>> think it's important to think in terms of "database assignment" instead
>> of
>> "relational assignment" when defining constraints. (I think Date calls
>> it
>> "multiple assignment," and for partitions of related relation variables,
>> it
>> often makes sense to think in terms of what will affect the smallest
>> number
>> of relations required, but I prefer to keep it simple.)
>>
>> >> I have to give this much more thought, but is this a case where
>> >> syntactic sugar for state-transition constraints could simply
>> >> transparently introduce "surrogate relations" to implement the state
>> >> transition constraints as "static" database constraints?
>> >>
>>
>> I don't think so. I think that a transition constraint must always
>> involve
>> two database instances, the current instance and the proposed instance.
>> If
>> you only rely on the proposed instance, then you're relying on the user
>> to
>> reassert the old values along with the new values. If they fail to do
>> so,
>> then they can circumvent the constraint. From another perspective, the
>> number of all possible database instances that satisfy both the state
>> constraints and the transition constraints can be smaller than the number
>> of
>> all possible database instances that satisfy the state constraints, even
>> if
>> you include "surrogate relations." In this case, possible instances that
>> shouldn't be possible include those that do not include a tuple
>> containing
>> the old value. The "surrogate relations" solution shifts responsibility
>> for
>> enforcing the transition constraint onto the user.
>>
>> >>> >> >> More
>> >>> >> >> importantly, the bank must be able to identify the account that
>> >>> >> >> is
>> >>> >> >> about
>> >>> >> >> to
>> >>> >> >> change, and that identity must remain constant in both the
>> >>> >> >> preceding
>> >>> >> >> and
>> >>> >> >> succeeding database instances.
>> >>> >> >
>> >>> >> > Why? As long as it can be identified via some query, what
>> >>> >> > difference
>> >>> >> > does it make? For example, if I make a database schema change
>> >>> >> > and
>> >>> >> > introduce a new key, with appropriate view changes to support
>> >>> >> > old
>> >>> >> > application code, is there some logical distinction? If the
>> >>> >> > external
>> >>> >> > queries all still produce the same results, excepting the
>> >>> >> > specific
>> >>> >> > values being updated, what does "identity" have to do with it?
>> >>> >>
>> >>> >> Because changes are set-based, and if the identity of the account
>> >>> >> can
>> >>> >> change, then it's possible to update the wrong row, or to allow a
>> >>> >> charge
>> >>> >> to
>> >>> >> clear that shouldn't be allowed.
>> >>> >
>> >>> > There is no "wrong row," only a set of propositions. The same
>> >>> > possibility for human error would seem to be present in any update:
>> >>> > that you might issue an update without knowing about a change made
>> >>> > between the time you last loaded the page, and the time you pressed
>> >>> > Save, and therefore could violate a constraint which you wouldn't
>> >>> > violate if only the database were in the state you think it is
>> >>> > (based
>> >>> > on what's on the screen). This issue seems to be a particular
>> >>> > variant.
>> >>> >
>> >>> >> > There is no "thing." These are propositions, or assertions if
>> >>> >> > you
>> >>> >> > like,
>> >>> >> > nothing more. The only meaning is in the correlation of queries
>> >>> >> > to
>> >>> >> > external phenonema of interest.
>> >>> >>
>> >>> >> What are the propositions or assertions about? If they're about
>> >>> >> values
>> >>> >> then
>> >>> >> they're just hot air. A database contains knowledge. Knowledge
>> >>> >> about
>> >>> >> what?
>> >>> >> Scalar values? I don't think so.
>> >>> >
>> >>> > They're about what is in our heads - the application (business)
>> >>> > domain.
>> >>> > The database doesn't care about that; it's in crafting predicates
>> >>> > and
>> >>> > constraints that we tell the database as much as it needs to (or
>> >>> > can)
>> >>> > "know."
>> >>> >
>> >>> >> The relational model doesn't have a correct theoretical mechanism
>> >>> >> to
>> >>> >> correlate tuples during updates. The scope of a key value's
>> >>> >> ability
>> >>> >> to
>> >>> >> identify a tuple is a single relation value from a single database
>> >>> >> instance.
>> >>> >> I think that the model is incomplete without such a mechanism,
>> >>> >> because
>> >>> >> there
>> >>> >> are some constraints that cannot be enforced, and certain update
>> >>> >> anomalies
>> >>> >> can occur, as I've provided examples of in other posts.
>> >>> >
>> >>> > Since we're not talking about a machine that "really knows" the
>> >>> > real
>> >>> > world, I don't understand what sort of mechanism you have in mind -
>> >>> > what is an example of a "correct theoretical mechanism"? The
>> >>> > relational
>> >>> > model already allows surrogate keys.
>> >>>
>> >>> But it does not require them. Nor does it define mutability
>> >>> constraints
>> >>> in
>> >>> conjunction with entity integrity. Nor does it define a tuple-level
>> >>> assignment operator.
>> >>
>> >> Aren't tuple-level assignment operators unnecessary if you have the
>> >> surrogate keys you seek?
>> >>
>> >> In any event, I'm not sure that mandatory surrogate keys solve more
>> >> problems than they create. The ability to change the non-surrogate
>> >> keys
>> >> arbitrarily seems to indicate that The anything, given that they allow
>> >> arbitrary combinations of the various keys. In other words, by
>> >> definition the surrogate key is unrelated to anything else, and as
>> >> such
>> >> it seems the keys can be shuffled at will. In the case of a relation
>> >> where there is only a single surrogate key (e.g. the tuples represent
>> >> facts that are nearly indistinguishable, like events in a trace), it
>> >> doesn't matter.
>> >>
>> >>> I think that the definition of the model should be
>> >>> strong enough so that I can't break it.
>> >>
>> >> I think you really expect way too much from models, and I'm not sure
>> >> this is a fracture any worse than the cure would produce. I think
>> >> these
>> >> are fundamental identity problems, not just ones particular to the
>> >> relational model.
>> >>
>> >> At the risk of being accused of waving my hands, I'll quote from one
>> >> of
>> >> Bill Kent's papers, The Unsolvable Identity Problem
>> >> (http://www.idealliance.org/papers/extreme/proceedings/html/2003/Kent01/EML2003Kent01.html):
>> >>
>> >> "Why is the identity problem unsolvable? To begin with, as just shown,
>> >> we don't agree on what the identity problem is."
>> >>
>> >> "A general unified theory of identity is elusive. It probably doesn't
>> >> exist. The main reasons:
>> >> * The problem is not well defined.
>> >> * There are theoretical and practical limitations to what can be
>> >> achieved.
>> >> * There are too many semantic issues.
>> >> * There are too many domains. We can't achieve a consistent
>> >> solution across all of them.
>> >>
>> >> But this quest for the Holy Grail is educational."
>> >>
>> >> "So what do we do? Cope, as we always do. If there is no ideal
>> >> solution, we develop solutions that are good enough. The trouble is
>> >> that what's good enough for you today isn't good enough for me
>> >> tomorrow. We are forever doomed to compromise, extend, patch and
>> >> rework
>> >> to make our good enough solutions a little better. We'll never get it
>> >> right. That's life.
>> >>
>> >> Human beings manage to cope somehow with imperfect identification
>> >> schemes. Our computer systems might do no better than that."
>> >>
>> >> - erk
>> >>
>> >
>> >
>
Received on Sun Aug 20 2006 - 05:12:27 CEST

Original text of this message