Re: A real world example

From: erk <eric.kaun_at_gmail.com>
Date: 18 Aug 2006 06:34:40 -0700
Message-ID: <1155908080.655605.22860_at_p79g2000cwp.googlegroups.com>


Brian Selzer wrote:
> "erk" <eric.kaun_at_gmail.com> wrote in message
> news:1155827594.752249.65890_at_h48g2000cwc.googlegroups.com...
> > Brian Selzer wrote:
> > [snip]
> If a constraint is defined in terms of successive states of a database, then
> facts cannot be thought of just in terms of instances of predicates.

Yes, that's all facts are, though constraints mean that your facts are consistent with one another, based on how you've defined the predicates of relevance to you (i.e. inter-relation constraints reflect requirements as well as "reality"). I'm speaking out of some degree of ignorance; no database I've ever worked with has had state-transition constraints like this. The only constraints have been what constitutes a valid relation (and database) value.

> Any such instance has identity only within a single relation value. Since the
> value of a candidate key determines the values of all other attributes, it
> can be used to identify a single tuple within a single relation value;
> therefore, it can be used in other relation values as a substitute for
> enumerating all of the attribute values of the referenced tuple in every
> referencing tuple within the same database state--but only within the same
> database state.

I don't think that's a valid interpretation. It's not a substitute for referencing all of the attribute values, nor, given only the key value, can I determine the values of those attributes. The key has a specific meaning in all the predicates in which it's referenced, as distinct from the other attributes - it's not a surrogate for them. Relations, even those with foreign key constraints, are standalone facts. Joins and constraints reflect references to the same types.

I think this difference is important, and am not just spitting hairs for the sake of it.

> Extending the scope of a candidate key's ability to
> identify instances of predicates from a single database state to successive
> database states would require that those instances be identical, not just
> the candidate key values.

So this ability to uniquely identify a fact across successive database states is important purely for state-transition constraints? Is there any other use for them?

> If the values determined by the candidate key
> value in the proposed state were different than those in the current state,
> then the instances would not have the same properties, and therefore, would
> not be the same.

Sameness is irrelevant; a fact is a fact, and if database value N+1 differs from database value N, it's because facts have changed, and we need the database to reflect reality more accurately.

> Instances are values; values do not change. Therefore,
> relaxing this restriction so that a candidate key value can identify
> instances in successive database states that are not necessarily identical,
> but have identical candidate key values can only be possible if the
> instances of a predicate represent things in the universe of discourse that
> can have their appearance altered without altering their identity, and what
> is identified by a candidate key value is not just a tuple, but something in
> the universe.

It's a fact, not a thing. I have to admit that I'm still quite leery of the notion that facts have "identity." I have no current counterproposal; I'm just trying to understand why this is necessary. A fact is a statement about things. One attribute of such a statement might be that a real-world thing has a real-world unique identifying value, and I understand the need to sometimes introduce surrogates. But given that a separate "identity attribute" for a fact can have no possible correlation with anything in the real world, including natural candidate keys, I smell potential problems.

> >> > Maybe, but from a functional standpoint, that operator is just a
> >> > function (e.g. "subtract $500 from X), in which the balance is a free
> >> > variable. Only in an imperative world does that involve "knowing"
> >> > (referencing) the "previous" balance. Function application means
> >> > there's no "query" of the value prior to the update.
> >>
> >> Not necessarily. For example, consider a sales order that can have
> >> several
> >> states, proposed, open, firm, shipped, received, closed, cancelled.
> >> Assume
> >> that the order stated is the normal set of state changes for the order.
> >> Now
> >> consider that an order that cannot become proposed once it is firm, it
> >> cannot become received unless it has been shipped. It cannot become
> >> closed
> >> unless it has been received. Unless you define special operators to deal
> >> with the states, you need to know what the old value was in order to
> >> maintain the consistency of the database throughout the update.
> >
> > Maybe the discrepancy hinges on the phrase "you need to know." I'd
> > argue that no query is needed, merely constraints.

>

> What type of constraints? I don't understand how you could define a
> constraint. Could you please show me?

Sorry, my comment didn't directly address the issue, so I'll rephrase: given that an application can produce several different state transitions in a row, I'm not sure what value these state-transition constraints would have. For example, an application can issue two updates in a row: set order status to open, then immediately set to firm. Such state transition constraints seem to have no meaning at all if they don't reference other relations (e.g. you can't set the status to received if there's no corresponding fact regarding the receipt date/time), aren't they just "static" relation/database constraints, which can be enforced for every update?

I have to give this much more thought, but is this a case where syntactic sugar for state-transition constraints could simply transparently introduce "surrogate relations" to implement the state transition constraints as "static" database constraints?

> >> >> More
> >> >> importantly, the bank must be able to identify the account that is
> >> >> about
> >> >> to
> >> >> change, and that identity must remain constant in both the preceding
> >> >> and
> >> >> succeeding database instances.
> >> >
> >> > Why? As long as it can be identified via some query, what difference
> >> > does it make? For example, if I make a database schema change and
> >> > introduce a new key, with appropriate view changes to support old
> >> > application code, is there some logical distinction? If the external
> >> > queries all still produce the same results, excepting the specific
> >> > values being updated, what does "identity" have to do with it?
> >>
> >> Because changes are set-based, and if the identity of the account can
> >> change, then it's possible to update the wrong row, or to allow a charge
> >> to
> >> clear that shouldn't be allowed.
> >
> > There is no "wrong row," only a set of propositions. The same
> > possibility for human error would seem to be present in any update:
> > that you might issue an update without knowing about a change made
> > between the time you last loaded the page, and the time you pressed
> > Save, and therefore could violate a constraint which you wouldn't
> > violate if only the database were in the state you think it is (based
> > on what's on the screen). This issue seems to be a particular variant.
> >
> >> > There is no "thing." These are propositions, or assertions if you like,
> >> > nothing more. The only meaning is in the correlation of queries to
> >> > external phenonema of interest.
> >>
> >> What are the propositions or assertions about? If they're about values
> >> then
> >> they're just hot air. A database contains knowledge. Knowledge about
> >> what?
> >> Scalar values? I don't think so.
> >
> > They're about what is in our heads - the application (business) domain.
> > The database doesn't care about that; it's in crafting predicates and
> > constraints that we tell the database as much as it needs to (or can)
> > "know."
> >
> >> The relational model doesn't have a correct theoretical mechanism to
> >> correlate tuples during updates. The scope of a key value's ability to
> >> identify a tuple is a single relation value from a single database
> >> instance.
> >> I think that the model is incomplete without such a mechanism, because
> >> there
> >> are some constraints that cannot be enforced, and certain update
> >> anomalies
> >> can occur, as I've provided examples of in other posts.
> >
> > Since we're not talking about a machine that "really knows" the real
> > world, I don't understand what sort of mechanism you have in mind -
> > what is an example of a "correct theoretical mechanism"? The relational
> > model already allows surrogate keys.

>

> But it does not require them. Nor does it define mutability constraints in
> conjunction with entity integrity. Nor does it define a tuple-level
> assignment operator.

Aren't tuple-level assignment operators unnecessary if you have the surrogate keys you seek?

In any event, I'm not sure that mandatory surrogate keys solve more problems than they create. The ability to change the non-surrogate keys arbitrarily seems to indicate that The anything, given that they allow arbitrary combinations of the various keys. In other words, by definition the surrogate key is unrelated to anything else, and as such it seems the keys can be shuffled at will. In the case of a relation where there is only a single surrogate key (e.g. the tuples represent facts that are nearly indistinguishable, like events in a trace), it doesn't matter.

> I think that the definition of the model should be
> strong enough so that I can't break it.

I think you really expect way too much from models, and I'm not sure this is a fracture any worse than the cure would produce. I think these are fundamental identity problems, not just ones particular to the relational model.

At the risk of being accused of waving my hands, I'll quote from one of Bill Kent's papers, The Unsolvable Identity Problem (http://www.idealliance.org/papers/extreme/proceedings/html/2003/Kent01/EML2003Kent01.html):

"Why is the identity problem unsolvable? To begin with, as just shown, we don't agree on what the identity problem is."

"A general unified theory of identity is elusive. It probably doesn't exist. The main reasons:

  • The problem is not well defined.
  • There are theoretical and practical limitations to what can be achieved.
  • There are too many semantic issues.
  • There are too many domains. We can't achieve a consistent solution across all of them.

But this quest for the Holy Grail is educational."

"So what do we do? Cope, as we always do. If there is no ideal solution, we develop solutions that are good enough. The trouble is that what's good enough for you today isn't good enough for me tomorrow. We are forever doomed to compromise, extend, patch and rework to make our good enough solutions a little better. We'll never get it right. That's life.

Human beings manage to cope somehow with imperfect identification schemes. Our computer systems might do no better than that."

  • erk
Received on Fri Aug 18 2006 - 15:34:40 CEST

Original text of this message