Re: A real world example

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sat, 12 Aug 2006 17:46:23 GMT
Message-ID: <PtoDg.7802$kO3.1452_at_newssvr12.news.prodigy.com>


"Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message news:fqmDg.39802$pu3.533163_at_ursa-nb00s0.nbnet.nb.ca...
> JOG wrote:
>
>> Brian Selzer wrote:
>>
>>>In an earlier thread, Marshall asked for a real-world example, and while
>>>this isn't exactly what was asked for, it illustrates the problem.
>>>Here's
>>>the situation. I have two databases containing orders. One is the
>>>customer's database containing purchase orders, lines, schedules and
>>>releases. The other is the supplier's database containing sales orders
>>>and
>>>lines. There should be a 1:1 relationship between the releases in the
>>>customer's database and the lines in the supplier's database.
>>>
>>>If you join the respective tables in each of the databases, you get the
>>>following:
>>>
>>>Customer purchase order detail:
>>>{PO#, POLine#, CustPart#, Schedule#, ReqestedDate, RequiredQuantity,
>>>ReceivedQuantity}
>>>
>>>The primary key is {PO#, POLine#, Schedule#, RequestedDate}
>>>
>>>Supplier sales order detail:
>>>{SO#, PO#, SOLine#, Part#, CustPart#, RequestedDate, PromiseDate,
>>>RequiredQuantity, ShippedQuantity}
>>>
>>>The primary key is {SO#, SOLine#},
>>>an alternate key is {PO#, CustPart#, RequestedDate, SO#}
>>>
>>>The first problem is that the customer uses a blanket PO, and therefore,
>>>(usually) reuses the same PO# and Line# to order a part--the schedule
>>>number
>>>differentiates one set of releases from another.
>>>
>>>The second problem is that the supplier's database doesn't track
>>>schedules,
>>>so there's no way to differentiate sets of releases, except that the SO#
>>>will be different for different schedules.
>>>
>>>Aside from those there have been keying errors, some rows don't have a
>>>customer part #, some rows don't have a PO number, etc.
>>>
>>>What I've been asked to do is to produce exception reports, showing what
>>>is
>>>different between the two databases so that changes to purchase orders
>>>that
>>>appear in the customer's database can be verified after they're entered
>>>into
>>>the supplier's database.
>>
>> It may be tiredness, but I found that example hard to digest.
>
> It's hard to digest because it is incomplete and artificial. It seems
> difficult because the idiot expects a 1:1 correspondence between tuples
> when none should exist based on the given information. A summary of two
> relations projected on PO# must reconcile, and that is all. If they must
> reconcile, then presumably the workers at the sales organization must have
> kludged their system probably by combining the customer's PO# and
> Schedule# into the sales PO#.
>
> Supplying common business sense where details were omitted, the total
> amount invoiced must reconcile with the total amount authorized on the
> purchase order. The total requested quantities must match and the total
> quantity shipped must reconcile with the quantity requested and
> authorized.
>
> If things don't reconcile, someone will have to go through the whole thing
> line by line and probably also locating the physical material. Forensic
> studies are necessarily meticulous and labour intensive.
>
> The whole example suggests to me that the sales organization hired a crank
> like Selzer in the first place who then failed to correctly capture the
> requirements for progress draws.
>

No. I stepped in to fix a problem that already existed. The problem was real--it was in no way artificial. The supplier's database was royally screwed up. Schedules were being missed and they were in danger of losing a significant percentage of their business. And yes, the tuples should correspond because they refer to the same thing. You are obviously too pig-headed and closed-minded to comprehend that. To add fuel to the fire, the customer makes frequent changes to the schedules for parts, so there is no way to correlate by using total amounts or quantities. Any new database state from the supplier can contain changes to and cancellations of existing schedules and releases, along with additional schedules and releases for existing POs, as well as totally new POs. Therefore, the supplier's database will always lag behind the customer's, in the same way that what is known by the database lags behind the current state of the universe.

>
> However
>> if I understand your gist correctly, you are pointing out a situation
>> where an external entity is identified differently, in two distinct and
>> separate databases?
>
> It goes deeper than that. Will the purchasing organization accept or pay
> for shipments they cannot trace to an exact (PO#,Schedule#) ? Why should
> they pay for something they didn't order? How else do they know they
> ordered it?
>

The supplier has been hand writing this information on the packing slips. Scary, yes?

>
> If so, then this will always necessarily require a
>> 'situated' sentient-being to act as middleware. After all a db doesn't
>> represent a universal truth about the real world, it just stores an
>> encoding of it acceptable enough for a target user to interpret what it
>> tells them in a useful way. In the end its always the human user
>> putting the semantic pieces back together.
>>
>>
>>>My point is that here is a real-world example where the universes of two
>>>databases overlap, but the set of attributes used to identify something
>>>common to the two universes is different for different databases. I
>>>understand that this is a common problem when merging or otherwise
>>>connecting databases, and I've run into this numerous times and have it
>>>covered, but the fact remains that a similar situation can occur within a
>>>single database. With natural keys, that is, keys whose value can be
>>>different in successive database states, it is possible for the values of
>>>one set of identifying attributes of something in the universe to be
>>>different in successive universe states
>>
>> huh? different identifying attributes, different things. How can they
>> be the same, unless there is an incorrect key choice?
>
> Since when did a natural key become an unstable key? If you haven't
> figured it out yet, it is impossible to communicate with a crank who makes
> up his own definitions for words.
>
> A natural key is simply a familiar surrogate. Nothing more. Nothing less.
> While stability is an important design criterion for choosing keys, both
> surrogate and natural keys can be stable or unstable.
>
> If his whole argument is that stability is an important design criterion,
> the appropriate answer is "Well, duh!"
> http://www.dbdebunk.com/page/page/622344.htm
>

No. Primary keys should not just be stable, they should be immutable. Unless values for key attributes are guaranteed to remain constant throughout the discussion, then it doesn't matter how stable a key is: if it can change, it will, and if you don't have a mechanism to detect such changes and deal with them, then you'll end up with garbage in your database. It's as simple as that. Any system that relies on the stability of a key cannot be proven to be correct. It may operate correctly 99.999% of the time, but it will eventually fail. I've seen it happen many times. You must either limit the types of updates that can occur, or you must add a ton of extra code.

>
> Liebniz equality,
>> etc., etc. (I know you've heard this from me before now, so apologies
>> for the repetition, but I still can't see how I'm not connecting with
>> you on it). The successive database states example, unless i've
>> misconstrued it, appears analagous to talking about a company who uses
>> a primary key of firstname for its employees relation, only for them to
>> realise a year later that they've cocked up when they hire someone with
>> a clashing firstname. Thats no mistake in the theory, just in the
>> coconuts trying to apply it.
>
> Some kind of nuts anyway.
>
>
>>>, and when this new information is
>>>imparted to the database, it is possible for there to be propositions in
>>>successive database states that should correspond, but cannot because the
>>>only relevant set of identifying attributes of something is different in
>>>successive universe states.
>>
>> Again I'm foxed by this Brian. How can propositions with different keys
>> correspond between database states? That notion seems nonsensical to
>> me. Only entities can 'correspond', and of course they are part of the
>> conceptual not logical model right?
>
> If you haven't figured it out yet, Selzer thinks physical location in
> storage has some kind of theoretical or logical importance.
>
That's an asinine comment, and a misrepresentation. Why do you feel the need to lie? Is your argument so weak that you must resort to misdirection, misrepresentation, and outright falsehoods? Are you a Democratic politician, or an adolescent?
>
>>>It doesn't matter how stable a set of
>>>identifying attributes is, if there's the slightest chance that its
>>>values
>>>can be different in successive universe states, you have to assume that
>>>they
>>>will.
>>
>> Well, only if there is sufficent risk involved to necessitate worrying
>> about such a situation for that specific database. It is _possible_
>> that two people's DNA may match for example, but the chances are so
>> absurdly infintessimely small that it is of no concern.
>
> Clones are more common than you think. An incidence of 4 per 1000 for
> identical twins makes them just a little more scarce than Canadians in the
> world population.
>
>
>>>The problem is one of relevance. If the DNA of a person is not
>>>relevant, but is the only set of attribute values that is guaranteed to
>>>remain constant throughout the discussion, then what do you do?
>
> That would be a poor assumption given the existence of chimeras and
> transplant recipients--not to mention early recipients of stem cell
> treatments.
>
>
> If you need
>>>to know what was known about a thing, then there are only three choices:
>>>(1)
>>>use a surrogate for the DNA and make it the primary key,
>
> i.e. Choose a valid candidate key. Familiarity, while also an important
> design criterion, is irrelevant to the point discussed. After all, a
> natural key is merely a familiar surrogate.
>
As I stated before, if it can change, it will.
>
> (2) identify each
>>>statement in the database so that it can be discussed in successive
>>>database
>>>states (in other words, use a surrogate for each statement),
>
> If one has a valid key, the statements already have logical identity. If
> that changes, it is no longer the same statement.
>

This proves your inability to see beyond your nose. The scope of a candidate key's ability to provide logical identity is a single database value. Perhaps you should review the definition. I'm sure there are also a lot of people on this forum who could walk you through it if that's what you need. I suggest, however, that you treat them with a little more respect than you have in the past. Most people don't respond well to abuse.

>
> or (3) define a
>>>tuple-level assignment operator (A tuple-level assignment operator would
>>>permit the system to correlate tuples in successive database states and
>>>to
>>>act accordingly. For example, Oracle has a FOR EACH ROW trigger.)
>
> That would be stupid. Physical identification has numerous problems that
> logical identification obviates.
>

I agree that it would be stupid, but not with your reason. What is physical identification, and what does it have to do with correlating tuples? Are you talking about the address on the disk of a particular tuple? Using that would be the height of stupidity.

>
>> Naturally (1) (with that 'if there is sufficient risk' caveat) - after
>> all, that's what a surrogate is for. A replacement for an unobtainable
>> distinguishing attribute. It is the only option which makes any sense
>> to me logically. However you do not specify which you would choose? My
>> impression from your previous posts is that you would also choose (1)
>> but would desire it to be hidden? All best, J.
>
> Who can say what the hell he is trying to say? He makes up his own
> definitions for words and expects everyone to know them. He is an idiot
> and a crank.

No, I'm a maniac because my mind operates faster than yours; you, on the other hand, appear to be doing 25mph in the passing lane. Received on Sat Aug 12 2006 - 19:46:23 CEST

Original text of this message