Re: A real world example

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sat, 12 Aug 2006 03:21:19 GMT
Message-ID: <PObDg.7715$kO3.6667_at_newssvr12.news.prodigy.com>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:1155343153.851364.71130_at_h48g2000cwc.googlegroups.com...
> Brian Selzer wrote:
>> In an earlier thread, Marshall asked for a real-world example, and while
>> this isn't exactly what was asked for, it illustrates the problem.
>> Here's
>> the situation. I have two databases containing orders. One is the
>> customer's database containing purchase orders, lines, schedules and
>> releases. The other is the supplier's database containing sales orders
>> and
>> lines. There should be a 1:1 relationship between the releases in the
>> customer's database and the lines in the supplier's database.
>>
>> If you join the respective tables in each of the databases, you get the
>> following:
>>
>> Customer purchase order detail:
>> {PO#, POLine#, CustPart#, Schedule#, ReqestedDate, RequiredQuantity,
>> ReceivedQuantity}
>>
>> The primary key is {PO#, POLine#, Schedule#, RequestedDate}
>>
>> Supplier sales order detail:
>> {SO#, PO#, SOLine#, Part#, CustPart#, RequestedDate, PromiseDate,
>> RequiredQuantity, ShippedQuantity}
>>
>> The primary key is {SO#, SOLine#},
>> an alternate key is {PO#, CustPart#, RequestedDate, SO#}
>>
>> The first problem is that the customer uses a blanket PO, and therefore,
>> (usually) reuses the same PO# and Line# to order a part--the schedule
>> number
>> differentiates one set of releases from another.
>>
>> The second problem is that the supplier's database doesn't track
>> schedules,
>> so there's no way to differentiate sets of releases, except that the SO#
>> will be different for different schedules.
>>
>> Aside from those there have been keying errors, some rows don't have a
>> customer part #, some rows don't have a PO number, etc.
>>
>> What I've been asked to do is to produce exception reports, showing what
>> is
>> different between the two databases so that changes to purchase orders
>> that
>> appear in the customer's database can be verified after they're entered
>> into
>> the supplier's database.
>
> It may be tiredness, but I found that example hard to digest. However
> if I understand your gist correctly, you are pointing out a situation
> where an external entity is identified differently, in two distinct and
> separate databases? If so, then this will always necessarily require a
> 'situated' sentient-being to act as middleware. After all a db doesn't
> represent a universal truth about the real world, it just stores an
> encoding of it acceptable enough for a target user to interpret what it
> tells them in a useful way. In the end its always the human user
> putting the semantic pieces back together.
>

I was pretty tired when I posted it. Maybe it was contageous. It was possible to correlate most of the entries by using temporal sequencing, but it requires two successive database states from the customer database; I just created an extra few reports to deal with the duplicates.

>>
>> My point is that here is a real-world example where the universes of two
>> databases overlap, but the set of attributes used to identify something
>> common to the two universes is different for different databases. I
>> understand that this is a common problem when merging or otherwise
>> connecting databases, and I've run into this numerous times and have it
>> covered, but the fact remains that a similar situation can occur within a
>> single database. With natural keys, that is, keys whose value can be
>> different in successive database states, it is possible for the values of
>> one set of identifying attributes of something in the universe to be
>> different in successive universe states
>
> huh? different identifying attributes, different things.

No, you can have more than one set of identifying attributes for a thing. I just posted a detailed explanation in response to keith's post. I think that instead of revisiting the explanations here, I should refer you there.

>... How can they
> be the same, unless there is an incorrect key choice? Liebniz equality,
> etc., etc. (I know you've heard this from me before now, so apologies
> for the repetition, but I still can't see how I'm not connecting with
> you on it). The successive database states example, unless i've
> misconstrued it, appears analagous to talking about a company who uses
> a primary key of firstname for its employees relation, only for them to
> realise a year later that they've cocked up when they hire someone with
> a clashing firstname. Thats no mistake in the theory, just in the
> coconuts trying to apply it.
>

Most of what I do is cleaning up other people's messes, so I understand fully about poor key choice, but I think the problem goes deeper than that.

>>, and when this new information is
>> imparted to the database, it is possible for there to be propositions in
>> successive database states that should correspond, but cannot because the
>> only relevant set of identifying attributes of something is different in
>> successive universe states.
>
> Again I'm foxed by this Brian. How can propositions with different keys
> correspond between database states? That notion seems nonsensical to
> me. Only entities can 'correspond', and of course they are part of the
> conceptual not logical model right?
>

If a proposition describes some aspect of one or more things in the universe, then you should be able to correlate that proposition in one database state with a proposition that describes the same aspect of the same thing or things in the next database state. In other words, you may need to know what is known about something in order to assert something new about that thing.

>> It doesn't matter how stable a set of
>> identifying attributes is, if there's the slightest chance that its
>> values
>> can be different in successive universe states, you have to assume that
>> they
>> will.
>
> Well, only if there is sufficent risk involved to necessitate worrying
> about such a situation for that specific database. It is _possible_
> that two people's DNA may match for example, but the chances are so
> absurdly infintessimely small that it is of no concern.
>
>> The problem is one of relevance. If the DNA of a person is not
>> relevant, but is the only set of attribute values that is guaranteed to
>> remain constant throughout the discussion, then what do you do? If you
>> need
>> to know what was known about a thing, then there are only three choices:
>> (1)
>> use a surrogate for the DNA and make it the primary key, (2) identify
>> each
>> statement in the database so that it can be discussed in successive
>> database
>> states (in other words, use a surrogate for each statement), or (3)
>> define a
>> tuple-level assignment operator (A tuple-level assignment operator would
>> permit the system to correlate tuples in successive database states and
>> to
>> act accordingly. For example, Oracle has a FOR EACH ROW trigger.)
>
> Naturally (1) (with that 'if there is sufficient risk' caveat) - after
> all, that's what a surrogate is for. A replacement for an unobtainable
> distinguishing attribute. It is the only option which makes any sense
> to me logically. However you do not specify which you would choose? My
> impression from your previous posts is that you would also choose (1)
> but would desire it to be hidden? All best, J.
>

I think it depends on the situation. (1) if the universe doesn't contain the database, (2) if it does, and (3) if changing the schema will break a bunch of applications. (2) is much more appealing for temporal databases. Received on Sat Aug 12 2006 - 05:21:19 CEST

Original text of this message