Re: In an RDBMS, what does "Data" mean?

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Mon, 14 Jun 2004 23:12:27 +0100
Message-ID: <qZztNxCLLizAFwnY_at_thewolery.demon.co.uk>


In message <qj3yc.2599$9C6.264_at_newssvr15.news.prodigy.com>, Eric Kaun <ekaun_at_yahoo.com> writes
>"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>news:oCcd7SN$x6xAFw31_at_thewolery.demon.co.uk...
>>
>> Basically, any assumption that underlies mathematical logic is an axiom.
>> Copernican orbital theory is a mathematical model. Newtonian Mechanics
>> is a mathematical model. Therefore the assumptions that underlie them
>> must be axioms.
>
>OK, my mistake - I thought you meant the formulae themselves were axioms.
>You're referring to underlying assumptions, which of course are.

Some of the formulae may have to be axioms too. If you need to assume, then it's an axiom, if you can derive from your previous assumptions then it's a theorem.
>
>> >Cascading deletes are useful for implementations, not part of the
>theory - a
>> >cascading delete is simply nice shorthand for an implicit multi-update
>(as
>> >advocated by Date in recent writings), and roughly corresponds to the
>> >usefulness of the "foreign key" concept in place of a longer-winded
>> >constraint definition.
>>
>> In other words, the theory has no way of coping with what I call "the
>> adjectival clause" - a table whose contents are meaningless without the
>> existence of another table to point to. An invoice line item cannot
>> exist without an invoice for it to belong to!
>
>Right, but cascading delete is different than that. Constraints encode what
>you just said ("A cannot exist without B"); cascades are useful shorthands
>for updates, designed to make sets of operations easier while obeying the
>constraints.

But the need for a cascading delete is metadata - information that should be *implicit* within the database. You're turning it into an *external* constraint - putting it where it does NOT belong!
>

>> Please explain to me how, in the real world, an invoice line item can
>> have an existence in the absence of the invoice to which it belongs ...
>> because as I read you you are saying that relational theory says it can
>
>I may have expressed myself badly. What relational theory says is that
>statements about line items on an invoice state truths about values which
>are unrelated to the invoice as a whole, though each line item of course
>depends on the invoice (header). That statement, while "related to" the
>invoice header (in that it can't exist without it), has logical meaning on
>its own - I can formulate useful queries over line items which don't involve
>the header.

I think we're having a bit of fun here :-) You're saying you want to extract data from certain "columns" without caring what the primary key is. Fine - no problem there. Ignore the columns you're not interested in.

I'm saying that deleting the primary key should delete all related rows - even those in other tables! If your analyst forgot to specify a cascading delete (and you say that they're external to the theory, anyway), what you're saying is that the theory FAILS to enforce data integrity in that you're using something external to theory to keep the tables in sync.

Pick just stores it all together so that taking out the primary key takes out everything else.
>
>If cascading delete is your guide, then would deleting a customer cause all
>of that customer's invoices (and their line items) to be deleted as well?
>Danger aside, does that imply that invoices are an attribute of customers? I
>understand there's a difference between customer:invoice and invoice:line
>item relationships, but I'm trying to boil it down to something more than
>"they're part of the same thing". When it comes to general ledger entries,
>invoices, payments, shipments, contacts, etc., the line between what is and
>isn't part of a customers gets a little murkier. Or does it?
>

Well, if you follow accounting rules, yes it should :-) Although I think that really goes the other way - you can't create an invoice for a non-existent customer :-)

I think I know what you mean though, when you say "gets a little murkier". Except, in practice, it doesn't. "customer" is a noun - it gets its own FILE. "invoice" likewise. "line item" - is it a noun or adjectival clause? Pick Business Analysis would unhesitatingly place it in the category of adjectival clause. But I know why you would want to treat it as a noun.

Probably because it makes the General Ledger so much easier :-) you want to analyse by line-item, and not by invoice. Actually, that's not difficult at all - you just add ledger code as an attribute of invoice, grouped as part of line-item :-) But yep. I can see why you wouldn't think it as clean - I'm inclined to agree with you. If I was programming this, I'd probably say that "line-item" in the general ledger wasn't the same as "line-item" in the invoice and that would make my life nice and simple :-) but it would have the relational people throwing their hands up in horror. Or just make the entries in the GENERAL-LEDGER FILE a list of foreign keys pointing at the line item in the invoice file - not hard at all. Just a smidgeon more work for the database (but rather more mental contortion for the programmer).

But I've been thinking about a few other things while this reply has been sitting half-composed on my computer ... Relational Theory is all about capturing *data*. BUT - a lot of information is *metadata* which an RDBMS is incapable of storing as such. We were discussing ordering - an RDBMS only captures this - as data - if the analyst thinks it important. A Pick database captures it as a matter of course.

And constraints - I categorise them as "natural constraints" and "business constraints". You can't have an invoice line item without an invoice - that's a "natural constraint". But you *can* have an invoice without a valid company. It might be an error, or it might be called a receipt. But there's nothing to stop the accounts dept screwing up and issuing an invoice to a non-existent company :-) That's what I call a "business constraint". You seem to think that should be captured as *data*. Pick captures it as *metadata*.

Now compare the amount of *metadata* available to Pick and/or relational. It doesn't matter what your database is, the data in it is, as far as the dbms is concerned, a meaningless "blob". To optimise performance, storage, whatever, the only thing available of any use to the dbms is *metadata*. Which Pick has in abundance.

That's why I describe Pick as a superset of relational - it can convert metadata into data and present it to the app. It can also USE the metadata to optimise itself. Relational can only store this sort of information as *data*, and as such the information is not available to the dbms for its internal use.

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Tue Jun 15 2004 - 00:12:27 CEST

Original text of this message