Re: In an RDBMS, what does "Data" mean?

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Sat, 19 Jun 2004 01:31:29 +0100
Message-ID: <PnuESeShl40AFw43_at_thewolery.demon.co.uk>


In message <8D0Ac.41$NZ6.38_at_newssvr33.news.prodigy.com>, Eric Kaun <ekaun_at_yahoo.com> writes
>"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>news:qZztNxCLLizAFwnY_at_thewolery.demon.co.uk...
>> In message <qj3yc.2599$9C6.264_at_newssvr15.news.prodigy.com>, Eric Kaun
>> <ekaun_at_yahoo.com> writes
>>
>> But the need for a cascading delete is metadata - information that
>> should be *implicit* within the database. You're turning it into an
>> *external* constraint - putting it where it does NOT belong!
>
>I don't understand your distinction between "implicit" and "external" here.
>External to what? In relational, both are part of the database, which
>includes both relations (actually relvars, relation-typed variables which
>are updated with new relation values) and constraints. In most businesses
>there are rules which bind multiple relations; I'm sure you have something
>similar in Pick, though it may be enforced by the application. Using
>first-order logic over relvars, you can specify most of these (if not all).

See lower in my original post - an invoice line can't exist without an invoice, whereas a car can exist without an owner ...
>
>Haven't you ever seen a Pick app where deleting from (or updating) a record
>in FILE A requires a corresponding delete/update in FILE B, yet you don't
>have the ability to encode that in the file or dictionary?

It probably happens, but MUCH less than relational. So much so, that the need has never been worth doing anything about :-)
>
>I think you're suggesting that the data structures themselves should encode
>the constraints, which gets you into dangerous territory, leading to novel
>data structures for each individual enterprise. That's fine as long as the
>query and update operators stay consistent, but you'd quickly find
>constraints undoing that, leaving you with custom persistence and no
>standard at all. Remember that even foreign and primary key constraints are
>just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a
>constraint declaration.
>
>> You're saying you want to
>> extract data from certain "columns" without caring what the primary key
>> is. Fine - no problem there. Ignore the columns you're not interested
>> in.
>
>But then why include them at all? Certainly I can ignore attributes, for
>example in updating attribute A I ignore attribute B. Consistently treating
>a set of Pick attributes as a group (e.g. the line item attributes), while
>they're part of INVOICE, seems logically wrong; those attributes are
>different than, for example, the INVOICE_DATE.

Because statistics tell me that if I access a line item, then I am highly likely to want to access the invoice date at the same time. The cost of retrieving it unnecessarily turns out to be worth it in making it available in case I want it.

But that's a physical thing that relational theory refuses to address...
>
>> I'm saying that deleting the primary key should delete all related rows
>> - even those in other tables! If your analyst forgot to specify a
>> cascading delete (and you say that they're external to the theory,
>> anyway), what you're saying is that the theory FAILS to enforce data
>> integrity in that you're using something external to theory to keep the
>> tables in sync.
>
>Nothing external about it. In Pick the integrity of the files is enforced by
>the application, yet you don't regard the app as external (at least I've
>seen arguments to the contrary). Constraints are different from relations
>because they make statements about those relations. Both are integral to
>relational.

No they are NOT enforced by the application. They are enforced by the DESIGN. An invoice line has as its primary key the invoice number (plus an *implicit* sequence number). Delete that primary key and all associated data disappears including all the invoice lines.
>
>> Pick just stores it all together so that taking out the primary key
>> takes out everything else.
>
>A fine shorthand, and again the CASCADE DELETE (not always what you want, by
>the way) is simple enough to do, and even to add in later (unlike in Pick,
>where you have to make that decision up front).

Yup, you do "make that decision up front", but it's an obvious decision. I know you can't design something to be fool-proof, but you really do have to be an idiot to make a design blunder of this magnitude...
>
>I object less to this than you'd expect; I can see some cases where this
>buys you a short-term gain. I just see little long-term gain, and expect
>long-term cost. I've been trying to think of past databases I've worked on,
>and whether MVs would have bought me anything. Haven't found anything yet...
>in the few cases where multi-descriptions or multi-coding would have helped,
>I had cross-business unit and internationalization issues that would have
>prevented leveraging them anyway. And I can remember a few cases where
>properly treating a simple code as its own "noun", rather than an adjective,
>saved me much work later.
>
>Short version: I see adjectives "becoming" relations fairly frequently. I
>see relations which remain "unused" as such infrequently. Of course, I'm
>aware that our perceptions are less than objective, and that the lexicons in
>our head guide our observations more than they should.

And Pick treats foreign keys as "just another attribute", so I'm sorry but I'd just dismiss your "adjectives become relations" with "so what!". We don't see it as a problem.
>
>> But I've been thinking about a few other things while this reply has
>> been sitting half-composed on my computer ... Relational Theory is all
>> about capturing *data*. BUT - a lot of information is *metadata* which
>> an RDBMS is incapable of storing as such. We were discussing ordering -
>> an RDBMS only captures this - as data - if the analyst thinks it
>> important. A Pick database captures it as a matter of course.
>
>True. An avenue for discussion might also be what other metadata is useful,
>other than order. I think that more general question would cut more to the
>heart of why different data models appeal in different ways.

Part of the problem is that relational theory explicitly ignores implementation. The main reason for keeping metadata as metadata not data, is that it assists greatly in optimisation - an implementation issue.

Keeping metadata is worthless in the relational paradigm, which I suspect is why so many people have trouble with me repeatedly talking about statistics :-)
>
>> And constraints - I categorise them as "natural constraints" and
>> "business constraints". You can't have an invoice line item without an
>> invoice - that's a "natural constraint". But you *can* have an invoice
>> without a valid company. It might be an error, or it might be called a
>> receipt. But there's nothing to stop the accounts dept screwing up and
>> issuing an invoice to a non-existent company :-) That's what I call a
>> "business constraint". You seem to think that should be captured as
>> *data*. Pick captures it as *metadata*.
>
>Given that relational advocates (at least in recent writings by Date) a
>system catalog that is also composed of relational (and which effectively
>represents a partial second-order relational algebra/calculus), I'd say that
>relational definitely wants all data, even metadata, as relations and
>constraints. I have no particular reason to think that that's not desirable,
>but it also begs the question: what metadata is there, what's useful, and
>how does the importance of a given "type" of metadata influence the utility
>of a given data model?
>
>There may be research on such... just haven't stumbled across it.
>
>> Now compare the amount of *metadata* available to Pick and/or
>> relational. It doesn't matter what your database is, the data in it is,
>> as far as the dbms is concerned, a meaningless "blob". To optimise
>> performance, storage, whatever, the only thing available of any use to
>> the dbms is *metadata*. Which Pick has in abundance.
>
>What, other than ordering?

Which data is *tightly* linked to other data, and which data is *loosely* linked. Information which helps it guarantee that after one disk access, the next few requests can be met from cache not disk ...
>
>> That's why I describe Pick as a superset of relational - it can convert
>> metadata into data and present it to the app.
>
>True enough about the ordering, but I'd argue that without any constraints
>(ordering is implicit), Pick doesn't offer much else. I have to admit not
>knowing enough about the dictionary, but that seems to be functional
>transformation, not actual constraints on what's placed into a file... and
>in particular no constraints that cross multiple files. I think relational
>constraints can be much, much more descriptive (as well as being
>proscriptive) - far better than SQL would let on.

The dictionary doesn't declare constraints between FILEs, but seeing as a FILE usually contains contents equivalent to several relational tables, it has no need to ... that's why Pick doesn't have constraints like relational does - even those of us who understand relational constraints just can't see the point of implementing them in Pick :-)
>
>> It can also USE the metadata to optimise itself.
>
>How so? I thought the programmer had to make the opimization, by choosing
>what data is retrieved at one time by virtue of being in the same file? I
>may be missing something.
>
>> Relational can only store this sort of
>> information as *data*, and as such the information is not available to
>> the dbms for its internal use.
>
>Well, if the catalog is relational (as it should be), then I'd say this
>isn't quite correct. One could even enforce database design / naming
>standards using constraints over system catalog relations!
>
>In any event, it would seem useful even in Pick if there were certain
>"implicit" files that represented the files in the system - for example, a
>file called FILE with one record per file, and perhaps an attribute called
>ATTRIBUTES containing a list of attributes... anyway, you can probably see
>the utility of that for app generation, enforcing standards, and even
>implementing the Pick engine (and extensions/plugins). Date advocates that,
>and I believe that Dataphor uses that heavily.
>
Sounds interesting ... and from another post of mine, you'll see that if I've understood you correctly, Pick does indeed have something like that ... or if it doesn't then it would be easily implemented if it made any sense within the model.

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Sat Jun 19 2004 - 02:31:29 CEST

Original text of this message