Re: algebra equations for Reference and FD constraints

From: Brian Selzer <brian_at_selzer-software.com>
Date: Tue, 6 Jan 2009 07:31:14 -0500
Message-ID: <naI8l.10289$as4.7510_at_nlpi069.nbdc.sbc.com>

"Walter Mitty" <wamitty_at_verizon.net> wrote in message news:oNj6l.1943$Es4.106_at_nwrddc01.gnilink.net...
>
> "paul c" <toledobythesea_at_oohay.ac> wrote in message
> news:F366l.5463$Ou7.4752_at_newsfe24.iad...
>> Brian Selzer wrote:
>> ...
>>> is the case. That is the essence of database updates, which are
>>> definitely outside the scope of the algebra and the calculus.
>>> ...
>>
>>> Database updates are indeed a relational model concept even though
>>> neither the algebra nor the calculus are sufficient to express them..
>>> ...
>>
>> Taken together, those two sentences amount to a very familiar mysticism,
>> trying to have it both ways. The RM is a model for a machine.
>> Self-evident that it's not a model for people. For a db to be useful,
>> obviously people must interpret the results of the model, ie., they agree
>> to interpret its results the same way (in a given setting).
>> An example - obviously Codd expected that the practical purpose his
>> audience had in mind was that some results would persist on some medium.
>> But his model doesn't require that. Persistence is often mistaken as
>> being essential to Codd's model, but it's not, the model stands up quite
>> well without any notion of persistence. Do the same with any other
>> concept, remove it from the picture and then ask is there a practical
>> interpretation of the results of the calculus or algebra as applied to
>> some formal definition of relation. Don't worry about whether everybody
>> in the world prefers that interpretation, just ask whether some people
>> do. As the posts from many people on various c.d.t. threads over the
>> years show, there are many interpretations and yet very few concepts
>> needed. This is became most concepts people talk about are interpretive
>> concepts. Such concepts can be excised from all talk about the formal
>> model without harm.
>>
>
>
> It's not clear to me what you mean by "Codd's model". In the footnotes to
> Codd's early work, it's clear that the relational model of data predates
> Codd's initial work. It also seems clear, at least to me, that Codd's
> early work dealt not with the relational model of data as such, but with
> the application of the relational model of data to the task of database
> organization and interface, and therefore to the task of database
> management system design. If I'm right about this, then the notion of
> persistence is central to what Codd was discussing, regardless of whether
> or not the notion of persistence is essential to the relational model of
> data.
>
> Throughout the history of c.d.t. there has been an ongoing ambiguity about
> whether the relational model is being proposed as the framework for a
> theory of databases or whether it's being proposed as the framework for a
> theory of computing in general. Not that it couldn't be proposed for
> both. Just that the discussion at each point remains clear only if the
> writer and the reader both know which proposal is being discussed.
>
> In the context of databases, Brian's term "database update" is crystal
> clear. There is a state of the database (out of all the possible states
> of that database) before the update. There is likewise a state of the
> database (out of all the possible states of the database) after the
> update. The database update is the difference between those two states,
> regardless what imperative form resulted in the updates taking place. In
> particular, it doesn't matter what combination of SQL UPDATE, DELETE or
> INSERT imperatives were performed, or in what sequence, or whether some
> other imperative language was used, as long as the difference remains the
> same. No confusion need exist between "database update" and "SQL
> update". Either that, or I'm misreading Brian's comments.
>
> What's not clear to me is whether or not the concept of "assignment" is
> inherently an imperative concept, and not reducible to purely declarative
> expression. I lean towards the view that assignment is inherently
> imperative.
>
--boy did that last post format poorly

You're not quite misreading my comments, but that's not exactly how I would say it. While it doesn't matter what combination of SQL imperatives were performed or in what sequence, a database update isn't just the difference between two states. Let me explain. This is not a simple explanation, and I want to be thorough and absolutely clear so that there isn't any confusion. In the following, a database is a value--a set of relations, which are also values. To say 'database value' or 'relation value,' therefore, is redundant. I've also shied away from words like 'state' because they are loaded and draw heckles and derision from people who still think the world is flat.

The set of all possible databases is determined by the database schema. A possible database is simply one that conforms to the database schema.
Only one possible database can actually be the database at a time.
Each possible database corresponds to a distinct proposition where
each relation schema corresponds to a sentence (a closed first-order formula), and for finite domains each relation schema extends into a set of tuples, each of which corresponding to a positive atomic formula in the first-order language of the proposition,
each database constraint corresponds to one or more sentences,
each element of each domain corresponds to a distinct constant, and
each key value in each tuple corresponds to either a constant or a definite description, so that
the proposition is the logical conjunction of all of those formulae.
Due to the unique name, closed world and domain closure assumptions, only one of the corresponding propositions can be assigned a positive truth value at a time.
A database update essentially asserts that a different possible database is now the database, but there is more to it than that.
A consequence of a database update is that a negative truth value is assigned to the proposition that corresponds to the possible database that has been the database the current database) and a positive truth value is assigned to the proposition that corresponds to the possible database that is becoming the database (the proposed database).
Due to 6, expressions in the algebra and the calculus can involve no more than one database or possible database at a time.

Note that up to this point I have avoided references to individuals in the universe of discourse, even though the assignment of positive and negative truth values to propositions can only happen under an interpretation. Even so, just as you cannot draw a sound conclusion from a proposition that has been assigned a negative truth value and thus does not represent what is the case--especially one that is the conjunction of many atomic formulae, the result of any algebraic expression that involves more than one possible database must therefore be considered suspect.

8. Since each tuple has a key value, under an interpretation each tuple maps indirectly to an individual in the universe, but not necessarily to the same individual all the time.

For example, suppose that there is one red hat, one blue hat, one green hat and one orange hat, then the phrase 'the guy wearing the red hat' is definite (hence the definite article), and in a room with up to four guys wearing hats, 'the guy wearing the red hat' uniquely identifies a particular guy.../at a time/. Now suppose you have two pictures of the room, and the red hat appears in each picture. Is it valid to say that 'the guy wearing the red hat' is the same guy in each picture? No, it isn't. The guy wearing the red hat at the time that one of the pictures was taken could have swapped hats with the guy that was wearing the blue hat, or could have walked out of the room and handed the hat to someone that wasn't in the picture. The point is that the key value 'red' may map to different individuals at different times.

9. A database update consists of a set of primitive assertions that together enumerate

all of the tuples in the current database that map to individuals that no longer exist;
all of the tuples in the current database that map to individuals that differ only in appearance (and how); and
all of the tuples in the proposed database that map to individuals that came into being.
Assignment is not primitive: it is composed of combinations of 9a and 9c and is semantically different from 9b. Consider the example in 8. Assignment would corresponds the the scenario where the guy walked out of the room--the guy wearing the red hat doesn't even appear in the other picture. 9b would correspond to the scenario where the guys that were in the room swapped hats--the guy wearing the red hat differs only in appearance from one picture to the other.
There can be more than one set of primitive assertions that results in the same proposed database value--in fact, there can be a large number of them. This is why I hold that a database update isn't just the difference between the proposed database and the current database.
A database update, therefore, specifies not just which possible database is now the database, but exactly how the proposed database differs from the current database.

A database is a set of named sets of sets of named values: it is not a set of named sets of named sets of named values, though you can make it a set of named sets of named sets of named values by introducing surrogates (automatically generated identifiers), which act as rigid designators in the first-order language of the propositions that correspond to possible databases. With surrogates it is possible to determine exactly what is different between one database and the next and by extension one picture of the universe and the next because surrogates rigidly designate--that is, permanently identify--individuals in the universe of discourse. Surrogates, therefore, would make it possible to treat the operations insert, update and delete as shortcuts for assignments. Without surrogates, on the other hand, information is lost when those operations are translated into assignments: in particular, exactly how one database differs from the next is lost, making it extremely difficult if not impossible to define transition constraints. (Try writing a set-based update trigger that acts as a transition constraint on a table with a natural key. You'll find that you can't determine with certainty whether or not a key update such as the hat swapping example above occurred. And as a consequence, you can't just join the old value to the new one to determine what happened.)

As far as whether or not assignment is imperative, the same thing that limits expressions in the algebra and the calculus to a single database at a time (see 6 above) prevents assignment from being defined using just the algebra or the calculus. So if by declarative you mean "capable of being defined just using the algebra," then assignment and all database updates in general are definitely inherently imperative. Received on Tue Jan 06 2009 - 13:31:14 CET

This message: [ Message body ]
Next message: rpost: "Re: native xml processing vs what Postgres and Oracle offer"
Maybe in reply to: Brian Selzer: "Re: algebra equations for Reference and FD constraints"
In reply to Brian Selzer: "Re: a union is always a join!"
Next in thread: rpost: "Re: native xml processing vs what Postgres and Oracle offer"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message