Re: In an RDBMS, what does "Data" mean?

From: Marshall Spight <mspight_at_dnai.com>
Date: Sun, 27 Jun 2004 15:55:59 GMT
Message-ID: <j6CDc.101024$2i5.34223_at_attbi_s52>


"mAsterdam" <mAsterdam_at_vrijdag.org> wrote in message news:40de988a$0$48920$e4fe514c_at_news.xs4all.nl...
> Marshall Spight wrote:
> > Anthony W. Youngman wrote:
> >>Note that it's easy to go from a list to either of the other two. But in
> >>order to go back, the set or bag needs to contain extra data (ie the
> >>order) over the list.
> >
> > I don't see how you could consider that data "extra" if it was
> > there originally.
> >
> > Anyway, the list [A, B, C] expressed as a set is { (1, A), (2, B), (3, C) }
> > where [] denotes an ordered collection and {} denotes an unordered
> > collection.
> >
> [...]
> First, the way you chose is not without problems:

For sure!

> Let's put a new element 'N' into
> the list, right after the 'B':
>
> The new list is [A, B, N, C].
> Now what is the set?
>
> is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or
> is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?

It's UC2.

> In the case of UC2 every insert causes updates to
> all elements that should be after the new element.

Let us also consider the two other common implementations of lists: linked lists and arrays.

In UC2, an insert requires O(N) time: every index above the one inserted requires updating. In a linked list, an insert requires O(1) time, but locating an item in order to insert in front of it is O(N). In an array, an insert requires O(N) time, because you have to move all the higher elements up.

The issue here is not performance, it's interface. Using SQL to manipulate ordered data *where the ordering is positional* and *not by any logical component of the data* is just way too hard.

"Lists are very common, and SQL doesn't handle them well at all. RM doesn't handle them well either."

Let me expand on what I mean by "handle," and to do so, I'll use the "structure, integrity, manipulation" definition of a DBMS.

If you have an unchanging list, using (1,A), (2,B), (3,C) etc as a list *structure* works just fine. All the query operators you'd expect to have, you have: get me item 2, how many items are there, etc. You'd also get some not-so-common queries: at what indicies does the letter "Q" occur?

But let's consider manipulation. Simple insert, using your (3,N) tuple.

Java:
list.insert(3, N);

SQL:
begin

update List set position = position + 1 where position >= 3 insert into List (position, value) values (3, N) commit
[note if you do them in the wrong order you're screwed.]

Wait, I just remembered: Date says integrity enforcement should be at the statement level, not the transaction level. So the first update will fail, because it leaves a hole in the sequence. So I've gotta figure out how to update all those pos values at once, which I think I can do with mod and some offsets. I expect it would take me a few hours to figure out.

Now: integrity. I have to specify some checks. Uh: unique(pos)
check( pos >= 0 )
check( select max(pos) from List = select count(pos) from List )

Did that get it? I think it did, but I'm not sure. Also, if I saw that in a table declaration, would I say "list" or would I say "bunch of integrity constraints."

> The problem is we have to make a choice with consequences to get from
> the list to a unordered collection carrying the same meaning, so we are
> capable of reconstructing the list.

I don't disagree, but I might say it differently: what matters is what we options we have to enforce integrity, and what operators we have to perform manipulation. I think RM is a step backwards in ease-of-use for each of these where lists are concerned. (Which is not to say that I think we should make our decisions on that basis alone, but I do think it's significant.)

> Thing is we *have* to make a complex choice,
> because (and only because) we decided that the order was
> meaningful.

I don't think the choice is intrinsically complex. I think the issue is just that SQL (and TTM for that matter) don't give you even the most basic list manipulation or integrity checks.

> There are even more worms in this can, I think, but
> I would appreciate your comments on this one.
>
> <snip>
> > Lists are very common, and SQL doesn't handle them well at all.
> > RM doesn't handle them well either.
> >
> > (I can also believe that MV handles them quite well, although I
> > have no direct experience to that end.)
> >
> > Does anyone know of a "list calculus" or "list algebra" with a
> > formal definition? It is just too simple for anyone to have
> > cared about?
>
> Lisp and prolog come to mind - but that is not what you are
> looking for - at least it is not what I am looking for.

No, if I want a PL with list operators, I can find them anywhere. I was more thinking of something like the relational algebra, with its minimal set of operators, additional operators defined in terms of the minimal ones, and a definition of what it means for a set of operators to be complete. Only for lists.

Frighteningly, if I can't find something like that, I may have to do it myself.

Marshall Received on Sun Jun 27 2004 - 17:55:59 CEST

Original text of this message