Re: Resiliency To New Data Requirements

From: Marshall <marshall.spight_at_gmail.com>
Date: 16 Aug 2006 19:46:58 -0700
Message-ID: <1155782818.920618.74520_at_i42g2000cwa.googlegroups.com>

dawn wrote:
> Marshall wrote:
> > dawn wrote:
> > >
> > > Maybe a little, but the www is still a very large distributed database
> > > of sorts.
> >
> > It totally isn't. What makes up a database? Structure, integrity,
> > manipulation to start with.
>
> I didn't say that it was a DBMS.

Okay then; I'm hoist on the petard of the database/dbms difference:

> Your definition of "database" is not
> the norm, it seems, but I went to the cdt glossary and found this
>
> "[Database]
> "A logically coherent collection of related real-world data
> assembled for a specific purpose." -- rephrased from
> "Fundamentals of Database Systems", Elmasri & Navathe.
>
> 1. Deluxe filesystem
> 2. Shared databank (E. Codd) "
>
> I think the web fits within these definitions.

The web is a way to retrieve an HTML document via a URL key. So if we are to call it a database, we would then have to call other similar things, in which one looks up a value with a key, databases as well. For example, FTP would be a distributed database. When you go to the laundry, give them your ticket and they give you back your clothes, that's a database. If you have an array in C or Java, that's a database. (You have an integer key and you use it to retrieve a value.) In fact, the array is the superior database compared to the www because the array supports updating.
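To make the comparison concrete, here is a minimal sketch in Python (my choice of language; the names `web`, `laundry`, `array`, and `lookup`, and all the sample values, are invented for illustration). It only shows that "look up a value by key" covers all three cases, and that the array is the one that also gets updated in place:

```python
# Hypothetical stand-ins for the three "databases" mentioned above.
web = {"http://example.com/index.html": "<html><body>Hello</body></html>"}  # URL -> document
laundry = {"ticket-17": "two shirts"}                                        # ticket -> clothes
array = ["a", "b", "c"]                                                      # integer index -> value


def lookup(store, key):
    """Key-based retrieval: the one operation all three stores share."""
    return store[key]


# The array also supports updating a value in place.
array[1] = "B"

print(lookup(web, "http://example.com/index.html"))
print(lookup(laundry, "ticket-17"))
print(lookup(array, 1))
```

Nothing here involves a schema, a query language, or integrity rules; it is retrieval by key and nothing more.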

One may stretch the definition of database to include these things, but none of them would be of interest or even on-topic in comp.databases.theory.

> > HTTP+HTML has *none* of those,
> > let alone more advanced things we might consider part of
> > a dbms.
> >
> > > It has structured data (in spite of what others might call
> > > it),
> >
> > No it doesn't. It has markup.
>
> You see no structure in the marked-up data?

Again, we may stretch the definition of structure until it can encompass this kind of usage, but doing so is counterproductive and not on-topic if we are discussing database theory.

I could say that an HTML document has structure insofar as it is a sequence of characters. "A sequence of characters" is a kind of structure. We could say that an HTML document has structure, because you can look at the glyphs that are rendered for each character and analyze that structure; for example the letter O has the structure of a circle. An anthill is also a kind of structure; it has little tunnels and so forth. But none of these is "structure" in the sense that we use the word when we talk about "structured data." When we talk about structured data, we mean data that has identifiable subcomponents for which we have associated meaning. It can be quite simple: an (x, y) pair representing a point in a 2D plane is structured data. A C struct (note the name) is structured data.
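As a rough sketch of the distinction (Python, with the hypothetical names `Point`, `p`, `text`, and `distance_from_origin` made up for this example): the pair below has identifiable subcomponents with associated meaning, while the string holding the same digits is just a sequence of characters until someone imposes an interpretation on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Structured data: two named components with agreed meaning."""
    x: float
    y: float


def distance_from_origin(pt: Point) -> float:
    # Possible only because we know which component is x and which is y.
    return (pt.x ** 2 + pt.y ** 2) ** 0.5


p = Point(3.0, 4.0)   # structured: p.x and p.y are addressable
text = "3.0 4.0"      # a sequence of characters; nothing says what the parts mean

print(p.x, p.y, distance_from_origin(p))
print(len(text))      # about all the "structure" the raw string offers
```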

> > Markup is not structure; there is
> > no schema.
>
> If we limit it to the xhtml pages, would you then say it is
> structured data?

No; I would call it markup. The added information outside of the actual text is not semantic or structural; it is presentational merely. All you can do with HTML is instruct the browser to render something as an indented paragraph, or in italics, or whatever.

Thought experiment: if you have a plain-text ASCII file, is it structured data? Now add a facility that lets you annotate the plain text such that each letter may be rendered in a particular color. Does the addition of color transform the plain-text document into structured data? Adding color is *very* similar to adding bold, italic, h1, etc.

> > If HTML is structured data, then troff is structured
> > data. No schema: no structure. The level of things you can
> > do with HTML are: put this word in bold.
> >
> > There's no DML. There isn't even a query language. GET is
> > not a query language. There are no integrity rules.
>
> Perhaps there is a theory definition of the word structure that you are
> using to draw the conclusion that the web does not have structured
> data.

Every technical field uses familiar terms to mean specific things. For example, logicians and computer programmers use the word "or" to mean specifically inclusive disjunction, because "inclusive disjunction" is unwieldy. They do this even though "or" can also mean exclusive disjunction in everyday discourse.
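For anyone who wants the two senses of "or" spelled out, a small sketch in Python (the function names `inclusive_or` and `exclusive_or` are mine, chosen for illustration):

```python
def inclusive_or(a: bool, b: bool) -> bool:
    """The technical sense of "or": true when at least one operand is true."""
    return a or b


def exclusive_or(a: bool, b: bool) -> bool:
    """The "one or the other, but not both" sense common in everyday speech."""
    return a != b


# The two senses disagree only when both operands are true.
for a in (False, True):
    for b in (False, True):
        print(a, b, inclusive_or(a, b), exclusive_or(a, b))
```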

The field of data management uses the term "structure" as in "structured data" in a specific way.

> My take is that a single page can be a node/attribute with a
> value that is the html, for example, with directional paths to other
> nodes for which there are links in the page. Structure, no?
>
> > HTTP+HTML doesn't even remotely qualify as a data
> > management system.
>
> Agreed. It doesn't fit my def of a DBMS.
>
> > It's a distributed document retrieval
> > system. They are not the same thing. I'm not even sure
> > on what basis one could claim they were related.
>
> Every attribute value is a document, of sorts, however small.

Again, this is stretching a term past its breaking point. If we return to my minimal structured data example, the x,y point, I would not consider it accurate to say that the x component was a document.

> > > persisted on secondary storage devices, accessed by people.
> >
> > If this is your definition, then 3x5 cards is data management.
>
> Not DBMS, but, yes, it would be a database by my definition (and the
> def of many others as I understand it).
>
> >
> > > There isn't a great query language, I'll grant.
> >
> > There isn't *any* query language. Retrieving a document by
> > a key isn't a query language. It's a cheapo function call.
>
> Fine. Again, perhaps the industry has done something with the English
> word "query" so that when I put a word into google and retrieve a list
> of "keys" from which to choose that would not be a query. It seems
> like a query to me, but surely not like an SQL query.

Google is not the web. Google is an extensive document indexing application. And yes, it has a query language; the Google query language. It is not the web query language; there is no web query language. Go to a different search engine and you will find a different query language.

Also note: it is a query language for document retrieval, not for structured data retrieval.

> > > The requirements are not
> > > identical to those of a DBMS, but the model for the data ought to be
> > > taken seriously and moved forward accordingly.
> >
> > No, it shouldn't. There is no data model to take seriously.
>
> Perhaps not until "it" (a di-graph of tree nodes, perhaps?) replaces
> that which currently is considered the only possible data model, eh?

What does that even mean? As best I can tell, that's a prediction that Pick or possibly the www is going to replace SQL for structured data management. That idea is so completely a nonstarter that I can't bring myself to summon appropriately severe wording for how bad it is; it would take someone more practiced than myself in coarse language. That idea is even less credible than the idea that SQL is going to replace the web for document retrieval. Or, I don't know, that people are going to start writing device drivers with Excel macros. None of these ideas makes a lick of sense.

> Since the model of data that I typically work with (in what most would
> call a DBMS) could be seen as a di-graph of trees, we know it is
> possible to do data management with such collections,

Are you talking about Pick or the www now? As I understand it, Pick was designed for data management, so we should not find it remarkable that it is possible to do data management with it. The www, on the other hand, was not designed for data management, and can't do it.

> even if we don't
> want to call the abstraction of this approach a data model. I'll grant
> it is not the same. But I will stand my ground that it should be taken
> seriously by those researching data models.

Neither one has anything to offer *as a data model* for data management that isn't trivially performed with SQL (let alone some improved later-generation relational-theory-based approach), so I don't see any reason for researchers to pay attention, and they mostly aren't. Neither one can touch SQL for generality, completeness, or integrity management features.

If you want to convince me that I should pay attention to something besides relational, a good start would be to come up with a structured data query that is hard in SQL but easy in the other thing.

> > Efforts to retrofit one have been embarrassing. If we want
> > to do data management, and we are studying HTML+HTTP,
> > then we should consider it a negative example.
>
> For some things, yes, e.g. 404s. Cheers! --dawn

For pretty much *everything.* For me, the challenge with thinking about www is trying to come up with any vague explanation for its success, since it's so terribly awfully horrible at pretty much everything it sets out to do. As application UI it is horrible. I am fond of saying that Tim Berners-Lee set the field of UI design back 15 years. As data management it is worse than useless. As hypertext it is almost completely featureless. Frankly it isn't even good at markup.

The only things that make it a success, as far as I can tell, are that it has a low barrier to entry and that it has distributed hypertext. These ideas are both very powerful. They are not data management ideas, but they are powerful enough ideas for what they are.

Marshall

Received on Thu Aug 17 2006 - 04:46:58 CEST
