Re: Resiliency To New Data Requirements

From: dawn <dawnwolthuis_at_gmail.com>
Date: 16 Aug 2006 20:16:52 -0700
Message-ID: <1155784612.476010.179740_at_h48g2000cwc.googlegroups.com>


JOG wrote:
> dawn wrote:
> > Perhaps there is a theory definition of the word structure that you are
> > using to draw the conclusion that the web does not have structured
> > data. My take is that a single page can be a node/attribute with a
> > value that is the html, for example, with directional paths to other
> > nodes for which there are links in the page. Structure, no?
>
> No, Marshall is correct. Information such as that on the web is known
> in the scientific literature as Unstructured data. That which comes
> between that and relationally represented data is known as
> Semi-structured data.

Yes, I've read those terms. I do understand the use of "unstructured" within an attribute value, such as an attribute whose value is a document. That doesn't make the whole unstructured. The fact that a database holds music doesn't make the database any less structured. There can be a structured database that includes unstructured attribute values.

R(URL, html, foreignKeyList)

That's some structure, right? For the subset of the web with xhtml backed by a schema, we would perhaps be able to show more structure.
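As a rough sketch of the relation above (Python is an arbitrary choice, and the field names, the reachable() helper, and the example.org URLs are hypothetical illustrations, not part of any real schema), each page is a tuple whose html value is opaque while its links are explicit and can be followed:

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Page:
    """One tuple of R(URL, html, foreignKeyList)."""
    url: str                           # key of the node
    html: str                          # "unstructured" attribute value
    foreign_key_list: Tuple[str, ...]  # URLs this page links to


# Hypothetical sample data, keyed by URL.
PAGES: Dict[str, Page] = {
    "http://example.org/a": Page(
        "http://example.org/a", "<p>A</p>", ("http://example.org/b",)
    ),
    "http://example.org/b": Page(
        "http://example.org/b", "<p>B</p>", ("http://example.org/a",)
    ),
    "http://example.org/c": Page("http://example.org/c", "<p>C</p>", ()),
}


def reachable(start: str, pages: Dict[str, Page]) -> Set[str]:
    """Return the URLs of known pages reachable from start by following links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        url = stack.pop()
        if url in seen or url not in pages:
            continue  # already visited, or a link to a page we do not hold
        seen.add(url)
        stack.extend(pages[url].foreign_key_list)
    return seen
```

The traversal never looks inside the html value; it only uses the key and the link list, which is the part of the data that carries the structure.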

> Definitions are woefully slapdash, but Google
> scholar will supply a whole host of papers on the subject.

Yup, and those papers can define unstructured and semi-structured however they want. I'm just saying that the data is also structured. You could put the data into a relational database. I could put it into a Pick database. Right now it is in a highly distributed database that has a structure, even if there is "unstructured" data within it.
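To illustrate the relational option, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and helper functions are assumptions made up for this example; the link list is split out into its own table, which is one possible design among several.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_web_db(rows: Iterable[Tuple[str, str, List[str]]]) -> sqlite3.Connection:
    """Load (url, html, links) triples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # The html column holds the document as an opaque value.
    conn.execute("CREATE TABLE page (url TEXT PRIMARY KEY, html TEXT NOT NULL)")
    # One row per directed link between pages.
    conn.execute(
        "CREATE TABLE link ("
        " from_url TEXT NOT NULL REFERENCES page(url),"
        " to_url TEXT NOT NULL,"
        " PRIMARY KEY (from_url, to_url))"
    )
    for url, html, links in rows:
        conn.execute("INSERT INTO page VALUES (?, ?)", (url, html))
        conn.executemany(
            "INSERT INTO link VALUES (?, ?)", [(url, target) for target in links]
        )
    conn.commit()
    return conn


def inbound_links(conn: sqlite3.Connection, url: str) -> List[str]:
    """Return the URLs of pages that link to url, sorted alphabetically."""
    cursor = conn.execute(
        "SELECT from_url FROM link WHERE to_url = ? ORDER BY from_url", (url,)
    )
    return [row[0] for row in cursor]
```

A query such as inbound_links(conn, some_url) works without parsing any html, because the links are stored as ordinary rows.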

Maybe I'm misunderstanding the use of these terms, but I really, really dislike the term "semi-structured" for data that has a structure. --dawn

Received on Thu Aug 17 2006 - 05:16:52 CEST
