Re: Resiliency To New Data Requirements
Date: 16 Aug 2006 21:40:14 -0700
Message-ID: <1155789614.688385.250150_at_h48g2000cwc.googlegroups.com>
Keith H Duggar wrote:
>
> It probably is off-topic, however, for general usage I'm
> forced to partly agree with Dawn that the various *ML's do
> have semi-structure.
Well, I would draw a distinction between "general usage" and technical usage.
And this does all circle back to my recurring point about definitions: anyone can make one; they are intrinsically neither right nor wrong. However, some definitions are more formally made and sometimes more authoritatively endorsed. If we agree that some term has some definition, we can use that term to communicate.
In the context of data management (which is in fact our current context in this NG) I expect the term "structured data" to mean something more specific than I might if I used the term with Aunt Mildred.
(Ben Kenobi: "So the data I sent you *was* structured ... from a certain point of view.")
The problem I have with terms like "unstructured" and especially "semi-structured" is that I don't think there is any agreement on what they mean. The place where I work is filled with quite smart, well-educated people. I don't work at Initech by any means. And on occasion I have heard people (again, *smart* people) use the term "semi-structured," and I have later asked them what, *exactly*, it means. By and large they hem and haw and make vague attempts to define it, but in the end I'm convinced they are using the term in an evocative rather than a technical way. It means "kinda structured." Or even "badly structured." I don't think this definition of "semi-structured" has much to teach us.
Now, there are other definitions. The best one I've heard I got from Jan Hidders, although I don't remember whether it was him defining it or it was defined in a paper he referenced. Referring back to the point-of-view issue, we can define semi-structured data as data for which we know only part of the schema. And in fact there are some very interesting use cases there.
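To make the "we know only part of the schema" definition concrete, here is a minimal sketch (not from the thread; the record type, field names, and helper are all hypothetical). The known part of the schema is enforced and typed; everything else is carried along as an opaque bag whose schema we simply don't know.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical record: "name" and "year" are the part of the schema we know.
# Anything else in the input is kept in `extra`, whose schema is unknown.
@dataclass
class PartiallyKnownRecord:
    name: str
    year: int
    extra: dict[str, Any] = field(default_factory=dict)  # schema unknown

KNOWN_FIELDS = {"name", "year"}

def from_raw(raw: dict[str, Any]) -> PartiallyKnownRecord:
    """Split a raw mapping into its known-schema part and an unknown remainder."""
    missing = KNOWN_FIELDS - raw.keys()
    if missing:
        raise ValueError(f"missing known fields: {sorted(missing)}")
    return PartiallyKnownRecord(
        name=str(raw["name"]),   # known: coerce to the declared type
        year=int(raw["year"]),   # known: coerce to the declared type
        extra={k: v for k, v in raw.items() if k not in KNOWN_FIELDS},
    )

r = from_raw({"name": "A Shropshire Lad", "year": "1896", "author": "A. E. Housman"})
```

The system can reason about `r.year` (compare it, sort on it), but about `r.extra["author"]` it can only say "some value is here"; that asymmetry is the semi-structure under this definition.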
What about plain English text? How "structured" is the Declaration of Independence, A Shropshire Lad, the Wikipedia entry for Obi-Wan Kenobi, or this post? I believe, although I cannot demonstrate, that human thought has a schema. It is of course a hugely complex schema, vastly more so than any schema of deliberate human invention. And in fact it is likely a different schema for each person, but with many commonalities and cultural trends. But whether I am right about this or not is irrelevant for now, as the issue is AI-complete.
For the computer, the issue is simple enough to be reduced to a bumper sticker:
No schema, no semantics.
Know schema, know semantics.
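A small illustration of the bumper sticker (mine, not from the thread): the same four bytes, read under two different hypothetical schemas, mean two entirely different things. Without a schema the computer has nothing to go on; supply one and the meaning follows.

```python
import struct

# Four raw bytes with no schema attached: on their own they mean nothing.
raw = b"\x00\x00\x80\x3f"

# Hypothetical schema A: one little-endian IEEE 754 single-precision float.
as_float = struct.unpack("<f", raw)[0]

# Hypothetical schema B: one little-endian unsigned 32-bit integer.
as_uint = struct.unpack("<I", raw)[0]

print(as_float)  # 1.0
print(as_uint)   # 1065353216
```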
> I think almost anyone would understand
> what you meant to communicate if you said something like
> "plain text is unstructured, relational data is structured,
> and the stuff in between like HTML is semi-structured". Sure
> the semi-structure sucks in major ways but the word semi-
> structured communicates the concept just fine.
Well, what *is* the structure, or semi-structure, that HTML has that plain text doesn't? I don't see that it's really anything more than "put this word in bold." Okay, there are also anchor tags, and you have an href and the link text. But this structure is one for which the schema is already fixed: it is the HTML grammar. SQL, in contrast, provides a meta-model: a model with which one generates models. HTML does not provide a meta-model, just a model. Any data management solution, or programming language for that matter, requires a meta-model, not just a model.
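The model versus meta-model distinction can be sketched roughly as follows, assuming Python's built-in sqlite3 module as a stand-in for "SQL." The fixed tag set is a tiny hypothetical subset of HTML, not the real grammar. With the fixed model the user can only pick from what's already there; with the meta-model (CREATE TABLE) the user declares a brand-new relation, and that new schema is itself queryable.

```python
import sqlite3

# A fixed model: the set of element names is given in advance and cannot be
# extended by the user (a tiny, hypothetical subset of HTML tags).
FIXED_MODEL = frozenset({"b", "i", "a", "p"})

def is_valid_fixed(tag: str) -> bool:
    """Only tags that the fixed model already defines are meaningful."""
    return tag in FIXED_MODEL

# A meta-model: CREATE TABLE lets the user declare a *new* model
# (relation name, attribute names, types) at run time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poem (title TEXT NOT NULL, author TEXT, year INTEGER)")
conn.execute(
    "INSERT INTO poem VALUES (?, ?, ?)",
    ("A Shropshire Lad", "A. E. Housman", 1896),
)

# The user-defined schema is itself data we can query (SQLite-specific pragma;
# column 1 of each row is the attribute name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(poem)")]

# And because the schema is known, the engine can answer questions about it.
pre_1900 = conn.execute("SELECT title FROM poem WHERE year < 1900").fetchall()
```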
Marshall
http://en.wikipedia.org/wiki/Obiwan_Kenobi
http://en.wikipedia.org/wiki/A_Shropshire_Lad
Received on Thu Aug 17 2006 - 06:40:14 CEST