Re: Resiliency To New Data Requirements

From: dawn <dawnwolthuis_at_gmail.com>
Date: 10 Aug 2006 17:19:59 -0700
Message-ID: <1155255599.217178.66640_at_m73g2000cwd.googlegroups.com>


JOG wrote:
> dawn wrote:
> > Marshall wrote:
> > > dawn wrote:
> > > >
> > > > I agree. If we are going to start somewhere and move forward, we might
> > > > be well-served to look to what works today outside of the RM (even
> > > > though it, of course, typically markets itself as relational). Is it
> > > > less expensive to work with Cache' than Oracle given such and such an
> > > > environment? If so, why?
> > >
> > > Is there theory behind any of this? Any mathematical models or other
> > > formalisms? It seems to me that comparing Cache with Oracle for
> > > TCO is not on-topic on c.d.t.
> > >
> > > Does any of "what works today outside of the RM" have any theory
> > > behind it? This is a theory newsgroup after all.
> >
> > Hi Marshall. The reason I originally came to this list was to learn
> > what it was about the theory that led the industry down a path of
> > throwing out some good features such as lists, which I have used as my
> > primary example. I learned from this forum and elsewhere that the
> > theory has come back around to now permit nested structures, while a
> > huge amount of software implementations are stuck, for practical
> > purposes, with the flawed theory of what was once known as 1NF.
>
> I think the initial interpretation of 1NF was confused rather than
> 'flawed' - at the end of the day all theories are developed
> iteratively.

OK, or perhaps "the use of 1NF" was flawed, while there is nothing wrong with coining and defining it. I'm not sure that "nonsimple domains" (in the definition) was ever nailed down as precise mathematics. But if the mathematicians tell me the mathematics was not flawed, then I'm good with that. It is the application of that mathematics to data (the modeling of data) where my interests lie. The mistake was requiring software development teams to model data in what was termed 1NF.

> Of course in math, a relation can contain an element from
> any domain, and once RM became established this was picked up on
> relatively quickly.

I gather that you mean "in theory" it was picked up relatively quickly.  I'm not heavily tapped into what everyone out there is doing, but my pals are not defining new domains right and left.

> I think it's pretty much accepted now that how one
> operates on that complex element is not within the remit of the RM
> itself, and as such the DBMS must handle its decomposition.

This is a fine distinction, but I'll buy that "relational theory" can define itself as working with relations and as "orthogonal" to the question of what domains are supported. This can simply be a matter of definition. I'm less sure I can buy that the relational model (the model of data that SQL attempted to implement, whether it missed the mark by a lot or a little) can claim that any operators are irrelevant, including those specific to one domain or another. Here is Date on Codd regarding the meaning of "data model":

"Codd defines a data model in a 1980 paper Data models in database management. By his definition a data model consists of a collection of data structure types, operators that can be applied to instances of these types and consistency rules that define valid states for the data."

Are these consistency rules only about relationships between relations? Do they have nothing to do with the consistency of data values for an attribute, that is, the sets from which valid values may come? Are constraints on domains outside the scope of the relational model? If so, what is the name of the scope they are in? Given that SQL implements a broader model that includes, for example, specific domains, I need a name for what seems to me to be "a data model" implemented (with flaws) by SQL. In your terminology, would that be some sort of "uber data model"?

The good news is that even if theorists are split, or define the RM so narrowly that it no longer includes any of the issues it helped cause in the industry, I think practitioners generally understand the relational data model to be the model that (at least in the 1980s) forbade nested values, repeating groups, multivalues, non-1NF, or whatever you want to call it. So when I speak about "the relational model" with practitioners, they pretty much understand that it disallows lists as attribute values, for example.

I suspect we could both agree that it was the advent of the relational model that brought about what was termed 1NF and disallowed non-simple domains (such as lists), even if we define the RM differently today.
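To make the list example concrete, here is a minimal Python sketch that models relations as sets of tuples. The order/item data and the function names `flatten` and `regroup` are hypothetical and chosen only for illustration. It shows one common way a 1NF design handles a list-valued attribute: store one row per element and add a position attribute. Without that position, a plain set of rows keeps neither the order nor any repeated elements.

```python
# Hypothetical data: each order carries an ordered list of items.
# Nested (non-1NF) form: the second attribute's value is itself a list (tuple).
nested_orders = {
    ("O1", ("widget", "gadget", "widget")),
    ("O2", ("gizmo",)),
}

def flatten(nested):
    """1NF decomposition: one row per list element, plus an explicit
    position attribute so the list order (and duplicates) survive."""
    return {
        (order_id, pos, item)
        for order_id, items in nested
        for pos, item in enumerate(items, start=1)
    }

def regroup(flat):
    """Rebuild the list-valued attribute by sorting each order's rows on position."""
    grouped = {}
    for order_id, pos, item in flat:
        grouped.setdefault(order_id, []).append((pos, item))
    return {
        (order_id, tuple(item for _, item in sorted(rows)))
        for order_id, rows in grouped.items()
    }

flat = flatten(nested_orders)
print(sorted(flat))
# [('O1', 1, 'widget'), ('O1', 2, 'gadget'), ('O1', 3, 'widget'), ('O2', 1, 'gizmo')]

# The round trip works only because of the invented position attribute.
assert regroup(flat) == nested_orders

# Dropping the position loses both the order and the repeated "widget".
lossy = {(order_id, item) for order_id, _, item in flat}
print(len(lossy))  # 3
```

The position column is the price of 1NF in this case: the list semantics are still present, but they are encoded in an extra attribute that every query has to know about.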

> And of
> course that's the way it should be given that dates, strings and other
> decomposable types have no relevance to relational theory.

Again, I guess I'll go along with a redefinition of relational theory that says it no longer cares whether the value of an attribute is itself a relation. But once upon a time, relational theory was definitely a player in the problems that arose from the relational model (not just in those that arose from its implementations).

> > So I want to talk about theory and its relationship to practice. We
> > don't need another two decades of flawed tools that blindly try to
> > follow another flawed theory. The industry had lists, then pooh-poohed
> > them, and now is bringing them back, where "the theory" seems to now
> > permit nested sets (although there are still many who are not ready to
> > accept that extension of the theory), and lists are accepted if defined
> > as user-defined types. But there are still no list operations in the
> > theory as best I can tell. If theory people want to discuss theory
> > sans "end users" of the theory (like me), they can do their work in a
> > vacuum, but then perhaps the industry would be well-served if more of
> > it (than in the past) would stay there so we don't repeat the mistakes
> > of the past (e.g. normalization as originally defined being implemented
> > before it was ripe).
>
> I've seen you refer to this as "throwing the baby out with the bath
> water". I'm not at all sure that's a good analogy, as it implies that
> complex types were the most important factor involved - if they had
> been RM would have seriously struggled,

OK, I'll buy that.

> whereas the other important
> advantages of the model over its competitors proved to be overwhelming,
> and it dominated in good old darwinian fashion.

I never thought of Darwinism in terms of the marketing buzz surrounding the survival of one technology over another, but ... ;-)

> Unfortunately recent 'advances' in db work such as XML databases seem
> to be attempting to retrieve the 'baby' by rebuilding a bathroom
> without any planning schematics, installing an upside-down bath and,
> worst of all, no plumbing system.

And I'd have to say that the baby isn't coming to life in that area for me yet either, but we might still be in the pregnancy, and the morning sickness is awful (I'll skip my great anecdote on that one).

> > So, while I want to talk about theory and its relationship to practice,
> > I'm not developing theory, and I don't know the totality of the theory
> > behind any di-graph models, for example. I suspect that there are many
> > here who would not accept anything other than set theory (functions are
> > sets, so I'm sure anything software developers do can be modeled as
> > sets if someone has a reason to do so.)
>
> Well, graph theory is constructed from set theory itself, a graph being
> defined as a triple (set of inputs, set of outputs, set of edges). It
> has a powerful theory layered on top of this definition and is hence
> applicable to a whole range of practicalities. Unfortunately
> information handling is not one of these

Hmmm. www?

> , given some information
> relationships will not fit into a binary approach such as graphs at the
> logical level*.

I'll admit I don't know what you are referring to, but are these relationships absolutely essential to the average software application? Where is the show-stopper?

> Believe me, I spent an incredibly frustrating year
> attempting to 'make them fit' before conceding defeat - looking back it
> seems a very naive period, but it was an invaluable education.

Just as relational theory covers relations while something else covers domain operators within a single "data model" (or what I would term one), we can partition the space so that one theory meets some requirements and another theory meets others. The space could even be partitioned so that some types of problems or domains use this data model and others use another, right? (I fully accept that I'm not "getting it" on this point, and you may certainly point that out.)

Cheers! --dawn

> Jim.
>
> (* That is apart from 'hypergraphs', which unintuitively allow edges to
> connect more than 2 nodes, so allowing the handling of n-ary relationships.
> However, then the set of edges essentially becomes a set of tuples - an
> n-ary relation, and we are left with somewhat of a reinvention of
> relational theory.)
>
> >
> > Did that clarify? If so, is that, or is that not a valid discussion in
> > this forum? (Don't worry, even if you suggest it is valid to discuss,
> > I will still keep a low profile here as I know there are some who
> > really, really dislike having me around and I prefer the company of
> > those who are at least civil in their discourse when they disagree with
> > someone, as you, David, mAsterdam, JOG, x, and many others have always
> > been).
> >
> > Cheers! --dawn
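The hypergraph footnote can be illustrated with a small Python sketch that models relations as sets of tuples. The supplier/part/project data is hypothetical and written in the style of the familiar textbook SPJ example. Each ternary fact fits naturally as one hyperedge, which is simply a tuple of a 3-ary relation. If each fact is instead split into binary edges and those edges are joined back together, the result contains a fact that was never recorded. This is one way, under these assumptions, that a purely binary graph can fail to capture an n-ary relationship.

```python
# Hypothetical facts "supplier s supplies part p to project j".
# As hyperedges, the edge set is just a 3-ary relation: a set of 3-tuples.
spj = {("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1")}

# Forcing the facts into a binary graph: three sets of pairwise edges.
sp = {(s, p) for s, p, _ in spj}
pj = {(p, j) for _, p, j in spj}
js = {(j, s) for s, _, j in spj}

def rejoin(sp, pj, js):
    """Natural join of the three binary edge sets back into (s, p, j) triples."""
    return {
        (s, p, j)
        for s, p in sp
        for p2, j in pj
        if p2 == p and (j, s) in js
    }

rebuilt = rejoin(sp, pj, js)

# Every original fact comes back, but so does one that nobody asserted.
assert spj <= rebuilt
print(sorted(rebuilt - spj))  # [('S1', 'P1', 'J1')]
```

Keeping the triples intact as hyperedges avoids the spurious row, but at that point the "graph" is an n-ary relation, which is the footnote's point.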
Received on Fri Aug 11 2006 - 02:19:59 CEST
