Re: Multiple-Attribute Keys and 1NF

From: David Cressey <cressey73_at_verizon.net>
Date: Fri, 31 Aug 2007 18:12:00 GMT
Message-ID: <QRYBi.194$NL2.188_at_trndny04>


"Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message news:46d815d3$0$4063$9a566e8b_at_news.aliant.net...
> David Cressey wrote:
>
> > "JOG" <jog_at_cs.nott.ac.uk> wrote in message
> > news:1188556656.192653.305160_at_r23g2000prd.googlegroups.com...
> >
> >
> > >>Well I've never suggested multiple values contained in a collection.
> > >>But yes as I said, multiple roles does break the guaranteed access
> > >>rule. My question now (in the continuing hunt for the theory
> > >>behind 1NF) is why on earth would that be a problem? I don't see any
> > >>effect on the relational algebra.
> >
> >
> > I honestly think that the impetus behind "normalization" in the Codd 1970
> > paper is more of a stopgap than a theory. (I'm not familiar with the 1969
> > paper, and I only read the 1970 paper after I began participating in the
> > discussions in c.d.t.) In the 1970 paper, Codd suggests that it may be
> > worthwhile to consider the subset of schemas that contain only atomic
> > attributes. (He didn't use the word "schemas", but I hope I can use it
> > without introducing confusion.)
> >
> > He pointed out that such a restriction did not thereby reduce the
> > expressiveness of the system, in that for every unnormalized schema there
> > existed an equivalent normalized schema. "Normalized" in the 1970 paper is
> > called 1NF in later writings, once further normal forms were discovered.
> >
> > There is one other piece of the 1NF definition in the 1970 paper, the "no
> > duplicates rule". The no duplicates rule has to do with the representation
> > of a relation, and not with a relation itself. Codd imagined (correctly)
> > that the first relational database systems would use records to represent
> > tuples and (virtual) arrays of records to represent relations. In a
> > relation, there is no such thing as "a tuple appearing twice". However, in
> > an array of records, there is such a thing as two of the records having
> > identical contents. Codd ruled that out as a practical stopgap, in order
> > to prevent the implementations from diverging from the properties of
> > mathematical relations in an unnecessary and harmful way. This is my
> > reading of the 1970 paper, in regard to 1NF theory.
>
> I disagree slightly with your interpretation. Codd did not disallow
> physical duplication. Duplicates in sets have no meaning, thus:
> { 1, 2, 1 } = { 1, 2 } = { 2, 2, 2, 2, 1 } etc.
>
> At the logical level, duplicates count only once, and the physical
> structure conveys no meaning. By divorcing physical structure from
> logical interpretation, one enables physical independence. Thus
> duplicating information in an index alters the performance
> characteristics without changing the meaning of queries etc.
>

It looks like I was wrong. I scanned the 1970 paper again and didn't find a mention of the "No duplicate rule". I must have confused the 1970 paper with some of the writings that refer to it.

The case I was mentioning would be more like

{1, 2} = {2, 1}

than the ones you outlined. The only near reference to this in the 1970 paper is in the discussion of "Project", where he says that after eliminating come columns, duplicates left in the result table must be eliminated. I infer from this that he did not intend to allow duplicate rows to be passed response to a query, at least in the case of a project.

My comments have nothing to do with indexes.
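The "Jack speaks English and German" example quoted below, together with the point that any unnormalized schema has an equivalent normalized one, can be sketched in Python. This is a hypothetical illustration, not Codd's procedure: `unnest` and `nest` are made-up helper names for the familiar flatten/regroup transformation. An ordered sequence makes the two facts look different; a set-valued attribute compares them correctly; and after unnesting into (person, language) rows, detecting duplicates is plain tuple equality.

```python
# Hypothetical illustration of the "Jack speaks English and German" case.
from collections import defaultdict

# Naive representation: the language "set" stored as an ordered sequence.
rec_a = ("Jack", ("English", "German"))
rec_b = ("Jack", ("German", "English"))
print(rec_a == rec_b)  # False: same fact, but compared as different records

# Set-valued attribute: equality ignores order, so the duplicate is caught.
rel_sv = {("Jack", frozenset({"English", "German"})),
          ("Jack", frozenset({"German", "English"}))}
print(len(rel_sv))  # 1


def unnest(rows):
    """Normalize: one (person, language) row per member of the set."""
    return {(person, lang) for person, langs in rows for lang in langs}


def nest(rows):
    """Inverse: regroup languages per person into a set-valued attribute."""
    groups = defaultdict(set)
    for person, lang in rows:
        groups[person].add(lang)
    return {(person, frozenset(langs)) for person, langs in groups.items()}


flat = unnest([rec_a, rec_b])
print(sorted(flat))  # [('Jack', 'English'), ('Jack', 'German')]
assert nest(flat) == rel_sv
```

One caveat with this sketch: a person whose set of languages is empty produces no rows after `unnest`, so the round trip is only lossless when every set is non-empty.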

>
> > There's a connection between the "atomic values" rule and the "no
> > duplicates rule", at the implementation level.
> >
> > Consider the following fact:
> >
> > Jack speaks English and German.
> >
> > Let's say we are about to include this fact in a relation stored somewhere
> > in a relational database, and that one of the columns of a relational table
> > is "set of languages spoken".
> >
> > Further, let's say that there is already a tuple in the relation with the
> > following fact stored:
> >
> > Jack speaks German and English.
> >
> > As a practical matter, in terms of the representation of data inside a
> > database, it can be extraordinarily difficult to ascertain that these two
> > propositions, together, violate the "no duplicates rule".
> >
> > Notice that my focus has been entirely on the implementation, and not on
> > the relational algebra itself. With regard to the relational algebra
> > itself, I believe your understanding is correct.
> >
> >
> > So what the heck are implementation-oriented issues doing in the 1970
> > paper? I believe Codd wanted to get across two main ideas: building a
> > system for relational databases would be a good idea, and building such a
> > system was also feasible. It's for this second reason that I believe Codd
> > added some material that is primarily about implementation, rather than
> > about the power of relational algebra itself.
> >
> > This is my insight, such as it is. I hope it helps.
>
> Again, I disagree slightly. While I do not know Codd's intent other than
> the intent expressed in his works, his observation that one can
> normalize quite mechanistically provides an implementation for the RVA
> (relation-valued attribute).
>

Hmm....
> Of course, that won't necessarily protect one from the update anomalies
> the higher normal forms address.

Agreed.

Received on Fri Aug 31 2007 - 20:12:00 CEST
