Re: Sixth normal form

From: Jan Hidders <hidders_at_gmail.com>
Date: Tue, 07 Aug 2007 08:20:27 -0000
Message-ID: <1186474827.234713.207060_at_k79g2000hse.googlegroups.com>


On 7 aug, 06:38, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>
> news:1186391813.648681.90280_at_w3g2000hsg.googlegroups.com...
>
> [big snip]
>
>
>
>
>
> >> > Agreed. I would add that this is not specific for going from 5NF to
> >> > 6NF but anywhere you decompose to go from a lower to a higher normal
> >> > form.
>
> >> Not always. If two sets of attributes are independent, as is the case
> >> when
> >> moving from 1NF to 2NF, then there is no need for a referential
> >> constraint;
> >> if two projections are independent, as is the case when moving from BCNF
> >> to
> >> 4NF, then there is no need for a referential constraint.
>
> > I have no idea what you mean here. In all those cases you need a
> > cyclic pair of nclusion dependencies if you want the new schema to be
> > equivalent with the old. And if you only want the new schema to
> > contain at least as much information as the old then you strictly
> > speaking don't need any inclusion dependency at all.
>
> If two sets of attributes are independent, then for each combination of
> values for one of the sets of attributes, there is a copy of the projection
> over all of the other attributes.

Huh? I assume you meant that for each two tuples t1 and t2 there is a tuple t3 such that t3[A] = t1[A] and t3[B] = t2[B] where t[A] denotes the projection of t on A. In other words, there is an MVD X->>A|B for some X.

> What I mean is, the cardinality of the
> 1NF relation is equal to the product of the cardinalities of the projections
> over each independent set of attributes.

I doubt that is what you mean. If for R(a,b,c) there is an MVD a ->> b | c then it is certainly not true that | R | = | R[b] | x | R[c] |. Of course this is true if R a restricted to the tuples with particular value in the 'a' column, so perhaps that is what you meant?

> A cyclical constraint would only
> prevent (pathologically, I should add) the case where the 1NF relation is
> empty and where either but not both of the 2NF relations is empty: [...]

This is simply not true. Also if the 1NF relation is nonempty the 2NF relations might violate the two inclusion dependencies if the are allowed to contain more than just the projections of the 1NF relation.

> The same
> can be said for independent projections of relations, as is the case when
> moving from BCNF to 4NF.

No, also there it isn't true.

> I guess I need to clarify what I mean by /at least the same/ information
> content. Consider the following example relation schema that is in 5NF:
>
> (1) {Whse, Item, TranId, RcvdDate, QtyRcvd, QtySold, Cost}
>
> such that
>
> {Whse, Item, TranId} --> {RcvdDate, QtyRcvd, QtySold, Cost}
>
> An instance of this schema enumerates a set of cost tiers for items in a
> warehouse. Cost is the total cost for the quantity received. The unit cost
> is therefore Cost / QtyRcvd, but is not calculated ahead of time to minimize
> rounding errors. There can be several cost tiers for an item in a
> warehouse, and the cost assigned to a sale is computed by pulling from the
> oldest tier first, and continuing with successive tiers until the quantity
> needed for the sale has been met. Now decomposing the above schema into 6NF
> gives:
>
> (2) {Whse, Item, TranId, RcvdDate}
> (3) {Whse, Item, TranId, QtyRcvd}
> (4) {Whse, Item, TranId, QtySold}
> (5) {Whse, Item, TranId, Cost}
>
> This schema allows a Cost without a QtyRcvd, thus preventing the calculation
> of unit cost. It permits the existence of a QtySold without a QtyRcvd, thus
> preventing the calculation of the remaining quantity. It also permits
> QtyRcvd without RcvdDate, thus preventing the selection of the oldest tier.
> Clearly the existence of QtySold and Cost depend upon the existence of
> QtyRcvd, and the existence of QtyRcvd depends upon the existence of
> RcvdDate. Consequently, the set of 6NF relation schemata can't contain /at
> least the same/ information content as the 5NF schema, because some valid
> instances of the 6NF schemata do not make sense.
>
> Suppose that tuples exist for a particular {Whse, Item, TranId} in instances
> of the 6NF schemata for (3) and (4), but not for (2) and (5). A query
> seeking the total quantity on hand for the Item (SUM(QtyRcvd - QtySold))
> could indicate that quantity required for a sale is available, but since the
> tuples for (2) and (5) do not exist, the cost to be assigned to the sale
> transaction cannot be computed. It certainly doesn't sound to me like a
> good thing to carry inventory that you can't sell!

You could, once the missing information has been added. Yes, there might be circumstances under which you would want to disallow such incomplete information, but it is also very conceivable that you would want to explicitly allow it.

Btw., did you really have to type all this to explain the rather trivial observation that if you move to a more liberal schema then certain assumptions that are necessary for certain queries to make sense may no longer hold? And again, this is not typical for the step from 5NF to 6NF, but it can happen at all normalization steps where you split and don't include inclusion dependencies in both directions.

Finally, your redefinition of the phrase "a schema with at least the same information" is a bit confusing, to say the least, because the problem you are describing is caused by the schema being too liberal and thereby allowing information that does not make sense. It would be more appropriate to say that the schema allows too much information.

> It should be obvious that if sets of dependent attributes are truly
> independent of each other (which is the only case that an inclusion
> dependency is not indicated), then they should have already been separated
> into their own relation schemata as part of the normalization process.

Assuming my guess of your definition of "independence" is correct this is not true.

> It
> means that a multivalued dependency exists that is not implied by the
> candidate keys.

Nope. Also that is not true.

> It means that the relation schema is not even in 4NF.

Also false.

> Therefore, when moving from 5NF to 6NF, a cyclical constraint is /always/
> necessary to avoid information loss.

Which is also not true, unless of course you redefine "avoid information loss" as "a schema that is not equivalent", which is how you seem to interpret it, and in which case, as I already said, this is true for any normalization split.

  • Jan Hidders
Received on Tue Aug 07 2007 - 10:20:27 CEST

Original text of this message