Re: Sixth normal form

From: Brian Selzer <brian_at_selzer-software.com>
Date: Fri, 03 Aug 2007 19:40:37 GMT
Message-ID: <VwLsi.31244$2v1.12149_at_newssvr14.news.prodigy.net>


"Brian Selzer" <brian_at_selzer-software.com> wrote in message news:deIsi.54745$5j1.747_at_newssvr21.news.prodigy.net...
>
> "Jan Hidders" <hidders_at_gmail.com> wrote in message
> news:1186049901.210928.158310_at_q75g2000hsh.googlegroups.com...
>> On 1 aug, 15:51, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>>> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>>
>>> news:1185813030.422971.126780_at_k79g2000hse.googlegroups.com...
>>>
>>>
>>>
>>> > On 30 jul, 12:45, Sameeksha <sameeksha.ch..._at_gmail.com> wrote:
>>> >> Googling out for definition and explanation for sixth normal form
>>> >> only
>>> >> resulted in the following information - "6th normal form states that
>>> >> a
>>> >> relation R should not contain any non-trivial join dependencies".
>>> >> Also
>>> >> everywhere it is stated that this normal form takes into account the
>>> >> temporal (time) dimension to the relational model, and that current
>>> >> implementations like SQL server 2005 do not implement this normal
>>> >> form.
>>>
>>> > It would help if you first explained what you already know, so we
>>> > don't spend time on explaining what you already know. Do you know what
>>> > at join dependency is? Do you know when it is trivial?
>>>
>>> > Btw. where and in what context did you read that SQL server did not
>>> > support this normal form? That is a rather odd statement since the
>>> > normal form is just about how much to split your relations into
>>> > projections, so strictly speaking it needs no support at all form the
>>> > DBMS. But perhaps support for temporal features was meant?
>>>
>>> >> Any more explanation and preferably an example would help in
>>> >> understanding the concept behind this normal form.
>>>
>>> > Informally put it says that every distinct fact gets its own relation
>>> > or "if you can split, then you should". So if you have a relation
>>> > Student(student_id, name, address) then the fact that the student with
>>> > a certain id has a certain name is split form the fact the this
>>> > student lives at a certain address. This is different from 5NF since
>>> > there you only split when there is a risk of redundancy or update
>>> > anomalies.
>>>
>>> I think it is important to emphasize the fact that vertically splitting
>>> a
>>> 5NF relation into a set of 6NF relations has consequences, specifically
>>> the
>>> need to enforce mutual foreign keys or a circular inclusion dependency,
>>> depending upon the number of resulting 6NF relations. This is important
>>> because support in commercial RDBMSs for enforcing such constraints is
>>> severely limited, if not nonexistent, since it requires an
>>> implementation of
>>> multiple assignment. Therefore splitting should only be done when
>>> absolutely necessary, for example, to support a temporal dimension.
>>
>> Agreed. I would add that this is not specific for going from 5NF to
>> 6NF but anywhere you decompose to go from a lower to a higher normal
>> form.
>>
>
> Not always. If two sets of attributes are independent, as is the case
> when moving from 1NF to 2NF, then there is no need for a referential
> constraint; if two projections are independent, as is the case when moving
> from BCNF to 4NF, then there is no need for a referential constraint. But
> I see what you're saying. Decomposing a 2NF schema into a BCNF schema is
> not always dependency preserving. There is also the rare cyclical
> referential constraint that may be required when moving from 4NF to 5NF.
>
> It is my understanding that the normalization process is used to choose a
> database schema whose instances can contain /at least the same/
> information content as the former but avoids the update anomalies that
> exist in the less normalized schema. The presence of a functional
> dependency in a schema indicates that each tuple represents an atomic
> formula that contains an implication. The presence of "mutual" functional
> dependencies, such as is the case when there is more than one key,
> indicates that each tuple represents an atomic formula that contains a
> biconditional. It is easy to see that the absence of the second
> functional dependency indicates that a projection that does not include
> the dependent is independent; whereas the existence of a tuple in the
> projection that includes the dependent implies the existence of a tuple in
> the other. Thus decomposing a 2NF relation schema that has a transitive
> FD into two schemata requires only one referential constraint in order for
> an instance to have /at least the same/ information content. In order to
> ensure that instances of the new schema can contain /exactly the same/
> information content /and no more/, a pair of referential constraints is
> required ("mutual" foreign keys, if you will).
>
> The difference between moving from 1NF to 5NF and moving from 5NF to 6NF
> is that breaking up a set of two or more dependent attributes in a 5NF
> relation /always/ requires a cyclical constraint in order for an instance
> of the database schema to be able to contain /at least the same/
> information content. If a relation is already in 5NF, then there is no
> way to separate the set of dependent attributes without losing
> information. Suppose that A, B and C are dependent attributes in a 5NF
> relation schema. Then FD ABC --> ABC is implied by the candidate key, as
> well as the FDs ABC --> A, ABC --> B, and ABC --> C. Splitting the
> relation vertically so that each relation schema is in 6NF causes those
> FDs to be lost. The only way for an instance of the database schema to be
> able to contain /at least the same/ information content as an instance of
> the 5NF schema is to introduce a cyclical referential constraint.
>

A little clarification. Suppose that K is the set of attributes that is the key of the above 5NF relation schema. Then every valid relation value must satisfy the FD K --> ABC. As long as A, B and C are in the same relation, the closure of the set of FDs is

K --> A, K --> B, K --> C,
K --> AB, K --> AC, K --> BC, and
K --> ABC

Once A, B and C are split into 6NF relations, however, the closures of the set of FDs is

for {K, A} K --> A,
for {K, B} K --> B, and
for {K, C} K --> C.

Clearly, the following FDs are not part of the database schema:

K --> AB, K --> AC, K --> BC, and K --> ABC.

If a database schema is already in BCNF, then the loss of these FDs clearly signals information loss, since by definition, every FD must be a logical consequence of the candidate keys. That's why it's /always/ necessary to introduce a cyclical set of referential constraints when decomposing a 5NF relation schema into a set of 6NF schemata.

>>> It's somewhat unrelated, but important, nonetheless, to point out that
>>> if a
>>> relation has a temporal dimension, then either every key that it
>>> references
>>> in a foreign key constraint must be immune to updates, or the referenced
>>> relation must also have the same temporal dimension. Otherwise, history
>>> would have to be rewritten every time an update targets that referenced
>>> key.
>>
>> Why would the DBMS not allow history to be rewritten? Of course it
>> should allow you to specify that in certain cases it cannot, but in
>> general I don't think that is a good idea, if only because you want to
>> be able to correct mistakes afterwards.
>>
>
> I never meant that it shouldn't allow it; I meant that it shouldn't
> necessitate it.
>
>> -- Jan Hidders
>>
>
>
Received on Fri Aug 03 2007 - 21:40:37 CEST

Original text of this message