Re: Sixth normal form
Date: Mon, 20 Aug 2007 14:32:49 GMT
Message-ID: <lChyi.9370$3x.342_at_newssvr25.news.prodigy.net>
"Jan Hidders" <hidders_at_gmail.com> wrote in message news:1187603947.341232.24570_at_g4g2000hsf.googlegroups.com...
> On 20 aug, 10:16, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>
>> news:1187520309.177299.208460_at_57g2000hsv.googlegroups.com...
>>
>>
>>
>> > On 18 aug, 09:26, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>> >> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>
>> >>news:1187391299.353682.322830_at_w3g2000hsg.googlegroups.com...
>>
>> >> > On 17 aug, 19:15, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>>
>> >> >> [... big snip ...]
>>
>> >> >> If the goal is a database schema that can represent exactly the same
>> >> >> information content, then the cyclical interrelational constraint is
>> >> >> required; if the goal is a schema that can represent additional
>> >> >> information without contradicting the closure of the set of FDs and
>> >> >> INDs for all schemata that are equivalent to the less normalized
>> >> >> schema, then the cyclical interrelational constraint is not always
>> >> >> required, except, of course, when moving from 5NF to 6NF.
>>
>> >> > *sigh* You already said this, and I already explained that under the
>> >> > usual definitions of those terms they are *never* required, including
>> >> > when going from 5NF to 6NF. You replied that you are using other
>> >> > definitions but apart from some informal examples you never gave a
>> >> > good definition nor a good motivation why that should be the
>> >> > definition. I think the onus is on you here to show why you want to
>> >> > depart from rather well-established terminology.
>>
>> >> It all boils down to the domain closure assumption, which states that
>> >> the only individuals that exist are represented by values in the body
>> >> of the database, and the identity relation, =, which guarantees that
>> >> no matter how many times a value appears, there is only one individual
>> >> represented by that value. If you have a database schema consisting of
>> >> a single relation schema that satisfies the functional dependency
>> >> A --> B, then due to the domain closure assumption, the existence of
>> >> an individual that is represented by a value for A depends upon the
>> >> existence of a specific individual that is represented by a value for
>> >> B. So if the values a1 and b1, for A and B respectively, appear in the
>> >> same tuple, then a denial of the existence of the individual
>> >> represented by b1 denies the existence of the individual represented
>> >> by a1, but a denial of the existence of the individual represented by
>> >> a1 does not necessarily deny the existence of the individual
>> >> represented by b1, since there could be another tuple that has the
>> >> values a2 and b1.
>>
>> >> Now suppose that the relation schema also satisfies the functional
>> >> dependency B --> C. Then if the values a1, b1 and c1, for A, B and C
>> >> respectively, appear in the same tuple, then a denial of the existence
>> >> of the individual represented by c1 denies the existence of the
>> >> individual represented by b1 and transitively the existence of the
>> >> individual represented by a1. When the relation schema is decomposed
>> >> into a family of relation schemata such that A and B appear in one
>> >> relation schema and B and C appear in another, then the denial of the
>> >> existence of the individual represented by c1 no longer denies the
>> >> existence of the individuals represented by b1 and a1. This is the
>> >> problem. This is why I think that an inclusion dependency is required.
>>
>> > I am quite impressed. How can you make something so trivial sound so
>> > incredibly complicated? :-) Of course, if during normalization you
>> > split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion
>> > dependencies you can have C's without associated B's and vice versa.
>> > If that is a problem, then add the INDs. That is basically all that
>> > you have shown in the above. But your claim was that this is (almost?)
>> > always a problem, and for that you have not provided any supporting
>> > argumentation at all.
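
The point about splitting R(A,B,C) into R1(A,B) and R2(A,C) can be made concrete with a small sketch. This is only an illustration under assumptions: the sample tuples and the helper names `project` and `ind_holds` are hypothetical and do not come from the thread. It shows that both inclusion dependencies on A hold right after the split, and that an update to one projection can break one of them when no IND is declared.

```python
# Illustrative sketch only: the sample tuples and helper names are assumptions.
R = {("a1", "b1", "c1"), ("a2", "b1", "c2")}  # instance of R(A, B, C)

def project(rel, *positions):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(t[i] for i in positions) for t in rel}

def ind_holds(src, dst, pos=0):
    """Check the inclusion dependency src[pos] subset-of dst[pos]."""
    return {t[pos] for t in src} <= {t[pos] for t in dst}

R1 = project(R, 0, 1)  # R1(A, B)
R2 = project(R, 0, 2)  # R2(A, C)

# Immediately after the split, every A value occurs in both projections.
both_hold_after_split = ind_holds(R1, R2) and ind_holds(R2, R1)

# With no IND declared, an update may add a C whose A has no B in R1.
R2_updated = R2 | {("a3", "c3")}
r2_in_r1 = ind_holds(R2_updated, R1)  # R2[A] within R1[A] now fails
r1_in_r2 = ind_holds(R1, R2_updated)  # R1[A] within R2[A] still holds
```

Whether such a dangling C is a problem is exactly the design question under discussion; the sketch only shows that the decomposition by itself does not rule it out.
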
>>
>> Perhaps it is because this is something that appears trivial since the
>> solution is so intuitively obvious that it is so incredibly complicated
>> to formulate an argument. :-) The above was an attempt (apparently
>> inadequate) to identify the root cause of the problem and to show that
>> it can serve as a logical basis for determining when there should be an
>> inclusion dependency (but not necessarily when there should not).
>
> We already have that logical basis. You simply have to check if it is
> ok if there are any C's without B's and/or if there are any B's
> without C's. Invoking high-sounding principles like the domain closure
> assumption does not explain anything that was not clear already.
>
But not in the general case. See below.
>> > Note btw. that your argument is symmetric so if you indeed don't want
>> > to change the nature of the relationships you need INDs in both
>> > directions.
>>
>> I don't think it is, exactly. For a functional dependency X --> Y, there
>> can be more than one value for X that determines a particular value for
>> Y. So if x1 determines y1 then the statement, "x1 cannot appear in the
>> relation unless y1 also appears." is true, but the statement "y1 cannot
>> appear in the relation unless x1 also appears." is false. So since the
>> first statement is true, it should remain true for the family of
>> relations; hence the need for an IND in that direction. Since the second
>> statement is already false there's no need for an IND in the other
>> direction in order to ensure that it remains false.
>
> You are looking at the wrong line of symmetry. If you split R(A,B,C)
> into R1(A,B) and R2(A,C) then there are two lower bounds you need to
> worry about:
> - for every C there is at least one B
> - for every B there is at least one C
> There is no reason to consider only one of them just because there is
> an FD associated with it. Especially since FDs don't really say
> anything about lower bounds.
>
>
> -- Jan Hidders
>
Received on Mon Aug 20 2007 - 16:32:49 CEST