Re: Towards a definition of atomic

From: David BL <davidbl_at_iinet.net.au>
Date: Sun, 3 Feb 2008 18:09:34 -0800 (PST)
Message-ID: <4b3060eb-d57f-4a4e-aa99-43d547f85970_at_s19g2000prg.googlegroups.com>


On Feb 3, 11:41 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
> On 2 feb, 02:45, David BL <davi..._at_iinet.net.au> wrote:
> > On Feb 2, 3:42 am, Jan Hidders <hidd..._at_gmail.com> wrote:
> > > On 1 feb, 14:30, David BL <davi..._at_iinet.net.au> wrote:
>
> > > > AFAIK the conventional wisdom is that no absolute definition of atomic
> > > > exists for domain types.
>
> > > On the contrary, we have too many of them. :-) But I don't think it is
> > > wise to go looking for a definition if it is not clear to you want you
> > > want to do with that definition. If you don't know where you are
> > > going, any road will take you there.
> > > > Throwing caution to the wind, in this post I
> > > > wish to conjecture a definition of atomic that, perhaps with some more
> > > > effort at its formalisation, can provide some absolute meaning for a
> > > > given attribute within a given RDB schema.
>
> > > In view of such blatant carelessness I can only respond by not heeding
> > > my own warning. :-)
>
> > > The usual intuition is that something is not atomic if it is
> > > decomposable into smaller parts. In this case that would mean parts
> > > with less information. For the case of decomposition into two
> > > components this leads to something like:
>
> > > DEFINITION: A domain D is said to be decomposable if there are domains
> > > D1 and D2 and functions f1 : D -> D1 and f2 : D -> D2 such that (1) f1
> > > and f2 are not injective and (2) <f1,f2> is injective where <f1,f2> :
> > > D -> (D1 x D2) is defined such that <f1,f2>(x) = (f1(x), f2(x)).
>
> > > Note that (1) says that each individual decomposition function loses
> > > information and (2) says that together they don't. However, we can
> > > then make the following observation:
>
> > > THEOREM: A domain is decomposable iff it contains more than 2 values.
>
> > I like your approach to defining a *non-trivial* decomposition, by
> > requiring that f1,f2 not be injective.
>
> > However, I don't think your definition captures the right idea. For
> > instance, my example 2 decomposes a domain expressed as a set of
> > person names to a single name. In that case what is D1,D2?
>
> The definition is aimed at only establishing that something is
> decomposable, not exactly how you should decompose it.

Yes, that makes sense.

> For a set of
> names it clearly holds that it is decomposable and you can pick many
> different D1s and D2s that show this. For example, you can split the
> set into a set of strings before "Laura" and a set after "Laura".
> Whether that makes sense or not, is another question because you are
> looking for an absolute definition. It might, and that is enough. If
> you want to say more about how you want to decompose, you could define
> something like the following:
>
> DEFINITION: A domain D is said to be set-decomposable if there is a
> domain D' and an injective function f : D -> P(D'), where P is the
> power-set operator, such that the number of values in D' is smaller
> than the number of values of D minus 1.
>
> It think you will understand why D' has to be smaller than D. The
> "minus 1" may be a bit surprising, but otherwise the notion becomes
> trivial in the sense that every domain will be set-decomposable: you
> can map the domain {v_1, v_2, ... } to P({v_2, ...}) by mapping v_1 to
> the empty set and for i > 1 mapping v_i to { v_i }.
>
> Guess what?
>
> THEOREM A domain is set-decomposable iff it contains more than 3
> values.
>
> So set-decomposability implies decomposability, but not the other way
> around.

What about when D is an infinite domain? Talking about number of values in D minus 1 doesn't seem applicable.

In example 2, the values to be decomposed were sets of person names. So I presume the domain D to be decomposed is the power set over the set of strings, D' is the set of strings and f can simply be the identity. Is that right? Now D and D' both have infinite cardinality. However it seems clear that D' should be deemed smaller in some sense. In fact Cantor proved there is no surjection from a set to its power set, which establishes the result that D' has a smaller cardinality than D. In fact only D' is countably infinite.

> > Since finite domains with more than two values are quite useful, my
> > attempted definition is clearly a waste of time.
>
> I don't think the exercise is completely pointless, or we wouldn't be
> having this conversation. :-) It's important to see what happens if
> you try to come up with an absolute definition of atomicity. But you
> have to keep in mind that as far as the relational model is concerned
> atomicity is not an absolute concept but a relative concept. It's
> relative to the Universe of Discourse that is being modeled. From
> different points of view the same thing might be considered atomic or
> not. This is related to how the data is going to be used in the
> database. If in your UoD something is considered to be composed then
> chances are that you may want separate indexes on these components,
> constraints that refer to these components, joins on only some of the
> components, etc., and so flattening the relation is probably a good
> idea.
>
> Apologies if I'm kicking in an open door here. It's just to offset a
> few remarks about atomicity not being relevant in the context of the
> relational model. IMO such remarks show a deep misunderstanding of
> what the relational model is essentially about.

What do you think of the idea to use existence of a bijection as a definition of information equivalence between databases (expressed with alternative DB schema)?

Do you think the following is a valid principle?: Identifiers that stand for mathematical values increase the entropy of the database and should be regarded as representing "noise", and indicate that decomposition of domain types has been taken too far. Received on Mon Feb 04 2008 - 03:09:34 CET

Original text of this message