Re: Some really confusing things about parent-child relationship
Date: Wed, 15 Aug 2007 09:01:04 -0700
Message-ID: <1187193664.775587.230660_at_e9g2000prf.googlegroups.com>
Um, huh? Seems like a confusing mix of object oriented, directed graph, and entity relationship ideas here.
1a) Super-classes and sub-classes
OO concept - the rule is, every instance of a sub-class is also an
instance of its super-class
(or classes if multiple inheritance is allowed). Thus, all Dogs are
Animals, all Birds are Animals, but not all Anmials are Dogs or
Birds. Thus, all Dogs have all attributes of an Animal, whereas there
may also be some Dog-specific properties not applicable to Animals in
general.
1b) Mapping to ER or relational
And here is where we get lost. UML diagrams allow you to draw
something that looks like a "pointer" to represent a sub-class to
super-class relationship. This is going to be many-to-one in a single
inheritance system like Java, forming a tree-like structure. Or it
could be an acyclic directed graph in a multiple-inheritance system.
Whatever you do, you may then be confusing this with some sort of
mapping to relations (or SQL tables). Approaches to this include
- Single giant table with lots of nullable columns to hold all classes
in the tree/graph. Ugly but often done when there are just a few
classes.
- Distinct tables for each sub-class, holding all attributes including
those inherited from super-class(es). No foreign keys. Advantage is
no joins to produce instances of a specific class, disadvantage is you
need UNION views to produce instances of super-classes, and you have a
maintenance headache whenever you change the superclass properties
- Distinct tables for each sub-class, holding just the class-specific
attributes, and with foreign keys pointing to the super-class
table(s). Advantage: properly normalized, makes schema maintenance
somewhat easier. Disadvantage: you have to join through all parent
classes to produce instances of a specific class, and this is painful
to insert/update to as well. Another disadvantage: you need to do
outer-joins producing nulls to get instances at a super-class level if
you still want the class-specific attributes to surface.
2a) child-to-parent relationship
This question is confusing as phrased. It seems to be talking trying
to model class relationships themselves - is that so? If so, it
doesn't matter if we're talking about the network model or not, the
main question is do you need to support multiple-inheritance or not.
If the answer is no, then there is a 1:N relationship between a Class
and another Class (in an ER diagram this could be represented as a
line circling from the Class box back into itself with a "crows foot"
on one end). If the answer is yes, then in order to map this to SQL
you will need two tables, one for Classes and one for SubClasses.
There is a 1:N relationship between Classes and SubClasses, and there
is another 1:N relationship from SubClasses to Classes. If on the
other hand you are not using a relational representation, then all
bets are off. You seem to have switched from talking about subclasses
(an OO term with a well-accepted definition) to "child classes" which,
as used in your statements, appear to really refer to dependent or
weak entities. For example, you seem to be talking not about
situations like dog versus animal, but situations like OrderLine
versus Order - is that correct?
2b) Can a child have several parents?
Depends entirely on your definition. Let's consider a classic N:M
situation, Classes, Students, and StudentClassAssignments. How are we
to record the fact that Harry Potter is taking Potions and
Divination? In a relational system, you'd record that with a relation
having two foreign keys, one to Classes and one to Students. It would
be easy to enroll Hermiony Granger in Potions by adding a row to that
table, or to have Ron Weasley drop Divination by deleting one. In a
hierarchical model, how these things work may depend on the
implementation. For example, one could chose to "store" a list of
(pointers to) classes underneath each student. Now it's still easy to
enroll or drop a student from a class, by only changing that one
student's record. But suppose that a class is dropped (because the
Divination teachers was kicked out of Hogwarts). Now it's a pain in
the ass to go round and update all the student records for that class,
isn't it? The reverse is also true, if you store (pointers to)
students with Classes, it's easy to drop or add a Class, but it's
harder to expel the Weasley twins because you have to update all the
classes they are taking. Some pointer-based systems actually store
pointers on *both* sides of such a relationship - o the fun. It
sounds like the simple network model imposes the same restriction as a
SQL system would - you are only allowed N:1 relationships so any N:M
relationship must be broken up into two N:1 relationships with an
"intersection entity" such as StudentClassAssignments. I hope the
above examples show why this is generally a good idea anyway.
2c) Which is the child?
In a purely relational system, this seems like a meaningless
question. That said, it is often the case that when creating an ER
model, designers will choose to view a relationship as parent-child,
while choosing to view another relationship with a similar depiction
as being a referencing relationship. As a simple example, consider
Order versus OrderLine, and then consider Order versus Customer. A
Customer may place many orders (we hope), so there is an N:1
relationship from Order to Customer. Similarly, an order may be for
multiple items at once, so there's an N:1 relationship from OrderLine
to Order. Yet a designer may choose to view OrderLine as a weak/
dependent/child object of the Order, whereas he/she would choose to
view the link to Customer as a reference. In general, child objects
are on the "many" side of a relationship, and are deleted when the
object on the "one" side of the relationship is deleted. But this is
not definitive.
2d) Many relations between two entities
This one's pretty easy and probably the same between relational and
network. Suppose that I have an entity such as Sale that has two
*different* relationships to Agent, one is for the buyer's agent and
the other is for the seller's agent. These are both N:1 relationships
because Agents may make more than one sale. In fact, on a particular
Sale both relations could point to the same agent. But the point is,
they need not, and so they are distinct relationships. From the agent
side, it is a reasonable query to ask for all sales where an agent was
the seller's agent, and in a different query to ask for all sales
where an agent was the buyer's agent. Whether the relations are N:1
or N:M is another matter. If you need to allow for the possibility
that more than one agent may get credit for a sale, you could imagine
modeling both relationships (which are *still* different
relationships) as N:M. Representing that is another matter.
N:M relationships are "bad" because modeling them that way assumes that there are *never* going to be any facts particular to the instances of the relationship. For example, if I model the buyer's agent and seller's agent relationships as N:M, and I later want to capture an additional fact such as percent credit (i.e. the lead agent gets 75% and another agent gets 25%), then I will wish I had just introduced an "intersection" entity where that information could be captured, replacing the N:M with two 1:N relationships. (I would of course have to do that twice, once for the buyer's agent relationship and once again for the seller's agent relationship.)
I'm not sure if any of this is helpful. I'm not an expert on network databases. Received on Wed Aug 15 2007 - 18:01:04 CEST