Re: Some really confusing things about parent-child relationship

From: <Doug_McMahon_at_yahoo.com>
Date: Wed, 15 Aug 2007 09:01:04 -0700
Message-ID: <1187193664.775587.230660_at_e9g2000prf.googlegroups.com>


Um, huh? Seems like a confusing mix of object oriented, directed graph, and entity relationship ideas here.

1a) Super-classes and sub-classes
OO concept - the rule is, every instance of a sub-class is also an instance of its super-class
(or classes if multiple inheritance is allowed). Thus, all Dogs are
Animals, all Birds are Animals, but not all Anmials are Dogs or Birds. Thus, all Dogs have all attributes of an Animal, whereas there may also be some Dog-specific properties not applicable to Animals in general.

1b) Mapping to ER or relational
And here is where we get lost. UML diagrams allow you to draw something that looks like a "pointer" to represent a sub-class to super-class relationship. This is going to be many-to-one in a single inheritance system like Java, forming a tree-like structure. Or it could be an acyclic directed graph in a multiple-inheritance system. Whatever you do, you may then be confusing this with some sort of mapping to relations (or SQL tables). Approaches to this include - Single giant table with lots of nullable columns to hold all classes in the tree/graph. Ugly but often done when there are just a few classes.
- Distinct tables for each sub-class, holding all attributes including those inherited from super-class(es). No foreign keys. Advantage is no joins to produce instances of a specific class, disadvantage is you need UNION views to produce instances of super-classes, and you have a maintenance headache whenever you change the superclass properties - Distinct tables for each sub-class, holding just the class-specific attributes, and with foreign keys pointing to the super-class table(s). Advantage: properly normalized, makes schema maintenance somewhat easier. Disadvantage: you have to join through all parent classes to produce instances of a specific class, and this is painful to insert/update to as well. Another disadvantage: you need to do outer-joins producing nulls to get instances at a super-class level if you still want the class-specific attributes to surface.

2a) child-to-parent relationship
This question is confusing as phrased. It seems to be talking trying to model class relationships themselves - is that so? If so, it doesn't matter if we're talking about the network model or not, the main question is do you need to support multiple-inheritance or not. If the answer is no, then there is a 1:N relationship between a Class and another Class (in an ER diagram this could be represented as a line circling from the Class box back into itself with a "crows foot" on one end). If the answer is yes, then in order to map this to SQL you will need two tables, one for Classes and one for SubClasses. There is a 1:N relationship between Classes and SubClasses, and there is another 1:N relationship from SubClasses to Classes. If on the other hand you are not using a relational representation, then all bets are off. You seem to have switched from talking about subclasses
(an OO term with a well-accepted definition) to "child classes" which,
as used in your statements, appear to really refer to dependent or weak entities. For example, you seem to be talking not about situations like dog versus animal, but situations like OrderLine versus Order - is that correct?

2b) Can a child have several parents?
Depends entirely on your definition. Let's consider a classic N:M situation, Classes, Students, and StudentClassAssignments. How are we to record the fact that Harry Potter is taking Potions and Divination? In a relational system, you'd record that with a relation having two foreign keys, one to Classes and one to Students. It would be easy to enroll Hermiony Granger in Potions by adding a row to that table, or to have Ron Weasley drop Divination by deleting one. In a hierarchical model, how these things work may depend on the implementation. For example, one could chose to "store" a list of
(pointers to) classes underneath each student. Now it's still easy to
enroll or drop a student from a class, by only changing that one student's record. But suppose that a class is dropped (because the Divination teachers was kicked out of Hogwarts). Now it's a pain in the ass to go round and update all the student records for that class, isn't it? The reverse is also true, if you store (pointers to) students with Classes, it's easy to drop or add a Class, but it's harder to expel the Weasley twins because you have to update all the classes they are taking. Some pointer-based systems actually store pointers on *both* sides of such a relationship - o the fun. It sounds like the simple network model imposes the same restriction as a SQL system would - you are only allowed N:1 relationships so any N:M relationship must be broken up into two N:1 relationships with an "intersection entity" such as StudentClassAssignments. I hope the above examples show why this is generally a good idea anyway.

2c) Which is the child?
In a purely relational system, this seems like a meaningless question. That said, it is often the case that when creating an ER model, designers will choose to view a relationship as parent-child, while choosing to view another relationship with a similar depiction as being a referencing relationship. As a simple example, consider Order versus OrderLine, and then consider Order versus Customer. A Customer may place many orders (we hope), so there is an N:1 relationship from Order to Customer. Similarly, an order may be for multiple items at once, so there's an N:1 relationship from OrderLine to Order. Yet a designer may choose to view OrderLine as a weak/ dependent/child object of the Order, whereas he/she would choose to view the link to Customer as a reference. In general, child objects are on the "many" side of a relationship, and are deleted when the object on the "one" side of the relationship is deleted. But this is not definitive.

2d) Many relations between two entities
This one's pretty easy and probably the same between relational and network. Suppose that I have an entity such as Sale that has two *different* relationships to Agent, one is for the buyer's agent and the other is for the seller's agent. These are both N:1 relationships because Agents may make more than one sale. In fact, on a particular Sale both relations could point to the same agent. But the point is, they need not, and so they are distinct relationships. From the agent side, it is a reasonable query to ask for all sales where an agent was the seller's agent, and in a different query to ask for all sales where an agent was the buyer's agent. Whether the relations are N:1 or N:M is another matter. If you need to allow for the possibility that more than one agent may get credit for a sale, you could imagine modeling both relationships (which are *still* different relationships) as N:M. Representing that is another matter.

N:M relationships are "bad" because modeling them that way assumes that there are *never* going to be any facts particular to the instances of the relationship. For example, if I model the buyer's agent and seller's agent relationships as N:M, and I later want to capture an additional fact such as percent credit (i.e. the lead agent gets 75% and another agent gets 25%), then I will wish I had just introduced an "intersection" entity where that information could be captured, replacing the N:M with two 1:N relationships. (I would of course have to do that twice, once for the buyer's agent relationship and once again for the seller's agent relationship.)

I'm not sure if any of this is helpful. I'm not an expert on network databases. Received on Wed Aug 15 2007 - 18:01:04 CEST

Original text of this message