Re: Representation for Heterogeneous Attribute Set

From: <robertbrown1971_at_yahoo.com>
Date: 11 Feb 2005 09:05:53 -0800
Message-ID: <1108141553.730732.25110@f14g2000cwb.googlegroups.com>

Joe Celko,

I actually bought and studied back to back your book "Trees and Hierarchies in SQL" in search of the solution to this problem. The book is great and is the only one on the market (that I could find) that deals head-on with the common problem of hierarchy represnetation in SQL. I learned about the nested set model, path enumeration, id-parentid etc. It has earned a permanent place on my reference book shelf but unfortunately it does not address the problem I outlined in this post. I'm not being faceteous here - I really did like the book a lot.

I am also well familiar with the post you provided in reponse to my question. I studied it in the archives and I understand it. In fact, I used the approach you outlined elsewhere several times. I also own your "SQL for smarties book" which is also very good. Now to address your specific questions:

-CELKO- wrote:
> Do the auditors and the feds know that anyone can invent a new kind
of
> security in your system? I doubt it.

Robert replies:

This is not a trading system. It is an ANALYSIS tool where the analyst can invent new bonds and other instruments and see how they behave and how they affect the portfolio behaviour. A common way to analyze hedges for a portfolio is to try out different possibilities to see which ones work in a computer simulation. Then once you find the right one - then you see if there are existing combinations of securities (bonds, call/put options, floating bonds, commodity linked bonds etc) that can approximate the behaviour of the ideal hedge you modelled.

-CELKO- wrote:

> The classic scenario calls for a root class with all the common
> attributes and then specialized sub-classes under it. As an example,
> let's take the class of Vehicles and find an industry standard
> identifier (VIN), and add two mutually exclusive sub-classes, Sport
> utility vehicles and sedans ('SUV', 'SED').

Robert repies:

This is the generalization hierarchy approach. It does not work in this case since the subtypes are quite numerous and can be created dynamically. Your beef was primarily with dynamic creation of entities. That's actually not the only issue. Suppose we ditch the dynamic creation you still have the problem that you have thousands of tables, one for each bond type. If your portfolio has 800 bonds of 100 different types then you need to do a 100 joins when the user pulls up the portfolio. The performance of that is far worse than that of the aesthetically ugly name-value approach.
The generalization hiearachy makes sense *only* when many attributes are shared and only some attributes are specific to the subtype. That is not the case here there is very little overlap in attributes among the different bond. The only attribute they all have in commong is perhaps maturity but even here there are exceptions. For example one might think all debt instruments would have an interest rate. Not the case, there are zero coupon bonds. You might think every debt instrument has a maturity. Not the case, the maturity can be conidtional. And so on.
When wrote the post I suspected, as Dawn Walthius wrote, that this problem is simply not amenable to relational modelling. As several posters noted this use case can be very easily represented in an OO system or in an hierarchical persistence such as XML. I do not see a good solution for this in relational terms. It's a choice among imperfect alternatives.

> CREATE TABLE Vehicles
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) NOT NULL
> CHECK(vehicle_type IN ('SUV', 'SED')),
> UNIQUE (vin, vehicle_type),
> ..);
>
> Notice the overlapping candidate keys. I then use a compound
candidate
> key (vin, vehicle_type) and a constraint in each sub-class table to
> assure that the vehicle_type is locked and agrees with the Vehicles
> table. Add some DRI actions and you are done:
>
> CREATE TABLE SUV
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
> CHECK(vehicle_type = 'SUV'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Vehicles(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> CREATE TABLE Sedans
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
> CHECK(vehicle_type = 'SED'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Vehicles(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> I can continue to build a hierarchy like this. For example, if I had
a
> Sedans table that broke down into two-door and four-door sedans, I
> could a schema like this:
>
> CREATE TABLE Sedans
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
> CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Vehicles(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> CREATE TABLE TwoDoor
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
> CHECK(vehicle_type = '2DR'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Sedans(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> CREATE TABLE FourDoor
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
> CHECK(vehicle_type = '4DR'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Sedans (vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> The idea is to build a chain of identifiers and types in a UNIQUE()
> constraint that go up the tree when you use a REFERENCES constraint.
> Obviously, you can do variants of this trick to get different class
> structures.
>
> If an entity doesn't have to be exclusively one subtype, you play
with
> the root of the class hierarchy:
>
> CREATE TABLE Vehicles
> (vin CHAR(17) NOT NULL,
> vehicle_type CHAR(3) NOT NULL
> CHECK(vehicle_type IN ('SUV', 'SED')),
> PRIMARY KEY (vin, vehicle_type),
> ..);
>
> Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD
> OF trigger to those VIEWs.
Received on Fri Feb 11 2005 - 11:05:53 CST