Re: Re-design of data storage table

From: Mxyplx <jcelko212_at_earthlink.net>
Date: Sat, 21 Feb 2009 14:51:43 -0800 (PST)
Message-ID: <7c423c72-31ed-4e8a-8a35-630874a2777a_at_f11g2000vbf.googlegroups.com>



Google up the design flaw known as EAV (Entity-Attribute-Value) design. Let me do a "cut & paste" on it to give you an idea how bad it can be to query. The lack of data integrity is another problem, too.

 I found an old "cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10), -- what does null mean?
 attrib_value VARCHAR (50)); -- what does null mean?

INSERT INTO EAV VALUES ('LOCATION', 'Bedroom');
INSERT INTO EAV VALUES ('LOCATION', 'Dining Room');
INSERT INTO EAV VALUES ('LOCATION', 'Bathroom');
INSERT INTO EAV VALUES ('LOCATION', 'courtyard');
INSERT INTO EAV VALUES ('EVENT', 'verbal aggression');
INSERT INTO EAV VALUES ('EVENT', 'peer');
INSERT INTO EAV VALUES ('EVENT', 'bad behavior');
INSERT INTO EAV VALUES ('EVENT', 'other');

CREATE TABLE EAV_DATA --no constraints, defaults, DRI
(id INTEGER IDENTITY (1, 1) NOT NULL, --vague names
 bts_id INTEGER NULL,
 key_col VARCHAR (10) NULL,
 attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I think I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is an answer From: Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,

       (SELECT COUNT(*)
          FROM (SELECT LocationData.locationvalue,
EventData.eventvalue
                  FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
                          FROM eav_data AS TD1
                         WHERE TD1.key = 'location') AS LocationData
               INNER JOIN
               (SELECT TD2.bts_id, TD2.value AS eventvalue
              FROM eav_data AS TD2
              WHERE TD2.key = 'event'
            ) AS EventData
            ON LocationData.bts_id = EventData.bts_id
      ) AS CollatedEventData
    WHERE CollatedEventData.locationvalue = Locations.locationvalue
      AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
      FROM EAV AS T1
      WHERE T1.key = 'location') AS Locations,
   (SELECT T2.value AS eventvalue
      FROM EAV AS T2
     WHERE T2.key = 'event') AS Events

ORDER BY Locations.locationvalue, Events.eventvalue , SELECT Locations.locationvalue, Events.eventvalue
       (SELECT COUNT(*)
          FROM (SELECT LocationData.locationvalue,
EventData.eventvalue
                  FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
                          FROM eav_data AS TD1
                         WHERE TD1.key = 'location') AS LocationData
              INNER JOIN
              (SELECT TD2.bts_id, TD2.value AS eventvalue
                 FROM eav_data AS TD2
                WHERE TD2.key = 'event') AS EventData
              ON LocationData.bts_id = EventData.bts_id)
              AS CollatedEventData
    WHERE CollatedEventData.locationvalue = Locations.locationvalue
      AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
        FROM EAV AS T1
      WHERE T1.key = 'location') AS Locations,
    (SELECT T2.value AS eventvalue
       FROM EAV AS T2
       WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)   FROM Locations AS L, Events AS E
 WHERE L.btd_id = E.btd_id
 GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities are all plopped into the same table. There should be separate tables for Locations and Events.

The column names are seriously painful. Don't use reserved words like "key" and "value" for column names. It means that the developer *has* surround the column name with double quotes for everything. And they are too vague to be data element names anyway!

There is such a thing as "too" generic. There has to be some structure or everything becomes nothing more than a couple of tables called "things". The real key (no pun intended) is commonality. Is there a pattern to the data that they want to store? It may not be possible to create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is to be nothing." --Law of Identity

All data integrity is destroyed. Any typo becomes a new attribute or entity. Entities are found missing attributes, so all the reports are wrong.

Try to write a single CHECK() constraint that works for all the attributes of those 30+ entities your users created because you were too dumb or too lazy to do your job. It can be done! You need a case expression almost 70 WHEN clauses for a simple invoice and order system when I tried it as an exercise.

Try to write a single DEFAULT clause for 30+ entities crammed into one column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN clauses in the single CASE expression were unmaintainable, wait until you see the "TRIGGERs from Hell" -- Too bad that they might not fit into older SQL Server which had some size limits. Now maintain it.

For those who are interested, there are couple of links to articles I found on the net:

Generic Design of Web-Based Clinical Databases http://www.jmir.org/2003/4/e27­/

The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadka­rni/eav_CR_contents.htm

An Introduction to Entity-Attribute-Value Design for Generic Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadka­rni/Introduction%20to%20EAV%20­systems.htm

Data Extraction and Ad Hoc Query of an Entity— Attribute— Value Database
http://www.pubmedcentral.nih.g­ov/articlerender.fcgi?tool=pub­med&pubme...

Exploring Performance Issues for a Clinical Database Organized Using an Entity-Attribute-Value Representation http://www.pubmedcentral.nih.g­ov/articlerender.fcgi?tool=pub­med&pubme...

A really good horror story about this kind of disaster is at:

http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/ Received on Sat Feb 21 2009 - 16:51:43 CST

Original text of this message