
Re: which data model is better?

From: Daniel Morgan <damorgan_at_x.washington.edu>
Date: Sat, 17 Jan 2004 19:13:30 -0800
Message-ID: <1074395538.965924@yasure>


Hans Forbrich wrote:

> dx wrote:
>
>>Hi Daniel,
>>
>>Thanks for your comment. But actually, for our application, in 99.99%
>>of cases there is no need for a third address; I'm pretty sure there
>>is more chance of adding other attributes like spoken_language,
>>birth_date, etc. than of adding a third address. So even if you choose
>>separate address and phone tables, the separate-table model still has
>>almost the same chance of changing as the one-table model.
>>
>>In my opinion, performance and ease of use are always my goals when
>>designing a data model. Take a simple business question like getting
>>all the information about customers who live in the city 'TORONTO'
>>and have a business phone like '416%'; it's always much harder to get
>>the answer from the separate-table model than from
>>
>>select * from customer where home_city = 'TORONTO' and business_phone
>>like '416%';
>
> Retrieving all this data is easily handled by a view on the base
> tables. IMO, this is not a reasonable excuse to denormalize.
>
>
>>And how much extra logical I/O will be performed for the
>>separate-table model? Do you really want to trade performance and
>>ease of use for this imagined flexibility?
>
> Will there be inserts and updates? If so, will there be indexes across
> all the subsections of data?
>
> When performance is your main concern and you look at the whole
> picture (including DML), you *might* find that maintaining all the
> indexes on a denormalized table (such as you propose) actually hurts
> in comparison to maintaining the same set of indexes across several
> tables.
>
> Also, having split the data across several tables, I can see some
> situations where the logical I/Os are reduced. Suppose the 'selecting'
> table fits into fewer blocks - a full scan will find the matching rows
> faster. (Joins, being done on a primary key, should be fast and need
> few I/Os.) Also, if a denormalized row ends up being extended to
> larger than a block, you could run into some negative impacts.
>
>
>>Yes, I would follow Codd's rules, but I don't want to rigidly stick
>>to them.
>
> This does need some detailed benchmarking in your environment,
> including your actual DML patterns. Especially with 8i, I've found
> that a balance in performance is achieved by sticking to the rules.
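
For concreteness, a minimal sketch of the separate-table model and the
view Hans describes above might look something like this; the table and
column names (customer, customer_address, customer_phone, customer_v)
are assumptions for illustration, not the poster's actual schema:

create table customer (
  customer_id  number primary key,
  last_name    varchar2(50),
  first_name   varchar2(50)
);

create table customer_address (
  customer_id   number references customer,
  address_type  varchar2(10),        -- e.g. 'HOME' or 'BUSINESS'
  city          varchar2(50),
  primary key (customer_id, address_type)
);

create table customer_phone (
  customer_id   number references customer,
  phone_type    varchar2(10),        -- e.g. 'HOME' or 'BUSINESS'
  phone_number  varchar2(20),
  primary key (customer_id, phone_type)
);

create or replace view customer_v as
select c.customer_id, c.last_name, c.first_name,
       a.city         as home_city,
       p.phone_number as business_phone
from   customer c
       join customer_address a
         on a.customer_id = c.customer_id and a.address_type = 'HOME'
       join customer_phone p
         on p.customer_id = c.customer_id and p.phone_type = 'BUSINESS';

With the view in place the original question reads exactly as it did
against the single wide table:

select * from customer_v
where  home_city = 'TORONTO'
and    business_phone like '416%';

(With inner joins the view only returns customers that have both a home
address and a business phone; outer joins would preserve the rest.)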

I think one can safely say, as a generalization, that flat-file models are faster than relational models. The "compelling" reason I refer to when I say there should be a compelling reason to denormalize ... is performance. But that performance hit should be measured ... not anticipated. If it exists ... then denormalize. And consider how many indexes it would take to handle searches on phone number area codes when the phone numbers sit in separate columns ... not just as many ... but more.
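
To make that index arithmetic concrete: suppose the one-table model
carries a column per phone (here a hypothetical customer_flat table
with home_phone, business_phone and fax_phone columns). Supporting an
area-code search on each of them takes one index per column, while the
separate phone table sketched above gets by with one:

-- one-table model: every phone column needs its own index before a
-- predicate such as business_phone like '416%' can use an index
create index cust_flat_home_phone_ix on customer_flat (home_phone);
create index cust_flat_bus_phone_ix  on customer_flat (business_phone);
create index cust_flat_fax_phone_ix  on customer_flat (fax_phone);

-- separate-table model: a single index serves area-code searches
-- across every phone type
create index cust_phone_number_ix on customer_phone (phone_number);

Every one of those extra indexes is also extra work on each insert and
update, which is the DML cost Hans is pointing at.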

But as you point out ... a smaller row, one that fits into a block, will have a benefit too. Or a situation where fetching a single block returns all of the information I need, rather than several blocks ... that is a far more efficient use of resources.
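
Measuring rather than guessing is easy enough in SQL*Plus; a quick
comparison along these lines (reusing the illustrative tables above)
shows the logical I/O of each version side by side:

set autotrace traceonly statistics

-- one-table model (hypothetical wide table)
select * from customer_flat
where  home_city = 'TORONTO'
and    business_phone like '416%';

-- separate-table model, through the view
select * from customer_v
where  home_city = 'TORONTO'
and    business_phone like '416%';

set autotrace off

The "consistent gets" figures reported for each run, taken against your
real data volumes and your real DML, are the measurement being argued
for here.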

One thing I hear often is complaints from report writers who are handed tables designed by horizontal thinkers ... thinkers who then expect some poor person, at a different desk in a different department, to figure out how to put it all back together again. If developers included report writers in their design meetings they might find out just how little appreciation there is for many denormalizations.

-- 
Daniel Morgan
http://www.outreach.washington.edu/ext/certificates/oad/oad_crs.asp
http://www.outreach.washington.edu/ext/certificates/aoa/aoa_crs.asp
damorgan_at_x.washington.edu
(replace 'x' with a 'u' to reply)
Received on Sat Jan 17 2004 - 21:13:30 CST
