RE: Looking for table design input

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 27 Mar 2009 15:31:35 -0400
Message-ID: <858715DCCC804FFDB2B358AA251B5B9E_at_rsiz.com>



I'm not sure why you're calling that a denormalization.  

Count1 through Count15 are likely on the same domain, but they are each in different roles.  

Now if you use the role name intelligently, say 20090315_count, 20090401_count, etc. then you actually need only the ID as the primary key to completely convey to the user the meaning of each column in the 16-tuple.  

Now I find it surpassing strange that a single abstract ID gives meaning about your data to your users, but perhaps there is another table to which id is a foreign key that has further description of the rows. Not strictly needed, since it is likely to be 1 to 1, but possibly useful especially if there is a lot in it and it is somewhat static compared to the (id,count(1..15)) relation, forgiving the shorthand which I hope is understandable.  

What about rotating months? That is, making a new relation where the oldest count column is removed and a newest count column is added? It seems likely to me that the drop column, add column on this relatively short row is likely to remain within a single block, so that is probably not a big deal.  

Now, as for maintenance of programs, your first impression might be "OUCH!" now if I want to set up some query that compares the two most recent "count" columns you're going to have to constantly update the programs.  

However, THAT is a tremendously good use of views, where it is quite easy to define views that you only have to trivally update in concert (just after) the drop column add column exercise. Then the reports on the relative columns remain static and are valid in context of when they are run, while ad hoc reports specifically naming columns make sense and no one has to look up what count1 means, because the column name (or role.domain column as Dr. Codd put it) is clear unto itself.  

Further, since presumably there is period in time where the newest yyyymmdd_count column must be populated, you could well populate that column before redefining the views and then drop the oldest column at your leisure. Users directly using the yyyymmdd_count name would need to have some mechanism to know the column is valid, but users of views would never be exposed to columns partially filled or partially dropped.  

Now I suspect some folks are just itching to write back that this is denormalized. Before you pull that trigger, I suggest you actually read at least pages 377-380 of Communications of the ACM, Volume 13, Number 6, June 1970.  

Regards,  

mwf  


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Michael Moore
Sent: Friday, March 27, 2009 1:44 PM
To: oracle-l_at_freelists.org
Subject: Re: Looking for table design input  

If performance is going to be a problem, and you can improve performance be denormalization, then do it. There are many factors which you have not mentioned such as Service Level Agreement, frequency of update, average transaction size etc etc. that will determine your final decision.

On Fri, Mar 27, 2009 at 8:13 AM, <genegurevich_at_discover.com> wrote:

Hello everybody:

I am looking for some input on a table design (oracle 10.2.0.3). Here is what I know:

  • the table will probably have several hundreds of millions of rows (I am looking for a better estimate from my customers)
  • it will store the counts per ID pre date per code for each of the ID. Something like

ID - Year - Month - Day - Code1 - Code2 - Code3 - Count

  • We will have a primary key on all the columns except for the count
  • The data will be stored for the 15 half-month periods (the Day will be either 1 or 15)
  • The data will be selected based on the ID and the users will need to get all the data
  • I will partition the table by Year/Month/Day to make purging easier.

What I wonder is whether the following design (which I do not like very much) may give a better performance:

ID - Code1 - Code2 - Code3 - Count1 - Count2 - ... Count15

Because it may potentially return less rows and therefore require less IO.

Any thoughts?

thank you

Gene Gurevich

Please consider the environment before printing this email.

--
http://www.freelists.org/webpage/oracle-l



 




--
http://www.freelists.org/webpage/oracle-l
Received on Fri Mar 27 2009 - 14:31:35 CDT

Original text of this message