Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Mailing Lists -> Oracle-L -> RE: Statistical sampling and representative stats collection
In addition to the valid points already noted, another point to be made about histograms is that if you don't use the column that the histograms exist on in the WHERE clause of your statement and with a literal value (not a bind) (pre-9i), then the histogram is doing you no good.
That's what I've read in all the Oracle manuals and other sources.
Can anyone tell me if histograms take up extra disk space?
Thanks,
Cherie Machler
Oracle DBA
Gelco Information Network
Jack Silvey <jack_silvey_at_y To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com> ahoo.com> cc: Sent by: Subject: RE: Statistical sampling and representative stats root_at_fatcity.c collection om 05/25/02 09:18 AM Please respond to ORACLE-L
Larry,
You are too self-effacing. I am trying to deal with the policital and territorial aspects of this db so *we* can do some analyze testing.
I would just like to say for public consumption that Larry has database tuning skills that he undoubtedly will end up going to hell for. DID you have to sign that contract with Old Scratch in blood to get those skills, or was there a forehead branding involved, or anything?
;)
Two things that occur to me that I would like to share for discussion, comments, and outright contradiction. Didn't realize this was going to be so long, fair warning:
One measure of a central point in a data set is the average - adding all the data points together and dividing by the number of elements. Average is the mathematical central point of a data set.
Another measure is median, in which the beginning and end data points are added together and dividied by 2. Median is the point at which half of the data points are on one side and half on the other.
In a dataset with normal distribution (non-skewed), measures of centrality will be the same point. If these two (or any) measures of data centrality are not the same, your data is skewed.
Now, data skewness should not be an automatic trigger for histograms. The percentage of data skewness needs to be taken into account and the cost of suboptimal query access paths identified. For instance, if you have a large dataset that has a column with unique values, and you insert one duplicate, this will introduce skewness, but will not necessarily mean that you should add the extra parse overhead of histograms.
So, what level of skewness should trigger histogram creation? This is going to be access dependent (didn't you know I was going to say that.)
In the absence of histograms, Oracle *must* assume that the data is normally distributed, which can lead to incorrect access paths.
Let's assume a table of 100,000 people. The range of ages will be from 0 to 120 years old.
Since we will have few people in the 110-120 range *relative to the other ranges* the data is skewed. However, in the absence of histograms, Oracle will assume that the number of people in this range is equal to the number of people in all the other ranges.
Let's assume that we have a query for people in the 80-120 year old range. Since in a normal distribution this would represent 33 percent of the table, Oracle may choose to do a full table scan. However, we know from experience in the real world that there are not many people in this range, so Oracle should be using an index for this lookup.
Are you an insurance company that wants to evaluate how many people will expire next year? Histograms are in the cards for you. Are you an HR person that wants to know how many people have 30 or more working years left? Since this range will be relatively normal perhaps a histogram will cost more than it is worth.
The key to determining which colums need histograms should be a balancing act between extra parse cost and cost of incorrect access paths.
At least, *I* think so, however, I also think that Star Trek is a historical document sent back through time by our future selves. Amazing how much the future looks just like the 1960's.
;)
Picard-Riker in 2004.
/jack silvey
-- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: Jack Silvey INET: jack_silvey_at_yahoo.com Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). -- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: INET: Cherie_Machler_at_gelco.com Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).Received on Tue May 28 2002 - 08:23:19 CDT
![]() |
![]() |