Bitmap Index and Histogram [message #181798] |
Tue, 11 July 2006 10:07 |
srinivas4u2
Messages: 66 Registered: June 2005
|
Member |
|
|
Hi,
I have a question with regard to "Bitmap Index" and "Histograms". Let me explain my understanding of them first -
Bitmap indexes- Use them in low cardinality and with "and" and "or" predicates. Low cardinality is with refernce to the total number of rows in the table.
Histograms - Create histograms on tables where the data is skewed. Now here does skewed mean totally irrelevant values (of the column say sal) between the rows or does it mean one value being very very repetitive?
What I mean by irrelevant is say -
row1 - sal =1000
row2 - sal =10000000
row3 - sal =10
row4 - sal = 999999999999
I know that's a great sal to get...I am just saying for an eg!
And repetitive -
1000 rows haning sal as 100000 another 1000 having 50000 another 500 having 25000 etc.
And on what decision does histogram help the optimizer to make? Is it with respect to using an index or the kind of join?
If histograms are to be created based on irrrelevant values then I have no confusion with histograms and bitmap. But if histograms are based on repetitive values then Bitmap and Histogram seem to be closely related. That is, if it's an OLTP then we'd not create bitmap and use b-tree if needed with histogram and if it's not an OLTP then we create bitmap (due to dml)? And after creating bitmap do we need histogram?
Thanks,
Srinivas
|
|
|
Re: Bitmap Index and Histogram [message #181845 is a reply to message #181798] |
Tue, 11 July 2006 21:53 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Your idea of bitmap indexes is pretty good. Histograms are good for identifying highly repetetive values in a column that otherwise has low cardinality (not many rows per value).
Histograms can have a number of effects on the optimizer, but it all boils down to the CBO estimating how many rows will be returned from a particular table in a query. All other effects (to use or not to use an index, which join type to use) flow on from there.
eg. If you have a predicate WHERE sal= 0 and there are millions of rows that have 0, then a Full Table Scan might be better than an index scan. However if WHERE sal = 100 would return only a few rows, then an index might be better. In this way, two similar queries can have different execution plans.
This type of skewed column is a bad candidate for a bitmap index because it has way too many different values for a bitmap index to be efficient - only one or two of the values are skewed. Also remember that bitmap indexes are typically only useful when combined on in two or more AND/OR predicates.
Ross Leishman
|
|
|
Re: Bitmap Index and Histogram [message #182042 is a reply to message #181798] |
Wed, 12 July 2006 12:35 |
srinivas4u2
Messages: 66 Registered: June 2005
|
Member |
|
|
Thanks for the response Ross!
However, I am still not clear about the concept.
Let's try to work with an example, consider a table is having one million rows. Taking into consideration each of the distinct value being used in the "where" clause please tell me in which scenario bitmap or histogram is best.
For the sake of simplicity let us further assume that this table in involved in multi "and" predicates, which is an ideal situation for bitmap.
1. If tha table has 100 distinct values equally distributed among the 1 million rows.
2. If the table has 10 distinct values equally distributed among the 1 million rows.
3. If the table has 100 distinct values with 2 distinct values forming 0.9 million rows and the rest 98 distinct values forming the remaining 0.1 million rows.
4. If the table has 10 distinct values with 2 distinct values forming 0.4 million, 4 distinct values forming 0.4 million and the remaining 2 distinct values forming 0.2 million rows.
5. If tha table has 2 (let's say m & f) distinct values with one (m) forming 0.9 million rows and the other (f) forming 0.1 million rows.
Thanks,
Srinivas
|
|
|
Re: Bitmap Index and Histogram [message #182044 is a reply to message #181798] |
Wed, 12 July 2006 12:44 |
srinivas4u2
Messages: 66 Registered: June 2005
|
Member |
|
|
Is there any way how we can determine which column is a good candidate for histogram? And what is histogram number that could be used in datawarehouse in general?
Appreciate your response!
Thanks,
Srinivas
|
|
|
|
Re: Bitmap Index and Histogram [message #182055 is a reply to message #181798] |
Wed, 12 July 2006 14:02 |
srinivas4u2
Messages: 66 Registered: June 2005
|
Member |
|
|
Mahesh,
Appreciate your interest in helping!
But I have gone through that article and many more. I know how a bitmap and btree works and the pros and cons of both, but I am not in "Complete" understanding of the histogram concept yet as the permutations are high whenever the data is skewed.
And what is the line that separates "data skewness - for histogram" and "repetitive data - but ideal for bitmap". Maybe if I get answers to the questions it would help!
Thanks,
Srinivas
|
|
|
Re: Bitmap Index and Histogram [message #182080 is a reply to message #182042] |
Wed, 12 July 2006 22:21 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
1. If tha table has 100 distinct values equally distributed among the 1 million rows. Bitmap if used in conjunction with other columns.
2. If the table has 10 distinct values equally distributed among the 1 million rows. Bitmap
3. If the table has 100 distinct values with 2 distinct values forming 0.9 million rows and the rest 98 distinct values forming the remaining 0.1 million rows. Bitmap and histogram
4. If the table has 10 distinct values with 2 distinct values forming 0.4 million, 4 distinct values forming 0.4 million and the remaining 2 distinct values forming 0.2 million rows. Bitmap
5. If tha table has 2 (let's say m & f) distinct values with one (m) forming 0.9 million rows and the other (f) forming 0.1 million rows. Bitmap and histogram
6. If the table has 1M rows with 100,002 distinct values with 2 distinct values forming .5 million, 100,000 distinct values forming 0.5 million. B-tree and histogram
7. If the table has 1M rows distinct values evenly distributed. B-tree
100 is not a large number of distinct values for a bitmap index. 2500 is getting up there, but I would still be benchmarking it against a b-tree index on a large table for up to 10000 distinct values.
Indexed access to a table is ONLY efficient if it will allow you to identify less than between 1 and 10 percent of the rows. Somewhere between 1% and 10% - depending on your scenario - it becomes faster to do a full table scan.
Thats why a bitmap index needs to be used in conjunction with other bitmap indexes. On its own, it will typically return more than 1% and frequently more than 10%. You need to add further AND predicates to trim down the number of matching rows.
Histograms help here, because they can tell you whether the values you used in your SQL are skewed. If so, then the CBO will be able to ignore that index REGARDLESS of whether it is bitmap or b-tree.
Ross Leishman
|
|
|
Re: Bitmap Index and Histogram [message #182188 is a reply to message #181798] |
Thu, 13 July 2006 08:43 |
srinivas4u2
Messages: 66 Registered: June 2005
|
Member |
|
|
Ross,
Great!
That's a PERFECT answer!
Thanks A Ton!
I was wanting an answer like that!
Is it bad to collect histogram on all columns for all tables when you don't know whether the data is skewed or not? Does it have any negative impact on performance?
Thanks,
Srinivas
|
|
|
Re: Bitmap Index and Histogram [message #182237 is a reply to message #182188] |
Thu, 13 July 2006 21:42 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
No material performance drop that I know of for collecting histograms on all columns. Obviously, it will take longer to gather statistics though. If you have enormous tables, you may want to find some compromise.
Also, remember that histograms are only of any practical use when you hard-code values in your WHERE clause. If you use bind-variables (which is best-practise), then histograms are very nearly useless. Search the Performance Tuning Guide for "bind variable peeking". It is both a boon and a pain to tuning bin-variable SQLs on skewed data.
Ross Leishman
|
|
|
|