Cannot collect Top Frequency Histogram on column [message #590855] |
Tue, 23 July 2013 08:36 |
Andrey_R
Messages: 441 Registered: January 2012 Location: Israel
|
Senior Member |
|
|
Hi all,
I'm trying to collect histograms for column COL_C of table TAB_A(150K records),
So an index "BAD_IDX" will *not* be used in a query when the value is not selective.
This is my query:
SELECT COL_A, COL_B , COL_C , COL_D , COL_E , COL_F
FROM TAB_A
WHERE COL_A = 050
AND COL_B = 13012345
AND COL_C = 0
AND COL_D = 0
AND COL_D >= '07/23/2013 00:00:00'
ORDER BY COL_E ASC;
Now, I have index "BAD_IDX" on columns (COL_C, COL_E ).
and the distribution of values looks like this:
select COL_C, count(*)
FROM TAB_A --very not selective for 0, selective for the rest, also no histogram
group by rollup(COL_C)
order by 2 desc;
and the result is 20k row long (20k distincts),
So I'll post just the top part of it:
COL_C COUNT(*)
----- --------
158382
0 86356
6955837 72
6230441 69
6595589 64
6800455 64
6896642 63
6523986 61
7011681 60
6706623 60
6878281 60
6644220 59
7015911 59
Now, the problem with the query was that "COL_B = 13012345" was the most selective predicate,
And an index for it did not exist, so the index "BAD_IDX" is used, and is scanning 86k records (all the "0" value records for column COL_C)!
So, I created an index
Create index GOOD_IDX on TAB_A(COL_B) compute statistics;
However, that BAD_IDX index is still being used!!!
I've thought that maybe it's because the lack of histograms for the column COL_C.
I've also understood from documentation I've read that the suitable histogram type is TOP FREQUENCY,
Because although I have 20k distincts here for the column COL_C, what does the difference is the 86k records of value 0.
So, I try to create histograms for the column:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS (
ownname => 'ANDREY'
, tabname => 'TAB_A'
, method_opt => 'FOR COLUMNS COL_C'
);
END;
/
But then when I check - I see that
1. My query is still using the wrong index
2. My histogram is height-balanced and not FREQUENCY
select histogram from dba_tab_columns where table_name='TAB_A'
and column_name='COL_C' and owner=user;
The result is:
HISTOGRAM
----------
HEIGHT BALANCED
What do I need to do in order to
1. Make my query use the correct index
2. Create a FREQUENCY histogram (or please explain if it doesn't make sense and I got things wrong with my approach)
Thanks in advance,
Andrey
|
|
|
Re: Cannot collect Top Frequency Histogram on column [message #590880 is a reply to message #590855] |
Tue, 23 July 2013 13:06 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
The CBO decides the path on the basis of the information fed to it. YOU say "BAD_IDX" to be bad and "GOOD_IDX" to be good, however, the optimizer decided the other way round to be the best way!
So based on what do you say BAD_IDX is bad. Can you post the execution plan with the predicate info included. Also, can you check the clustering factor of the indexes that are being used.
If you totally wan't the BAD_IDX not to be used, then use the hint - /*+ NO_INDEX(TAB_A BAD_IDX) */ and check the execution plan to verify.
[Updated on: Tue, 23 July 2013 13:15] Report message to a moderator
|
|
|
Re: Cannot collect Top Frequency Histogram on column [message #591215 is a reply to message #590880] |
Thu, 25 July 2013 10:04 |
Andrey_R
Messages: 441 Registered: January 2012 Location: Israel
|
Senior Member |
|
|
Quote:So based on what do you say BAD_IDX is bad
1. Based on the fact that the cardinality of the values in the table.
COL_B = 13012345 appears 17 times, whereas COL_C = 0 appears 86,500 approx.
This is why I assumed that the BAD_IDX is bad (it has no COL_C as a leading column, doesn't have COL_B at all).
2. Based on the fact that when I executed the query with an index hint for GOOD_IDX - it ran in a fraction of a second.
At night we have a statistic collection job running,
And as I check now - my GOOD_IDX is used (histogram is still HEIGHT BALANCED though).
The query now retrieves no records(it retrieved 1 yesterday) - I assume this does not matter, but just to be sure,
I'll wait for a couple of days for a situation when I can re-test this with users performing the process in the application.
At the moment I assume this had to do with the table's statistics,
And I will monitor this for a few more days.
Thank you for your help.
If anyone can give me a few words explanation about why the histogram is height balanced and not frequency - It will be great.
Thanks for your effort.
Regards,
Andrey
[Updated on: Thu, 25 July 2013 10:10] Report message to a moderator
|
|
|
|