I'm stumped - removing statistics does the trick

From: <Cherie_Machler_at_gelco.com>
Date: Fri, 18 Aug 2000 07:37:06 -0500
Message-Id: <10593.114944@fatcity.com>

I'm at a bit of a loss. I've inherited a data warehouse app that has been having performance
problems that have steadily increased in the last month or so. Upon investigation, I realized
that some of tables/columns/indexes have old statistics and the existing histograms hadn't
been refreshed since I'd come here a month and a half ago.

So I set about refreshing and re-analyzing hoping that performance would improve. It didn't
in most cases. If anything, things got worse in a lot of areas. I tried to add a few more histograms to
critical indexed columns but there was little improvement.

Looking at the explain plan of some problem queries (huge nested explains using mostly
b-tree indexes and some small full- table scans), we isolated the problem in each query
to a single, different table. Another DBA recommended removing all the statistics on one of the small dimension tables to see what would happen. It completely changed the explain plan to one using a bunch of
unnested bit-mapped index reads. We tried the same thing on the other, large dimension table giving
us trouble and the exact same thing happened. The explain plan changed from nested loop ten levels
deep utlizing regular indexes to unnested bit-mapped index utilization.

Last night's performance was great on our batch jobs for the first time in a long time. I suspect performance
will also be very good today.

This just runs contrary to all that I understand. I would think that the statistics would help the cost-based
analyzer make the best decision and that the histograms would eradicate any problems we would have
with skewed data (which we do have).

I should note that this is not a case of switching from cost-based to rule-based (we use CHOOSE) in
these queries because all other tables in the same query still have statistics on them. I also should note
that bit-mapped indexes seem to be being misused because they are used on columns with very high
cardinality such as date fields (they worked so well on low-cardinality fields that they also used them
on their date fields as well as others). Would this make any difference? Perhaps the statistics were
revealing the high level of selectivity in the bit-mapped columns and the cost-based optimizer was
telling the database to use the regular indexes instead of the existing bit-mapped indexes. Perhaps
without the statistics, the optimizer was able to recommend the use of bit-mapped indexes instead.

We are UNIX-based, this database is version 8.0.4, the schema design is an imperfect star schema.

I am truly at a loss as to why removing the statistics would help in this situation. If anyone can give
me an idea on which path to go down, I'll research that avenue and try to figure out better what is Received on Fri Aug 18 2000 - 07:37:06 CDT