Re: Large Tables, Bad Indexes and Fake Statistics

From: Don Seiler <don_at_seiler.us>
Date: Fri, 17 Aug 2007 14:32:50 -0500
Message-ID: <716f7a630708171232u56270e88ma798aab0bb7315b1@mail.gmail.com>

I just wanted to follow-up on this issue. Thanks to Wolfgang Breitling, I've got the issue resolved. I wrote a summary on my blog, http://ora.seiler.us/2007/08/dr-statslove-or-how-i-learned-to-stop.html

Hopefully it makes sense. Try not to laugh too hard at my expense.

Thanks again to Wolfgang!

Don.

On 6/19/07, Don Seiler <don_at_seiler.us> wrote:
> I may have mentioned this before, but the archives only showed me a
> tangential reference [1].
>
> The setup:
> * Oracle 10.2.0.2 on RHEL 3.
> * Table FOO is range partitioned by date, total of 4 monthly
> partitions (rolling window)
> * There are two indexes on FOO, the PK has 10 fields (say A-J), and a
> second index has 5 fields (A,C,D from PK and two others K, L).
> * Statistics are gathered automatically via 10g default gather_stats_job
>
> The problem:
> A partition of FOO is bulk loaded around 10 times a month. The
> partition is created a few days before it will first be loaded into.
> We have a recurring problem when, after the partition is loaded into
> (2-3 million records per load), queries on FOO that have A, B, C, D in
> the WHERE clause will start using the second index. The fact that
> this second index doesn't use column B means it is walking over a HUGE
> set of records, practically a complete partition scan.
>
> It seems like an out-of-whack statistics issue. After I gather stats
> on the table (takes 2.5 hours) the queries then use the PK as desired.
> To fight this, we had just this month started calling
> dbms_stats.set_table_stats and set_index_stats to indicate a full
> month's worth of stats before loading. Then for the rest of the day
> we could fool Oracle, until the nightly stats job then set the real
> stats. However this didn't work yesterday. I'm not sure why, but I
> did notice that the dba_tab_statistics table showed a sample_size for
> my user-set stats, when I don't believe there was one before.
>
> The solution:
> I'm placing blame first on the PK. A 10-field primary key is absurd,
> especially when 82% of the queries (logged via FGA) only go against
> those first four fields. This starts a whole natural-vs-synthetic key
> argument that only dictatorial control will resolve. Anyway, my
> theory is that the CBO recognizes that, after the load, the data is
> drastically different than the statistics. Now the touchy-feely part
> of the theory is that the CBO then says all bets are off and looks at
> the two indexes and sees that the second index is much, much smaller
> and will be much easier to deal with. And that's why it chooses that
> one. Obviously this isn't very technical and my developers aren't
> buying into it. What traces can I do to spell out what the CBO is
> thinking and why it is making certain choices?
>
> So, avoiding a change to the PK (as much as it kills me), the long
> term proposed solution is to load data into a staging table, build
> indexes and gather statistics and then use partition exchange to bring
> all of that online in the "real" table. However, that would required
> a sub-partitioning of the FOO table by load units, so we can't just do
> it right off.
>
> My near term work-around is to build a non-unique index on those first
> four columns. In my opinion we'd have just this one and not that PK,
> but such things are not up to me now.
>
> I'm up for any theories or proposals at this point. My frustration
> level with this design and the refusal to fault it, is at an all-time
> high.
>
> [1] http://www.freelists.org/archives/oracle-l/01-2007/msg00995.html
>
> --
> Don Seiler
> oracle blog: http://ora.seiler.us
> ultimate: http://www.mufc.us
>

-- 
Don Seiler
oracle: http://ora.seiler.us
ultimate: http://www.mufc.us
--
http://www.freelists.org/webpage/oracle-l

Received on Fri Aug 17 2007 - 14:32:50 CDT