Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: wildcard search and full tablescans
Charles Hooper wrote:
> DA Morgan wrote:
> > Charles Hooper wrote:
> > > I recall reading various write-ups by Tom Kyte, Jonathan Lewis, or
> > > others that detail why the index cannot be used in such a such a case,
> > > but cannot locate a reference to one of the write-ups. As a general
> > > rule, if more than 20% to 25% of a table's rows are expected to be
> > > read, it is usually less expensive to perform a full table scan, rather
> > > than a scan of the index and then an access by ROWID to read the data
> > > from the table. Note that using a % or _ wildcard for the initial
> > > character, Oracle would be required to access every block in the index
> > > and then potentially every row in the table by ROWID.
> > >
> > > Charles Hooper
> > > PC Support Specialist
> > > K&M Machine-Fabricating, Inc.
> >
> > Jonathan and other, oh how I hate to try paraphrasing others accurately,
> > have been generally dismissive of throwing out that 20-25% number. I
> > have run tests in my lab where I have been able to show wildly different
> > numbers for the optimizer depending on many factors including version,
> > caching related parameters, etc. But as a general rule it is certainly a
> > good thing to keep in mind that >50% is probably not a winning
> > percentage for index usage.
> > --
> > Daniel A. Morgan
> > University of Washington
> > damorgan_at_x.washington.edu
> > (replace x with u to respond)
> > Puget Sound Oracle Users Group
> > www.psoug.org
>
> Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte:
> B*Tree indexes should be used only when retrieving a small portion of
> the rows in a table - less than 20% of the rows.
>
> I have seen similar comments from other respectable sources. The 20%
> to 25% is a rough guideline. A read through "Cost-Based Oracle
> Fundamentals" leaves the impression that nothing is that simple. If
> the database has an 8KB block size, and the average row length is 20
> bytes, does the 20% to 25% rough guideline hold true? If the database
> has an 8KB block size, and the average row length is 8000 bytes, does
> the 20% to 25% rough guideline hold true? What if the last 80% of the
> rows added to the table are deleted causing a high water mark that is
> far beyond the last used block, does the 20% to 25% rough guideline
> hold true? Yes, No, and Maybe are all possibly correct answers.
>
> Maybe Gaja Vaidyanatha put it best in "Oracle Performance Tuning 101"
> (page 76), when referring to an article written by Cary Millsap in
> 1993:
> "It is pretty clear that the use of an index for a query should not be
> determined by some arbitrary percentage of the rows processed or
> selected from the table, instead it should be determined by the number
> of blocks visited to retrieve the data. If the number of blocks
> visited for an index is lower than a full table scan, then the index
> will be useful. Otherwise, a full tablescan will provide much better
> performance."
>
> But then, you must also consider the number of index and table blocks
> that may already be in the buffer cache, the multiblock read count, the
> time required to retrieve a single block compared with the time to
> retrieve multiple blocks, etc.
>
> I am not disagreeing with your comments on this topic. However, if
> someone asks me why Oracle is not using my index, it is very easy to
> ask, does Oracle expect to retrieve more than 20% to 25% of the rows by
> index? If the answer is Yes or Maybe, then that is why Oracle is not
> using your index.
There's no magic formula.
To me it doesn't matter in the case of a search as submitted by the OP whether an index or a full tablescan is used.
Either type of approach is likely to produce an application that has inherent scalability limitations due to a large number of Logical IO's.
Searching "anywhere" within a text string requires thought and research and normally an approach that does not rely on using a regular oracle index based on some character or varchar column.
Searching from the beginning of a string is different.
Discussion of percentages etc in this particular case is missing the forest because of the trees. Received on Fri Dec 08 2006 - 18:57:57 CST