Re: Most selective columns first (was: Re: Key Compression vs. Selectivity)

From: <hasta_l3_at_hotmail.com>
Date: 18 Dec 2006 02:55:55 -0800
Message-ID: <1166439355.228927.143640@f1g2000cwa.googlegroups.com>

Robert Klemme a écrit :
> On 18.12.2006 07:35, hasta_l3_at_hotmail.com wrote:
> > Jonathan Lewis a écrit :
> >> "Robert Klemme" <shortcutter_at_googlemail.com> wrote in message
> >>> The general rule of thumb for indexing is to put the most selective
> >>> columns first in order to foster fast lookups. (Of course there are
> >>> other considerations involved, which single index covers most queries
> >>> etc.)
> >
> > Jonathan and Robert....
> >
> > Why would the rule of thumb "put the most selective column first"
> > be a good one to any degree ?
> >
> > (let's assume that permutations of index columns don't
> > impact the number of queries covered)
>
> Because - that's at least the theory - more selective columns cut down
> the number of rows faster than less selective columns. Assume you have
> a people table with gender and year of birth. Gender has low
> selectivity, so if you query by gender="male" and year="1984" with the
> gender column first in the index, the first index seek step just cuts
> the number of records (and thus key entries) in half (roughly) while
> selection on year will select a much smaller portion of the data
> (assuming not every person in that table was born in 1984).

Yes, Robert.

However, as pointed out by Tom Kyte, a quick test I just performed does not seem to confirm the theory.

I dont claim to understand why....

Regards

Raoul

create table test(Gender char(1), DateOfBirth DATE, filler INTEGER); begin
  for filler in 1..100000 loop
    for year in 1920..2006 loop
      insert into test(Gender, DateOfBirth, filler) values('M', to_date(year, 'yyyy'), filler);

insert into test(Gender, DateOfBirth, filler) values('F', to_date(year, 'yyyy'), filler);

end loop;
end loop;
end;
commit;
create index idx_gd on test(Gender, DateOfBirth); create index idx_dg on test(DateOfBirth, Gender);

---

SQL> alter system flush buffer_cache;

Système modifié.

SQL> select /*+INDEX(test idx_gd) */ count(*) from test where gender =
'M' and DateOfBirth = to_date
(1980, 'yyyy');

  COUNT(*)
----------
    100000


Plan d'exécution
----------------------------------------------------------
Plan hash value: 2862478417

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time
   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |    12 |   294   (2)|
00:00:04 |
|   1 |  SORT AGGREGATE   |        |     1 |    12 |            |
   |
|*  2 |   INDEX RANGE SCAN| IDX_GD | 89592 |  1049K|   294   (2)|
00:00:04 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("GENDER"='M' AND "DATEOFBIRTH"=TO_DATE('1980','yyyy'))

Note
-----
   - dynamic sampling used for this statement


Statistiques
----------------------------------------------------------
          5  recursive calls
          0  db block gets
        370  consistent gets
        641  physical reads
          0  redo size
        419  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> alter system flush buffer_cache;

Système modifié.

SQL> select /*+INDEX(test idx_dg) */ count(*) from test where gender =
'M' and DateOfBirth = to_date
(1980, 'yyyy');

  COUNT(*)
----------
    100000


Plan d'exécution
----------------------------------------------------------
Plan hash value: 366629259

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time
   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |    12 |   294   (2)|
00:00:04 |
|   1 |  SORT AGGREGATE   |        |     1 |    12 |            |
   |
|*  2 |   INDEX RANGE SCAN| IDX_DG | 89592 |  1049K|   294   (2)|
00:00:04 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATEOFBIRTH"=TO_DATE('1980','yyyy') AND "GENDER"='M')

Note
-----
   - dynamic sampling used for this statement


Statistiques
----------------------------------------------------------
          4  recursive calls
          0  db block gets
        370  consistent gets
        641  physical reads
          0  redo size
        419  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Received on Mon Dec 18 2006 - 04:55:55 CST