Re: indexing null values curious case?

From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
Date: Thu, 21 Nov 2019 19:12:46 +0530
Message-ID: <CAP-RywyC4-PPb=KxapPCLnecvY2R43bmwxneNgJQoweSALJOJA_at_mail.gmail.com>

not sure if this behaviour is due to function based index that oracle created when the last column we specify is 1. here mark1 is protected still by a not null constraint.

SQL> create index idx on temp(mark1, mark2, mark3);

Index created.

SQL> select * from temp where mark2 is null;

   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000
   1000012 VISHNU 1 1000 16 rows selected.

Execution Plan

Plan hash value: 729852075

| 0 | SELECT STATEMENT | | 16 | 1136 | 20 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEMP | 16 | 1136 | 20 (0)| 00:00:01 |
|* 2 | INDEX SKIP SCAN | IDX | 16 | | 4 (0)| 00:00:01 |

SQL> drop index idx;

Index dropped.

SQL> create index idx on temp(mark1, mark2,1);

SQL> mp where mark2 is null;

Execution Plan

Plan hash value: 1896031711

| 0 | SELECT STATEMENT | | 16 | 1136 | 4684 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEMP | 16 | 1136 | 4684 (1)| 00:00:01 |

Same goes when i create an index such as the following: CREATE INDEX IDX ON TEMP(MARK1, MARK2, MARK3/2); <-- function based index.

thanks,
Vishnu

On Thu, Nov 21, 2019 at 6:35 PM Vishnu Potukanuma < vishnupotukanuma_at_gmail.com> wrote:

>
> Hi Jonathan,
>
> Thanks for your email, your response was invaluable, I kept on thinking
> what was happening.
> It never occurred to me that the null values could overlap, yap, yes it is
> the correct behavior.
>
> But the following still bugs me and could not get a valid explanation as
> to why oracle doesn't consider a skip scan even in the best case scenario.
>
> Regarding mark2 is null.
> I just created a same test but this time favouring index skip scan, even
> in the best possible way with only one distinct value for mark1.
> the test is as follows:
> create table temp (roll number, name varchar2(20), mark1 number, mark2
> number, mark3 number);
> insert into temp select rownum, dbms_random.string(0,20), 1,
> dbms_random.value(0,100), dbms_random.value(0,100) from dual connect by
> level < 100000;
> here i made sure that mark1 has only 1 distinct value.
> insert into temp values (1000012, 'VISHNU',1, null, 1000);
> commit;
> created the index in the best possible case as well.
> create index idx on temp(mark1, mark2,1);
> exec dbms_stats.gather_table_statS('VISHNU','TEMP',cascade=>true);
>
> now the query goes with the full table scan..
> SQL> select * from temp where mark2 is null;
> 1000012 VISHNU 1 1000
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 1896031711
> --------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
> --------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | 71 | 308 (1)| 00:00:01 |
> |* 1 | TABLE ACCESS FULL| TEMP | 1 | 71 | 308 (1)| 00:00:01 |
> --------------------------------------------------------------------------
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 1 - filter("MARK2" IS NULL)
>
> I was wondering why the optimizer was not even considering other scans,
> and took the optimizer trace, it clearly didn't consider skip scan as an
> option and ended up with the full table scan, it was considering only a
> index full scan.
>
> =======================================
> SPD: BEGIN context at query block level
> =======================================
> Query Block SEL$1 (#0)
> Return code in qosdSetupDirCtx4QB: NOCTX
> =====================================
> SPD: END context at query block level
> =====================================
> Access path analysis for TEMP
> ***************************************
> SINGLE TABLE ACCESS PATH
> Single Table Cardinality Estimation for TEMP[TEMP]
> SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
>
> kkecdn: Single Table Predicate:"TEMP"."MARK2" IS NULL
> Column (#4): MARK2(NUMBER)
> AvgLen: 21 NDV: 99999 Nulls: 1 Density: 0.000010 Min: 0.000284 Max:
> 99.999341
> Table: TEMP Alias: TEMP
> Card: Original: 100000.000000 Rounded: 1 Computed: 1.000000 Non
> Adjusted: 1.000000
> Scan IO Cost (Disk) = 307.000000
> Scan CPU Cost (Disk) = 29018761.440000
> Cost of predicates:
> io = NOCOST, cpu = 20.000000, sel = 0.000010 flag = 2048
> ("TEMP"."MARK2" IS NULL)
> Total Scan IO Cost = 307.000000 (scan (Disk))
> + 0.000000 (io filter eval) (= 0.000000 (per row)
> * 100000.000000 (#rows))
> = 307.000000
> Total Scan CPU Cost = 29018761.440000 (scan (Disk))
> + 2000000.000000 (cpu filter eval) (= 20.000000
> (per row) * 100000.000000 (#rows))
> = 31018761.440000
> Access Path: TableScan
> Cost: 307.870629 Resp: 307.870629 Degree: 0
> Cost_io: 307.000000 Cost_cpu: 31018761
> Resp_io: 307.000000 Resp_cpu: 31018761
> kkofmx: index filter:"TEMP"."MARK2" IS NULL
>
> ****** Costing Index IDX
> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
> Access Path: index (FullScan)
> Index: IDX
> resc_io: 533.000000 resc_cpu: 23795958
> ix_sel: 1.000000 ix_sel_with_filters: 1.0000e-05
> Cost: 533.724036 Resp: 533.724036 Degree: 1
> Best:: AccessPath: TableScan
> Cost: 307.870629 Degree: 1 Resp: 307.870629 Card: 1.000000
> Bytes: 0.000000
>
> ***************************************
>
> Then again i tried with the hint index skip scan:
> select /*+INDEX_SS(A IDX)*/ * from temp A where mark2 is null;
> In this case optimizer simply ignored the skip_scan hint and and from the
> optimizer trace 10053 12, clearly it didn't consider skip scan.:
>
> =====================================
> Access path analysis for TEMP
> ***************************************
> SINGLE TABLE ACCESS PATH
> Single Table Cardinality Estimation for TEMP[A]
> SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
> kkecdn: Single Table Predicate:"A"."MARK2" IS NULL
> Column (#4): MARK2(NUMBER)
> AvgLen: 21 NDV: 99999 Nulls: 1 Density: 0.000010 Min: 0.000284 Max:
> 99.999341
> Table: TEMP Alias: A
> Card: Original: 100000.000000 Rounded: 1 Computed: 1.000000 Non
> Adjusted: 1.000000
> kkofmx: index filter:"A"."MARK2" IS NULL
> ****** Costing Index IDX
> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
> Access Path: index (FullScan)
> Index: IDX
> resc_io: 533.000000 resc_cpu: 23795958
> ix_sel: 1.000000 ix_sel_with_filters: 1.0000e-05
> Cost: 533.724036 Resp: 533.724036 Degree: 1
> Best:: AccessPath: IndexRange
> Index: IDX
> Cost: 533.724036 Degree: 1 Resp: 533.724036 Card: 1.000000
> Bytes: 0.000000
> ***************************************
>
> Hint Report:
> Query Block: SEL$1
> Table: ("A"_at_"SEL$1")
> Unused: INDEX_SS(A IDX)
> End Hint Report
>
> This time, i went an another step ahead, giving more chance.
> modified the mark1 as not null.
> alter table temp modify (mark1 not null);
> recreated and index and collected the statistics (this may not be
> necessary).
> now with not null constraint we are 100% sure that mark1 is never null,
> and now since only 1 distinct value, and it would have been straight
> forward to have selected a skip scan, provided I even added 1 (virtual
> column) to the index. even in this case, the optimizer goes with a full
> table scan, even with the hint.
>
> Thanks,
> Vishnu
>
> On Thu, Nov 21, 2019 at 5:48 PM Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk> wrote:
>
>> Vishnu,
>>
>> Your STUDENTS example looks like a bug. (The stats for the NULL and ''
>> case are identical, by the way, so it's likely to be related to the way
>> that Oracle stores the '' as the DEFAULT for the column definition).
>>
>> re:
>> select * from temp where mark1 is null; --> it always goes with a full
>> table scan
>> It's got to; Oracle know that there are some rows with mark1 null, some
>> with mark2 null, so it have to asume that the two sets of rows may overlap
>> to give a row which does not appear in thei index.
>>
>> select * from temp where mark1 is null and mark2 = 123123;
>> Now you're looking for a row which (if it exists) will be in the index.
>> You could equally change the mark2 = 123123 to mark2 is not null.
>>
>> re:
>> all i did was add an extra column to the index (virtual column).
>> So now you have a column in the index which is NEVER null, so every row
>> in the table will appear in the index, so the index can be used for mark1
>> is null
>>
>>
>> re
>> mark2 is null
>> The possibly paths are
>> index skip scan
>> index full scan
>> index fast full scan
>> tablescan
>>
>> The choice of tablescan is almost certainly dictated by cost.
>>
>>
>> BOTTOM LINE - your students case looks like a bug relating to the special
>> case of the way Oracle handles the DEFAULT for a constant empty string. The
>> rest are correct behaviour.
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> ________________________________________
>> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
>> behalf of Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
>> Sent: 21 November 2019 11:07
>> To: Andy Sayer
>> Cc: oracle-l_at_freelists.org
>> Subject: Re: indexing null values curious case?
>>
>> Apparently looks like this entire section dealing with the null is not
>> developed in its entirety.. the following is a different case as well
>> regarding the nulls.
>>
>> even with a multi-column index, considering another scenario as the
>> following:
>> create table temp (roll number, name varchar2(20), mark1 number, mark2
>> number);
>> insert into temp select rownum, dbms_random.string(0,20),
>> round(dbms_random.value(0,100), round(dbms_random.value(0,200)) from dual
>> connect by level < 1000000;
>> insert into temp (roll, name, mark2) select rownum,
>> dbms_random.string(0,20), round(dbms_random.value(0,200) from dual connect
>> by level < 10;
>> insert into temp (roll, name, mark1) select rownum,
>> dbms_random.string(0,20), round(dbms_random.value(0,200) from dual connect
>> by level < 10;
>> commit;
>> create index idx on temp(mark1, mark2);
>> Here i have only 10 rows from each of the columns as null;
>>
>> exec dbms_Stats.gather_table_stats('VISHNU','TEMP',CASCADE=>TRUE);
>>
>> from the leaf blocks we can find that the nulls are stored in the index.
>> The leaf blocks containing the NULLs is as follows:
>> row#0[8019] flag: -------, lock: 0, len=13
>> col 0; len 2; (2): c1 02
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 de 00 00
>> row#1[7992] flag: -------, lock: 2, len=14
>> col 0; len 3; (3): c2 02 04
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 de 00 02
>> row#2[8006] flag: -------, lock: 0, len=13
>> col 0; NULL
>> col 1; len 2; (2): c1 02
>> col 2; len 6; (6): 01 c0 21 de 00 01
>> row#3[7978] flag: -------, lock: 2, len=14
>> col 0; NULL
>> col 1; len 3; (3): c2 0d 20
>> col 2; len 6; (6): 01 c0 21 de 00 03
>>
>> if we issue a query like select * from temp where mark1 is null; --> it
>> always goes with a full table scan even when 10 rows only have null values
>> and apparently the index leaf blocks do contain entries regarding the null
>> values), for the leading as well as the tail columns as shown above. But
>> the wierd thing is if we include both the columns mark1, mark2 in the
>> predicate clause as following:
>> select * from temp where mark1 is null and mark2 = 123123;
>> the optimizer selects the index.
>>
>>
>> now comes even better part.
>> drop the index and create the index as follows:
>> create index idx on temp(mark1,mark2,1);
>> all i did was add an extra column to the index (virtual column).
>> collected the statistics and ran the query back again.
>> select * from temp where mark1 is null;
>> this time it selects the index properly which makes me wonder if they
>> have included or written the code or made the optimizer to consider the
>> nulls in the predicate clause only in the presence of virtual columns.
>>
>> Another part I believe is that Oracle's statistics (including histogram
>> doesn't maintain any statistics regarding nulls, which makes sense in a
>> way), as these are only explicit cases and the concept of statistics
>> optimizer everything resolves only around the values that are present but
>> not null..
>>
>> also an another distinct behavior observed is this regarding of whether
>> the index is
>> create index idx on temp(mark1,mark2)
>> or create index idx on temp(mark1,mark2,1);
>>
>> any query with predicates involving only mark2 regarding nulls goes with
>> full table scan forget skip scans etc etc.
>> queries like
>> select * from temp where mark1 is not null and mark2 is null;
>> select * from temp where mark2 is null;
>> apparently some explanation for this can be the way oracle doesn't
>> include nulls in the histograms.. and I am tired for the day!
>>
>> I can't make any assumptions as of which of the optimizer code has to be
>> developed just with these observations, but the way oracle handles nulls
>> should be improved consistently.
>>
>> can someone please tell me if i am missing any thing here? Oracle is a
>> like a ocean. so many concepts no wonder i became bald at a very young age.
>>
>> Thanks,
>> Vishnu
>>
>> On Thu, Nov 21, 2019 at 2:05 PM Vishnu Potukanuma <
>> vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>> wrote:
>> This is reproducible case and probably hitting an undeveloped section of
>> oracle code or a bug. Sure, let me see what I can do.
>>
>> Thanks,
>> Vishnu
>>
>> On Thu, Nov 21, 2019 at 1:55 PM Andy Sayer <andysayer_at_gmail.com<mailto:
>> andysayer_at_gmail.com>> wrote:
>> Hi Vishnu
>>
>> File your reproducible test case with Oracle support if you suspect
>> you’re hitting a bug.
>>
>> It is not expected for the existence of a special index to change the
>> result of a query.
>>
>> If you want to find null values and the selectivity is decent for an
>> index then I would use an index on (column_name,0).
>>
>> Thanks,
>> Andy
>>
>> On Thu, 21 Nov 2019 at 08:18, Vishnu Potukanuma <
>> vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>> wrote:
>> Hi,
>>
>> If all the columns in the index are null then the entries are not stored
>> in the index. This is a known fact and empty strings are stored as null
>> values. combining both breaks the consistency part where the query returns
>> wrong results.
>>
>> this is really a worst case as no one creates such a index as
>> (column_name,null) or (column_name,'') as this really doesn't make sense
>> but this breaks the consistency aspect of database as the query gives
>> wrong result as the optimizer consider index even when the leading column
>> is null.
>>
>> the scenario is basically as following:
>> Create table students (student_id number, name varchar2(20);
>> insert into students select rownum, dbms_random.string(0,20) form dual
>> connect by level < 1000000;
>> insert into students (name) select dbms_random.string(0,20) from dual
>> connect by level < 100;
>> commit;
>> create index idx on student(student_id,null);
>> exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
>> select * from students where student_id is null;
>> in this case it goes with the full table scan which is correct since both
>> nulls are not stored.
>>
>> so now we drop the index idx;
>> Here we create the index as this.
>> create index idx on student(student_id,'');
>> exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
>> select * from students where student_id is null;
>> now Oracle retrives the results as 0, with the index scan.
>>
>> the leaf blocks store the details as following since the empty strings
>> are treated as nulls
>> row#0[8019] flag: -------, lock: 0, len=13
>> col 0; len 2; (2): c1 02
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 dc 00 00
>> row#1[8006] flag: -------, lock: 2, len=13
>> col 0; len 2; (2): c1 03
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 dc 00 02
>> but things as expected, index doesn't store entries when the leading
>> column is null.
>>
>> here the oracle goes with the index scan and returns zero results. which
>> is inconsistent result.
>> optimizer trace considers that index with (student_id, '');
>>
>> Access Path: TableScan
>> Cost: 21017.655599 Resp: 21017.655599 Degree: 0
>> Cost_io: 20995.000000 Cost_cpu: 807173669
>> Resp_io: 20995.000000 Resp_cpu: 807173669
>> Access Path: index (index (FFS))
>> Index: IDX
>> resc_io: 1039.000000 resc_cpu: 237414695
>> ix_sel: 0.000000 ix_sel_with_filters: 1.000000
>> Access Path: index (FFS)
>> Cost: 1045.663711 Resp: 1045.663711 Degree: 1
>> Cost_io: 1039.000000 Cost_cpu: 237414695
>> Resp_io: 1039.000000 Resp_cpu: 237414695
>> ****** Costing Index IDX
>> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
>> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
>> Access Path: index (IndexOnly)
>> Index: IDX
>> resc_io: 5.000000 resc_cpu: 235407
>> ix_sel: 6.6556e-04 ix_sel_with_filters: 6.6556e-04
>> Cost: 5.006607 Resp: 5.006607 Degree: 1
>> Best:: AccessPath: IndexRange
>> Index: IDX
>> Cost: 5.006607 Degree: 1 Resp: 5.006607 Card: 999.000000
>> Bytes: 0.000000
>>
>> probably we are hitting an undeveloped section of optimizer? producing a
>> wrong plan is ok, but producing wrong results?
>>
>> Thanks,
>> vishnu
>> --
>> http://www.freelists.org/webpage/oracle-l
>>
>>
>>

--
http://www.freelists.org/webpage/oracle-l

Received on Thu Nov 21 2019 - 14:42:46 CET

This message: [ Message body ]
Next message: Vishnu Potukanuma: "Re: indexing null values curious case?"
In reply to Vishnu Potukanuma: "Re: indexing null values curious case?"
Next in thread: Jonathan Lewis: "Re: indexing null values curious case?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message