Re: indexing null values curious case?

From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
Date: Thu, 21 Nov 2019 16:37:59 +0530
Message-ID: <CAP-RywwqCA7UiKOXfJBV7Nr7kS1Fpn91ziTojEjtnzEvwiMVLQ_at_mail.gmail.com>



Apparently looks like this entire section dealing with the null is not developed in its entirety.. the following is a different case as well regarding the nulls.

even with a multi-column index, considering another scenario as the following:
create table temp (roll number, name varchar2(20), mark1 number, mark2 number);
insert into temp select rownum, dbms_random.string(0,20), round(dbms_random.value(0,100), round(dbms_random.value(0,200)) from dual connect by level < 1000000;
insert into temp (roll, name, mark2) select rownum, dbms_random.string(0,20), round(dbms_random.value(0,200) from dual connect by level < 10;
insert into temp (roll, name, mark1) select rownum, dbms_random.string(0,20), round(dbms_random.value(0,200) from dual connect by level < 10;
commit;
create index idx on temp(mark1, mark2);
Here i have only 10 rows from each of the columns as null;

exec dbms_Stats.gather_table_stats('VISHNU','TEMP',CASCADE=>TRUE);

from the leaf blocks we can find that the nulls are stored in the index. The leaf blocks containing the NULLs is as follows: row#0[8019] flag: -------, lock: 0, len=13

col 0; len 2; (2):  c1 02
col 1; NULL
col 2; len 6; (6):  01 c0 21 de 00 00

row#1[7992] flag: -------, lock: 2, len=14
col 0; len 3; (3):  c2 02 04
col 1; NULL
col 2; len 6; (6):  01 c0 21 de 00 02

row#2[8006] flag: -------, lock: 0, len=13
col 0; NULL
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c0 21 de 00 01

row#3[7978] flag: -------, lock: 2, len=14
col 0; NULL
col 1; len 3; (3):  c2 0d 20
col 2; len 6; (6):  01 c0 21 de 00 03

if we issue a query like select * from temp where mark1 is null; --> it always goes with a full table scan even when 10 rows only have null values and apparently the index leaf blocks do contain entries regarding the null values), for the leading as well as the tail columns as shown above. But the wierd thing is if we include both the columns mark1, mark2 in the predicate clause as following:
select * from temp where mark1 is null and mark2 = 123123; the optimizer selects the index.

now comes even better part.
drop the index and create the index as follows: create index idx on temp(mark1,mark2,1); all i did was add an extra column to the index (virtual column). collected the statistics and ran the query back again. select * from temp where mark1 is null;
this time it selects the index properly which makes me wonder if they have included or written the code or made the optimizer to consider the nulls in the predicate clause only in the presence of virtual columns.

Another part I believe is that Oracle's statistics (including histogram doesn't maintain any statistics regarding nulls, which makes sense in a way), as these are only explicit cases and the concept of statistics optimizer everything resolves only around the values that are present but not null..

also an another distinct behavior observed is this regarding of whether the index is
create index idx on temp(mark1,mark2)
or create index idx on temp(mark1,mark2,1);

any query with predicates involving only mark2 regarding nulls goes with full table scan forget skip scans etc etc. queries like
select * from temp where mark1 is not null and mark2 is null; select * from temp where mark2 is null;
apparently some explanation for this can be the way oracle doesn't include nulls in the histograms.. and I am tired for the day!

I can't make any assumptions as of which of the optimizer code has to be developed just with these observations, but the way oracle handles nulls should be improved consistently.

can someone please tell me if i am missing any thing here? Oracle is a like a ocean. so many concepts no wonder i became bald at a very young age.

Thanks,
Vishnu

On Thu, Nov 21, 2019 at 2:05 PM Vishnu Potukanuma < vishnupotukanuma_at_gmail.com> wrote:

> This is reproducible case and probably hitting an undeveloped section of
> oracle code or a bug. Sure, let me see what I can do.
>
> Thanks,
> Vishnu
>
> On Thu, Nov 21, 2019 at 1:55 PM Andy Sayer <andysayer_at_gmail.com> wrote:
>
>> Hi Vishnu
>>
>> File your reproducible test case with Oracle support if you suspect
>> you’re hitting a bug.
>>
>> It is not expected for the existence of a special index to change the
>> result of a query.
>>
>> If you want to find null values and the selectivity is decent for an
>> index then I would use an index on (column_name,0).
>>
>> Thanks,
>> Andy
>>
>> On Thu, 21 Nov 2019 at 08:18, Vishnu Potukanuma <
>> vishnupotukanuma_at_gmail.com> wrote:
>>
>>> Hi,
>>>
>>> If all the columns in the index are null then the entries are not stored
>>> in the index. This is a known fact and empty strings are stored as null
>>> values. combining both breaks the consistency part where the query returns
>>> wrong results.
>>>
>>> this is really a worst case as no one creates such a index as
>>> (column_name,null) or (column_name,'') as this really doesn't make sense
>>> but this breaks the consistency aspect of database as the query gives
>>> wrong result as the optimizer consider index even when the leading column
>>> is null.
>>>
>>> the scenario is basically as following:
>>> Create table students (student_id number, name varchar2(20);
>>> insert into students select rownum, dbms_random.string(0,20) form dual
>>> connect by level < 1000000;
>>> insert into students (name) select dbms_random.string(0,20) from dual
>>> connect by level < 100;
>>> commit;
>>> create index idx on student(student_id,null);
>>> exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
>>> select * from students where student_id is null;
>>> in this case it goes with the full table scan which is correct since
>>> both nulls are not stored.
>>>
>>> so now we drop the index idx;
>>> Here we create the index as this.
>>> create index idx on student(student_id,'');
>>> exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
>>> select * from students where student_id is null;
>>> now Oracle retrives the results as 0, with the index scan.
>>>
>>> the leaf blocks store the details as following since the empty strings
>>> are treated as nulls
>>> row#0[8019] flag: -------, lock: 0, len=13
>>> col 0; len 2; (2): c1 02
>>> col 1; NULL
>>> col 2; len 6; (6): 01 c0 21 dc 00 00
>>> row#1[8006] flag: -------, lock: 2, len=13
>>> col 0; len 2; (2): c1 03
>>> col 1; NULL
>>> col 2; len 6; (6): 01 c0 21 dc 00 02
>>> but things as expected, index doesn't store entries when the leading
>>> column is null.
>>>
>>> here the oracle goes with the index scan and returns zero results. which
>>> is inconsistent result.
>>> optimizer trace considers that index with (student_id, '');
>>>
>>> Access Path: TableScan
>>> Cost: 21017.655599 Resp: 21017.655599 Degree: 0
>>> Cost_io: 20995.000000 Cost_cpu: 807173669
>>> Resp_io: 20995.000000 Resp_cpu: 807173669
>>> Access Path: index (index (FFS))
>>> Index: IDX
>>> resc_io: 1039.000000 resc_cpu: 237414695
>>> ix_sel: 0.000000 ix_sel_with_filters: 1.000000
>>> Access Path: index (FFS)
>>> Cost: 1045.663711 Resp: 1045.663711 Degree: 1
>>> Cost_io: 1039.000000 Cost_cpu: 237414695
>>> Resp_io: 1039.000000 Resp_cpu: 237414695
>>> ****** Costing Index IDX
>>> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
>>> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
>>> Access Path: index (IndexOnly)
>>> Index: IDX
>>> resc_io: 5.000000 resc_cpu: 235407
>>> ix_sel: 6.6556e-04 ix_sel_with_filters: 6.6556e-04
>>> Cost: 5.006607 Resp: 5.006607 Degree: 1
>>> Best:: AccessPath: IndexRange
>>> Index: IDX
>>> Cost: 5.006607 Degree: 1 Resp: 5.006607 Card: 999.000000
>>> Bytes: 0.000000
>>>
>>> probably we are hitting an undeveloped section of optimizer? producing a
>>> wrong plan is ok, but producing wrong results?
>>>
>>> Thanks,
>>> vishnu
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 21 2019 - 12:07:59 CET

Original text of this message