Re: quick FK question
Date: Mon, 21 Dec 2009 16:46:47 +0200
Message-ID: <6e49b6d00912210646r3702cfebg943b2e5f15c6b00f_at_mail.gmail.com>
I'm simply copying below an email I sent about two years ago on a similar topic in a different list. Although I admit it was mostly about OLTP-type apps, so there might well be differences for warehouses. On the other hand, the problems with data distribution and additional predicates are not eliminated; one always has to remember that. For more such examples look in the blogs of Jonathan Lewis and Richard Foote. Also ask the warehouse modeler for real examples: why and how does an outer join mean "worries"? What worries are eliminated and what worries are created? Try to understand where the NULLs will fight back, because they definitely will! The only question is whether the fightback will be more or less painful than the problems it masks.
-- the original email--
Not allowing NULLs is complete stupidity from my viewpoint. And I'll explain why with examples.
1. Let's start with NOT NULL foreign keys that in reality can be NULL. To make that work, one has to add a fake row to the parent table, for example with id = -1. Some years ago I saw that for the first and last time, but it seems there are more people around doing it. So what does a fake row in the parent table mean? It means it always has to be excluded from queries, and from any other DML that affects the table without referencing it by id. For instance, to a search query you should always add a constraint like
WHERE id <> -1
The same is true for all reports. I'm completely sure that in such an application there are places where data are shown/counted/whatever incorrectly because someone forgot to add WHERE id <> -1. And for some kinds of queries it introduces absolutely unnecessary execution plan operations.
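To make the risk concrete, here is a hypothetical two-liner (the parent table and its stub row are just a sketch, not from any real system):

-- Hypothetical parent table that already contains the fake id = -1 stub row.
-- The first count silently includes the stub; the second is what every query
-- in the application must remember to write.
select count(*) from parent;                 -- wrong: counts the fake row too
select count(*) from parent where id <> -1;  -- right, but easy to forget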
For example, let's create a trivial table with two columns: one for the PK and the other just some stuff. And suppose we have an index on the stuff column as well.
SQL> create table x (x_id number not null, x_business_value varchar2(3) not null);
Table created.
SQL> alter table x add constraint x_pk primary key (x_id);
Table altered.
SQL> create index x_idx on x (x_business_value);
Index created.
SQL> insert into x select rownum, substr(object_name, 1, 3) from dba_objects;
36170 rows created.
SQL> exec dbms_stats.gather_table_stats(user, 'x');
PL/SQL procedure successfully completed.
SQL> set autot traceonly explain
Normally, in the general case, we'd run the following query. As you can see, all the rows can be counted without touching the table.
SQL> select count(*) from x where x_business_value = 'DBA';
Execution Plan
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=4)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'X_IDX' (NON-UNIQUE) (Cost=1 Card=37 Bytes=148)
But if we had our fake row, we'd have to add a filter criterion, and now we get one more step (table access by index rowid) in our execution plan.
SQL> select count(*) from x where x_business_value = 'DBA'
  2  and x_id <> -1;
Execution Plan
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=2 Card=1 Bytes=8)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'X' (Cost=2 Card=37 Bytes=296)
   3    2       INDEX (RANGE SCAN) OF 'X_IDX' (NON-UNIQUE) (Cost=1 Card=37)
The other thing is: when we insert into a child table whose foreign key column can be NULL, and the inserted value is NULL, Oracle doesn't have to check the parent table. If we have a faked NOT NULL column, then Oracle always has to check whether the parent table has the appropriate value -1.
Suppose we create two tables
SQL> desc ref_x_faked
 Name                                      Null?    Type
 ----------------------------------------- -------- -------
 RXF_ID                                    NOT NULL NUMBER
 RXF_X_ID                                  NOT NULL NUMBER

SQL> desc ref_x_normal
 Name                                      Null?    Type
 ----------------------------------------- -------- -------
 RXN_ID                                    NOT NULL NUMBER
 RXN_X_ID                                           NUMBER
The faked table has an artificial NOT NULL FK column, while the normal one has an ordinary nullable FK column.
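The DDL isn't shown above; roughly, the two child tables could have been created like this (a sketch only - the constraint names and the reference back to the x table are my assumptions):

create table ref_x_faked (
  rxf_id   number not null,
  rxf_x_id number not null,   -- artificial NOT NULL FK column
  constraint rxf_pk primary key (rxf_id),
  constraint rxf_x_fk foreign key (rxf_x_id) references x (x_id)
);

create table ref_x_normal (
  rxn_id   number not null,
  rxn_x_id number,             -- ordinary nullable FK column
  constraint rxn_pk primary key (rxn_id),
  constraint rxn_x_fk foreign key (rxn_x_id) references x (x_id)
);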
So here are the inserts and their autotrace statistics.
For the normal table:
SQL> ed
Wrote file afiedt.buf
  1* insert into ref_x_normal values (1, null)
SQL> /

1 row created.
Statistics
          0  recursive calls
          3  db block gets
          1  consistent gets
          0  physical reads
        224  redo size
        623  bytes sent via SQL*Net to client
        541  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
For the faked table:
SQL> insert into ref_x_faked values (1, -1);
1 row created.
Statistics
          0  recursive calls
          5  db block gets
          1  consistent gets
          0  physical reads
        232  redo size
        622  bytes sent via SQL*Net to client
        538  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
We always get 5 db block gets for the faked table and 3 db block gets for the normal table, because for the faked table Oracle has to check the primary key index of the referenced x table.
It is nothing for a single-user system, but it adds up if you have many users, many fake FKs and many transactions.
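If you want to watch it add up, one rough way (a sketch, not something from the original test; the exact numbers will vary) is to bracket a batch of inserts with the session's 'db block gets' statistic and compare the two tables:

-- Measurement sketch: note 'db block gets' before and after a batch of inserts.
-- Assumes the tables above exist and x contains the id = -1 stub row.
column value new_value blk_before noprint
select s.value
  from v$mystat s join v$statname n on n.statistic# = s.statistic#
 where n.name = 'db block gets';

begin
  for i in 1 .. 10000 loop
    insert into ref_x_faked values (100000 + i, -1);  -- every row revisits x's PK index
  end loop;
  commit;
end;
/

select s.value - &blk_before as db_block_gets_for_batch
  from v$mystat s join v$statname n on n.statistic# = s.statistic#
 where n.name = 'db block gets';

-- Repeat the same batch against ref_x_normal with NULL FK values and compare.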
2. OK, now about simple fake values, for example exotic dates and other magic values. In one of our previous projects we used the year 1800 instead of a NULL start date and the year 3000 instead of a NULL end date. I wasn't there from the start, but I found the following text next to a commented-out index creation script for these date columns: "index creation commented out because Oracle somehow becomes crazy and always uses only these indexes". So what was the reason Oracle so eagerly used only these indexes? The answer is simple: the optimizer is quite optimistic and assumes that values are distributed evenly between the column's low and high values.
So look at the example
We'll create a persons table and add persons, simulating their birth_date. Then we'll add 100 persons with an unknown birth date, just as we'd normally do by inserting NULL into the birth_date column.
SQL> create table persons (prs_id number not null, prs_name varchar2(40) not null,
  2  prs_birth_date date);
Table created.
SQL> insert into persons select rownum, object_name, trunc(sysdate - mod(rownum, 50))
  2  from dba_objects
  3  /
36169 rows created.
SQL> insert into persons select rownum + 36169, object_name, null
  2  from dba_objects where rownum <= 100;
100 rows created.
SQL> commit;
Commit complete.
Let's look at our max and min values:
SQL> select max(prs_birth_date), min(prs_birth_date) from persons;
MAX(PRS_BIRTH_DATE) MIN(PRS_BIRTH_DATE)
------------------- -------------------
2007-03-29:00:00:00 2007-02-08:00:00:00
Let's create index on birth_date:
SQL> create index prs_idx1 on persons(prs_birth_date);
Index created.
SQL> exec dbms_stats.gather_table_stats(user, 'persons');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.02
And now the most important part: let's see how many rows the optimizer predicts and how many we actually get.
SQL> set autot traceonly
SQL> select * from persons where prs_birth_date
  2  between to_date('2007-02-08', 'yyyy-mm-dd') and to_date('2007-02-13', 'yyyy-mm-dd');
4338 rows selected.
Execution Plan
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=25 Card=4414 Bytes=150076)
   1    0   TABLE ACCESS (FULL) OF 'PERSONS' (Cost=25 Card=4414 Bytes=150076)
So the optimizer predicted 4414 rows and we got 4338 - not perfect, but the error is less than 2%.
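Where does 4414 come from? A back-of-the-envelope sketch (ignoring the optimizer's exact, version-specific adjustments for closed ranges): the non-NULL dates span about 49 days, the query asks for a 6-day window, and the optimizer assumes the 36169 non-NULL rows are spread evenly over that span, so it expects roughly 36169 * 6/49, something around 4400 rows. Because the data really is uniform here, the guess lands close to reality.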
So now comes THE BARCHITECT and says: NULLs are not allowed, let's use the year 1800 instead, because no one was actually born then, so we can safely use it.
Let's update our poor 100 persons to the 1800 birth date:
SQL> set autot off
SQL> update persons set prs_birth_date = to_date ('1800-01-01', 'yyyy-mm-dd')
2 where prs_birth_date is null;
100 rows updated.
Elapsed: 00:00:00.00
SQL> alter table persons modify prs_birth_date not null;
Table altered.
Elapsed: 00:00:00.00
Now get new stats:
SQL> exec dbms_stats.gather_table_stats(user, 'persons');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.05
And run our query again:
SQL> set autot traceonly
SQL> select * from persons where prs_birth_date
  2  between to_date('2007-02-08', 'yyyy-mm-dd') and to_date('2007-02-13', 'yyyy-mm-dd');
4338 rows selected.
Execution Plan
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=19 Card=735 Bytes=25725)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'PERSONS' (Cost=19 Card=735 Bytes=25725)
   2    1     INDEX (RANGE SCAN) OF 'PRS_IDX1' (NON-UNIQUE) (Cost=2 Card=735)
WHAT? The plan has changed. The optimizer predicted 735 rows, but of course the actual row count hasn't changed - it is 4338, as before. Now the estimate is off by a factor of 6, and the plan is actually worse than before, because this time a full scan is the better choice.
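The arithmetic behind the drop is the same even-distribution assumption, just with a wrecked denominator (again a rough sketch, not the optimizer's exact formula): the column's low value is now 1800-01-01, so the span is roughly 75,700 days instead of 49. A 6-day window is therefore assumed to cover about 6/75,700, around 0.008% of the rows, and only the optimizer's per-bound fudge factors (on the order of 1/num_distinct) pull the estimate back up to 735 - still nowhere near the real 4338. You can see exactly what the optimizer is working from in the standard dictionary view (low/high values are stored in raw format):

select num_distinct, num_nulls, low_value, high_value
  from user_tab_col_statistics
 where table_name = 'PERSONS'
   and column_name = 'PRS_BIRTH_DATE';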
So, to sum all this up: by eliminating NULLs you are only pushing the problem somewhere deeper. Oracle knows about the concept of NULL, but Oracle doesn't know anything about values like -1, the year 1800 or whatever else that actually means NULL.
If you eliminate NULLs they will come back and attack in some other form, be it unnecessary operations in execution plans, the necessity for histograms to show Oracle the real distribution of the data, or wrong data in reports because someone forgot to add a stupid filter, and so on.
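(That "necessity for histograms" looks roughly like this in practice - a sketch only, with SIZE 254 simply being the traditional maximum-bucket choice, my assumption rather than something from the original test:)

begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'PERSONS',
    method_opt => 'for columns prs_birth_date size 254');  -- histogram on the faked column
end;
/

With a histogram the optimizer can at least see that 1800-01-01 is an isolated spike rather than assuming an even spread from 1800 to today - but that is exactly the kind of extra care a real NULL would never have required.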
I'm completely sure I did not show all possible bad cases, and probably not even the worst ones.
So if the column is null, let it be null. Of course that doesn't mean one should not carefully analyze the requirements and identify all the REAL NOT NULL attributes.
Gints Plivna
http://www.gplivna.eu
2009/12/21 Rumpi Gravenstein <rgravens_at_gmail.com>:
> <snip>
> I've seen once from some duhvelopers - they created all FK columns NOT
> NULL. Unfortunately of course there were cases with FK columns where
> actual value could not be provided. So what did they do? An obvious
> solution! ;) Added one row with id = -1 as a stub to all db tables.
> </snip>
>
> Gints I wanted to follow-up on this statement as our warehouse data
> modeler insists on doing this for all dimension joins claiming that
> this is good warehouse design as it avoids having to worry about outer
> joins. How would you respond?
>
> --
> Rumpi Gravenstein
--
http://www.freelists.org/webpage/oracle-l

Received on Mon Dec 21 2009 - 08:46:47 CST