Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Finding index keys in a very large table that are not in a large table
OK, I have two tables called audit and audit_summary. Both are indexed
on audit_seq (the index on audit is non-unique and the one on
audit_summary is unique):-

Audit defn:-
audit_seq, message_number, message_text

Audit_summary defn:-
audit_seq, <various other fields parsed out of the message text>
In an online transaction processing system, many records can be written
into the audit table, all with the same audit_seq per transaction.
Every so often, a job starts up that gets all the new audit_seqs that
are not in the audit_summary table and inserts the new records into it.
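In outline, the job does something like this (a simplified sketch only
- the real job also populates the fields parsed out of message_text,
which I have elided):

```sql
-- Sketch only: column list simplified; the real job also fills in
-- the fields parsed out of message_text.
insert into audit_summary (audit_seq /* , ...parsed fields... */)
select distinct a.audit_seq /* , ... */
from audit a
where not exists (select null
                  from audit_summary s
                  where s.audit_seq = a.audit_seq);
```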
These tables are getting pretty big, eg audit is >15 million rows and
audit_summary is around a few million.

Does anyone have any thoughts on the most efficient way to extract the
new audit_seqs from the audit table?
select audit_seq,message_text
from audit
where audit_seq > (select max(audit_seq) from audit_summary)
is no good, as there is potential for audit_seqs to get skipped (a
transaction with a lower audit_seq can still be in flight and commit
after the job has taken the max).
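For what it is worth, an outer-join anti-join would express the real
set difference without relying on max(); I have not benchmarked this
variant yet, so treat it as an untested sketch:

```sql
-- Untested sketch: old-style Oracle outer join; rows of audit with
-- no matching audit_summary row come back with s.audit_seq null.
select a.audit_seq, a.message_text
from audit a, audit_summary s
where a.audit_seq = s.audit_seq(+)
  and s.audit_seq is null;
```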
I created two test tables called big (~1 million rows) and small
(~100K rows), both with columns audit_seq and object_id.
I then ran the following:-
select big.audit_seq, object_id
from big, (select audit_seq
           from big
           minus
           select audit_seq
           from small) seqs
where big.audit_seq = seqs.audit_seq
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4187 Card=1204031 Bytes=30100775)
   1    0   HASH JOIN (Cost=4187 Card=1204031 Bytes=30100775)
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=470 Card=1004032 Bytes=12048384)
   3    1     VIEW (Cost=3054 Card=1204031 Bytes=15652403)
   4    3       MINUS
   5    4         SORT (UNIQUE) (Cost=2536 Card=1004032 Bytes=4016128)
   6    5           TABLE ACCESS (FULL) OF 'BIG' (Cost=470 Card=1004032 Bytes=4016128)
   7    4         SORT (UNIQUE) (Cost=518 Card=199999 Bytes=799996)
   8    7           TABLE ACCESS (FULL) OF 'SMALL' (Cost=94 Card=199999 Bytes=799996)
Statistics
----------------------------------------------------------
         18  recursive calls
         47  db block gets
       4335  consistent gets
       5578  physical reads
          0  redo size
      53604  SQL*Net roundtrips to/from client
          5  sorts (memory)
          1  sorts (disk)
     804033  rows processed
Then I tried ...
select audit_seq, object_id
from big
where not exists (select null
                  from small
                  where big.audit_seq = small.audit_seq)
804033 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=470 Card=50202 Bytes=602424)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=470 Card=50202 Bytes=602424)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_INDEX' (UNIQUE) (Cost=1 Card=1 Bytes=4)
Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
    2063539  consistent gets
       1443  physical reads
          0  redo size
      53604  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
     804033  rows processed
Now the consistent gets are way bigger for the NOT EXISTS. I imagine it
is something to do with the repeated index access on small for each row
in big (roughly a million probes at about two buffer gets each would
explain the ~2 million consistent gets), but I am not sure. So I am
thinking the MINUS query is better - BUT the MINUS query requires two
full scans of BIG - so why are the consistent gets so big for the NOT
EXISTS query?
Surely there must be a better way? I also tried using WHERE NOT IN (as
opposed to NOT EXISTS), but it had still returned no data long after
the other two methods had finished, so I killed it.
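One thing I have not tried yet - so this is an untested sketch - is
hinting the NOT IN into a hash anti-join, which in theory gives the
hash-join shape of the MINUS plan with only one scan of each table
(audit_seq would need to be NOT NULL for NOT IN to behave this way):

```sql
-- Untested sketch: HASH_AJ asks the optimizer to run the NOT IN
-- subquery as a hash anti-join instead of a per-row filter.
select audit_seq, object_id
from big
where audit_seq not in (select /*+ HASH_AJ */ audit_seq
                        from small);
```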
On the production system the full table scan of the big table takes a
very long time, which is becoming unacceptable!
Any ideas?
Thanks,
Stephen. Received on Thu Feb 03 2005 - 07:03:42 CST