Fine tuning a delete statement [message #48363] |
Tue, 16 November 2004 12:59 |
Leonard Martin
Messages: 45 Registered: May 2002 Location: Canada
|
Member |
|
|
Hello Oracle gurus
I have the following delete statement that would require to delete about
70,000 records. How could I fine tune this statement. If I use a cursor, how could
I frame the cursors to delete. The situation here is a NOT IN.
DELETE
FROM A
WHERE UNIQUE_ID NOT IN (SELECT unique_id FROM B WHERE filename='TEMP.DAT')
Thanks in advance
|
|
|
Re: Fine tuning a delete statement [message #48364 is a reply to message #48363] |
Tue, 16 November 2004 13:45 |
Edward Stoever
Messages: 58 Registered: August 2002
|
Member |
|
|
Avoid "NOT IN" whenever possible. It's not efficient.
Try turning your DELETE statement into a SELECT statement. Then you can compare different queries you come up with before actually deleting anything.
I don't know your data; it may work out to do something like this:
Select * FROM A
WHERE UNIQUE_ID IN (SELECT unique_id FROM B WHERE filename<>'TEMP.DAT')
But, that would depend how tightly tied together the two tables are.
Edward
;-) -- www.database-expert.com
|
|
|
|
Re: Fine tuning a delete statement [message #48370 is a reply to message #48363] |
Wed, 17 November 2004 03:51 |
shoblock
Messages: 325 Registered: April 2004
|
Senior Member |
|
|
First - "NOT IN" is not inefficient. It is very efficient when used against small amounts of data (if the subquery returns 1 row, then NOT IN may be the best solution ever).
Rewrite your sql - if sql, anything can be written multiple ways. try them and determine which works best.
DELETE A
WHERE UNIQUE_ID NOT IN (SELECT unique_id FROM B WHERE filename='TEMP.DAT')
where not exists (
select null from b where b.unique_id = a.unique_id
and b.filename='TEMP.DAT')
where unique_id in (
select unique_id from a
minus
select unique_id FROM B WHERE filename='TEMP.DAT'
)
|
|
|
Re: Fine tuning a delete statement [message #48372 is a reply to message #48367] |
Wed, 17 November 2004 04:44 |
William Robertson
Messages: 1643 Registered: August 2003 Location: London, UK
|
Senior Member |
|
|
It will work, but it could give different results. Your original query looks for UNIQUE_ID values that do not occur in B for filename 'TEMP.DAT'. This one looks for UNIQUE_ID values that occur for any filename that is not 'TEMP.DAT'. This requires a row to exist in B, while the original query does not.
|
|
|
Re: Fine tuning a delete statement [message #48376 is a reply to message #48364] |
Wed, 17 November 2004 05:34 |
William Robertson
Messages: 1643 Registered: August 2003 Location: London, UK
|
Senior Member |
|
|
> Avoid "NOT IN" whenever possible. It's not efficient.
Really? Perhaps NOT EXISTS is faster?
SQL*Plus: Release 9.2.0.1.0 - Developer's Release on Wed Nov 17 14:20:18 2004
<b>BIG_TABLE is a few columns from DBA_OBJECTS, duplicated a few times:</b>
SQL> desc big_table
Name Null? Type
----------------------------------- -------- ------------------------
OBJECT_ID NOT NULL NUMBER
OBJECT_NAME VARCHAR2(128)
OBJECT_TYPE VARCHAR2(18)
SQL> SELECT COUNT(*) FROM big_table;
COUNT(*)
----------
390000
1 row selected.
<b>SMALL_TABLE is a copy of USER_OBJECTS with a unique index on OBJECT_ID:</b>
SQL> desc small_table
Name Null? Type
----------------------------------------------------------- -------- -----------------------------------------
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(18)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> SELECT COUNT(*) FROM small_table;
COUNT(*)
----------
203
1 row selected.
<b>Check indexes (one on SMALL_TABLE, none on BIG_TABLE):</b>
SQL> @ind big_table
SQL>
SQL> @ind small_table
Distinct
Index Unique? Type Status Rows keys Column name
------------------------------ ------- ----------------- -------- ----------- ----------- -------------------------
SMALL_PK Y NORMAL VALID 203 203 OBJECT_ID
<b>Both tables are analyzed:</b>
SQL> SELECT table_name, num_rows FROM user_tables WHERE table_name IN ('SMALL_TABLE','BIG_TABLE');
TABLE_NAME NUM_ROWS
------------------------------ ----------
BIG_TABLE 390000
SMALL_TABLE 203
2 rows selected.
<b>Test NOT EXISTS query:</b>
SQL> SELECT COUNT(*)
2 FROM big_table b
3 WHERE NOT EXISTS
4 ( SELECT 1 FROM small_table s
5 WHERE s.object_id = b.object_id );
COUNT(*)
----------
380336
1 row selected.
Elapsed: 00:00:41.98
SQL> @xplan
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 404
| 1 | SORT AGGREGATE | | 1 | 4 |
|* 2 | FILTER | | | |
| 3 | TABLE ACCESS FULL | BIG_TABLE | 19500 | 78000 | 403
|* 4 | INDEX UNIQUE SCAN | SMALL_PK | 1 | 4 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "SMALL_TABLE" "S"
WHERE "S"."OBJECT_ID"=:B1))
4 - access("S"."OBJECT_ID"=:B1)
Note: cpu costing is off
<b>Test NOT IN query:</b>
SQL> SELECT COUNT(*)
2 FROM big_table b
3 WHERE b.object_id NOT IN
4 ( SELECT s.object_id
5* FROM small_table s )
COUNT(*)
----------
380336
1 row selected.
Elapsed: 00:00:15.91
SQL> @xplan
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 403
| 1 | SORT AGGREGATE | | 1 | 8 |
| 2 | NESTED LOOPS ANTI | | 376K| 2944K| 403
| 3 | TABLE ACCESS FULL | BIG_TABLE | 390K| 1523K| 403
|* 4 | INDEX UNIQUE SCAN | SMALL_PK | 7 | 28 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("B"."OBJECT_ID"="S"."OBJECT_ID")
Note: cpu costing is off
<b>Repeat in case the difference was to do with block caching etc:</b>
SQL> SELECT COUNT(*)
2 FROM big_table b
3 WHERE NOT EXISTS
4 ( SELECT 1 FROM small_table s
5* WHERE s.object_id = b.object_id );
COUNT(*)
----------
380336
1 row selected.
Elapsed: 00:00:54.58
SQL> SELECT COUNT(*)
2 FROM big_table b
3 WHERE b.object_id NOT IN
4* ( SELECT s.object_id FROM small_table s );
COUNT(*)
----------
380336
1 row selected.
Elapsed: 00:00:15.56
SQL>
|
|
|
Re: Fine tuning a delete statement [message #48396 is a reply to message #48372] |
Wed, 17 November 2004 22:38 |
kil
Messages: 10 Registered: April 2004
|
Junior Member |
|
|
So it should be :
delete from a
where not exists(select 1
from b
where b.filename = 'TEMP.DAT'
and b.unique_id = a.unique_id)
I think the not exists would be faster than the not in statement.
|
|
|
Re: Fine tuning a delete statement [message #48407 is a reply to message #48396] |
Thu, 18 November 2004 07:05 |
William Robertson
Messages: 1643 Registered: August 2003 Location: London, UK
|
Senior Member |
|
|
(Not sure what happened to my post above - looks fine in preview and on the test forum, so trying again...)
Yes, that should do it.
The problem with the earlier query arose if table B was (extreme example) empty. Leonard's original query,
DELETE FROM a
WHERE unique_id NOT IN
( SELECT unique_id
FROM b
WHERE filename='TEMP.DAT' );
would delete all rows from A, but
DELETE FROM a
WHERE EXISTS
( SELECT 1
FROM b
WHERE b.unique_id = a.unique_id
AND b.filename <> 'TEMP.DAT' );
or for that matter,
DELETE FROM a
WHERE unique_id IN
( SELECT unique_id
FROM b
WHERE filename <> 'TEMP.DAT' );
would delete nothing.
DELETE FROM a
WHERE NOT EXISTS
( SELECT 1
FROM b
WHERE b.unique_id = a.unique_id
AND b.filename = 'TEMP.DAT' );
is a valid variation on the original query. It might be faster, or slower, or come to the same thing, depending on data volumes, distribution, optimizer settings etc. See www.orafaq.com/msgboard/plsql/messages/19327.htm.
Actually now I think about it, NOT IN and NOT EXISTS are not be quite the same thing if b.unique_id can be null:
SQL> CREATE TABLE a AS SELECT rownum AS unique_id FROM user_objects;
Table created.
SQL> CREATE TABLE b AS SELECT unique_id, 'TEMP.DAT' AS filename FROM a;
Table created.
SQL> update b set unique_id = null where rownum = 1;
1 row updated.
SQL> SELECT unique_id FROM a
2 MINUS
3 SELECT unique_id FROM b;
UNIQUE_ID
----------
1
1 row selected.
SQL> commit;
Commit complete.
SQL> DELETE FROM a
1 WHERE unique_id NOT IN
2 ( SELECT unique_id
3 FROM b
4 WHERE filename='TEMP.DAT' );
<b>0 rows deleted.</b>
SQL> DELETE FROM a
2 WHERE NOT EXISTS
3 ( SELECT 1
4 FROM b
5 WHERE b.unique_id = a.unique_id
6* AND b.filename = 'TEMP.DAT' );
<b>1 row deleted.</b>
SQL> roll
Rollback complete.
SQL> DELETE FROM a
2 WHERE unique_id NOT IN
3 ( SELECT NVL(unique_id,-1)
4 FROM b
5* WHERE filename='TEMP.DAT' );
<b>1 row deleted.</b>
[Updated on: Fri, 18 February 2005 23:32] Report message to a moderator
|
|
|