Hi, Suhen
The following is a set of notes I have cut from various list messages on deleting duplicates. For the 60M rows you are talking about the first option looks the best (committing frequently). Can I suggest you disable archive logging if at all possible and you will need to rebuild tables/index afterwards. The other alternative to try is geeting exceptions into another table and deleting using rowid.
HTH
John
How about the following? This is what I use. For a huge table, you may want to do intermittent commits based on the rowcount.
declare
cursor get_dups is
select pk_col1, pk_col2, pk_col3, count(*)
from table
group by pk_col1, pk_col2, pk_col3
having count(*) > 1;
dupRec get_dups%rowtype;
begin
for dupRec in get_dups loop
delete from table
where pk_col1 = dupRec.pk_col1
and pk_col2 = dupRec.pk_col2
and pk_col3 = dupRec.pk_col3
and rownum = 1;
end loop;
end;
/
Identify duplicate records:
select COL1, COL2, COL#, COUNT(*)
from <OWNER>.<TABLE_NAME>
group by COL1, COL2, COL#
having count(*) > 1;
Remove duplicate records:
delete from <OWNER>.<TABLE_NAME> a
where rowid < (
select max(rowid)
from <OWNER>.<TABLE_NAME> b
where b.COL1 = a.COL1
and b.COL2 = a.COL2
and b.COL# = a.COL#);
===========================================
Not particularly fast, but it will work:
Create an exception table (you can use any valid table name):
create table dups (
row_id rowid,
owner varchar2(30),
table_name varchar2(30),
constraint varchar2(30))
tablespace temp;
Then try create a unique key which will be violated by the
duplicates which you are trying to remove:
alter table <your_table> add constraint <your_constraint>
unique
(
<column1>,
<column2>
)
exceptions into dups;
When you execute the alter table command, you will get an error message
telling you that there are duplicates. The dups table will contain
the rowids for the offending rows.
alter table my_table add constraint XYZ unique (col1, col2, col3) disable
;
create table my_table_dups (
dup_row rowid,
constr_name varchar2 (40),
constr_owner varchar2)
storage clause ......... ;
alter table my_table enable constraint XYZ exceptions into my_table_dups
;
This will not enable the unique key constraint if you have any duplicates
on (col1, col2, col3) of my_table, but will place info on only the
duplicates into table my_table_dups. Now since you have the rowid of the
duplicates, depending on how many you have, you may want to write a
PL/SQL construct to delete all duplicates from the original, or you could
simply code something like :
> delete from my_table where rowid in
(select rowid from my_table_dups
where constr_name='XYZ' and constr_owner = 'MY_USER_ID') ;
>commit;
Some times you will have a multi-million row table, in which case it will
be good to use PL/SQL and delete from both tables as you go (also good
for re-startability), committing every so often. This reduces likelihood
of rollback segment contention, also avoid the dreaded 'snapshot too old'
error and keeps all users happy since you are not locking thousands or
hundred thousands of rows waiting for a commit and the end of a single
DML.
Hope this helps. It is a different approach to this problem than many
types of SQL statements which can sometimes get very complicated. It
approaches the problem from a DDL perspective rather than DML in the
identification of duplicates.
Hi,
I think this is what you want,using the rowid you can achieve this
delete from table_name1
where rowid < (select max(rowid) from table_name2
where table_name1.col_name = table_name2.col_name)
table_name1 and table_name2 are the same table.
You can try to select before you delete
select * from table_name
where rowid < (select max(rowid) from table_name2
where table_name1.col_name = table_name2.col_name)
-----Original Message-----
From: Suhen Pather [mailto:Suhen.Pather_at_strandbags.com.au]
Sent: 12 September 01 05:50
To: Multiple recipients of list ORACLE-L
Subject: help with deleting duplicate records from very large table
List,
I need to delete duplicate records from a very large table (60 millions records +).
There would be about 3 million duplicate entries.
What is the quickest way to do this?
The syntax that I am using is
delete from invaudee
where rowid not in (select min(rowid) from invaudee
group by audit_number);
This is taking a long time to run. I cannot see any entries in v$transaction
for the delete.
There is no indexes on the INVAUDEE table.
I created an index on the primary key column but it still takes forever to run.
I do not have the space to CTAS.
Or should I write the duplicates to an EXCEPTIONS table and perform the delete based on the entries
in the EXCEPTIONS table.
Any help would be greatly appreciated.
Suhen
This email and any attachments may be confidential and the subject of
legal professional privilege. Any disclosure, use, storage or copying
of this email without the consent of the sender is strictly prohibited.
Please notify the sender immediately if you are not the intended
recipient and then delete the email from your inbox and do not
disclose the contents to another person, use, copy or store the
information in any medium.
Received on Wed Sep 12 2001 - 02:29:26 CDT