Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Removing duplicated rows revisited - I'm feeln' mighty lame
I was involved in Airlines travel late yesterday so forgive me if I miss the
mark: 1) If I remember right an exists clause fires its subquery once for
every row returned in the parent query so I suspect that your subquery is
being fired once for every row in the call_table . The explain output would
show if this is how Oracle is doing the delete. Since you have already
extracted one copy of the rows you want to delete I think opening the temp
table as a cursor in a pl/sql block would allow you to issue a delete once
for each of your 34,000 captured rows which would save Oracle a ton of work.
You could possibly get similiar results by turing the exists into a in clause
on your temp_table if you do not want to use pl/sql, but I think that is the
best bet. 2) You might consider building an index on every column in your
temp table select. This should allow Oracle to use an index to perform the
group by, but the time to build the index plus the actual build may not save
you anything over the current actual build if this is a one-time job, and you
do not want to keep a large multi-column index around just for once or twice
a year clean up. But, you should still consider it, if there are a lot of
other columns in your call_table.
In article <36ee69fc.4304658_at_usenet.acw.vcu.edu>,
bgwillia_at_vcu.edu (Boyce G. Williams, Jr.) wrote:
> >
> I found a table where there is a lot of duplicated telephone calls in
> it. To be more precise: 34,000 duplicate calls from a table containing
> 1.5 million rows. Some were posted the same day as its duplicate,
> others were posted on different days. I wrote a series of queries to
> extract the first posted call of a duplicated set, delete all the
> duplicated calls, then reinsert the first call back in. My concern is
> the actual delete step: it uses a correlated subquery. Previous
> experience with our machine (a RS/6000 running Oracle) showed me it
> has a hard time working with correlated subqueries of any kind. At
> one point I ran a SELECT correlated subquery for six hours without it
> ever completing. When I broke up the query into two, the inner query
> being a VIEW and the outer query being a SELECT based on the VIEW, it
> took two hours and completed. Go figure.
>
> I'm presenting what I'm planning to do in the hopes someone may give
> advice on improving it. I'm mainly concerned about the DELETE step
> since every source I've found on deleting rows from one table based on
> another uses the correlated subquery method. Maybe someone knows a
> better way...
>
> --
> -- create a table where the first call of a duplicated set is
> -- stored based on the earliest posted date
> --
> DROP TABLE TEMP_TABLE
> CREATE TABLE TEMP_TABLE AS
> SELECT DISTINCT
> person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds ,
> MIN(date_posted) AS date_posted
> FROM CALL_TABLE
> GROUP BY person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds
> HAVING Count(*)>1 ;
> --
> -- Delete all duplicated calls from a table
> --
> DELETE FROM CALL_TABLE
> WHERE EXISTS
> ( SELECT *
> FROM CALL_TABLE Tmp
> GROUP BY Tmp.person_id ,
> Tmp.date_called ,
> Tmp.destination_phone ,
> Tmp.origin_phone ,
> Tmp.duration_minutes ,
> Tmp.duration_seconds
> HAVING Count(*)>1
> And Tmp.person_id = person_id
> And Tmp.date_called = date_called
> And Tmp.destination_phone = destination_phone
> And Tmp.origin_phone = origin_phone
> And Tmp.duration_minutes = duration_minutes
> And Tmp.duration_seconds = duration_seconds ) ;
> --
> -- Reinsert the first posted record
> --
> INSERT INTO CALL_TABLE
> ( person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds ,
> date_posted )
> SELECT person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds ,
> date_posted
> FROM TEMP_TABLE ;
> --
> -- end of procedure
> --
>
> Thanks in advance,
> Boyce G. Williams, Jr.
> Boyce G. Williams, Jr.
>
> .--------------------------------------------------------------------.
> | "People should have two virtues: purpose- the courage to envisage |
> | and pursue valued goals uninhibited by the defeat of infantile |
> | fantasies, by guilt and the failing fear punishment; and wisdom- a|
> | detached concern with life itself, in the face of death itself." |
> | Norman F. Dixon|
> '--------------------------------------------------------------------'
>
Mark D. Powell -- The only advice that counts is the advice that you follow so follow your own advice --
-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own Received on Thu Mar 18 1999 - 15:30:41 CST
![]() |
![]() |