Re: Non-duplicate test on LARGE table
Date: Tue, 27 May 2008 09:24:31 -0700
Message-ID: <1211905484.642977@bubbleator.drizzle.com>
mattsteel wrote:
> Hello.
> My question is a little broad but maybe you already saw something like
> this ;-)
>
> Here is my scenario:
> Let us have a large table which contains over 20 million records and
> search for rows that are duplicated under some rule.
> The column involved in that comparison rule, named AREA, is declared
> VARCHAR2(4000) NOT NULL.
>
> Seeing that a conventional index built on column AREA would not be so good,
> I tried to add a column, HASHVALUE, which always contains a suitable
> hash-value given by the standard "dbms_utility.get_hash_value"
> function.
>
> Now that I have a much smaller column to search through for
> duplicates, should I put an index on column HASHVALUE, do you think
> overall performance would be improved or not?
>
> Thank you in advance.
>
> Matt.
Let's see what we are missing:
1. Version information
2. DDL
3. DML
4. Explain Plan output
About the only thing I can say is that 20M rows is not that large, and the answer to your question about overall performance is complex: it can only be settled by testing.
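If it helps with that testing, here is a minimal sketch of the hash-column approach you describe. The table name BIG_TABLE, the index name, and the hash_size of 1073741824 are assumptions for illustration only:

-- Sketch only: assumes a table BIG_TABLE with the AREA column described above.
ALTER TABLE big_table ADD (hashvalue NUMBER);

UPDATE big_table
SET    hashvalue = dbms_utility.get_hash_value(area, 1, 1073741824);
COMMIT;

CREATE INDEX big_table_hash_ix ON big_table (hashvalue);

-- Rows sharing a hash value are only candidate duplicates; compare AREA as
-- well, because distinct strings can land in the same hash bucket.
SELECT a.rowid AS row1, b.rowid AS row2
FROM   big_table a,
       big_table b
WHERE  a.hashvalue = b.hashvalue
AND    a.rowid     > b.rowid
AND    a.area      = b.area;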
That said ... if all you want to do is find duplicates, create a disabled unique constraint, use DBMS_ERRLOG to create an error table, and try enabling the constraint ... or just run a simple query using GROUP BY with HAVING COUNT(*) > 1.
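A minimal sketch of the GROUP BY route (the table name BIG_TABLE is an assumption; group on the full AREA column, or on HASHVALUE if you keep that column):

SELECT area, COUNT(*) AS copies
FROM   big_table
GROUP  BY area
HAVING COUNT(*) > 1;

And a hedged sketch of the constraint route, assuming the classic EXCEPTIONS INTO mechanism is used to capture the violating rows (DBMS_ERRLOG, mentioned above, is the DML error-logging facility; enabling a constraint typically reports violators into the exceptions table created by utlexcpt.sql). Table and constraint names are again assumptions:

ALTER TABLE big_table
  ADD CONSTRAINT big_table_area_uq UNIQUE (area) DISABLE;

-- Run ?/rdbms/admin/utlexcpt.sql first to create the EXCEPTIONS table.
ALTER TABLE big_table
  ENABLE CONSTRAINT big_table_area_uq
  EXCEPTIONS INTO exceptions;

-- The ENABLE fails if duplicates exist; EXCEPTIONS then holds their rowids.
SELECT row_id FROM exceptions;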
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org