RE: Temporary space needed to create a constraint

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 12 Jul 2012 11:43:03 -0400
Message-ID: <011a01cd6045$0714b560$153e2020$_at_rsiz.com>



I seem to be missing bits of this thread, such as the ddl. So while I'm pretty doggone sure the comments of JL in the thread are spot on, the apparent leaps in reasoning are hard to follow with the some of the thread missing. 3 possibilities come to mind: 1) an inadvertently private communication off thread, 2) a different subject line, 3) my email is broken somehow.

Since I find this thread interesting, I'd appreciate the bits I'm missing.

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Peter Hitchman
Sent: Thursday, July 12, 2012 9:11 AM
To: oracle-l
Subject: Re: Temporary space needed to create a constraint

Hi,
When I cam in this morning , I enabled the constraint novalidate and then ran this:

SELECT DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID, NAME_TYPE, ORGANIZATIONpERMiD,WCAID FROM OL_PATENT_ORGANIZATION GROUP BY DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID, NAME_TYPE,ORGANIZATIONpERMiD,WCAID HAVING COUNT(*) >1 It took it 1.5 hours to find one duplicate row, using parallel slaves to do a FTS. OK not quick but a lot better than what was happening before.

With a plan like this (numbers from a test system)

| Id | Operation | Name

               | Rows | Bytes | Cost (%CPU)| Time | TQ
|IN-OUT| PQ Distrib |





| 0 | SELECT STATEMENT |
         |  1098 | 65880 |    32   (4)| 00:00:01 |             |

| |
| 1 | PX COORDINATOR |
| | | | | | | |

| 2 | PX SEND QC (RANDOM) | :TQ10001
 | 1098 | 65880 | 32 (4)| 00:00:01 | Q1,01 | P->S | QC (RAND)
|
|* 3 | FILTER |
                    |           |          |              |
    |  Q1,01 | PCWC |            |

| 4 | HASH GROUP BY |
| 1098 | 65880 | 32 (4)| 00:00:01 | Q1,01 | PCWP | |
| 5 | PX RECEIVE |
| 21948 | 1286K| 31 (0)| 00:00:01 | Q1,01 | PCWP | |
| 6 | PX SEND HASH | :TQ10000
| 21948 | 1286K| 31 (0)| 00:00:01 | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR |
| 21948 | 1286K| 31 (0)| 00:00:01 | Q1,00 | PCWC | |
| 8 | TABLE ACCESS FULL | OL_PATENT_ORGANIZATION | 21948 |
1286K| 31 (0)| 00:00:01 | Q1,00 | PCWP | | ---------------------------------------------------------------------------- ----------------------------------------------------

A query like this:

SELECT /*+ INDEX(A OL_PATENT_ORGANIZATION_PK) */ DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID, NAME_TYPE, ORGANIZATIONpERMiD,WCAID FROM OL_PATENT_ORGANIZATION A GROUP BY DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID, NAME_TYPE,ORGANIZATIONpERMiD,WCAID HAVING COUNT(1) >1 produces this explain plan:





| Id | Operation | Name
                        | Rows        | Bytes | Cost (%CPU)| Time

|




| 0 | SELECT STATEMENT |
                 |    85M       |  5807M|    19M  (1)   | 32:31:45 |

|* 1 | FILTER |
| | | | |
| 2 | SORT GROUP BY NOSORT|
| 85M | 5807M| 19M (1) | 32:31:45 |
| 3 | INDEX FULL SCAN | OL_PATENT_ORGANIZATION_PK |
1715M | 113G | 19M (1) | 32:31:45 | ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- -------------------

and on this system it looks like using the parallel slaves is quicker than scanning the index, because this query is going to take about 2.5 hours to complete.

Thanks for your help.

Regards
Pete

On Thu, Jul 12, 2012 at 10:51 AM, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk> wrote:

> Peter,
>
> If  "half as much" is a reasonably accurate measure then it makes sense.
> I'm going to do a blog about this (using a small table of my own) to 
> point out a few idiocies, but the arithmetic is as follows:
>
> The major code is:
>     hash join
>         fast full scan of index (to build hash table)
>         aggregate of fast full scan (to probe hash table)
>
> The hash table probably spills to disc because it has to hold ALL the 
> data from the index before the next step can happen The aggregate of 
> the index probably spills to disc because (using a fast full scan) it 
> has to spill to disc before it completes.
>
> The "ordered" hint in the SQL seems particularly silly, as we could 
> assume that we expect virtually no duplicates (relatively) so want to 
> aggregate first and then nested loop to find the few matches.  If we 
> aggregate first we could choose to do a serial index full scan and 
> aggregate without sorting or hashing - the cost is then the cost of 
> serially reading the index (which wouldn't be quick, but could be a 
> better bet than all the I/O you're currently doing).
>
> Regards
>
> Jonathan Lewis
> http://jonathanlewis.wordpress.com/all_postings
>
> Author: Oracle Core (Apress 2011)
> http://www.apress.com/9781430239543
>
--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 12 2012 - 10:43:03 CDT

Original text of this message