Merge Join (Cartesian)

From: Ty <housequake_at_yahoo.com>
Date: 22 Apr 2003 08:52:01 -0700
Message-ID: <55a9978a.0304220752.6cbfc9a5@posting.google.com>

I have a perplexing problem involving cartesian joins (Oracle 8.1.7.4). The following query will run in around a minute or so:

SELECT nci.health_service_id, nci.i_product_id /* there is an index on these two columns */

FROM ncpdp_claim_item nci, claim c, drug_clinical dc, tlp t WHERE nci.i_product_id = dc.product_id

      AND dc.jurisdiction_cd = ' '
      AND dc.rec_eff_dt_comp = (SELECT MIN(rec_eff_dt_comp)
                                    FROM drug_clinical
                                    WHERE jurisdiction_cd = ' '
                                      AND product_id = dc.product_id
                                      AND rec_inactive_flag = 'A')
      AND dc.seq_no = (SELECT MIN(seq_no)
                           FROM drug_clinical
                           WHERE jurisdiction_cd = ' '
                             AND product_id = dc.product_id
                             AND rec_inactive_flag = 'A'
                             AND rec_eff_dt_comp = dc.rec_eff_dt_comp)
      AND nci.health_service_id = c.health_service_id
      AND nci.health_service_id = t.hsn

The explain plan is:

SELECT STATEMENT Optimizer=CHOOSE (Cost=873 Card=2 Bytes=142) FILTER
NESTED LOOPS (Cost=873 Card=2 Bytes=142)

      HASH JOIN (Cost=871 Card=2 Bytes=120)
        INDEX (FAST FULL SCAN) OF DRUG_CLINICAL_MIDX03 (NON-UNIQUE)


(Cost=4 Card=1 Bytes=29)

        NESTED LOOPS (Cost=866 Card=24658 Bytes=764398)
          INDEX (FAST FULL SCAN) OF NCPDP_CLAIM_ITEM_MIDX02


(NON-UNIQUE) (Cost=866 Card=2755100 Bytes=57857100)

          INDEX (UNIQUE SCAN) OF TLP_MIDX01 (UNIQUE)
      INDEX (UNIQUE SCAN) OF CLAIM_IDX1 (UNIQUE) (Cost=1 Card=2754328

Bytes=30297608)

SORT (AGGREGATE)

      FIRST ROW (Cost=2 Card=2 Bytes=48)
        INDEX (RANGE SCAN (MIN/MAX)) OF DRUG_CLINICAL_MIDX04

(NON-UNIQUE) (Cost=2 Card=2)

SORT (AGGREGATE)

      FIRST ROW (Cost=2 Card=1 Bytes=31)
        INDEX (RANGE SCAN (MIN/MAX)) OF DRUG_CLINICAL_MIDX04

(NON-UNIQUE) (Cost=2 Card=1)

I now alter the query by adding a column from the same table to the select list:

SELECT nci.health_service_id, nci.i_product_id, nci.i_rx_no /* there is no index on these three columns */

FROM ncpdp_claim_item nci, claim c, drug_clinical dc, tlp t WHERE nci.i_product_id = dc.product_id

      AND dc.jurisdiction_cd = ' '
      AND dc.rec_eff_dt_comp = (SELECT MIN(rec_eff_dt_comp)
                                    FROM drug_clinical
                                    WHERE jurisdiction_cd = ' '
                                      AND product_id = dc.product_id
                                      AND rec_inactive_flag = 'A')
      AND dc.seq_no = (SELECT MIN(seq_no)
                           FROM drug_clinical
                           WHERE jurisdiction_cd = ' '
                             AND product_id = dc.product_id
                             AND rec_inactive_flag = 'A'
                             AND rec_eff_dt_comp = dc.rec_eff_dt_comp)
      AND nci.health_service_id = c.health_service_id
      AND nci.health_service_id = t.hsn

The explain plan is:

SELECT STATEMENT Optimizer=CHOOSE (Cost=17528 Card=2 Bytes=152) FILTER
NESTED LOOPS (Cost=17528 Card=2 Bytes=152)

      NESTED LOOPS (Cost=17526 Card=2 Bytes=130)
        MERGE JOIN (CARTESIAN) (Cost=6 Card=5840 Bytes=227760)
          INDEX (FAST FULL SCAN) OF DRUG_CLINICAL_MIDX03 (NON-UNIQUE)


(Cost=4 Card=1 Bytes=29)

          SORT (JOIN) (Cost=2 Card=5840 Bytes=58400)
            TABLE ACCESS (FULL) OF TLP (Cost=2 Card=5840 Bytes=58400)
        TABLE ACCESS (BY INDEX ROWID) OF NCPDP_CLAIM_ITEM (Cost=3
Card=2755100 Bytes=71632600)
          INDEX (RANGE SCAN) OF NCPDP_CLAIM_ITEM_MIDX02 (NON-UNIQUE)


(Cost=2 Card=2755100)

      INDEX (UNIQUE SCAN) OF CLAIM_IDX1 (UNIQUE) (Cost=1 Card=2754328

Bytes=30297608)

SORT (AGGREGATE)

      FIRST ROW (Cost=2 Card=2 Bytes=48)
        INDEX (RANGE SCAN (MIN/MAX)) OF DRUG_CLINICAL_MIDX04

(NON-UNIQUE) (Cost=2 Card=2)

SORT (AGGREGATE)

      FIRST ROW (Cost=2 Card=1 Bytes=31)
        INDEX (RANGE SCAN (MIN/MAX)) OF DRUG_CLINICAL_MIDX04

(NON-UNIQUE) (Cost=2 Card=1)

Adding the column to the select list causes a cartesian join and the query will run for hours before it finally fails with ORA-01555 "Snapshot too old". It seems odd to me that adding a column from the same table to the select list could have such a devasting effect on performance. Have any of you gurus seen anything like this? Oracle support has told me that this is a normal sql tuning issue. Perhaps it is, but I would still like to know what is going on here...

Thanks for the help,
Ty
housequake_at_yahoo.com Received on Tue Apr 22 2003 - 10:52:01 CDT