performance tuning questions: replace IN (values) by JOIN

From: Jérôme Lacoste - CoffeeBreaks <lacostej_at_coffebreaks.org>
Date: Fri, 29 Aug 2003 13:40:24 GMT
Message-ID: <cJI3b.17331$BD3.2757926@juliett.dax.net>

Hi,

I am trying to improve an SQL query design while still keeping its performance.

DB Model

I have 2 tables, one called BAGS the other called BAG_LOG. Each has a primary key called ID. The BAG_LOG table has a foreign key called BAG_ID pointing to BAGS.ID.

The relation is: a BAG has 1 or more BAG_LOG(s).

The BAG_LOG table has a DTM_TIMESTAMP column.

There are 40000 rows in BAGS and 170000 in BAG_LOG.

There's no specific INDEX appart the ones generated automatically on the primary and foreign key.

Query to improve

Let's say I want to find all BAGS who have had a TIMESTAMP within a specific period. Additional criteria will add later on.

One solution is to use 3 queries:

select DISTINCT(BAG_ID) FROM BAG_LOG

                    WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08 
0:00:00', 'yyyy-dd-mm HH24:MI:SS'))
                      AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08

2:00:00', 'yyyy-dd-mm HH24:MI:SS'));

SELECT * FROM BAGS where ID IN
(346564,346565,346566,346567,346568,346569);

SELECT * FROM BAG_LOG where BAG_ID IN
(346564,346565,346566,346567,346568,346569);

The first query find the BAG IDs who satisfy the condition, the second and third queries find the rows of the BAG and BAG_LOG table that match the result of the first query. The second and third query are built dynamically by the calling program, based on the results returned by the first query.

This has two drawbacks:
- it doesn't exploit SQL capabilities
- by using IN (list of values) one can hit a limitation if the first query returns more than a certain number of rows. (Oracle 9 has a default limit of 1000 possible values).

It has one advantage
- it is very simple SQL and will probably work on all DB I know.

Possible improvement?

The 3 queries are joined into one. The columns are renamed so to be able to retrieve the values by column name (otherwise identical BAGS and BAG_LOG column names conflict).

select BAGS.ID as BAGS__ID, [...],

        BAG_LOG.* FROM BAGS, BAG_LOG
        WHERE (BAGS.ID = BAG_LOG.BAG_ID AND
        BAGS.ID IN (select DISTINCT(BAG_ID) FROM BAG_LOG
                    WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08 
00:00:00', 'yyyy-dd-mm HH24:MI:SS'))
                      AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08 
12:00:00', 'yyyy-dd-mm HH24:MI:SS'))
                   )
              ) ORDER BY BAGS.ID DESC, BAG_LOG.DTM_TIMESTAMP DESC;

replace [...] by the full list of fields from the BAGS table one wants to retrieve. In my case all.

[Note that I also added sorting in that query, but it doesn't seem to affect the performance in my case].

Performance results & analysis:

(1) set of queries takes about 14 sec to perform
(2) takes 18 secs to perform, a 25% performance loss

Here are some info using the autotrace function:

(1) -------- 6 rows selected.

Elapsed: 00:00:07.01

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE

1 0 SORT (UNIQUE) 2 1 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics

0 recursive calls

0 db block gets

34199 consistent gets

34184 physical reads

0 redo size

456 bytes sent via SQL*Net to client

499 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

6 rows processed

6 rows selected.

Elapsed: 00:00:00.03

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE

1 0 CONCATENATION 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 3 2 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 4 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 5 4 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 6 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 8 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 9 8 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 10 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 11 10 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 12 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 13 12 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) Statistics

0 recursive calls

0 db block gets

18 consistent gets

0 physical reads

0 redo size

915 bytes sent via SQL*Net to client

499 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

6 rows processed

12 rows selected.

Elapsed: 00:00:07.07

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE

1 0 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics

0 recursive calls

0 db block gets

34200 consistent gets

34185 physical reads

0 redo size

1774 bytes sent via SQL*Net to client

499 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

12 rows processed

(2) ------

12 rows selected.

Elapsed: 00:00:18.06

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE

1 0 SORT (ORDER BY) 2 1 MERGE JOIN 3 2 SORT (JOIN) 4 3 NESTED LOOPS 5 4 TABLE ACCESS (FULL) OF 'BAG_LOG' 6 4 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 8 2 SORT (JOIN) 9 8 VIEW OF 'VW_NSO_1' 10 9 SORT (UNIQUE) 11 10 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics

0 recursive calls

26 db block gets

345418 consistent gets

71578 physical reads

0 redo size

2422 bytes sent via SQL*Net to client

499 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

3 sorts (memory)

1 sorts (disk)

12 rows processed

I clearly see that my second solution uses much more memory (consistent gets is about 3 times the total used by the first 3 queries), and also do 2 more sorts in memory and one in the disk.

But I do not see any obvious design flaw in my query....

Questions --
Q: Is the performance loss due to the NESTED LOOPS? or is it due to the JOIN in itself ? Q: I was wondering what kind of query should I use in order to manage to improve the design of my query (1) especially removing its limitations without introducing the performance loss as with my second query. Should I use a temporary table? Q: [slightly unrelated] How one can enable the autotrace to also display
(Cost, Card and Byte) as shown in most of Oracle examples?

Any help appreciated...

Cheers,

Jerome Received on Fri Aug 29 2003 - 08:40:24 CDT