Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> performance tuning questions: replace IN (values) by JOIN
Hi,
I am trying to improve an SQL query design while still keeping its performance.
DB Model
The relation is: a BAG has 1 or more BAG_LOG(s).
The BAG_LOG table has a DTM_TIMESTAMP column.
There are 40000 rows in BAGS and 170000 in BAG_LOG.
There's no specific INDEX appart the ones generated automatically on the primary and foreign key.
Query to improve
Let's say I want to find all BAGS who have had a TIMESTAMP within a specific period. Additional criteria will add later on.
One solution is to use 3 queries:
select DISTINCT(BAG_ID) FROM BAG_LOG
WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08 0:00:00', 'yyyy-dd-mm HH24:MI:SS')) AND DTM_TIMESTAMP <= (TO_DATE('2003-19-082:00:00', 'yyyy-dd-mm HH24:MI:SS'));
SELECT * FROM BAGS where ID IN
(346564,346565,346566,346567,346568,346569);
SELECT * FROM BAG_LOG where BAG_ID IN
(346564,346565,346566,346567,346568,346569);
The first query find the BAG IDs who satisfy the condition, the second and third queries find the rows of the BAG and BAG_LOG table that match the result of the first query. The second and third query are built dynamically by the calling program, based on the results returned by the first query.
This has two drawbacks:
- it doesn't exploit SQL capabilities
- by using IN (list of values) one can hit a limitation if the first
query returns more than a certain number of rows. (Oracle 9 has a
default limit of 1000 possible values).
It has one advantage
- it is very simple SQL and will probably work on all DB I know.
Possible improvement?
The 3 queries are joined into one. The columns are renamed so to be able to retrieve the values by column name (otherwise identical BAGS and BAG_LOG column names conflict).
select BAGS.ID as BAGS__ID, [...],
BAG_LOG.* FROM BAGS, BAG_LOG WHERE (BAGS.ID = BAG_LOG.BAG_ID AND BAGS.ID IN (select DISTINCT(BAG_ID) FROM BAG_LOG WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08 00:00:00', 'yyyy-dd-mm HH24:MI:SS')) AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08 12:00:00', 'yyyy-dd-mm HH24:MI:SS')) ) ) ORDER BY BAGS.ID DESC, BAG_LOG.DTM_TIMESTAMP DESC;
replace [...] by the full list of fields from the BAGS table one wants to retrieve. In my case all.
[Note that I also added sorting in that query, but it doesn't seem to affect the performance in my case].
Performance results & analysis:
Here are some info using the autotrace function:
Elapsed: 00:00:07.01
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (UNIQUE) 2 1 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics
0 recursive calls
0 db block gets
34199 consistent gets
34184 physical reads
0 redo size
456 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
6 rows selected.
Elapsed: 00:00:00.03
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 CONCATENATION 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 3 2 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 4 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 5 4 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 6 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 8 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 9 8 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 10 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 11 10 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 12 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 13 12 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) Statistics
0 recursive calls
0 db block gets
18 consistent gets
0 physical reads
0 redo size
915 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
12 rows selected.
Elapsed: 00:00:07.07
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics
0 recursive calls
0 db block gets
34200 consistent gets
34185 physical reads
0 redo size
1774 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12 rows processed
12 rows selected.
Elapsed: 00:00:18.06
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (ORDER BY) 2 1 MERGE JOIN 3 2 SORT (JOIN) 4 3 NESTED LOOPS 5 4 TABLE ACCESS (FULL) OF 'BAG_LOG' 6 4 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS' 7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) 8 2 SORT (JOIN) 9 8 VIEW OF 'VW_NSO_1' 10 9 SORT (UNIQUE) 11 10 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics
0 recursive calls
26 db block gets
345418 consistent gets
71578 physical reads
0 redo size
2422 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
3 sorts (memory)
1 sorts (disk)
12 rows processed
I clearly see that my second solution uses much more memory (consistent gets is about 3 times the total used by the first 3 queries), and also do 2 more sorts in memory and one in the disk.
But I do not see any obvious design flaw in my query....
Any help appreciated...
Cheers,
Jerome Received on Fri Aug 29 2003 - 08:40:24 CDT