Tuning "BETWEEN" Queries
A common sight in databases is a table that contains the start and end values of a range - usually dates. One or both of the dates are typically part of the primary key; sometimes they are the entire key. Some examples:
- History tables
History tables record the changes made to a row over time. Instead of updating a row and losing the prior values of each non-key attribute, a new row is inserted. Each row is differentiated by two columns that record the date the row was created (START_DATE) and the date it was superseded (END_DATE). The primary key of a history table is typically a non-unique identifier combined with the START_DATE, with an unenforced constraint that the start and end dates of rows for any given identifier are both contiguous and non-overlapping.
- Range tables
Range tables are tables where the start and end values of the range form the entire primary key. For example, a table to determine the delivery method depending on volume ordered:
LOWER_BOUND UPPER_BOUND DELIVERY_METHOD
----------- ----------- --------------------
          0          12 Mail
         13         144 Courier
        145       10000 Curtain-side truck
      10001       50000 Semi-trailer
      50001    99999999 Ship
Like history tables, range tables typically have an unenforced constraint that the lower and upper bounds of successive ranges are both contiguous and non-overlapping.
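As a sketch, such a range table might be declared like this (the table name matches the examples that follow; the check constraint is an assumption, and note that contiguity and non-overlap of successive ranges cannot be enforced declaratively):

```sql
-- Hypothetical DDL for the range table above. The primary key can
-- enforce uniqueness of each (lower_bound, upper_bound) pair, but
-- not that successive ranges are contiguous and non-overlapping.
CREATE TABLE delivery_method (
  lower_bound      NUMBER        NOT NULL
, upper_bound      NUMBER        NOT NULL
, delivery_method  VARCHAR2(50)  NOT NULL
, CONSTRAINT delivery_method_pk  PRIMARY KEY (lower_bound, upper_bound)
, CONSTRAINT delivery_method_chk CHECK (lower_bound <= upper_bound)
);
```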
The Problem
A common feature of History and Range Tables is the way they are queried:
History Table:

SELECT *
FROM   cust_hist
WHERE  cust_num = :a
AND    :b BETWEEN cust_eff_date AND cust_exp_date

Range Table:

SELECT *
FROM   delivery_method
WHERE  :b BETWEEN lower_bound AND upper_bound
A BETWEEN clause is internally expanded out into separate <= and >= clauses. For example:
SELECT * FROM delivery_method WHERE lower_bound <= :b AND upper_bound >= :b
The problem is that an index can only scan on one column with a range predicate (<, >, LIKE, BETWEEN). So even if an index contains both the lower_bound and upper_bound columns, the index scan will return all of the rows matching lower_bound <= :b, and then filter out the rows that do not match upper_bound >= :b.
In the case where the sought value is somewhere in the middle, the range scan will return half of the rows in the table in order to find a single row. In the worst case where the most commonly sought rows are at the top (highest values), the index scan will process almost every row in the table for every lookup.
This type of behaviour can usually be detected in Explain Plan's Access Predicates and Filter Predicates. Access Predicates are clauses used to find rows in a structure (such as an index); non-matching rows are not read. Filter Predicates are clauses used to filter rows after they have been found but before they are projected to the next step of the plan. Explain Plan produces confusing results with unbounded range scans (like we are doing here). Note how both clauses appear as both access predicates and filter predicates:
  1  SELECT *
  2  FROM delivery_method
  3* where :b BETWEEN lower_bound AND upper_bound

-------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."UPPER_BOUND">=TO_NUMBER(:Z) AND
              "DELIVERY_METHOD"."LOWER_BOUND"<=TO_NUMBER(:Z))
       filter("DELIVERY_METHOD"."LOWER_BOUND"<=TO_NUMBER(:Z) AND
              "DELIVERY_METHOD"."UPPER_BOUND">=TO_NUMBER(:Z))
Explain Plan appears to be telling us that lower_bound <= :b is used as both an Access Predicate and a Filter Predicate, when in fact it is only an Access Predicate. Similarly, upper_bound >= :b appears to be both an Access and a Filter Predicate, when in fact it is only a Filter Predicate. More distressingly, the Performance Tuning manual seems to claim that we can scan on the second column of an index after a range predicate on the first column:
The optimizer uses a range scan when it finds one or more leading columns of an index specified in conditions, such as the following:
- col1 = :b1
- col1 < :b1
- col1 > :b1
- AND combination of the preceding conditions for leading columns in the index
- col1 LIKE 'ASD%' (wild-card searches should not be in a leading position; a condition such as col1 LIKE '%ASD' does not result in a range scan)
To dispel any doubts, a test is in order to demonstrate that the Access Predicate on the second column in the index is not resulting in reduced IO:
- First, create some test data
SQL> create table delivery_method (
  2     lower_bound      number(*)
  3    ,upper_bound      number(*)
  4    ,delivery_method  varchar2(50)
  5  );

Table created.

SQL>
SQL> insert /*+append*/ into delivery_method
  2  select rownum
  3  ,      1
  4  ,      dbms_random.string('A', 25)
  5  from dual
  6  connect by level <= 100000
  7  /

100000 rows created.

SQL>
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound, upper_bound)
  3  /

Index created.
- Run a baseline query with an unbounded range scan on the first indexed column and filters on the second indexed column. The not-equal predicate is definitely a filter; even Explain Plan agrees with this:
SQL> set autotrace on
SQL> select /*+index(delivery_method)*/ *
  2  from delivery_method
  3  where lower_bound >= 0
  4  and upper_bound <> 1
  5  /

no rows selected

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."LOWER_BOUND">=0)
       filter("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND"<>1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        132  consistent gets
        132  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
- Now run a query with an unbounded range on both columns in the index. If Oracle is truly using the 2nd column as an Access Predicate, then it will perform fewer Consistent Gets or Physical Reads. If not, they should be the same.
SQL> select /*+index(delivery_method)*/ *
  2  from delivery_method
  3  where lower_bound >= 0
  4  and upper_bound > 1
  5  /

no rows selected

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND">1)
       filter("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND">1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        132  consistent gets
        132  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
Same result. Even though the second query contained the range predicate upper_bound > 1, which should filter out every row (every row contains the value 1), it still performed the same amount of IO as the filter did. In order to get a level playing field, I performed some additional steps to flush the buffer cache that are not shown here. I also pasted the DBMS_XPLAN output instead of the inferior AUTOTRACE plan. Your results will be no less compelling, but may differ slightly if you repeat these steps.
The problem is similar but less serious for the History Table. By prefixing the index with the identifier (cust_num in the example above), Oracle will range-scan the index to return the rows with matching identifier and cust_eff_date before filtering rows with a non-matching cust_exp_date. Unlike the range table, where we could inadvertently scan the entire table, the cost of this scan will never exceed the number of rows that match the non-unique identifier.
Indexing range columns
Despite the tips below that help to avoid scanning non-matching rows for range-based queries, it is sometimes unavoidable to scan every row. Ensure that both the start and the end columns of the range are included in the same index; this allows Oracle to filter out non-matching rows as they are retrieved from the index, without having to look up the table.
Note in the example below that the filter predicate is performed against the TABLE ACCESS step when the index does not contain both dates.
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound)
  3  /

Index created.

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("DELIVERY_METHOD"."UPPER_BOUND">=3000)
   2 - access("DELIVERY_METHOD"."LOWER_BOUND"<=3000)
Looking at the TK*Prof output of this query, we can see that the index passed 301 rows back to the TABLE ACCESS, which filtered out 300 of them to leave a single row.
select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.00          3          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
    301   INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326005)
By adding both range columns to the index, Oracle still reads the 301 rows, but the filtering is done in the index; only one row is retrieved from the table.
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound, upper_bound)
  3  /

Index created.

select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.04          3          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.04          3          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
      1   INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326178)
Note the trap in interpretation here: according to TK*Prof, it appears as if a single row was processed by the INDEX RANGE SCAN. In fact, a single row was returned from the index range scan; it actually processed 301 rows and filtered out 300 of them, but TK*Prof does not show this. When Oracle performs an Access and a Filter in a single step, it is impossible to tell without further testing how many rows were accessed and how many were filtered.
Single-row Selects
When ranges are known to be non-overlapping, queries of the style shown above will not return more than a single row. Unfortunately, non-overlapping ranges cannot be enforced by a database constraint or a unique index, so Oracle resolves the query as if it can return more than one row.
One tuning technique for single-row selects on non-overlapping ranges involves the use of ROWNUM = 1 to tell Oracle that only a single row is required.
SELECT * FROM delivery_method WHERE :b BETWEEN lower_bound AND upper_bound AND ROWNUM = 1
But this is not enough. ROWNUM = 1 will short-circuit the query so that it does not go looking for subsequent rows, but this only helps if a matching row is found at the beginning of the search. If we have to scan and filter out a large number of non-matching rows before finding a match, there is no saving at all.
An index on (lower_bound, upper_bound) is stored in ascending order of lower_bound. Remember that the BETWEEN predicate is expanded out to an index access predicate (lower_bound <= :b) and a filter predicate (upper_bound >= :b): if the ranges are non-overlapping then, of all the rows with lower_bound <= :b, only the last one scanned will have upper_bound >= :b.
select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound
and rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          3          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY
      1   TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
    301    INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326010)
Note that 301 rows were still scanned in the index. In this case, the ROWNUM = 1 is useless because we don't find a match for the filter predicate until the last row. To make ROWNUM = 1 effective, we need to ensure that the matching row is the first one scanned, not the last. This is possible with a descending index scan (the /*+INDEX_DESC(..)*/ hint), but it is simpler and more robust to reverse the order of columns in the index. If upper_bound precedes lower_bound in the index, then upper_bound >= :b becomes the Access Predicate and lower_bound <= :b becomes the Filter Predicate.
Of all the non-overlapping ranges with upper_bound >= :b, the one that matches lower_bound <= :b will be the first one scanned. This time, the ROWNUM = 1 will short-circuit after the first row is read and returned, preventing a scan of all of the remaining rows with upper_bound >= :b.
SQL> create index delivery_method_pk
  2  on delivery_method(upper_bound)
  3  /

Index created.

select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound
and rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          3          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY
      1   TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
      1    INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326016)
But watch out for:
- This only works for single-row selects, not joins to other tables - see below.
- The short-circuit only works if a matching row is found. If there is no row matching the required range then Oracle could process and filter a large number of rows.
- It cannot be used with overlapping ranges that may need to return 2 or more rows.
The same technique can be applied to History Tables by indexing the non-unique identifier followed by the end_date, rather than the non-unique identifier followed by the start_date.
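For example, a sketch under the assumption of non-overlapping history rows (the index name is hypothetical; the table and columns are those of the earlier cust_hist query):

```sql
-- Index the identifier followed by the END date rather than the
-- start date, so the matching row is the first one scanned.
CREATE INDEX cust_hist_ix ON cust_hist (cust_num, cust_exp_date);

-- Of the rows with cust_exp_date >= :b, the first one scanned is the
-- only one that can also satisfy cust_eff_date <= :b, so ROWNUM = 1
-- can short-circuit after a single row.
SELECT *
FROM   cust_hist
WHERE  cust_num = :a
AND    :b BETWEEN cust_eff_date AND cust_exp_date
AND    ROWNUM = 1;
```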
Range Table Single-row Lookup
The ROWNUM = 1 technique above does not extend to table joins where a lookup is required on the Range Table. Consider the following SQL:
SELECT *
FROM   orders o
JOIN   delivery_method m
ON     o.order_amt BETWEEN m.lower_bound AND m.upper_bound
The purpose of this query is to look up the delivery_method table for every order. Adding ROWNUM = 1 to this query would result in only one order being returned rather than limiting the lookup to the first matching row. As it stands though, the join query suffers the same problem as described above: one column is used to scan the index, the other is used to filter the results (notwithstanding the misleading Explain Plan output).
-----------------------------------------------------------
| Id  | Operation                   | Name               |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |
|   2 |   NESTED LOOPS              |                    |
|   3 |    TABLE ACCESS FULL        | ORDERS             |
|*  4 |    INDEX RANGE SCAN         | DELIVERY_METHOD_PK |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("O"."ORDER_AMT"<="M"."UPPER_BOUND" AND
              "O"."ORDER_AMT">="M"."LOWER_BOUND")
       filter("O"."ORDER_AMT">="M"."LOWER_BOUND" AND
              "O"."ORDER_AMT"<="M"."UPPER_BOUND")
If we can be certain that ranges are non-overlapping, then this SQL can be tuned by adding a sub-query. We make the assumption that if a matching row exists then it will be the one with the lowest upper_bound that is greater than or equal to o.order_amt (remember from above that upper_bound is now placed first in the index).
SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound AND m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )
The sub-query exploits a feature of the cost-based optimizer whereby the min() or max() of an index range can be resolved by simply picking the first or last row respectively from the index range. It is not necessary to scan the entire range.
The o.order_amt BETWEEN m.lower_bound AND m.upper_bound predicate is no longer used in the index access; it is merely a filter clause. The delivery_method row that is joined to each order is accessed by the sub-query predicate AND m.upper_bound = (...) using an index scan on the upper_bound column. In the example below, it was necessary to add hints to force a Nested Loops join, but the result was a 300-fold performance improvement.
Without sub-query:

SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch      603    206.44     213.85         61     218928
------- ------  -------- ---------- ---------- ----------
total      605    206.44     213.85         61     218928

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows      Row Source Operation
--------  ---------------------------------------------------
    9021  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
50184810   NESTED LOOPS
   10000    TABLE ACCESS FULL ORDERS
50174809    INDEX RANGE SCAN DELIVERY_METHOD_PK

With sub-query:

SELECT /*+ordered use_nl(m)*/ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.01       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch      603      0.42       0.77         78      50936
------- ------  -------- ---------- ---------- ----------
total      605      0.43       0.77         78      50936

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows      Row Source Operation
--------  ---------------------------------------------------
    9021  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
   20001   NESTED LOOPS
   10000    TABLE ACCESS FULL ORDERS
   10000    INDEX RANGE SCAN DELIVERY_METHOD_PK
Keen observers will note that the plan shown above with the sub-query is incorrect. This appears to be a bug in SQL*Trace on the database used for this demo. I checked the trace file and the TK*Prof output several times to make sure. The actual plan used by the optimizer is shown below.
---------------------------------------------------------------------------------------------
| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                    |    15 |  1575 | 12614   (4)|
|   1 |  TABLE ACCESS BY INDEX ROWID    | DELIVERY_METHOD    |     1 |    35 |     3  (34)|
|   2 |   NESTED LOOPS                  |                    |    15 |  1575 | 12614   (4)|
|   3 |    TABLE ACCESS FULL            | ORDERS             |  6054 |   413K|    19  (37)|
|*  4 |    INDEX RANGE SCAN             | DELIVERY_METHOD_PK |     1 |       |     2  (50)|
|   5 |     SORT AGGREGATE              |                    |     1 |     5 |            |
|   6 |      FIRST ROW                  |                    |   500 |  2500 |     3  (34)|
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| DELIVERY_METHOD_PK |    20 |       |     3  (34)|
---------------------------------------------------------------------------------------------
In this plan, the Nested Loops join executes steps 4-7 for each row returned from the ORDERS
table (step 3). Steps 6 and 7 ensure that only a single row is scanned in each iteration of the loop.
But watch out for:
- The technique only works with non-overlapping ranges. Overlapping ranges are not slow; they return incorrect results!
- Since the technique uses a feature of the Cost Based Optimizer, it does not work under the Rule Based Optimizer. Ensure that statistics are gathered for the table and index, check that the OPTIMIZER_MODE / OPTIMIZER_GOAL session parameter is not set to RULE, and do not use the /*+RULE*/ hint.
- Small Range Tables (say, <20 rows) and History Tables with a relatively small number of history records per identifier (say, <10) do not benefit greatly from this technique. The improvement is barely noticeable over small volumes, and is not as fast as a hash join (History Tables) or a sort-merge join (Range Tables; see below) when the inner table is large.
High Volume Range Joins
High volume joins to smaller range tables are somewhat easier to tune. Since we will be joining a lot of rows, we will also be reading a lot of rows from the Range Table. Under these circumstances, the cost of reading every row in the range table is just a small proportion of the cost of the query, providing we do it just once.
In the example above, the Nested Loops join is performing repeated scans of the Range Table. By changing the Nested Loops join to a Sort-Merge join, the Range Table is scanned just once, sorted, and then joined to a sorted copy of the inner table. For the example below, DELIVERY_METHOD has been rebuilt with just 10 rows instead of 10,000. Curiously, TK*Prof shows the correct plan for the sub-query on the smaller table. When I saw this, I rebuilt the larger table and tried again, but received the wrong plan in TK*Prof again.
Nested Loops sub-query:

SELECT /*+ordered use_nl(m)*/ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch       34      0.23       0.41         42      30597
------- ------  -------- ---------- ---------- ----------
total       36      0.23       0.41         42      30597

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    491  FILTER
    491   NESTED LOOPS
  10000    TABLE ACCESS FULL ORDERS
    491    TABLE ACCESS FULL DELIVERY_METHOD
    491   SORT AGGREGATE
    491    FIRST ROW
    491     INDEX RANGE SCAN (MIN/MAX) DELIVERY_METHOD_PK

Sort-Merge:

SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch       34      0.14       0.15         40         43
------- ------  -------- ---------- ---------- ----------
total       36      0.14       0.16         40         43

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    491  MERGE JOIN
     10   SORT JOIN
     10    TABLE ACCESS FULL DELIVERY_METHOD
    491   FILTER
  97708    SORT JOIN
  10000     TABLE ACCESS FULL ORDERS
Oracle will often choose a Sort-Merge join automatically for a non-equijoin. If not, a /*+USE_MERGE(..)*/ hint may be required.
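As a sketch, the hinted form of the join from the example above:

```sql
-- USE_MERGE asks the optimizer to join delivery_method with a
-- sort-merge join, so the range table is scanned and sorted once
-- rather than probed repeatedly in a nested loop.
SELECT /*+ use_merge(m) */ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound AND m.upper_bound;
```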
But watch out for:
- This technique is usually inappropriate for joins involving History Tables. In such joins, the bulk of the work is performed by the equijoin on the non-unique identifier; the BETWEEN clause on the date range columns merely filters the results. For equi-joins, Hash Joins are usually a much faster option.
Conclusion and other tips
- Take care when interpreting TK*Prof results. The Rows column against the plan is not the number of rows processed; it is the number of rows passed back to the parent step. A step of the plan that processes 1M rows and filters them all will show in the TK*Prof plan with 0 rows, not 1 million.
- Always include both the start and end columns for range and history tables in the index. Even if they are not required for uniqueness, they will allow filtering to occur in the index access, saving on table accesses.
- Create indexes with the end column before the start column to ensure matching rows are found at the beginning of the index scan, not at the end.
- Modify the syntax of your SQL using ROWNUM = 1 or sub-queries to exploit non-overlapping ranges.
- For large non-overlapping range tables and history tables with many ranges per identifier, avoid Sort-Merge and Hash joins; use Nested Loops instead with a sub-query.
- Consider clustering history tables so that all rows for a non-unique identifier are stored together in the same database block rather than being scattered across the table. This will improve some queries that unavoidably filter out non-matching rows by reducing IO. Discuss this option with your DBA to confirm that it is appropriate for your application.
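As a sketch, an index cluster for the cust_hist example might look like this (all names are hypothetical, and the SIZE parameter should be chosen to fit the expected number of history rows per identifier):

```sql
-- Rows sharing a cust_num are stored together in the same block(s).
CREATE CLUSTER cust_hist_clu (cust_num NUMBER) SIZE 2048;

-- An index cluster requires a cluster index before DML is allowed.
CREATE INDEX cust_hist_clu_ix ON CLUSTER cust_hist_clu;

CREATE TABLE cust_hist (
  cust_num       NUMBER  NOT NULL
, cust_eff_date  DATE    NOT NULL
, cust_exp_date  DATE    NOT NULL
, cust_name      VARCHAR2(50)
) CLUSTER cust_hist_clu (cust_num);
```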
Always check the Access Predicates and Filter Predicates of Explain Plan carefully to determine which columns are contributing to a Range Scan and which columns are merely filtering the returned rows. Be sceptical if the same clause is shown in both.
- rleishman's blog
Comments
I see that I am not the only one who has problems with that
Hi,
I've been fighting with such issues for some time. Now let me add my 5 cents:
1) According to Oracle you should always use /*+ index(table_name index_name) */.
2) I have to note that "index" is not the only hint - if using more indexes, you (any reader of this post) should probably use index_combine or something else.
3) I have a table which is partitioned (1 partition per month) by range over a column of type DATE. My query is:
SELECT * FROM my_table WHERE my_column BETWEEN TRUNC(SYSDATE)-1 AND TRUNC(SYSDATE)

The query could use >= and <= as well; the result should be similar. The table is huge and has a local B-tree index over my_column. Execution stats: 250k consistent gets.

4) I have the same table as before, but my query is:

SELECT * FROM my_table WHERE my_column IN (TRUNC(SYSDATE-1), TRUNC(SYSDATE))

Execution stats: 2k consistent gets. Note the huge difference.

5) Note that DATE in Oracle contains more than only the date (the granule of the information is one second). In my example the table had only round dates, so "SELECT DISTINCT TO_CHAR(my_column,'HH24:MI:SS') FROM my_table" always returns the single row "00:00:00".
The Oracle version used was 10.2.
There is another way in
Hi,
I just try a few things that might help you.
If you have something like this (obviously this is just an easy example)
In my case, without the BETWEEN clause the response was immediate; with it, it took 3 minutes, so I did this:
Amazingly the response was immediate as always.
I hope this helps someone.
My best regards.
Ricardo Chicas
Additional difficulty: DATE differs when in memory and disk
Hi,
Trying to keep this post complete.
Some time ago I was astonished by Oracle warning in a query like this:
select 1 from T where T.DATE_COL=TRUNC(T.DATE_COL);
The column DATE_COL is of type DATE.
Seems that the data types of T.DATE_COL and TRUNC(T.DATE_COL) differ.
The data on the disk are using 7 bytes and this is the Oracle DATE type.
The data in the memory (result of TRUNC) are using 8 bytes and this is the ANSI C date (with alignment).
Both types have different ID. Oracle goes weird when index on such a column should be used: it won't match the column type (8 bytes != 7 bytes), functional index won't be matched...
I have found no solution for that issue yet. At least no solution other than keeping date as a number or some other type.
Regards.
For that specific query, a
For that specific query, a date-index can never help you, as far as I can see.
What you are looking for (where T.DATE_COL=TRUNC(T.DATE_COL)) are rows, where T.DATE_COL is precisely at midnight, regardless of the date, right?
For this, you would need a function-index on, for example:
to_char(T.DATE_COL, 'HH24MISS') -- in this case a string-index
then query for rows:
... where to_char(T.DATE_COL, 'HH24MISS')='000000'
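Putting the suggestion together as a sketch (the index name is hypothetical; the query expression must match the indexed expression exactly for the function-based index to be usable):

```sql
-- Function-based index on the time-of-day portion of DATE_COL.
CREATE INDEX t_time_ix ON t (TO_CHAR(date_col, 'HH24MISS'));

-- Finds rows whose DATE_COL is precisely midnight.
SELECT 1
FROM   t
WHERE  TO_CHAR(date_col, 'HH24MISS') = '000000';
```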