Tuning "BETWEEN" Queries
A common sight in databases is a table that contains the start and end values of a range - usually dates. One or both of the dates are typically part of the primary key; sometimes they are the entire key. Some examples:
- History tables
History tables record the changes made to a row over time. Instead of updating a row and losing the prior values of each non-key attribute, a new row is inserted. Each row is differentiated by two columns that record the date the row was created (START_DATE) and the date it was superseded (END_DATE). The primary key of a history table is typically a non-unique identifier combined with the START_DATE, with an unenforced constraint that the start and end dates of rows for any given identifier are both contiguous and non-overlapping.
- Range tables
Range tables are tables where the start and end values of the range form the entire primary key. For example, a table to determine the delivery method depending on volume ordered:
LOWER_BOUND UPPER_BOUND DELIVERY_METHOD
----------- ----------- --------------------
          0          12 Mail
         13         144 Courier
        145       10000 Curtain-side truck
      10001       50000 Semi-trailer
      50001    99999999 Ship
Like history tables, range tables typically have an unenforced constraint that the lower and upper bounds of successive ranges are both contiguous and non-overlapping.
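As a sketch, such a range table might be declared like this (the table name matches the examples that follow; the check constraint is an assumption, and note that contiguity and non-overlap of successive ranges cannot be enforced declaratively):

```sql
-- Hypothetical DDL for the range table above. The primary key can
-- enforce uniqueness of each (lower_bound, upper_bound) pair, but
-- not that successive ranges are contiguous and non-overlapping.
CREATE TABLE delivery_method (
  lower_bound      NUMBER        NOT NULL
, upper_bound      NUMBER        NOT NULL
, delivery_method  VARCHAR2(50)  NOT NULL
, CONSTRAINT delivery_method_pk  PRIMARY KEY (lower_bound, upper_bound)
, CONSTRAINT delivery_method_chk CHECK (lower_bound <= upper_bound)
);
```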
The Problem
A common feature of History and Range Tables is the way they are queried:
History Table:

SELECT *
FROM   cust_hist
WHERE  cust_num = :a
AND    :b BETWEEN cust_eff_date AND cust_exp_date

Range Table:

SELECT *
FROM   delivery_method
WHERE  :b BETWEEN lower_bound AND upper_bound
A BETWEEN clause is internally expanded out into separate <= and >= clauses. For example:
SELECT * FROM delivery_method WHERE lower_bound <= :b AND upper_bound >= :b
The problem is that an index can only scan on one column with a range predicate (<, >, LIKE, BETWEEN). So even if an index contains both the lower_bound and upper_bound columns, the index scan will return all of the rows matching lower_bound <= :b, and then filter out the rows that do not match upper_bound >= :b.
In the case where the sought value is somewhere in the middle, the range scan will return half of the rows in the table in order to find a single row. In the worst case where the most commonly sought rows are at the top (highest values), the index scan will process almost every row in the table for every lookup.
This type of behaviour can usually be detected in Explain Plan's Access Predicates and Filter Predicates. Access Predicates are clauses used to find rows in a structure (such as an index); non-matching rows are not read. Filter Predicates are clauses used to filter rows after they have been found but before they are projected to the next step of the plan. Explain Plan produces confusing results with unbounded range scans (like we are doing here). Note how both clauses appear as both access predicates and filter predicates:
  1  SELECT *
  2  FROM delivery_method
  3* where :b BETWEEN lower_bound AND upper_bound

-------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."UPPER_BOUND">=TO_NUMBER(:Z) AND
              "DELIVERY_METHOD"."LOWER_BOUND"<=TO_NUMBER(:Z))
       filter("DELIVERY_METHOD"."LOWER_BOUND"<=TO_NUMBER(:Z) AND
              "DELIVERY_METHOD"."UPPER_BOUND">=TO_NUMBER(:Z))
Explain Plan appears to be telling us that lower_bound <= :b is used as both an Access Predicate and a Filter Predicate, when in fact it is only an Access Predicate. Similarly, upper_bound >= :b appears to be both an Access and a Filter Predicate, when in fact it is only a Filter Predicate. More distressingly, the Performance Tuning manual seems to claim that we can scan on the second column of an index after a range predicate on the first column:
The optimizer uses a range scan when it finds one or more leading columns of an index specified in conditions, such as the following:
- col1 = :b1
- col1 < :b1
- col1 > :b1
- AND combination of the preceding conditions for leading columns in the index
- col1 LIKE 'ASD%' (wild-card searches should not be in a leading position; a condition such as col1 LIKE '%ASD' does not result in a range scan)
To dispel any doubts, a test is in order to demonstrate that the Access Predicate on the second column in the index is not resulting in reduced IO:
- First, create some test data
SQL> create table delivery_method (
  2     lower_bound      number(*)
  3    ,upper_bound      number(*)
  4    ,delivery_method  varchar2(50)
  5  );

Table created.

SQL>
SQL> insert /*+append*/ into delivery_method
  2  select rownum
  3  ,      1
  4  ,      dbms_random.string('A', 25)
  5  from dual
  6  connect by level <= 100000
  7  /

100000 rows created.

SQL>
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound, upper_bound)
  3  /

Index created.
- Run a baseline query with an unbounded range scan on the first indexed column and filters on the second indexed column. The not-equal predicate is definitely a filter; even Explain Plan agrees with this:
SQL> set autotrace on
SQL> select /*+index(delivery_method)*/ *
  2  from delivery_method
  3  where lower_bound >= 0
  4  and upper_bound <> 1
  5  /

no rows selected

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."LOWER_BOUND">=0)
       filter("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND"<>1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        132  consistent gets
        132  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
- Now run a query with an unbounded range on both columns in the index. If Oracle is truly using the 2nd column as an Access Predicate, then it will perform fewer Consistent Gets or Physical Reads. If not, they should be the same.
SQL> select /*+index(delivery_method)*/ *
  2  from delivery_method
  3  where lower_bound >= 0
  4  and upper_bound > 1
  5  /

no rows selected

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND">1)
       filter("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND">1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        132  consistent gets
        132  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
Same result. Even though the second query contained the range predicate upper_bound > 1, which should filter out every row (every row contains the value 1), it still performed the same amount of IO as the filter did. In order to get a level playing field, I performed some additional steps to flush the buffer cache that are not shown here. I also pasted the DBMS_XPLAN output instead of the inferior AUTOTRACE plan. Your results will be no less compelling, but may differ slightly if you repeat these steps.
The problem is similar but less serious for the History Table. By prefixing the index with the identifier (cust_num in the example above), Oracle will range-scan the index to return the rows with matching identifier and cust_eff_date before filtering rows with a non-matching cust_exp_date. Unlike the range table, where we could inadvertently scan the entire table, the cost of this scan will never exceed the number of rows that match the non-unique identifier.
Indexing range columns
Despite the tips below that help to avoid scanning non-matching rows for range-based queries, it is sometimes unavoidable to scan every row. Ensure that both the start and the end columns of the range are included in the same index; this allows Oracle to filter out non-matching rows as they are retrieved from the index, without having to look up the table.
Note in the example below that the filter predicate is performed against the TABLE ACCESS step when the index does not contain both dates.
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound)
  3  /

Index created.

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("DELIVERY_METHOD"."UPPER_BOUND">=3000)
   2 - access("DELIVERY_METHOD"."LOWER_BOUND"<=3000)
Looking at the TK*Prof output of this query, we can see that the index passed 301 rows back to the TABLE ACCESS, which filtered out 300 of them to leave a single row.
select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.00          3          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
    301   INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326005)
By adding both range columns to the index, Oracle still reads the 301 rows, but the filtering is done in the index; only one row is retrieved from the table.
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound, upper_bound)
  3  /

Index created.

select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.04          3          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.04          3          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
      1   INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326178)
Note the trap in interpretation here: according to TK*Prof, it appears as if a single row was processed by the INDEX RANGE SCAN. In fact, a single row was returned from the index range scan; it actually processed 301 rows and filtered out 300 of them, but TK*Prof does not show this. When Oracle performs an Access and a Filter in a single step, it is impossible to tell without further testing how many rows were accessed and how many were filtered.
Single-row Selects
When ranges are known to be non-overlapping, queries of the style shown above will not return more than a single row. Unfortunately, non-overlapping ranges cannot be enforced by a database constraint or a unique index, so Oracle resolves the query as if it can return more than one row.
One tuning technique for single-row selects on non-overlapping ranges involves the use of ROWNUM = 1 to tell Oracle that only a single row is required.
SELECT * FROM delivery_method WHERE :b BETWEEN lower_bound AND upper_bound AND ROWNUM = 1
But this is not enough. ROWNUM = 1 will short-circuit the query so that it does not go looking for subsequent rows, but this only helps if a matching row is found at the beginning of the search. If we have to scan and filter out a large number of non-matching rows before finding a match, there is no saving at all.
An index on (lower_bound, upper_bound) is stored in ascending order of lower_bound. Remember that the BETWEEN predicate is expanded out to an index access predicate (lower_bound <= :b) and a filter predicate (upper_bound >= :b): if the ranges are non-overlapping then, of all the rows with lower_bound <= :b, only the last one scanned will have upper_bound >= :b.
select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound
and rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          3          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY
      1   TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
    301    INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326010)
Note that 301 rows were still scanned in the index. In this case, the ROWNUM = 1 is useless because we don't find a match for the filter predicate until the last row. To make ROWNUM = 1 effective, we need to ensure that the matching row is the first one scanned, not the last. This is possible with a descending index scan (the /*+INDEX_DESC(..)*/ hint), but it is simpler and more robust to reverse the order of columns in the index. If upper_bound precedes lower_bound in the index, then upper_bound >= :b becomes the Access Predicate and lower_bound <= :b becomes the Filter Predicate.
Of all the non-overlapping ranges with upper_bound >= :b, the one that matches lower_bound <= :b will be the first one scanned. This time, the ROWNUM = 1 will short-circuit after the first row is read and returned, preventing a scan of all of the remaining rows with upper_bound >= :b.
SQL> create index delivery_method_pk
  2  on delivery_method(upper_bound)
  3  /

Index created.

select /*+index(delivery_method)*/ *
from delivery_method
where 3000 BETWEEN lower_bound AND upper_bound
and rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          3          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY
      1   TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
      1    INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326016)
But watch out for:
- This only works for single-row selects, not joins to other tables - see below.
- The short-circuit only works if a matching row is found. If there is no row matching the required range then Oracle could process and filter a large number of rows.
- It cannot be used with overlapping ranges that may need to return 2 or more rows.
The same technique can be applied to History Tables by indexing the non-unique identifier followed by the end_date, rather than the non-unique identifier followed by the start_date.
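For example, a sketch under the assumption of non-overlapping history rows (the index name is hypothetical; the table and columns are those of the earlier cust_hist query):

```sql
-- Index the identifier followed by the END date rather than the
-- start date, so the matching row is the first one scanned.
CREATE INDEX cust_hist_ix ON cust_hist (cust_num, cust_exp_date);

-- Of the rows with cust_exp_date >= :b, the first one scanned is the
-- only one that can also satisfy cust_eff_date <= :b, so ROWNUM = 1
-- can short-circuit after a single row.
SELECT *
FROM   cust_hist
WHERE  cust_num = :a
AND    :b BETWEEN cust_eff_date AND cust_exp_date
AND    ROWNUM = 1;
```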
Range Table Single-row Lookup
The ROWNUM = 1 technique above does not extend to table joins where a lookup is required on the Range Table. Consider the following SQL:
SELECT *
FROM   orders o
JOIN   delivery_method m
ON     o.order_amt BETWEEN m.lower_bound AND m.upper_bound
The purpose of this query is to look up the delivery_method table for every order. Adding ROWNUM = 1 to this query would result in only one order being returned rather than limiting the lookup to the first matching row. As it stands though, the join query suffers the same problem as described above: one column is used to scan the index, the other is used to filter the results (notwithstanding the misleading Explain Plan output).
-----------------------------------------------------------
| Id  | Operation                   | Name               |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |
|   2 |   NESTED LOOPS              |                    |
|   3 |    TABLE ACCESS FULL        | ORDERS             |
|*  4 |    INDEX RANGE SCAN         | DELIVERY_METHOD_PK |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("O"."ORDER_AMT"<="M"."UPPER_BOUND" AND
              "O"."ORDER_AMT">="M"."LOWER_BOUND")
       filter("O"."ORDER_AMT">="M"."LOWER_BOUND" AND
              "O"."ORDER_AMT"<="M"."UPPER_BOUND")
If we can be certain that ranges are non-overlapping, then this SQL can be tuned by adding a sub-query. We make the assumption that if a matching row exists then it will be the one with the lowest upper_bound that is greater than or equal to o.order_amt (remember from above that upper_bound is now placed first in the index).
SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound AND m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )
The sub-query exploits a feature of the cost-based optimizer whereby the min() or max() of an index range can be resolved by simply picking the first or last row respectively from the index range. It is not necessary to scan the entire range.
The o.order_amt BETWEEN m.lower_bound AND m.upper_bound predicate is no longer used in the index access; it is merely a filter clause. The delivery_method row that is joined to each order is accessed by the sub-query predicate AND m.upper_bound = (...) using an index scan on the upper_bound column. In the example below, it was necessary to add hints to force a Nested Loops join, but the result was a 300-fold performance improvement.
Without sub-query:

SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch      603    206.44     213.85         61     218928
------- ------  -------- ---------- ---------- ----------
total      605    206.44     213.85         61     218928

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows      Row Source Operation
--------  ---------------------------------------------------
    9021  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
50184810   NESTED LOOPS
   10000    TABLE ACCESS FULL ORDERS
50174809    INDEX RANGE SCAN DELIVERY_METHOD_PK

With sub-query:

SELECT /*+ordered use_nl(m)*/ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.01       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch      603      0.42       0.77         78      50936
------- ------  -------- ---------- ---------- ----------
total      605      0.43       0.77         78      50936

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows      Row Source Operation
--------  ---------------------------------------------------
    9021  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
   20001   NESTED LOOPS
   10000    TABLE ACCESS FULL ORDERS
   10000    INDEX RANGE SCAN DELIVERY_METHOD_PK
Keen observers will note that the plan shown above with the sub-query is incorrect. This appears to be a bug in SQL*Trace on the database used for this demo. I checked the trace file and the TK*Prof output several times to make sure. The actual plan used by the optimizer is shown below.
---------------------------------------------------------------------------------------------
| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                    |    15 |  1575 | 12614   (4)|
|   1 |  TABLE ACCESS BY INDEX ROWID    | DELIVERY_METHOD    |     1 |    35 |     3  (34)|
|   2 |   NESTED LOOPS                  |                    |    15 |  1575 | 12614   (4)|
|   3 |    TABLE ACCESS FULL            | ORDERS             |  6054 |   413K|    19  (37)|
|*  4 |    INDEX RANGE SCAN             | DELIVERY_METHOD_PK |     1 |       |     2  (50)|
|   5 |     SORT AGGREGATE              |                    |     1 |     5 |            |
|   6 |      FIRST ROW                  |                    |   500 |  2500 |     3  (34)|
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| DELIVERY_METHOD_PK |    20 |       |     3  (34)|
---------------------------------------------------------------------------------------------
In this plan, the Nested Loops join executes steps 4-7 for each row returned from the ORDERS
table (step 3). Steps 6 and 7 ensure that only a single row is scanned in each iteration of the loop.
But watch out for:
- The technique only works with non-overlapping ranges. Overlapping ranges are not slow; they return incorrect results!
- Since the technique uses a feature of the Cost Based Optimizer, it does not work under the Rule Based Optimizer. Ensure that statistics are gathered for the table and index, check that the OPTIMIZER_MODE / OPTIMIZER_GOAL session parameter is not set to RULE, and do not use the /*+RULE*/ hint.
- Small Range Tables (say, <20 rows) and History Tables with a relatively small number of history records per identifier (say, <10) do not benefit greatly from this technique. The improvement is barely noticeable over small volumes, and is not as fast as a hash join (History Tables) or a sort-merge join (Range Tables; see below) when the inner table is large.
High Volume Range Joins
High volume joins to smaller range tables are somewhat easier to tune. Since we will be joining a lot of rows, we will also be reading a lot of rows from the Range Table. Under these circumstances, the cost of reading every row in the range table is just a small proportion of the cost of the query, providing we do it just once.
In the example above, the Nested Loops join is performing repeated scans of the Range Table. By changing the Nested Loops join to a Sort-Merge join, the Range Table is scanned just once, sorted, and then joined to a sorted copy of the inner table. For the example below, DELIVERY_METHOD has been rebuilt with just 10 rows instead of 10,000. Curiously, TK*Prof shows the correct plan for the sub-query on the smaller table. When I saw this, I rebuilt the larger table and tried again, but received the wrong plan in TK*Prof again.
Nested Loops sub-query:

SELECT /*+ordered use_nl(m)*/ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch       34      0.23       0.41         42      30597
------- ------  -------- ---------- ---------- ----------
total       36      0.23       0.41         42      30597

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    491  FILTER
    491   NESTED LOOPS
  10000    TABLE ACCESS FULL ORDERS
    491    TABLE ACCESS FULL DELIVERY_METHOD
    491   SORT AGGREGATE
    491    FIRST ROW
    491     INDEX RANGE SCAN (MIN/MAX) DELIVERY_METHOD_PK

Sort-Merge:

SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch       34      0.14       0.15         40         43
------- ------  -------- ---------- ---------- ----------
total       36      0.14       0.16         40         43

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    491  MERGE JOIN
     10   SORT JOIN
     10    TABLE ACCESS FULL DELIVERY_METHOD
    491   FILTER
  97708    SORT JOIN
  10000     TABLE ACCESS FULL ORDERS
Oracle will often choose a Sort-Merge join automatically for a non-equijoin. If not, a /*+USE_MERGE(..)*/ hint may be required.
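As a sketch, the hinted form of the join from the example above:

```sql
-- USE_MERGE asks the optimizer to join delivery_method with a
-- sort-merge join, so the range table is scanned and sorted once
-- rather than probed repeatedly in a nested loop.
SELECT /*+ use_merge(m) */ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound AND m.upper_bound;
```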
But watch out for:
- This technique is usually inappropriate for joins involving History Tables. In such joins, the bulk of the work is performed by the equijoin on the non-unique identifier; the BETWEEN clause on the date range columns merely filters the results. For equi-joins, Hash Joins are usually a much faster option.
Conclusion and other tips
- Take care when interpreting TK*Prof results. The Rows column against the plan is not the number of rows processed; it is the number of rows passed back to the parent step. A step of the plan that processes 1M rows and filters them all will show in the TK*Prof plan with 0 rows, not 1 million.
- Always include both the start and end columns for range and history tables in the index. Even if they are not required for uniqueness, they will allow filtering to occur in the index access, saving on table accesses.
- Create indexes with the end column before the start column to ensure matching rows are found at the beginning of the index scan, not at the end.
- Modify the syntax of your SQL using ROWNUM = 1 or sub-queries to exploit non-overlapping ranges.
- For large non-overlapping range tables and history tables with many ranges per identifier, avoid Sort-Merge and Hash joins; use Nested Loops instead with a sub-query.
- Consider clustering history tables so that all rows for a non-unique identifier are stored together in the same database block rather than being scattered across the table. This will improve some queries that unavoidably filter out non-matching rows by reducing IO. Discuss this option with your DBA to confirm that it is appropriate for your application.
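As a sketch, an index cluster for the cust_hist example might look like this (all names are hypothetical, and the SIZE parameter should be chosen to fit the expected number of history rows per identifier):

```sql
-- Rows sharing a cust_num are stored together in the same block(s).
CREATE CLUSTER cust_hist_clu (cust_num NUMBER) SIZE 2048;

-- An index cluster requires a cluster index before DML is allowed.
CREATE INDEX cust_hist_clu_ix ON CLUSTER cust_hist_clu;

CREATE TABLE cust_hist (
  cust_num       NUMBER  NOT NULL
, cust_eff_date  DATE    NOT NULL
, cust_exp_date  DATE    NOT NULL
, cust_name      VARCHAR2(50)
) CLUSTER cust_hist_clu (cust_num);
```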
Always check the Access Predicates and Filter Predicates of Explain Plan carefully to determine which columns are contributing to a Range Scan and which columns are merely filtering the returned rows. Be sceptical if the same clause is shown in both.
- rleishman's blog
Comments
I see that I am not the only one who has problems with that
Hi,
I've been fighting with such issues for some time. Now let me add my 5 cents:
1) According to Oracle you should always use /*+ index(table_name index_name) */.
2) I have to note that "index" is not the only hint - if using more indexes, you (any reader of this post) should probably use index_combine or something else.
3) I have a table which is partitioned (1 partition per month) by range over a column of type DATE. My query is:
SELECT * FROM my_table WHERE my_column BETWEEN TRUNC(SYSDATE)-1 AND TRUNC(SYSDATE)

The query could use >= and <= as well; the result should be similar. The table is huge and has a local B-tree index over my_column. Execution stats: 250k consistent gets.

4) I have the same table as before, but my query is:

SELECT * FROM my_table WHERE my_column IN (TRUNC(SYSDATE-1), TRUNC(SYSDATE))

Execution stats: 2k consistent gets. Note the huge difference.

5) Note that DATE in Oracle contains more than only the date (the granule of the information is one second). In my example the table had only round dates, so "SELECT DISTINCT TO_CHAR(my_column,'HH24:MI:SS') FROM my_table" always returns the single row "00:00:00".
The Oracle version used was 10.2.
There is another way in
Hi,
I just try a few things that might help you.
If you have something like this (obviously this is just an easy example)
In my case, without the BETWEEN clause the response was immediate; with it, it took 3 minutes, so I did this:
Amazingly the response was immediate as always.
I hope this helps someone.
My best regards.
Ricardo Chicas
Additional difficulty: DATE differs when in memory and disk
Hi,
Trying to keep this post complete.
Some time ago I was astonished by Oracle warning in a query like this:
select 1 from T where T.DATE_COL=TRUNC(T.DATE_COL);
The column DATE_COL is of type DATE.
Seems that the data types of T.DATE_COL and TRUNC(T.DATE_COL) differ.
The data on the disk are using 7 bytes and this is the Oracle DATE type.
The data in the memory (result of TRUNC) are using 8 bytes and this is the ANSI C date (with alignment).
Both types have different ID. Oracle goes weird when index on such a column should be used: it won't match the column type (8 bytes != 7 bytes), functional index won't be matched...
I have found no solution for that issue yet. At least no solution other than keeping date as a number or some other type.
Regards.
For that specific query, a
For that specific query, a date-index can never help you, as far as I can see.
What you are looking for (where T.DATE_COL=TRUNC(T.DATE_COL)) are rows, where T.DATE_COL is precisely at midnight, regardless of the date, right?
For this, you would need a function-index on, for example:
to_char(T.DATE_COL, 'HH24MISS') -- in this case a string-index
then query for rows:
... where to_char(T.DATE_COL, 'HH24MISS')='000000'
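Putting the suggestion together as a sketch (the index name is hypothetical; the query expression must match the indexed expression exactly for the function-based index to be usable):

```sql
-- Function-based index on the time-of-day portion of DATE_COL.
CREATE INDEX t_time_ix ON t (TO_CHAR(date_col, 'HH24MISS'));

-- Finds rows whose DATE_COL is precisely midnight.
SELECT 1
FROM   t
WHERE  TO_CHAR(date_col, 'HH24MISS') = '000000';
```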