Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277781] |
Wed, 31 October 2007 09:21 |
hellcat
Messages: 84 Registered: January 2006 Location: Zug, Switzerland
|
Member |
|
|
Hi
I'm trying to get the Optimizer to do a hash anti-join with a select statement having a:
WHERE (col1, col2) NOT IN (select ...)
Version: 9.2.0.6.0
select *
from tableA
where col1 not in (
select /*+ hash_aj(tableB) */
col1
from tableB
);
gives a execution plan like:
SELECT STATEMENT
HASH JOIN(ANTI)
TABLE ACCESS(FULL) TABLEA
INDEX(FAST FULL SCAN) I_TABLEB_1 --*1
select *
from tableA
where (col1, col2) not in (
select /*+ hash_aj(tableB) */
col1
, col2
from tableB
);
gives a execution plan like:
SELECT STATEMENT
FILTER
TABLE ACCESS(FULL) TABLEA
TABLE ACCESS(FULL) TABLEB
*1 col1 has a regular b*tree index, col2 is un-indexed.
Somehow I can't manage to make the Optimizer use a anti-join for the second statement.
I search in the documentation but couldn't find any hint why this should not be working...
Anybody any ideas?
Many thanks.
/hc
|
|
|
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277791 is a reply to message #277781] |
Wed, 31 October 2007 10:27 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
All a hint does is apply a preferential weighting to the chosen access path in the CBO.
If the path you've hinted for is still more expensive after this weighting then it won't get chosen.
Tehey're called HINTs for a good reason - the CBO is not bound to obey them.
I'm guessing that because COL_2 is not indexed, the CBO knows that it has to do a FTS of tableB, and having done this, its quicker to just filter the results.
I suspect that if you add COL_2 to the index on COL_1 then the cost might drop far enough to use the HASH_AJ.
Why are you determined to use ths access path?
|
|
|
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277853 is a reply to message #277791] |
Wed, 31 October 2007 21:49 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I doubt it is an indexing issue; one of the strengths of HASH_AJ is that you can do NOT INs without indexes.
Most likely case is that one or both columns is NULLable in one or both tables. You need NOT NULL constraints on all NOT IN columns of the outer query and the SELECT columns of the sub-query to force an anti-join.
Alternatively, you can use an expression that is guaranteed to produce a non-null result.
select *
from tableA
where (sys_op_map_nonnull(col1), sys_op_map_nonnull(col2)) not in (
select /*+ hash_aj(tableB) */
sys_op_map_nonnull(col1)
, sys_op_map_nonnull(col2)
from tableB
);
Or you can remove NULLs from the equation with WHERE clauses:
select *
from tableA
where col1 is not null
and col2 is not null
and (col1, col2) not in (
select /*+ hash_aj(tableB) */
col1
, col2
from tableB
where col1 is not null
and col2 is not null
);
Oracle is smart enough to realise that NULLs are handled in both cases, and puts AJ back on the table as a possible plan.
Ross Leishman
|
|
|
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277913 is a reply to message #277853] |
Thu, 01 November 2007 03:24 |
hellcat
Messages: 84 Registered: January 2006 Location: Zug, Switzerland
|
Member |
|
|
Hi
Thank you guys for the help!
@Ross
That's exactly what it is..! Perfect.
But why doesn't it work with NVL?
@JRowbottom
Thanks, I know what a HINT is. I was asking why it does not take effect.
Quote: | I'm guessing that because COL_2 is not indexed,
|
No need to guess, in the original post I said col2 was not indexed...
Quote: | Why are you determined to use ths access path?
|
Because both are multi-million record tables and anti-join would definitively be the better choice.
[Updated on: Thu, 01 November 2007 03:29] Report message to a moderator
|
|
|
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #278123 is a reply to message #277913] |
Thu, 01 November 2007 21:58 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
You can use NVL() if you like. But I didn't know your data types or whether there were any values that could be used as synonyms for NULL. By all means, if they are VARCHAR2 with only printable characters, use NVL(..., chr(1)) or similar.
SYS_OP_MAP_NONNULL() is just convenient because it works with dates, numbers, and strings, and doesn't need to use notionally "illegal" values as the default.
Ross Leishman
|
|
|