Home » RDBMS Server » Performance Tuning » Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause
Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277781] Wed, 31 October 2007 09:21 Go to next message
hellcat
Messages: 84
Registered: January 2006
Location: Zug, Switzerland
Member
Hi

I'm trying to get the Optimizer to do a hash anti-join with a select statement having a:
WHERE (col1, col2) NOT IN (select ...)

Version: 9.2.0.6.0

select *
from tableA
where col1 not in (
  select /*+ hash_aj(tableB) */
      col1
  from tableB
);


gives a execution plan like:

SELECT STATEMENT
  HASH JOIN(ANTI)
    TABLE ACCESS(FULL) TABLEA
    INDEX(FAST FULL SCAN) I_TABLEB_1 --*1


select *
from tableA
where (col1, col2) not in (
  select /*+ hash_aj(tableB) */
      col1
    , col2
  from tableB
);


gives a execution plan like:
SELECT STATEMENT
  FILTER
    TABLE ACCESS(FULL) TABLEA
    TABLE ACCESS(FULL) TABLEB


*1 col1 has a regular b*tree index, col2 is un-indexed.

Somehow I can't manage to make the Optimizer use a anti-join for the second statement.
I search in the documentation but couldn't find any hint why this should not be working...

Anybody any ideas?
Many thanks.

/hc
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277791 is a reply to message #277781] Wed, 31 October 2007 10:27 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
All a hint does is apply a preferential weighting to the chosen access path in the CBO.
If the path you've hinted for is still more expensive after this weighting then it won't get chosen.

Tehey're called HINTs for a good reason - the CBO is not bound to obey them.

I'm guessing that because COL_2 is not indexed, the CBO knows that it has to do a FTS of tableB, and having done this, its quicker to just filter the results.

I suspect that if you add COL_2 to the index on COL_1 then the cost might drop far enough to use the HASH_AJ.

Why are you determined to use ths access path?
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277853 is a reply to message #277791] Wed, 31 October 2007 21:49 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I doubt it is an indexing issue; one of the strengths of HASH_AJ is that you can do NOT INs without indexes.

Most likely case is that one or both columns is NULLable in one or both tables. You need NOT NULL constraints on all NOT IN columns of the outer query and the SELECT columns of the sub-query to force an anti-join.

Alternatively, you can use an expression that is guaranteed to produce a non-null result.

select *
from tableA
where (sys_op_map_nonnull(col1), sys_op_map_nonnull(col2)) not in (
  select /*+ hash_aj(tableB) */
      sys_op_map_nonnull(col1)
    , sys_op_map_nonnull(col2)
  from tableB
);


Or you can remove NULLs from the equation with WHERE clauses:

select *
from tableA
where col1 is not null
and col2 is not null
and (col1, col2) not in (
  select /*+ hash_aj(tableB) */
      col1
    , col2
  from tableB
  where col1 is not null
  and col2 is not null
);


Oracle is smart enough to realise that NULLs are handled in both cases, and puts AJ back on the table as a possible plan.

Ross Leishman
Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #277913 is a reply to message #277853] Thu, 01 November 2007 03:24 Go to previous messageGo to next message
hellcat
Messages: 84
Registered: January 2006
Location: Zug, Switzerland
Member
Hi

Thank you guys for the help!


@Ross
That's exactly what it is..! Perfect.
But why doesn't it work with NVL?


@JRowbottom
Thanks, I know what a HINT is. I was asking why it does not take effect.
Quote:

I'm guessing that because COL_2 is not indexed,

No need to guess, in the original post I said col2 was not indexed...
Quote:

Why are you determined to use ths access path?

Because both are multi-million record tables and anti-join would definitively be the better choice.

[Updated on: Thu, 01 November 2007 03:29]

Report message to a moderator

Re: Anti-Join (HASH_AJ) with multiple columns in the "NOT IN" clause [message #278123 is a reply to message #277913] Thu, 01 November 2007 21:58 Go to previous message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
You can use NVL() if you like. But I didn't know your data types or whether there were any values that could be used as synonyms for NULL. By all means, if they are VARCHAR2 with only printable characters, use NVL(..., chr(1)) or similar.

SYS_OP_MAP_NONNULL() is just convenient because it works with dates, numbers, and strings, and doesn't need to use notionally "illegal" values as the default.

Ross Leishman
Previous Topic: reading AWR report
Next Topic: Different Query Plan and execution time of same Query
Goto Forum:
  


Current Time: Sat Nov 23 05:21:27 CST 2024