Home » RDBMS Server » Performance Tuning » Function Based Join Without Nested Loops (Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi)
Function Based Join Without Nested Loops [message #345418] Wed, 03 September 2008 08:55 Go to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
I have a matching program which compares values in two tables and selects records where matches are found. Currently we are doing simple equality matching with some basic normalization of the data before we get to the matching stage. Our user have expressed that they may want to modify this set-up to employ some type of functional matching. I have started exploring the feasibility of such a method and have run into a serious performance wall. First some sample code to set-up a sample set of tables load them with some random sample data, gather some stats and then compare the basic matching method to a function based equivalent match.

--source data tables
CREATE TABLE sample1
(id1 NUMBER,
join_val VARCHAR2(20));

CREATE TABLE sample2
(id2 NUMBER,
join_val VARCHAR2(20));

--results of the join
CREATE TABLE res
(id1 NUMBER ,
id2 NUMBER);

--used for functional comparison
CREATE OR REPLACE FUNCTION sample_compare (in1 IN VARCHAR2, in2 IN VARCHAR2)
   RETURN NUMBER
IS
BEGIN
   IF in1 = in2
   THEN
      RETURN 1;
   ELSE
      RETURN 0;
   END IF;
END;

--load some random junk into the tables
DECLARE
   i        NUMBER;
   l_val    VARCHAR2 (20);
   l_len1   NUMBER;
   l_len2   NUMBER;
BEGIN
   FOR i IN 20 .. 20000
   LOOP
      l_len1 := TRUNC (DBMS_RANDOM.VALUE (1, 20));
      l_len2 := TRUNC (DBMS_RANDOM.VALUE (1, 20));
      l_val := DBMS_RANDOM.STRING ('X', 20);

      INSERT INTO sample1
           VALUES (i,
                   SUBSTR (l_val, 1, l_len1));

      INSERT INTO sample2
           VALUES (i,
                   SUBSTR (l_val, 1, l_len2));
   END LOOP;
END;

--get stats for the two tables, replace with your
--default schema
BEGIN
--replace schema name
   DBMS_STATS.gather_table_stats ('NAGEL', 'SAMPLE1');
   DBMS_STATS.gather_table_stats ('NAGEL', 'SAMPLE2');
END;

--normal equality join accomplished via hashing
INSERT INTO res
   SELECT id1,
          id2
     FROM sample1 t1, sample2 t2
    WHERE t1.join_val = t2.join_val;

--function based join accomplished with nested loops at a great
--impact on performance
INSERT INTO res
   SELECT id1,
          id2
     FROM sample1 t1, sample2 t2
    WHERE sample_compare (t1.join_val, t2.join_val) = 1;


And explain plans for the two query methods:

--normal join
INSERT STATEMENT  ALL_ROWSCost: 29  Bytes: 642,180  Cardinality: 21,406  		
	3 HASH JOIN  Cost: 29  Bytes: 642,180  Cardinality: 21,406  	
		1 TABLE ACCESS FULL TABLE NAGEL.SAMPLE1 Cost: 14  Bytes: 299,715  Cardinality: 19,981  
		2 TABLE ACCESS FULL TABLE NAGEL.SAMPLE2 Cost: 14  Bytes: 299,715  Cardinality: 19,981   

--functional join
INSERT STATEMENT  ALL_ROWSCost: 258,929  Bytes: 119,772,120  Cardinality: 3,992,404  		
	3 NESTED LOOPS  Cost: 258,929  Bytes: 119,772,120  Cardinality: 3,992,404  	
		1 TABLE ACCESS FULL TABLE NAGEL.SAMPLE1 Cost: 14  Bytes: 299,715  Cardinality: 19,981  
		2 TABLE ACCESS FULL TABLE NAGEL.SAMPLE2 Cost: 13  Bytes: 3,000  Cardinality: 200  


Looking at the explain plans it seems fairly obvious why the functional method takes so much longer to arrive at a result, the nested loop is killing it, but because the function employs values from each of my join tables I can't think of any other method which could be used to get these tables joined.

Is there any method but a nested loop which Oracle could use to join the two tables more efficiently? Or am I out of luck?

Andrew
Re: Function Based Join Without Nested Loops [message #345432 is a reply to message #345418] Wed, 03 September 2008 10:20 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
If you could write your sample_compare function in such a way that it took a value from one table only, then you could create a function based index on each of the tables, and things should go much quicker.
ie if you could write the query like:
NSERT INTO res
   SELECT id1,
          id2
     FROM sample1 t1, sample2 t2
    WHERE sample_compare (t1.join_val) = sample_compare(t2.join_val);
Re: Function Based Join Without Nested Loops [message #345434 is a reply to message #345432] Wed, 03 September 2008 10:25 Go to previous messageGo to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
I agree that it would make it faster but it is not going to meet the need of a functional compare. This is all very abstract at this point I am just trying to figure out if a functional comparison is even feasible, based on all the testing I have done thus far it isn't which is why the post is up to see if it is in a way I either do not know of or have overlooked.
Re: Function Based Join Without Nested Loops [message #345440 is a reply to message #345434] Wed, 03 September 2008 10:41 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
I'm pretty certain that it can be done with two functions.
Your current comparison function has only one value from each table to work with.
I'm sure you can have one function create a hash value from sample1.join_value, and have the other function create a hash value from sample2.join_value in such a way that if the two hash values match then you have a valid join condition.
Re: Function Based Join Without Nested Loops [message #345441 is a reply to message #345440] Wed, 03 September 2008 10:51 Go to previous messageGo to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
Of course I can but this is just a sample of a function based comparison because there is no actual comparison methodology yet.

I am simply trying to see if the use of a functional comparison is feasible in the first place so our users don't go down the path of developing a comparison algorithm only to have me tell them the only way they will be able to use it is if they are willing to wait a couple days for the results.
Re: Function Based Join Without Nested Loops [message #345508 is a reply to message #345441] Wed, 03 September 2008 22:07 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
HASH join only works for equi-joins.

If you can write the join as @JR has shown, you should be able to force a hash join.

If you have fuzzier logic like
AND fuzzy_match_func(a.val, b.val) = 'MATCH'
then you are going to be in trouble. Logically this can ONLY be done with a FULL-FULL-NL join.

If you are just talking about a range-based match (<=, >=) then you may be able to make use of a SORT MERGE join - depends on the query.

Ross Leishman

Re: Function Based Join Without Nested Loops [message #345655 is a reply to message #345508] Thu, 04 September 2008 06:42 Go to previous message
annagel
Messages: 220
Registered: April 2006
Senior Member
Thanks for the help Ross and JR, this is what I thought the result would be going in and my initial recommendations to our users reflect this. I just wanted to see if someone out there knew something on the subject I did not.

Andrew
Previous Topic: Index not being used for Min/Max
Next Topic: What does it signifies
Goto Forum:
  


Current Time: Tue Nov 26 15:00:37 CST 2024