Home » RDBMS Server » Performance Tuning » Function Based Join Without Nested Loops (Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi)
Function Based Join Without Nested Loops [message #345418] |
Wed, 03 September 2008 08:55 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
I have a matching program which compares values in two tables and selects records where matches are found. Currently we are doing simple equality matching with some basic normalization of the data before we get to the matching stage. Our user have expressed that they may want to modify this set-up to employ some type of functional matching. I have started exploring the feasibility of such a method and have run into a serious performance wall. First some sample code to set-up a sample set of tables load them with some random sample data, gather some stats and then compare the basic matching method to a function based equivalent match.
--source data tables
CREATE TABLE sample1
(id1 NUMBER,
join_val VARCHAR2(20));
CREATE TABLE sample2
(id2 NUMBER,
join_val VARCHAR2(20));
--results of the join
CREATE TABLE res
(id1 NUMBER ,
id2 NUMBER);
--used for functional comparison
CREATE OR REPLACE FUNCTION sample_compare (in1 IN VARCHAR2, in2 IN VARCHAR2)
RETURN NUMBER
IS
BEGIN
IF in1 = in2
THEN
RETURN 1;
ELSE
RETURN 0;
END IF;
END;
--load some random junk into the tables
DECLARE
i NUMBER;
l_val VARCHAR2 (20);
l_len1 NUMBER;
l_len2 NUMBER;
BEGIN
FOR i IN 20 .. 20000
LOOP
l_len1 := TRUNC (DBMS_RANDOM.VALUE (1, 20));
l_len2 := TRUNC (DBMS_RANDOM.VALUE (1, 20));
l_val := DBMS_RANDOM.STRING ('X', 20);
INSERT INTO sample1
VALUES (i,
SUBSTR (l_val, 1, l_len1));
INSERT INTO sample2
VALUES (i,
SUBSTR (l_val, 1, l_len2));
END LOOP;
END;
--get stats for the two tables, replace with your
--default schema
BEGIN
--replace schema name
DBMS_STATS.gather_table_stats ('NAGEL', 'SAMPLE1');
DBMS_STATS.gather_table_stats ('NAGEL', 'SAMPLE2');
END;
--normal equality join accomplished via hashing
INSERT INTO res
SELECT id1,
id2
FROM sample1 t1, sample2 t2
WHERE t1.join_val = t2.join_val;
--function based join accomplished with nested loops at a great
--impact on performance
INSERT INTO res
SELECT id1,
id2
FROM sample1 t1, sample2 t2
WHERE sample_compare (t1.join_val, t2.join_val) = 1;
And explain plans for the two query methods:
--normal join
INSERT STATEMENT ALL_ROWSCost: 29 Bytes: 642,180 Cardinality: 21,406
3 HASH JOIN Cost: 29 Bytes: 642,180 Cardinality: 21,406
1 TABLE ACCESS FULL TABLE NAGEL.SAMPLE1 Cost: 14 Bytes: 299,715 Cardinality: 19,981
2 TABLE ACCESS FULL TABLE NAGEL.SAMPLE2 Cost: 14 Bytes: 299,715 Cardinality: 19,981
--functional join
INSERT STATEMENT ALL_ROWSCost: 258,929 Bytes: 119,772,120 Cardinality: 3,992,404
3 NESTED LOOPS Cost: 258,929 Bytes: 119,772,120 Cardinality: 3,992,404
1 TABLE ACCESS FULL TABLE NAGEL.SAMPLE1 Cost: 14 Bytes: 299,715 Cardinality: 19,981
2 TABLE ACCESS FULL TABLE NAGEL.SAMPLE2 Cost: 13 Bytes: 3,000 Cardinality: 200
Looking at the explain plans it seems fairly obvious why the functional method takes so much longer to arrive at a result, the nested loop is killing it, but because the function employs values from each of my join tables I can't think of any other method which could be used to get these tables joined.
Is there any method but a nested loop which Oracle could use to join the two tables more efficiently? Or am I out of luck?
Andrew
|
|
|
Re: Function Based Join Without Nested Loops [message #345432 is a reply to message #345418] |
Wed, 03 September 2008 10:20 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
If you could write your sample_compare function in such a way that it took a value from one table only, then you could create a function based index on each of the tables, and things should go much quicker.
ie if you could write the query like:NSERT INTO res
SELECT id1,
id2
FROM sample1 t1, sample2 t2
WHERE sample_compare (t1.join_val) = sample_compare(t2.join_val);
|
|
|
Re: Function Based Join Without Nested Loops [message #345434 is a reply to message #345432] |
Wed, 03 September 2008 10:25 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
I agree that it would make it faster but it is not going to meet the need of a functional compare. This is all very abstract at this point I am just trying to figure out if a functional comparison is even feasible, based on all the testing I have done thus far it isn't which is why the post is up to see if it is in a way I either do not know of or have overlooked.
|
|
|
Re: Function Based Join Without Nested Loops [message #345440 is a reply to message #345434] |
Wed, 03 September 2008 10:41 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
I'm pretty certain that it can be done with two functions.
Your current comparison function has only one value from each table to work with.
I'm sure you can have one function create a hash value from sample1.join_value, and have the other function create a hash value from sample2.join_value in such a way that if the two hash values match then you have a valid join condition.
|
|
|
Re: Function Based Join Without Nested Loops [message #345441 is a reply to message #345440] |
Wed, 03 September 2008 10:51 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
Of course I can but this is just a sample of a function based comparison because there is no actual comparison methodology yet.
I am simply trying to see if the use of a functional comparison is feasible in the first place so our users don't go down the path of developing a comparison algorithm only to have me tell them the only way they will be able to use it is if they are willing to wait a couple days for the results.
|
|
|
Re: Function Based Join Without Nested Loops [message #345508 is a reply to message #345441] |
Wed, 03 September 2008 22:07 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
HASH join only works for equi-joins.
If you can write the join as @JR has shown, you should be able to force a hash join.
If you have fuzzier logic like
AND fuzzy_match_func(a.val, b.val) = 'MATCH' then you are going to be in trouble. Logically this can ONLY be done with a FULL-FULL-NL join.
If you are just talking about a range-based match (<=, >=) then you may be able to make use of a SORT MERGE join - depends on the query.
Ross Leishman
|
|
|
Re: Function Based Join Without Nested Loops [message #345655 is a reply to message #345508] |
Thu, 04 September 2008 06:42 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
Thanks for the help Ross and JR, this is what I thought the result would be going in and my initial recommendations to our users reflect this. I just wanted to see if someone out there knew something on the subject I did not.
Andrew
|
|
|
Goto Forum:
Current Time: Fri Jan 10 01:51:19 CST 2025
|