Home » Server Options » Text & interMedia » Domain (Text) Index for Table join (Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi)
Domain (Text) Index for Table join [message #352171] |
Mon, 06 October 2008 08:30 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
As a disclaimer, this could also be considered a performance post, but I think the route of the problem is more about what can be done with domain indexes which I would call a SQL problem...
I have recently been using Oracle Domain Context index along with some other custom indexes to do some fuzzy data lookup. I have been using this mostly to do lookups to this point but am starting to process the whole thing up which means querying based on lookup data that gets saved rather than passing data in through ad-hoc queries. So I need to be able to match a table with a context index on a certain column with two other tables one which would hold the match threshold for the query and the other which will hold the values to be looked up.
Here is a set-up example:
CREATE TABLE test_score
(score_id NUMBER, score NUMBER);
INSERT INTO test_score
VALUES (1, 1);
CREATE TABLE test_lookup
(score_id NUMBER, lk_up_val VARCHAR2(20));
CREATE TABLE test_data
AS SELECT SUBSTR (DBMS_RANDOM.STRING ('U', 20),
1,
ROUND (DBMS_RANDOM.VALUE (1, 20))) data_value
FROM DUAL
CONNECT BY ROWNUM <= 10000;
INSERT INTO test_lookup
SELECT 1,
data_value
FROM test_data
WHERE ROWNUM = 1;
CREATE INDEX test_cntx
ON test_data(data_value)
INDEXTYPE IS ctxsys.CONTEXT;
BEGIN
DBMS_STATS.gather_table_stats ('<SCHEMA>', 'test_data');
DBMS_STATS.gather_table_stats ('<SCHEMA>', 'test_lookup');
DBMS_STATS.gather_table_stats ('<SCHEMA>', 'test_score');
END;
Now against this structure I want to execute the following query:
SELECT /*+INDEX(dt test_cntx) */
dt.data_value, sc.score
FROM test_score sc, test_lookup lk, test_data dt
WHERE sc.score_id = lk.score_id
AND contains (dt.data_value, lk.lk_up_val) >= sc.score;
With or without the hint, I cannot get the query to use the domain index, instead it will always do a full scan of the value table and do the comparison for each and every value. See the explain plan.
SELECT STATEMENT ALL_ROWSCost: 14 Bytes: 175 Cardinality: 5
6 HASH JOIN Cost: 14 Bytes: 175 Cardinality: 5
4 NESTED LOOPS Cost: 11 Bytes: 160 Cardinality: 5
1 TABLE ACCESS FULL TABLE NAGEL.TEST_LOOKUP Cost: 3 Bytes: 21 Cardinality: 1
3 TABLE ACCESS BY INDEX ROWID TABLE NAGEL.TEST_DATA Cost: 11 Bytes: 55 Cardinality: 5
2 DOMAIN INDEX INDEX (DOMAIN) NAGEL.TEST_CNTX Cost: 4
5 TABLE ACCESS FULL TABLE NAGEL.TEST_SCORE Cost: 3 Bytes: 3 Cardinality: 1
This is only slightly slower for the 100,000 lookup values in the sample dataset, but when employed against a real data set with millions of rows the difference is enormous.
The only way I can get it to use the index is by hard coding (or passing as a parameter) the score value:
SELECT dt.data_value, sc.score
FROM test_score sc, test_lookup lk, test_data dt
WHERE sc.score_id = lk.score_id
AND contains (dt.data_value, lk.lk_up_val) >= 1;
Here I get a "correct" explain plan:
SELECT STATEMENT ALL_ROWSCost: 14 Bytes: 175 Cardinality: 5
6 HASH JOIN Cost: 14 Bytes: 175 Cardinality: 5
4 NESTED LOOPS Cost: 11 Bytes: 160 Cardinality: 5
1 TABLE ACCESS FULL TABLE NAGEL.TEST_LOOKUP Cost: 3 Bytes: 21 Cardinality: 1
3 TABLE ACCESS BY INDEX ROWID TABLE NAGEL.TEST_DATA Cost: 11 Bytes: 55 Cardinality: 5
2 DOMAIN INDEX INDEX (DOMAIN) NAGEL.TEST_CNTX Cost: 4
5 TABLE ACCESS FULL TABLE NAGEL.TEST_SCORE Cost: 3 Bytes: 3 Cardinality: 1
Now I could implement something in PL/SQL to grab and store match results first pulling the scores from the score table, but the mantra always is SQL first and PL/SQL only when required so the question does anyone know why Oracle will not under any circumstances use the index in the first query? The fact that it is ignoring the hint tells me there must be some kind of technical reason but I am just not getting it.
Thanks,
Andrew
|
|
|
Re: Domain (Text) Index for Table join [message #352232 is a reply to message #352171] |
Mon, 06 October 2008 13:47 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
"Contains" is designed to be used with "> 0". So, if you use that to obtain the score, then compare the obtained score to the value in your test_score table, it will use the domain index even without a hint. Please see the reproduction and possible solution below.
-- test environment:
SCOTT@orcl_11g> CREATE TABLE test_score
2 (score_id NUMBER,
3 score NUMBER)
4 /
Table created.
SCOTT@orcl_11g> INSERT INTO test_score VALUES (1, 1)
2 /
1 row created.
SCOTT@orcl_11g> CREATE TABLE test_lookup
2 (score_id NUMBER,
3 lk_up_val VARCHAR2(20))
4 /
Table created.
SCOTT@orcl_11g> CREATE TABLE test_data AS
2 SELECT SUBSTR
3 (DBMS_RANDOM.STRING ('U', 20),
4 1,
5 ROUND (DBMS_RANDOM.VALUE (1, 20))) data_value
6 FROM DUAL
7 CONNECT BY ROWNUM <= 10000
8 /
Table created.
SCOTT@orcl_11g> INSERT INTO test_lookup
2 SELECT 1, data_value
3 FROM test_data
4 WHERE ROWNUM = 1
5 /
1 row created.
SCOTT@orcl_11g> CREATE INDEX test_cntx
2 ON test_data (data_value)
3 INDEXTYPE IS ctxsys.CONTEXT
4 /
Index created.
SCOTT@orcl_11g> BEGIN
2 DBMS_STATS.gather_table_stats (USER, 'test_data');
3 DBMS_STATS.gather_table_stats (USER, 'test_lookup');
4 DBMS_STATS.gather_table_stats (USER, 'test_score');
5 END;
6 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> COLUMN data_value FORMAT A30
SCOTT@orcl_11g> SET AUTOTRACE ON EXPLAIN
-- reproduction of problem:
SCOTT@orcl_11g> SELECT /*+INDEX(dt test_cntx) */
2 dt.data_value, sc.score
3 FROM test_score sc, test_lookup lk, test_data dt
4 WHERE sc.score_id = lk.score_id
5 AND contains (dt.data_value, lk.lk_up_val) >= sc.score;
DATA_VALUE SCORE
------------------------------ ----------
MWRUF 1
Execution Plan
----------------------------------------------------------
Plan hash value: 4195803967
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 500 | 13000 | 16 (7)| 00:00:01 |
| 1 | NESTED LOOPS | | 500 | 13000 | 16 (7)| 00:00:01 |
|* 2 | HASH JOIN | | 1 | 15 | 7 (15)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_SCORE | 1 | 6 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST_LOOKUP | 1 | 9 | 3 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | TEST_DATA | 500 | 5500 | 9 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SC"."SCORE_ID"="LK"."SCORE_ID")
5 - filter("SC"."SCORE"<="CTXSYS"."CONTAINS"("DT"."DATA_VALUE","LK"."LK_
UP_VAL"))
SCOTT@orcl_11g>
-- possible solultion:
SCOTT@orcl_11g> SELECT dt.data_value, sc.score
2 FROM test_score sc, test_lookup lk, test_data dt
3 WHERE sc.score_id = lk.score_id
4 AND contains (dt.data_value, lk.lk_up_val, 1) > 0
5 AND SCORE (1) >= sc.score;
DATA_VALUE SCORE
------------------------------ ----------
MWRUF 1
Execution Plan
----------------------------------------------------------
Plan hash value: 1200543538
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 14 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 26 | 14 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 5 | 100 | 11 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST_LOOKUP | 1 | 9 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST_DATA | 5 | 55 | 11 (0)| 00:00:01 |
|* 5 | DOMAIN INDEX | TEST_CNTX | | | 4 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | TEST_SCORE | 1 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SC"."SCORE_ID"="LK"."SCORE_ID")
filter("SC"."SCORE"<="CTXSYS"."SCORE"(1))
5 - access("CTXSYS"."CONTAINS"("DT"."DATA_VALUE","LK"."LK_UP_VAL",1)>0)
SCOTT@orcl_11g>
[Updated on: Mon, 06 October 2008 13:52] Report message to a moderator
|
|
|
Re: Domain (Text) Index for Table join [message #352233 is a reply to message #352232] |
Mon, 06 October 2008 14:12 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
It works, but it is wasteful because now you need to call the score operator which has overhead which is unnecessary. The domain operator is passed the value you are looking for so say >= exactly what you are looking for should always be the most efficient way of doing it.
In other words this approach is a massive improvement because even using the index to find anything with even a remotely possible match is infinitely better than not using it at all it is still less efficient than using the index to find exactly what you are looking for.
|
|
|
Re: Domain (Text) Index for Table join [message #352448 is a reply to message #352233] |
Tue, 07 October 2008 13:27 |
annagel
Messages: 220 Registered: April 2006
|
Senior Member |
|
|
Doing a bit more research into this topic and I think I may have an idea why it will not execute using the index when a non-bind or non-static value is passed.
Although the example I gave here used the context domain index, I am also working on a custom index method implemented using extensible indexing interface. To this point I have implemented an index and operators and just started looking at the extensible optimizer interface. And I think the problem is the way costs are calculated for a domain index vs a domain function.
The domain function costs interface is here and the index access cost is here.
The function is pretty run of the mill, just the cost of evoking a single call to the function no matter what, for the index things are a little different it is looking to get the cost of invoking the index for a specific range which is the way the interface will actually invoke the index in the end, when the bound condition is totally known at run time (ie no bind or anything) it is impossible to be able to estimate the cost of index access so it ops for the "safe" alternative of the function. Don't know that there is any way to fix this. The method mentioned by Barbara will get it to use the index, but it will also get it to work harder than it needs to retrieval of the rows and should also (assuming the cardinality function is well written) result in bad cardinality value which could lead to poor decision making in future joins on the returned data.
|
|
|
Goto Forum:
Current Time: Thu Jan 30 13:44:11 CST 2025
|