Home » Server Options » Text & interMedia » Text Index return min of words specified (Oracle 11G)
Text Index return min of words specified [message #616613] |
Thu, 19 June 2014 01:11 |
|
sherlocksher
Messages: 14 Registered: June 2013
|
Junior Member |
|
|
Hi,
We have a text index that is built on a multi column data store . The column names are DISC_CMPLNT_TX and MAINT_ACT_TX .
Our requirement is as follows :
1. The user can specify two types of words for searching. The first type is a REQUIRED KEYWORD and the second type is an OPTIONAL keyword.
2. Records that are returned should satisfy the condition where the required keyword is present and along with it ATLEAST 50% of the no of optional keywords should also be present .
For e.g. If the required keyword is RIGHT and the OPTIONAL keyword are SEAT, MAINTENANCE, ENGINE, SUPPLY,WING then records that contain the word RIGHT and three or more than three words from the optional keywords specified should be retrieved.
3. In addition, fuzzy search is switched on by default and each word can have synonyms.
4. For the REQUIRED keyword condition, I am able to specify the condition in the below manner where RHT,RT,R/H and RGT given below are synonyms of the word RIGHT.
SELECT a.ac_maint_id,
a.disc_cmplnt_tx,
a.MAINT_ACT_TX
score (1),
score (2)
FROM ac_maint a
WHERE project_id = 1261
AND contains (DISC_CMPLNT_TX,
'((?{RIGHT} = {RHT} = {RT} | {R/H} | {RGT}))',
1) > 0
5. Can somebody please clarify on how do I get atleast 50% of the OPTIONAL KEYWORDS. Also if the same OPTIONAL keyword is duplicated multiple times, then it should be treated as a single word.
|
|
|
Re: Text Index return min of words specified [message #616705 is a reply to message #616613] |
Thu, 19 June 2014 17:10 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
I believe the following meets your requirements and does so efficiently, using bind variables and one domain index access.
SCOTT@orcl12c> -- table:
SCOTT@orcl12c> CREATE TABLE ac_maint
2 (ac_maint_id NUMBER,
3 project_id NUMBER,
4 disc_cmplnt_tx VARCHAR2(20),
5 maint_act_tx VARCHAR2(30))
6 /
Table created.
SCOTT@orcl12c> -- data:
SCOTT@orcl12c> INSERT ALL
2 INTO ac_maint VALUES ( 1, 1261, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
3 INTO ac_maint VALUES ( 2, 1261, 'SEAT ENGINE SUPPLY', 'MAINTENANCE WING RIGHT')
4 INTO ac_maint VALUES ( 3, 1261, 'RICHT', 'SEAT MAINTENANCE SUPPLE')
5 INTO ac_maint VALUES ( 4, 1261, 'SEAT WING SUPPLE', 'RICHT')
6 INTO ac_maint VALUES ( 5, 1261, 'R/H', 'ENGINE WING SUPPLE')
7 INTO ac_maint VALUES ( 6, 1261, 'ENGINE WING SUPPLE', 'R/H')
8 INTO ac_maint VALUES ( 7, 1261, 'RIGHT', 'WING WING WING')
9 INTO ac_maint VALUES ( 8, 9999, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
10 SELECT * FROM DUAL
11 /
8 rows created.
SCOTT@orcl12c> INSERT INTO ac_maint
2 SELECT ROWNUM + 10, 8888, NULL, SUBSTR (object_name, 1, 30)
3 FROM all_objects
4 /
89921 rows created.
SCOTT@orcl12c> -- multi_column_datastore:
SCOTT@orcl12c> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('test_ds', 'COLUMNS', 'disc_cmplnt_tx, maint_act_tx');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- index:
SCOTT@orcl12c> CREATE INDEX text_index
2 ON ac_maint (disc_cmplnt_tx)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 FILTER BY project_id
5 PARAMETERS
6 ('DATASTORE test_ds')
7 /
Index created.
SCOTT@orcl12c> -- thesaurus with synonyms:
SCOTT@orcl12c> BEGIN
2 CTX_THES.CREATE_THESAURUS ('test_thes');
3 CTX_THES.CREATE_RELATION ('test_thes', 'RIGHT', 'SYN', 'RHT');
4 CTX_THES.CREATE_RELATION ('test_thes', 'RIGHT', 'SYN', 'RT');
5 CTX_THES.CREATE_RELATION ('test_thes', 'RIGHT', 'SYN', 'R/H');
6 CTX_THES.CREATE_RELATION ('test_thes', 'RIGHT', 'SYN', 'RGT');
7 END;
8 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- required and optional variables and values:
SCOTT@orcl12c> VARIABLE required VARCHAR2(100)
SCOTT@orcl12c> EXEC :required := 'RIGHT'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> VARIABLE optional VARCHAR2(100)
SCOTT@orcl12c> EXEC :optional := 'SEAT, MAINTENANCE, ENGINE, SUPPLY,WING'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- query:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx
2 FROM ac_maint a,
3 (SELECT TRIM (REGEXP_SUBSTR (:optional, '[^,]+', 1, ROWNUM)) word,
4 COUNT (*) OVER () words
5 FROM DUAL
6 CONNECT BY LEVEL <= REGEXP_COUNT (:optional, ',') + 1) t
7 WHERE CONTAINS
8 (a.disc_cmplnt_tx,
9 '(?{' || :required || '} OR SYN ({' || :required || '}, test_thes)) AND
10 SDATA (project_id = 1261) AND
11 (?{' || t.word || '} OR SYN ({' || t.word || '}, test_thes))',
12 1) > 0
13 GROUP BY a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx, words
14 HAVING COUNT (*) / t.words >= 0.5
15 ORDER BY COUNT (*) DESC
16 /
AC_MAINT_ID DISC_CMPLNT_TX MAINT_ACT_TX
----------- -------------------- ------------------------------
2 SEAT ENGINE SUPPLY MAINTENANCE WING RIGHT
1 RIGHT SEAT ENGINE MAINTENANCE SUPPLY WING
4 SEAT WING SUPPLE RICHT
6 ENGINE WING SUPPLE R/H
5 R/H ENGINE WING SUPPLE
3 RICHT SEAT MAINTENANCE SUPPLE
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3029711498
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 42 | 5586 | 16 (0)| 00:00:01 |
| 1 | SORT ORDER BY | | 42 | 5586 | 16 (0)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | HASH GROUP BY | | 42 | 5586 | 16 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 42 | 5586 | 16 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 79 | 2 (0)| 00:00:01 |
| 6 | WINDOW BUFFER | | 1 | | 2 (0)| 00:00:01 |
| 7 | COUNT | | | | | |
|* 8 | CONNECT BY WITHOUT FILTERING| | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | AC_MAINT | 42 | 2268 | 16 (0)| 00:00:01 |
|* 11 | DOMAIN INDEX | TEXT_INDEX | | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(COUNT(*)/"WORDS">=0.5)
8 - filter(LEVEL<= REGEXP_COUNT (:OPTIONAL,',')+1)
11 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(?{'||:REQUIRED||'} OR SYN
({'||:REQUIRED||'}, test_thes)) AND SDATA (project_id = 1261) AND
(?{'||"T"."WORD"||'} OR SYN ({'||"T"."WORD"||'}, test_thes))',1)>0)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c>
|
|
|
Re: Text Index return min of words specified [message #616934 is a reply to message #616705] |
Mon, 23 June 2014 05:41 |
|
sherlocksher
Messages: 14 Registered: June 2013
|
Junior Member |
|
|
Thanks a lot Barbara ! . You are a lifesaver
Some more questions.
1. Instead of using a create thesaurus, how can I use the synonyms directly for each OPTIONAL KEYWORD . We are using dynamic queries to retrieve data and insert records. Basically I need the text that needs to replace the below text with synonyms for t.word. Since we are using dynamic queries, I can build it and substitute it. What is the format in which it needs to be built.
OR SYN ({' || t.word || '}, test_thes)
2. Also , when the query is formed dynamically and the synonyms are added, how do we handle for cases where the synonym contains special characters like 'R/H'.
[Updated on: Mon, 23 June 2014 06:23] Report message to a moderator
|
|
|
Re: Text Index return min of words specified [message #616975 is a reply to message #616934] |
Mon, 23 June 2014 13:27 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
The following revised demonstration does not use a thesaurus. Instead, it assumes that your synonyms will be passed within your variables with the | symbol between them, which means OR to Oracle Text. The brackets { and } are then wrapped around each individual word or synonym as part of the query to handle the special characters. I also made it so that it does fuzzy matching on the synonyms as well as the main words.
SCOTT@orcl12c> -- table:
SCOTT@orcl12c> CREATE TABLE ac_maint
2 (ac_maint_id NUMBER,
3 project_id NUMBER,
4 disc_cmplnt_tx VARCHAR2(20),
5 maint_act_tx VARCHAR2(30))
6 /
Table created.
SCOTT@orcl12c> -- data:
SCOTT@orcl12c> INSERT ALL
2 INTO ac_maint VALUES ( 1, 1261, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
3 INTO ac_maint VALUES ( 2, 1261, 'CHAIR ENGINE SUPPLY', 'MAINTENANCE WING RIGHT')
4 INTO ac_maint VALUES ( 3, 1261, 'RICHT', 'CHAIR MAINTENANCE SUPPLE')
5 INTO ac_maint VALUES ( 4, 1261, 'CHIAR WING SUPPLE', 'RICHT')
6 INTO ac_maint VALUES ( 5, 1261, 'R/H', 'ENGINE WING SUPPLE')
7 INTO ac_maint VALUES ( 6, 1261, 'ENGINE WING SUPPLE', 'R/H')
8 INTO ac_maint VALUES ( 7, 1261, 'RIGHT', 'WING WING WING')
9 INTO ac_maint VALUES ( 8, 9999, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
10 SELECT * FROM DUAL
11 /
8 rows created.
SCOTT@orcl12c> INSERT INTO ac_maint
2 SELECT ROWNUM + 10, 8888, NULL, SUBSTR (object_name, 1, 30)
3 FROM all_objects
4 /
89874 rows created.
SCOTT@orcl12c> -- multi_column_datastore:
SCOTT@orcl12c> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('test_ds', 'COLUMNS', 'disc_cmplnt_tx, maint_act_tx');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- index:
SCOTT@orcl12c> CREATE INDEX text_index
2 ON ac_maint (disc_cmplnt_tx)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 FILTER BY project_id
5 PARAMETERS
6 ('DATASTORE test_ds')
7 /
Index created.
SCOTT@orcl12c> -- required and optional variables and values:
SCOTT@orcl12c> VARIABLE required VARCHAR2(100)
SCOTT@orcl12c> EXEC :required := 'RIGHT|RHT|RT|R/H|RGT'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> VARIABLE optional VARCHAR2(100)
SCOTT@orcl12c> EXEC :optional := 'SEAT|CHAIR, MAINTENANCE, ENGINE, SUPPLY,WING'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- query:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx
2 FROM ac_maint a,
3 (SELECT TRIM (REGEXP_SUBSTR (:optional, '[^,]+', 1, ROWNUM)) word,
4 COUNT (*) OVER () words
5 FROM DUAL
6 CONNECT BY LEVEL <= REGEXP_COUNT (:optional, ',') + 1) t
7 WHERE CONTAINS
8 (a.disc_cmplnt_tx,
9 '(?{' || REPLACE (:required, '|', '}|?{') || '}) AND
10 SDATA (project_id = 1261) AND
11 (?{' || REPLACE (t.word, '|', '}|?{') || '})',
12 1) > 0
13 GROUP BY a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx, words
14 HAVING COUNT (*) / t.words >= 0.5
15 ORDER BY COUNT (*) DESC
16 /
AC_MAINT_ID DISC_CMPLNT_TX MAINT_ACT_TX
----------- -------------------- ------------------------------
2 CHAIR ENGINE SUPPLY MAINTENANCE WING RIGHT
1 RIGHT SEAT ENGINE MAINTENANCE SUPPLY WING
5 R/H ENGINE WING SUPPLE
4 CHIAR WING SUPPLE RICHT
3 RICHT CHAIR MAINTENANCE SUPPLE
6 ENGINE WING SUPPLE R/H
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3029711498
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 42 | 5586 | 17 (6)| 00:00:01 |
| 1 | SORT ORDER BY | | 42 | 5586 | 17 (6)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | HASH GROUP BY | | 42 | 5586 | 17 (6)| 00:00:01 |
| 4 | NESTED LOOPS | | 42 | 5586 | 16 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 79 | 2 (0)| 00:00:01 |
| 6 | WINDOW BUFFER | | 1 | | 2 (0)| 00:00:01 |
| 7 | COUNT | | | | | |
|* 8 | CONNECT BY WITHOUT FILTERING| | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | AC_MAINT | 42 | 2268 | 16 (0)| 00:00:01 |
|* 11 | DOMAIN INDEX | TEXT_INDEX | | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(COUNT(*)/"WORDS">=0.5)
8 - filter(LEVEL<= REGEXP_COUNT (:OPTIONAL,',')+1)
11 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(?{'||REPLACE(:REQUIRED,'|','}|?{'
)||'}) AND SDATA (project_id = 1261) AND
(?{'||REPLACE("T"."WORD",'|','}|?{')||'})',1)>0)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c>
[Updated on: Mon, 23 June 2014 14:03] Report message to a moderator
|
|
|
|
Re: Text Index return min of words specified [message #617075 is a reply to message #617050] |
Tue, 24 June 2014 13:41 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Brilliant! It uses the comma, which is the accum operator, whose scoring is based on the number of matching terms. That enables you to tell where to limit the score, based on the number of terms. Please see the following section of the online documentation for a more detailed explanation and examples:
http://docs.oracle.com/cd/E16655_01/text.121/e17747/cqoper.htm#CCREF0302
Using that method necessitates putting that in a separate contains clause, which requires two domain index hits, but that may still perform better than what I previously suggested.
Using the = operator seems to not be able to handle variables with / like R/H even when surrounded by { and }, so I would use | instead.
I believe you need to place your ? in front of each word if you want to apply fuzzy to each word. If you want to have more options you could even use the word fuzzy and it's parameters, instead of just ? which uses the default parameters.
I don't know how you are dynamicaly creating your query. I hope you are not concatenating variables. You should be using bind variables for your required and optional values for maximum efficiency.
Please see my revised demonstration below.
SCOTT@orcl12c> -- table:
SCOTT@orcl12c> CREATE TABLE ac_maint
2 (ac_maint_id NUMBER,
3 project_id NUMBER,
4 disc_cmplnt_tx VARCHAR2(20),
5 maint_act_tx VARCHAR2(30))
6 /
Table created.
SCOTT@orcl12c> -- data:
SCOTT@orcl12c> INSERT ALL
2 INTO ac_maint VALUES ( 1, 1261, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
3 INTO ac_maint VALUES ( 2, 1261, 'SEAT ENGINE SUPPLY', 'MAINTENANCE WING RIGHT')
4 INTO ac_maint VALUES ( 3, 1261, 'RICHT', 'SEAT MAINTENANCE SUPPLE')
5 INTO ac_maint VALUES ( 4, 1261, 'SEAT WING SUPPLE', 'RICHT')
6 INTO ac_maint VALUES ( 5, 1261, 'R/H', 'ENGINE WING SUPPLE')
7 INTO ac_maint VALUES ( 6, 1261, 'ENGONE WING SUPPLE', 'R/H')
8 INTO ac_maint VALUES ( 7, 1261, 'RIGHT', 'WING WING WING')
9 INTO ac_maint VALUES ( 8, 9999, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
10 SELECT * FROM DUAL
11 /
8 rows created.
SCOTT@orcl12c> INSERT INTO ac_maint
2 SELECT ROWNUM + 10, 8888, NULL, SUBSTR (object_name, 1, 30)
3 FROM all_objects
4 /
89830 rows created.
SCOTT@orcl12c> -- multi_column_datastore:
SCOTT@orcl12c> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('test_ds', 'COLUMNS', 'disc_cmplnt_tx, maint_act_tx');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- index:
SCOTT@orcl12c> CREATE INDEX text_index
2 ON ac_maint (disc_cmplnt_tx)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 FILTER BY project_id
5 PARAMETERS ('DATASTORE test_ds')
6 /
Index created.
SCOTT@orcl12c> -- required and optional variables and values:
SCOTT@orcl12c> VARIABLE required VARCHAR2(100)
SCOTT@orcl12c> EXEC :required := 'RIGHT|RHT|RT|R/H|RGT'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> VARIABLE optional VARCHAR2(100)
SCOTT@orcl12c> EXEC :optional := 'SEAT|SEATS,MAINTENANCE|MAINT,ENGINE|ENG|ENGINES,SUPPLY|SUPPY|SUP|SUPPL,WING|WINGS'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> -- query:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx
2 FROM ac_maint a
3 WHERE CONTAINS
4 (a.disc_cmplnt_tx,
5 '(((?{' ||
6 REPLACE (REPLACE (:required, '|', '}|?{'), ',', '}),(?{') ||
7 '}))) AND
8 SDATA (project_id = 1261)',
9 1) > 0
10 AND CONTAINS
11 (a.disc_cmplnt_tx,
12 '(((?{' ||
13 REPLACE (REPLACE (:optional, '|', '}|?{'), ',', '}),(?{') ||
14 '})))',
15 2) > (100 / (REGEXP_COUNT (:optional, ',') + 1)) *
16 ((ROUND ((REGEXP_COUNT (:optional, ',') + 1) * 0.5)) - 1)
17 ORDER BY SCORE(2) DESC
18 /
AC_MAINT_ID DISC_CMPLNT_TX MAINT_ACT_TX
----------- -------------------- ------------------------------
1 RIGHT SEAT ENGINE MAINTENANCE SUPPLY WING
2 SEAT ENGINE SUPPLY MAINTENANCE WING RIGHT
6 ENGONE WING SUPPLE R/H
4 SEAT WING SUPPLE RICHT
5 R/H ENGINE WING SUPPLE
3 RICHT SEAT MAINTENANCE SUPPLE
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1435967249
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 54 | 8 (0)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 54 | 8 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| AC_MAINT | 1 | 54 | 8 (0)| 00:00:01 |
| 3 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 4 | BITMAP AND | | | | | |
| 5 | BITMAP CONVERSION FROM ROWIDS | | | | | |
| 6 | SORT ORDER BY | | | | | |
|* 7 | DOMAIN INDEX | TEXT_INDEX | | | 4 (0)| 00:00:01 |
| 8 | BITMAP CONVERSION FROM ROWIDS | | | | | |
| 9 | SORT ORDER BY | | | | | |
|* 10 | DOMAIN INDEX | TEXT_INDEX | | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(((?{'||REPLACE(REPLACE(:REQUIRED,'
|','}|?{'),',','}),(?{')||'}))) AND SDATA (project_id = 1261)',1)>0 AND
"CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(((?{'||REPLACE(REPLACE(:OPTIONAL,'|','}|?{'),','
,'}),(?{')||'})))',2)>100/( REGEXP_COUNT (:OPTIONAL,',')+1)*(ROUND(( REGEXP_COUNT
(:OPTIONAL,',')+1)*0.5)-1))
10 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(((?{'||REPLACE(REPLACE(:OPTIONAL,'
|','}|?{'),',','}),(?{')||'})))',2)>100/( REGEXP_COUNT (:OPTIONAL,',')+1)*(ROUND((
REGEXP_COUNT (:OPTIONAL,',')+1)*0.5)-1))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c>
|
|
|
Re: Text Index return min of words specified [message #617102 is a reply to message #617075] |
Wed, 25 June 2014 01:27 |
|
sherlocksher
Messages: 14 Registered: June 2013
|
Junior Member |
|
|
Thanks Barbara for the link.
a. Yes. We are using two domain index hits. One for searching required words with contains > 0 and the other for the optional words with contains > computed value.
b. The exact query that we are using is as follows
/* Formatted on 6/25/2014 11:55:59 AM (QP5 v5.227.12220.39724) */
SELECT a.ac_maint_id,
a.disc_cmplnt_tx,
a.MAINT_ACT_TX,
a.LGBK_CMT_TX,
score (1),
score (2)
FROM ac_maint a
WHERE project_id = 1261
AND contains (DISC_CMPLNT_TX,
'((?{RIGHT} = {RHT} = {RT} | {R/H} | {RGT}))',
1) > 0
AND contains (
DISC_CMPLNT_TX,
'((?{SEAT} = {SEATS}) , (?{ENTRY}) , (?{MAINTENANCE} = {MX} = {MTC} = {MAINT}) , (?{MUSIC}) , (?{ENGINE} = {ENGINES} = {ENG}) , (?{SUPPLY} = {SUP} = {SUPPL} = {SUPPY}) , (?{WING} = {WINGS}))',
2) > 40
c. We are not using bind variables. We are just forming the dynamic query with the concatenated variables. We will change the implementation to use bind variables as you suggested to improve performance.
Now using the OR operator is not giving the expected results when the text contains Special characters . Please see example below.
1. The scores in Step 6 for both ID 1 and 2 are 18 and 34 respectively whereas they should be identical. Since the word "C/B" is present for ID2 and we are searching for the synonym "A/C", it is matching the "C" of the synonym with the "C" of the text and returning a higher score.
2. I referred the link https://community.oracle.com/thread/2261631 where you had answered to use printjoins for special characters to ensure EQUIV operator works as expected and when I dropped the index and re-created with this , then the scores for both the records came out as 18 which was what we expected. Note : In this case both OR and EQUIV operator returned the same score
-- STEP 1
CREATE TABLE accumtbl
(
ID NUMBER,
disc_cmplnt_tx CLOB,
maint_act_tx CLOB
);
-- STEP 2
INSERT INTO accumtbl
VALUES (
1,
'RT ENG IGNITION WOULD NOT START ON PUSHBACK ON RT ENG. RESET CB ON RT ENG AFTER MANY ATTEMPT TO CLEAR FUEL OUT OF TAIL PIPE/OPS OK. MITCHELL',
'ABCD');
-- STEP 3
INSERT INTO accumtbl
VALUES (
2,
'RT ENG IGNITION WOULD NOT START ON PUSHBACK ON RT ENG. RESET C/B ON RT ENG AFTER MANY ATTEMPT TO CLEAR FUEL OUT OF TAIL PIPE/OPS OK. MITCHELL',
'DEFG');
-- STEP 4
BEGIN
CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE ('test_ds',
'COLUMNS',
'disc_cmplnt_tx, maint_act_tx');
END;
/
-- STEP 5
CREATE INDEX accumtbl_idx
ON accumtbl (disc_cmplnt_tx)
INDEXTYPE IS ctxsys.context;
-- STEP 6
SELECT id,
disc_cmplnt_tx,
score (1),
score (2)
FROM accumtbl
WHERE contains (disc_cmplnt_tx,
'({RIGHT} | {R/H} | {RHT} = {RGT} = {RT})',
1) > 0
AND contains (
disc_cmplnt_tx,
'(({AIRCRAFT} = {ACFT} | {A/C}) , ({SEAT} = {SEATS}) , ({FUEL} = {FUELS}) , ({ENGINE} = {ENG} = {ENGINES}) , ({SUPPLY} = {SUPPY} = {SUP} = {SUPPL}) , ({WING} = {WINGS}))',
2) > 0;
-- STEP 7
DROP INDEX accumtbl_idx;
-- STEP 8
BEGIN
CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE ('test_lex', 'PRINTJOINS', '/');
END;
/
-- STEP 9
CREATE INDEX accumtbl_idx
ON accumtbl (disc_cmplnt_tx)
INDEXTYPE IS ctxsys.context
PARAMETERS ('LEXER test_lex');
-- STEP 10
SELECT id,
disc_cmplnt_tx,
score (1),
score (2)
FROM accumtbl
WHERE contains (disc_cmplnt_tx,
'({RIGHT} | {R/H} | {RHT} = {RGT} = {RT})',
1) > 0
AND contains (
disc_cmplnt_tx,
'(({AIRCRAFT} = {ACFT} | {A/C}) , ({SEAT} = {SEATS}) , ({FUEL} = {FUELS}) , ({ENGINE} = {ENG} = {ENGINES}) , ({SUPPLY} = {SUPPY} = {SUP} = {SUPPL}) , ({WING} = {WINGS}))',
2) > 0;
[Updated on: Wed, 25 June 2014 01:38] Report message to a moderator
|
|
|
Re: Text Index return min of words specified [message #617167 is a reply to message #617102] |
Wed, 25 June 2014 12:40 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Without the printjoins, it converts / to a space, so it tokenizes C/B as C and B and given A/C searches for A or C and thus finds the C from A/C as the C from C/B. In my previous testing, using = without setting / as a printjoin produced an error, but using OR did not. Apparently, setting the / as a printjoin solves both problems. So, it looks like you've got it all figured out correctly now. Thanks for providing the full explanation and code.
|
|
|
Goto Forum:
Current Time: Sat Nov 23 05:38:42 CST 2024
|