Home » Server Options » Text & interMedia » Oracle Text query with multiple contains performance
Oracle Text query with multiple contains performance [message #264520] |
Mon, 03 September 2007 11:20 |
Agus211
Messages: 39 Registered: September 2007
|
Member |
|
|
Hi, Im having performance issues with an Oracle Text query that makes multiple CONTAINS filters. Can you help me tuning it, because Im a begginer in this type of things and I don't know how to do it.
The query is:
SELECT /*+ FIRST_ROWS(50) */
(score (1) + score (2) + score (3)
) AS RANK, ID, TO_CHAR (dateinsert, 'dd/mm/yyyy')
FROM NOTICIAS
WHERE ( contains (resumen, v_palabra, 1) > 0
OR contains (insert, v_palabra, 2) > 0
OR contains (tema, v_palabra, 3) > 0
)
AND ouid = v_ouid
ORDER BY dateinsert DESC;
Someone told me I should'nt have so many "contains" in the query, is that right?.
Thanks in advance.
|
|
|
|
|
Re: Oracle Text query with multiple contains performance [message #264576 is a reply to message #264555] |
Mon, 03 September 2007 17:26 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
You should see a significant performance improvement with a multi_column_datastore, because it will be using one index instead of three. I have provided a basic example below. You can find detailed information and additional options in the online documentation. For maximum performance you should also have indexes on any columns used in filter conditions, like ouid, or ordering, like dateinsert. Also, you should avoid using reserved words, like insert, as column names.
-- table and data for demo:
SCOTT@10gXE> CREATE TABLE noticias
2 (id NUMBER,
3 ouid NUMBER,
4 dateinsert DATE,
5 resumen CLOB,
6 inserted VARCHAR2 (10),
7 tema VARCHAR2 (10),
8 search_cols VARCHAR2 (1))
9 /
Table created.
SCOTT@10gXE> INSERT ALL
2 INTO noticias VALUES (1, 10, SYSDATE, 'testing', 'something', 'whatever', null)
3 INTO noticias VALUES (2, 10, SYSDATE, 'something', 'testing', 'whatever', null)
4 INTO noticias VALUES (3, 10, SYSDATE, 'something', 'whatever', 'testing', null)
5 INTO noticias VALUES (4, 10, SYSDATE, 'teting', 'testing', 'testing', null)
6 INTO noticias VALUES (5, 10, SYSDATE, 'and', 'another', 'thing', null)
7 SELECT * FROM DUAL
8 /
5 rows created.
-- create multi_column_datastore and specify columns:
SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('noticias_mcds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('noticias_mcds', 'COLUMNS', 'resumen, inserted, tema');
4 END;
5 /
PL/SQL procedure successfully completed.
-- create index using multicolumn_datastore:
SCOTT@10gXE> CREATE INDEX noticias_index
2 ON noticias (search_cols)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS ('DATASTORE noticias_mcds')
5 /
Index created.
-- variables for searching:
SCOTT@10gXE> VARIABLE v_palabra VARCHAR2 (30)
SCOTT@10gXE> VARIABLE v_ouid NUMBER
SCOTT@10gXE> EXEC :v_palabra := 'testing'
PL/SQL procedure successfully completed.
SCOTT@10gXE> EXEC :v_ouid := 10
PL/SQL procedure successfully completed.
-- sample query:
SCOTT@10gXE> COLUMN resumen FORMAT A10 word_wrapped
SCOTT@10gXE> SELECT /*+ FIRST_ROWS(50) */
2 score (1) AS RANK,
3 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
4 resumen, inserted, tema
5 FROM NOTICIAS
6 WHERE contains (search_cols, :v_palabra, 1) > 0
7 AND ouid = :v_ouid
8 ORDER BY dateinsert DESC
9 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
3 1 03/09/2007 testing something whatever
3 2 03/09/2007 something testing whatever
3 3 03/09/2007 something whatever testing
7 4 03/09/2007 teting testing testing
SCOTT@10gXE>
|
|
|
Re: Oracle Text query with multiple contains performance [message #264767 is a reply to message #264520] |
Tue, 04 September 2007 09:07 |
Agus211
Messages: 39 Registered: September 2007
|
Member |
|
|
Hi, first of all I want to thank you for that example and the advices, they helped me a lot. Now that the performance improved in that query, they told me if I could make an another query that searches in one of the columns that are contained in the multi_column_datastore.
The thing is that I should create an Oracle Text index for resumen, inserted and tema besides noticias_index with this new requirement and use a simple datastore.
Isn't it a mix up of things that would cause a mess?.
Im sorry if I don't explain myself very well, but I don't speak English as I would like to.
Again, thank you.
|
|
|
|
Re: Oracle Text query with multiple contains performance [message #264845 is a reply to message #264767] |
Tue, 04 September 2007 14:42 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Here is an expansion of the previous example that includes a section group, one section per column, and examples of searching individual columns using "within".
-- table and data for demo:
SCOTT@10gXE> CREATE TABLE noticias
2 (id NUMBER,
3 ouid NUMBER,
4 dateinsert DATE,
5 resumen CLOB,
6 inserted VARCHAR2 (10),
7 tema VARCHAR2 (10),
8 search_cols VARCHAR2 (1))
9 /
Table created.
SCOTT@10gXE> INSERT ALL
2 INTO noticias VALUES (1, 10, SYSDATE, 'testing', 'something', 'whatever', null)
3 INTO noticias VALUES (2, 10, SYSDATE, 'something', 'testing', 'whatever', null)
4 INTO noticias VALUES (3, 10, SYSDATE, 'something', 'whatever', 'testing', null)
5 INTO noticias VALUES (4, 10, SYSDATE, 'testing', 'testing', 'testing', null)
6 INTO noticias VALUES (5, 10, SYSDATE, 'whatever', 'nothing', 'something', null)
7 SELECT * FROM DUAL
8 /
5 rows created.
-- create multi_column_datastore and specify columns:
SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('noticias_mcds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('noticias_mcds', 'COLUMNS', 'resumen, inserted, tema');
4 END;
5 /
PL/SQL procedure successfully completed.
-- create section group and add one section per column:
SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_SECTION_GROUP ('noticias_sg', 'BASIC_SECTION_GROUP');
3 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'resumen', 'resumen', TRUE);
4 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'inserted', 'inserted', TRUE);
5 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'tema', 'tema', TRUE);
6 END;
7 /
PL/SQL procedure successfully completed.
-- create index using multicolumn_datastore and section group:
SCOTT@10gXE> CREATE INDEX noticias_index
2 ON noticias (search_cols)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS
5 ('DATASTORE noticias_mcds
6 SECTION GROUP noticias_sg')
7 /
Index created.
-- variables for searching:
SCOTT@10gXE> VARIABLE v_palabra VARCHAR2 (30)
SCOTT@10gXE> VARIABLE v_ouid NUMBER
SCOTT@10gXE> EXEC :v_palabra := 'testing'
PL/SQL procedure successfully completed.
SCOTT@10gXE> EXEC :v_ouid := 10
PL/SQL procedure successfully completed.
-- search of all columns:
SCOTT@10gXE> COLUMN resumen FORMAT A10 word_wrapped
SCOTT@10gXE> SELECT /*+ FIRST_ROWS(50) */
2 score (1) AS RANK,
3 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
4 resumen, inserted, tema
5 FROM NOTICIAS
6 WHERE contains (search_cols, :v_palabra, 1) > 0
7 AND ouid = :v_ouid
8 ORDER BY dateinsert DESC
9 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
3 1 04/09/2007 testing something whatever
3 2 04/09/2007 something testing whatever
3 3 04/09/2007 something whatever testing
10 4 04/09/2007 testing testing testing
-- searches of individual columns using "within":
SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN resumen', 1) > 0
6 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 1 04/09/2007 testing something whatever
4 4 04/09/2007 testing testing testing
SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN inserted', 1) > 0
6 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 2 04/09/2007 something testing whatever
4 4 04/09/2007 testing testing testing
SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN tema', 1) > 0
6 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 3 04/09/2007 something whatever testing
4 4 04/09/2007 testing testing testing
SCOTT@10gXE>
|
|
|
Re: Oracle Text query with multiple contains performance [message #265603 is a reply to message #264520] |
Thu, 06 September 2007 12:53 |
Agus211
Messages: 39 Registered: September 2007
|
Member |
|
|
Thank you for all the examples. I'd like to ask you since Im new on this if having only one Oracle Text index with multicolumn datastore and sections is more performant than having 3 single column Oracle Text indexes and one multicolumn index.
Im asking you this because I don't know if when you want to query one of the sections declared in the index, it "moves around" all the sections and then retrieves the one I desire. Does this happen?. If this is true, should I have 3 Oracle Text indexes besides the multicolumn one to make single columns queries?.
Thank you.
|
|
|
Re: Oracle Text query with multiple contains performance [message #265631 is a reply to message #265603] |
Thu, 06 September 2007 15:02 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
The multi_column_datatstore was invented for this purpose because it is faster. When in doubt, test and compare for yourself. If you do not get better performance then there may be other reasons. If you are searching on all three columns, then the sections are not involved. If you are searching within only one or two columns, then it is still faster than using separate indexes. With separate indexes you would likely find that the execution plan would use only one index or none of them and too many contains clauses and or conditions might just result in an error. One index with a multi_column_datastore and sections is the right way to do this. Oracle Text indexes are not like regular indexes. The data is stored in tables. For example, if your index is named your_index, when you create a context index, it creates several tables, such as dr$your_index$i, and query results are retrieved based on what is in those tables. If you "select token_text from dr$your_index$i;", you will see all of the words that have been tokenized, indexed, and are searchable. There are other columns in that table that relate it to the rows in your data table. Some of it is stored in a format that requires usage of Oracle functions to decipher it. You really need to start reading the Oracle Text Reference and Oracle Text Application Developer's Guide in the online documentation. It is all searchable. For example, you can search for "multi_column_datastore" and find plenty of information and examples. If this is all too complicated for you at this time, then you may want to consider hiring a consultant. You mentioned a problem with English previously, but your English is clear. However, if you prefer to post in Spanish or some other language, you may do so in the separate forums listed at the bottom of the main forum page.
|
|
|
Re: Oracle Text query with multiple contains performance [message #265712 is a reply to message #265631] |
Fri, 07 September 2007 02:12 |
guinevere
Messages: 1 Registered: September 2007
|
Junior Member |
|
|
I have similar situation where I need to create a unified search tool from various table and various columns. To achieve this I am planning to create multi_column_datastore that includes a section group, one section per column.
My Text Table would look like this:
CREATE TABLE myTable
2 (id NUMBER,
3 col1 CLOB,
4 col2 CLOB,
5 col3 CLOB,
6 col4 CLOB)
Each columns are populated with text delimited from various columns from various tables.
For Example:
col1 - Data1<:>Data2<:>Data3<:>Data4...etc
Table1 - Data1 Varchar2(200)
Data2 Clob
Table2 - Data3 Varchar2(400)
Data4 Clob
Use context search to achive unified search from various tables and various columns. Please advice whether the approach is correct.
I know I shall hit the road block on the CLOB datasize while concatinating the text from various columns from various tables. How do you solve this one?
|
|
|
Re: Oracle Text query with multiple contains performance [message #265899 is a reply to message #265712] |
Fri, 07 September 2007 14:34 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
If your columns come from different tables, then you should use a user_datastore with a procedure to append the clobs, with tags inbetween to mark the sections. Please see the example below that includes explained plans to show index usage. I originally posted the example below on another forum, in response to someone who was using two context indexes on the individual columns, querying with two contains clauses and getting poor performance and the explain plan showed that neither context index was being used. You want to use one index whenever possible and avoid multiple contains clauses. Unless you have a data warehouse situation, if your data is going to change, you should also avoid creating additional tables that select columns from your base tables, as this will create synchronization problems. You should also be aware that Oracle Text will not allow you to create more than one index of the same type on one column.
SCOTT@10gXE> CREATE TABLE foros_mens
2 (id NUMBER,
3 visible VARCHAR2 (1),
4 fecha DATE,
5 titulo VARCHAR2 (30),
6 CONSTRAINT foros_mens_id_pk PRIMARY KEY (id))
7 /
Table created.
SCOTT@10gXE> INSERT ALL
2 INTO foros_mens VALUES (1, 's', SYSDATE, 'xxxxx')
3 INTO foros_mens VALUES (2, 's', SYSDATE, 'whatever')
4 INTO foros_mens VALUES (3, 's', SYSDATE, 'xxxxx')
5 INTO foros_mens VALUES (4, 's', SYSDATE, 'whatever')
6 SELECT * FROM DUAL
7 /
4 rows created.
SCOTT@10gXE> CREATE TABLE foros_texto
2 (id_mens NUMBER REFERENCES foros_mens (id),
3 texto CLOB,
4 CONSTRAINT foros_texto_id_mens_fk FOREIGN KEY (id_mens)
5 REFERENCES foros_mens (id))
6 /
Table created.
SCOTT@10gXE> INSERT ALL
2 INTO foros_texto VALUES (1, 'whatever')
3 INTO foros_texto VALUES (2, 'xxxxx')
4 INTO foros_texto VALUES (3, 'xxxxx')
5 INTO foros_texto VALUES (4, 'whatever')
6 SELECT * FROM DUAL
7 /
4 rows created.
SCOTT@10gXE> CREATE OR REPLACE PROCEDURE foros_proc
2 (p_rowid IN ROWID,
3 p_clob IN OUT CLOB)
4 AS
5 BEGIN
6 FOR r IN
7 (SELECT m.titulo, t.texto
8 FROM foros_mens m, foros_texto t
9 WHERE m.id=t.id_mens
10 AND t.ROWID = p_rowid)
11 LOOP
12 DBMS_LOB.WRITEAPPEND (p_clob, 8, '<titulo>');
13 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r.titulo), r.titulo);
14 DBMS_LOB.WRITEAPPEND (p_clob, 16, '</titulo><texto>');
15 DBMS_LOB.APPEND (p_clob, r.texto);
16 DBMS_LOB.WRITEAPPEND (p_clob, 8, '</texto>');
17 END LOOP;
18 END foros_proc;
19 /
Procedure created.
SCOTT@10gXE> SHOW ERRORS
No errors.
SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('foros_datastore', 'USER_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('foros_datastore', 'PROCEDURE', 'foros_proc');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@10gXE> CREATE INDEX foros_index ON foros_texto (texto)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4 ('DATASTORE foros_datastore
5 SECTION GROUP CTXSYS.AUTO_SECTION_GROUP')
6 /
Index created.
SCOTT@10gXE> SET AUTOTRACE ON EXPLAIN
SCOTT@10gXE> COLUMN texto FORMAT A30 WORD_WRAPPED
SCOTT@10gXE> -- search of both columns:
SCOTT@10gXE> SELECT *
2 FROM foros_mens m, foros_texto t
3 WHERE m.visible = 's'
4 AND CONTAINS (t.texto,'xxxxx') > 0
5 AND m.fecha > (SYSDATE - 90)
6 AND m.id = t.id_mens
7 /
ID V FECHA TITULO ID_MENS TEXTO
---------- - --------- ------------------------------ ---------- ------------------------------
1 s 05-SEP-07 xxxxx 1 whatever
2 s 05-SEP-07 whatever 2 xxxxx
3 s 05-SEP-07 xxxxx 3 xxxxx
Execution Plan
----------------------------------------------------------
Plan hash value: 396663544
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2068 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2068 | 5 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| FOROS_TEXTO | 1 | 2027 | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | FOROS_INDEX | | | 4 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID| FOROS_MENS | 1 | 41 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | FOROS_MENS_ID_PK | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("T"."TEXTO",'xxxxx')>0)
4 - filter("M"."VISIBLE"='s' AND "M"."FECHA">SYSDATE@!-90)
5 - access("M"."ID"="T"."ID_MENS")
Note
-----
- dynamic sampling used for this statement
SCOTT@10gXE> -- searches of individual columns:
SCOTT@10gXE> SELECT *
2 FROM foros_mens m, foros_texto t
3 WHERE m.visible = 's'
4 AND CONTAINS (t.texto,'xxxxx WITHIN titulo') > 0
5 AND m.fecha > (SYSDATE - 90)
6 AND m.id = t.id_mens
7 /
ID V FECHA TITULO ID_MENS TEXTO
---------- - --------- ------------------------------ ---------- ------------------------------
1 s 05-SEP-07 xxxxx 1 whatever
3 s 05-SEP-07 xxxxx 3 xxxxx
Execution Plan
----------------------------------------------------------
Plan hash value: 396663544
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2068 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2068 | 5 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| FOROS_TEXTO | 1 | 2027 | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | FOROS_INDEX | | | 4 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID| FOROS_MENS | 1 | 41 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | FOROS_MENS_ID_PK | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("T"."TEXTO",'xxxxx WITHIN titulo')>0)
4 - filter("M"."VISIBLE"='s' AND "M"."FECHA">SYSDATE@!-90)
5 - access("M"."ID"="T"."ID_MENS")
Note
-----
- dynamic sampling used for this statement
SCOTT@10gXE> SELECT *
2 FROM foros_mens m, foros_texto t
3 WHERE m.visible = 's'
4 AND CONTAINS (t.texto,'xxxxx WITHIN texto') > 0
5 AND m.fecha > (SYSDATE - 90)
6 AND m.id = t.id_mens
7 /
ID V FECHA TITULO ID_MENS TEXTO
---------- - --------- ------------------------------ ---------- ------------------------------
2 s 05-SEP-07 whatever 2 xxxxx
3 s 05-SEP-07 xxxxx 3 xxxxx
Execution Plan
----------------------------------------------------------
Plan hash value: 396663544
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2068 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 2068 | 5 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| FOROS_TEXTO | 1 | 2027 | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | FOROS_INDEX | | | 4 (0)| 00:00:01 |
|* 4 | TABLE ACCESS BY INDEX ROWID| FOROS_MENS | 1 | 41 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | FOROS_MENS_ID_PK | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("T"."TEXTO",'xxxxx WITHIN texto')>0)
4 - filter("M"."VISIBLE"='s' AND "M"."FECHA">SYSDATE@!-90)
5 - access("M"."ID"="T"."ID_MENS")
Note
-----
- dynamic sampling used for this statement
SCOTT@10gXE>
[Updated on: Fri, 07 September 2007 14:38] Report message to a moderator
|
|
|
Re: Oracle Text query with multiple contains performance [message #270393 is a reply to message #264520] |
Wed, 26 September 2007 14:24 |
elin@atcc.org
Messages: 4 Registered: September 2007
|
Junior Member |
|
|
I had a similar domain index created for all_fields search. However, we ran into a problem that the search didn't return the right result when some culumns have the same values like their culumn names. (e.g. "authenticult" value is stored in authenticult column). The result always returns all rows. Can someone help?
|
|
|
|
Re: Oracle Text query with multiple contains performance [message #270428 is a reply to message #270415] |
Wed, 26 September 2007 21:13 |
elin@atcc.org
Messages: 4 Registered: September 2007
|
Junior Member |
|
|
Here is a test case. 4 rows are inserted and a context index is created for all fields. The search against all fields returns 4 rows when the search value is same as one of the column names. (It should only return 1 row.) You may get a better idea when you see the DDLs below. Thanks for the help.
/*** CREATE TABLE ***/
create table test (a int, authenticult clob, preceptrol clob, safetsource clob, all_fields char(1));
/*** INSERT DATA ***/
insert into test values (1,'col1','col2','col3',null);
insert into test values (2,'authenticult',null,null,null);
insert into test values (3,null,'preceptrol',null,null);
insert into test values (4,null,null,'safetsource',null);
/*** CREATE INDEX ***/
begin
ctx_ddl.create_preference('test_ds','multi_column_datastore');
ctx_ddl.set_attribute('test_ds','columns','authenticult, preceptrol,safetsource');
end;
create index test_idx on test(all_fields) indextype is ctxsys.context parameters('datastore test_ds sync(on commit)');
/*** SEARCH QUERY ***/
select count(*) from test where contains (all_fields, 'authenticult') >0;
|
|
|
Re: Oracle Text query with multiple contains performance [message #270725 is a reply to message #270428] |
Thu, 27 September 2007 20:05 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
When you use a multi_column_datastore, it creates a virtual document, similar to what you might create using a user_datastore with a procedure to concatenate the columns. By default, it creates tags of the same names as the column names and uses them as delimiters and tokenizes and indexes them along with the data. You can override this default behavior by setting the delimiter attribute to newline for your multi_column_datastore. The column names then are ignored during indexing and only the data within them is indexed. Please see the reproduction and solution below.
SCOTT@10gXE> -- test environment you provided:
SCOTT@10gXE>
SCOTT@10gXE> /*** CREATE TABLE ***/
SCOTT@10gXE> create table test
2 (a int,
3 authenticult clob,
4 preceptrol clob,
5 safetsource clob,
6 all_fields char(1));
Table created.
SCOTT@10gXE>
SCOTT@10gXE> /*** INSERT DATA ***/
SCOTT@10gXE> insert into test values (1,'col1','col2','col3',null);
1 row created.
SCOTT@10gXE> insert into test values (2,'authenticult',null,null,null);
1 row created.
SCOTT@10gXE> insert into test values (3,null,'preceptrol',null,null);
1 row created.
SCOTT@10gXE> insert into test values (4,null,null,'safetsource',null);
1 row created.
SCOTT@10gXE>
SCOTT@10gXE> /*** CREATE INDEX ***/
SCOTT@10gXE> begin
2 ctx_ddl.create_preference('test_ds','multi_column_datastore');
3 ctx_ddl.set_attribute('test_ds','columns','authenticult, preceptrol,safetsource');
4 end;
5 /
PL/SQL procedure successfully completed.
SCOTT@10gXE>
SCOTT@10gXE> create index test_idx on test(all_fields)
2 indextype is ctxsys.context
3 parameters('datastore test_ds sync(on commit)');
Index created.
SCOTT@10gXE>
SCOTT@10gXE> -- reproduction:
SCOTT@10gXE>
SCOTT@10gXE> /*** SEARCH QUERY ***/
SCOTT@10gXE> select count(*) from test where contains (all_fields, 'authenticult') >0;
COUNT(*)
----------
4
SCOTT@10gXE>
SCOTT@10gXE> -- see what is tokenized, indexed, and searchable:
SCOTT@10gXE> column token_text varchar2(30)
SP2-0735: unknown COLUMN option beginning "varchar2(3..."
SCOTT@10gXE> select token_text, token_count from dr$test_idx$i
2 /
TOKEN_TEXT TOKEN_COUNT
---------------------------------------------------------------- -----------
AUTHENTICULT 4
COL1 1
COL2 1
COL3 1
PRECEPTROL 4
SAFETSOURCE 4
6 rows selected.
SCOTT@10gXE>
SCOTT@10gXE> -- solution:
SCOTT@10gXE> drop index test_idx
2 /
Index dropped.
SCOTT@10gXE> exec ctx_ddl.set_attribute ('test_ds', 'delimiter', 'newline')
PL/SQL procedure successfully completed.
SCOTT@10gXE>
SCOTT@10gXE> create index test_idx on test(all_fields)
2 indextype is ctxsys.context
3 parameters('datastore test_ds sync(on commit)');
Index created.
SCOTT@10gXE>
SCOTT@10gXE>
SCOTT@10gXE> -- see what is tokenized, indexed, and searchable:
SCOTT@10gXE> select token_text, token_count from dr$test_idx$i
2 /
TOKEN_TEXT TOKEN_COUNT
---------------------------------------------------------------- -----------
AUTHENTICULT 1
COL1 1
COL2 1
COL3 1
PRECEPTROL 1
SAFETSOURCE 1
6 rows selected.
SCOTT@10gXE>
SCOTT@10gXE>
SCOTT@10gXE> /*** SEARCH QUERY ***/
SCOTT@10gXE> select count(*) from test where contains (all_fields, 'authenticult') >0;
COUNT(*)
----------
1
SCOTT@10gXE>
|
|
|
|
Re: Oracle Text query with multiple contains performance [message #271451 is a reply to message #264520] |
Mon, 01 October 2007 12:36 |
elin@atcc.org
Messages: 4 Registered: September 2007
|
Junior Member |
|
|
I also got a solution from Oracle support by creating a basic_section_group. It is necessary to use BASIC_SECTION_GROUP so the column name is indexed as section not as a word..
SQL> exec ctx_ddl.create_section_group('test_section_group','BASIC_SECTION_GROUP');
SQL> create index test_idx on test(all_fields) indextype is ctxsys.context parameters('section group test_section_group datastore test_ds sync(on commit)');
|
|
|
Goto Forum:
Current Time: Thu Jan 30 18:43:49 CST 2025
|