Home » Server Options » Text & interMedia » Best approach for Multi Column text search (10.2)
Best approach for Multi Column text search [message #363322] |
Fri, 05 December 2008 16:50 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I have a table with many address columns such as street number, street name, city, etc. I need to provide the ability for users to perform a text search as if the columns were all concatenated.
I understand from Barbara's magnificent examples in this forum that I can use a CONTEXT index with a MULTI_COLUMN_DATASTORE, but I read somewhere that CONTEXT indexes - unlike CTXCAT - are not self-maintaining when you update the base data. Is that right?
Alternatively, I could concatenate the columns myself into a new VARCHAR2 column during the ETL (this is a data warehouse) and CTXCAT index this one column.
At face value, the CTXCAT option would seem to be a bit wasteful as it duplicates the address data, but then I don't know what CONTEXT is doing behind the scenes.
QUESTION: For those who have worked with these things, how onerous is the overhead to maintain CONTEXT indexes? Is it worth the space overhead to have the self-maintaining CTXCAT index?
Ross Leishman
|
|
|
Re: Best approach for Multi Column text search [message #363337 is a reply to message #363322] |
Fri, 05 December 2008 22:26 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
What you should use is a multi_column_datastore with a context index. You can create it in such a manner that it will allow searches of all fields or individual columns or combinations with one index access. The only other alternative that you might consider would be a user_datastore with a procedure if you want to do further customization, such as creating groups of columns that can be searched. When you create a multi_column_datastore, it is actually doing what creating a user_datastore with a procedure does behind the scenes.
When you create or alter the context index, you can specify in the parameters how often you want synchronization and optimization similar to how you would specify the same things for a materialized view. You can specify on commit or specified intervals for synchronization and fast or full for optimization. You can also do these things manually and you can also rebuild or drop and recreate. Synchronization makes the inserted, updated, or deleted changes immediately available to searches, but does so by adding individual rows to the dr$...$... domain index tables, which causes gradually increasing fragmentation. Optimization eliminates the fragmentation and combines the data into fewer rows by token. Optimization can be done online so that it does not interfere with searches.
You do not want a ctxcat index. Ctxcat indexes are intended for use when you have only one unstructured (text) column and perhaps some associated structured data and it offers limited search capabilities. You cannot use all of the search methods using a catsearch query on a ctxcat index as you can using a contains query on a context index, although you can do a workaround by using a query template to allow the context grammar. There is also one very important thing, that I consider a bug, although it is documented that this can happen. If the optimizer chooses to use functional invocation on a catsearch query, which usually happens if the structured portion of the query is more limiting than the unstructured (text) portion, then all you get is an error message, saying that functional invocation is not supported on ctxcat indexes. No results, just an error. I would consider this a sufficient reason not to use any ctxcat index on an application until this is fixed in some future version. Yes, it is fast when it works and you don't have to specify synchronization and optimization when you create the index, but it may not work at all at any given time, and sometimes all the hints and materialized subqueries that you provide cannot influence the optimizer not to use functional invocation. This problem does not exist with context indexes, since they support functional invocation.
Please let me know if you need further clarification or examples and feel free to post any code that you are working on.
|
|
|
|
Re: Best approach for Multi Column text search [message #438472 is a reply to message #363338] |
Sat, 09 January 2010 11:29 |
ora22
Messages: 8 Registered: December 2009 Location: BOSTON
|
Junior Member |
|
|
<quote>
You can create it in such a manner that it will allow searches of all fields or individual columns or combinations with one index access
</quote>
Hi Barbara
I am trying to figure out how to do the combinations bit with multi_column_datastore.
As an example - i have the following
---------------------------------
-- CREATE MULTI-COLUMN-DATASTORE
---------------------------------
connect ctxsys/<password>
grant execute on ctxsys.ctx_ddl to myschema;
-- only ctxsys can create MULTI_COLUMN preferences in Oracle 10g
begin
ctx_ddl.drop_preference ( 'gcp_multi_column_datastore' );
end;
/
begin
ctx_ddl.create_preference ( 'gcp_multi_column_datastore', 'multi_column_datastore' );
ctx_ddl.set_attribute ( 'gcp_multi_column_datastore', 'columns', 'eng_name, cli_short_name, entity_duns_name, ult_par_duns_name' );
end;
connect myschema/<password>
-------------------
-- CREATE TABLE
------------------
drop table test_tab1;
create table test_tab1
(
eng_name varchar2(100 byte),
cli_short_name varchar2(200 byte),
entity_duns_name varchar2(100 byte),
ult_par_duns_name varchar2(100 byte),
concat char(1) /* placeholder for user datastore */
);
insert into test_tab1 values ('pepsi','test1','test1','test1','');
insert into test_tab1 values ('test2','pepsi','test2','test2','');
insert into test_tab1 values ('test3','test3','exxon','test3','');
insert into test_tab1 values ('test4','test4','test4','exxon','');
commit;
-----------------------
-- CREATE SECTION GROUP
-----------------------
begin
ctx_ddl.drop_section_group
(
group_name => 'test1_section_group'
);
end;
begin
ctx_ddl.create_section_group
(
group_name => 'test1_section_group',
group_type => 'basic_section_group'
);
ctx_ddl.add_field_section
(
group_name => 'test1_section_group',
section_name => 'the_eng_name',
tag => 'eng_name',
visible => true
);
ctx_ddl.add_field_section
(
group_name => 'test1_section_group',
section_name => 'the_cli_short_name',
tag => 'cli_short_name',
visible => true
);
ctx_ddl.add_field_section
(
group_name => 'test1_section_group',
section_name => 'the_entity_duns_name',
tag => 'entity_duns_name',
visible => true
);
ctx_ddl.add_field_section
(
group_name => 'test1_section_group',
section_name => 'the_ult_par_duns_name',
tag => 'ult_par_duns_name',
visible => true
);
end;
/
---------------
-- CREATE INDEX
---------------
create index test_tab1_concat on test_tab1 ( concat )
indextype is ctxsys.context
parameters ( '
datastore ctxsys.gcp_multi_column_datastore
section group test1_section_group ' );
-------------------
-- SEARCH EXAMPLES
------------------
-- Look Anywhere - works great
select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name
from test_tab1 where contains ( concat,'exxon or pepsi',1 ) > 0
-- Look In eng_name Only - Works Great
select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name
from test_tab1 where contains ( concat,'(exxon or pepsi) within the_eng_name',1 ) > 0
My Question is - How do i do a search that looks in section the_eng_name & the_cli_short_name but not the others
in one index access
I tried
select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name
from test_tab1 where contains ( concat,'(exxon or pepsi) within the_eng_name,the_cli_short_name',1 ) > 0
or
select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name
from test_tab1 where contains ( concat,'(exxon or pepsi) within (the_eng_name or the_cli_short_name)',1 ) > 0
etc but none of those work
World appreciate your response.
|
|
|
|
|
Goto Forum:
Current Time: Thu Feb 06 23:38:56 CST 2025
|