Multi language querying [message #520738]
Wed, 24 August 2011 06:42
karibou
Messages: 3 Registered: August 2011 Location: Belgium
Junior Member
Hello,
I have a simple question about querying a multi-language index.
If I have two documents indexed, one in English and one in French, and I search for the word "hello" while my session language is set to English, can I retrieve both documents if they both contain "hello", even though one of them is indexed in French?
The goal is to spare the user from having to select the language of the search term before running a search.
I hope I was clear...
Thanks in advance
Karibou
Re: Multi language querying [message #520803 is a reply to message #520738]
Wed, 24 August 2011 14:59
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
Senior Member
When in doubt, test and see. Yes, you can find "hello" in either document. However, other features such as stemming are dependent upon the session language. Please see the demonstration below.
-- table, data, lexers, wordlist, and index:
SCOTT@orcl_11gR2> create table documents
2 (id number,
3 lang varchar2(3),
4 doc clob)
5 /
Table created.
SCOTT@orcl_11gR2> insert all
2 into documents values
3 (1, 'eng', 'I said, "hello" and il m''a dit "bonjour".')
4 into documents values
5 (2, 'fre', 'J''ai dit "bonjour" et he said "hello".')
6 select * from dual
7 /
2 rows created.
SCOTT@orcl_11gR2> begin
2 ctx_ddl.create_preference ('english_lexer','basic_lexer');
3 ctx_ddl.create_preference ('french_lexer', 'basic_lexer');
4 ctx_ddl.create_preference ('global_lexer', 'multi_lexer');
5 ctx_ddl.add_sub_lexer ('global_lexer', 'default', 'english_lexer');
6 ctx_ddl.add_sub_lexer ('global_lexer', 'french','french_lexer','fre');
7 ctx_ddl.create_preference ('global_wordlist', 'basic_wordlist');
8 ctx_ddl.set_attribute ('global_wordlist', 'stemmer', 'auto');
9 end;
10 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> create index documents_idx
2 on documents (doc)
3 indextype is ctxsys.context
4 parameters
5 ('lexer global_lexer
6 wordlist global_wordlist
7 language column lang')
8 /
Index created.
-- English query finds "hello" in both documents:
SCOTT@orcl_11gR2> column doc format a45
SCOTT@orcl_11gR2> select * from documents
2 where contains (doc, 'hello') > 0
3 /
ID LAN DOC
---------- --- ---------------------------------------------
1 eng I said, "hello" and il m'a dit "bonjour".
2 fre J'ai dit "bonjour" et he said "hello".
2 rows selected.
-- English query finds "said" by searching for "say",
-- but not "dit" by searching for "dire":
SCOTT@orcl_11gR2> select * from documents
2 where contains (doc, '$say') > 0
3 /
ID LAN DOC
---------- --- ---------------------------------------------
1 eng I said, "hello" and il m'a dit "bonjour".
2 fre J'ai dit "bonjour" et he said "hello".
2 rows selected.
SCOTT@orcl_11gR2> select * from documents
2 where contains (doc, '$dire') > 0
3 /
no rows selected
-- French query finds "hello" in both documents:
SCOTT@orcl_11gR2> alter session set nls_language = 'french'
2 /
Session altered.
SCOTT@orcl_11gR2> select * from documents
2 where contains (doc, 'hello') > 0
3 /
ID LAN DOC
---------- --- ---------------------------------------------
1 eng I said, "hello" and il m'a dit "bonjour".
2 fre J'ai dit "bonjour" et he said "hello".
2 rows selected.
-- French query does not find "said" by searching for "say",
-- but does find "dit" by searching for "dire":
SCOTT@orcl_11gR2> select * from documents
2 where contains (doc, '$say') > 0
3 /
no rows selected
SCOTT@orcl_11gR2> select * from documents
2 where contains (doc, '$dire') > 0
3 /
ID LAN DOC
---------- --- ---------------------------------------------
1 eng I said, "hello" and il m'a dit "bonjour".
2 fre J'ai dit "bonjour" et he said "hello".
2 rows selected.
SCOTT@orcl_11gR2>
Re: Multi language querying [message #521404 is a reply to message #521396]
Mon, 29 August 2011 11:08
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
Senior Member
You will need to use ctx_ddl.add_stopword for each word for each language. However, if you have these words in a table or in a text file that you can access via an external table, then you can write some code to loop through the values and use ctx_ddl.add_stopword for each.
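A rough sketch of such a loop is below. It assumes the stopwords sit in a table called stopword_source with one row per word and language, and that the stoplist is a multi-language stoplist called global_stoplist. Both names are made up for illustration, so substitute your own table (or external table) and stoplist name.

```sql
-- Hypothetical source table: one row per stopword, tagged with its language.
create table stopword_source
  (lang  varchar2(30),   -- Oracle language name, e.g. 'english' or 'french'
   word  varchar2(64))
/
insert all
  into stopword_source values ('english', 'the')
  into stopword_source values ('french',  'le')
select * from dual
/
begin
  -- A multi-language stoplist holds language-specific stopwords.
  ctx_ddl.create_stoplist ('global_stoplist', 'multi_stoplist');
  -- Add each word to the stoplist under its own language.
  for r in (select lang, word from stopword_source) loop
    ctx_ddl.add_stopword ('global_stoplist', r.word, r.lang);
  end loop;
end;
/
```

The stoplist would then be named in the index parameters, for example by adding "stoplist global_stoplist" next to the lexer and wordlist entries. A multi-language stoplist is meant to be used with a multi_lexer index such as the one in the demonstration above.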