Multi language querying (Oracle 11g)
Multi language querying [message #520738] Wed, 24 August 2011 06:42
karibou
Messages: 3
Registered: August 2011
Location: Belgium
Junior Member
Hello,

I have a simple question about querying a multi-language index.

If I have 2 documents indexed, one in English and one in French, and I search for "hello" while my session language is set to English, can I retrieve both documents if they both contain the word "hello", even though one of them is indexed in French?

The goal is to spare the user from having to select the language of the search term before executing a search.

I hope I was clear...
Thanks in advance

Karibou
Re: Multi language querying [message #520803 is a reply to message #520738] Wed, 24 August 2011 14:59
Barbara Boehmer
Messages: 9100
Registered: November 2002
Location: California, USA
Senior Member
When in doubt, test and see. Yes, you can find "hello" in either document. However, other features such as stemming are dependent upon the session language. Please see the demonstration below.

-- table, data, lexers, wordlist, and index:
SCOTT@orcl_11gR2> create table documents
  2    (id    number,
  3  	lang  varchar2(3),
  4  	doc   clob)
  5  /

Table created.

SCOTT@orcl_11gR2> insert all
  2  into documents values
  3    (1, 'eng', 'I said, "hello" and il m''a dit "bonjour".')
  4  into documents values
  5    (2, 'fre', 'J''ai dit "bonjour" et he said "hello".')
  6  select * from dual
  7  /

2 rows created.

SCOTT@orcl_11gR2> begin
  2    ctx_ddl.create_preference ('english_lexer','basic_lexer');
  3    ctx_ddl.create_preference ('french_lexer', 'basic_lexer');
  4    ctx_ddl.create_preference ('global_lexer', 'multi_lexer');
  5    ctx_ddl.add_sub_lexer ('global_lexer', 'default', 'english_lexer');
  6    ctx_ddl.add_sub_lexer ('global_lexer', 'french','french_lexer','fre');
  7    ctx_ddl.create_preference ('global_wordlist', 'basic_wordlist');
  8    ctx_ddl.set_attribute ('global_wordlist', 'stemmer', 'auto');
  9  end;
 10  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11gR2> create index documents_idx
  2  on documents (doc)
  3  indextype is ctxsys.context
  4  parameters
  5    ('lexer		  global_lexer
  6  	 wordlist	  global_wordlist
  7  	 language column  lang')
  8  /

Index created.


-- English query finds "hello" in both documents:
SCOTT@orcl_11gR2> column doc format a45
SCOTT@orcl_11gR2> select * from documents
  2  where  contains (doc, 'hello') > 0
  3  /

        ID LAN DOC
---------- --- ---------------------------------------------
         1 eng I said, "hello" and il m'a dit "bonjour".
         2 fre J'ai dit "bonjour" et he said "hello".

2 rows selected.


-- English query finds "said" by searching for "say",
-- but not "dit" by searching for "dire":
SCOTT@orcl_11gR2> select * from documents
  2  where  contains (doc, '$say') > 0
  3  /

        ID LAN DOC
---------- --- ---------------------------------------------
         1 eng I said, "hello" and il m'a dit "bonjour".
         2 fre J'ai dit "bonjour" et he said "hello".

2 rows selected.

SCOTT@orcl_11gR2> select * from documents
  2  where  contains (doc, '$dire') > 0
  3  /

no rows selected


-- French query finds "hello" in both documents:
SCOTT@orcl_11gR2> alter session set nls_language = 'french'
  2  /

Session altered.

SCOTT@orcl_11gR2> select * from documents
  2  where  contains (doc, 'hello') > 0
  3  /

        ID LAN DOC
---------- --- ---------------------------------------------
         1 eng I said, "hello" and il m'a dit "bonjour".
         2 fre J'ai dit "bonjour" et he said "hello".

2 rows selected.


-- French query does not find "said" by searching for "say",
-- but does find "dit" by searching for "dire":
SCOTT@orcl_11gR2> select * from documents
  2  where  contains (doc, '$say') > 0
  3  /

no rows selected

SCOTT@orcl_11gR2> select * from documents
  2  where  contains (doc, '$dire') > 0
  3  /

        ID LAN DOC
---------- --- ---------------------------------------------
         1 eng I said, "hello" and il m'a dit "bonjour".
         2 fre J'ai dit "bonjour" et he said "hello".

2 rows selected.

SCOTT@orcl_11gR2>

Re: Multi language querying [message #520804 is a reply to message #520803] Wed, 24 August 2011 15:05
Barbara Boehmer
There are also the auto_lexer and world_lexer, both of which are supposed to detect the language of the data automatically. However, they don't have as many features available as the multi_lexer. Also, they need a document of sufficient size to detect the language accurately. If you have small documents, they may not get the language right, so I tend to favor the multi_lexer with a language column. Even with automatic language detection, your queries still depend on the session language.

http://download.oracle.com/docs/cd/E11882_01/text.112/e16593/cdatadic.htm#CCREF0217
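
For comparison with the multi_lexer setup shown earlier, here is a minimal sketch of an index that uses the world_lexer instead. The preference name detect_lexer and the index name documents_world_idx are made up for illustration, and the sketch assumes the documents table from the demonstration above. No language column is specified because the lexer is meant to detect the language itself.

```sql
-- Sketch only: detect_lexer and documents_world_idx are hypothetical names.
-- A column can carry only one index of a given domain index type,
-- so the multi_lexer index from the demonstration is dropped first.
drop index documents_idx
/

begin
  -- world_lexer takes the place of the multi_lexer and its sub-lexers
  ctx_ddl.create_preference ('detect_lexer', 'world_lexer');
end;
/

create index documents_world_idx
on documents (doc)
indextype is ctxsys.context
parameters ('lexer detect_lexer')
/

-- The same plain-term query as in the demonstration can then be rerun:
select id, lang
from   documents
where  contains (doc, 'hello') > 0
/
```

On 11gR2, replacing 'world_lexer' with 'auto_lexer' in the create_preference call should give the other automatic option. With documents as short as the two test rows, the detection may not be reliable, which is the reason given above for preferring the multi_lexer.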
Re: Multi language querying [message #521053 is a reply to message #520804] Fri, 26 August 2011 01:39
karibou
Many thanks for your answer, Barbara.

I don't have the privileges to create tables or indexes at my company, so it is difficult for me to run small tests without disturbing the DBA (who has never used Oracle Text)...

I will not use the stemmer in my application, but I will use the fuzzy and wildcard operators.
Do these operators depend on the session language too?

Regards
Re: Multi language querying [message #521156 is a reply to message #521053] Fri, 26 August 2011 12:02
Barbara Boehmer
karibou wrote on Thu, 25 August 2011 23:39


I will not use the stemmer in my application, but I will use the fuzzy and wildcard operators.
Do these operators depend on the session language too?


No, fuzzy and wildcards do not depend upon the session language.
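
If you do get access to a test schema, a sketch like the following could be used to check this yourself. It assumes the documents table and index from the earlier demonstration. The misspelled term helo and the pattern hel% are only illustrative, and no output is shown because it has not been run here.

```sql
-- Sketch only: run each query under both session languages and compare.
alter session set nls_language = 'english'
/

-- fuzzy operator: expands a (possibly misspelled) term to similar indexed tokens
select id, lang
from   documents
where  contains (doc, 'fuzzy(helo)') > 0
/

-- wildcard: % matches any number of characters, _ matches a single character
select id, lang
from   documents
where  contains (doc, 'hel%') > 0
/

alter session set nls_language = 'french'
/

-- repeat the same two queries and compare the rows returned
select id, lang
from   documents
where  contains (doc, 'fuzzy(helo)') > 0
/

select id, lang
from   documents
where  contains (doc, 'hel%') > 0
/
```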
Re: Multi language querying [message #521396 is a reply to message #521156] Mon, 29 August 2011 10:18
karibou
Thanks, Barbara.

I have another question.
I see that there is a set of default stoplists for different languages.
On my side, I have a multi_lexer covering 5 languages, and I would like to add the default stoplist for each language to the multi_stoplist.
Is that possible without calling CTX_DDL.ADD_STOPWORD for each word in each language?

Regards
Re: Multi language querying [message #521404 is a reply to message #521396] Mon, 29 August 2011 11:08
Barbara Boehmer
You will need to use ctx_ddl.add_stopword for each word for each language. However, if you have these words in a table or in a text file that you can access via an external table, then you can write some code to loop through the values and use ctx_ddl.add_stopword for each.
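
A minimal sketch of such a loop follows. The table stopword_source and the stoplist name global_stoplist are hypothetical: the sketch assumes you have already loaded one row per language and word, with lang holding a language name that Oracle Text recognizes, such as 'french'.

```sql
-- Sketch only: stopword_source and global_stoplist are hypothetical names.
-- Assumed layout of the source table:
--   create table stopword_source (lang varchar2(30), word varchar2(64));

begin
  -- a multi_stoplist holds language-specific stopwords
  ctx_ddl.create_stoplist ('global_stoplist', 'multi_stoplist');

  -- one add_stopword call per (word, language) pair
  for r in (select lang, word from stopword_source) loop
    ctx_ddl.add_stopword ('global_stoplist', r.word, r.lang);
  end loop;
end;
/
```

The stoplist would then be named in the index parameters next to the lexer and language column, for example 'lexer global_lexer stoplist global_stoplist language column lang'.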