Home » Server Options » Text & interMedia » Domain index >stemming concept
Domain index >stemming concept [message #599106] |
Tue, 22 October 2013 04:36 |
|
rameshaimhigh@gmail.com
Messages: 10 Registered: October 2013
|
Junior Member |
|
|
Hi all,
I am using CONTAINS with stem($) operator in it,in SQL query. Will stem approach support non-english keyword?
if stem will not support non-english keyword then is there any way to rectify this.
please advice on this
English Keyword SQL Query:
--------------------------
select itemnum, score(1), description from maximo.item
where (( contains(description,' $bearing ', 1) > 0 )) and itemnum = '1357851011'
Non-English Keyword SQL Query:
--------------------------
select itemnum, score(1), description from maximo.item
where (( contains(description,' $素晴らしい教育水準 ', 1) > 0 )) and itemnum = '1357851011'
Waiting for your reply
Ramesh
[Updated on: Tue, 22 October 2013 04:43] Report message to a moderator
|
|
|
|
Re: Domain index >stemming concept [message #599131 is a reply to message #599117] |
Tue, 22 October 2013 06:44 |
|
rameshaimhigh@gmail.com
Messages: 10 Registered: October 2013
|
Junior Member |
|
|
No it is not returning any error. It will give the result for non-engish keword with exact match.
Say for example for stemming apporach, if i search engineer then it will check internally with engineer and engineering too and give the results.
But for non-english keyword if i search engineer with Chinese language then it will give the result of exact matching. But i need the result for chinese keyword of engineering too.
this is my problem. is there any solution? please welcome!!!!!!!!!!
[Updated on: Tue, 22 October 2013 06:46] Report message to a moderator
|
|
|
Re: Domain index >stemming concept [message #599181 is a reply to message #599131] |
Tue, 22 October 2013 16:21 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
According to the documentation, stemming is available through either the stemmer attribute of a wordlist or the index_stems attribute of a lexer. I have never seen stemming work with the index_stems attribute of a lexer, but the stemmer attribute of a wordlist does work. The two are in conflict. If you specify the index_stems attribute of a lexer, then it uses that instead of the stemmer attribute of the wordlist, so it is important not to use the index_stems attribute of the lexer. It also requires that the nls_language for the session match. However, such things are only available for certain languages. It appears that only the auto_lexer supports Chinese, but as I said the index_stems attribute for the auto_lexer doesn't seem to work. You might try posting your problem on the OTN Text forum with a clear example. In the following, it shows what should work but does not, then what works for most other languages.
It is important to note that if you do not specify a wordlist, then by default it uses a basic_wordlist with stemmer set to English (or other language depending on your language settings). Also, the default value of index_stems for the auto_lexer is yes. Also, sometimes the auto_lexer does not accurately detect the correct language. This tends to happen with shorter documents. The longer the document, the more likely it will accurately detect the language.
Please note that I do not know Chinese and am relying on Google translate, so if I have something wrong that may be part of the problem. It might help if you provided a Chinese example of what you are searching for and expecting to find, but not finding. In my example, I have found that searching for '工程' finds both that and '工程师', but searching for '工程师' only finds itself.
-- test environment (table and data):
SCOTT@orcl12c> CREATE TABLE item
2 (itemnum NUMBER,
3 description VARCHAR2(30))
4 /
Table created.
SCOTT@orcl12c> INSERT ALL
2 INTO item VALUES (1357851011, 'engineer')
3 INTO item VALUES (1357851011, 'engineering')
4 INTO item VALUES (1357851011, '工程师')
5 INTO item VALUES (1357851011, '工程')
6 SELECT * FROM DUAL
7 /
4 rows created.
-- what theoretically should work, but does not:
SCOTT@orcl12c> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('test_lex', 'AUTO_LEXER');
3 CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'YES');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> CREATE INDEX item_description_idx
2 ON item (description)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS
5 ('LEXER test_lex')
6 /
Index created.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $engineer', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 10 engineer
1 row selected.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $engineering', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 8 engineer
1357851011 8 engineering
2 rows selected.
SCOTT@orcl12c> ALTER SESSION SET NLS_LANGUAGE = 'SIMPLIFIED CHINESE'
2 /
Session altered.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $工程师', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 5 工程师
1 row selected.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $工程', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 3 工程师
1357851011 3 工程
2 rows selected.
SCOTT@orcl12c> ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
2 /
Session altered.
-- what works for most other languages:
SCOTT@orcl12c> BEGIN
2 CTX_DDL.DROP_PREFERENCE ('test_lex');
3 CTX_DDL.CREATE_PREFERENCE ('test_lex', 'AUTO_LEXER');
4 CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'NO');
5 CTX_DDL.CREATE_PREFERENCE ('test_wordlist', 'BASIC_WORDLIST');
6 CTX_DDL.SET_ATTRIBUTE ('test_wordlist', 'STEMMER', 'AUTO');
7 END;
8 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> DROP INDEX item_description_idx
2 /
Index dropped.
SCOTT@orcl12c> CREATE INDEX item_description_idx
2 ON item (description)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS
5 ('LEXER test_lex
6 WORDLIST test_wordlist')
7 /
Index created.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $engineer', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 5 engineer
1357851011 5 engineering
2 rows selected.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $engineering', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 5 engineer
1357851011 5 engineering
2 rows selected.
SCOTT@orcl12c> ALTER SESSION SET NLS_LANGUAGE = 'SIMPLIFIED CHINESE'
2 /
Session altered.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $工程师', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 5 工程师
1 row selected.
SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
2 FROM item
3 WHERE CONTAINS (description,' $工程', 1) > 0
4 AND itemnum = 1357851011
5 /
ITEMNUM SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011 4 工程师
1357851011 4 工程
2 rows selected.
|
|
|
|
Re: Domain index >stemming concept [message #599207 is a reply to message #599199] |
Wed, 23 October 2013 00:34 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
rameshaimhigh@gmail.com wrote on Wed, 23 October 2013 10:12Is there any common nls_language in session will support all languages?
Tha language component of the NLS_LANG specifies conventions such as the language used for Oracle messages, sorting, day names, and month names. Each supported language has a unique name; for example, AMERICAN , FRENCH , or GERMAN . The language argument specifies default values for the territory and character set arguments. If the language is not specified, then the value defaults to AMERICAN. So, each of the unique language specified will thus act uniquely.
Have a look at this link, http://www.oracle.com/technetwork/database/globalization/nls-lang-099431.html
[Updated on: Wed, 23 October 2013 00:34] Report message to a moderator
|
|
|
|
Re: Domain index >stemming concept [message #599343 is a reply to message #599199] |
Wed, 23 October 2013 14:54 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
rameshaimhigh@gmail.com wrote on Tue, 22 October 2013 21:42Thanks for your timing help and to spend time to create scripts and make me to get clear. it is really very helpful.
You mean to say that, if this needs to work means, then we need to alter the session's nls_language right?
Is there any common nls_language in session will support all languages?
What I said is that, if it were some other supported language other than Chinese, then changing the nls_language for the session is one of the things that needs to be done. However, it doesn't work for Chinese, no matter what you do, even though the documentation says that it should. This is why I suggested that you post the question on the OTN forums, where the Oracle Text product manager and others frequently respond. Alternatively, if you have paid Oracle support, then you could use that. In either case, you need to provide them with a clear test case that demonstrates the problem, similar to what I posted.
|
|
|
Goto Forum:
Current Time: Thu Jan 30 18:43:36 CST 2025
|