Home » Server Options » Text & interMedia » Storing reserved ( special ) word as tokens in Oracle Text Index.
icon5.gif  Storing reserved ( special ) word as tokens in Oracle Text Index. [message #147322] Thu, 17 November 2005 04:06 Go to next message
samyak_bhuta
Messages: 2
Registered: November 2005
Junior Member
Hi Folks,

We have a oracle text index to search for on column BNAME in table BBB with a custom lexer.


Below are the steps perfomrmed to create it.

---------------------------------------------------

exec ctx_ddl.create_preference('my_lexer','BASIC_LEXER');

exec ctx_ddl.drop_preference('my_lexer');

exec ctx_ddl.set_attribute('my_lexer','printjoins','-_~!@#$%^&*(){}[],=?\;|><.');

exec ctx_ddl.set_attribute('my_lexer','punctuations','.!? ');

CREATE INDEX bname_idx
ON BBB ( BNAME )
INDEXTYPE IS ctxsys.CONTEXT
PARAMETERS ('LEXER my_lexer');


---------------------------------------------------

To rebuild the index we are using the following command.

---------------------------------------------------

ALTER INDEX bname_idx REBUILD ONLINE PARAMETERS('SYNC').

---------------------------------------------------


Problem Description:

We are not able to search for any record which contains certin keyword (e.g AND , OR, NOT , ABOUT ) using the query given below.

The given query will fetch records 'Anderson' , 'Makrand' , 'BaND' etc. but wount fetch the records 'Jack and Jill'.
---------------------------------------------------
SELECT ...
FROM ...
WHERE CONTAINS ( BBB.BNAME,'%AND%') > 0

---------------------------------------------------



But for certain reserved words ( e.g FUZZY ) its working fine.


Investigeting further we found out that the token for these problematic special words don't exist.We used the below command to find out the token entries in the index 'bname_idx'. It

---------------------------------------------------
select token_text, token_type FROM DR$bname_idx$I
where token_text like 'AND'

---------------------------------------------------




Please tell us why certain keywords are not stored as tokens in the index. How to make Oracle store this ?






Re: Storing reserved ( special ) word as tokens in Oracle Text Index. [message #147520 is a reply to message #147322] Fri, 18 November 2005 01:13 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9100
Registered: November 2002
Location: California, USA
Senior Member
By default Oracle uses a stoplist of 76 words that it ignores during tokenizing. If you create your own empty stoplist and use that, then it will not ignore any words during tokenizing. Please see the demonstration below.

-- with default stoplist:
scott@ORA92> exec ctx_ddl.create_preference('my_lexer','BASIC_LEXER')

PL/SQL procedure successfully completed.

scott@ORA92> exec ctx_ddl.set_attribute('my_lexer','printjoins','-_~!@#$%^&*(){}[],=?\;|><.')

PL/SQL procedure successfully completed.

scott@ORA92> exec ctx_ddl.set_attribute('my_lexer','punctuations','.!? ')

PL/SQL procedure successfully completed.

scott@ORA92> CREATE TABLE bbb (bname VARCHAR2(30))
  2  /

Table created.

scott@ORA92> INSERT INTO bbb (bname) VALUES ('Jack and Jill')
  2  /

1 row created.

scott@ORA92> CREATE INDEX bname_idx
  2  ON BBB ( BNAME )
  3  INDEXTYPE IS ctxsys.CONTEXT
  4  PARAMETERS ('LEXER my_lexer')
  5  /

Index created.

scott@ORA92> SELECT spw_word
  2  FROM   ctx_stopwords
  3  WHERE  spw_stoplist = 'DEFAULT_STOPLIST'
  4  AND    spw_word = 'and'
  5  /

SPW_WORD
--------------------------------------------------------------------------------
and

scott@ORA92> select token_text
  2  FROM   DR$bname_idx$I
  3  where  token_text = 'AND'
  4  /

no rows selected

scott@ORA92> SELECT * FROM bbb
  2  WHERE CONTAINS ( BBB.BNAME,'%AND%') > 0
  3  /

no rows selected


-- with empty stoplist:
scott@ORA92> EXEC CTX_DDL.CREATE_STOPLIST ('my_stoplist', 'BASIC_STOPLIST')

PL/SQL procedure successfully completed.

scott@ORA92> DROP INDEX bname_idx
  2  /

Index dropped.

scott@ORA92> CREATE INDEX bname_idx
  2  ON BBB ( BNAME )
  3  INDEXTYPE IS ctxsys.CONTEXT
  4  PARAMETERS ('LEXER my_lexer STOPLIST my_stoplist')
  5  /

Index created.

scott@ORA92> SELECT spw_word
  2  FROM   ctx_stopwords
  3  WHERE  spw_stoplist = 'MY_STOPLIST'
  4  AND    spw_word = 'and'
  5  /

no rows selected

scott@ORA92> select token_text
  2  FROM   DR$bname_idx$I
  3  where  token_text = 'AND'
  4  /

TOKEN_TEXT
----------------------------------------------------------------
AND

scott@ORA92> SELECT * FROM bbb
  2  WHERE CONTAINS ( BBB.BNAME,'%AND%') > 0
  3  /

BNAME
------------------------------
Jack and Jill

scott@ORA92>

icon7.gif  Re: Storing reserved ( special ) word as tokens in Oracle Text Index. [message #147634 is a reply to message #147520] Fri, 18 November 2005 11:37 Go to previous message
samyak_bhuta
Messages: 2
Registered: November 2005
Junior Member
Hi Barbie ( hope its not offending ),

Thanks really for your reply.


I did find solution for the problem soon after posting, but your efforts are not wasted here. I learnt a way to list down and know the stopwords.

---------------------------------------------------
scott@ORA92> SELECT spw_word
2 FROM ctx_stopwords
3 WHERE spw_stoplist = 'MY_STOPLIST'
4 AND spw_word = 'and'
5 /
---------------------------------------------------

Also , you can use the CTXSYS.EMPTY_STOPLIST .It comes with oracle.So no need to create your own if you are not planning to add any stopword. Here is example ..

---------------------------------------------------
CREATE INDEX bname_idx
ON BBB (bname)
INDEXTYPE IS ctxsys.CONTEXT
PARAMETERS ('LEXER my_lexer STOPLIST CTXSYS.EMPTY_STOPLIST');
---------------------------------------------------

Thanks again,

Cheers,

Samyak

Previous Topic: How to create store procedure using Oracle Text [CONTAINS]?
Next Topic: Error using ctxload from 8.1.6 to 9.2.0
Goto Forum:
  


Current Time: Thu Nov 21 11:10:55 CST 2024