Home » Server Options » Text & interMedia » Special characters issues with CATSEARCH query (10.2.0.4)
Special characters issues with CATSEARCH query [message #478314] Thu, 07 October 2010 18:13 Go to next message
lkngstr82is
Messages: 33
Registered: January 2010
Location: USA
Member
We have names stored in a table with special character. for eg.
name SUSANB is stored as susan-b, su-sanb. When we run query using CATSEARCH, it doesn't bring all the results



 CREATE TABLE bs_orgnl_sbmsn
    (nm_tx  VARCHAR2 (30))
  /

Table created.

INSERT INTO bs_orgnl_sbmsn VALUES's-usanb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES'su-sanb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('sus-anb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES'susa-nb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('susan-b');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('s.usanb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('su.sanb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('su.sanb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('sus.anb');
/
1 row created.
INSERT INTO bs_orgnl_sbmsn VALUES('susa.nb');
/
1 row created.

INSERT INTO bs_orgnl_sbmsn VALUES ('susan.b');
/
1 row created.
Commit;

CREATE INDEX INDX_BS_ORGNL_SBMSN_SHRT
     ON BS_ORGNL_SBMSN(NM_TX)
        INDEXTYPE IS CTXSYS.CTXCAT PARALLEL;

select * FROM BS_ORGNL_SBMSN
Where  catsearch (NM_TX,'SUSANB', '' ) > 0;


This will only give one result for SUSANB.


Re: Special characters issues with CATSEARCH query [message #478335 is a reply to message #478314] Thu, 07 October 2010 22:13 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
You can create a basic_lexer, set the skipjoins attribute for that lexer to include the hyphen and period, then use that lexer in your parameters during index creation. That will cause the strings on either side of those characters to be joined, ignoring the hyphen or period inbetween. You can check what is tokenized, indexed, and searchable by checking the dr$your_index_name$i domain index table that is created by indexing. Please see the reproduction and solution below.

-- table and data you provided:
SCOTT@orcl_11gR2> CREATE TABLE bs_orgnl_sbmsn
  2  	 (nm_tx  VARCHAR2 (30))
  3  /

Table created.

SCOTT@orcl_11gR2> INSERT ALL
  2  INTO bs_orgnl_sbmsn VALUES ('s-usanb')
  3  INTO bs_orgnl_sbmsn VALUES ('su-sanb')
  4  INTO bs_orgnl_sbmsn VALUES ('sus-anb')
  5  INTO bs_orgnl_sbmsn VALUES ('susa-nb')
  6  INTO bs_orgnl_sbmsn VALUES ('susan-b')
  7  INTO bs_orgnl_sbmsn VALUES ('s.usanb')
  8  INTO bs_orgnl_sbmsn VALUES ('su.sanb')
  9  INTO bs_orgnl_sbmsn VALUES ('su.sanb')
 10  INTO bs_orgnl_sbmsn VALUES ('sus.anb')
 11  INTO bs_orgnl_sbmsn VALUES ('susa.nb')
 12  INTO bs_orgnl_sbmsn VALUES ('susan.b')
 13  SELECT * FROM DUAL
 14  /

11 rows created.


-- reproduction of problem:
SCOTT@orcl_11gR2> CREATE INDEX INDX_BS_ORGNL_SBMSN_SHRT
  2  ON BS_ORGNL_SBMSN (NM_TX)
  3  INDEXTYPE IS CTXSYS.CTXCAT PARALLEL
  4  /

Index created.

SCOTT@orcl_11gR2> SELECT dr$token FROM dr$indx_bs_orgnl_sbmsn_shrt$i
  2  /

DR$TOKEN
----------------------------------------------------------------
ANB
ANB
B
B
NB
NB
SANB
SANB
SANB
SU
SU
SU
SUS
SUS
SUSA
SUSA
SUSAN
SUSAN
USANB
USANB

20 rows selected.

SCOTT@orcl_11gR2> select * FROM BS_ORGNL_SBMSN
  2  Where  catsearch (NM_TX,'SUSANB', '' ) > 0
  3  /

no rows selected


-- solution:
SCOTT@orcl_11gR2> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('your_lexer', 'BASIC_LEXER');
  3    CTX_DDL.SET_ATTRIBUTE ('your_lexer', 'SKIPJOINS', '-.');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11gR2> DROP INDEX indx_bs_orgnl_sbmsn_shrt
  2  /

Index dropped.

SCOTT@orcl_11gR2> CREATE INDEX INDX_BS_ORGNL_SBMSN_SHRT
  2  ON BS_ORGNL_SBMSN (NM_TX)
  3  INDEXTYPE IS CTXSYS.CTXCAT PARALLEL
  4  PARAMETERS ('LEXER your_lexer')
  5  /

Index created.

SCOTT@orcl_11gR2> SELECT dr$token FROM dr$indx_bs_orgnl_sbmsn_shrt$i
  2  /

DR$TOKEN
----------------------------------------------------------------
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB
SUSANB

11 rows selected.

SCOTT@orcl_11gR2> select * FROM BS_ORGNL_SBMSN
  2  Where  catsearch (NM_TX,'SUSANB', '' ) > 0
  3  /

NM_TX
------------------------------
s-usanb
su-sanb
sus-anb
susa-nb
susan-b
s.usanb
su.sanb
su.sanb
sus.anb
susa.nb
susan.b

11 rows selected.

SCOTT@orcl_11gR2>


Re: Special characters issues with CATSEARCH query [message #478413 is a reply to message #478335] Fri, 08 October 2010 10:02 Go to previous messageGo to next message
lkngstr82is
Messages: 33
Registered: January 2010
Location: USA
Member
Thanks a lot !!
Re: Special characters issues with CATSEARCH query [message #478424 is a reply to message #478335] Fri, 08 October 2010 11:53 Go to previous messageGo to next message
lkngstr82is
Messages: 33
Registered: January 2010
Location: USA
Member
One more thing: How can I add space (like -,.) to the SKIPJOIN?
If I have record SUSAN B, how can I make CATSEARCH list it?

INSERT INTO bs_tx VALUES ('susan b');
/
1 row created.

commit;

select * FROM BS_TX
Where  catsearch (NM_TX,'SUSANB', '' ) > 0
NM_TX

susan-b
s.usanb
su.sanb
su.sanb
sus.anb
susa.nb
susan.b
susan$b

it will not list susan b.
Re: Special characters issues with CATSEARCH query [message #478425 is a reply to message #478424] Fri, 08 October 2010 12:15 Go to previous message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
lkngstr82is wrote on Fri, 08 October 2010 09:53

One more thing: How can I add space (like -,.) to the SKIPJOIN?
If I have record SUSAN B, how can I make CATSEARCH list it?


You can't, nor should you. The space is what is used to separate the tokens/words. If you could make the space a skipjoin, then every row would be all one token/word. You can't solve everything with generic searches. You have to expect reasonable input and searches.





Previous Topic: Problem with special chars in BLOB data type using contains keyword
Next Topic: Searching using CATSEARCH and OR
Goto Forum:
  


Current Time: Fri Dec 27 00:00:53 CST 2024