CONTAINS function not working! (split from hijacked thread by bb) [message #438293] |
Thu, 07 January 2010 21:18 |
ora22
Messages: 8 Registered: December 2009 Location: BOSTON
|
Junior Member |
|
|
hi
Oracle Version : 9.2.0.6.0
Everything Default : Lexer, Stoplist etc ...
the table tab1 has data as follows
ID col1
-- -----------------
1 Jack/Jill
2 Jack & Jill
3 Jack&Jill
4 Jack and Jill
5 Jack is a friend of Jill
I have a search screen in which say the user enters the search as "Jack and Jill"
i know "AND" is a stopword and also a keyword, so i remove it from the search criterion and issue the search
SELECT col1 FROM tab1 WHERE CONTAINS(col1, 'Jack Jill', 1) > 0;
system returns
col1
-----------------
Jack/Jill
Jack & Jill
Jack&Jill
but not the row with ID=4 and col1 = "Jack and Jill"
since and is a stopword and was not indexed - in row 4 - shouldn't oracle text treat as if Jack and Jill were right next to each other ?
can i make it work like that ?
the user would be happy to see the result as
ID col1
-- -----------------
1 Jack/Jill
2 Jack & Jill
3 Jack&Jill
4 Jack and Jill
but not ID 5 as Jack and Jill were far apart ...
Would greatly appreciate your thoughts ?
|
|
|
Re: CONTAINS function not working! (split from hijacked thread by bb) [message #438297 is a reply to message #438293] |
Thu, 07 January 2010 22:15 |
|
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Even though the "and" is not indexed, it still counts it as a word. You can enclose the "and" in curly brackets to treat it as a regular word intead of a keyword, but then you will miss the other rows. You can use the NEAR operator to specify within one word. Please see the examples below.
SCOTT@orcl_11g>
SCOTT@orcl_11g> CREATE table tab1
2 (ID NUMBER,
3 col1 VARCHAR2 (30))
4 /
Table created.
SCOTT@orcl_11g> SET DEFINE OFF
SCOTT@orcl_11g> INSERT ALL
2 INTO tab1 VALUES (1, 'Jack/Jill')
3 INTO tab1 VALUES (2, 'Jack & Jill')
4 INTO tab1 VALUES (3, 'Jack&Jill')
5 INTO tab1 VALUES (4, 'Jack and Jill')
6 INTO tab1 VALUES (5, 'Jack is a friend of Jill')
7 SELECT * FROM DUAL
8 /
5 rows created.
SCOTT@orcl_11g> CREATE INDEX tab1_col1_idx ON tab1 (col1)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 /
Index created.
SCOTT@orcl_11g> SELECT token_text FROM dr$tab1_col1_idx$i
2 /
TOKEN_TEXT
----------------------------------------------------------------
FRIEND
JACK
JILL
SCOTT@orcl_11g> SELECT col1 FROM tab1 WHERE CONTAINS(col1, 'Jack Jill', 1) > 0
2 /
COL1
------------------------------
Jack/Jill
Jack & Jill
Jack&Jill
SCOTT@orcl_11g> SELECT col1 FROM tab1 WHERE CONTAINS(col1, 'Jack {AND} Jill', 1) > 0
2 /
COL1
------------------------------
Jack and Jill
SCOTT@orcl_11g> SELECT col1 FROM tab1 WHERE CONTAINS(col1, 'NEAR ((Jack,Jill),1)', 1) > 0
2 /
COL1
------------------------------
Jack/Jill
Jack & Jill
Jack&Jill
Jack and Jill
SCOTT@orcl_11g> SELECT col1 FROM tab1 WHERE CONTAINS(col1, 'Jack AND Jill', 1) > 0
2 /
COL1
------------------------------
Jack/Jill
Jack & Jill
Jack&Jill
Jack and Jill
Jack is a friend of Jill
SCOTT@orcl_11g>
|
|
|
Re: CONTAINS function not working! (split from hijacked thread by bb) [message #438302 is a reply to message #438293] |
Thu, 07 January 2010 23:41 |
ora22
Messages: 8 Registered: December 2009 Location: BOSTON
|
Junior Member |
|
|
SORRY - POSTING AGAIN WITHOUT TABS THIS TIME ...
Hi Barbara - Thanks v much for your detailed reply
our's is a datawarehousing application (with distict batch processes updating data and no concerns regarding online updates etc)
so i was wondering if this would be a viable solution:
add a column to the table (say col2) - where we remove all STOPWORDS.
ID col1 col2
-- ----------------- ------------------
1 Jack/Jill Jack/Jill
2 Jack & Jill Jack & Jill
3 Jack&Jill Jack&Jill
4 Jack and Jill Jack Jill
5 Jack is a friend of Jill Jack friend Jill
Index and Serach on Column col2.
And when the user provides the search string - remove all the STOPWORDS from the search string too (we could use CTXSYS.CTX_STOPWORDS for this).
so if user provides "jack and jill" our modified search string would be "jack jill" and so the query would be :
SELECT col1 FROM tab1 WHERE CONTAINS(col2, 'Jack Jill', 1) > 0
which should give:
ID col1 col2
-- ----------------- ------------------
1 Jack/Jill Jack/Jill
2 Jack & Jill Jack & Jill
3 Jack&Jill Jack&Jill
4 Jack and Jill Jack Jill
Do you see any holes in this approach ?
Thanks as always !
[EDITED by LF: applied format preserving tags]
[Updated on: Fri, 08 January 2010 00:44] by Moderator Report message to a moderator
|
|
|
|