Home » Server Options » Text & interMedia » Multi-lingual index for blob column (Oracle 10g R2 10.2.0.1.0)
Multi-lingual index for blob column [message #513612] |
Tue, 28 June 2011 06:56 |
|
dkdms2124
Messages: 369 Registered: April 2010 Location: INDIA
|
Senior Member |
|
|
Hi All,
I have a test table with three columns (id,name,doc) where doc column is of blob type.
I am mentioning the step I have followed to create index and then search the document containing the keyword using the CONTAIN keyword.
I am able to find the documents in English Language but not able to find documents in other langauges.
Please help me out.
SQL> conn sample/sample
Connected.
SQL> create table test(
2 id number primary key,
3 name varchar2(2000),
4 doc blob);
Table created.
SQL> create sequence sample_seq;
SQL> conn sys/oracle as sysdba
Connected.
SQL> create or replace directory documents as 'C:\sample_work';
Directory created.
SQL> grant read,write on directory documents to sample;
SQL> create or replace procedure load_data ( p_file_name IN test.name%type) AS
2 v_bfile bfile;
3 v_blob blob;
4 begin
5 insert into test (id,name,doc)
6 values (sample_seq.nextval,p_file_name,empty_blob())
7 return doc into v_blob;
8 v_bfile := bfilename('DOCUMENTS',p_file_name);
9 dbms_lob.fileopen(v_bfile,dbms_lob.file_readonly);
10 dbms_lob.loadfromfile(v_blob,v_bfile,dbms_lob.getlength(v_bfile));
11 dbms_lob.fileclose(v_bfile);
12 commit;
13 end;
14 /
Procedure created.
SQL> EXEC load_data ('Clustering.doc');
PL/SQL procedure successfully completed.
SQL> exec load_data('connectivity.doc');
PL/SQL procedure successfully completed.
SQL> select id from test;
ID
----------
22
23
24
SQL> begin
2 ctx_ddl.create_preference('est_lexer', 'WORLD_LEXER');
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> create index sample_doc_idx on test (doc) indextype IS ctxsys.context parameters(' LEXER EST_LEXER ');
Index created.
SQL> exec load_data('backtrack_oracle_tutorial.pdf');
PL/SQL procedure successfully completed.
SQL> exec load_data('Reading Logs Spanish.pdf');
PL/SQL procedure successfully completed.
SQL> exec load_data('Pan-2.4-fr_FR.pdf');
PL/SQL procedure successfully completed.
SQL> exec load_data('Kitchen-2.4-fr_FR.pdf');
PL/SQL procedure successfully completed.
SQL> exec load_data('dutch.txt');
PL/SQL procedure successfully completed.
SQL> select count(*) from test;
COUNT(*)
----------
8
SQL> set autotrace on
SQL> ed
Wrote file afiedt.buf
1 SELECT SCORE(1) score, id, name
2 FROM test
3 WHERE CONTAINS(doc, 'Tomcat', 1) > 0
4* ORDER BY SCORE(1) DESC
SQL>
SQL> /
SCORE ID NAME
---------- ---------- --------------------------------------------------------------------------------
100 23 Clustering.doc
Execution Plan
----------------------------------------------------------
Plan hash value: 2693406471
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 1632 | 1 (100)| 00:00:01 |
| 1 | SORT ORDER BY | | 3 | 1632 | 1 (100)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 3 | 1632 | 0 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | SAMPLE_DOC_IDX | | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("DOC",'Tomcat',1)>0)
Statistics
----------------------------------------------------------
11 recursive calls
0 db block gets
19 consistent gets
0 physical reads
0 redo size
532 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
but When i try to look for the documents in other languages "no rows selected"
SQL> ed
Wrote file afiedt.buf
1 SELECT SCORE(1) score, id, name
2 FROM test
3 WHERE CONTAINS(doc, 'Oracle Application Express', 1) > 0
4* ORDER BY SCORE(1) DESC
SQL> //
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2693406471
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 544 | 1 (100)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 544 | 1 (100)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 544 | 0 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | SAMPLE_DOC_IDX | | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("DOC",'Oracle Application Express',1)>0)
Statistics
----------------------------------------------------------
148 recursive calls
0 db block gets
647 consistent gets
0 physical reads
0 redo size
380 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
0 rows processed
Please help me how I can use world_lexer for documents in various languages
Thanks
Deepak
|
|
|
|
|
|
|
|
|
|
Re: Multi-lingual index for blob column [message #513666 is a reply to message #513663] |
Tue, 28 June 2011 12:32 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Try testing just a simple example first, inserting test data directly through SQL, without loading a file, inserting before you index, checking that the data is loaded, checking that the data is indexed, then testing the queries, as shown below. I have provided the test script and a run of it separately. It does not display properly on my system due to character set issues, but it still indexes and finds the data. Please post the results from your system in the same manner. If that does not work, then we need to figure out why. If it does work, then you can add one more thing at a time to it until it does not work, which should show where the problem is. You should re-run the full script and check to make sure that the data is loaded and check to make sure that the data is indexed with each test after adding something. Frequently problems are that the data was not properly loaded or not done indexing or synchronizing.
create table test
(id number primary key,
name varchar2(2000),
doc blob)
/
-- insert data before indexing:
insert into test (id, name, doc)
values (1, 'test1', utl_raw.cast_to_raw ('Tomcat'))
/
insert into test (id, name, doc)
values (2, 'test2', utl_raw.cast_to_raw ('Datenbankebene'))
/
insert into test (id, name, doc)
values (3, 'test3', utl_raw.cast_to_raw ('箭头显示在两个方向'))
/
commit
/
-- make sure data is in the table:
column name format a5
column doc format a30
select id, name, length (doc), utl_raw.cast_to_varchar2 (doc) doc
from test
/
begin
ctx_ddl.create_preference ('est_lexer', 'world_lexer');
end;
/
create index sample_doc_idx on test (doc)
indextype is ctxsys.context
parameters ('lexer est_lexer')
/
-- make sure data is indexed:
select token_text from dr$sample_doc_idx$i
/
-- test queries:
select score (1) score, id, name,
utl_raw.cast_to_varchar2 (doc) doc
from test
where contains (doc, 'Tomcat', 1) > 0
order by score (1) desc
/
select score (1) score, id, name,
utl_raw.cast_to_varchar2 (doc) doc
from test
where contains (doc, 'Datenbankebene', 1) > 0
order by score (1) desc
/
select score (1) score, id, name,
utl_raw.cast_to_varchar2 (doc) doc
from test
where contains (doc, '箭头显示在两个方向', 1) > 0
order by score (1) desc
/
SCOTT@orcl_11gR2> create table test
2 (id number primary key,
3 name varchar2(2000),
4 doc blob)
5 /
Table created.
SCOTT@orcl_11gR2> -- insert data before indexing:
SCOTT@orcl_11gR2> insert into test (id, name, doc)
2 values (1, 'test1', utl_raw.cast_to_raw ('Tomcat'))
3 /
1 row created.
SCOTT@orcl_11gR2> insert into test (id, name, doc)
2 values (2, 'test2', utl_raw.cast_to_raw ('Datenbankebene'))
3 /
1 row created.
SCOTT@orcl_11gR2> insert into test (id, name, doc)
2 values (3, 'test3', utl_raw.cast_to_raw ('ç®å¤´æ˜¾ç¤ºåœ¨ä¸¤ä¸ªæ¹å''))
3 /
1 row created.
SCOTT@orcl_11gR2> commit
2 /
Commit complete.
SCOTT@orcl_11gR2> -- make sure data is in the table:
SCOTT@orcl_11gR2> column name format a5
SCOTT@orcl_11gR2> column doc format a30
SCOTT@orcl_11gR2> select id, name, length (doc), utl_raw.cast_to_varchar2 (doc) doc
2 from test
3 /
ID NAME LENGTH(DOC) DOC
---------- ----- ----------- ------------------------------
1 test1 6 Tomcat
2 test2 14 Datenbankebene
3 test3 56 ç®å¤´æ˜¾ç¤ºåœ¨ä¸¤ä¸ªæ¹å'
3 rows selected.
SCOTT@orcl_11gR2> begin
2 ctx_ddl.create_preference ('est_lexer', 'world_lexer');
3 end;
4 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> create index sample_doc_idx on test (doc)
2 indextype is ctxsys.context
3 parameters ('lexer est_lexer')
4 /
Index created.
SCOTT@orcl_11gR2> -- make sure data is indexed:
SCOTT@orcl_11gR2> select token_text from dr$sample_doc_idx$i
2 /
TOKEN_TEXT
----------------------------------------------------------------
DATENBANKEBENE
TOMCAT
¨
´
¹Å
ĸ¤Ä¸ªÆ
Ť
Ƙ¾Ç¤ºÅŒ
Ç®
9 rows selected.
SCOTT@orcl_11gR2> -- test queries:
SCOTT@orcl_11gR2> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Tomcat', 1) > 0
5 order by score (1) desc
6 /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 1 test1 Tomcat
1 row selected.
SCOTT@orcl_11gR2> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Datenbankebene', 1) > 0
5 order by score (1) desc
6 /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 2 test2 Datenbankebene
1 row selected.
SCOTT@orcl_11gR2> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'ç®å¤´æ˜¾ç¤ºåœ¨ä¸¤ä¸ªæ¹å'', 1) > 0
5 order by score (1) desc
6 /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 3 test3 ç®å¤´æ˜¾ç¤ºåœ¨ä¸¤ä¸ªæ¹å'
1 row selected.
SCOTT@orcl_11gR2>
|
|
|
Re: Multi-lingual index for blob column [message #513683 is a reply to message #513666] |
Tue, 28 June 2011 22:46 |
|
dkdms2124
Messages: 369 Registered: April 2010 Location: INDIA
|
Senior Member |
|
|
Hello Mam,
I tried your step and gor the below mentioned error in between.
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jun 29 09:21:43 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Enter user-name: sys/oracle as sysdba
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> select name from v$database;
NAME
---------
SOURCE
SQL> conn scott/tiger
Connected.
SQL> conn sys/oracle as sysdba
Connected.
SQL> grant execute on ctx_ddl to scott;
Grant succeeded.
SQL> grant dba to scott;
Grant succeeded.
SQL> conn scott/tiger
Connected.
SQL>
SQL> select * from tab;
TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
DEPT TABLE
EMP TABLE
BONUS TABLE
SALGRADE TABLE
TIME_TEST TABLE
SQL> ed
Wrote file afiedt.buf
1 create table test
2 (id number primary key,
3 name varchar2(2000),
4* doc blob)
SQL> /
Table created.
SQL> insert into test (id, name, doc)
2 values (1, 'test1', utl_raw.cast_to_raw ('Tomcat'));
1 row created.
SQL> insert into test (id, name, doc)
2 values (2, 'test2', utl_raw.cast_to_raw ('Datenbankebene'));
1 row created.
SQL> insert into test (id, name, doc)
2 values (3, 'test3', utl_raw.cast_to_raw ('?????????'));
1 row created.
SQL> commit;
Commit complete.
SQL> column name format a5
SQL> column doc format a30
SQL> select id, name, length (doc), utl_raw.cast_to_varchar2 (doc) doc
2 from test;
ID NAME LENGTH(DOC) DOC
---------- ----- ----------- ------------------------------
1 test1 6 Tomcat
2 test2 14 Datenbankebene
3 test3 9 ?????????
SQL> begin
2 ctx_ddl.create_preference ('est_lexer', 'world_lexer');
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> create index sample_doc_idx on test (doc)
2 indextype is ctxsys.context
3 parameters ('lexer est_lexer');
Index created.
SQL> select token_text from dr$sample_doc_idx$i;
TOKEN_TEXT
----------------------------------------------------------------
DATENBANKEBENE
TOMCAT
SQL> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Tomcat', 1) > 0
5 order by score (1) desc;
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 1 test1 Tomcat
SQL> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Datenbankebene', 1) > 0
5 order by score (1) desc
6 /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 2 test2 Datenbankebene
SQL> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, '?????????', 1) > 0
5 order by score (1) desc
6 /
select score (1) score, id, name,
*
ERROR at line 1:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-50901: text query parser syntax error on line 1, column 10
SQL> select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, '?????????', 1) > 0
5 order by score (1) desc
6 ;
select score (1) score, id, name,
*
ERROR at line 1:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-50901: text query parser syntax error on line 1, column 10
SQL>
Should I proceed ignoring it or there is some reason why I got this error.
Thanks n Regards
Deepak
|
|
|
Re: Multi-lingual index for blob column [message #513688 is a reply to message #513683] |
Tue, 28 June 2011 23:22 |
|
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Did you insert some data that was just displayed as a string of question marks or did you actually insert a string of questions marks? If you inserted a string of question marks, then they would correctly not be indexed, and a query with just question marks in the search string would correctly cause an error. It looks like that is what happened. When I ran my test case, I put the test case in a sql file, then ran that file, spooling the results. Because of the special characters, I saved the sql file that I ran as utf-8 and included a blank line at the top of the file.
[Updated on: Tue, 28 June 2011 23:23] Report message to a moderator
|
|
|
Re: Multi-lingual index for blob column [message #513689 is a reply to message #513688] |
Tue, 28 June 2011 23:37 |
|
dkdms2124
Messages: 369 Registered: April 2010 Location: INDIA
|
Senior Member |
|
|
Hello Mam,
I have followed the same that you mentioned in your post and everything seems working now. Check out the result below.
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jun 29 09:32:04 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> conn hr/hr
ERROR:
ORA-01017: invalid username/password; logon denied
Warning: You are no longer connected to ORACLE.
SQL> conn sys/oracle as sysdba
Connected.
SQL> create user hrtest idenitified by hr;
create user hrtest idenitified by hr
*
ERROR at line 1:
ORA-00922: missing or invalid option
SQL> create user hrtest identified by hr;
User created.
SQL> grant dba to hrtest;
Grant succeeded.
SQL> grant execute on ctx_ddl to hrtest;
Grant succeeded.
SQL> conn hrtest/hr
Connected.
SQL> select * from tab;
no rows selected
SQL> create table test
2 2 (id number primary key,
3 3 name varchar2(2000),
4 4 doc blob);
2 (id number primary key,
*
ERROR at line 2:
ORA-00922: missing or invalid option
SQL> ed
Wrote file afiedt.buf
1 create table test
2 (id number primary key,
3 name varchar2(2000),
4* doc blob)
SQL> /
Table created.
SQL> ed
Wrote file afiedt.buf
1 insert into test (id, name, doc)
2* values (1, 'test1', utl_raw.cast_to_raw ('Tomcat'))
SQL> /
1 row created.
SQL> ed
Wrote file afiedt.buf
1 insert into test (id, name, doc)
2* values (2, 'test2', utl_raw.cast_to_raw ('Datenbankebene'))
SQL> /
1 row created.
SQL> ed
Wrote file afiedt.buf
1 insert into test (id, name, doc)
2* values (3, 'test3', utl_raw.cast_to_raw ('τ«¡σñ┤µÿ╛τñ║σ£¿Σ╕ñΣ╕¬µ╣σÉ'))
SQL> /
1 row created.
SQL> commit;
Commit complete.
SQL> column name format a5
SQL> column doc format a30
SQL> select id, name, length (doc), utl_raw.cast_to_varchar2 (doc) doc from test;
ID NAME LENGTH(DOC) DOC
---------- ----- ----------- ------------------------------
1 test1 6 Tomcat
2 test2 14 Datenbankebene
3 test3 25 τ«¡σñ┤µÿ╛τñ║σ£¿Σ╕ñΣ╕¬µ╣σÉ
SQL> ed
Wrote file afiedt.buf
1 begin
2 ctx_ddl.create_preference ('est_lexer', 'world_lexer');
3* end;
SQL> /
PL/SQL procedure successfully completed.
SQL> ed
Wrote file afiedt.buf
1 create index sample_doc_idx on test (doc)
2 indextype is ctxsys.context
3* parameters ('lexer est_lexer')
SQL> ;
1 create index sample_doc_idx on test (doc)
2 indextype is ctxsys.context
3* parameters ('lexer est_lexer')
SQL> /
Index created.
SQL> select token_text from dr$sample_doc_idx$i;
TOKEN_TEXT
----------------------------------------------------------------
DATENBANKEBENE
TOMCAT
¿
┤
─╕ñ─╕¬╞╣┼É
Ť
╞ÿ╛╟ñ║┼î
Ǯ
8 rows selected.
SQL> ed
Wrote file afiedt.buf
1 select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Tomcat', 1) > 0
5 order by score (1) desc
6* /
SQL> /
/
*
ERROR at line 6:
ORA-00933: SQL command not properly ended
SQL> ed
Wrote file afiedt.buf
1 select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Tomcat', 1) > 0
5 order by score (1) desc
6* /
SQL> /
/
*
ERROR at line 6:
ORA-00933: SQL command not properly ended
SQL> ed
Wrote file afiedt.buf
1 select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Tomcat', 1) > 0
5* order by score (1) desc
SQL> /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 1 test1 Tomcat
SQL> ed
Wrote file afiedt.buf
1 select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'Datenbankebene', 1) > 0
5* order by score (1) desc
SQL> /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 2 test2 Datenbankebene
SQL> ed
Wrote file afiedt.buf
1 select score (1) score, id, name,
2 utl_raw.cast_to_varchar2 (doc) doc
3 from test
4 where contains (doc, 'τ«¡σñ┤µÿ╛τñ║σ£¿Σ╕ñΣ╕¬µ╣σÉ', 1) > 0
5* order by score (1) desc
SQL> /
SCORE ID NAME DOC
---------- ---------- ----- ------------------------------
4 3 test3 τ«¡σñ┤µÿ╛τñ║σ£¿Σ╕ñΣ╕¬µ╣σÉ
SQL>
Now why I am not able to get the same result when I try to search the srting from the files stored in the table.
Regards
Deepak
|
|
|
|
|
|
Re: Multi-lingual index for blob column [message #513693 is a reply to message #513691] |
Wed, 29 June 2011 00:05 |
|
dkdms2124
Messages: 369 Registered: April 2010 Location: INDIA
|
Senior Member |
|
|
Check this out Mam..
TOKEN_TEXT
---------------------------------------------------------------
Vÿ
V¼
V-B
V-FW7B
V-F-6
V+RIFIEZ
V-FVB
V-+¦
V+F¦
V_W
NPQMK
All this data is the result of the query to check tokens.
but when I rty to run the serach query for one of the string "V+RIFIEZ" still there is "no row returned".
SQL> ed
Wrote file afiedt.buf
1 select score (1) score, id, name
2 from test
3 where contains (doc, 'V+RIFIEZ', 1) > 0
4* order by score (1) desc
SQL> /
no rows selected
More over one of the character of the string changed automatically. When I tried to run the query on the SQL developer the strinf was exact same as it was in the token but still there was no row returned.
Below is the query which I ran in SQL Developer..
select score (1) score, id, name
from test
where contains (doc, 'V╔RIFIEZ', 1) > 0
order by score (1) desc
Now what could be the problem Mam?
Thanks
Deepak
[Updated on: Wed, 29 June 2011 00:06] Report message to a moderator
|
|
|
|
|
|
|
Goto Forum:
Current Time: Sat Nov 23 06:58:07 CST 2024
|