Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #494583] Tue, 15 February 2011 16:44
endresma
Messages: 5
Registered: February 2011
Junior Member
Hi,

I have the same problem on Oracle 11g. The index on a plain-text column works, however. Only the index I created on PDF files fails to return any records. Any ideas?
Thanks a lot
markus
Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #494621 is a reply to message #494583] Wed, 16 February 2011 01:38
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
Your problem is not related to the problem in the other thread. Please post your questions as separate topics in the future. There are various reasons why a pdf file might not get filtered, such as unsupported pdf versions, special fonts and characters, and password issues. Please click on the link below for explanations of some of those reasons.

http://download.oracle.com/docs/cd/E11882_01/text.112/e16593/afilsupt.htm#CCREF23971
Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #494629 is a reply to message #494621] Wed, 16 February 2011 02:22
endresma
Messages: 5
Registered: February 2011
Junior Member
Thanks a lot.
Well, I don't understand the problem, because I am not using an unsupported pdf version, special fonts, or a password. The pdf is definitely supported, because the index worked perfectly on the previous version of Oracle: I was able to build the index and select records using the score function in a select query.

However, after a new installation of Oracle, the pdf is no longer indexed. Could the encoding be the problem? My old installation used ASCII, whereas my new Oracle installation uses UTF-8.
Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #494684 is a reply to message #494629] Wed, 16 February 2011 10:10
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
Please post any code related to how you are attempting to filter, index, and access the data, including create table statement, how you insert your data, such as whether you use a file_datastore or load into a blob or bfile column, your statement for index creation, and a sample query using contains.

Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #494816 is a reply to message #494684] Thu, 17 February 2011 03:46
endresma
Messages: 5
Registered: February 2011
Junior Member
Hi, here is my code:

CREATE TABLE pdf(id INTEGER, pdf BLOB);
INSERT INTO PDF VALUES(1, empty_blob());
--  I insert the pdf BLOB by the SQL Client DbVisualizer

CREATE INDEX pdf_index ON pdf(pdf) INDEXTYPE IS CTXSYS.CONTEXT;

SELECT id FROM pdf WHERE CONTAINS(pdf, 'Join') > 0;


Creating the index works without error. Although the pdf contains the word 'Join' the query produces no result. Moreover, the table DR$PDF_INDEX$I is empty. I thought it should contain the pdf keywords.

Thanks for your help.
Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #495086 is a reply to message #494816] Thu, 17 February 2011 13:03
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
What is your operating system and version? What is your complete Oracle version (11.?...)? What is your pdf version? Are there any entries in ctx_user_index_errors? Can you select dbms_lob.getlength (pdf) to confirm that the document was loaded into the blob? Can you load, filter, index, and search other non-text documents like doc? Have you tried loading the pdf into the blob through PL/SQL instead of DbVisualizer to see if that works? Have you tried loading an extremely simple pdf document with nothing more than a few simple words like "test data"? When testing, please make sure that you do not use Oracle reserved words or keywords like join. Have you tried explicitly declaring ctxsys.auto_filter in your index parameters? What do you get when you run ctxhx.exe on the pdf file directly from the operating system?
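
The checks asked for above could be run roughly as follows. This is only a sketch against the pdf table from the earlier post. The directory object PDF_DIR, its path /tmp/pdfs, the file name test.pdf, the id value 2, and the search word 'test' are all assumptions for illustration and need to be replaced with real values.

```sql
-- Hypothetical directory object pointing at the folder that holds the test pdf
CREATE OR REPLACE DIRECTORY pdf_dir AS '/tmp/pdfs';

-- Load a simple pdf into the blob through PL/SQL instead of a client tool
DECLARE
  v_blob   BLOB;
  v_bfile  BFILE   := BFILENAME('PDF_DIR', 'test.pdf');  -- assumed file name
  v_dest   INTEGER := 1;
  v_src    INTEGER := 1;
BEGIN
  INSERT INTO pdf (id, pdf) VALUES (2, EMPTY_BLOB())
  RETURNING pdf INTO v_blob;
  DBMS_LOB.FILEOPEN(v_bfile, DBMS_LOB.FILE_READONLY);
  DBMS_LOB.LOADBLOBFROMFILE(v_blob, v_bfile,
                            DBMS_LOB.GETLENGTH(v_bfile), v_dest, v_src);
  DBMS_LOB.FILECLOSE(v_bfile);
  COMMIT;
END;
/

-- Confirm that the document really reached the blob (length should be > 0)
SELECT id, DBMS_LOB.GETLENGTH(pdf) AS pdf_bytes FROM pdf;

-- Rebuild the index with the filter declared explicitly
DROP INDEX pdf_index;
CREATE INDEX pdf_index ON pdf(pdf) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER');

-- Look for filtering or indexing errors recorded during index creation
SELECT err_index_name, err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
ORDER  BY err_timestamp DESC;

-- See whether any tokens were extracted at all
SELECT token_text FROM dr$pdf_index$i WHERE ROWNUM <= 20;

-- Search for a word that is not a reserved word
SELECT id FROM pdf WHERE CONTAINS(pdf, 'test') > 0;
```

Because the index is created after the row is loaded, no separate synchronization step is needed in this sketch. Rows added later would not be searchable until the index is synchronized.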


Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #495144 is a reply to message #495086] Thu, 17 February 2011 14:42
endresma
Messages: 5
Registered: February 2011
Junior Member
OS: Debian Linux 5.0
Oracle 11.1.0.6.0
PDF version 1.2
ctx_user_index_errors is empty
dbms_lob.getlength (pdf) is NOT empty
Index works perfectly on text stored in e.g. varchar2
Index does not work on .doc documents
I tried to load the pdf by PL/SQL and by Java code. Does not work.
I tried a simple document with about 20 words. Does not work.
I used ctxsys.auto_filter as well as inso_filter. Does not work.
ctxhx on the pdf file gives no output.

Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #495179 is a reply to message #495144] Thu, 17 February 2011 15:48
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
Since it is loading the file into the blob, and working on a text file, but not on a pdf file or even a doc file, then it sounds like it is not doing any filtering. Ctxhx.exe is the lowest level at which to check the filtering. You mentioned switching from ascii to utf and that might be part of the problem. Please try running something like this from the operating system, substituting your home, paths, and files:

your_oracle_home/bin/ctxhx.exe source_path/source_file.pdf target_path/target_file.html ASCII8 utf8 H NOMETA 120 HEURISTIC FORMAT NOPDFROTATE

After running that, edit your target_path/target_file.html and see if the data in the pdf file has been converted to html or if you have an empty file or no file or what.

If that filters your document, then try adding a column in your table to specify the character set and adding that column as a charset column to your index parameters.
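
A charset column could be added along these lines. This is a minimal sketch, not a confirmed fix: the column name cset and the value 'US7ASCII' are assumptions (the latter chosen only because the old installation was said to use ASCII), and the stored value must be a valid Oracle character set name that matches the documents.

```sql
-- Hypothetical column holding the character set name of each document
ALTER TABLE pdf ADD (cset VARCHAR2(30));

-- Assumed value; use the character set the documents were actually written in
UPDATE pdf SET cset = 'US7ASCII';
COMMIT;

-- Recreate the index so that the filter reads the character set per row
DROP INDEX pdf_index;
CREATE INDEX pdf_index ON pdf(pdf) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER CHARSET COLUMN cset');

-- Check for errors again after the rebuild
SELECT err_textkey, err_text FROM ctx_user_index_errors;
```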

If none of this helps, then I am out of things to check and suggest that you re-post your problem on the OTN Text forum:

http://forums.oracle.com/forums/forum.jspa?forumID=71

The Oracle Text product manager regularly responds there and if he can't figure it out, he will refer you to support. Have you checked metalink to see if there is already an identified bug?

[Updated on: Thu, 17 February 2011 16:04]

Re: Contains Clause is not working on certain scenario (split from unrelated hijacked thread by bb) [message #495538 is a reply to message #495179] Sun, 20 February 2011 15:41
endresma
Messages: 5
Registered: February 2011
Junior Member
Hi, thanks a lot for your help.
However, there is no target_file. It is not created. I will follow your hint and post the problem on the OTN forum.