Error trying to index a PDF file [message #483805]
Wed, 24 November 2010 10:22
kastania
Messages: 19 Registered: May 2007
Junior Member
We get an error when trying to index some PDF documents (not all of them):
DRG-11207: user filter command exited with status 1
DRG-11221: Third-party filter indicates this document is corrupted.
The document opens in various PDF viewers, and no corruption is visible to us.
The statement that creates the index is:
CREATE INDEX documents_ot_idx ON documents(document_binary) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('SYNC(ON COMMIT) STORAGE INDEX_STORAGE lexer EQF_LEXER');
Our version is 10.2.0.4. I have read about AUTO_FILTER. We do not use AUTO_FILTER in the create statement. Could this be the solution?
If no filter is specified during index creation, is a default filter used?
The document is PDF-1.6. If we save it as PDF-1.4, it is indexed successfully, but that is not an acceptable solution.
I found this in a forum: "AUTO_FILTER in 10.2.0.3 already supports PDF 1.6, see Note 309154.1", but my Metalink contract has expired and I cannot read it. Can anyone who has access check?
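One way to answer the default-filter question on a live system is to ask the Oracle Text data dictionary which objects the index ended up with, and which rows failed. The sketch below assumes the index name from the statement above and that the queries run as the index owner; it uses the CTX_USER_INDEX_OBJECTS and CTX_USER_INDEX_ERRORS views.

```sql
-- Which filter, lexer, storage etc. did the index actually get?
-- The row with ixo_class = 'FILTER' shows the filter in use.
SELECT ixo_class, ixo_object
FROM   ctx_user_index_objects
WHERE  ixo_index_name = 'DOCUMENTS_OT_IDX';

-- Which rows failed during indexing, and with what message?
-- err_textkey holds the ROWID of the failed document.
SELECT err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'DOCUMENTS_OT_IDX'
ORDER  BY err_timestamp;
```

Running both queries on the working and the failing system and comparing the results may show whether the two environments really use the same filter.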
[Updated on: Wed, 24 November 2010 10:30]

Re: Error trying to index a PDF file [message #483810 is a reply to message #483809]
Wed, 24 November 2010 11:03
kastania
Unfortunately, we cannot migrate to 11g. Is there any other solution?
We and the client have the same version of Oracle Text. The same PDF is indexed successfully in our environment but not in theirs... Who knows... I'll just tell them not to use 1.6 PDFs...

Re: Error trying to index a PDF file [message #483812 is a reply to message #483811]
Wed, 24 November 2010 11:21
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
Senior Member
Considering your other post, it sounds like you are having multiple things that work on your system, but not on your client's. I would start looking for what is different, even small seemingly unrelated things. For example, on one system it was found that the DBA had restricted select on the all_tables view. It turned out that, as part of an Oracle Text background process, there was a call to a routine that verifies schema and table names and such, to avoid SQL injection, that needed access to the all_tables view to do so, so it failed, but with an error message that gave no clue that was the problem. It was a lengthy, but interesting process to trace it down.

Re: Error trying to index a PDF file [message #483911 is a reply to message #483813]
Thu, 25 November 2010 08:52
kastania
Well, according to your link, 1.6 is not supported. I finally reactivated my Metalink account, where I found note ID 1120683.1, which states: "This is a bug in the Filtering technology used in 10.2.0.4.0. In Oracle database versions where the filtering technology has been changed from Verity 9.2 to OIT 8.2.X this problem has been resolved."
The solution they recommend is to apply the 10.2.0.5.0 patch set or upgrade to 11.1.0.7.0 or 11.2.0.1.0. We will ask the client to apply the patch set, which is easier.
As far as my other post is concerned, you are right: running the queries from SQL*Plus does not fail, but running them from Java code does. Those errors came from queries against the index discussed in this post (the PDF-related one), so maybe this solution will solve my other problem too.
It is difficult to debug the whole situation because the development team is in one country, the client in another, and the database server in a third :) So the communication is sloooooww...
Just to mention, I found another note on Metalink that, in a few words, described someone with a peculiar problem that was not logged correctly. The explanation was insufficient privileges on the Oracle TMP directory, where Oracle Text temporarily stores the binary content (PDF) until it finishes processing it. If none of the above works, I'll ask the client to check that...
[Updated on: Thu, 25 November 2010 08:53]

Re: Error trying to index a PDF file [message #484041 is a reply to message #483911]
Fri, 26 November 2010 10:42
kastania
The client gets an empty ORA-20000 message in 10.2.0.4.
According to Metalink note ID 889805.1, it is a bug in 10.2.0.4 that no error message is provided.
How can I get more info on that error?
I managed to reproduce an empty ORA-20000 error by running the same query in my environment, but I do not know whether it is the same error, as it doesn't provide any info.
I traced the SQL statement that fails, and the .trc file does not show any error. The same goes for the alert.log: it has no entries for the error.
Where is this info held?

Re: Error trying to index a PDF file [message #484052 is a reply to message #484041]
Fri, 26 November 2010 12:13
Barbara Boehmer
Unfortunately, that information is obscured. The following is a simplified demonstration of what is happening. Suppose that somewhere within the Oracle Text code, certain values cause an error:
SCOTT@orcl_11gR2> declare
  2    v_num number;
  3  begin
  4    select 1/0 into v_num from dual;
  5  end;
  6  /
declare
*
ERROR at line 1:
ORA-01476: divisor is equal to zero
ORA-06512: at line 4
What Oracle has done within its internal code is to obscure the actual error and line number with a generic message:
SCOTT@orcl_11gR2> declare
  2    v_num number;
  3  begin
  4    select 1/0 into v_num from dual;
  5  exception
  6    when others then
  7      raise_application_error (-20000, null);
  8  end;
  9  /
declare
*
ERROR at line 1:
ORA-20000:
ORA-06512: at line 7
Only Oracle has access to their own source code, so they can temporarily disable the exception handling and see the actual error and line number, then work from there.
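For contrast, a handler that keeps the underlying error visible might look like the sketch below. This is only an illustration of the general PL/SQL technique, not what the Oracle Text internal code does, and the message text is made up. Passing TRUE as the third argument of raise_application_error adds the new error on top of the existing error stack instead of replacing it.

```sql
declare
  v_num number;
begin
  select 1/0 into v_num from dual;
exception
  when others then
    -- Third argument TRUE: keep the original error (here the divide by zero)
    -- on the error stack underneath the generic ORA-20000.
    raise_application_error (-20000, 'demo: lookup failed', true);
end;
/
```

With a handler written this way, the caller would see the generic message and the original error together, which is exactly the information that the empty ORA-20000 hides.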
You need to be able to provide a small script that consistently reproduces the error in order to report it as a bug to Oracle, so they can look into it and provide a patch or workaround, or at least let you know what the problem is. If you can provide such a script, you can also post it on the OTN Text forum, where Oracle Text product manager Roger Ford regularly responds. He has been known to take such test cases and provide the snippet of internal code that raises the error, so that the group of us who regularly respond have eventually been able to discover the root of the problem. The important part is that you narrow it down to a reproducible test case, so that we can reproduce it on our systems.
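As a starting point, a reproducible test case for this thread could be shaped roughly like the script below. Everything named here is an assumption for illustration: the DOC_DIR directory object, the sample.pdf file, the documents_test table and the search word. The custom STORAGE and lexer preferences from the original index are left out so the script stands alone; add them back if the error only appears with them.

```sql
-- Hypothetical test table mirroring the BLOB column from the original index
CREATE TABLE documents_test (
  id              NUMBER PRIMARY KEY,
  document_binary BLOB
);

-- Load one problem PDF from a directory object (DOC_DIR is assumed to exist)
DECLARE
  v_bfile BFILE := BFILENAME('DOC_DIR', 'sample.pdf');
  v_blob  BLOB;
BEGIN
  INSERT INTO documents_test (id, document_binary)
  VALUES (1, EMPTY_BLOB())
  RETURNING document_binary INTO v_blob;

  DBMS_LOB.FILEOPEN(v_bfile, DBMS_LOB.FILE_READONLY);
  DBMS_LOB.LOADFROMFILE(v_blob, v_bfile, DBMS_LOB.GETLENGTH(v_bfile));
  DBMS_LOB.FILECLOSE(v_bfile);
  COMMIT;
END;
/

-- Build the index, then run the kind of query that raises the error
CREATE INDEX documents_test_idx ON documents_test (document_binary)
  INDEXTYPE IS CTXSYS.CONTEXT;

SELECT id
FROM   documents_test
WHERE  CONTAINS(document_binary, 'test') > 0;
```

If the script fails on one system and not another, attaching the exact database version and the PDF itself makes it much easier for others to run the same steps.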