Want to search any kind of PDF and other types of documents in one single column [message #188177] |
Thu, 17 August 2006 06:31 |
Bharath Kumar ,V
Messages: 18 Registered: February 2004
Junior Member |
Hi,
I want to search all kinds of PDF files and other types of documents through one single column.
The text search was working fine for all types of documents, except that it could not search certain PDF files. The document names and paths are stored in the base table in VARCHAR2 fields.
Looking for a solution, I went to this link:
http://www.oracle.com/technology/products/text/htdocs/altfilters.htm
I chose the XPDF approach using user_filter. Now, however, the search works only for PDF files (all kinds of PDF). It no longer searches any other types of documents (txt, html, xml, etc.). I want to know how to make it search all kinds of PDF as well as the other types of documents. As far as I understand, I would have to create two filters, one (user_filter) for PDF and another for the rest of the document types, and combine these two filters when creating the index. But I am not sure how to do this. Or is there any other way?
The steps I followed for user_filter are as follows:
begin
  -- Datastore: the indexed column holds file paths, not the documents themselves
  ctx_ddl.create_preference('DFILENAME', 'FILE_DATASTORE');
  -- User filter: every document is passed through pdftotext.exe
  ctx_ddl.create_preference('my_xpdf_filter', 'user_filter');
  ctx_ddl.set_attribute('my_xpdf_filter', 'command', 'pdftotext.exe');
end;
/
create index docs_bak_idx on DOCS_BAK (DOC_PATH_FILENAME)
  indextype is ctxsys.context
  parameters ('datastore DFILENAME filter my_xpdf_filter');
Finally, I set up a job that syncs the index every 2 minutes.
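For reference, a sync job of this kind might look like the minimal sketch below. Only the index name docs_bak_idx comes from the steps above. The use of DBMS_JOB and the exact job definition are assumptions, not the job that was actually created.

```sql
-- Sketch only: schedule ctx_ddl.sync_index for docs_bak_idx every 2 minutes.
-- The job definition is an assumption; the original job was not posted.
declare
  l_job binary_integer;
begin
  dbms_job.submit(
    job       => l_job,
    what      => 'ctx_ddl.sync_index(''docs_bak_idx'');',
    next_date => sysdate,
    interval  => 'sysdate + 2/1440');  -- 2 minutes expressed as a fraction of a day
  commit;  -- the job becomes visible to the job queue only after commit
end;
/
```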
If possible, please treat this post as urgent.
Thanks
Re: Want to search any kind of PDF and other types of documents in one single column [message #188238 is a reply to message #188177] |
Thu, 17 August 2006 09:03 |
Barbara Boehmer
Messages: 9100 Registered: November 2002 Location: California, USA
Senior Member |
I don't think there is any way that you can combine two filters in one index like that. I think the most that you can do with one index is to have a format column in which you mark the documents that are just text and don't require any filter. Alternatively, you can have two separate tables, with two separate indexes and two separate filters, and combine the search results.
I recall that you posted the same question on the OTN forums a week ago. You might try posting on http://asktom.oracle.com to ask Tom Kyte. If Oracle expert Tom Kyte can't suggest something, at least he may be able to confirm that it can't be done. He frequently has a large backlog of questions, but still accepts a few new ones for brief periods a few times per day, so you may have to check frequently to catch a time when you can post a new question. Or, if you can find a closely related thread about PDFs and filters, you might try adding your question as part of a review, which can be posted at any time.
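The two options could be sketched roughly as follows. This is untested and rests on several assumptions: DOC_FORMAT, DOCS_PDF, DOCS_OTHER, docs_pdf_idx, docs_other_idx and the bind variable :search_terms are hypothetical names, the two tables are assumed to have the same DOC_PATH_FILENAME column as DOCS_BAK, and ctxsys.inso_filter is assumed to be the filter that handled the non-PDF documents before. Whether a user_filter skips rows marked TEXT in the format column should be checked against the Oracle Text documentation for your version.

```sql
-- Option 1: one index plus a format column (DOC_FORMAT is a hypothetical name).
alter table DOCS_BAK add (DOC_FORMAT varchar2(10));

-- BINARY = send through the filter, TEXT = index as is without filtering.
update DOCS_BAK
   set DOC_FORMAT = case
                      when lower(DOC_PATH_FILENAME) like '%.pdf' then 'BINARY'
                      else 'TEXT'
                    end;
commit;

-- The existing index has to be dropped and recreated to add the format column.
drop index docs_bak_idx;

create index docs_bak_idx on DOCS_BAK (DOC_PATH_FILENAME)
  indextype is ctxsys.context
  parameters ('datastore DFILENAME filter my_xpdf_filter format column DOC_FORMAT');

-- Option 2: two tables, two indexes, two filters (table and index names are hypothetical).
create index docs_pdf_idx on DOCS_PDF (DOC_PATH_FILENAME)
  indextype is ctxsys.context
  parameters ('datastore DFILENAME filter my_xpdf_filter');

create index docs_other_idx on DOCS_OTHER (DOC_PATH_FILENAME)
  indextype is ctxsys.context
  parameters ('datastore DFILENAME filter ctxsys.inso_filter');

-- Combine the search results from both tables.
select DOC_PATH_FILENAME
  from DOCS_PDF
 where contains(DOC_PATH_FILENAME, :search_terms) > 0
union all
select DOC_PATH_FILENAME
  from DOCS_OTHER
 where contains(DOC_PATH_FILENAME, :search_terms) > 0;
```

Note that option 1 only helps for documents that really are plain text. Anything else that is not a PDF, such as a binary office format, would still need a filter of its own, which is what option 2 is for.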