Oracle BLOB PDF Text Search Question [message #416454] |
Sun, 02 August 2009 23:52 |
mike_s_6
Messages: 2 Registered: August 2009
Junior Member
Good day!
I would like to ask for help with searching for strings inside a PDF stored in Oracle. Say I have a table named "documents", with "id" as the primary key and a BLOB column named "document". The CONTAINS function is used to search for strings inside the document:
SELECT id FROM documents WHERE CONTAINS(document, 'value') > 0;
Now here's a sample of what part of the PDF might look like:
![/forum/fa/6641/0/](/forum/fa/6641/0/)
The issue is that a search for the string "9518 9502" (the first two values in the first column) returns a match:
SELECT id FROM documents WHERE CONTAINS(document, '9518 9502') > 0;
But as you can see, the document does not visibly contain 9518(space)9502; the two values are separated by a table break.
I have explained to the users that this is a result of the PDF's formatting, but I think they still want the search to recognize that '9518 9502' is not visible in the PDF. So my question is: since the users want the search to return false, is there a way for the code to detect this?
Attachment: sample.jpg
Re: Oracle BLOB PDF Text Search Question [message #416515 is a reply to message #416468] |
Mon, 03 August 2009 03:18 |
Frank
Messages: 7901 Registered: March 2000
Senior Member
Michel Cadot wrote on Mon, 03 August 2009 07:49:
A PDF file is binary; Oracle functions work on the char datatype family.
If you want functions that operate on binary data, you have to write them.
Regards
Michel
|
Not true: Text indexes can also search binary formats such as Word documents.
I guess it's up to Barbara, our Text Index expert.
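In the meantime, here is a rough sketch of why a phrase search can match across a table break. It is plain Python, not Oracle Text code, and it rests on an assumption: that the filter which extracts text from the PDF emits the cell values one after another with only whitespace between them. The helper names (`flatten_table`, `tokenize`, `contains_phrase`) and the one-column table layout are made up for illustration.

```python
import re


def flatten_table(rows):
    """Mimic a text filter that keeps cell values but drops cell and row borders.

    Assumption: values come out in reading order, separated only by whitespace.
    """
    return "\n".join(" ".join(row) for row in rows)


def tokenize(text):
    """Split text into runs of letters and digits, as a simple lexer might."""
    return re.findall(r"[A-Za-z0-9]+", text)


def contains_phrase(text, phrase):
    """Return True when the phrase's tokens appear consecutively in the text."""
    tokens = tokenize(text)
    wanted = tokenize(phrase)
    if not wanted:
        return False
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


# Hypothetical layout: the two values from the question, one per table cell.
rows = [["9518"], ["9502"]]

# Searching the flattened text: the table break has become plain whitespace.
matches_flat = contains_phrase(flatten_table(rows), "9518 9502")

# Searching each cell on its own: no single cell holds both values.
matches_cell = any(
    contains_phrase(cell, "9518 9502") for row in rows for cell in row
)

print(matches_flat, matches_cell)  # prints: True False
```

If that assumption holds for the real filter, the index never sees the cell borders, so telling the two cases apart would need a search over text that still carries the cell boundaries, for example one value per cell.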