PDF to HTML conversion using ctxsys.auto_filter gives different results on DB 11.2 and 12.1 [message #650141] |
Sun, 17 April 2016 04:25 |
bwelter
Messages: 4 Registered: January 2012 Location: Netherlands
|
Junior Member |
Converting the same PDF document gives different results on Oracle 11.2 and 12.1.
I am using plaintext => false to get HTML output.
Code:
declare
  l_blob blob; -- holds the PDF
  l_clob clob; -- result of the conversion
begin
  -- load the blob with the PDF:
  ...
  -- set policy:
  ctx_ddl.create_policy('test_policy', 'ctxsys.auto_filter');
  ......
  -- convert PDF to HTML (plaintext => false):
  ctx_doc.policy_filter(
    policy_name => 'test_policy',
    document    => l_blob,
    restab      => l_clob,
    plaintext   => false
  );
  -- normalise CR to LF, then mark the beginning and end of each line:
  l_clob := replace(trim(l_clob), chr(13), chr(10));
  l_clob := replace(l_clob, chr(10), chr(32) || '<<EOL>>' || chr(10) || '<<BOL>>');
  ....
end;
On the Oracle 12.1 database, l_clob contains:
<<BOL>><div class="c" style="top:592px;left:218px;font-size:9px;font-family:Arial, sans-serif;" <<EOL>>
<<BOL>>>TRANSFORMER SINGLE PHASE, PR AC440V SEC AC220/5,</div> <<EOL>>
<<BOL>><div class="c" style="top:592px;left:38px;font-size:9px;font-family:Arial, sans-serif;" <<EOL>>
On the Oracle 11.2 database, the same PDF gives the following result in l_clob:
<<BOL>> <<EOL>>
<<BOL>><p><font size="1" face="Arial">TRANSFORMER SINGLE PHASE, PR AC440V SEC AC220/5,</font></p> <<EOL>>
<<BOL>> <<EOL>>
I explicitly need this part of the converted PDF content:
..top:592px;left:218px..
Could the difference be caused by a setting?
How can I get this positional output on both versions?
NB: I am aware that not all PDF documents contain nicely formatted text and x-y positions. For my current purpose, however, this is a good solution.
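To illustrate why the positioned output matters: a minimal sketch of how the coordinates could be read from one line of the 12.1 result. It assumes the style attribute always carries top:NNNpx and left:NNNpx as in the sample above; the variable names l_line, l_top and l_left are illustrative only and not part of the actual code.

```sql
declare
  -- sample line copied from the 12.1 output shown above
  l_line varchar2(4000) :=
    '<div class="c" style="top:592px;left:218px;font-size:9px;font-family:Arial, sans-serif;"';
  l_top  pls_integer;
  l_left pls_integer;
begin
  -- the 6th argument (1) returns only the first captured group, i.e. the digits
  l_top  := to_number(regexp_substr(l_line, 'top:(\d+)px',  1, 1, null, 1));
  l_left := to_number(regexp_substr(l_line, 'left:(\d+)px', 1, 1, null, 1));
  dbms_output.put_line('top=' || l_top || ' left=' || l_left);
end;
/
```

Nothing comparable can be extracted from the 11.2 output, because its <p><font ...> markup carries no coordinates at all.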
[Updated on: Sun, 17 April 2016 04:27]