Problems with CTX_DOC.SNIPPET on HTML documents [message #354803]
Tue, 21 October 2008 06:22
dugjason
Messages: 13 Registered: June 2008 Location: UK
Junior Member
Hi,
I am running CTX_DOC.SNIPPET on the following HTML document, retrieved from my Oracle database:
<p class="Bodytext">
As <span class="myProduct">myProduct</span> is an Internet-based
application, all you need to do to start it is to go to
http://<span class="myProduct">myProduct</span> installation>.
</p>
<ul>
<li> Start
your usual browser.
</li>
</ul>
I have set the entity_translation parameter of ctx_doc.snippet to FALSE, which removes most of the HTML tags from my text, but in this case the output still contains "<p class="Bodytext">" and the <span> tags.
I have experimented with it, but some HTML always appears in the output, whereas I would like it displayed as plain text.
I am not sure whether the fix is to change the parameters of ctx_doc.snippet, or whether I will have to change my index, perhaps to generate a plain-text version of the HTML document.
Below are my index definition and my ctx_doc.snippet call:
Index:
create index help_text
on help_page (page_content)
indextype is ctxsys.context
parameters ( 'TRANSACTIONAL SYNC(EVERY "SYSDATE+15/1440")');
Snippet:
ctx_doc.snippet('help_text', to_char(page_results.page_id), p_string, '<B>', '</B>', false);
Any help will be greatly appreciated!
Re: Problems with CTX_DOC.SNIPPET on HTML documents [message #355095 is a reply to message #354803]
Wed, 22 October 2008 11:33
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
Senior Member
I am using 11g and get slightly different results, but adding FILTER CTXSYS.AUTO_FILTER to the index parameters when creating the index seems to clean it up. Please see the reproduction and solution below.
-- test environment:
SCOTT@orcl_11g> CREATE TABLE help_page
2 (page_id NUMBER PRIMARY KEY,
3 page_content CLOB)
4 /
Table created.
SCOTT@orcl_11g> SET DEFINE OFF
SCOTT@orcl_11g> INSERT INTO help_page VALUES
2 (1,
3 '<p class="Bodytext">
4 As <span class="myProduct">myProduct</span> is an Internet-based
5 application, all you need to do to start it is to go to
6 http://<span class="myProduct">myProduct</span> installation>.
7 </p>
8 <ul>
9 <li> Start
10 your usual browser.
11 </li>
12 </ul>')
13 /
1 row created.
SCOTT@orcl_11g> create index help_text
2 on help_page (page_content)
3 indextype is ctxsys.context
4 parameters
5 ( 'TRANSACTIONAL SYNC(EVERY "SYSDATE+15/1440")')
6 /
Index created.
SCOTT@orcl_11g> VARIABLE g_test CLOB
SCOTT@orcl_11g> COLUMN g_test FORMAT A45 WORD_WRAPPED
-- reproduction:
SCOTT@orcl_11g> DECLARE
2 p_string VARCHAR2(30) := 'application';
3 BEGIN
4 CTX_DOC.SET_KEY_TYPE ('PRIMARY_KEY');
5 :g_test := ctx_doc.snippet
6 ('help_text',
7 '1',
8 p_string,
9 '<B>',
10 '</B>',
11 false);
12 END;
13 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> PRINT g_test
G_TEST
---------------------------------------------
myProduct">myProduct</span> is an
Internet-based
<B>application</B>, all you need to do to
start it is to go
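A note on the sixth argument in the call above: it is entity_translation. As far as I understand the documentation, it only controls whether the special characters <, > and & in the returned text are escaped as entities (&lt;, &gt;, &amp;). It is not a tag-stripping switch, which would explain why markup survives in the output here. The sketch below repeats the same call in named notation so that each argument is labelled. It assumes the parameter names from the documented CTX_DOC.SNIPPET signature; l_snippet is just an illustrative local variable.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- illustrative local variable, not part of the original test case
  l_snippet VARCHAR2(4000);
BEGIN
  -- treat the textkey argument as a primary key value rather than a rowid
  CTX_DOC.SET_KEY_TYPE('PRIMARY_KEY');

  l_snippet := CTX_DOC.SNIPPET(
                 index_name         => 'help_text',
                 textkey            => '1',
                 text_query         => 'application',
                 starttag           => '<B>',
                 endtag             => '</B>',
                 -- FALSE: do not escape <, > and & in the returned text
                 entity_translation => FALSE);

  DBMS_OUTPUT.PUT_LINE(l_snippet);
END;
/
```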
-- solution:
SCOTT@orcl_11g> DROP INDEX help_text
2 /
Index dropped.
SCOTT@orcl_11g> create index help_text
2 on help_page (page_content)
3 indextype is ctxsys.context
4 parameters
5 ( 'TRANSACTIONAL SYNC(EVERY "SYSDATE+15/1440")
6 FILTER CTXSYS.AUTO_FILTER')
7 /
Index created.
SCOTT@orcl_11g> DECLARE
2 p_string VARCHAR2(30) := 'application';
3 BEGIN
4 CTX_DOC.SET_KEY_TYPE ('PRIMARY_KEY');
5 :g_test := ctx_doc.snippet
6 ('help_text',
7 '1',
8 p_string,
9 '<B>',
10 '</B>',
11 false);
12 END;
13 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> PRINT g_test
G_TEST
---------------------------------------------
As myProduct is an Internet-based
<B>application</B>, all you need to do to
start it is to go
SCOTT@orcl_11g>
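To confirm which filter an index actually ended up with, the index preferences can be queried from the data dictionary. This is a minimal sketch, assuming the index is owned by the current user and was created with the unquoted name help_text (so it is stored in upper case). The row with class FILTER should name the filter in effect.

```sql
-- list the preference objects (datastore, filter, lexer, ...) used by the index
SELECT ixo_class,
       ixo_object
FROM   ctx_user_index_objects
WHERE  ixo_index_name = 'HELP_TEXT'
ORDER  BY ixo_class;
```

Running this before and after recreating the index makes it easy to check that the FILTER clause in the parameters string was picked up.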