Relevance rank within section weighting [message #137625]
Thu, 15 September 2005 15:30
pmj1
Messages: 11 Registered: June 2005 Location: Ann Arbor, MI
Junior Member
We are switching to Oracle Text from an existing full-text application in which we could weight hits that occurred in the title or abstract fields higher than hits that occurred in the PDF text. I do not see how this can be done easily in Oracle Text.
Note that I am not talking about term weighting, as in cat,dog*2, which doubles the score for dog hits. I want to weight cat or dog hits more heavily if they occur in the title as opposed to the associated PDF.
I could imagine doing several WITHIN section searches and creating a weighted, summed score, but there must be an easier way that I am missing.
Re: Relevance rank within section weighting [message #137774 is a reply to message #137625]
Fri, 16 September 2005 13:25
Barbara Boehmer
Messages: 9101 Registered: November 2002 Location: California, USA
Senior Member
This subject came up before on the OTN forums, and the best solution anybody came up with was a multi-column datastore with field sections and multiple weighted WITHIN clauses. I have provided a small demonstration below.
scott@ORA92> CONNECT ctxsys/ctxsys_password
Connected.
scott@ORA92> @ LOGIN
scott@ORA92> SET ECHO OFF

GLOBAL_NAME
----------------------------------------------------------------------------------------------------
ctxsys@ORA92

ctxsys@ORA92> EXEC CTX_DDL.DROP_PREFERENCE ('my_multi')

PL/SQL procedure successfully completed.

ctxsys@ORA92> BEGIN
                CTX_DDL.CREATE_PREFERENCE ('my_multi', 'MULTI_COLUMN_DATASTORE');
                CTX_DDL.SET_ATTRIBUTE ('my_multi', 'COLUMNS', 'title, abstract, pdf, dummy');
              END;
              /

PL/SQL procedure successfully completed.

ctxsys@ORA92> CONNECT scott/tiger
Connected.
ctxsys@ORA92> @ LOGIN
ctxsys@ORA92> SET ECHO OFF

GLOBAL_NAME
----------------------------------------------------------------------------------------------------
scott@ORA92

scott@ORA92> DROP TABLE documents
             /

Table dropped.

scott@ORA92> CREATE TABLE documents
               (title    VARCHAR2(15),
                abstract VARCHAR2(15),
                pdf      VARCHAR2(15),
                dummy    VARCHAR2(1))
             /

Table created.

scott@ORA92> INSERT ALL
             INTO documents VALUES ('MATCH', 'OTHER', 'OTHER', NULL)
             INTO documents VALUES ('OTHER', 'MATCH', 'OTHER', NULL)
             INTO documents VALUES ('OTHER', 'OTHER', 'MATCH', NULL)
             INTO documents VALUES ('MATCH', 'MATCH', 'OTHER', NULL)
             INTO documents VALUES ('MATCH', 'OTHER', 'MATCH', NULL)
             INTO documents VALUES ('OTHER', 'MATCH', 'MATCH', NULL)
             SELECT * FROM DUAL
             /

6 rows created.

scott@ORA92> EXEC CTX_DDL.DROP_SECTION_GROUP ('keyword_section_group')

PL/SQL procedure successfully completed.

scott@ORA92> BEGIN
               CTX_DDL.CREATE_SECTION_GROUP
                 (group_name   => 'keyword_section_group',
                  group_type   => 'basic_section_group');
               CTX_DDL.ADD_FIELD_SECTION
                 (group_name   => 'keyword_section_group',
                  section_name => 'title',
                  tag          => 'title',
                  visible      => true);
               CTX_DDL.ADD_FIELD_SECTION
                 (group_name   => 'keyword_section_group',
                  section_name => 'abstract',
                  tag          => 'abstract',
                  visible      => true);
               CTX_DDL.ADD_FIELD_SECTION
                 (group_name   => 'keyword_section_group',
                  section_name => 'pdf',
                  tag          => 'pdf',
                  visible      => true);
             END;
             /

PL/SQL procedure successfully completed.

scott@ORA92> CREATE INDEX documents_keyword_index
             ON documents (dummy)
             INDEXTYPE IS CTXSYS.CONTEXT
             PARAMETERS ('datastore CTXSYS.my_multi section group keyword_section_group')
             /

Index created.

scott@ORA92> SELECT title, abstract, pdf, SCORE (1) AS weighted_score
             FROM   documents
             WHERE  CONTAINS (dummy, '(((MATCH) WITHIN title)) * 5
                                      OR ((MATCH) WITHIN abstract) * 3
                                      OR ((MATCH) WITHIN pdf)', 1) > 0
             ORDER  BY SCORE (1) DESC
             /

TITLE           ABSTRACT        PDF             WEIGHTED_SCORE
--------------- --------------- --------------- --------------
MATCH           OTHER           MATCH                       20
MATCH           MATCH           OTHER                       20
MATCH           OTHER           OTHER                       20
OTHER           MATCH           MATCH                       12
OTHER           MATCH           OTHER                       12
OTHER           OTHER           MATCH                        4

6 rows selected.
scott@ORA92>
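
A note on the scores above: OR scores each row by its highest-scoring operand, so a row that matches in the title scores 20 whether or not it also matches in the abstract or the PDF. If rows that match in several sections should rank above rows that match in only one, the ACCUM operator may be worth trying in place of OR. What follows is an untested sketch against the same documents table and index. The weights are simply carried over from the demonstration, and no output is shown because ACCUM scores are not necessarily plain sums of the weighted scores.

```sql
-- Sketch only: assumes the documents table, the documents_keyword_index
-- index, and the title/abstract/pdf field sections created above.
-- ACCUM accumulates the scores of its operands instead of keeping only
-- the highest one, so rows matching in more sections should rank higher.
SELECT title, abstract, pdf, SCORE (1) AS weighted_score
FROM   documents
WHERE  CONTAINS (dummy, '((MATCH WITHIN title) * 5)
                         ACCUM ((MATCH WITHIN abstract) * 3)
                         ACCUM (MATCH WITHIN pdf)', 1) > 0
ORDER  BY SCORE (1) DESC;
```

Whether the resulting order suits your data is something to check on a realistic sample, since the weights 5 and 3 here are only illustrative.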