Relevance rank within section weighting [message #137625] Thu, 15 September 2005 15:30
pmj1
Messages: 11
Registered: June 2005
Location: Ann Arbor, MI
Junior Member
We are switching to Oracle Text from an existing full-text application in which we could weight hits in the title or abstract fields higher than hits in the PDF text. I do not see an easy way to do this in Oracle Text.

Note that I am not talking about term weighting, where a query such as cat,dog*2 doubles the score for dog hits. I want to weight the cat or dog hits higher when they occur in the title rather than in the associated PDF.

I could imagine doing several WITHIN section searches and creating a weighted, summed score, but there must be an easier way that I am missing.
Re: Relevance rank within section weighting [message #137774 is a reply to message #137625] Fri, 16 September 2005 13:25
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
This subject came up before on the OTN forums, and the best solution anybody came up with was a multi-column datastore with field sections and multiple WITHIN clauses, each with its own weight. I have provided a small demonstration below.
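
The approach depends on how the multi-column datastore presents each row to the indexer. As far as I understand it, the datastore builds one virtual document per row by wrapping each listed column's content in a tag named after the column, and the field sections are then defined on those tags. The Python sketch below is only a rough model of that idea, not Oracle's code. The function name virtual_document is made up for illustration, and the exact whitespace, tag case, and handling of NULL columns are assumptions.

```python
# Rough model of the virtual document that MULTI_COLUMN_DATASTORE is assumed
# to build for each row: every listed column wrapped in a tag named after it.

def virtual_document(row, columns):
    """Concatenate the columns, each wrapped in <col>...</col> tags."""
    parts = []
    for col in columns:
        value = row.get(col) or ""  # NULL treated as empty text (assumption)
        parts.append(f"<{col}>\n{value}\n</{col}>")
    return "\n".join(parts)


# Same column list as the COLUMNS attribute in the demonstration.
columns = ["title", "abstract", "pdf", "dummy"]
row = {"title": "MATCH", "abstract": "OTHER", "pdf": "OTHER", "dummy": None}

print(virtual_document(row, columns))
```

Seen this way, a clause such as (MATCH) WITHIN title is simply a search restricted to the text between the title tags of that virtual document.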

scott@ORA92> CONNECT ctxsys/ctxsys_password
Connected.
scott@ORA92> @ LOGIN
scott@ORA92> SET ECHO OFF

GLOBAL_NAME
----------------------------------------------------------------------------------------------------
ctxsys@ORA92

ctxsys@ORA92> EXEC CTX_DDL.DROP_PREFERENCE ('my_multi')

PL/SQL procedure successfully completed.

ctxsys@ORA92> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('my_multi', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('my_multi', 'COLUMNS', 'title, abstract, pdf, dummy');
  4  END;
  5  /

PL/SQL procedure successfully completed.

ctxsys@ORA92> CONNECT scott/tiger
Connected.
ctxsys@ORA92> @ LOGIN
ctxsys@ORA92> SET ECHO OFF

GLOBAL_NAME
----------------------------------------------------------------------------------------------------
scott@ORA92

scott@ORA92> DROP TABLE documents
  2  /

Table dropped.

scott@ORA92> CREATE TABLE documents
  2    (title	 VARCHAR2(15),
  3  	abstract VARCHAR2(15),
  4  	pdf	 VARCHAR2(15),
  5  	dummy	 VARCHAR2(1))
  6  /

Table created.

scott@ORA92> INSERT ALL
  2  INTO documents VALUES ('MATCH', 'OTHER', 'OTHER', NULL)
  3  INTO documents VALUES ('OTHER', 'MATCH', 'OTHER', NULL)
  4  INTO documents VALUES ('OTHER', 'OTHER', 'MATCH', NULL)
  5  INTO documents VALUES ('MATCH', 'MATCH', 'OTHER', NULL)
  6  INTO documents VALUES ('MATCH', 'OTHER', 'MATCH', NULL)
  7  INTO documents VALUES ('OTHER', 'MATCH', 'MATCH', NULL)
  8  SELECT * FROM DUAL
  9  /

6 rows created.

scott@ORA92> EXEC CTX_DDL.DROP_SECTION_GROUP ('keyword_section_group')

PL/SQL procedure successfully completed.

scott@ORA92> BEGIN
  2    CTX_DDL.CREATE_SECTION_GROUP
  3  	 (group_name => 'keyword_section_group',
  4  	  group_type => 'basic_section_group');
  5    CTX_DDL.ADD_FIELD_SECTION
  6  	 (group_name => 'keyword_section_group' ,
  7  	  section_name => 'title',
  8  	  tag => 'title',
  9  	  visible => true );
 10    CTX_DDL.ADD_FIELD_SECTION
 11  	 (group_name => 'keyword_section_group' ,
 12  	  section_name => 'abstract',
 13  	  tag => 'abstract',
 14  	  visible => true );
 15    CTX_DDL.ADD_FIELD_SECTION
 16  	 (group_name => 'keyword_section_group' ,
 17  	  section_name => 'pdf',
 18  	  tag => 'pdf',
 19  	  visible => true );
 20  END;
 21  /

PL/SQL procedure successfully completed.

scott@ORA92> CREATE INDEX documents_keyword_index
  2  ON documents (dummy)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS ('datastore CTXSYS.my_multi section group keyword_section_group')
  5  /

Index created.

scott@ORA92> SELECT title, abstract, pdf, SCORE (1) AS weighted_score
  2  FROM   documents
  3  WHERE  CONTAINS (dummy, '(((MATCH) WITHIN title)) * 5
  4  		      OR ((MATCH) WITHIN abstract) * 3
  5  		      OR ((MATCH) WITHIN pdf)', 1) > 0
  6  ORDER  BY SCORE (1) DESC
  7  /

TITLE           ABSTRACT        PDF             WEIGHTED_SCORE
--------------- --------------- --------------- --------------
MATCH           OTHER           MATCH                       20
MATCH           MATCH           OTHER                       20
MATCH           OTHER           OTHER                       20
OTHER           MATCH           MATCH                       12
OTHER           MATCH           OTHER                       12
OTHER           OTHER           MATCH                        4

6 rows selected.

scott@ORA92>
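
To make the numbers in that result easier to read, here is a small Python model of the arithmetic. It assumes that each WITHIN clause on its own scores 4 for these rows (that value is read off the output above and will differ on other data), that * n multiplies the clause score, that OR keeps the highest operand score rather than adding the operands, and that scores are capped at 100. The names BASE_SCORE, WEIGHTS and weighted_or_score are invented for this sketch.

```python
# Model of the scoring in the demo query: each WITHIN clause yields a base
# score, "* n" multiplies it, and OR keeps the highest operand score.

BASE_SCORE = 4   # observed in the output above, not a constant of Oracle Text
MAX_SCORE = 100  # CONTAINS scores do not go above 100
WEIGHTS = {"title": 5, "abstract": 3, "pdf": 1}


def weighted_or_score(row, term="MATCH"):
    """Score one row: weight every section that matches, then take the max."""
    scores = [
        min(BASE_SCORE * weight, MAX_SCORE)
        for section, weight in WEIGHTS.items()
        if row[section] == term
    ]
    return max(scores, default=0)


# The six rows inserted into the documents table.
rows = [
    {"title": "MATCH", "abstract": "OTHER", "pdf": "OTHER"},
    {"title": "OTHER", "abstract": "MATCH", "pdf": "OTHER"},
    {"title": "OTHER", "abstract": "OTHER", "pdf": "MATCH"},
    {"title": "MATCH", "abstract": "MATCH", "pdf": "OTHER"},
    {"title": "MATCH", "abstract": "OTHER", "pdf": "MATCH"},
    {"title": "OTHER", "abstract": "MATCH", "pdf": "MATCH"},
]

for row in sorted(rows, key=weighted_or_score, reverse=True):
    print(row["title"], row["abstract"], row["pdf"], weighted_or_score(row))
```

Under those assumptions the model gives 20 for any row with a title match, 12 for rows whose best match is in the abstract, and 4 for a PDF-only match, which agrees with the query output. It also shows why a row matching in both title and PDF does not outrank a title-only row: OR does not sum. If a summed score is wanted, the ACCUM operator is the usual candidate to try in place of OR, although I have not tested it here and its scoring differs between versions.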

Re: Relevance rank within section weighting [message #137777 is a reply to message #137625] Fri, 16 September 2005 14:19
pmj1
Thanks. That is simpler than what I had in mind. I didn't realize the weighting syntax would generalize in that way.