OraFAQ Forum: Text & interMedia » Word category and word relation

Home » Server Options » Text & interMedia » Word category and word relation (Oracle text)

Show: Today's Messages :: Polls :: Message Navigator
E-mail to friend

Word category and word relation [message #608914]

Wed, 26 February 2014 11:25

ariton
Messages: 5
Registered: February 2007
Location: Bucharest / Romania

Junior Member

Word category and word relations

Starting from an already example - http://www.orafaq.com/forum/t/95509/

        ID RUBRO                               TIPO_PRODUCT                             MARCA
---------- ----------------------------------- ---------------------------------------- -------------------------
      8818 PAPEL BOND                          Papel Bond Blanco 75 gramos              FIRST  RESMA, CONPACA
      8826 PAPEL BOND                          Papel Bond Blanco 75 gramos, por mtr.2   APUNTA, VIVIAN
      8931 PAPEL BOND                          PAPEL BOND COLOR                         MONDI / LIBRAS, MONDI
     10855 CARTONES, CARTONCILLO Y CARTULINAS  Cart?n Chip de 30" X 40", calibre 20     PLEK, PAINSA
     11029 CARTONES, CARTONCILLO Y CARTULINAS  CARTULINA OPALINA BLANCA                 FIRST  PLIEGO, CONPACA

Is it possible using ctx index and contains operator to obtain all words relating to a specific search? (RUBRO + TIPO_PRODUCT + MARCA are togheter into an ctx index)

Ex: If I search for PAPEL I want an output like this

SELECT * FROM test_tab WHERE CONTAINS (test_col, 'PAPEL') > 0

ID   Related  Category
8818 PAPEL    RUBRO
8818 BOND     RUBRO - because RUBRO is related to PAPEL from RUBRO category
8818 BLANCO   TIPO_PRODUCT - because BLANCO is related to PAPEL from TIPO_PRODUCT
8818 BOND     TIPO PRODUCT - ...
8818 FIRST    MARCA
8818 RESMA    MARCA
8818 CONPACA    MARCA
...

Thank you very much,
Claudiu

[mod-edit: code tags added by bb; next time please add them yourself]

[Updated on: Wed, 26 February 2014 13:37] by Moderator

Report message to a moderator

Re: Word category and word relation [message #608916 is a reply to message #608914]

Wed, 26 February 2014 16:36

Barbara Boehmer
Messages: 9106
Registered: November 2002
Location: California, USA

Senior Member

The following assumes that you want to retrieve all the words from all columns in any row that contains the search term.

SCOTT@orcl12c> -- original data table:
SCOTT@orcl12c> CREATE TABLE original_table
  2    (id	      NUMBER PRIMARY KEY,
  3  	rubro	      VARCHAR2(34),
  4  	tipo_product  VARCHAR2(38),
  5  	marca	      VARCHAR2(22))
  6  /

Table created.

SCOTT@orcl12c> -- new table:
SCOTT@orcl12c> CREATE TABLE new_table
  2    (id	      NUMBER,
  3  	related       VARCHAR2(30),
  4  	category      VARCHAR2(30),
  5  	test_col      VARCHAR2(100))
  6  /

Table created.

SCOTT@orcl12c> -- trigger that applies data changes to original table to new table:
SCOTT@orcl12c> CREATE OR REPLACE TRIGGER original_table_biudr
  2    BEFORE INSERT OR UPDATE OR DELETE ON original_table
  3    FOR EACH ROW
  4  BEGIN
  5    IF DELETING OR UPDATING THEN
  6  	 DELETE FROM new_table WHERE id = :OLD.ID;
  7    END IF;
  8    IF INSERTING OR UPDATING THEN
  9  	 INSERT INTO new_table (id, related, category, test_col)
 10  	 SELECT :NEW.id, related, category,
 11  		:NEW.rubro || ' ' || :NEW.tipo_product || ' ' || :NEW.marca
 12  	 FROM	(SELECT UPPER (RTRIM (REGEXP_SUBSTR (:NEW.rubro, '[^ ,/]+', 1, COLUMN_VALUE), ' ,/')) related,
 13  			'RUBRO' category
 14  		 FROM	DUAL,
 15  			TABLE
 16  			  (CAST
 17  			     (MULTISET
 18  				(SELECT LEVEL
 19  				 FROM	DUAL
 20  				 CONNECT BY LEVEL <= REGEXP_COUNT (:NEW.rubro, ' ') + 1)
 21  			      AS SYS.ODCINUMBERLIST))
 22  		 UNION
 23  		 SELECT UPPER (RTRIM (REGEXP_SUBSTR (:NEW.tipo_product, '[^ ,/]+', 1, COLUMN_VALUE), ' ,/')) related,
 24  			'TIPO_PRODUCT' category
 25  		 FROM	DUAL,
 26  			TABLE
 27  			  (CAST
 28  			     (MULTISET
 29  				(SELECT LEVEL
 30  				 FROM	DUAL
 31  				 CONNECT BY LEVEL <= REGEXP_COUNT (:NEW.tipo_product, ' ') + 1)
 32  			      AS SYS.ODCINUMBERLIST))
 33  		 UNION
 34  		 SELECT UPPER (RTRIM (REGEXP_SUBSTR (:NEW.marca, '[^ ,/]+', 1, COLUMN_VALUE), ' ,/')) related,
 35  			'MARCA' category
 36  		 FROM	DUAL,
 37  			TABLE
 38  			  (CAST
 39  			     (MULTISET
 40  				(SELECT LEVEL
 41  				 FROM	DUAL
 42  				 CONNECT BY LEVEL <= REGEXP_COUNT (:NEW.marca, ' ') + 1)
 43  			      AS SYS.ODCINUMBERLIST)))
 44  	    WHERE  related IS NOT NULL;
 45    END IF;
 46  END original_table_biudr;
 47  /

Trigger created.

SCOTT@orcl12c> SHOW ERRORS
No errors.
SCOTT@orcl12c> -- text index on new table:
SCOTT@orcl12c> CREATE INDEX test_idx ON new_table (test_col) INDEXTYPE IS CTXSYS.CONTEXT
  2  PARAMETERS ('SYNC (ON COMMIT)')
  3  /

Index created.

SCOTT@orcl12c> -- insert data into original table and commit:
SCOTT@orcl12c> INSERT ALL
  2  INTO original_table VALUES (8818,	'PAPEL BOND',			       'Papel Bond Blanco 75 gramos',		   'FIRST  RESMA, CONPACA')
  3  INTO original_table VALUES (8826,	'PAPEL BOND',			       'Papel Bond Blanco 75 gramos, por mtr.2',   'APUNTA, VIVIAN')
  4  INTO original_table VALUES (8931,	'PAPEL BOND',			       'PAPEL BOND COLOR',			   'MONDI / LIBRAS, MONDI')
  5  INTO original_table VALUES (10855, 'CARTONES, CARTONCILLO Y CARTULINAS',  'Cart?n Chip de 30" X 40", calibre 20',	   'PLEK, PAINSA')
  6  INTO original_table VALUES (11029, 'CARTONES, CARTONCILLO Y CARTULINAS',  'CARTULINA OPALINA BLANCA',		   'FIRST  PLIEGO, CONPACA')
  7  SELECT * FROM DUAL
  8  /

5 rows created.

SCOTT@orcl12c> COMMIT
  2  /

Commit complete.

SCOTT@orcl12c> -- query new table using text index:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> COLUMN related  FORMAT A30
SCOTT@orcl12c> COLUMN category FORMAT A30
SCOTT@orcl12c> SELECT DISTINCT related, category
  2  FROM   new_table
  3  WHERE  CONTAINS (test_col, 'papel') > 0
  4  ORDER  BY DECODE (category, 'RUBRO', 1, 'TIPO_PRODUCT', 2, 'MARCA', 3), related
  5  /

RELATED                        CATEGORY
------------------------------ ------------------------------
BOND                           RUBRO
PAPEL                          RUBRO
75                             TIPO_PRODUCT
BLANCO                         TIPO_PRODUCT
BOND                           TIPO_PRODUCT
COLOR                          TIPO_PRODUCT
GRAMOS                         TIPO_PRODUCT
MTR.2                          TIPO_PRODUCT
PAPEL                          TIPO_PRODUCT
POR                            TIPO_PRODUCT
APUNTA                         MARCA
CONPACA                        MARCA
FIRST                          MARCA
LIBRAS                         MARCA
MONDI                          MARCA
RESMA                          MARCA
VIVIAN                         MARCA

17 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2737401209

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    98 |     4   (0)| 00:00:01 |
|   1 |  SORT ORDER BY                |           |     1 |    98 |     4   (0)| 00:00:01 |
|   2 |   HASH UNIQUE                 |           |     1 |    98 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| NEW_TABLE |     1 |    98 |     4   (0)| 00:00:01 |
|*  4 |     DOMAIN INDEX              | TEST_IDX  |       |       |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("TEST_COL",'papel')>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SCOTT@orcl12c>

Report message to a moderator

Re: Word category and word relation [message #608938 is a reply to message #608916]

Thu, 27 February 2014 04:48

ariton
Messages: 5
Registered: February 2007
Location: Bucharest / Romania

Junior Member

Thank you Barbara for your great and quick response.

I undestand the approach, but from performance point of view, it is not suitable to use directly the DR$ table with USER_DATASTORE concatenation of columns? The DR$ tables already contain the words and the relations with original table, but I don't know it is possible to get the results in this way?

Claudiu

Report message to a moderator

Re: Word category and word relation [message #608974 is a reply to message #608938]

Thu, 27 February 2014 10:37

Barbara Boehmer
Messages: 9106
Registered: November 2002
Location: California, USA

Senior Member

ariton wrote on Thu, 27 February 2014 02:48

Thank you Barbara for your great and quick response.

I undestand the approach, but from performance point of view, it is not suitable to use directly the DR$ table with USER_DATASTORE concatenation of columns? The DR$ tables already contain the words and the relations with original table, but I don't know it is possible to get the results in this way?

Claudiu

I can't picture what you are suggesting. Can you provide a demonstration or at least a partial demonstration?

If you use either a multi_column_datastore or a user_datastore to concatenate the columns, and create a context index that uses either datastore, then you can query using CONTAINS to get all rows containing the search word. You could use WITHIN in your CONTAINS clause to select from each column, then concatenate the results, to get your category. You could then use some SQL to extract all the words from those rows. If there are very few matching rows, then that might be efficient. If there are a lot of rows in your result set, then it could take a while.

If your are planning to query the dr$your_index$i domain index table directly, I don't see how you could get the desired results. Although there is a lot of information stored, it is not stored in a way that is accessible other than through CONTAINS queries. You won't be able to get the column that it comes from or even the row that it came from in order to extract the other words in the row. Also, if you access the dr$your_index$i table directly, then you are not using an index and it can produce unexpected side effects if you try to index such a table.

Have you tested what I provided? I posted the explain plan to show the efficient plan that it produces.

Report message to a moderator

Re: Word category and word relation [message #608981 is a reply to message #608974]

Thu, 27 February 2014 13:40

ariton
Messages: 5
Registered: February 2007
Location: Bucharest / Romania

Junior Member

Thank you Barbara for your complete response. I already implemended your solution and it works great.
You are right about dr$ table, it is not supported directly by Oracle and the only way to access data is the "contains" operator.

Claudiu

Report message to a moderator

Previous Topic:	USER_DATASTORE procedure and markup
Next Topic:	contains

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Tue Dec 16 05:22:25 CST 2025