Skip navigation.

Inverted tables: an alternative to relational structures

The inverted table format can deliver fast and flexible query capabilities, but is not widely used. ADABAS is probably the most successful implementation, but how often do you see that nowadays? Following is a description of how to implement inverted structures within a relational database. All code run on Oracle Database 12c, release 12.1.0.1.

Consider this table and a few rows, that describe the contents of my larder:

create table food(id number,capacity varchar2(10),container varchar2(10),item varchar2(10));
insert into food values(1,'large','bag','potatoes');
insert into food values(2,'small','box','carrots');
insert into food values(3,'medium','tin','peas');
insert into food values(4,'large','box','potatoes');
insert into food values(5,'small','tin','carrots');
insert into food values(6,'medium','bag','peas');
insert into food values(7,'large','tin','potatoes');
insert into food values(8,'small','bag','carrots');
insert into food values(9,'medium','box','peas');

The queries I run against the table might be "how many large boxes have I?" or "give me all the potatoes, I don't care about how they are packed". The idea is that I do not know in advance what columns I will be using in my predicate: it could be any combination. This is a common issue in a data warehouse.
So how do I index the table to satisfy any possible query? Two obvious possibilities:
First, build an index on each column, and the optimizer can perform an index_combine operation on whatever columns happen to be listed in the predicate. But that means indexing every column - and the table might have hundreds of columns. No way can I do that.
Second, build a concatenated index across all the columns: in effect, use an IOT. That will give me range scan access if any of the predicated columns are in the leading edge of the index key followed by filtering on the rest of the predicate. Or if the predicate does not include the leading column(s), I can get skip scan access and filter. But this is pretty useless, too: there will be wildly divergent performance depending on the predicate.
The answer is to invert the table:
create table inverted(colname varchar2(10),colvalue varchar2(10),id number);
insert into inverted select 'capacity',capacity,id from food;
insert into inverted select 'container',container,id from food;
insert into inverted select 'item',item,id from food;

Now just one index on each table can satisfy all queries:
create index food_i on food(id);
create index inverted_i on inverted(colname,colvalue);

To retrieve all the large boxes:
orclz> set autotrace on explain
orclz> select * from food where id in
  2  (select id from inverted where colname='capacity' and colvalue='large'
  3  intersect
  4  select id from inverted where colname='container' and colvalue='box');

        ID CAPACITY   CONTAINER  ITEM
---------- ---------- ---------- ----------
         4 large      box        potatoes


Execution Plan
----------------------------------------------------------
Plan hash value: 1945359172

---------------------------------------------------------------------------------
| Id  | Operation                                | Name       | Rows  | Bytes | C
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |            |     3 |   141 |
|   1 |  MERGE JOIN                              |            |     3 |   141 |
|   2 |   TABLE ACCESS BY INDEX ROWID            | FOOD       |     9 |   306 |
|   3 |    INDEX FULL SCAN                       | FOOD_I     |     9 |       |
|*  4 |   SORT JOIN                              |            |     3 |    39 |
|   5 |    VIEW                                  | VW_NSO_1   |     3 |    39 |
|   6 |     INTERSECTION                         |            |       |       |
|   7 |      SORT UNIQUE                         |            |     3 |    81 |
|   8 |       TABLE ACCESS BY INDEX ROWID BATCHED| INVERTED   |     3 |    81 |
|*  9 |        INDEX RANGE SCAN                  | INVERTED_I |     3 |       |
|  10 |      SORT UNIQUE                         |            |     3 |    81 |
|  11 |       TABLE ACCESS BY INDEX ROWID BATCHED| INVERTED   |     3 |    81 |
|* 12 |        INDEX RANGE SCAN                  | INVERTED_I |     3 |       |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ID"="ID")
       filter("ID"="ID")
   9 - access("COLNAME"='capacity' AND "COLVALUE"='large')
  12 - access("COLNAME"='container' AND "COLVALUE"='box')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

orclz>

Or all the potatoes:
orclz> select * from food where id in
  2  (select id from inverted where colname='item' and colvalue='potatoes');

        ID CAPACITY   CONTAINER  ITEM
---------- ---------- ---------- ----------
         1 large      bag        potatoes
         4 large      box        potatoes
         7 large      tin        potatoes


Execution Plan
----------------------------------------------------------
Plan hash value: 762525239

---------------------------------------------------------------------------------
| Id  | Operation                              | Name       | Rows  | Bytes | Cos
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |            |     3 |   183 |
|   1 |  NESTED LOOPS                          |            |       |       |
|   2 |   NESTED LOOPS                         |            |     3 |   183 |
|   3 |    SORT UNIQUE                         |            |     3 |    81 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| INVERTED   |     3 |    81 |
|*  5 |      INDEX RANGE SCAN                  | INVERTED_I |     3 |       |
|*  6 |    INDEX RANGE SCAN                    | FOOD_I     |     1 |       |
|   7 |   TABLE ACCESS BY INDEX ROWID          | FOOD       |     1 |    34 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("COLNAME"='item' AND "COLVALUE"='potatoes')
   6 - access("ID"="ID")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan

orclz>

Of course, consideration needs to be given to handling more complex boolean expressions; maintaining the inversion is going to take resources; and a query generator has to construct the inversion code and re-write the queries. But In principle, this structure can deliver indexed access for unpredictable predicates of any number of any columns, with no separate filtering operation. Can you do that with a normalized star schema? I don't think so.
I hope this little thought experiment has stimulated the little grey cells, and made the point that relational structures are not always optimal for all problems.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com

Could this be done with an IndexType

It would be great if Barbara from the OraFAQ forum took a look at this. She is an expert on Oracle's IndexTypes, and I suspect that you could create an Index Type to do this for you seamlessly. Ie. When you modify data in FOOD, the index structure (which is actually another table) is updated atomatically. Also, you wouldn't have to write your queries against the INVERTED table, Oracle could translate queries against FOOD into equivalent queries against the IndexType, but you may have to use one of those funny predicates (CONTAINS()?)

I have never seen Barbara comment on articles on this part of the site. John, it might be worthwhile sending her a PM to look at this page - she might have an elegant hybrid solution.

My thoughts on:
>>But that means indexing every column - and the table might have hundreds of columns. No way can I do that.

I would keep an open mind on this. Maintaining the INVERTED structure will probably generate just as much (or more?) IO. If the FOOD table is suitable for Bitmap indexing (no concurrent DML, no row-by-row DML), then I reckon a Bitmap index on every column would make for more efficient SELECTs and more efficient DML - that's just my gut feel though - I have no proof.

That said, there are plenty of times when Bitmap indexes are not an option. I'm filing the idea away for the future.

Oracle Text method

John, thanks for sending the PM and Ross, thanks for suggesting it. This can be accomplished using Oracle Text features: multi_column_datastore, section group, context index, and queries using contains and within. Please see the demonstration below. I have put the insert statement just before the query, in order to demonstrate that the index is updated upon commit. The multi_column_datastore does the same thing for one table as the user_datastore and procedure can do for multiple tables. Behind the scenes, they create an xml type document with tags that correspond to the columns or anything else that you wish to use. When you create section groups, you can then search within those sections. The auto_section_group does this using the existing tag names. This is just the simplest counterpart to the inverted table method that you posted. There are many more Oracle Text options that can be combined with this.

SCOTT@orcl12c> -- table provided by John Watson:
SCOTT@orcl12c> create table food(id number,capacity varchar2(10),container varchar2(10),item varchar2(10))
  2  /

Table created.

SCOTT@orcl12c> -- Oracle Text multi_column_datastore:
SCOTT@orcl12c> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_mcds', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE	 ('test_mcds', 'COLUMNS', 'capacity, container, item');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- optional dummy column for indexing:
SCOTT@orcl12c> ALTER TABLE food ADD (any_column VARCHAR2(1))
  2  /

Table altered.

SCOTT@orcl12c> -- Oracle Text Context index that uses multi_column_datastore and section group:
SCOTT@orcl12c> CREATE INDEX test_idx ON food (any_column)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS
  4    ('DATASTORE	test_mcds
  5  	 SECTION GROUP	CTXSYS.AUTO_SECTION_GROUP
  6  	 SYNC		(ON COMMIT)')
  7  /

Index created.

SCOTT@orcl12c> -- data provided by John Watson:
SCOTT@orcl12c> insert all
  2  into food (id, capacity, container, item) values(1,'large','bag','potatoes')
  3  into food (id, capacity, container, item) values(2,'small','box','carrots')
  4  into food (id, capacity, container, item) values(3,'medium','tin','peas')
  5  into food (id, capacity, container, item) values(4,'large','box','potatoes')
  6  into food (id, capacity, container, item) values(5,'small','tin','carrots')
  7  into food (id, capacity, container, item) values(6,'medium','bag','peas')
  8  into food (id, capacity, container, item) values(7,'large','tin','potatoes')
  9  into food (id, capacity, container, item) values(8,'small','bag','carrots')
 10  into food (id, capacity, container, item) values(9,'medium','box','peas')
 11  select * from dual
 12  /

9 rows created.

SCOTT@orcl12c> COMMIT
  2  /

Commit complete.

SCOTT@orcl12c> -- queries:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> -- to retrieve all large boxes:
SCOTT@orcl12c> SELECT * FROM food
  2  WHERE  CONTAINS
  3  	      (any_column,
  4  	       'large WITHIN capacity AND
  5  		box WITHIN container') > 0
  6  /

        ID CAPACITY   CONTAINER  ITEM       A
---------- ---------- ---------- ---------- -
         4 large      box        potatoes

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3055600298

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    48 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| FOOD     |     1 |    48 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | TEST_IDX |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("ANY_COLUMN",'large WITHIN capacity AND
                   box WITHIN container')>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SCOTT@orcl12c> -- all potatoes:
SCOTT@orcl12c> SELECT * FROM food
  2  WHERE  CONTAINS
  3  	      (any_column,
  4  	       'potatoes WITHIN item') > 0
  5  /

        ID CAPACITY   CONTAINER  ITEM       A
---------- ---------- ---------- ---------- -
         1 large      bag        potatoes
         4 large      box        potatoes
         7 large      tin        potatoes

3 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3055600298

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    48 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| FOOD     |     1 |    48 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | TEST_IDX |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("ANY_COLUMN",'potatoes WITHIN item')>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SCOTT@orcl12c>

Oracle Text or manual inverrsion

I've been looking at Barbara's solution to my how-to-index-every-column problem, and trying to do little reverse engineering. I think that Oracle Text is in fact doing an inversion. Looking at the tables created by Text, it is storing every literal value in a manner that makes multi-column predicate searches indexable. And of course it all works with minimal manual intervention. I suppose it was unlikely that I would have invented something that the guys at Redwood Shores hadn't thought of. Thank you for the example, definitely worth remembering.