Data Warehouse Guru Question (Data Model question) between FACT and DIMENSION tbl

From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Date: Wed, 28 Jul 2021 09:41:42 -0400
Message-ID: <CAP79kiSGP7x97zCMMcpXOi_UzM=WdgXtOYGv_FOVKBhReE9PEw_at_mail.gmail.com>



This is a sanity check question as I'm helping out with a data model design and indexing strategy with our ETL team. None of us are what I'd call real strong on data warehouse design.

Here's the setup and the question:

Setup:
We have a DIMENSION table partitioned by SITE (Client) and we have a FACT table partitioned by SITE (Client).

The PK on the DIMENSION is a combo of SITE+KEY_FIELD The join between the DIMENSION and the FACT is on SITE+KEY_FIELD

All that is fine and works great with bitmap indexes on the FACT etc.

*HOWEVER* here is the problem:

In one particular query, we're applying an *additional FILTER *condition to the DIMENSION table (instead of the FACT) after the join which causes a bit of performance penalty as the rows are found between the DIMENSION and FACT but then the filter is then applied to the rows from the DIMENSION table (which is very large table also).

My theory is that this filter clause (the columns used by the filter) should actually also be in the FACT and apply the filter there with a BITMAP index instead of applying it to the DIMENSION. RIght now the condition (columns) in the DIMENSION that are checked for a value do not exist in the FACT table..

The join & condition is something like this:

*|  35 |                 PARTITION LIST ITERATOR                    |
                        |      0 |      1 |       |     1   (0)| 00:00:01 |
  KEY |   KEY |  Q1,01 | PCWP|* 36 |                  INDEX UNIQUE SCAN
                    | DIMENSION_TABLE_NDX2      |      0 |      1 |       |
    1   (0)| 00:00:01 |   KEY |   KEY |  Q1,01 | PCWP|* 37 |
 TABLE ACCESS BY LOCAL INDEX ROWID           | DIMENSION_TABLE  36 -
access("FACT_TABLE"."SITE_ID"="DIMENSION_TABLE"."SITE_ID" AND
"FACT_TABLE"."KEY_FIELD"="DIMENSION_TABLE"."KEY_FIELD") 37 - filter((CASE
"FIELD1" WHEN 'I' THEN 'Inpatient' WHEN 'O' THEN 'Outpatient' END))*

So we have the join between the FACT and DIMENSION on SITE_ID & KEY_FIELD

But then we have an additional FILTER being applied to the DIMENSION table.

Currently this doesn't kill performance, but definitely adds about 25% execution time (6 secs out of 24) and I'm concerned this will get worse as data volumes grow.

So the question is , should CASE statement filter really get applied to the FACT table and replicate that column (or some version of it) to the FACT table instead? (Generally as a rule of thumb I mean)

We can add the CASE statement to the PK index on the DIMENSION_TABLE as well, and thats what the ETL team is suggesting, but that seems like the wrong way to go.

Chris

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jul 28 2021 - 15:41:42 CEST

Original text of this message