Re: Data Warehouse Guru Question (Data Model question) between FACT and DIMENSION tbl

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Thu, 29 Jul 2021 15:43:07 -0700
Message-ID: <28a71f39-b00c-05a3-c429-47f0460cddca_at_gmail.com>



Chris,

It is just the two tables (i.e. FACT and DIM) involved in the query?  There are no additional dimension tables?

If so, that is likely much of the problem, because the concept of a dimensional data model (a.k.a. star schema) as well as Oracle's "star transformation" mechanism is built around the idea of two or more dimensions being involved in any query on the fact table.  As many dimensions as possible, in fact.  The more, the merrier.

The basic idea is this...

  1. user specifies search criteria on columns on the dimensions
    • if no search predicates are specified for a dimension table, then that dimension is not included in steps 2-4 below. o instead it will be used in step 5 as a post-processing filter or join step
  2. database uses search criteria to filter down to a result set from each dimension
  3. database uses all the result sets from all the dimensions to create a consolidated set (a.k.a. star transformation)
  4. database uses all of the dimension key values to join to the bitmap keys in the fact table (a.k.a. star join)
  5. database uses additional filter predicates or join predicates against the fact table to produce final results

The entire reason for having bitmap indexes on all of the foreign keys of the fact table is for that big operation on step 4 to work. That operation in step 4 works well if the consolidated set created in step 3 is small, and that's only going to happen if filtering happens on the dimensions.

The filtering in step 5 is merely an after-effect, ideally most of the work of joining all the tables was already performed in step 4, and step 5 is running against a really small final result set.

The upshot is the more well-filtered dimensions that are involved in the star transformation, the more efficient is the star join.  Most (if not all) filtering predicates in star transformations should be found on columns in the dimensions.  Ideally there are no filter predicates against the fact table, but if there are, then we have to consider whether they should be converted into dimensions or left as "conformed dimensions".

Are star transformations important?  Yes, if you are attempting to query a dimensional data model.

There is more about star transformations HERE <https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/query-transformations.html#GUID-76B6B58B-24B0-4DFE-AC1B-CFAC4D93C55A>. Also, the STAR_TRANSFORMATION_ENABLED
<https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/STAR_TRANSFORMATION_ENABLED.html#GUID-B2E6145D-164A-4453-9839-0F6E6442A922> parameter should not be FALSE, and there are additional guidelines to post for the Oracle optimizer to initiate star transformations more easily, such as hierarchies specified with the CREATE HIERARCHY <https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-HIERARCHY.html#GUID-73925877-992B-4624-AA28-8F565E9C3F0D> command, enabled PK constraints on the dimension tables, and ENABLED or RELY enforced foreign-key constraints on the fact table.

Finally, one fun feature for star transformations is when Oracle database materializes the consolidated set mentioned in step 3 above, which is called a bitmap-join index.  Unfortunately, there are no specific documentation pages in the Oracle doc-set devoted to bitmap-join indexes, but there is a good discussion of DW optimization HERE
<https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/data-warehouse-optimizations-techniques.html#GUID-79C29A60-3477-448D-835D-2940D060D050>as well Tim Hall's more detailed blog post HERE <https://oracle-base.com/articles/9i/bitmap-join-indexes>. If it is possible to create a bitmap-join index on a fact table, then it is almost certain that any query using those dimensional search predicates will achieve a star transformation.

Hope this helps,

-Tim

On 7/28/2021 6:41 AM, Chris Taylor wrote:
> This is a sanity check question as I'm helping out with a data model
> design and indexing strategy with our ETL team.  None of us are what
> I'd call real strong on data warehouse design.
>
> Here's the setup and the question:
>
> Setup:
> We have a DIMENSION table partitioned by SITE (Client) and we have a
> FACT table partitioned by SITE (Client).
>
> The PK on the DIMENSION is a combo of SITE+KEY_FIELD
> The join between the DIMENSION and the FACT is on SITE+KEY_FIELD
>
> All that is fine and works great with bitmap indexes on the FACT etc.
>
> *HOWEVER* here is the problem:
>
> In one particular query, we're applying an /additional FILTER
> /condition to the DIMENSION table (instead of the FACT) after the join
> which causes a bit of performance penalty as the rows are found
> between the DIMENSION and FACT but then the filter is then applied to
> the rows from the DIMENSION table (which is very large table also).
>
> My theory is that this filter clause (the columns used by the filter)
> should actually also be in the FACT and apply the filter there with a
> BITMAP index instead of applying it to the DIMENSION.  RIght now the
> condition (columns) in the DIMENSION that are checked for a value do
> not exist in the FACT table..
>
> The join & condition is something like this:
>
> *|  35 | PARTITION LIST ITERATOR                    |                
> |      0 |      1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
>  Q1,01 | PCWP
> |* 36 |                  INDEX UNIQUE SCAN         |
> DIMENSION_TABLE_NDX2      |      0 |      1 |     |     1   (0)|
> 00:00:01 |   KEY |   KEY |  Q1,01 | PCWP
> |* 37 |                TABLE ACCESS BY LOCAL INDEX ROWID         |
> DIMENSION_TABLE
>
>
>   36 - access("FACT_TABLE"."SITE_ID"="DIMENSION_TABLE"."SITE_ID" AND
> "FACT_TABLE"."KEY_FIELD"="DIMENSION_TABLE"."KEY_FIELD")
>   37 - filter((CASE "FIELD1" WHEN 'I' THEN 'Inpatient' WHEN 'O' THEN
> 'Outpatient' END))*
>
> So we have the join between the FACT and DIMENSION on SITE_ID & KEY_FIELD
>
> But then we have an additional FILTER being applied to the DIMENSION
> table.
>
> Currently this doesn't kill performance, but definitely adds about 25%
> execution time (6 secs out of 24) and I'm concerned this will get
> worse as data volumes grow.
>
> So the question is , should CASE statement filter really get applied
> to the FACT table and replicate that column (or some version of it) to
> the FACT table instead?   (Generally as a rule of thumb I mean)
>
> We can add the CASE statement to the PK index on the DIMENSION_TABLE
> as well, and thats what the ETL team is suggesting, but that seems
> like the wrong way to go.
>
> Chris
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jul 30 2021 - 00:43:07 CEST

Original text of this message