Re: Data Warehouse Guru Question (Data Model question) between FACT and DIMENSION tbl
Date: Tue, 3 Aug 2021 13:09:45 -0400
Message-ID: <CAP79kiSNGCDjd4YBxbr57ZEwoJ2LwrWBux7X7i79qv8fReX8wg_at_mail.gmail.com>
Jonathan,
Thank you for the input. Learning a lot about DW data model configurations and enjoying it (odd as that sounds).
I know some of my questions come from a lot of ignorance of using DIM & FACT tables and star transformations and appreciate your thoughtful response.
Chris
On Tue, Aug 3, 2021 at 12:53 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
wrote:
> >> But then we have an additional FILTER being applied to the DIMENSION
> table.
> >> Currently this doesn't kill performance, but definitely adds about 25%
> execution time (6 secs out of 24) and I'm concerned this will get worse as
> data volumes grow.
> >> So the question is , should CASE statement filter really get applied to
> the FACT table and replicate that column (or some version of it) to the
> FACT table instead? (Generally as a rule of thumb I mean)
>
> The fragment you've shown looks perfectly normal for a bitmap star
> transformation with joinback to dimension when reporting any dimension
> columns that aren't key columns.
> (I'm assuming the filter predicate ends with " = 'something}' ". That's
> just the generic way that Oracle deals with the star transformation when a
> joinback is needed.
>
> Adding "FIELD1" to the index supporting the PK is the sensible option if
> you want to improve performance - just as carrying columns with any primary
> key index is for any type of query.
>
> It shouldn't be "unusual" to have non-key predicates against dimension
> tables, and I'd expect most queries to be doing that type of thing anyway.
>
> The only thing that would make your plan odd (for a bitmap star
> transformation) would be if that predicate with the case experession hadn't
> also been used in the first pass against the dimension table that genrated
> the primary keys used to select the bitmaps from the fact table indexes.
>
> Regards
> Jonathan Lewis
>
>
>
> On Wed, 28 Jul 2021 at 14:41, Chris Taylor <
> christopherdtaylor1994_at_gmail.com> wrote:
>
>> This is a sanity check question as I'm helping out with a data model
>> design and indexing strategy with our ETL team. None of us are what I'd
>> call real strong on data warehouse design.
>>
>> Here's the setup and the question:
>>
>> Setup:
>> We have a DIMENSION table partitioned by SITE (Client) and we have a FACT
>> table partitioned by SITE (Client).
>>
>> The PK on the DIMENSION is a combo of SITE+KEY_FIELD
>> The join between the DIMENSION and the FACT is on SITE+KEY_FIELD
>>
>> All that is fine and works great with bitmap indexes on the FACT etc.
>>
>> *HOWEVER* here is the problem:
>>
>> In one particular query, we're applying an *additional FILTER *condition
>> to the DIMENSION table (instead of the FACT) after the join which causes a
>> bit of performance penalty as the rows are found between the DIMENSION and
>> FACT but then the filter is then applied to the rows from the DIMENSION
>> table (which is very large table also).
>>
>> My theory is that this filter clause (the columns used by the filter)
>> should actually also be in the FACT and apply the filter there with a
>> BITMAP index instead of applying it to the DIMENSION. RIght now the
>> condition (columns) in the DIMENSION that are checked for a value do not
>> exist in the FACT table..
>>
>> The join & condition is something like this:
>>
>>
>>
>>
>>
>>
>>
>> *| 35 | PARTITION LIST ITERATOR |
>> | KEY | KEY | Q1,01 | PCWP|* 36 | INDEX UNIQUE SCAN
>> | DIMENSION_TABLE_NDX2 | 0 | 1 |
>> | 1 (0)| 00:00:01 | KEY | KEY | Q1,01 | PCWP|* 37 |
>> TABLE ACCESS BY LOCAL INDEX ROWID | DIMENSION_TABLE 36 -
>> access("FACT_TABLE"."SITE_ID"="DIMENSION_TABLE"."SITE_ID" AND
>> "FACT_TABLE"."KEY_FIELD"="DIMENSION_TABLE"."KEY_FIELD") 37 - filter((CASE
>> "FIELD1" WHEN 'I' THEN 'Inpatient' WHEN 'O' THEN 'Outpatient' END))*
>>
>> So we have the join between the FACT and DIMENSION on SITE_ID & KEY_FIELD
>>
>> But then we have an additional FILTER being applied to the DIMENSION
>> table.
>>
>> Currently this doesn't kill performance, but definitely adds about 25%
>> execution time (6 secs out of 24) and I'm concerned this will get worse as
>> data volumes grow.
>>
>> So the question is , should CASE statement filter really get applied to
>> the FACT table and replicate that column (or some version of it) to the
>> FACT table instead? (Generally as a rule of thumb I mean)
>>
>> We can add the CASE statement to the PK index on the DIMENSION_TABLE as
>> well, and thats what the ETL team is suggesting, but that seems like the
>> wrong way to go.
>>
>> Chris
>>
>>
>>
>>
>>
-- http://www.freelists.org/webpage/oracle-lReceived on Tue Aug 03 2021 - 19:09:45 CEST