performance problems updating dimension tables with MVs prejoined between fact and dimension

Home -> Community -> Mailing Lists -> Oracle-L -> performance problems updating dimension tables with MVs prejoined between fact and dimension

From: Marc Slemko <identd_at_gmail.com>
Date: Sun, 20 Feb 2005 00:10:24 -0800
Message-ID: <9662496505022000107796f42a@mail.gmail.com>

We are having a bit of an annoying issue with a data warehouse we are currently in the process of deploying WRT some of the fast refresh materialized views we have. Running 10g.

Due to the large size of some of our dimensions, we want to create materialized views that pre-join a fact table with a dimension table to pull one boolean attribute, for example, out of the dimension without having to blow up the size of the materialized view by having the actual id of the dimension table in it.

This data warehouse is loaded more or less in realtime, throughout the day, so nearly all our materialized views are refresh on commit and fast refresh. We don't just add new fact table entries during the course of a day, but also load in new dimension records as necessary.

Conceptually, this poses no problems for the rollups since the dimension data in question is never modified after insertion.

However we are running into an implementation issue with Oracle that is causing this to be extremely ugly. When we do a COMMIT after adding a batch of records to the dimension, Oracle behind the scenes runs something like the following to update the materialized view:

MERGE INTO "MARCS"."MV_LARGE" "SNA$" USING (SELECT /*+ OPT_ESTIMATE(QUERY_BLOCK MAX=1000) */ "MAS$1"."SMALL_D" "GB0",
"DLT$0"."IS_SPECIAL" "GB1", SUM(1) "D0" FROM (SELECT /*+ CARDINALITY(MAS$

*/ CHARTOROWID("MAS$"."M_ROW$$") RID$ , "MAS$"."IS_SPECIAL",
"MAS$"."LARGE_D" FROM "MARCS"."MLOG$_LARGE" "MAS$" WHERE
"MAS$".SNAPTIME$$ > :1 ) "DLT$0" , (SELECT "MAS$"."ROWID" "RID$" ,
"MAS$"."SMALL_D", "MAS$"."LARGE_D" FROM "MARCS"."FACT" "MAS$") AS OF
SNAPSHOT (:2) "MAS$1" WHERE ("MAS$1"."LARGE_D"="DLT$0"."LARGE_D") GROUP BY "MAS$1"."SMALL_D","DLT$0"."IS_SPECIAL")"AV$" ON (SYS_OP_MAP_NONNULL("SNA$"."SMALL_D")=SYS_OP_MAP_NONNULL("AV$"."GB0") AND SYS_OP_MAP_NONNULL("SNA$"."IS_SPECIAL")=SYS_OP_MAP_NONNULL("AV$"."GB1")) WHEN MATCHED THEN UPDATE SET "SNA$"."COUNT(*)"="SNA$"."COUNT(*)
"+"AV$"."D0" WHEN NOT MATCHED THEN INSERT ("SNA$"."SMALL_D",
"SNA$"."IS_SPECIAL", "SNA$"."COUNT(*)") VALUES ( "AV$"."GB0", "AV$"."GB1",
"AV$"."D0")

Which ends up being optimized as:

Rows Row Source Operation

-------  ---------------------------------------------------
      2  MERGE  (cr=90996 pr=90930 pw=0 time=12304762 us)
      0   PX COORDINATOR  (cr=90996 pr=90930 pw=0 time=12304749 us)
      0  VIEW  (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  HASH JOIN OUTER BUFFERED (cr=0 pr=0 pw=0 time=0 us)
      0   BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
      0    PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  VIEW  (cr=90971 pr=90930 pw=0 time=12283633 us)(Orphan Entry)
      0  SORT GROUP BY (cr=90971 pr=90930 pw=0 time=12283627 us)
      0   HASH JOIN  (cr=90971 pr=90930 pw=0 time=12283597 us)

1000 TABLE ACCESS FULL MLOG$_LARGE (cr=23 pr=0 pw=0 time=1106 us) 1000000 TABLE ACCESS FULL FACT (cr=90948 pr=90930 pw=0 time=2006047 us)

      0   PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  MAT_VIEW ACCESS FULL MV_LARGE (cr=0 pr=0 pw=0 time=0 us)

Note the full table scan of the "fact" table. The optimizer appears to be making the correct decision here, since as far as it knows there are enough rows of the materialized view that would have to be updated based on the new records in the "large" dimension that a full table scan of the fact table is faster ... it just isn't being told that there really can't be any rows to update due to the foreign key constraint and the fact rows were only added. This becomes quite impractical with our larger fact tables. For example, with one dimension we have about 65 million fact table rows with 800k distinct dimension key values in the fact table, so you wouldn't have to actually update too many dimension records before the proper choice for the CBO would be to do the full table scan based on what it knows.

When we do updates in very small batches, the optimizer picks an index lookup into the fact table instead, which is much faster since it doesn't take long to lookup something that doesn't exist in the index.

I don't even know how I can add a hint to work around this, since the MERGE statement is generated by Oracle. Stored outlines don't seem an option because Oracle dynamically includes information about cardinality and blocks in the query text, which changes more or less with each commit.

A related, but less important problem, is that if we update column "b" in a row in a dimension table that is prejoined to the fact table in a materialized view using column "a" of the dimension, Oracle still thinks it has to update all the rows in the materialized views related to the modified rows, even though column "b" isn't even in the materialized view log, let alone used in the materialized view.

Has anyone run into this before or have any suggestions or other approaches that could accomplish the same thing? We really don't want to have to break the dimension out into multiple dimensions. We do have a TAR open, but have not been getting very useful results so far and are on a tight timeline.

It can be a bit hard to explain exactly what is going on here, a full example, with schema, population scripts, sample sqltrace / tkprof output, etc. is at http://znep.com/~marcs/tmp/mvtest.tgz if anyone thinks they could offer some insight, a README.txt in that file contains details of the contents.

Thanks for any comments. I'll post a followup if I figure anything out.

--
http://www.freelists.org/webpage/oracle-l

Received on Sun Feb 20 2005 - 03:13:20 CST