BI & Warehousing

Whaddya Want?

Tim Dexter - Fri, 2008-08-29 08:05
More Openworld stuff today, apologies to those of you not coming. There will be blogging and tweeting galore, plus the Openworld web site for you to keep up. I have a session on Monday @5.30pm! As it will be close... Tim Dexter
Categories: BI & Warehousing

The Mystery of OBIEE Bridge Tables

Rittman Mead Consulting - Thu, 2008-08-28 16:56

If you’ve played around with the Oracle BI Administrator tool for a while, you may have noticed a box you can tick in the Logical Table Properties dialog, called “Bridge Table”. If you’re familiar with Ralph Kimball and some of this dimensional modeling ideas, you’ve probably heard of this concept before, but it’s not all that clear how you use bridge tables in OBIEE and a quick search around the internet and the OTN forums doesn’t really come up with an examples on how it’s used.

Bridge Table

Bridge tables are a solution to what’s called the “multi-valued dimension” problem. For most dimensional models you generally want to link one sale, for example, to one product, one customer, one time period and so on, and this translates into a simple dimensional model where your fact table contains a single key value per dimension for each row that’s been stored. In some circumstances though, say where you are recording the diagnoses for a patient or the claim elements in a claim, you might need to record more than one key value for a particular dimension in each fact table row. In entity relationship modeling terms, you’ve got a many-to-many relationship between patient admissions and diagnoses, like this:

200808281839

and the usual way you resolve these many to many relationships is to use an intersection table, with the key from the patient admissions table and the key from the diagnoses table copied across to form the intersection, and usually with a weighting column that adds up to 1, so that you can properly add up all the diagnoses and not over-count them.

200808281842

As I said, this is not exactly new stuff and bridge tables, of which diagnosis group above is one of them, are a fairly common dimensional modeling construct. The problem you hit though when starting to use this feature is that the documentation on it is pretty minimal, and only really talks about setting this feature on the bridge table itself and doesn’t really mention what to do with the dimension table that hangs off of it. What I’ll do in this posting then is set out how I use it, explain my rationale and thereafter invite some feedback, so if someone else has come up with a better idea then we can work with that instead.

Going in to BI Administrator and looking at the physical model for the data set above, it looks like this:

Physical Model

with the key thing here being that the fact table we’ve got the bridge table joining to both the fact and diagnosis dim tables to form an intersection. If you imported this model into the logical business layer as is, the BI Administrator would think the intersection table is the fact table as the other tables join to it.

Logical No Bridge

So what you do now is go into the properties for the diagnosis table and indicate that it’s a bridge table, like this:

Set Bridge

Now when you look at the logical model the fact table is identified correctly.

Logical Bridge

If you try and validate the model now though, you get a warning because the diagnoses dimension table doesn’t link through to the fact table, as it goes through the bridge table instead.

Bridge Warning

Now not linking through to the fact table is sometimes allowed, basically in situations where you’ve snowflaked your logical model and the dimension table is actually a higher level in the same dimension, but in this case if we try and use this model and bring in the diagnoses information into a query, we’ll get a metadata consistency error.

To solve this, what I would do is remove the diagnosis dimension from the logical model, and instead add it to the logical table source for the bridge table, like this:

Add To Bridge Lts

Then I’d add any columns that I needed from the diagnosis dimension physical table into the bridge table, which I can do now as I’ve added the dimension table to the bridge table LTS, so that this bridge table now becomes my diagnosis dimension, like this:

Lts Diag Name

So what I’ve done here is take the dimension table, which before linked to the bridge table and thereafter caused the metadata inconsistency, and instead added it to the bridge table logical table source and added it’s columns to the bridge table logical table.

If I run a report now, the data comes out as you would expect, with multiple diagnoses per patient and the weighting applied correctly.

Bridge Report

However I’m conscious that to me, getting rid of the logical dimension table seems a bit wrong, and others have said that they don’t use the bridge table feature at all, and instead just combine the various tables into a single logical table source. If anyone’s got any other way of using bridge tables, add a comment to this post and we’ll see what the consensus is.

Categories: BI & Warehousing

Jobs: ETL Developer

Chet Justice - Thu, 2008-08-28 13:54
This position is in Tampa, Florida. If you are interested you can email me at chet dot justice at gmail.

Position Description:
· Develop and implement applications that support the business strategy of measuring and optimizing performance
· Support existing data warehouse applications and users

Mandatory Skills/Experience
· 2+ years experience working on data warehouse/data integration solutions
· 2+ years working with vendor ETL tools to design/build/test (Business Objects Data Integrator XIR2)
· 1+ years working with Oracle database; good PL/SQL knowledge
· Experience with dimensional data models (star schemas)

Preferred Skills/Experience
· Java programming experience
· Data Quality XIR2
· SQL Server 2005/SSIS
· Business Objects Enterprise XIR2
· Crystal Reports XIR2
Categories: BI & Warehousing

XSLT Extensions

Tim Dexter - Thu, 2008-08-28 10:09
Got a request to come up with a list of all the extension functions we provide in Publisher. I finally got around to generating some javadoc for the XSLTFunctions.java. You can get the zipped java doc for the 5.6.3 release... Tim Dexter
Categories: BI & Warehousing

Committed to Essbase 11.1.1

Tim Tow - Thu, 2008-08-28 09:54
Sorry it has been so long since I posted but I have been working on some non-Essbase product development area of our product. Specifically, I have been working on FIPS-140 (“Federal Information Processing Standards”) compliant computing required for Federal government computing systems. So if anyone wants to know about SHA-1 hashing (FIPS compliant) vs MD5 hashing (not FIPS compliant) or TripleDES encryption algorithms, I may be able to help them.

I have had some other things going on as well. I was interviewed by my local newspaper for the front page of the business section. The topic was businesspeople who fly their own airplanes on business which I do nearly every trip east of the Rockies. In fact, I flew myself and my family to New Orleans for the ODTUG Kaleidoscope conference. If you are interested in reading about my '15 minutes of fame', here is a link to the story: http://www.al.com/huntsvilletimes/stories/index.ssf?/base/business/1218993308126870.xml&coll=1

Now, on to Essbase stuff..

I haven't been writing much about Essbase 11.1.1 as I haven't had it installed on my laptop. Instead, I had one of my assistants create a VM with Essbase 11.1.1 on it.. Well, despite the fact that I have a laptop with 4 Gb of RAM, of which only 3 Gb is usable as I have XP Professional for my OS, VMWare ran very sluggishly on my system. Perhaps it is because I also had the entire Essbase stack, Oracle 11g Enterprise and SQL Server 2005 all running in addition to VMWare that made it slow, but still it was painful to turn all of that stuff off before firing up the VMWare image. Thus, I tended not to run it much and, in fact, ran it just enough to make sure Dodeca ran on 11.1.1.

Have you ever heard the old story about commitment? It goes something like 'a chicken and a pig were discussing their importance to breakfast and the chicken claimed it was more important as it contributed eggs. The pig replied that it made more of a commitment to breakfast when it provided bacon..' Well, translating that to my Essbase situation, I have committed to working with Essbase 11.1.1. Last night, I completely removed Essbase 9.3.1 from my system and will install 11.1.1 today. How is that for commitment?
Categories: BI & Warehousing

Bitmap Index Examples

Rittman Mead Consulting - Thu, 2008-08-28 07:05

Between assignments (mostly, OBIEE and BI Applications work) and a family holiday in the Austrian Alps I have been helping Mark with some of the examples he will be demonstrating as part of his 11g Data Warehousing Seminar. I must confess that I, too, will be using the examples in some of my data warehouse courses.

One of the demonstrations I have worked on for is the use of BITMAP indexes in simple (single table) queries. For this demonstration I used the standard SH schema and a couple of the provided bitmap indexes on the CUSTOMERS table; that is the GENDER (CUSTOMERS_GENDER_BIX)  and YEAR OF BIRTH (CUSTOMERS_YOB_BIX) indexes.

Firstly, let’s look at the data distributions:

select  cust_gender, count(*) from customers
group by cust_gender;

C   COUNT(*)
- ----------
F      18325
M      37175

There are about 66% M customers and 33% F, and no customers of any other (or NULL) gender.
A similar query shows us that there are 75 distinct years of birth (between 1913 and 1990) and for 1917 there are 37 customers

So what happens when we query the customers table on gender = M

SQL> explain plan for select * from customers where cust_gender = 'M';
select * from table(dbms_xplan.display);

Explained.
SQL>
PLAN_TABLE_OUTPUT
-------------------------------Plan hash value: 2008213504
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 27750 |  4877K|   407   (1)
|*  1 |  TABLE ACCESS FULL| CUSTOMERS | 27750 |  4877K|   407   (1)
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CUST_GENDER"='M')

We do a full table scan and do not use the index.

But when we choose a more selective query - the people born in 1917 (which hits about 0.07% of the table) we see this:

SQL> explain plan for select * from customers where
CUST_YEAR_OF_BIRTH = 1917;

select * from table(dbms_xplan.display);

Explained.

SQL>
PLAN_TABLE_OUTPUT
--------------------------Plan hash value: 3460183038
---------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)
---------------------------------------------------
|   0 | SELECT STATEMENT             |                   |   740 |   130K|   146   (0)
|   1 |  TABLE ACCESS BY INDEX ROWID | CUSTOMERS         |   740 |   130K|   146   (0)
|   2 |   BITMAP CONVERSION TO ROWIDS|                   |       |       |
|*  3 |    BITMAP INDEX SINGLE VALUE | CUSTOMERS_YOB_BIX |       |       |
---------------------------------------------------

Predicate Information (identified by operation id):
—————————————————

3 - access(”CUST_YEAR_OF_BIRTH”=1917)

This time we are using the bitmap index.

So, is there any merit to using a bitmap index on a column with just two distinct values? After all, we do less work by table scanning the whole table.

The answer to this is yes. One of the great things about bitmap indexes is that they can be combined with other bitmap indexes on the same table by a bitwise ‘AND’ operation; that is we effectively create a multi-column index on-the-fly

SQL> explain plan for select * from customers where
CUST_YEAR_OF_BIRTH = 1917 and cust_gender = 'M';
select * from table(dbms_xplan.display);
Explained.
SQL>
PLAN_TABLE_OUTPUT
---------------------------------------------------Plan hash value: 4226934075
---------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------
|   0 | SELECT STATEMENT             |                      |   370 | 66600 |    82   (2)
|   1 |  TABLE ACCESS BY INDEX ROWID | CUSTOMERS            |   370 | 66600 |    82   (2)
|   2 |   BITMAP CONVERSION TO ROWIDS|                      |       |
|   3 |    BITMAP AND                |                      |       |
|*  4 |     BITMAP INDEX SINGLE VALUE| CUSTOMERS_YOB_BIX    |       |
|*  5 |     BITMAP INDEX SINGLE VALUE| CUSTOMERS_GENDER_BIX |       |       |           ---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("CUST_YEAR_OF_BIRTH"=1917)
5 - access("CUST_GENDER"='M')

Here you can see both indexes being combined  to access a small subset of the customer table.

Hopefully, this has been a clear example of why a low cardinality column may not be a good bitmap index choice in isolation but when combined with other bitmap indexes can be very beneficial.

Categories: BI & Warehousing

OTN forums - 2nd upgrade attempt

Amardeep Sidhu - Wed, 2008-08-27 19:58

Well, as everybody has seen, last week they have again upgraded the OTN forums. And sadly it has been handled in a very poor manner. End result is that it is running terribly slow and throwing some errors many times. In my office, i cant even open it properly may be its too heavy or some issue with the internet. Also there has been lots of hues and cries about the new point system and all. That in my opinion is fine except that speed at which forums are running should be fine.

But Oracle itself handling an upgrade like this is really bad. It should have happened in a sweet manner. Or no money comes from OTN, so no experts out there ?

Since the upgrade, i am not feeling like posting and i think number of posts have also gone down.

Lets hope for some improvements…

Categories: BI & Warehousing

Ragged Hierarchy Handling in OBIEE

Rittman Mead Consulting - Wed, 2008-08-27 15:14

Another area I’m looking to cover in my Oracle Open World session on OBIEE data modeling is support for ragged, skip-level and unbalanced hierarchies. If you’ve worked with OBIEE for a while and you’re from an OLAP background, one of the first things you would have noticed was that OBIEE, like most relational query tools, only really supports level-based hierarchies, where for example a product hierarchy breaks down into product classes, product families and items, and every item has a family, every family has a class and so on. This works well in your typical sales-type reporting environment, but when you are looking to report on organizations, account hierarchies and so on, this can be a bit limiting, as you may well end up working with a hierarchy that looks like the following:

Org Chart

Unfortunately in this current release of OBIEE there is no out-of-the-box support for these types of hierarchies, but if you have to report on hierarchies like this there are three main ways in which you can do this.

The first and most common way of handling this sort of data is to “flatten” it out so that all routes down the hierarchy have the same number of levels. If we took the organization chart above, this would mean that you’d need to create a dimension table, that you’ll then load into OBIEE, that looks something like this (note the fact that you need to make up names for the various dimension levels that you’ve created, and that you need to “copy-down” the lowest level dimension member down to the bottom level, to ensure that each route down the hierarchy has a bottom level to roll up from).

Flattened Table

Depending on whether your hierarchy is ragged (has varying depths down the hierarchy) or skip-level (has levels missing on various routes up the hierarchy, sometimes referred to as unbalanced) you end up filling in values at various points in the table, your measures can only be recorded at the lowest level in the flattened hierarchy, and OBIEE ends up repeating data where levels are missing when you drill down the hierarchy. This route does however have the virtue of simplicity and if your hierarchy doesn’t change much (most importantly, doesn’t increase the maximum depth of the longest route down the hierarchy) you can usually get away with this.

Another method that I’ve seen used, and in fact I documented in this blog post a couple of years ago, is to an approach put together by Joe Celko where you generate a table that sets out, for each member in the dimension, the member IDs that are it’s descendants.

Descendent

The idea here is that you take the parent-child relationship in your source table and break it out into this “stack” table, where each dimension member has its “children” listed out and you can then navigate through it, in the case of the example using the report navigation feature in Oracle BI Answers, to list out the results. This approach has the advantage of allowing you to record facts against any member in the dimension, not just those at the lowest (flattened hierarchy) level, but it’s a bit tricky to set up and maintaining the stack table is quite cumbersome. If you look around the internet you’ll see other variations on this, others I’ve seen include descendent tables and a variation where each route down the hierarchy has a guaranteed lowest level but then varying levels of aggregation up the hierarchy.

The final way that I’ve seen ragged and unbalanced hierarchies handled in OBIEE is to use Essbase as a data source. Essbase (like Oracle OLAP) inherently stores its dimension hierarchies in parent-child format and I was intrigued to notice that, when you bring in an Essbase dimension into the physical layer of your semantic model, there’s an option to set the dimension to balanced, unbalanced, ragged balanced and network. Now what can this be for?

Essbase Ragged 4

Normally, if you try and import a ragged hierarchy into the OBIEE seminar model using the BI Administrator tool, the hierarchy gets rejected and only the balanced ones import correctly. This was certainly the case for SAP B/W and Microsoft Analysis Services, but since the BI Administrator tool started using the Essbase C API rather than XML/A to read in it’s OLAP metadata (from the 10.1.3.3.2 release onwards, i.e. the first production release to support Essbase) you can now actually import most kinds of Essbase hierarchies in, and you use this setting after the import to tell OBIEE what sort of hierarchy it is (this is then used by the MDX generator within the BI Server). So what happens when you import a ragged hierarchy in, such as the one in this Essbase outline?

Essbase Ragged 1

Well, what actually happens is that the import routine scans the Essbase Outline and works out the maximum depth of the hierarchy, and then generates “columns” for each of the hiearchy levels, so that you end up with a physical model that looks like this for my organization chart:

Essbase Ragged 2

and a logical model that looks like this:

Essbase Ragged 3

So what’s happened here is that, like the conversion I did earlier on with the flattened table I set up, the import routine has generated a flattened relational representation of the data for the hierarchy as it stands at the moment. If I run a report in Oracle BI Answers and take a look at the resulting hierarchy, it looks like this:

200808272250

Which isn’t a bad representation, when you think about it, although this method does have one major drawback in that you’d need to re-import your Essbase dimension into the physical layer, and regenerate the logical business model of it, if the maximum depth of your ragged hierarchy increased.

So, that’s where things are in terms of ragged hierarchy support in OBIEE. At present, it’s not something that’s fully supported and you have to jump through various hoops to create a flattened, relational version of your hierarchy. The Essbase support is interesting (especially as with other multi-dimensional data sources, you can’t actually import ragged hierarchies in at all), and it’ll be interesting to see what functionality Oracle add to OBIEE 11g around this area, especially as Answers+ is being positioned as a replacement for the ragged-hierarchy capable Hyperion Web Analysis.

Categories: BI & Warehousing

Formatting Currencies

Tim Dexter - Wed, 2008-08-27 15:14
I bet $10 that many of you have reports that have numeric content that need to be formatted according to their currency. As many of you know you can use the formats that are provided by MS Word or our... Tim Dexter
Categories: BI & Warehousing

Loungin' at Oh Oh Dub

Tim Dexter - Tue, 2008-08-26 11:49
With OpenWorld around the corner we're gearing up for demos demos demos. The demogrounds are going to be open crazy hours ... Monday 11-6pm Tuesday 9-6pm Wednesday 9-6pm Thursday 9-1.30pm Plenty of time I hope, for you to get yourselves... Tim Dexter
Categories: BI & Warehousing

Resolving Fan Traps and Circular Joins using OBIEE

Rittman Mead Consulting - Tue, 2008-08-26 11:47

If you’re looking to migrate data from a Discoverer system to Oracle Business Intelligence Enterprise Edition, one of the first things you’ll probably end up thinking about is how your (not necessarily star schema-based) End User Layer will translated to a (star schema mandatory) Common Enterprise Information Model. In particular, Discoverer allows tables to have more than one join route between them, and it also allows circular joins so that, for example, an employee can belong to a department, which belongs to a region, which has an employee as the regional manager.

Circularjoin1-2

If you try and bring this arrangement in to the Physical Layer of your OBIEE semantic model, the administration tool will initially allow this and your repository verifies as being correct.

Circularjoins3

However if you then drag and drop the tables into the logical business model layer, all three lookup tables are shown as dimension tables, as you would expect,

Circularjoins4

and the BI Administrator tool shows an error message when you try and verify the repository, saying that circular joins aren’t supported.

Circularjoins5

In reality, you don’t want three dimension tables (departments, regions, employees) joining to the fact table (appointments), you just want one, so you’re going to have to use the mapping ability in the business model and mapping layer to join these three lookup tables together under one logical table source. For the employee table though, rather than join it through this circular join, you need to create an alias for it in the physical layer (in my case I call it CIRC_JOIN_REGEMP) like this:

Circularjoin6

and join the region table to this rather than the employees table.

Circularjoin7

Then you can create a single logical dimension table within your logical business model, and map the logical table source to the join between the four lookup tables, including your new alias for the employee table.

Circularjoins8

When you add the new columns into the logical table and then run a report, the figures come up as expected, with the regional manager details displayed correctly, the alias having resolved the circular join.

Circjoin11

Another issue that often comes up with Discoverer is “Fan Traps”. Fan traps are an oddity of SQL where, if you have say a dimension table full of account details, and two facts that join to it, say with sales figures in one and budget figures in the other, logically if you query the two together you should get a set of results like this, taken from the Oracle document “Oracle Discoverer’s Fan Trap Resolution - Correct Results Every Time”

Fantrap1

Fantrapexpectedres

The problem is though, that if you issue the obvious query in SQL, the results come back wrong, because SQL does a cartesian join between the two fact tables and over-counts the figures.

SQL> select a.acc_name,
2  sum(sales),
3  sum(budget)
4  from fan_trap_accounts a, fan_trap_sales s, fan_trap_budget b
5  where a.acc_id = s.acc_id
6  and   a.acc_id = b.acc_id
7  group by a.acc_name
8  order by a.acc_name
9  /

ACC_NAME             SUM(SALES) SUM(BUDGET)
-------------------- ---------- -----------
ACCOUNT1                    900        1050
ACCOUNT2                    130         200
ACCOUNT3                    600         750
ACCOUNT4                    600         600

Now Discoverer handles this quite well, and automatically rewrites the query internally to use two in-line views, one to join the sales and accounts tables, and one to join the budget and accounts tables, and then join the results together with the correct aggregation applied. So how does OBIEE handle this? Does it ignore the potential error and give you the wrong figures, or does it even disallow this sort of schema design in the first place? Let’s take a look.

The BI Administrator tool doesn’t have a problem with this sort of arrangement in the physical layer:

Fantrap2

The logical business model is fine as well, with the sales and budget tables being displayed as fact tables, and the accounts table as a dimension, as you would expect.

Fantrap3

In fact, when you run a query, using the same selection as our SQL statement, the figures do in fact come out correct.

Fantrap4-1

So what’s happening? Well, if you take a look at the query log and see how the logical, and physical, SQL queries are put together, you can see that OBIEE is doing what Discoverer does, but within the BI Server rather than writing a single SQL statement that uses inline views. The BI Server effectively creates two inline views that in turn lead to two physical SQL statements being sent to Oracle, the BI Server then joins them together and returns the data back as a single result set.

-------------------- Execution plan:

RqBreakFilter <<2750>>[1] [for database 0:0,0]
    RqList <<2641>> [for database 0:0,0]
        case  when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end  as c1 GB [for database 3023:5710,44],
        D903.c3 as c2 GB [for database 3023:5710,44],
        D903.c4 as c3 GB [for database 3023:5710,44]
    Child Nodes (RqJoinSpec): <<2752>> [for database 0:0,0]
        (
            RqList <<2795>> [for database 0:0,0]
                D902.c1 as c1 [for database 3023:5710,44],
                D901.c1 as c2 [for database 3023:5710,44],
                D901.c2 as c3 GB [for database 3023:5710,44],
                D902.c2 as c4 GB [for database 3023:5710,44]
            Child Nodes (RqJoinSpec): <<2798>> [for database 0:0,0]

                    (
                        RqList <<2653>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS.ACC_NAME as c1 GB [for database 3023:5710,44],
                            sum(FAN_TRAP_SALES.SALES by [ FAN_TRAP_ACCOUNTS.ACC_NAME] ) as c2 GB [for database 3023:5710,44]
                        Child Nodes (RqJoinSpec): <<2686>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS T6678
                            FAN_TRAP_SALES T6685
                        DetailFilter: FAN_TRAP_ACCOUNTS.ACC_ID = FAN_TRAP_SALES.ACC_ID [for database 0:0]
                        GroupBy: [ FAN_TRAP_ACCOUNTS.ACC_NAME]  [for database 3023:5710,44]
                        OrderBy: c1 asc [for database 3023:5710,44]
                    ) as D901 FullOuterStitchJoin <<2744>> On D901.c1 =NullsEqual D902.c1; actual join vectors:  [ 0 ] =  [ 0 ]

                    (
                        RqList <<2690>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS.ACC_NAME as c1 GB [for database 3023:5710,44],
                            sum(FAN_TRAP_BUDGET.BUDGET by [ FAN_TRAP_ACCOUNTS.ACC_NAME] ) as c2 GB [for database 3023:5710,44]
                        Child Nodes (RqJoinSpec): <<2723>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS T6678
                            FAN_TRAP_BUDGET T6681
                        DetailFilter: FAN_TRAP_ACCOUNTS.ACC_ID = FAN_TRAP_BUDGET.ACC_ID [for database 0:0]
                        GroupBy: [ FAN_TRAP_ACCOUNTS.ACC_NAME]  [for database 3023:5710,44]
                        OrderBy: c1 asc [for database 3023:5710,44]
                    ) as D902
        ) as D903
    OrderBy: c1 asc [for database 0:0,0]

+++Administrator:2a0000:2a0001:----2008/08/26 17:24:48

-------------------- Sending query to database named ora11g (id: <<2653>>):

select T6678.ACC_NAME as c1,
     sum(T6685.SALES) as c2
from
     CUST_ORDER_HISTORY.FAN_TRAP_ACCOUNTS T6678,
     CUST_ORDER_HISTORY.FAN_TRAP_SALES T6685
where  ( T6678.ACC_ID = T6685.ACC_ID )
group by T6678.ACC_NAME
order by c1

+++Administrator:2a0000:2a0001:----2008/08/26 17:24:48

-------------------- Sending query to database named ora11g (id: <<2690>>):

select T6678.ACC_NAME as c1,
     sum(T6681.BUDGET) as c2
from
     CUST_ORDER_HISTORY.FAN_TRAP_ACCOUNTS T6678,
     CUST_ORDER_HISTORY.FAN_TRAP_BUDGET T6681
where  ( T6678.ACC_ID = T6681.ACC_ID )
group by T6678.ACC_NAME
order by c1

Now this strictly speaking isn’t as efficient as Discoverer’s approach, which uses a single SQL statement and does the join of the two inline views within the database, but it’s still pretty clever and would work for platforms that don’t support inline views. In a way its an “on the fly” creation of two logical table sources without you having to set this up explicitly in the logical business model.

One thing I haven’t tried out yet is how OBIEE handles queries that, in Discoverer, would cause a fan-trap detection error, i.e. queries that it can’t rewrite to use two or more inline views. I suspect that these sorts of scenarios, that usually involve facts of differing granularity, where circular joins exist or where the keys to the tables don’t correspond would get picked up by the general data modeling rules in OBIEE, but if anyone’s got any experiences with this then by all means leave a comment.

If you’re wondering what prompted this posting, I’m putting the finishing touches to my Oracle Open World presentation on OBIEE data modeling, and looking to cover off a few data modeling “oddities”. I’ll try and cover support for ragged/unbalanced/value-based hierarchies in a posting tomorrow, and finish up with how the logical business model supports the Kimball concept of Bridge Tables, but no-one seems to use them.

Categories: BI & Warehousing

Tuning 11g Data Warehouse Schemas using the SQL Access Advisor

Rittman Mead Consulting - Mon, 2008-08-25 14:05

One of the new features in Oracle Database 11g that I’ve been meaning to try out for a while now is the SQL Access Advisor, part of the 11g Tuning Pack. Given that one of Oracle’s key product directions is to make administering the database as easy as, in the words of Dilbert, “one big button, and we press it for you in the factory” I thought It’d be interesting to see how far this automated tuning of database schemas has come, especially seeing as this latest 11g release of the feature now tunes and recommends indexes, materialized views and partitioning schemes.

To create a substantial enough workload with potential for improvement, I took a copy of the SH schema, removed all the constraints, indexes, materialized views and partitions, copied the SALES table into itself a few times so that it now had 147m rows, and then gathered statistics on the schema. I then when in to Oracle Business Intelligence Enterprise Edition and put together a selection of queries that aggregated data, filtered data on dimension attributes, time periods and so forth, and then copied the generated SQL statements into an SQL script so that I could run them on demand.

I’m now at the point where I’ve got some sample data and a set of SELECT statements that I want to improve the performance of. Taking a look at the 11g DBConsole application, there are a number of places where you can start to investigate the performance of your system. Starting at the front page, you can see a section on the ADDM Performance Analysis that shows, amongst other things, the top SQL statements by DB Time (response time) and by Disk I/O.

Advisor1

Selecting any of these SQL statements will present you with the option to run the SQL Tuning Advisor against them, which presents you with the original SQL statement, it’s execution plan, and a selection of alternatives with their execution plan. For now though, I’m more interested in what recommendations the SQL Access Advisor makes for my cloned schema, which has no indexes, materialized views, constraints or anything defined within it.

To perform the exercise I’m going to use another feature of the 11g DB Console, the ability to create SQL Tuning sets using a point and click interface, to capture a workload based on the SQL statements used by my queries. To do this I navigate to the Performance Screen and then the SQL Tuning Sets screen, and create a new SQL Tuning Set to hold the sample workload.

Advisor2

I then tell DBConsole to capture SQL statements for 10 minutes.

Advisor3

The SQL captured will be limited to the SH_COPY schema, so that I miss out all the background SQL generated from the database and from other sessions.

Advisor4

I then let the tuning set job run and execute my script from an SQL*Plus session to generate the workload. After 10 minutes, I return to the DBConsole job and check that a suitable number of long-running SQL statements have been captured in the SQL Tuning Set.

Advisor5

Taking a look at the buffer gets and elapsed time for the queries, there should be something to work with here.

Advisor6

So now that I’ve got an SQL Tuning set, the next step is to navigate to the SQL Performance Analyzer, which you can reach from the Advisor Central page on DBConsole.

Advisor7

Now the idea with the SQL Performance Analyzer is that you can take a given workload (i.e. an SQL Tuning Set), run it to provide a baseline response time, then apply some changes (change database initialization parameters, create some indexes, rework the SQL statement) and then replay the same workload to assess the effect of the changes. It’s nothing different to doing this all yourself but I thought I’d give this a try, to try and keep all of the process within the Enterprise Manager website.

I click on the Guided Workflow link and am presented with the following workflow steps:

Advisor8

So the first step then is define the SQL Performance Analyzer Task, and select the SQL Tuning Set I just created to replay as my workload.

Advisor9-1

I now execute step two of the SQL Performance Analyzer workflow to replay the SQL Tuning Set to establish a baseline, before I run the first iteration of the SQL Access Advisor to recommend new indexes. Once this is complete, I navigate to the SQL Access Advisor and begin the process of using it to recommend indexes for my schema.

The first step in this process is to either choose to validate existing indexes, MVs and so on in a schema (this would be useful for the original SH schema), or in my case, to recommend some new ones.

Advisor10

Once you select your advisor option, you can either base the recommendations on an SQL Tuning Set as I’m doing, whatever’s in the SQL cache at the time (filtered by parsing schema, elapsed time and so on) or on the structure of the tables and dimension objects in one or more schemas. From my perspective it’s better to create an SQL Tuning Set as you’ve got a defined, repeatable set of SQL statements then that you can use in different places, such as the SQL Performance Analyzer that I’m using to frame this tuning exercise.

Advisor11

On the next page of the wizard, I can select the schema structures that I want the advisor to recommend. In this instance I just select indexes on their own, although you can select all three - materialized views and partitions as well as indexes - if you want to recommend all of them at once.

Advisor12

“Advanced” options allow you to tell the advisor to consider or disregard the cost of creating the indexes when it makes recommendations, whether to include recommendations around dropping unused indexes, and to place limits on the size of indexes, materialized views etc that it can recommend, and whether to only consider queries which makes sense of course in a data warehouse environment.

Advisor13

Then it’s a case of telling DB Console when to run the advisor job - now, in my case - and then review the settings, before pressing the Submit button to run the analysis.

Advisor14

After waiting a few seconds, the analysis was ready. From the analysis overview, there are six index recommendations that bring down the overall workload marginally, which is about what I expected as most of the work being done by the OBIEE queries was summarization, not filtering.

Advisor15

Taking a look at the recommendation details, you can see more details on the individual index recommendations. Notice the “Action Type” colour blobs - when you run the advisor asking it to recommend materialized views and partitioning strategies as well, you’ll get additional colour blobs in there as well, depending on what objects have been recommended.

Advisor16

Looking at the top recommendation in more detail, I can see that it’s recommended three B-tree indexes on the channels, products and customers tables, in order to improve the performance of one of the SQL statements.

Advisor17

Recommending B-tree over Bitmap index is to me surprising, as typically in a data warehouse environment you’ll want to use bitmap indexes when working with dimension attributes, firstly because they’re typically smaller and faster to access (jn combination with other bitmap indexes) than B-tree indexes, and secondly because they’re required for star transformations, although I have seen Oracle transform B-tree indexes to bitmap indexes on-the-fly so that star transformations can take place.

Running the SQL Access Advisor again, this time selecting Materialized Views as the recommendation type, generated a set of recommendations with a much bigger potential performance gain. Notice the colour blobs are now for materialized views, mv logs and “others”, and that the expected performance gain is much bigger.

Advisor20

Taking a look at the top recommendation, it’s for four materialized view logs, a fast refresh materialized view with query rewrite enabled, and a statistics gathering job to gather stats on the new materialized view. I was wondering if the “other” would be a dimension recommendation (as in CREATE DIMENSION) but it looks like this version of the SQL Access Advisor doesn’t cover these yet.

Advisor21

Running the advisor a third time, this time for partitioning, brings up a set of recommendations that on the face of it look like some good timesavers.

Advisor22

However the actual recommendations themselves are a bit strange. The first one recommends hash partitioning the Sales fact table and the Times dimension table into 32 buckets, when I would have thought the more obvious recommendation would be to range partition the fact table on its date column, do the same to the time dimension and thereby permit partition-wise joins between the two.

Advisor23

Of course this does bring up the other question as to how to partition a fact table when you have more than one date within it, say load date and transaction date, given that one will favour the ETL process (potentially allowing partition exchange loading) and one will favour queries (partition pruning), but I’m struggling to understand why it would recommend hash partitioning a fact table with a date datatype as one of the table columns.

The second recommendation was equally strange, with the Sales fact table again recommended for hash partitioning and the customers dimension table range partitioned, using the INTERVAL (automatic creation of new partitions as needed) feature.

Advisor24

I wonder if this lack of awareness of data warehousing standard practices is down to their being no constraints defined within the schema, therefore the SQL Access Advisor doesn’t really know the relationship between the various tables? Once I’ve run the recommendations I’ll add some constraints and see how the recommendations come out then.

I now go back and run the Advisor one more time, this time turning on all the options to recommend indexes, materialized views and partitions, so that I can make a selection between the various recommendations, assuming as I do that you’re not meant to implement all of them (the partitioning one above illustrates this point, as it would partition the Sales fact table twice).

Looking at the projected performance improvement now, it’s certainly (if it works) not something to be sniffed at.

Advisor25

The projected cost of the various SQL queries in the workload has gone down from 565853 to 212, with most statements gaining a 100% query execution time improvement, and however you might query the advice it certainly does what it says on the tin, at least as far as the statements in my workload are concerned.

Looking at the total set of recommendations in detail, they all include elements of index creation, materialized view creation and partitioning changes, I also note that they’re all selected which does seem to suggest that they should all be implemented together (or at least that it’s alright to do so).

Advisor27-1

So, I leave all the changes selected and press the “Schedule Implementation” button, and then notice that (a) it won’t schedule implementations that involve partitioning changes, you have to save the script and manually run it yourself, and (b) the materialized views and indexes that the script creates are being created in the SYSTEM account, not the account that the objects are in (SH_COPY), so I’ll need to edit the script a bit so they’re created in the correct place. I copy the advisor SQL output to Notepad, do a search and replace on SYSTEM, changing all values to SH_COPY, and then run the script from SQL*Plus. Once I did this (the script took about 10 minutes to run), I then returned to the SQL Performance Advisor and ran my second replay of the sample workload, followed by the comparison of the two workloads.

Advisor28

So, the $64,000 question - did the performance of the queries improve? Well, according to the SQL Performance Analyzer output, it improved by an overall 63%, with some queries gaining and some queries regressing. In defense of the results, a couple of statements failed to execute in the recommendations script because of formatting errors, but even so, it’s not a bad result, albeit with a few weird partitioning recommendations.

Advisor29

Of course this is still a bit simplistic, as firstly I only gave the advisor a very small set of queries to advise against, and secondly the SH_COPY schema only has two fact tables and a bunch of dimension tables. Also, of course, none of this takes into account the ETL process which will make its own demands on the schema, its indexes, summaries and partitioning, and you’ll most definately need to bear this in mind before changing around the performance structures in your warehouse (which is why, of course, you’re still best contacting an experienced data warehouse implementor before going ahead with any of the bits of advice the SQL Access Advisor comes up with). Still, it’s not bad as a way of automating much of the analysis that goes into tuning the structures in your warehouse, and I can certainly see a point in time in the future when the SQL access advisor takes into account a SQL Tuning Set to represent your ETL process, and recommends additional structures such as cube organized materialized views and index and storage variations such as compression and compressed b-tree indexes as appropriate.

So finally, what about the issue around hash partitions, and the B-tree indexes on the dimension tables. Will adding constraints to the schema lead to more sensible (in the context of data warehousing) recommendations? I then went on to add constraints to the sample schema, generate a new SQL Tuning Set and run the SQL Access Advisor again, and the result was - the same: hash partitioning was recommended for the fact table, B-tree indexes for the dimensions, which does seem to suggest that the SQL Access Advisor isn’t too “data warehouse-aware”, although my test case was limited and there could well be something I’m missing. Still, I’ll be interested to see how this feature evolves as the various patches to 11gR1 and 11gR2 are released, it’s certainly got the potential to be an interesting feature.

Categories: BI & Warehousing

Line Charts

Tim Dexter - Mon, 2008-08-25 10:09
Hot chart of the week this week is line charts - not just your single line oh no. Folks want multiple lines! Its not that tough and with the new Template Builder for Word (10.1.3.4) out its even easier -... Tim Dexter
Categories: BI & Warehousing

Competent Shapes

Tim Dexter - Fri, 2008-08-22 09:08
More on using shapes in your RTF templates and outputs. I think I have mentioned before that the shape support is very cool but finding real world applications for them is a little tougher. Im pleased to see a few... Tim Dexter
Categories: BI & Warehousing

Data Mining in Action: Oracle Sales Prospector

Marcos Campos - Fri, 2008-08-22 07:10
I firmly believe that a major trend in applications is the incorporation of analytic-enabled functionality. Users want more than just reports or a replay of the past. Users want to have insights and their attention directed to key points. This is where analytics can make a big impact across all types of applications. Notice that I am not proposing exposing analytical capabilities (e.g., data Marcoshttp://www.blogger.com/profile/14756167848125664628noreply@blogger.com
Categories: BI & Warehousing

Last Position

Tim Dexter - Thu, 2008-08-21 07:06
I have seen quite a few templates recently that are using our updatable variables to keep a track of the current record being processed. Regular readers will know my dislike for the updatable variables. Don't get me wrong, they have... Tim Dexter
Categories: BI & Warehousing

Calculate Total Possible Score for Wii Bowling, Power Throws

Chet Justice - Wed, 2008-08-20 20:04
Needless to say I have to much time on my hands. I've become obsessed with two Wii games, Bowling and Tanks. I play over and over. I had a perfect game in bowling the other day which wasn't quite as exciting as I thought it would be.

I've also been playing the "Training" game of Power Throws (bowling). Your given 10 shots with another row added each frame. You start out with 10, then 15, then 21, etc. If you get all the pins, you get a bonue (total pins * 2). So far, my highest score has been 651.

I've never bothered to figure out what the total possible score is though.

SQL to the rescue.

Using analytics and the CONNECT BY LEVEL option (10g and higher), it's fairly easy:

SELECT
pin_row,
pins,
running_total,
rt_with_bonus
FROM
(
SELECT
rownum pin_row,
rownum pins,
SUM( rownum ) OVER ( ORDER BY rownum ) running_total,
( SUM( rownum ) OVER ( ORDER BY rownum ) * 2 ) rt_with_bonus
FROM dual
CONNECT BY LEVEL <= 13
)
WHERE pins >= 4
I had to set the filter on the outer query because you start with 4 rows (10 pins). And the results:

CJUSTICE@XE>BREAK ON REPORT
CJUSTICE@XE>COMPUTE SUM OF RUNNING_TOTAL ON REPOR
CJUSTICE@XE>COMPUTE SUM OF RT_WITH_BONUS ON REPOR
CJUSTICE@XE>/

PIN_ROW PINS RUNNING_TOTAL RT_WITH_BONUS
---------- ---------- ------------- -------------
4 4 10 20
5 5 15 30
6 6 21 42
7 7 28 56
8 8 36 72
9 9 45 90
10 10 55 110
11 11 66 132
12 12 78 156
13 13 91 182
------------- -------------
sum 445 890
Voila! 890 is the best possible score if I strike every frame. I did manage to pass my high score by 9 pins tonight as well. It's a great day!

So how come no one wants to hire a guy that can do this kind of fun stuff with SQL? ;)
Categories: BI & Warehousing

Lotus notes and sametime

Amardeep Sidhu - Wed, 2008-08-20 11:30

My company uses Lotus Notes for email and Sametime connect as messenger for all the internal communication. Both are stupid applications and are a big resource hogs. Most of the systems are P IV with 512 MB RAM. You just run Lotus and Sametime 7 and it eats up everything. The system moves like a 386 based machine.

Moreover, i was looking for Lotus short cuts today and found that there is no short cut in Lotus for Send/Receive mail.

At our client’s site, i have seen IBM people having kept an exe killnotes.exe on their desktop which they use to quickly exit from Lotus. What bullshit ?

Outlook/Outlook Express rocks, seriously !

Categories: BI & Warehousing

iPhone in India

Amardeep Sidhu - Wed, 2008-08-20 11:22

Ok, so all the wait is over and they have announced the iPhone prices in India. The 8 GB one will be for INR 31000 and 16 GB for 36100. What the hell ? In US its being sold for $200 which comes up to be 9000 INR at the max. And here more than 3 times. Why ? They are just trying to screw people with this iPhone tag.

Anyways, Nokia’s capture in Indian market is going to be impacted terribly. They are already into the launch of N96. Lets see how things go on.

BTW by pricing iPhone at 31k, they just lost one customer. I am not buying it ;)

Categories: BI & Warehousing

Customizing Publisher

Tim Dexter - Tue, 2008-08-19 16:52
I have had requests over the last few months asking how one might customize the look and feel of Publisher. I started to delve into the css and html files but its just too time consuming and a mine field... Tim Dexter
Categories: BI & Warehousing