BI & Warehousing
Whaddya Want?
The Mystery of OBIEE Bridge Tables
If you’ve played around with the Oracle BI Administrator tool for a while, you may have noticed a box you can tick in the Logical Table Properties dialog, called “Bridge Table”. If you’re familiar with Ralph Kimball and some of this dimensional modeling ideas, you’ve probably heard of this concept before, but it’s not all that clear how you use bridge tables in OBIEE and a quick search around the internet and the OTN forums doesn’t really come up with an examples on how it’s used.

Bridge tables are a solution to what’s called the “multi-valued dimension” problem. For most dimensional models you generally want to link one sale, for example, to one product, one customer, one time period and so on, and this translates into a simple dimensional model where your fact table contains a single key value per dimension for each row that’s been stored. In some circumstances though, say where you are recording the diagnoses for a patient or the claim elements in a claim, you might need to record more than one key value for a particular dimension in each fact table row. In entity relationship modeling terms, you’ve got a many-to-many relationship between patient admissions and diagnoses, like this:

and the usual way you resolve these many to many relationships is to use an intersection table, with the key from the patient admissions table and the key from the diagnoses table copied across to form the intersection, and usually with a weighting column that adds up to 1, so that you can properly add up all the diagnoses and not over-count them.

As I said, this is not exactly new stuff and bridge tables, of which diagnosis group above is one of them, are a fairly common dimensional modeling construct. The problem you hit though when starting to use this feature is that the documentation on it is pretty minimal, and only really talks about setting this feature on the bridge table itself and doesn’t really mention what to do with the dimension table that hangs off of it. What I’ll do in this posting then is set out how I use it, explain my rationale and thereafter invite some feedback, so if someone else has come up with a better idea then we can work with that instead.
Going in to BI Administrator and looking at the physical model for the data set above, it looks like this:

with the key thing here being that the fact table we’ve got the bridge table joining to both the fact and diagnosis dim tables to form an intersection. If you imported this model into the logical business layer as is, the BI Administrator would think the intersection table is the fact table as the other tables join to it.

So what you do now is go into the properties for the diagnosis table and indicate that it’s a bridge table, like this:

Now when you look at the logical model the fact table is identified correctly.

If you try and validate the model now though, you get a warning because the diagnoses dimension table doesn’t link through to the fact table, as it goes through the bridge table instead.

Now not linking through to the fact table is sometimes allowed, basically in situations where you’ve snowflaked your logical model and the dimension table is actually a higher level in the same dimension, but in this case if we try and use this model and bring in the diagnoses information into a query, we’ll get a metadata consistency error.
To solve this, what I would do is remove the diagnosis dimension from the logical model, and instead add it to the logical table source for the bridge table, like this:

Then I’d add any columns that I needed from the diagnosis dimension physical table into the bridge table, which I can do now as I’ve added the dimension table to the bridge table LTS, so that this bridge table now becomes my diagnosis dimension, like this:

So what I’ve done here is take the dimension table, which before linked to the bridge table and thereafter caused the metadata inconsistency, and instead added it to the bridge table logical table source and added it’s columns to the bridge table logical table.
If I run a report now, the data comes out as you would expect, with multiple diagnoses per patient and the weighting applied correctly.

However I’m conscious that to me, getting rid of the logical dimension table seems a bit wrong, and others have said that they don’t use the bridge table feature at all, and instead just combine the various tables into a single logical table source. If anyone’s got any other way of using bridge tables, add a comment to this post and we’ll see what the consensus is.
Jobs: ETL Developer
Position Description:
· Develop and implement applications that support the business strategy of measuring and optimizing performance
· Support existing data warehouse applications and users
Mandatory Skills/Experience
· 2+ years experience working on data warehouse/data integration solutions
· 2+ years working with vendor ETL tools to design/build/test (Business Objects Data Integrator XIR2)
· 1+ years working with Oracle database; good PL/SQL knowledge
· Experience with dimensional data models (star schemas)
Preferred Skills/Experience
· Java programming experience
· Data Quality XIR2
· SQL Server 2005/SSIS
· Business Objects Enterprise XIR2
· Crystal Reports XIR2
XSLT Extensions
Committed to Essbase 11.1.1
I have had some other things going on as well. I was interviewed by my local newspaper for the front page of the business section. The topic was businesspeople who fly their own airplanes on business which I do nearly every trip east of the Rockies. In fact, I flew myself and my family to New Orleans for the ODTUG Kaleidoscope conference. If you are interested in reading about my '15 minutes of fame', here is a link to the story: http://www.al.com/huntsvilletimes/stories/index.ssf?/base/business/1218993308126870.xml&coll=1
Now, on to Essbase stuff..
I haven't been writing much about Essbase 11.1.1 as I haven't had it installed on my laptop. Instead, I had one of my assistants create a VM with Essbase 11.1.1 on it.. Well, despite the fact that I have a laptop with 4 Gb of RAM, of which only 3 Gb is usable as I have XP Professional for my OS, VMWare ran very sluggishly on my system. Perhaps it is because I also had the entire Essbase stack, Oracle 11g Enterprise and SQL Server 2005 all running in addition to VMWare that made it slow, but still it was painful to turn all of that stuff off before firing up the VMWare image. Thus, I tended not to run it much and, in fact, ran it just enough to make sure Dodeca ran on 11.1.1.
Have you ever heard the old story about commitment? It goes something like 'a chicken and a pig were discussing their importance to breakfast and the chicken claimed it was more important as it contributed eggs. The pig replied that it made more of a commitment to breakfast when it provided bacon..' Well, translating that to my Essbase situation, I have committed to working with Essbase 11.1.1. Last night, I completely removed Essbase 9.3.1 from my system and will install 11.1.1 today. How is that for commitment?
Bitmap Index Examples
Between assignments (mostly, OBIEE and BI Applications work) and a family holiday in the Austrian Alps I have been helping Mark with some of the examples he will be demonstrating as part of his 11g Data Warehousing Seminar. I must confess that I, too, will be using the examples in some of my data warehouse courses.
One of the demonstrations I have worked on for is the use of BITMAP indexes in simple (single table) queries. For this demonstration I used the standard SH schema and a couple of the provided bitmap indexes on the CUSTOMERS table; that is the GENDER (CUSTOMERS_GENDER_BIX) and YEAR OF BIRTH (CUSTOMERS_YOB_BIX) indexes.
Firstly, let’s look at the data distributions:
select cust_gender, count(*) from customers group by cust_gender; C COUNT(*) - ---------- F 18325 M 37175
There are about 66% M customers and 33% F, and no customers of any other (or NULL) gender.
A similar query shows us that there are 75 distinct years of birth (between 1913 and 1990) and for 1917 there are 37 customers
So what happens when we query the customers table on gender = M
SQL> explain plan for select * from customers where cust_gender = 'M';
select * from table(dbms_xplan.display);
Explained.
SQL>
PLAN_TABLE_OUTPUT
-------------------------------Plan hash value: 2008213504
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 27750 | 4877K| 407 (1)
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 27750 | 4877K| 407 (1)
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CUST_GENDER"='M')
We do a full table scan and do not use the index.
But when we choose a more selective query - the people born in 1917 (which hits about 0.07% of the table) we see this:
SQL> explain plan for select * from customers where CUST_YEAR_OF_BIRTH = 1917; select * from table(dbms_xplan.display); Explained. SQL> PLAN_TABLE_OUTPUT --------------------------Plan hash value: 3460183038 --------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) --------------------------------------------------- | 0 | SELECT STATEMENT | | 740 | 130K| 146 (0) | 1 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS | 740 | 130K| 146 (0) | 2 | BITMAP CONVERSION TO ROWIDS| | | | |* 3 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_YOB_BIX | | | --------------------------------------------------- Predicate Information (identified by operation id): ————————————————— 3 - access(”CUST_YEAR_OF_BIRTH”=1917)
This time we are using the bitmap index.
So, is there any merit to using a bitmap index on a column with just two distinct values? After all, we do less work by table scanning the whole table.
The answer to this is yes. One of the great things about bitmap indexes is that they can be combined with other bitmap indexes on the same table by a bitwise ‘AND’ operation; that is we effectively create a multi-column index on-the-fly
SQL> explain plan for select * from customers where
CUST_YEAR_OF_BIRTH = 1917 and cust_gender = 'M';
select * from table(dbms_xplan.display);
Explained.
SQL>
PLAN_TABLE_OUTPUT
---------------------------------------------------Plan hash value: 4226934075
---------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------
| 0 | SELECT STATEMENT | | 370 | 66600 | 82 (2)
| 1 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS | 370 | 66600 | 82 (2)
| 2 | BITMAP CONVERSION TO ROWIDS| | |
| 3 | BITMAP AND | | |
|* 4 | BITMAP INDEX SINGLE VALUE| CUSTOMERS_YOB_BIX | |
|* 5 | BITMAP INDEX SINGLE VALUE| CUSTOMERS_GENDER_BIX | | | ---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUST_YEAR_OF_BIRTH"=1917)
5 - access("CUST_GENDER"='M')
Here you can see both indexes being combined to access a small subset of the customer table.
Hopefully, this has been a clear example of why a low cardinality column may not be a good bitmap index choice in isolation but when combined with other bitmap indexes can be very beneficial.
OTN forums - 2nd upgrade attempt
Well, as everybody has seen, last week they have again upgraded the OTN forums. And sadly it has been handled in a very poor manner. End result is that it is running terribly slow and throwing some errors many times. In my office, i cant even open it properly may be its too heavy or some issue with the internet. Also there has been lots of hues and cries about the new point system and all. That in my opinion is fine except that speed at which forums are running should be fine.
But Oracle itself handling an upgrade like this is really bad. It should have happened in a sweet manner. Or no money comes from OTN, so no experts out there ?
Since the upgrade, i am not feeling like posting and i think number of posts have also gone down.
Lets hope for some improvements…
Ragged Hierarchy Handling in OBIEE
Another area I’m looking to cover in my Oracle Open World session on OBIEE data modeling is support for ragged, skip-level and unbalanced hierarchies. If you’ve worked with OBIEE for a while and you’re from an OLAP background, one of the first things you would have noticed was that OBIEE, like most relational query tools, only really supports level-based hierarchies, where for example a product hierarchy breaks down into product classes, product families and items, and every item has a family, every family has a class and so on. This works well in your typical sales-type reporting environment, but when you are looking to report on organizations, account hierarchies and so on, this can be a bit limiting, as you may well end up working with a hierarchy that looks like the following:

Unfortunately in this current release of OBIEE there is no out-of-the-box support for these types of hierarchies, but if you have to report on hierarchies like this there are three main ways in which you can do this.
The first and most common way of handling this sort of data is to “flatten” it out so that all routes down the hierarchy have the same number of levels. If we took the organization chart above, this would mean that you’d need to create a dimension table, that you’ll then load into OBIEE, that looks something like this (note the fact that you need to make up names for the various dimension levels that you’ve created, and that you need to “copy-down” the lowest level dimension member down to the bottom level, to ensure that each route down the hierarchy has a bottom level to roll up from).

Depending on whether your hierarchy is ragged (has varying depths down the hierarchy) or skip-level (has levels missing on various routes up the hierarchy, sometimes referred to as unbalanced) you end up filling in values at various points in the table, your measures can only be recorded at the lowest level in the flattened hierarchy, and OBIEE ends up repeating data where levels are missing when you drill down the hierarchy. This route does however have the virtue of simplicity and if your hierarchy doesn’t change much (most importantly, doesn’t increase the maximum depth of the longest route down the hierarchy) you can usually get away with this.
Another method that I’ve seen used, and in fact I documented in this blog post a couple of years ago, is to an approach put together by Joe Celko where you generate a table that sets out, for each member in the dimension, the member IDs that are it’s descendants.

The idea here is that you take the parent-child relationship in your source table and break it out into this “stack” table, where each dimension member has its “children” listed out and you can then navigate through it, in the case of the example using the report navigation feature in Oracle BI Answers, to list out the results. This approach has the advantage of allowing you to record facts against any member in the dimension, not just those at the lowest (flattened hierarchy) level, but it’s a bit tricky to set up and maintaining the stack table is quite cumbersome. If you look around the internet you’ll see other variations on this, others I’ve seen include descendent tables and a variation where each route down the hierarchy has a guaranteed lowest level but then varying levels of aggregation up the hierarchy.
The final way that I’ve seen ragged and unbalanced hierarchies handled in OBIEE is to use Essbase as a data source. Essbase (like Oracle OLAP) inherently stores its dimension hierarchies in parent-child format and I was intrigued to notice that, when you bring in an Essbase dimension into the physical layer of your semantic model, there’s an option to set the dimension to balanced, unbalanced, ragged balanced and network. Now what can this be for?

Normally, if you try and import a ragged hierarchy into the OBIEE seminar model using the BI Administrator tool, the hierarchy gets rejected and only the balanced ones import correctly. This was certainly the case for SAP B/W and Microsoft Analysis Services, but since the BI Administrator tool started using the Essbase C API rather than XML/A to read in it’s OLAP metadata (from the 10.1.3.3.2 release onwards, i.e. the first production release to support Essbase) you can now actually import most kinds of Essbase hierarchies in, and you use this setting after the import to tell OBIEE what sort of hierarchy it is (this is then used by the MDX generator within the BI Server). So what happens when you import a ragged hierarchy in, such as the one in this Essbase outline?

Well, what actually happens is that the import routine scans the Essbase Outline and works out the maximum depth of the hierarchy, and then generates “columns” for each of the hiearchy levels, so that you end up with a physical model that looks like this for my organization chart:

and a logical model that looks like this:

So what’s happened here is that, like the conversion I did earlier on with the flattened table I set up, the import routine has generated a flattened relational representation of the data for the hierarchy as it stands at the moment. If I run a report in Oracle BI Answers and take a look at the resulting hierarchy, it looks like this:

Which isn’t a bad representation, when you think about it, although this method does have one major drawback in that you’d need to re-import your Essbase dimension into the physical layer, and regenerate the logical business model of it, if the maximum depth of your ragged hierarchy increased.
So, that’s where things are in terms of ragged hierarchy support in OBIEE. At present, it’s not something that’s fully supported and you have to jump through various hoops to create a flattened, relational version of your hierarchy. The Essbase support is interesting (especially as with other multi-dimensional data sources, you can’t actually import ragged hierarchies in at all), and it’ll be interesting to see what functionality Oracle add to OBIEE 11g around this area, especially as Answers+ is being positioned as a replacement for the ragged-hierarchy capable Hyperion Web Analysis.
Formatting Currencies
Loungin' at Oh Oh Dub
Resolving Fan Traps and Circular Joins using OBIEE
If you’re looking to migrate data from a Discoverer system to Oracle Business Intelligence Enterprise Edition, one of the first things you’ll probably end up thinking about is how your (not necessarily star schema-based) End User Layer will translated to a (star schema mandatory) Common Enterprise Information Model. In particular, Discoverer allows tables to have more than one join route between them, and it also allows circular joins so that, for example, an employee can belong to a department, which belongs to a region, which has an employee as the regional manager.
If you try and bring this arrangement in to the Physical Layer of your OBIEE semantic model, the administration tool will initially allow this and your repository verifies as being correct.

However if you then drag and drop the tables into the logical business model layer, all three lookup tables are shown as dimension tables, as you would expect,

and the BI Administrator tool shows an error message when you try and verify the repository, saying that circular joins aren’t supported.

In reality, you don’t want three dimension tables (departments, regions, employees) joining to the fact table (appointments), you just want one, so you’re going to have to use the mapping ability in the business model and mapping layer to join these three lookup tables together under one logical table source. For the employee table though, rather than join it through this circular join, you need to create an alias for it in the physical layer (in my case I call it CIRC_JOIN_REGEMP) like this:

and join the region table to this rather than the employees table.

Then you can create a single logical dimension table within your logical business model, and map the logical table source to the join between the four lookup tables, including your new alias for the employee table.

When you add the new columns into the logical table and then run a report, the figures come up as expected, with the regional manager details displayed correctly, the alias having resolved the circular join.

Another issue that often comes up with Discoverer is “Fan Traps”. Fan traps are an oddity of SQL where, if you have say a dimension table full of account details, and two facts that join to it, say with sales figures in one and budget figures in the other, logically if you query the two together you should get a set of results like this, taken from the Oracle document “Oracle Discoverer’s Fan Trap Resolution - Correct Results Every Time”


The problem is though, that if you issue the obvious query in SQL, the results come back wrong, because SQL does a cartesian join between the two fact tables and over-counts the figures.
SQL> select a.acc_name, 2 sum(sales), 3 sum(budget) 4 from fan_trap_accounts a, fan_trap_sales s, fan_trap_budget b 5 where a.acc_id = s.acc_id 6 and a.acc_id = b.acc_id 7 group by a.acc_name 8 order by a.acc_name 9 / ACC_NAME SUM(SALES) SUM(BUDGET) -------------------- ---------- ----------- ACCOUNT1 900 1050 ACCOUNT2 130 200 ACCOUNT3 600 750 ACCOUNT4 600 600
Now Discoverer handles this quite well, and automatically rewrites the query internally to use two in-line views, one to join the sales and accounts tables, and one to join the budget and accounts tables, and then join the results together with the correct aggregation applied. So how does OBIEE handle this? Does it ignore the potential error and give you the wrong figures, or does it even disallow this sort of schema design in the first place? Let’s take a look.
The BI Administrator tool doesn’t have a problem with this sort of arrangement in the physical layer:

The logical business model is fine as well, with the sales and budget tables being displayed as fact tables, and the accounts table as a dimension, as you would expect.

In fact, when you run a query, using the same selection as our SQL statement, the figures do in fact come out correct.

So what’s happening? Well, if you take a look at the query log and see how the logical, and physical, SQL queries are put together, you can see that OBIEE is doing what Discoverer does, but within the BI Server rather than writing a single SQL statement that uses inline views. The BI Server effectively creates two inline views that in turn lead to two physical SQL statements being sent to Oracle, the BI Server then joins them together and returns the data back as a single result set.
-------------------- Execution plan:
RqBreakFilter <<2750>>[1] [for database 0:0,0]
RqList <<2641>> [for database 0:0,0]
case when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end as c1 GB [for database 3023:5710,44],
D903.c3 as c2 GB [for database 3023:5710,44],
D903.c4 as c3 GB [for database 3023:5710,44]
Child Nodes (RqJoinSpec): <<2752>> [for database 0:0,0]
(
RqList <<2795>> [for database 0:0,0]
D902.c1 as c1 [for database 3023:5710,44],
D901.c1 as c2 [for database 3023:5710,44],
D901.c2 as c3 GB [for database 3023:5710,44],
D902.c2 as c4 GB [for database 3023:5710,44]
Child Nodes (RqJoinSpec): <<2798>> [for database 0:0,0]
(
RqList <<2653>> [for database 3023:5710:ora11g,44]
FAN_TRAP_ACCOUNTS.ACC_NAME as c1 GB [for database 3023:5710,44],
sum(FAN_TRAP_SALES.SALES by [ FAN_TRAP_ACCOUNTS.ACC_NAME] ) as c2 GB [for database 3023:5710,44]
Child Nodes (RqJoinSpec): <<2686>> [for database 3023:5710:ora11g,44]
FAN_TRAP_ACCOUNTS T6678
FAN_TRAP_SALES T6685
DetailFilter: FAN_TRAP_ACCOUNTS.ACC_ID = FAN_TRAP_SALES.ACC_ID [for database 0:0]
GroupBy: [ FAN_TRAP_ACCOUNTS.ACC_NAME] [for database 3023:5710,44]
OrderBy: c1 asc [for database 3023:5710,44]
) as D901 FullOuterStitchJoin <<2744>> On D901.c1 =NullsEqual D902.c1; actual join vectors: [ 0 ] = [ 0 ]
(
RqList <<2690>> [for database 3023:5710:ora11g,44]
FAN_TRAP_ACCOUNTS.ACC_NAME as c1 GB [for database 3023:5710,44],
sum(FAN_TRAP_BUDGET.BUDGET by [ FAN_TRAP_ACCOUNTS.ACC_NAME] ) as c2 GB [for database 3023:5710,44]
Child Nodes (RqJoinSpec): <<2723>> [for database 3023:5710:ora11g,44]
FAN_TRAP_ACCOUNTS T6678
FAN_TRAP_BUDGET T6681
DetailFilter: FAN_TRAP_ACCOUNTS.ACC_ID = FAN_TRAP_BUDGET.ACC_ID [for database 0:0]
GroupBy: [ FAN_TRAP_ACCOUNTS.ACC_NAME] [for database 3023:5710,44]
OrderBy: c1 asc [for database 3023:5710,44]
) as D902
) as D903
OrderBy: c1 asc [for database 0:0,0]
+++Administrator:2a0000:2a0001:----2008/08/26 17:24:48
-------------------- Sending query to database named ora11g (id: <<2653>>):
select T6678.ACC_NAME as c1,
sum(T6685.SALES) as c2
from
CUST_ORDER_HISTORY.FAN_TRAP_ACCOUNTS T6678,
CUST_ORDER_HISTORY.FAN_TRAP_SALES T6685
where ( T6678.ACC_ID = T6685.ACC_ID )
group by T6678.ACC_NAME
order by c1
+++Administrator:2a0000:2a0001:----2008/08/26 17:24:48
-------------------- Sending query to database named ora11g (id: <<2690>>):
select T6678.ACC_NAME as c1,
sum(T6681.BUDGET) as c2
from
CUST_ORDER_HISTORY.FAN_TRAP_ACCOUNTS T6678,
CUST_ORDER_HISTORY.FAN_TRAP_BUDGET T6681
where ( T6678.ACC_ID = T6681.ACC_ID )
group by T6678.ACC_NAME
order by c1
Now this strictly speaking isn’t as efficient as Discoverer’s approach, which uses a single SQL statement and does the join of the two inline views within the database, but it’s still pretty clever and would work for platforms that don’t support inline views. In a way its an “on the fly” creation of two logical table sources without you having to set this up explicitly in the logical business model.
One thing I haven’t tried out yet is how OBIEE handles queries that, in Discoverer, would cause a fan-trap detection error, i.e. queries that it can’t rewrite to use two or more inline views. I suspect that these sorts of scenarios, that usually involve facts of differing granularity, where circular joins exist or where the keys to the tables don’t correspond would get picked up by the general data modeling rules in OBIEE, but if anyone’s got any experiences with this then by all means leave a comment.
If you’re wondering what prompted this posting, I’m putting the finishing touches to my Oracle Open World presentation on OBIEE data modeling, and looking to cover off a few data modeling “oddities”. I’ll try and cover support for ragged/unbalanced/value-based hierarchies in a posting tomorrow, and finish up with how the logical business model supports the Kimball concept of Bridge Tables, but no-one seems to use them.
Tuning 11g Data Warehouse Schemas using the SQL Access Advisor
One of the new features in Oracle Database 11g that I’ve been meaning to try out for a while now is the SQL Access Advisor, part of the 11g Tuning Pack. Given that one of Oracle’s key product directions is to make administering the database as easy as, in the words of Dilbert, “one big button, and we press it for you in the factory” I thought It’d be interesting to see how far this automated tuning of database schemas has come, especially seeing as this latest 11g release of the feature now tunes and recommends indexes, materialized views and partitioning schemes.
To create a substantial enough workload with potential for improvement, I took a copy of the SH schema, removed all the constraints, indexes, materialized views and partitions, copied the SALES table into itself a few times so that it now had 147m rows, and then gathered statistics on the schema. I then when in to Oracle Business Intelligence Enterprise Edition and put together a selection of queries that aggregated data, filtered data on dimension attributes, time periods and so forth, and then copied the generated SQL statements into an SQL script so that I could run them on demand.
I’m now at the point where I’ve got some sample data and a set of SELECT statements that I want to improve the performance of. Taking a look at the 11g DBConsole application, there are a number of places where you can start to investigate the performance of your system. Starting at the front page, you can see a section on the ADDM Performance Analysis that shows, amongst other things, the top SQL statements by DB Time (response time) and by Disk I/O.

Selecting any of these SQL statements will present you with the option to run the SQL Tuning Advisor against them, which presents you with the original SQL statement, it’s execution plan, and a selection of alternatives with their execution plan. For now though, I’m more interested in what recommendations the SQL Access Advisor makes for my cloned schema, which has no indexes, materialized views, constraints or anything defined within it.
To perform the exercise I’m going to use another feature of the 11g DB Console, the ability to create SQL Tuning sets using a point and click interface, to capture a workload based on the SQL statements used by my queries. To do this I navigate to the Performance Screen and then the SQL Tuning Sets screen, and create a new SQL Tuning Set to hold the sample workload.

I then tell DBConsole to capture SQL statements for 10 minutes.

The SQL captured will be limited to the SH_COPY schema, so that I miss out all the background SQL generated from the database and from other sessions.

I then let the tuning set job run and execute my script from an SQL*Plus session to generate the workload. After 10 minutes, I return to the DBConsole job and check that a suitable number of long-running SQL statements have been captured in the SQL Tuning Set.

Taking a look at the buffer gets and elapsed time for the queries, there should be something to work with here.

So now that I’ve got an SQL Tuning set, the next step is to navigate to the SQL Performance Analyzer, which you can reach from the Advisor Central page on DBConsole.

Now the idea with the SQL Performance Analyzer is that you can take a given workload (i.e. an SQL Tuning Set), run it to provide a baseline response time, then apply some changes (change database initialization parameters, create some indexes, rework the SQL statement) and then replay the same workload to assess the effect of the changes. It’s nothing different to doing this all yourself but I thought I’d give this a try, to try and keep all of the process within the Enterprise Manager website.
I click on the Guided Workflow link and am presented with the following workflow steps:

So the first step then is define the SQL Performance Analyzer Task, and select the SQL Tuning Set I just created to replay as my workload.

I now execute step two of the SQL Performance Analyzer workflow to replay the SQL Tuning Set to establish a baseline, before I run the first iteration of the SQL Access Advisor to recommend new indexes. Once this is complete, I navigate to the SQL Access Advisor and begin the process of using it to recommend indexes for my schema.
The first step in this process is to either choose to validate existing indexes, MVs and so on in a schema (this would be useful for the original SH schema), or in my case, to recommend some new ones.

Once you select your advisor option, you can either base the recommendations on an SQL Tuning Set as I’m doing, whatever’s in the SQL cache at the time (filtered by parsing schema, elapsed time and so on) or on the structure of the tables and dimension objects in one or more schemas. From my perspective it’s better to create an SQL Tuning Set as you’ve got a defined, repeatable set of SQL statements then that you can use in different places, such as the SQL Performance Analyzer that I’m using to frame this tuning exercise.

On the next page of the wizard, I can select the schema structures that I want the advisor to recommend. In this instance I just select indexes on their own, although you can select all three - materialized views and partitions as well as indexes - if you want to recommend all of them at once.

“Advanced” options allow you to tell the advisor to consider or disregard the cost of creating the indexes when it makes recommendations, whether to include recommendations around dropping unused indexes, and to place limits on the size of indexes, materialized views etc that it can recommend, and whether to only consider queries which makes sense of course in a data warehouse environment.

Then it’s a case of telling DB Console when to run the advisor job - now, in my case - and then review the settings, before pressing the Submit button to run the analysis.

After waiting a few seconds, the analysis was ready. From the analysis overview, there are six index recommendations that bring down the overall workload marginally, which is about what I expected as most of the work being done by the OBIEE queries was summarization, not filtering.

Taking a look at the recommendation details, you can see more details on the individual index recommendations. Notice the “Action Type” colour blobs - when you run the advisor asking it to recommend materialized views and partitioning strategies as well, you’ll get additional colour blobs in there as well, depending on what objects have been recommended.

Looking at the top recommendation in more detail, I can see that it’s recommended three B-tree indexes on the channels, products and customers tables, in order to improve the performance of one of the SQL statements.

Recommending B-tree over Bitmap index is to me surprising, as typically in a data warehouse environment you’ll want to use bitmap indexes when working with dimension attributes, firstly because they’re typically smaller and faster to access (jn combination with other bitmap indexes) than B-tree indexes, and secondly because they’re required for star transformations, although I have seen Oracle transform B-tree indexes to bitmap indexes on-the-fly so that star transformations can take place.
Running the SQL Access Advisor again, this time selecting Materialized Views as the recommendation type, generated a set of recommendations with a much bigger potential performance gain. Notice the colour blobs are now for materialized views, mv logs and “others”, and that the expected performance gain is much bigger.

Taking a look at the top recommendation, it’s for four materialized view logs, a fast refresh materialized view with query rewrite enabled, and a statistics gathering job to gather stats on the new materialized view. I was wondering if the “other” would be a dimension recommendation (as in CREATE DIMENSION) but it looks like this version of the SQL Access Advisor doesn’t cover these yet.

Running the advisor a third time, this time for partitioning, brings up a set of recommendations that on the face of it look like some good timesavers.

However the actual recommendations themselves are a bit strange. The first one recommends hash partitioning the Sales fact table and the Times dimension table into 32 buckets, when I would have thought the more obvious recommendation would be to range partition the fact table on its date column, do the same to the time dimension and thereby permit partition-wise joins between the two.

Of course this does bring up the other question as to how to partition a fact table when you have more than one date within it, say load date and transaction date, given that one will favour the ETL process (potentially allowing partition exchange loading) and one will favour queries (partition pruning), but I’m struggling to understand why it would recommend hash partitioning a fact table with a date datatype as one of the table columns.
The second recommendation was equally strange, with the Sales fact table again recommended for hash partitioning and the customers dimension table range partitioned, using the INTERVAL (automatic creation of new partitions as needed) feature.

I wonder if this lack of awareness of data warehousing standard practices is down to their being no constraints defined within the schema, therefore the SQL Access Advisor doesn’t really know the relationship between the various tables? Once I’ve run the recommendations I’ll add some constraints and see how the recommendations come out then.
I now go back and run the Advisor one more time, this time turning on all the options to recommend indexes, materialized views and partitions, so that I can make a selection between the various recommendations, assuming as I do that you’re not meant to implement all of them (the partitioning one above illustrates this point, as it would partition the Sales fact table twice).
Looking at the projected performance improvement now, it’s certainly (if it works) not something to be sniffed at.

The projected cost of the various SQL queries in the workload has gone down from 565853 to 212, with most statements gaining a 100% query execution time improvement, and however you might query the advice it certainly does what it says on the tin, at least as far as the statements in my workload are concerned.
Looking at the total set of recommendations in detail, they all include elements of index creation, materialized view creation and partitioning changes, I also note that they’re all selected which does seem to suggest that they should all be implemented together (or at least that it’s alright to do so).

So, I leave all the changes selected and press the “Schedule Implementation” button, and then notice that (a) it won’t schedule implementations that involve partitioning changes, you have to save the script and manually run it yourself, and (b) the materialized views and indexes that the script creates are being created in the SYSTEM account, not the account that the objects are in (SH_COPY), so I’ll need to edit the script a bit so they’re created in the correct place. I copy the advisor SQL output to Notepad, do a search and replace on SYSTEM, changing all values to SH_COPY, and then run the script from SQL*Plus. Once I did this (the script took about 10 minutes to run), I then returned to the SQL Performance Advisor and ran my second replay of the sample workload, followed by the comparison of the two workloads.

So, the $64,000 question - did the performance of the queries improve? Well, according to the SQL Performance Analyzer output, it improved by an overall 63%, with some queries gaining and some queries regressing. In defense of the results, a couple of statements failed to execute in the recommendations script because of formatting errors, but even so, it’s not a bad result, albeit with a few weird partitioning recommendations.

Of course this is still a bit simplistic, as firstly I only gave the advisor a very small set of queries to advise against, and secondly the SH_COPY schema only has two fact tables and a bunch of dimension tables. Also, of course, none of this takes into account the ETL process which will make its own demands on the schema, its indexes, summaries and partitioning, and you’ll most definately need to bear this in mind before changing around the performance structures in your warehouse (which is why, of course, you’re still best contacting an experienced data warehouse implementor before going ahead with any of the bits of advice the SQL Access Advisor comes up with). Still, it’s not bad as a way of automating much of the analysis that goes into tuning the structures in your warehouse, and I can certainly see a point in time in the future when the SQL access advisor takes into account a SQL Tuning Set to represent your ETL process, and recommends additional structures such as cube organized materialized views and index and storage variations such as compression and compressed b-tree indexes as appropriate.
So finally, what about the issue around hash partitions, and the B-tree indexes on the dimension tables. Will adding constraints to the schema lead to more sensible (in the context of data warehousing) recommendations? I then went on to add constraints to the sample schema, generate a new SQL Tuning Set and run the SQL Access Advisor again, and the result was - the same: hash partitioning was recommended for the fact table, B-tree indexes for the dimensions, which does seem to suggest that the SQL Access Advisor isn’t too “data warehouse-aware”, although my test case was limited and there could well be something I’m missing. Still, I’ll be interested to see how this feature evolves as the various patches to 11gR1 and 11gR2 are released, it’s certainly got the potential to be an interesting feature.
Line Charts
Competent Shapes
Data Mining in Action: Oracle Sales Prospector
Last Position
Calculate Total Possible Score for Wii Bowling, Power Throws
I've also been playing the "Training" game of Power Throws (bowling). Your given 10 shots with another row added each frame. You start out with 10, then 15, then 21, etc. If you get all the pins, you get a bonue (total pins * 2). So far, my highest score has been 651.
I've never bothered to figure out what the total possible score is though.
SQL to the rescue.
Using analytics and the CONNECT BY LEVEL option (10g and higher), it's fairly easy:
I had to set the filter on the outer query because you start with 4 rows (10 pins). And the results:
SELECT
pin_row,
pins,
running_total,
rt_with_bonus
FROM
(
SELECT
rownum pin_row,
rownum pins,
SUM( rownum ) OVER ( ORDER BY rownum ) running_total,
( SUM( rownum ) OVER ( ORDER BY rownum ) * 2 ) rt_with_bonus
FROM dual
CONNECT BY LEVEL <= 13
)
WHERE pins >= 4
Voila! 890 is the best possible score if I strike every frame. I did manage to pass my high score by 9 pins tonight as well. It's a great day!
CJUSTICE@XE>BREAK ON REPORT
CJUSTICE@XE>COMPUTE SUM OF RUNNING_TOTAL ON REPOR
CJUSTICE@XE>COMPUTE SUM OF RT_WITH_BONUS ON REPOR
CJUSTICE@XE>/
PIN_ROW PINS RUNNING_TOTAL RT_WITH_BONUS
---------- ---------- ------------- -------------
4 4 10 20
5 5 15 30
6 6 21 42
7 7 28 56
8 8 36 72
9 9 45 90
10 10 55 110
11 11 66 132
12 12 78 156
13 13 91 182
------------- -------------
sum 445 890
So how come no one wants to hire a guy that can do this kind of fun stuff with SQL? ;)
Lotus notes and sametime
My company uses Lotus Notes for email and Sametime connect as messenger for all the internal communication. Both are stupid applications and are a big resource hogs. Most of the systems are P IV with 512 MB RAM. You just run Lotus and Sametime 7 and it eats up everything. The system moves like a 386 based machine.
Moreover, i was looking for Lotus short cuts today and found that there is no short cut in Lotus for Send/Receive mail.
At our client’s site, i have seen IBM people having kept an exe killnotes.exe on their desktop which they use to quickly exit from Lotus. What bullshit ?
Outlook/Outlook Express rocks, seriously !
iPhone in India
Ok, so all the wait is over and they have announced the iPhone prices in India. The 8 GB one will be for INR 31000 and 16 GB for 36100. What the hell ? In US its being sold for $200 which comes up to be 9000 INR at the max. And here more than 3 times. Why ? They are just trying to screw people with this iPhone tag.
Anyways, Nokia’s capture in Indian market is going to be impacted terribly. They are already into the launch of N96. Lets see how things go on.
BTW by pricing iPhone at 31k, they just lost one customer. I am not buying it ![]()



