BI & Warehousing
Oracle Excellence Awards 2013 - Nominations Open

The deadline for your nomination is Tuesday, June 18th, 2013. Good luck and, if you win, let me know so I can see your application at Open World!
Variable Numbers to Words
How can I store the result of <?xdofx:to_check_number(TOTAL_INV_AMOUNT,'USD','CASE_UPPER','DECIMAL_STYLE_WORDS')?> inside a variable.
Checking this out, BIP chokes on the assigning to the variable with a nice error:
Namespace prefix 'xdofx' used but not declared
Turning to BIP RTF template guru in residence Hok-Min, he suggested avoiding the xdofx: wrapper altogether in this case and calling the function more directly. The underlying function in java is:
public static String toCheckNumber(String locStr, String amount, String preOrCurCode, String caseStyle, String decimalStyle)
Applying that to Satyender's needs we end up with:
<?variable@incontext:salval; xdoxslt:toCheckNumber($_XDOLOCALE,.//SALARY,'USD'
,'CASE_UPPER','DECIMAL_STYLE_WORDS')?>
We still need the xdoxslt prefix but we can now assign the value to a variable. There is a caveat from Hok Min.
Note that the amount has to be in string format. If it is not a string, it has to be converted to a string, e.g. string($CALCULATED_SALARY). If you use XML element name directly (like in this case SALARY), then it is already a string, so no need to do conversion.
I know this raises the question of why do we need the xdofx: prefix at all? Im discussing that with Hok Min as I write and will get back to you.
What’s Coming at the Rittman Mead BI Forum 2013, Atlanta, May 15th – 17th 2013
At the end of last week I talked about what we had planned for the BI Forum in Brighton, and today I want to talk about what we’ve got planned for the Rittman Mead BI Forum 2013 in Atlanta, running the week after at the Georgia Tech Hotel & Conference Center on May 15th – 17th 2013.

As with the Brighton BI Forum event, we’ve got a mix of partner, independent developer, customer and Oracle speakers, all covering topics centered around OBIEE and its supporting technologies. The central idea around the BI Forum is that its “the conference we’d want to go to”, with content aimed at developers who already know the basics, want to hear solid technical and implementation talk not marketing fluff, and want to meet their friends and peers in the industry to share stories and trade tips. We keep numbers strictly limited, run just a single track so that everyone gets to take part in the same sessions, and maximize the networking and social elements so that you get to meet everyone who attends, hopefully staying in touch well after the event closes.
This year our speakers and sessions in Atlanta include:
- Jeff McQuigg talking about OBIEE testing (and making the topic interesting), Christian Screen on OBIEE plug-ins, Tim Vlamis covering OBIEE forecasting and time-series analysis, Kevin McGinley looking at ODI and the BI Apps (and hoping it’s GA by then, otherwise he’ll sing us Bohemian Rhapsody in Klingon for an hour), and our own Adam Seed will take us beyond the demos with Endeca
- From Oracle, we’ll have Marty Gubar and Alan Lee talking about the new Admin Tool, and OBIEE’s support for Hadoop as a data source, Florian Schouten will talk about BI Apps futures, and Jack Berkowitz will take us through the latest in OBIEE presentation, mobility and interactivity
Jack will also deliver the opening Oracle keynote on the Wednesday evening, and before that earlier in the day will be our optional masterclass, this year being delivered by Stewart Bryson, Michael Rainey and myself and focusing on Oracle’s data integration technologies. And – to top things off, we’re joined by two special guests, Method R’s Cary Millsap, and Pythian’s Alex Gorbachev, two of our friends from the Oracle database world and who’ll talk to us about reponse-time based performance tuning, and what’s new in the worlds of Big Data and unstructured analytics.

Of course – it’s not all about learning and technology, and we make a special effort around the social events usually to the point where we spend all the proceeds on free bars and making sure everyone’s having a good time. There’ll be a debate on the Friday, TED-style 10-minute sessions on the Thursday, a competition to see who can speak longer than Paul Rodwick about Oracle BI without blinking, full delegate packs and t-shirts to take home, and we share all delegate contact details amongst the attendees so you can stay in touch after everything closes. Registration is open and there are still a few places left, so if you’re thinking of attending and don’t want to lose your place, register now before we sell out…!
Some Upcoming Events
It’s going to be a busy few weeks leading up to the BI Forum. First, Rittman Mead will be exhibiting at the UKOUG Engineered Systems Summit on Tuesday 16th April, this is a one day event in London for Exadata, Exalogic, SuperCluster and not least Exalytics. Mark will be presenting on Oracle Exalytics – Tips and Experiences from Rittman Mead , full agenda available here. Mark will then hoping over to Norway to speak at the Oracle Norway User Group event on High-Speed, In-Memory Big Data Analysis with Oracle Exalytics, maybe he’ll be previewing his work getting OBIEE 11.1.1.7 working with Hadoop.
The following week on Tuesday 23rd April I am speaking at an Oracle Business Analytics event, I am giving a presentation about our story so far with Exalytics, this event is at Oracle’s City Office in London. Later that week on Thursday 25th, as part of Big Data Week I’m speaking in the evening in Brighton about the evolution from Business Intelligence to Analytics and Big Data, full agenda here, please register here.
Interview with Toby Potter from DataSift – Social Data and Business Intelligence
With the BI Forum fast approaching, Toby Potter from DataSift agreed to an interview for the blog, to discuss social data, it’s value and how it can be combined with other business intelligence. Toby will be one of the guest speakers at the BI Forum in Brighton sharing more details and experiences in this growing area. Over to the interview…
[James Knight] “For the readers of the blog that may not have heard of DataSift, tell us a bit about the company and what they can provide.”
[Toby Potter] ”DataSift provides the leading Social Data Platform for enterprise. DataSift collects, structures and enriches unstructured social, news and blog data allowing our customers to filter and collect the data of interest to their business for further, more detailed analysis.
We provide real-time access to the full Twitter firehose, bit.ly (click) data, Facebook, YouTube, millions of blogs, news and others, complemented by our historical archive of data.”
[James Knight] ”Big data is a hot topic and lots of our customers want to do some ‘big data stuff’, but what does the term mean to you?”
[Toby Potter] ”Whilst we consider ourselves to be a Big Data Platform – our core platform is into the Petabytes and we’re growing at several Terabytes per day – our focus is really on the Right Data. Using the power of our filtering capabilities, we’re able to take those huge volumes of data and provide the Right Data to our customers based upon the rules they define.
Even given that we are processing 1 Trillion (1,000,000,000,000) items of social data a week in our historics platform – and doubling every 2 months.”
[James Knight] ”There’s a ton of social data out there, which may make people wary of the effort, timescales and costs involved. How can people start their journey in the use of social analytics and gain benefit quickly?”
[Toby Potter] ”Social analytics is actually pretty straightforward to get started on – for example, you can simply create a Pay As You Go account on the DataSift platform and be collecting data within a few miutes – but the challenge is often realising the insight that’s available within social data. A typical tweet for example, might have anywhere between 50 and 100 individual data points.
As with any analytics project, be clear on your objectives and you should see whether this is going to be of benefit very quickly. Think about how you would capture the data, what SLA (if any) do you need if this was to become mainstream, how would you analyse it and do you want to merge it with existing, internal sources.
The bottom line however, is to understand the action you would take from any insight gained. No matter how interesting the results, if there’s no action taken from the insight, you have to question the value of any analytics project.”
[James Knight] ”So far we’ve been talking about social media in relation to big data, but how can it also be used to enhance regular BI reports, such as those produced using Oracle BI EE?”
[Toby Potter] ”Typically, BI reports focus on reporting on data that’s available internally; sales data, customer data and effectiveness of a marketing campaign for example.
Social data in and of itself provides insight into the world outside; bringing this into your organisation’s Business Intelligence environment provides the business with a whole new perspective and delivers a new view of how your business is performing.
Both uses provide tremendous value, but it’s the combination of the two that really drives insight. Being able to link social discussions, or a particular news story, drove product interest and traffic to your web-site which both increased online sales, but also drove footfall to your physical stores (which also saw an uplift in sales) would provide the justification in running more similar campaigns. Understanding the social profiles of the customer base may also help you tune your offline messaging too.”
[James Knight] ”Out of those that you are allowed to talk about publicly, what’s the cleverest use of social data that you have seen so far?”
[Toby Potter] ”This is what I love most about my job; there are so many uses for the data that almost every discussion is different!
In the media business, I would point towards SecondSync as a great example of using social data to disrupt and add value to the fairly traditional business of measuring TV audiences. They capture social discussions around particular TV programmes and provide both a dashboard view and deeper analytics services to allow broadcasters to better understand their audience and how to engage with them.
News is an obvious area for analytics on social data and many of our customers are able to understand how stories are shared socially, enabling them to better target news stories, drive up audience and therefore drive increased advertising revenues.
Finance is another interesting area; the ability to bring together all of the discussion around particular investments, measure the sentiment and identify breaking news and the market reaction to it quickly, gives our customers an edge.
Customer services is another area and not just being able to react to complaints. Understanding what is working and what isn’t for your customers, collecting feedback and relating that to churn figures for example, allows you to understand how better to adapt your business. As an example, when the broadband goes down in a particular area, people tend to tweet about it very quickly; whilst your engineers may well be alerted, customer services are often left to pick up a sudden, increased load, without having any ideas as to why or what the problem might be. Integrating social data helps overcome this.
More broadly, by looking into the data itself, I’ve seen our customers collecting all of the geo-located tweets to understand population density to better inform telecoms infrastructure; retailers analyse data to understand future fashions to better inform stock ordering/promotions; competitor benchmarking to understand how your business compares to your peers.
The most innovative I’m seeing at the moment is the ability to build out profiles, much richer than simple demographics, of social customers to better understand their interests and so provide them with more useful promotions.
One of our customers, Local Response, have integrated a real-time social feed into their online advertising platform to drive “Intent Targetting”, the ability to target more appropriate and context sensitive advertising to web-site viewers.”
[James Knight] ”Social data can be quite messy. How can organisations uncover value?”
[Toby Potter] ”As a gross over simplification, there are two fundamental elements required: the ability to capture, process and cleanse the data in the first place (this is where DataSift comes in), and the ability to analyse it, which is where the Oracle platforms and Endeca in particular come into play.
Getting the data into a format where it can be analysed initially – for more traditional types of data this would typically be some kind of ETL tool – is key to being able to then store it ready for further analysis.
Endeca is fantastic at then taking this data and providing a flexible “data playground” allowing users to discover what insight might be hidden inside. For more structured analytics the more traditional BI tools provide a great environment to distribute reports around the business to a broader Business As Usual user base.”
[James Knight] ”Technically, how are companies dealing with social data, and do you see a standard approach or a variety of different approaches?”
[Toby Potter] ”As is to be expected in a relatively young space, there are many approaches. The majority of businesses are dipping their toes in and pulling data directly from the various provider APIs out there, but this provides a very distorted view of the potential; most APIs provide very restricted data sets and have limits on the number of requests you can make, so you’re potentially missing out on a lot.
Increasingly, as organisations recognise the importance and value of the data available, they are turning to specialist providers such as DataSift to provide a reliable, enterprise quality feed of the data they need.
We see the same journey often – initial experiment seems to yield value; build infrastructure to handle and incorporate data; deliver valuable insight … and then an API changes, or a major event changes volume levels, or the business interest grows and so on. The result is almost always frustration with the amount of expense, lack of reliability and general hard work that is required to get this working, but the insight is so valuable the need for a working solution over-rides everything else.
It is all of this work that DataSift aims to replace with a single platform to take away this maintenance and reliability nightmare.
If you’re looking at social data coming in to your business, I’d highly recommend looking at the tools that are already out there to save you a lot of time, money and frustration!”
[James Knight] ”At Rittman Mead, we’ve been working with Oracle Endeca Information Discovery (OEID), allowing us to combine unstructured, semi-structured and structure data. What additional value do you see OEID providing in the analysis of social data?”
[Toby Potter] ”This is exactly where the maximum value from the social data can be gained. Combining the data with existing sources, enhances the potential from the data, but having the right tools to extract this value is fundamental. Endeca provides exactly that kind of creative analytical environment where you can explore the data and determine where the most value lies.
Using Endeca, once the data is better understood, it’s then straightforward to productionise the areas of maximum benefit and focus energy on delivering that insight to where it can have most impact in your business.”
Many thanks to Toby for taking the time out for this interview, which we hope has provided some useful insight. There’s still time to register for the BI Forum and see Toby’s presentation at the Brighton event.
Incremental refresh of Exalytics aggregates using native BI Server capabilities
One of the key design features of the Exalytics In-Memory Machine is the use of aggregates (pre-calculated summary data), held in the TimesTen In-Memory database. Out of the box (“OotB”) these aggregates are built through the OBIEE tool, and when the underlying data changes they must be rebuilt from scratch.
For OBIEE (Exalytics or not) to make use of aggregate tables in a manner invisible to the user, they must be mapped into the RPD as additional Logical Table Sources for the respective Logical Table in the Business Model and Mapping (BMM) layer. OBIEE will then choose the Logical Table Source that it thinks will give the fastest response time for a query, based on the dimension level at which the query is written.
OBIEE’s capability to load aggregates is provided by the Aggregate Persistence function, scripts for which are generated by the Exalytics Summary Advisor, or the standard tool’s Aggregate Persistence Wizard. The scripts can also be written by hand.
Aggregate Persistence has two great benefits:
- It uses the existing metadata model of the RPD to understand where to get the source data for the aggregate from, and how to aggregate it. Because it uses standard RPD metadata, it also means that any data source that is valid for reporting against in OBIEE can be used as a source for the aggregates, and OBIEE will generate the extract SQL automagically. The aggregate creation process becomes source-agnostic. OBIEE will also handle any federation required in creating the aggregates. For example, if there are two source systems (such as Sales, and Stock) but one target aggregate, OBIEE will manage the federation of the aggregated data, just as it would in any query through the front-end.
- All of the required RPD work for mapping the aggregate as a new Logical Table Source is done automagically. There is no work on the RPD required by the developer.
However, there are two particular limitations to ‘vanilla’ Aggregate Persistence:
- It cannot do incremental refresh of aggregates. Whenever the underlying data changes, the aggregate must be dropped and rebuilt in entirety. This can be extremely inefficient if only a small proportion of the source data has changed, and can ultimately lead to scalability and batch SLA issues.
- Each time that the aggregate is updated, the RPD is modified online. This can mean that batch times take longer than they need to, and is also undesirable in a Production environment.
I have written about alternatives and variations to the OotB approach for refreshing Exalytics aggregates previously here and here, namely:
- Loading TimesTen aggregates through bespoke ETL, in tools such as GoldenGate and ODI. TimesTen supports a variety of interfaces – including ODBC and JDBC – and therefore can be loaded by any standard ETL tool. A tool such as GoldenGate can be a good way of implementing a light-touch CDC solution against a source database.
- Loading TimesTen aggregates directly using TimesTen’s Load from Oracle functionality, taking advantage of Aggregate Persistence to do the aggregate mapping work in the RPD
In both of these cases, there are downsides to the method. Using bespoke ETL is ultimately very powerful and flexible, but has the overhead of writing the ETL along with requiring manual mapping of the aggregates into the RPD. This mapping work is done in the TimesTen Load from Oracle method, but can only be used against an Oracle source database and where there is a single physical SQL required to load the aggregate.
Refreshing aggregates using native OBIEE functionality aloneHere I present another alternative method for refreshing Exalytics aggregates, but using OBIEE functionality alone and remaining close to the OotB method. It is based on Aggregate Persistence but varies in two significant ways :
- Incremental refresh of the aggregate is possible
- No changes are made to the RPD when the aggregate is refreshed
The method still uses the fundamentals of Aggregate Persistence since , as I mentioned above, it has some very significant benefits:
- BI Server uses (dare I say, leverages), your existing metadata modelling work which is necessary – regardless of your aggregates – for users to report from the unaggregated data.
- BI Server generates your aggregate refresh ETL code
- If your source systems change, your aggregate refresh code doesn’t need to – just as reports are decoupled from the source system through the RPD metadata layers, so are your target aggregates
For us to understand the new method, a bit of background and explanation of the technology is required.
Background, part 1 : Aggregate Persistence – under the coversWhen Aggregate Persistence runs, it does several things:
- Remove aggregates from physical database and RPD mappings
- Create the physical aggregate tables and indexes on the target database, for the fact aggregate and supporting dimensions
- Update the RPD Physical and Logical (BMM) layers to include the newly built aggregates
- Populate the aggregate tables, from source via the BI Server to the aggregate target (TimesTen)
What we are going to do here is pick apart Aggregate Persistence and invoke just part of it. We don’t need to rebuild the physical tables each time we refresh the data, and we don’t need to touch the RPD. We can actually just tell the BI Server to load the aggregate table, using the results of a Logical SQL query. That is, pretty much the same SQL that would be executed if we ran the aggregate query from an analysis in the OBIEE front end.
The command to tell the BI Server to do this is the populate command, which can be found from close inspection of the nqquery.log during execution of normal Aggregate Persistence:
populate "ag_sales_month" mode ( append table connection pool "TimesTen aggregates"."TT_CP") as
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0"
from "Sales";
This populate <table> command can be sent by us directly to the BI Server (exactly in the way that a standard create aggregate Aggregate Persistence script would be – with nqcmd etc) and causes it to load the specified table (using the specified connection pool) using the logical SQL given. The re-creation of the aggregate tables, and the RPD mapping, doesn’t get run:
The syntax of the populate command is undocumented, but from observing the nqquery.log file it follows this pattern:
Looking at a very simple example, we can see how a simple aggregate with a measure summarised by month could be populated:
SELECT_BUSINESS_MODEL was written about by Venkat here, and is BI Server syntax allowing a query directly against the BMM, rather than the Presentation Layer which Logical SQL usually specifies. You can build and test the SELECT_BUSINESS_MODEL clause in OBIEE directly (from Administration -> Issue SQL), in nqcmd, or just by extracting it from the nqquery.log.
So, we have seen how we can take advantage of Aggregate Persistence to tell the BI Server to load an aggregate, from any source we’ve modelled in the RPD, without requiring it to delete the aggregate to start with or modify the RPD in any way.
Now, we need the a bit of secret sauce to complete the picture and make this method a viable one.
In side-stepping the full Aggregate Persistence sequence, we have one problem. The Logical SQL that we use in the populate statement is going to be parsed by the BI Server to generate the select statement(s) against the source database. However, the BI Server uses its standard query parsing on it, using the metadata defined. Because the aggregates we are loading are already mapped into the RPD then by default the BI Server will probably try to use the aggregate to satisfy the aggregate populate request (because it will judge it the most efficient LTS) – thus loading data straight from the table that we are trying to populate!
The answer is the magical INACTIVE_SCHEMAS variable. What this does it tell OBIEE to ignore one or more Physical schemas in the RPD, and importantly, any associated Logical Table Sources. INACTIVE_SCHEMAS is documented as part of the Double Buffering. It can be used in any logical SQL statement, so is easily demonstrated in an analysis (using Advanced SQL Clauses -> Prefix):

Forcing OBIEE query to use avoid a LTS, using INACTIVE_SCHEMAS. Click image for a larger version.
So when we specify the populate command to update the aggregate, we just include the necessary INACTIVE_SCHEMAS prefix:
SET VARIABLE INACTIVE_SCHEMAS='"TimesTen Aggregates".."EXALYTICS"':
populate "ag_sales_month" mode ( append table connection pool
"TimesTen aggregates"."TT_CP") as
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0"
from "Sales";
Why, you could reasonably ask, is this not necessary in a normal OotB aggregate refresh? For the simply reason that in “vanilla” Aggregate Persistence usage the whole aggregate gets deleted from the RPD before it is rebuilt, and therefore when the aggregate query is executed there is only the base LTS is enabled in the RPD at that point in time.
The final part of the puzzle – Incremental refreshSo, we have a way of telling BI Server to populate a target aggregate without rebuilding it, and we have the workaround necessary to stop it trying to populate the aggregate from itself. The last bit is making sure that we only load the data we want to. If we execute the populate statement as it stands straight from the nqquery.log of the initial Aggregate Persistence run then we will end up with duplicate data in the target aggregate. So we need to do one of the following :
- Truncate the table contents before the
populate - Use a predicate in the
populateLogical SQL so that only selected data gets loaded
To issue a truncate command, you can use the logical SQL command execute physical to get the BI Server to run a command against the target database, for example:
execute physical connection pool "TimesTen Aggregates"."TT_CP" truncate table ag_sales_month
This truncate/load method is appropriate for refreshing dimension aggregate tables, since there won’t usually be an update key as such. However, when refreshing a fact aggregate it is better for performance to use an incremental update and only load data that has changed. This assumes that you can identify the data and have an update key for it. In this example, I have an aggregate table at Month level, and each time I refresh the aggregate I want to load just data for the current month. In my repository I have a dynamic repository variable called THIS_MONTH. To implement the incremental refresh, I just add the appropriate predicate to the SELECT_BUSINESS_MODEL clause of the populate statement:
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0"
from "Sales"
where "Dim Times"."Month YYYYMM" = VALUEOF("THIS_MONTH")
Making the completed aggregate refresh command to send to the BI Server:
SET VARIABLE DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, DISABLE_SUMMARY_STATS_LOGGING=1,
INACTIVE_SCHEMAS='"TimesTen Aggregates".."EXALYTICS"';
populate "ag_sales_month" mode ( append table connection pool
"TimesTen aggregates"."TT_CP") as
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0"
from "Sales"
where "Dim Times"."Month YYYYMM" = VALUEOF("THIS_MONTH");
Since there will be data in the table for the current month, I delete this out first, using execute physical:
execute physical connection pool "TimesTen Aggregates"."TT_CP" delete from ag_sales_month where Month_YYYY000000D0 = VALUEOF(THIS_MONTH);
Step-by-step
The method I have described above is implemented in two parts:
- Initial build- only needs doing once
- Create Aggregate Persistence scripts as normal (for example, with Summary Advisor)
- Execute the Aggregate Persistence script to :
- Build the aggregate tables in TimesTen
- Map the aggregates in the RPD
- Create custom populate scripts:
- From
nqquery.log, extract the fullpopulatestatement for each aggregate (fact and associated dimensions) - Amend the INACTIVE_SCHEMAS setting into the
populatescript, specifying the target TimesTen database and schema.

- For incremental refresh, add a
WHEREclause to thepopulatelogical SQL so that it only fetches the data that will have changed. Repository variables are useful here for holding date values such as current date, week, etc. - If necessary, build an
execute physicalscript to clear down all or part of the aggregate table. This is run prior to thepopulatescript to ensure you do not load duplicate data
- From
- Aggregate refresh – run whenever the base data changes
- Optionally, execute the
execute physicalscript to prepare the aggregate table (by deleting whatever data is about to be loaded) - Execute the custom
populatescript from above.
Because the aggregates are being built directly from the base data (as enforced byINACTIVE_SCHEMAS) the refresh scripts for multiple aggregates could potentially be run in parallel (eg using xargs). A corollary of this is that this method could put additional load on the source database, because it will be hitting it for every aggregate, whereas vanilla Aggregate Persistence will build aggregates from existing lower-level aggregates if it can.
- Optionally, execute the
This method is completely valid for use outside of Exalytics too, since only the Summary Advisor is licensed separately. Aggregate Persistence itself is standard OBIEE functionality. For Exalytics deployed in an environment where aggregate definitions and requirements change rapidly then this method would be less appropriate, because of the additional work required to modify the scripts. However, for an Exalytics deployment where aggregates change less frequently, it could be very useful.
The approach is not without drawbacks. Maintaining a set of custom populate commands has an overhead (although arguably no more so than a set of Aggregate Persistence scripts), and the flexibility comes at the cost of putting the onus of data validity on the developer. If an aggregate table is omitted from the refresh (for example, a support aggregate dimension table) then reports will show erroneous data.
The benefit of this approach is that aggregates can be rapidly built and maintained in a sensible manner. The RPD is modified only in the first step, the initial build. It is then left entirely untouched. This makes refreshes faster, and safer; if it fails there is just the data to tidy up, not the RPD too.
GoldenGate and Oracle Data Integrator – A Perfect Match… Part 2: Replicate to Staging and Foundation
It has been quite a hectic month since my last post, GoldenGate and Oracle Data Integrator – A Perfect Match Part 1: Introduction. Plenty of client work and delivery of two ODI 11g Bootcamp training courses have kept me busy. In the previous post, I described how the Oracle Reference Data Warehouse Architecture could be used to support a real-time data warehouse implementation. I also introduced the integration between ODI 11g and Oracle GoldenGate. In this post, I will describe the implementation and configuration of GoldenGate via ODI 11g at a high level. The detailed implementation has already been covered in a previous post, Extreme BI: Capturing Change with Oracle GoldenGate and ODI: Part 2 (Implementation), so here I will just focus on the highlights. In the final post in the series, I will switch the focus to ODI Change Data Capture (CDC), walking through several options to using CDC in downstream ETL processes.
Oracle GoldenGate and Oracle Data Integrator will be used to load each layer of the Oracle Reference Data Warehouse Architecture. GoldenGate is implemented to capture changes on the source and replicate data to the Staging layer (fully replicated source data) database tables. GoldenGate will also load the Foundation layer (transactional history) directly from the source, as the product is able to capture changes once, and then distribute the change rows to multiple locations in parallel without any additional impact to the source or target systems. The need for ETL mappings to incrementally load Foundation from Staging is eliminated, along with the added maintenance. Finally, ODI Interfaces are built to execute ETL mappings that will load the Access and Performance layer.
The key to the ODI and GoldenGate integration is a Journalizing Knowledge Module in ODI named “JKM Oracle to Oracle Consistent (OGG)”. The JKM, which will be applied to the Staging and Foundation Models, must be slightly modified to allow the capture of historical transactional data in the Foundation schema. The addition of the parameter “INSERTALLRECORDS” to the replicat parameter file loading the Foundation schema will apply every transaction as an insert into the target, regardless of whether it was an insert, update, or delete on the source. The addition of data warehouse audit columns, described below, will allow the tracking of transactional history in each table.
EDW_SCN (System Change Number)
EDW_COMMIT_TIMESTAMP (when the change was committed)
EDW_TRANS_TYPE (DML type for the transaction: insert, update, delete)
EDW_ROW_RANK (Intra-commit ranking for each transaction)
Journalizing within Oracle Data Integrator allows ETL developers to use the change records, stored in the J$ change tables, as the source of the ODI Interface. The JKM will setup GoldenGate processes to replicat the change data from the source to the change tables. When a change occurs in the source, the primary key and changed data are stored in the change table, along with the transaction DML type. Using the change data allows for near real-time processing of ETL mappings when loading downstream data warehouse tables.
To setup the integration between ODI and GoldenGate, begin by creating the ODI Models and Datastores for the Staging and Foundation tables. I will typically reverse engineer the source tables first, then copy them to the Staging and Foundation Models. This approach will ensure the column names and data types remain consistent with the source. I then execute a Groovy script to create the additional data warehouse audit columns in each of the Foundation Datastores. Now that we have the Models set, apply the customized “JKM Oracle to Oracle Consistent (OGG)” to the Staging and Foundation Models and set the options appropriately, as described in the aforementioned “Extreme BI: Capturing Change with GoldenGate…” blog post. From the tree structure under the Model, right-click each Datastore and choose “Add to CDC”. Then, choose “Start Journal” at the Model level. Based on the options set in the JKM, the Start Journal process will: create the ODI CDC framework (“J$” change tables, “JV$” change views, etc.) and generate the GoldenGate parameter files, configuration batch scripts (called Obey files), and instructions on how to complete the setup.

Browse to the temp directory indicated in the JKM options, open the Readme.txt file, and follow the instructions to complete the GoldenGate implementation. One important thing to note is that the Obey scripts generated by the JKM contain start commands for the extract, pump, and replicat processes. My recommendation is to comment out or remove these lines, as an initial load of source data to the target will need to be performed prior to starting replication.

For the initial load, first start the GoldenGate extract and pump processes on the source. Next, choose a batch load method such as Oracle Data Pump and move the source data to the target using the flashback query approach, as of a specific SCN. While GoldenGate does provide an initial load capability that can also be generated via the ODI “JKM Oracle to Oracle Consistent (OGG)”, it is very slow at moving the data from source to target and therefore not often used or recommended. Once the data has been loaded, start the replicat on the target after the SCN captured during the initial load process. This will ensure no transactions are duplicated or skipped.
GGSCI> start replicat ODIT1A aftercsn 123456
GoldenGate has been setup via ODI and is now replicating source changes to the Staging and Foundation schema tables. It will also load the “J$” change tables for use with ODI change data capture. In Part 3, the final post in the series, I will discuss how to meet some real-time ETL challenges using various development methods and Journalizing within Oracle Data Integrator.
GoldenGate and Oracle Data Integrator – A Perfect Match…
1. Introduction
2. Replicate to Staging and Foundation
3. Real-time ETL Challenges
Ubuntu 12.10 + nvidia
Since then, I've had an issue with my Nvidia drivers. Basically, I get video on a single monitor (dual set up) and that single monitor resolution is like 200 x 400 (no, it's not really that, but it is gigantic). Thank goodness for The Google Machine™. That originally led me here on StackOverflow. (Another reason to do things from the command line, you can remember things with history | grep nvidia).
I'm on the 4th time of going through this exercise. Each time the kernel is updated, nvidia breaks. Fortunately for me, that guy on StackOverflow gave me all the information I needed. This time after reboot and the gigantic screen, I removed the nvidia drivers and then reinstalled them. No go. uname -r gave me the following: 3.5.0-26-generic and dpkg -l|grep headers showed an older version of the kernel headers. So I updated those, reinstalled nvidia-current and rebooted. Yay.
Many "small" issues like this recently have me pondering a move back to, gasp, Windows or perhaps even a Mac. The Mac ecosystem scares me because it is expensive...but it's difficult to square when so many of my friends (technical and otherwise) swear by Macs. Something for another day I guess...
Looking forward the BI Forum 2013, Brighton UK
It’s around a month to go now until the Rittman Mead BI Forum runs in Brighton, UK, with the Atlanta event running the week after. Full details on both events are on the Rittman Mead BI Forum 2013 homepage, but I thought it’d be worth taking a look at the Brighton event in more detail, along with the speaker and session line-up.
Our first ever BI Forum ran at the Hilton Metropole in Brighton back in 2009, with around fifty attendees from around the world getting together talk about OBIEE, Essbase, ODI and the Oracle Database. Tony Heljula won the inaugural “Best Speaker” award, Venkat Janakiraman came over all the way from India (and joined us at Rittman Mead shortly afterwards, to become Managing Director for Rittman Mead India), Edward Roske slept through the days and was awake during the night, and we took the decision shortly afterwards to make it an annual event, moving down the road to the Hotel Seattle the following year where we’ve been ever since.

This year is the fifth anniversary of the BI Forum, and whilst we now run the event in Atlanta as well we’ve kept to the same basic principles; keep the attendee numbers restricted, assume a basic level of knowledge with the tools, maximise networking and keep the focus on OBIEE and its related technologies. This year our speaker line-up includes familiar faces and some new speakers from around the world, along with Oracle’s product management team and a guest speaker from outside our immediate industry:
- Uli Bethke, Tony Heljula, Michael Wilcke, Edelweiss Kammermann and Adam Seed from the developer/partner community will be talking about OBIEE performance tuning, ODI development best practices, going beyond the basics with Endeca, and getting started with BI-related technologies such as Oracle Business Activity Monitoring
- Oracle’s Philippe Lions, Adam Bloom, Alan Lee, Mike Durran and Nick Tuson will update everyone on what’s new with OBIEE post-11.1.1.7, including news on the replacement to the BI Administration Tool and the updated 11.1.1.7 version of SampleApp, as well as answering questions around OBIEE multi-tenancy, Exalytics and virtualisation
On the day before the main conference, myself, Stewart Bryson and Michael Rainey will be delivering a masterclass around Oracle’s data integration tools; in this optional one-day session we’ll be taking a closer look at the ODI product roadmap, integration with other tools such as GoldenGate, Hadoop and Enterprise Data Quality, and also sharing ODI development best practices around topics such as ETL high-availability and scripted deployment.

Finally – if all of this sounds a bit heavy going – the secret about the BI Forum is that we try to make it fun as well. On the night before the main event opens we hold a drinks reception followed by an Oracle keynote, and on both nights we host gala dinners with open bars and lots of opportunities to meet everyone at the event. The vast majority of people who come to one BI Forum event then come back every following year (if not, we come and get you anyway, Christian…), and if you’ve not been before, make sure you sign-up soon before all of the places go. Details of the event including links to the registration form are on the BI Forum 2013 homepage, and I’ll be back in a couple of days time to talk about what we’ve got planned for Atlanta this year.
Fun with CHAR
There are a mixture of types too, fixed width, csv, etc. Thankfully, I've read enough of the code now that it's relatively easy to figure out. The fixed width variety is what this is about though.
In much of the code, there's a type that's defined, something like this:
type my_record is recordThat's then used to receive assignments from incoming variables. I'll hardcode my variables for this exercise.
(
column_01 CHAR(10),
column_02 CHAR(10),
column_03 CHAR(10)
);
declareLittered throughout those assignments though, are things like LPAD and RPAD. You're going to say, "well, yeah, if it's a number, you may want it right aligned or something." Fair enough. But I'm not talking about those, I'm talking about this:
type my_record is record
(
column_01 CHAR(10),
column_02 CHAR(10),
column_03 CHAR(10)
);
l_rec my_record;
begin
l_rec.column_01 := '1';
l_rec.column_02 := '3';
l_rec.column_03 := '6';
end;
l_rec.column_01 := rpad( ' ', 10 );Ostensibly, these columns once held data. Instead of forcing the client (application, business, whatever) to change their processing bit, the file was left the same. Makes sense.
l_rec.column_02 := '3';
l_rec.column_03 := RPAD( ' ', 10 );
Then I started to think about it...it's a CHAR. CHAR is already fixed width. To wit:
drop table t purge;I inserted a single space in the first record. It has a length of 10 despite only inserting a single character there.
create table t
(
x CHAR(10)
);
insert into t ( x ) values ( ' ' );
insert into t ( x ) values ( null );
select
rownum,
length( x ),
x
from t;
ROWNUM LENGTH(X) X
---------- ---------- ----------
1 10
2
So what's the purpose of those RPAD( ' ', 10 ) calls? I'm not sure.
The only reason I even began to think about it was that I ran across one type set up with VARCHAR data types. There it makes sense, using RPAD I mean. With the CHAR field, it's a waste of typing IMO. Perhaps it was just for readability...who knows?
ODI 11.1.1.7 Also Available for Download
As well as the 11.1.1.7 release of OBIEE, this week saw the 11.1.1.7 release of Oracle Data Integrator 11g, Oracle’s long-term strategic data integration tool. Unlike the OBIEE 11.1.1.7 release which came with a bunch of new functionality outlined in our post earlier in the week, ODI 11.1.1.7 is really all about getting the product certified for use in the upcoming BI Apps 11.1.1.7.1 release, and most of the new functionality either relates to that certification or delivers on some long-term customer requirements such as support for WebSphere.
This PDF on the Oracle website provides more details on what’s new in this release, but to summarise these include the aforementioned support for IBM WebSphere as an alternative to Oracle WebLogic for hosting JEE agents; improvements to XML parsing and handling (apparently functionality that was brought forward from the 12c release); the ability to run standalone agents using SSL for improved security; and out-of-the-box support for Oracle EDQ as a package step using its own ODI Tool.

If the last time you checked-out ODI 11g’s new features was the initial 11.1.1.3 release, you might also want to check out some of the new features that have come with the subsequent 11.1.1.5 and 11.1.1.6 patchsets, many of which we’ve covered on this blog and in other publications including:
- Load Plans, a more flexible and straightforward way to orchestrate ODI scenarios, including support for exception handling and restartability
- The ability to access “big data” sources using Apache Hadoop, MapReduce and Hive
- Integration with Oracle Enterprise Data Quality, a full set of data quality auditing, case-management and correction tools
- New mapping (interface) features such as set-based operators, and the ability to turn temporary interfaces into SQL subqueries
Rittman Mead have also been working closely with Oracle’s ODI product development team over the last year, providing feedback on the upcoming ODI 12c release and the upcoming BI Apps 11g version that uses ODI as the ETL service. Keep an eye on this blog as these products go onto general release, and hopefully we’ll see some of you at the Oracle Data Integration Masterclass right before the Rittman Mead BI Forum 2013 events in Brighton and Atlanta, just over a month from now.
Upgrading OBIEE to 11.1.1.7
OBIEE 11.1.1.7 was released earlier this week, and brings with it the usual delights and challenges of the Fusion Middleware (FMW) platform. Because OBIEE is part of Fusion Middleware, and because it is made up of OBIEE-specific components (such as the BI Server) and FMW-generic components (such as Web Logic domains), the upgrade documentation isn’t quite so simple as it could be in a standalone product.
There are two options for upgrading to OBIEE 11.1.1.7 from an earlier release of 11g:
- “Out-of-Place”, which in the olden days of software was called “reinstallation”. A brand new copy of OBIEE is installed, and the relevant configuration files (RPD, web cat, and so on) are migrated to the new installation, and then the old one decommissioned.
- “In-Place” – OBIEE is upgraded where it resides, and what was a 11.1.1.6 (or .3 or .5) OBIEE then becomes 11.1.1.7
There are pros and cons to each approach, including:
- In-place upgrade are big bang, where as out-of-place lets you migrate in a phased manner if you want to. Depending on the scale of the changes in the upgrade, or the amount of regression testing you want to do, this may be a factor
- In 11.1.1.3 -> 11.1.1.5 -> 11.1.1.6, the in-place upgrade was complex and infamous for being buggy. From what I’ve seen of two separate in-place 11.1.1.6 -> 11.1.1.7 upgrades it is simple to do and with no problems discovered yet.
- An out of-place upgrade is ‘safer’ because you are just doing a reinstallation. When you do an in-place upgrade you are trusting the tools to not fail and/or damage your current installation. The counter argument to this is that you should always have backups, and in doing an out-of-place upgrade you may yourself make mistakes or have trouble migrating content
Christian Screen wrote about the advantages of an out of place upgrade, but I think some of these could be less relevant now. Given the simplicity of the in-place upgrade, out-of-place strikes me as more complex and risky.
Getting startedTake a full backup of your OBIEE installation. You do take backups, right? If not, now is a very good time to start, upgrade or not. See the backup documentation, here
Next, regardless of upgrade method, you need to download the complete OBIEE 11.1.1.7 installers for your platform, from here
Out-of-place upgradeThe installation process for 11.1.1.7 is identical to previous versions (except you now have the option to install Essbase too).
Once you have installed 11.1.1.7, you migrate your RPD, web catalog, security configuration and component configuration over from your existing installation. You also need to think if you want to keep any of your RCU table content (Usage Tracking, Event Polling, etc), and manually migrate this, watching out for if the RCU table definitions have changed.
In-place upgradeAn in-place upgrade will upgrade your existing 11g (whether 11.1.1.3, 11.1.1.5, or 11.1.1.6) to 11.1.1.7. It is documented here: Moving from 11.1.1.3, 11.1.1.5, or 11.1.1.6 to 11.1.1.7.
I have tested it on Linux and Windows, from both 11.1.1.6.2 BP1 and 11.1.1.6.9, to 11.1.1.7.
Be aware that previous in-place upgrades (11.1.1.3 -> 11.1.1.5 -> 11.1.1.6) had issues, and I have not tested in-place upgrading 11.1.1.3/11.1.1.5 -> 11.1.1.7.
The outline of an in-place upgrade is as follows:
1. Do a Software Only Install of 11.1.1.7, into the existing installation
2. Run the Patch Set Assistant to update the RCU schemas for 11.1.1.7
3. Run the Configuration Assistant to update the BI domain
This can be done using the GUI tools, or scripted using response files.
In-place upgrade using the GUI, step by step- Shutdown the BI stack (WLS Admin Server, Managed Server, Node Manager, OPMN)
- Run 11.1.1.7 installer (
runInstalleron Linux orsetup.exeon Windows)- Select Software Only Install
- Specify the existing FMW_HOME (this should be pre-populated in the OUI installer screen)
- Click Next through the following screens, until the installation begins.
- Select Software Only Install
- Run the Patch Set Assistant (PSA). This is located in
oracle_common/bin/and is calledpsaon Linux orpsa.cmdon Windows- Under Available upgrades select Oracle Business Intelligence
- Before running the PSA you should make sure you’ve got a valid database backup (just as you should for the whole BI stack). You are given checkbox to tick to confirm you’ve taken the backup.
- Enter the details of the database where your existing RCU tables reside, along with a DBA-privileged user.
When specifying DB credentials, note that it is different from the RCU screens – if connecting as SYSDBA you must enterSYSDBA AS DBA(as you would if connecting from sqlplus etc)Click Connect. The PSA then brings back a list of RCU schemas; select the MDS one
- Complete the following screen for the BIPLATFORM schema, and continue through the PSA. Click Upgrade.
- Under Available upgrades select Oracle Business Intelligence
- Start Node Manager
- On Windows, this is a service
- On Linux, go to your FMW home and run
./wlserver_10.3/server/bin/startNodeManager.sh
- On Windows, this is a service
- Start the Web Logic Administration Server (AdminServer)
- On Windows, go to your FMW home folder and then double click on
user_projects\domains\bifoundation_domain\bin\startWebLogic.cmd
- On Linux, go to your FMW home and run
./user_projects/domains/bifoundation_domain/bin/startWebLogic.sh
Wait until the Admin Server is running – you should be able to login to the WLS Console at
http://<yourhost>:7001/console, and the Admin Server command window will showServer started in RUNNING mode
- On Windows, go to your FMW home folder and then double click on
- Run the Configuration Assistant
- The script is in the
Oracle_BI1\binfolder- On Windows, go to your FMW home folder and then double click on
Oracle_BI1\bin\config.bat
- On Linux, go to your FMW home and run
./Oracle_BI1/bin/config.sh
- On Windows, go to your FMW home folder and then double click on
- Select Update BI Domain and then enter the details of your domain.
Note on Windows, if you are using a loopback adaptor (which you should on a DHCP host such as often found on a VM) make sure the hostname that you specify is the one you have in your hosts file for the loopback IP.
- The script is in the
- The Configuration Assistant will [re]start AdminServer, the managed server, and OPMN components.
- Flush your browser cache, and then login to OBIEE and enjoy the new version number and the sloping tabs of Fusion skin
The three utilities that are run to do an in-place upgrade can all be run from the commandline only, using response files to specify the values and actions.
To create a response file, you can run the utility in GUI mode and then choose the Save option on the summary screen. You can also take an existing response file and modify it to suit your target environment.
Passwords in response filesWhen you create a response file from the GUI, passwords are not saved in clear text.
- In the Patch Set Assistant response file passwords are saved encrypted, so if you don’t need to change them you can leave this file unmodified. Alternatively, you can change
encryptedforcleartext:MDS.encryptedSchemaPassword = 0562A61E90282BD7CE8DDA04D618E53960D533DC3903301F63 MDS.encryptedDbaPassword = 05115EE31A473F46DFCB9FB23C99C3A348DD8D3D2C4FBA68BFwould become
MDS.cleartextSchemaPassword = Password01 MDS.cleartextDbaPassword = Password01And do the same for the BIPLATFORM schema and DBA passwords in the file
- For the Configuration Assistant response file you need to replace all instances of
<SECURE VALUE>with the actual password
- Shutdown the BI stack (WLS Admin Server, Managed Server, Node Manager, OPMN)
- Run the Software Only install of 11.1.1.7
cd bishiphome\Disk1\ setup.exe -silent -responsefile c:\response_files\1_swonly.rsp - Run the Patch Set Assistant
cd oracle_common\bin\ psa.bat -response c:\response_files\2_psa.rsp - Start Node Manager
- Start AdminServer
cd user_projects\domains\bifoundation_domain\bin\ startWebLogic.cmdWait until the Admin Server is running – you should be able to login to the WLS Console at
http://<yourhost>:7001/console, and the Admin Server command window will showServer started in RUNNING mode - Run the Configuration Assistant script.
cd Oracle_BI1\bin\ config.bat -waitforcompletion -silent -responseFile c:\response_files\3_config_ass.rspThe
-waitforcompletionis optional but useful on Windows as it stops the cmd window which you launch the script from disappearing
You can download the response files used from here, or generate them yourself using the utilities in GUI mode.
In-place upgrade of a scaled-out BI domainIf you have a scaled-out (horizontal cluster) BI domain, the in-place upgrade is simple for the additional nodes. Since the domain and RCU upgrades are handled by the Configuration Assistant and Patch Set Assistant respectively, you just need to update the binaries, using the Software Only Install
On each additional node:
- Shutdown all WLS, OPMN
- Run 11.1.1.7 installer
- Select Software Only Install
- Specify existing FMW_HOME (should be pre-populated in OUI)
- Takes c.10 minutes
- Start up Managed Server, and OPMN
- That’s it
If you don’t flush your browser cache then you can expect to see the header missing from OBIEE :

Flush your browser cache and then refresh / relogin, and it should display correctly:

Error: INST-08112: The Admin Server is listening on multiple network interfaces and the default listening address 192.168.248.160 cannot be accessed from the scale out host
Diagnostics: Windows machine has loopback (10.10.10.10) plus a NAT NIC (IP 192.168.248.160)
Resolution: In WLS console, set the Listen Address for the Admin Server to 10.10.10.10
Known issues from the release notesThese are currently listed in the release notes and are related to in-place upgrades:
- Configuration Assistant for Update BI Domain Steps Incorrectly Reports an OCM Configuration Error at 91% Complete when Moving from Earlier 11g Releases
- Managed Server for Oracle BI Encounters Issues After Upgrade from Release 11.1.1.3
- SSL Configured in Oracle BI EE 11.1.1.5.x or 11.1.1.6.x Fails After Upgrading to 11.1.1.7.0
OBIEE 11.1.1.7 Now Available for Download
OBIEE 11.1.1.7 became generally available (“G.A.”) over the long Easter weekend, for Microsoft Windows 32/64-bit and all the usual Unix and Linux platforms. We’ll have more on this blog over the next few weeks on the details of some of the most important new features, but at a high-level here’s what’s new:
The OBIEE dashboard and analyses have a new look-and-feel to match the Oracle Fusion Apps, though you can switch-back to the old look if you want to (both are user/developer-selectable themes). There are a number of new visualisation types including performance tiles (as shown in the screenshot below), and 100% stacked bar and waterfall charts.

There’s also a new view suggestion feature, that recommends the best visualisation based on the type of analysis or comparison you’re trying to put together.

There’s also improvements to the Trellis Chart view including the ability to associate action links with individual trellis cells; an ability to freeze headers in table and pivot views, and a number of other “fit and finish” changes to improve overall product quality.
With the installer, Essbase can now be installed alongside the main OBIEE server products, and as we saw in a preview with OBIEE 11.1.1.6.2 BP1, Essbase then appears within the main Oracle BI Domain and can be monitored, and stopped and started, from within Enterprise Manager Fusion Middleware Control.

There’s also a number of improvements and new features to the Administration tool and BI Server, including support for Apache Hadoop sources (via Hive and MapReduce), some incremental improvements to the Model Checker and statistics gathering, and some changes and improvements around MUD (it wouldn’t an OBIEE patch release without some changes to how MUD works). Full details of everything are in the updated product docs, and we’ll cover some of these new features, particularly the Apache Hadoop integration, in some postings in the near future.
Dodeca Dashboard Screenshot

We think it looks great. How about you?
Oracle SQL Developer Data Modeler 3.3 now available

http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html
Attention all data modellers - we are pleased to announce the release of SQL Developer Data Modeler 3.3. This release includes a new search, reports can be generated from search results, extended Excel import and export capabilities and more control and flexibility in generating your DDL. Here are a few links to get you started:
- For downloads see here: http://www.oracle.com/technetwork/developer-tools/datamodeler/downloads/index.html
- For information about new features see here: http://www.oracle.com/technetwork/developer-tools/datamodeler/datamodeler-33-newfeatures-1869156.html
- The release notes are available here: http://www.oracle.com/technetwork/developer-tools/datamodeler/data-modeler-releasenotes33-1869517.html
- The documentation can be found here: http://www.oracle.com/pls/topic/lookup?ctx=E29463_01&id=homepage
For data warehouse data modellers there are some very important new features around logical models, multi-dimensional models and physical models. For example:
- Support for surrogate keys during engineering to relational model which can be set on each entity.
- More flexible transformation to relational model with mixed engineering strategies based on “engineer” flag and subtypes setting for each entity in the hierarchy
- Export to “Oracle AW” now supports Oracle 11g OLAP
- Support for role playing dimensions in export to Oracle AW.
- Level descriptive attributes can be created without mapping to attribute in logical model.
- Multidimensional model can be bound directly to relational model.
- Support EDITIONING option on views, and support for invisible indexes in Oracle 11g physical model.
Lots of great features that will make life a lot easier for data warehouse teams.
On Work/Life Balance
(It’s been said that many people in high tech cannot balance their personal and work lives. Here’s another way to look at it: Their jobs are so interesting that it’s difficult to figure out what is work and what is play. Creative projects produce this kind of excitement.)
I've struggled with the work/balance thing. I'm better now than I was a year ago, but it takes a lot of work. That quote definitely illustrates one aspect of how it could get so out of balance...I thoroughly enjoy what I do.
Analysis Tools...
In that regard, I'm doing a fair amount of analysis right now. So I need help finding two tools:
1. A tool that will allow me to map (visually or otherwise) a single data point from source to target(s). I typically use Excel. It's easy to use and available everywhere. Where it falls apart, slightly, is that a single data point may have one or more middle steps (i.e. not target) and one or more targets. I think I want something like this:
Keep in mind though, I have potentially hundreds of columns in a system with thousands upon thousands of...
A couple of people have suggested using an ETL tool like Informatica, Pentaho or ODI. Yes. But I don't see it yet. Besides, I don't want to map to actually do something...most of the conversion has already been done and I'm picking it up at a particular step (near the beginning). What's missing is that mapping document that I want to create for everything...but that's another story.
2. I want to to look at a view and know where those stupid unaliased columns are sourced from. A very, very basic example:
SELECTVAR_VALUE_01 and VAR_VALUE_02, why don't you have aliases? Why did your developer neglect you so? Why can't every single developer just remember that someone, someday, will have to look at their code? Please? Pretty please? Or did you know it would be me and thus you did it on purpose? If so, I'm not talking to you again.
hs.column_1,
hs.column_2,
add.address_line_1,
var_value_01,
var_value_02
FROM
big_table hs,
address_table add,
other_random_table ran
WHERE hs.address_id = add.address_id
AND hs.random_id = ran.random_id
Anyway, it doesn't take me very long to figure where those columns are sourced from. What if there are 10's of those in a view with 100's of columns? Yes, not enjoyable. What if there are many views just like this that you have to analyze? Yes!
Data Dictionary?!
Not yet. DBA_TAB_COLUMNS? Nope. Come on! It's got to be there somewhere...when you compile a view Oracle checks to make sure everything is a-ok right? Doesn't it store that information somewhere? It must!. I took to Twitter, naturally, and Steve Karam, aka @OracleAlchemist found this possible gem:
@oraclenerd @thatjeffsmith @eaviles94 Check this out Chet. rwijk.blogspot.com/2008/10/dbadep…
— Steve Karam (@OracleAlchemist) March 26, 2013I'm also requesting a feature in SQL Developer...or, trying to anyway. Back channels of course.
I've done this kind of analysis in the past, but it is usually a one off, so there never seemed to be a need to make it repeatable. Now, there is a need. A giant need. If you've got any ideas for me, let me know...
Happy Birthday Applied OLAP
*One of my favorite quotes is from the playwright, George Bernard Shaw:
"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man."
Am I unreasonable? Why yes. Yes I am!
Oracle Endeca Information Discovery 3.0 Now Available
You probably missed it over the weekend, but a couple of days ago I posted an article on our blog about Endeca Information Discovery 2.4, and whether it could be a viable alternative to Oracle BI’s Analysis Editor (aka “Answers”). In that posting I said that, whilst the in-memory search/analytic database that powers Endeca Information Discovery could potentially be a great back-end for OBIEE, the current generation of dashboarding and analysis tools weren’t quite user-friendly enough for Oracle BI “power users”. I did say though that the next, 3.0 release of Endeca Information Discovery might improve on this usability though, and in fact over that same weekend, Endeca Information Discovery 3.0 actually became available for download on Oracle’s Edelivery site. So what’s in the 3.0 release, and have there been any usability improvements in the visualisation components?
Oracle Endeca Information Discovery 3.0 actually has changes and new features in a number of areas. Probably the most significant is in the area of infrastructure; the Endeca Server, Studio Server and Integrator Server all now run on Oracle WebLogic Server as well as Tomcat, with Endeca Server actually requiring WebLogic now to host the java-based server engine. I’m not sure at this point whether Endeca Server will include a limited-use WebLogic license, or whether you’ll need to license it separately, but the Endeca Server goes into WebLogic and can now be clustered over several WebLogic managed server nodes, with one Endeca Server per cluster providing read/write access to the Endeca Server Datastore (now called “Data Domain”), and multiple Endeca Servers providing read-access with traffic passed across using a load-balancer. A cluster can contain multiple Endeca Servers, and multiple data domains, and presumably you’d need an Enterprise Edition license for WebLogic as opposed to Standard Edition if you want to make use of this clustering.
Endeca Information Discovery Studio and Integrator are now formally separate products in terms of packaging, both can still run on Tomcat but both can now also run on WebLogic, though the installation process is quite tricky and you’ll need to be aware of Java minimum versions, libraries that also need to go into WebLogic and so forth. In fact installation is quite a lot harder than with version 2.4, as the Quick Installation option has now gone and every product needs to be separately installed, deployed to the application server and so forth. You’ll need quite a bit more familiarity with WebLogic or Tomcat compared to 2.4 to get this all up and running, but it’s do-able and took me about an hour to get going.
Integrator, visually, looks just the same as before, but there are a number of enhancements and new features, and under-the-cover changes to reflect the move to WebLogic for the Endeca Server. The Quickstart project is now renamed “Getting Started”, is again slightly more involved to install, but once running has the same content, albeit with the new Studio 3.0 look-and-feel. Studio itself can now run in WebLogic (though it’s not mandatory), like Endeca Server can now be clustered, and has an interesting new feature associated with it called the Provisioning Service that allows end-users to upload files (in this initial release, just Excel files) and automatically create an Endeca Server data domain so that it can then be analysed in Studio.

Studio itself has a new look and feel, and there are a bunch of incremental improvements to the data visualisation and analysis component, though nothing game changing. These include:
- The Crosstab component being renamed to Pivot Table component, which has more or less the same functionality as before but now uses views, rather than raw EQL, for its data source
- The concept of Studio applications, rather than Liferay’s communities
- Each components’ Preferences button and dialog is now renamed Configuration, and what was the old Configuration dialog (used for component access permissions) has now gone, as part of a wider overhaul of user groups and organisation
- Studio 3.0 applications are now bound to a single datastore/data domain data source, whereas 2.x ones could bind individual application page components to separate data sources
- Incremental improvements to the search and navigation components, which we’ll go into more detail on in a future post
So to go back to the original question, around whether Endeca Information Discovery 3.0 now provides a more OBIEE-like environment for end-user reporting, particularly “power users” – well no, there’s nothing significantly different in this release, the pivot table, results table, chart and other components are much the same, but at least there’s a bit more consistency now around selecting and configuring their data sources (via “views”, the nearest Endeca has to the concept of subject areas and dimension/attribute/measure metadata).

Views and their administration tasks have now been moved to a new part of Studio, with this functionality removed from the Control Panel and moved to a new Application Settings area, which also configures the application itself, and the maintenance of attribute groups.

Of course – OLAP-style complex ad-hoc analysis isn’t really the main use case for Endeca Information Discovery, which is really aimed first and foremost at data discovery and access to unstructured data, with graphing and analysis to my mind a “bonus”. That said, the ad-hoc analysis capabilities at least with this initial 3.x release haven’t changed all that much compared to the 2.x releases, with the improvements and new features being mostly focused around infrastructure, security and manageability, but with some incremental improvements to the end-user and developer experience.
Using OBIEE against Transactional Schemas Part 5: Complex Facts
I’ve finally gotten around to finishing this series… I believe it’s actually been a year in the making. I’m planning on being more proactive with getting content on the blog. Actually… the main reason I’m putting this one to bed is I have some exciting posts planned in the coming months, and I feel guilty writing those without closing this one off. For the recap, here are the previous entries:
Using OBIEE against Transactional Schemas Part 1: Introduction
Using OBIEE against Transactional Schemas Part 2: Aliases
Using OBIEE against Transactional Schemas Part 3: Simple Dimensions and Facts
Using OBIEE against Transactional Schemas Part 4: Complex Dimensions
In this post, I want to talk about complex facts. The OBIEE metadata layer is a very powerful force indeed, and I can’t demonstrate all the possibilities for shifting a normalized schema into a dimensional one. Instead, I thought I would show one really interesting example… something extremely powerful, in hopes that others reading this series might find inspiration to try something daring themselves.
OLTP developers design their systems with one task in mind: facilitating the functions of the transactional system. Even when standard reporting is delivered as part of an OLTP system, the flexibility of the schema for these purposes is usually an afterthought because the stated goal of these systems is (and should be): delivering transactions with as much efficiency as possible. It should come as no surprise to BI developers when these systems store unrelated activities (at least from a source-system perspective) in completely separate tables. We often want to combine these activities in a coherent structure — usually an activity-based fact table — to be able to GROUP BY the Activity Type in a single report. It’s usually these advanced reporting requirements that cause us to pull out SQL Developer Data Modeler and start designing a data warehouse, which is probably a good idea. But for the reasons mentioned in the introductory blog post in this series, a data warehouse isn’t always in the cards. Desperate for their data, business analysts pull out that old Swiss Army knife — Excel — and start combining data sets and pivoting them, generating the reports that the business needs. We have to understand the need to deliver content to the business quickly, and the business can’t always wait for mounds of ETL code before making very important decisions.
The Business Model and Mapping layer available to us in the OBIEE Semantic layer provides us the ability to present a logical fact table, such as Fact – Customer Activity, as a shell for many types of activities, and even combine those activities in intelligent and performant ways. The example we’ll construct from the Customer Tracking application (which has been the basis for all the previous entries) involves reporting against the activities stored in the EBA_CUST_CUST_ACTIVITY table. As mentioned earlier, this table tracks explicit CRM activities related to particular Customers and Contacts, including meetings, phone calls, etc. We have several activities that we would find useful to combine with these explicit events, such as Customer creation dates, Customer inactivity dates, etc. These implicit activities would look great combined in our Fact – Activity Fact table so we could include the type of activity in our GROUP BY list, and return the results when we drill down to the detail level for a particular customer. We could try to build this integration in the dashboard itself to show the account creation date from Dim – Contact on the same dashboard with dates of CRM activities. But we should all admit that, it’s a better solution to build this functionality in the Business Model if it’s possible. But is this feasible without ETL? Would we have to stitch this together using perhaps presentation variables, or worse: have business analysts dump the results of two separate analyses into an Excel spreadsheet and produce the report they need outside of OBIEE?
So we want to add an implicit event to our Fact – Customer Activity logical table: the creation of a Customer account, which is represented in Customer Tracker with the CREATED_ON column in the EBA_CUST_CUSTOMERS table. We’ll start by adding another logical table source using this source table and provide the same hard-coded value of 1 to the Activity Count measure:
Remember: this is still a factless fact table, and we have to provide a measure to the OBIEE semantic layer which allows aggregate queries. We have a little bit more to do with these logical table sources, but we need to make similar changes to several of our logical dimension tables as well before we complete this task. I’ll preview two logical table sources below (both called Customers), and then explain them further down. The one on the left is a new logical table source for the Dim – Customer Activity table, while the second is for the Dim – Activity Date table:
In the Dim – Customer Activity logical dimension table, we’ll create a new logical table source also based on the EBA_CUST_CUSTOMERS table. As this is a dimension table and requires that we map a value for the primary key, we simply use the physical primary key from the EBA_CUST_CUSTOMERS table. Notice that we have constructed a new logical column called Activity Source Type. This attribute will allow us to differentiate our explicit activities, such as those sourced directly from the EBA_CUST_CUST_ACTIVITY table, from this new implicit activity that we are constructing from the EBA_CUST_CUSTOMERS table. We also provide several hard-coded values to other attributes in this dimension table to compensate for the lack of values for those attributes for our implicit activities.
We also need to provide an additional logical table source for our date dimension Dim – Activity Date. This is where the magic starts to happen. The creation of the Customer account is actually the event that we are interested in reporting on, so it’s this date that ties all the activities together. We’ll map the CREATED_ON date from EBA_CUST_CUSTOMERS to the single Activity Date column that we have defined in the logical table source and let the other calculated measures provide the remaining attributes necessary in the logical dimension table. However, since the CREATED_ON column in the EBA_CUST_CUSTOMERS table is defined as a DATETIME attribute in the physical model (and we want it to remain that way when we view it as a dimensional attribute), we need to modify the expression slightly in the logical table source to remove the time element. As the calculation is not visible in the image above, I’ve listed it here:
Cast(“orcl”.”".”CUST_TRACK”.”EBA_CUST_CUSTOMERS”.”CREATED_ON” AS DATE )
The only remaining dimension table is Dim – Contact, but we don’t need to make any changes here, as the EBA_CUST_CUSTOMERS table is already adequately represented in the logical table source. Because we are bringing a logical table source associated with this logical dimension table into our logical fact table, the BI Server already understands how to construct the join (or more correctly, the lack of a join) with this logical dimension.
Now, we can return to the logical table sources for Fact – Customer Activity to exploit one final piece of sheer magic from the BI Server. For each logical table source, we select the Content tab and make the following changes:
There’s a few really important bits that we are unlocking here. First: we need to check the option next to This source should be combined with other sources at this level. Ordinarily, logical table sources are usually selected by the BI Server using an OR evaluation: one LTS per combination of fact and dimension joins. (There are exceptions to this, but I’m distilling the content down a bit to hopefully make this easier to follow). This setting instead dictates that the LTS’s should be evaluated with an AND instead. We are instructing the BI Server to combine these two logical table sources as if they existed in the same table. This is done using a logical union, which manifests itself as an actual UNION statement in the physical SQL when both sources exist in the same database. We can see this behavior by examining the physical SQL generated by an analysis using Fact – Activity Fact:
The more impressive functionality comes when we make use of the expression specified in Fragmentation content. This logic instructs the BI Server to do a kind of partition pruning that is similar to the behavior of the Oracle Database optimizer when dealing with partitioned tables. What we have constructed here is really a form of logical partitioning, with the source for the Fact – Customer Activity logical fact table existing in two logical physical sources, or partitions. So far, our query wasn’t precise enough to allow the BI Server to prune down to a single logical table source. However, when we choose to filter on the Activity Source Type logical column either directly or by drilling down, which is the same column we defined in our Fragmentation content section, the BI Server removes the UNION statement and generates a query against a single logical table source:
It’s still a best practice to build conformed data warehouses, and transactional reporting should be seen as a means to an end. Regardless, there will always be reasons to do transactional reporting, and the power of the Business Model and Mapping layer provides us with capabilities to deliver analyses and dashboards to replace the Excel spreadsheets that often form the cornerstone of transactional reporting. I hope you’ve enjoyed this series… and perhaps the long delays between each entry kept you on the edge of your seat for each new installment. Perhaps not. Regardless… drop me some comments and let me know what you think.









