
Rittman Mead Consulting

Delivering Oracle Business Intelligence

Combining Spark Streaming and Data Frames for Near-Real Time Log Analysis & Enrichment

Sat, 2015-08-01 08:51

A few months ago I posted an article on the blog around using Apache Spark to analyse activity on our website, using Spark to join the site activity to some reference tables for some one-off analysis. In this article I'll be taking an initial look at Spark Streaming, a component within the overall Spark platform that allows you to ingest and process data in near real-time whilst keeping the same overall code base as your batch-style Spark programs.

NewImage

Like regular batch-based Spark programs, Spark Streaming builds on the concept of RDDs (Resilient Distributed Datasets) and provides an additional high-level abstraction called a “discretized stream” or DStream, representing a continuous stream of RDDs over a defined time period. In the example I’m going to create I’ll use Spark Streaming’s DStream feature to hold in-memory the last 24hrs worth of website activity, and use it to update a “Top Ten Pages” Impala table that’ll get updated once a minute.

NewImage

To create the example I started with the Log Analyzer example in the set of DataBricks Spark Reference Applications, and adapted the Spark Streaming / Spark SQL example to work with our CombinedLogFormat log format, which contains two additional log elements. I'll also join the incoming data stream with some reference data sitting in an Oracle database, and then output a parquet-format file to the HDFS filesystem containing the top ten pages over that period.

The bits of the Log Analyzer reference application that we reused comprise two scripts that compile into a single JAR file: a script that creates a Scala object to parse the incoming CombinedLogFormat log files, and another containing the main program. The log parsing object contains a single function that takes a set of log lines, then returns a Scala class that breaks the log entries down into their individual elements (IP address, endpoint (URL), referrer and so on). Compared to the DataBricks reference application I had to add two extra log file elements to the ApacheAccessLog class (referer and agent), and add some code to deal with "-" values that could be in the log for the content size; I also added some extra code to ensure the URLs (endpoints) quoted in the log matched the format used in the data extracted from our WordPress install, which stores all URLs with a trailing forward-slash ("/").

package com.databricks.apps.logs

// Represents one parsed CombinedLogFormat entry, including the two extra
// elements (referer and agent) added over the DataBricks reference application
case class ApacheAccessLog(ipAddress: String, clientIdentd: String,
                           userId: String, dateTime: String, method: String,
                           endpoint: String, protocol: String,
                           responseCode: Int, contentSize: Long,
                           referer: String, agent: String)

object ApacheAccessLog {
  val PATTERN = """^(\S+) (\S+) (\S+) \[([\w\d:\/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) ([\d\-]+) "([^"]+)" "([^"]+)"""".r

  def parseLogLine(log: String): ApacheAccessLog = {
    val res = PATTERN.findFirstMatchIn(log)
    if (res.isEmpty) {
      // Return an empty record if the line doesn't match the expected format
      ApacheAccessLog("", "", "", "", "", "", "", 0, 0, "", "")
    }
    else {
      val m = res.get
      // A "-" content size in the log means zero bytes were returned
      val contentSizeSafe: Long = if (m.group(9) == "-") 0 else m.group(9).toLong
      // Add a trailing "/" to the endpoint so it matches the WordPress URL format
      val formattedEndpoint: String = if (m.group(6).endsWith("/")) m.group(6) else m.group(6).concat("/")

      ApacheAccessLog(m.group(1), m.group(2), m.group(3), m.group(4),
        m.group(5), formattedEndpoint, m.group(7), m.group(8).toInt, contentSizeSafe,
        m.group(10), m.group(11))
    }
  }
}

The body of the main application script looks like this – I'll go through it step-by-step afterwards:

package com.databricks.apps.logs.chapter1

import com.databricks.apps.logs.ApacheAccessLog
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.streaming.{StreamingContext, Duration}

object LogAnalyzerStreamingSQL {
  val WINDOW_LENGTH = new Duration(86400 * 1000)
  val SLIDE_INTERVAL = new Duration(60 * 1000)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Log Analyzer Streaming in Scala")
    val sc = new SparkContext(sparkConf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    
    val postsDF = sqlContext.load("jdbc", Map(
                  "url" -> "jdbc:oracle:thin:blog_refdata/password@bigdatalite.rittmandev.com:1521:orcl",
                  "dbtable" -> "BLOG_REFDATA.POST_DETAILS"))
                  
    postsDF.registerTempTable("posts")

    val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)

    val logLinesDStream = streamingContext.textFileStream("/user/oracle/rm_logs_incoming")

    val accessLogsDStream = logLinesDStream.map(ApacheAccessLog.parseLogLine).cache()

    val windowDStream = accessLogsDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL)

    windowDStream.foreachRDD(accessLogs => {
      if (accessLogs.count() == 0) {
        println("No logs received in this time interval")
      } else {
        accessLogs.toDF().registerTempTable("accessLogs")

        // Filter out bots 
        val accessLogsFilteredDF = accessLogs
                                      .filter( r => ! r.agent.matches(".*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*"))
                                      .filter( r => ! r.endpoint.matches(".*(wp-content|wp-admin|wp-includes|favicon.ico|xmlrpc.php|wp-comments-post.php).*")).toDF()
                                      .registerTempTable("accessLogsFiltered")
                                      
        val topTenPostsLast24Hour = sqlContext.sql("SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) as total FROM accessLogsFiltered a JOIN posts p ON a.endpoint = p.POST_SLUG GROUP BY p.POST_TITLE, p.POST_AUTHOR ORDER BY total DESC LIMIT 10 ")                 
        
        // Persist top ten table for this window to HDFS as parquet file
        
        topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet", "parquet", SaveMode.Overwrite)      
      }
    })

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

The application code starts by importing Scala classes for Spark, Spark SQL and Spark Streaming, and then defines two variables that determine the amount of log data the application will consider: WINDOW_LENGTH (86,400,000 milliseconds, or 24 hours), which determines the window of log activity that the application will consider, and SLIDE_INTERVAL, set to 60,000 milliseconds or one minute, which determines how often the statistics are recalculated. Using these values means that our Spark Streaming application will recompute, every minute, the top ten most popular pages over the last 24 hours.

package com.databricks.apps.logs.chapter1
import com.databricks.apps.logs.ApacheAccessLog
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.streaming.{StreamingContext, Duration}
object LogAnalyzerStreamingSQL {
 val WINDOW_LENGTH = new Duration(86400 * 1000)
 val SLIDE_INTERVAL = new Duration(60 * 1000)

In our Spark Streaming application, we're also going to load up reference data from our WordPress site, exported and stored in an Oracle database, to add post title and post author values to the raw log entries that come in via Spark Streaming. In the next part of the script we define a new Spark context, and then a Spark SQL context off of that base Spark context, and create a Spark SQL data frame to hold the Oracle-sourced WordPress data that we'll later join to the incoming DStream data. Using Spark's new Data Frame feature and the Oracle JDBC drivers that I downloaded separately from the Oracle website, I can pull in reference data from Oracle or other database sources, or bring it in from a CSV file as I did in the previous Spark example, to supplement my raw incoming log data.

def main(args: Array[String]) {
 val sparkConf = new SparkConf().setAppName("Log Analyzer Streaming in Scala")
 val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
 import sqlContext.implicits._
 
 val postsDF = sqlContext.load("jdbc", Map(
 "url" -> "jdbc:oracle:thin:blog_refdata/password@bigdatalite.rittmandev.com:1521:orcl",
 "dbtable" -> "BLOG_REFDATA.POST_DETAILS"))
 
 postsDF.registerTempTable("posts")

Note also how Spark SQL lets me declare a data frame (or indeed any RDD with an associated schema) as a Spark SQL table, so that I can later run SQL queries against it – I'll come back to this at the end.

Now comes the first part of the Spark Streaming code. I start by defining a new Spark Streaming context off of the same base Spark context that I created the Spark SQL context from, then I use that Spark Streaming context to create a DStream that reads newly-arrived files landed in an HDFS directory – for this example I'll manually copy the log files into an "incoming" HDFS directory, whereas in real life I'd connect Spark Streaming to Flume using FlumeUtils for a more direct connection to activity on the webserver (see the sketch after this code excerpt).

val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)
val logLinesDStream = streamingContext.textFileStream("/user/oracle/rm_logs_incoming")
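
As an aside, if I connected Spark Streaming directly to Flume rather than watching an HDFS directory, the ingest step might look roughly like the sketch below – the Flume Avro sink hostname and port here are assumptions for illustration, and the rest of the pipeline would stay the same.

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils

// Receive Avro events pushed by a Flume agent to port 4545 on this host,
// then turn each event body back into a raw log line
val flumeDStream = FlumeUtils.createStream(streamingContext, "flume-sink-host", 4545, StorageLevel.MEMORY_AND_DISK_SER_2)
val logLinesDStream = flumeDStream.map(event => new String(event.event.getBody.array()))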

Then I call the Scala "map" transformation to convert the incoming DStream into an ApacheAccessLog-formatted DStream, and cache this new DStream in-memory. As the final part of this stage, I call the Spark Streaming "window" function, which packages the input data into (in this case) a 24-hour window of data and creates a new Spark RDD every SLIDE_INTERVAL – in this case, one minute.

val accessLogsDStream = logLinesDStream.map(ApacheAccessLog.parseLogLine).cache()
val windowDStream = accessLogsDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL)

Now that Spark Streaming is creating RDDs for me to represent all the log activity over my 24 hour period, I can use the .foreachRDD control structure to turn that RDD into its own data frame (using the schema I’ve inherited from the ApacheAccessLog Scala class earlier on), and filter out bot activity and references to internal WordPress pages so that I’m left with actual page accesses to then calculate the top ten list from.

windowDStream.foreachRDD(accessLogs => {
 if (accessLogs.count() == 0) {
 println("No logs received in this time interval")
 } else {
 accessLogs.toDF().registerTempTable("accessLogs")
// Filter out bots 
 val accessLogsFilteredDF = accessLogs
 .filter( r => ! r.agent.matches(".*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*"))
 .filter( r => ! r.endpoint.matches(".*(wp-content|wp-admin|wp-includes|favicon.ico|xmlrpc.php|wp-comments-post.php).*")).toDF()
 .registerTempTable("accessLogsFiltered")

Then, I use Spark SQL's ability to join the tables created against the windowed log data and the Oracle reference data I brought in earlier, to create a parquet-formatted file containing the top-ten most popular pages over the past 24 hours. Parquet is the default storage format used by Spark SQL and is best suited to BI-style columnar queries, but I could use Avro, CSV or another file format if I brought the correct library imports in (see the sketch after this code excerpt).

val topTenPostsLast24Hour = sqlContext.sql("SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) as total FROM accessLogsFiltered a JOIN posts p ON a.endpoint = p.POST_SLUG GROUP BY p.POST_TITLE, p.POST_AUTHOR ORDER BY total DESC LIMIT 10 ")
 
 // Persist top ten table for this window to HDFS as parquet file
 
 topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet", "parquet", SaveMode.Overwrite) 
 }
 })
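
As an aside, the same DataFrame could be written out as CSV or Avro instead of Parquet by bringing in the spark-csv or spark-avro libraries; a rough sketch is below, and the data source names are the ones those packages used at the time rather than anything from this example.

// CSV output via the spark-csv external data source (library must be on the classpath)
topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.csv", "com.databricks.spark.csv", SaveMode.Overwrite)
// Avro output via the spark-avro external data source
topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.avro", "com.databricks.spark.avro", SaveMode.Overwrite)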

Finally, the last piece of the code starts off the data ingestion process, which then continues until it is interrupted or stopped.

streamingContext.start()
    streamingContext.awaitTermination()
  }
}

I can now go over to Hue and move some log files into the HDFS directory that the Spark Streaming application is monitoring, like this:

file_upload

Then, based on the SLIDE_INTERVAL I defined in the main Spark application earlier on (60 seconds, in my case), the Spark Streaming application picks up the new files and processes them, outputting the results as a Parquet file back on the HDFS filesystem (these two screenshots should display as animated GIFs).

spark_processing

So what to do with the top-ten pages parquet file that the Spark Streaming application creates? The most obvious thing to do would be to create an Impala table over it, using the schema metadata embedded into the parquet file, like this:

CREATE EXTERNAL TABLE rm_logs_24hr_top_ten
LIKE PARQUET '/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet/part-r-00001.parquet'
STORED AS PARQUET
LOCATION '/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet';

Then I can query the table using Hue again, or I can import the Impala table metadata into OBIEE and analyse it using Answers and Dashboards.
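
With the table created, a simple query along the lines below (column names assumed to match the DataFrame written out by the Spark job) returns the current top-ten snapshot:

SELECT post_title, post_author, total
FROM   rm_logs_24hr_top_ten
ORDER  BY total DESC;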

NewImage

So that's a very basic example of Spark Streaming, and I'll be building on this example over the next few weeks to add features such as persistent storage of all processed data, and classification and clustering of the data using Spark MLlib. More importantly, copying files into HDFS for ingestion into Spark Streaming adds quite a lot of latency and it'd be better to connect Spark directly to the webserver using Flume, or even better, Kafka – I'll add examples showing these features in the next few posts in this series.

Categories: BI & Warehousing

Oracle Data Integrator to load Oracle BICS and Oracle Storage Cloud Service

Mon, 2015-07-27 08:54

It is well known that Oracle is focusing its sales on the cloud this year, and in the BI world we have seen the introduction of Oracle Business Intelligence Cloud Service – BICS – which we have already covered on the blog. Another great product, used to store files and unstructured data, is Oracle Storage Cloud Service, which can also be used as a staging area before loading a BICS schema. To provision our BI system in the cloud or to back up our files, it's important to find a reliable and maintainable way to move on-premise data to the cloud. And when we speak about a reliable and maintainable way to move data, I can't help but think about Oracle Data Integrator. Luckily, Ayush Ganeriwal wrote two excellent blog posts explaining how to use ODI to load these two cloud services! And he even gave away the Knowledge Modules, ODI Open Tools, JAR files and other things he developed to make it work.

Load data to Oracle Storage Cloud Service with a package.

Oracle Storage Cloud Service is used to store files and unstructured data in the cloud, much like Amazon S3. A company that has moved to the cloud needs some space to store the files that will be used by its other cloud services, and Oracle Storage Cloud Service is the place to store them. If you want to use it as a cloud backup of your local filesystem, you might even choose Oracle Storage Cloud Archive Service, which has cheaper storage but costs more when retrieving or editing data, similarly to Amazon Glacier.

Oracle Storage Cloud Service offers a RESTful web service API to load data, as well as a Java library that directly wraps this API and supports client-side encryption. When it comes to loading data, what could be better than using the tool we already use for everything else? In this blog post, Ayush details how he created an ODI Open Tool that can load data to Oracle Storage Cloud Service using the Java library.
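
To give a rough idea of what the underlying REST API involves, here's a hedged curl sketch; the identity domain, user, container and file names are illustrative assumptions rather than values from Ayush's post.

# Authenticate (Swift-style) - the response carries an X-Auth-Token header for later calls
curl -i -X GET \
  -H "X-Storage-User: Storage-myIdentityDomain:cloud.user@example.com" \
  -H "X-Storage-Pass: myPassword" \
  https://myIdentityDomain.storage.oraclecloud.com/auth/v1.0

# Upload a file into a container using the returned token
curl -i -X PUT \
  -H "X-Auth-Token: AUTH_tk1234567890abcdef" \
  -T ./staging_extract.csv \
  https://myIdentityDomain.storage.oraclecloud.com/v1/Storage-myIdentityDomain/staging/staging_extract.csv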

We first need to download the Java library and place it in our ODI Studio or agent classpaths. Below are the default locations for these classpaths, but we can also specify additional classpath entries if we want. Don't forget to close ODI Studio and shut down the agent(s) before adding the files.

ODI Studio on Unix : ~/.odi/oracledi/userlib
ODI Studio on Windows : %APPDATA%\odi\oracledi\userlib
12c Standalone Agent : <ODI_AGENT_HOME>/odi/agent/lib
11g Standalone Agent : <ODI_AGENT_HOME>/oracledi/agent/drivers/

For a JEE agent, we need to create a server template as described in the documentation: https://docs.oracle.com/middleware/1212/odi/ODIDG/setup_topology.htm#CHDHGIGI.

In the same folder, we can also drop the JAR file for the Open Tool that Ayush created and posted on java.net (the last link in the folder at the time of writing). Remember that all the drivers, libraries and Open Tools have to be present in every ODI Studio or agent install.

We can now reopen ODI Studio, as we need to register the new Open Tool in the master repository so everyone can use it. In the ODI menu, click on Add Remove Open Tools and then on the search icon to look for the Open Tools. In our case, the class is under the package oracle.odi.rest.

Adding an ODI Open Tool

Select the OracleStorageCloudGet tool and click the Plus icon to add it. There we can see the description of the tool, the command, the parameters and the icon used. Let’s add the OracleStorageCloudPut as well and click OK.

Open Tool description

That's it! We can now see two new ODI Open Tools under the Plugins category in the Package Toolbox. We can add them to a package and fill in all the parameters, or copy/paste the command to use it in an ODI Procedure.

Open Tools in a package

Load data to Oracle Business Intelligence Cloud Service with a mapping.

BICS offers a Data Loader, which is actually an APEX application taking Excel, CSV, text or ZIP files as input. But this is more of a manual process, used when working with sample data or really small datasets. If we plan to use BICS in a production environment, we need something more resilient that can be scheduled and that handles errors. We could use Data Sync, but maybe you don't want to introduce a new tool. Luckily, BICS also provides a RESTful web service API we can use to load data. Again, ODI is the perfect tool to create our own reusable components to integrate with new sources or targets. And once again, Ayush did all the work for us, as described in this article.

The first step is similar to what we did previously: we need to make all the drivers and libraries available to ODI Studio and the agent. The steps in the two Knowledge Modules contain Jython code, calling some Java methods from the odi-bics.jar file. These methods – just like the Oracle Storage Cloud Service Java library – use the Jersey libraries to make the RESTful web service calls, so we will need those as well. Basically, just take all the JAR files from the archive you can find on java.net under the name "RKM and IKM for Oracle BI Cloud Service". As mentioned above, we need to add them to the classpath of the ODI Studio and agent installs and restart them.

The article mentions that we need to create a new Technology for BI Cloud Service, but an XML export is already provided in the archive. Go to the Topology, right-click on Technology and import the XML file.

Adding a Technology

We can also import the two Knowledge Modules, either globally or under a single project. Ayush wrote one RKM to reverse-engineer the tables from our BICS schema and one multi-connection IKM to integrate the data from the on-premise database into the BICS schema in the cloud. I had to tweak the IKM a little bit, but it might be linked to my particular setup. The steps were not displayed in the order set in the XML file and I had to reorder them properly to place the drop before the create and truncate. Probably a bug – I will investigate that later.

Reordering the steps in the KM

We are done with the install of new components and we can now set up the topology. First create a Data Server with the newly imported technology, providing a login/password and the URL of the BICS instance as the Service URL. From my testing, and unlike Ayush's screenshot, it should start with https and should not include the port number. We then need to create a physical schema, providing our Identity Domain in both Tenant fields; the Database fields can be left empty. The last step in the Topology is to create a Logical Schema and associate it with the Physical Schema through a context.

topo_DS

We can then switch to the Designer tab and create a new Model with the BI Cloud Service technology and that Logical Schema. On the Reverse Engineer tab we select Customized and the RKM BI Cloud Service. We can also specify a mask to restrict the metadata import to specific table names before hitting the Reverse Engineer button at the top. I set the mask to DIM_LOC% and only my DIM_LOCATION table has been reverse-engineered from the BICS schema.

Model and reverse engineering

Finally, we can create a mapping. In this example I load two target tables: DIM_LOCATIONS (plural) sits in an on-premise Oracle Database for users accessing OBIEE from our HQ, while DIM_LOCATION (singular) is in the schema linked to our BICS instance, used by remote users. There is nothing special here, except that I unselected the insert and update checkboxes for my surrogate key (LOCATION_SK) because I asked BICS to populate it automatically when I created the table – under the covers, a sequence is created and a trigger added to the table to populate the field with the next value of the sequence on each insert.

ODI-BICS-mapping

On the Physical tab, the LKM should be set to LKM SQL Multi-Connect because we will delegate the data transfer to the multi-connection IKM. This IKM is the one we imported: IKM SQL to BI Cloud Service. I chose to enable the truncate option here.

mapping_2DS

Let's hit the execute button and watch the result in the Operator and in our BICS schema.

bics_result_DS

The two articles demonstrate one more time how easy it is to plug a new technology into ODI. This tool is a wonderful framework that can be used to move data from and to almost any technology, even Big Data or Cloud sources as we have seen here. By having every integration job in the same place, maintenance and monitoring are easier and we can schedule all the jobs together. It's also far easier to see the big picture of our projects.

If you have any questions about this post, feel free to reach out – we love these kinds of interesting challenges at Rittman Mead.

Thanks a lot to Ayush for the great components and libraries he provided. I'm glad I will share the stage with him and a fellow ODI expert, Holger Friedrich, at Oracle OpenWorld, where we will speak about Best Practices for Development Lifecycle Management. Come see us and say hi!

Categories: BI & Warehousing

Taking a Closer Look at Knowledge Modules in ODI12c – Component-Style and Multi-Connect KMs

Sun, 2015-07-26 10:19

Another question that came up from the ODI12c Bootcamp Course I'm delivering for a client in London at the moment is how to choose between the different knowledge modules that come with ODI12c. With the choice now between template-style KMs and the new component-style KMs, the new option of multi-connect KMs, and the general question of which KM to pick within a KM type when building a table mapping, I thought it'd be interesting to take a closer look at how knowledge modules work in ODI12c and how you go about making the right choice of KM when creating a mapping.

As a quick primer, Oracle Data Integrator up until the recent 12c release had six types of knowledge module you could use in a data mapping:

NewImage

They were:

  • Load Knowledge Modules, for loading source data out of the source database server and into a staging table typically on the target database platform
  • Integrate Knowledge Modules, for taking that staging data and integrating (inserting, updating, merging etc) it into the target table
  • Reverse Knowledge Modules, for reverse-engineering the table metadata from a source system
  • Check Knowledge Modules, for performing data quality checks on source and target tables
  • Journalise Knowledge Modules, for setting up change data capture on a source table or table set
  • Service Knowledge Modules, for exposing tables or other datastores as CRUD-type web services

Using ODI11g as an example, when you created a new mapping you selected an LKM for extracting data out of your source database, an IKM for integrating the results into the target table, and optionally a CKM or JKM if you needed to run data quality checks or use table journalization (CDC). In all cases you had to first import the knowledge module definitions into the ODI11g Work Repository and your project before you could use them. To take an example, an ODI11g mapping where the source was a file and the target, an Oracle database, might look like this as a Mapping diagram:

NewImage

Looking at the Flow diagram, at the start there are no knowledge modules to select from as none have yet been imported:

NewImage

I therefore import a selection of IKM, LKM and other KMs for the technologies I’m using, and then I’m able to assign an LKM and IKM to my flow diagram.

NewImage

For both Load Knowledge Modules (LKMs) and Integration Knowledge Modules (IKMs), you have a number of options ranging from generic JDBC/SQL-type modules that connect to a source and then transfer the data using JDBC batch routines, through to highly-specialized ones that leverage particular platform technologies. I typically start the prototyping phase of my ODI projects by selecting the simplest, most generic LKM and IKM I can find, and once I've got the mapping logic correct then shift to one that uses more of the underlying database's features – the docs also have a nice guide for making your KM selection. For example, I might assign the LKM SQL to Oracle knowledge module to the source table and the IKM SQL Control Append one to the target, like this:

NewImage

In this case, ODI will first create a Java routine that extracts the rows of data from the file via a JDBC connection and then loads that data into a staging table on the target database server. Then, that staging data will be integrated into the target table using a regular SQL INSERT statement.

Of course I could do this more efficiently using an Oracle External Table. Let’s select the LKM File to Oracle (EXTERNAL TABLE) load knowledge module and also change the IKM to IKM Oracle Incremental Update; note that this would only work if the target database server could see the file we’re loading in via this mechanism – if the file had to stay on a remote server then I’d have to stick with the IKM SQL Control Append and ODI would effectively ignore the request to use an Oracle External Table.

NewImage

With ODI12c, things are a bit different due to the introduction of another type of knowledge module: “Component-Style” Knowledge Modules. Component-style KMs were introduced for a few reasons with ODI12c; they made migration of mappings and projects from OWB easier as OWB mappings are made up of lots of arbitrarily-arranged mapping operator components that can be combined into all-types of data flow, and they also made it possible for Oracle to create lots more of these granular mapping components and use them across all technology types. For example as Oracle’s David Allen talks about in this comment on one of our previous blog posts and in this follow-up blog of his own, Oracle could create a generic Table Function component-style KM and have it apply a SQL table function for Oracle sources, or run a Pig relation through an arbitrary Pig Latin script as I did in a more recent blog post.

These new component-style KMs come built-in to ODI12c which means that you don’t actually have to import any template-style KMs in to get started with a mapping; in the example below from my blog post earlier today on ODI12c and Oracle Streams, I can run the mapping I’ve just created by just selecting from the built-in component-style KMs that ship with ODI12c out-of-the-box.

NewImage19

So should we now use component-style KMs when creating mappings and avoid the old template-style ones? The docs don't say this explicitly, but my impression is that component-style KMs are the way Oracle wants to take things forward. In most cases ODI will automatically select suitable component-style LKMs and IKMs when you create the physical mapping, and this is usually the best option; the only time I switch to a template-style IKM or LKM is if I'm using a platform technology that component-style KMs don't yet cover, or I'm working on some edge case – by default though I go with component-style LKMs and IKMs. Of course you still need to import JKMs and other KM types, and presumably Oracle will extend component KMs beyond IKMs and LKMs over time, but that's my recommendation for now.

So now we know what component-style KMs are, there's another new KM concept that came along with ODI12c – "Multi-Connect" KMs. Multi-Connect KMs are a special type of template-style KM that allows the staging area to be on a separate data server to the target data warehouse, whereas most template-style KMs assume the staging and target data schemas are on the same data server. I used a multi-connect IKM File-Hive to Oracle knowledge module towards the end of another article where I used Oracle Loader for Hadoop to export data out of Hadoop and into an Oracle Database; normally when you use an IKM the staging and target areas are on the same database, but in this case the staging table for the mapping was on the Hadoop (Hive) side and I therefore had to select LKM SQL Multi-Connection as the load knowledge module, whereupon I could then select IKM File-Hive to Oracle (OLH/OSCH) as the integration knowledge module.

NewImage

Similarly, with our ODI12c mapping if I moved the staging area to the source database, or more commonly a separate “ETL-hub”-style database where the customer wants a more ETL (compared to ELT)-style integration setup, I can request that the staging location for the mapping moves to this hub database using a new feature in the logical mapping editor:

NewImage

Then in the mapping I can add some transformations that take place on the ETL hub database, something you'd most probably do to avoid licensing ODI on the full data warehouse database server.

NewImage

Then when I switch to the Physical mapping view I can see this additional execution unit with the transformations in it. I first select the LKM SQL Multi-Connect to bring the file data in, then I can select an LKM and IKM combination that supports multi-connect but uses an Oracle-specific technology (in this case, dblinks) to move the data from the ETL hub to the target database.

NewImage

Or, if I didn't like using dblinks or the ETL hub was on a non-Oracle platform, I could use the component-style LKM SQL Multi-Connect at the access point and then a mono-source component-style IKM to do the data integration – note however that whenever we use generic JDBC connections and batch extraction to bring data across to the target platform, the data flows through the agent, which won't be as efficient as using the multi-connect template-style KM that transfers data using SQL*Net and dblinks.

NewImage

So – a few thoughts on the new knowledge module setup in ODI12c, and what these new component KMs and multi-connect KMs are actually for. You can find more articles on ODI12c and data integration over on our blog, if you’re interested in reading more about ODI12c development and internals.

Categories: BI & Warehousing

Using Streams with ODI12c for Oracle-to-Oracle Change Data Capture

Sun, 2015-07-26 05:10

Although Oracle GoldenGate replaced Oracle Streams a couple of years ago as the recommended data replication and change data capture technology for Oracle databases, many customers on Oracle Database 11gR2 or earlier still use Streams for Oracle-to-Oracle change data capture, as it works and, compared to GoldenGate, doesn't require any additional licensing. Oracle's GoldenGate Statement of Direction paper from 2014 states that Streams in Oracle 11gR2 will continue to be supported but that no future versions of the Oracle Database will include Streams, so if you're on 11gR2 and you just want to trickle-feed capture between two Oracle databases it's an interesting option.

I covered Oracle-to-Oracle data replication using Streams a few times in the past including this OTN article on ODI and Change Data Capture back from before 2007 or so, this article on OWB11gR2 and Change Data Capture from 2010 and once from back in 2006 that went into the details of setting up asynchronous hotlog change data capture with the new “Paris” OWB10gR2 release. We’re now on the 12c release of Oracle Data Integrator and I’m teaching our ODI12c Bootcamp course to a client next week who’s particularly interested in using Streams with ODI12c, so I thought it’d be worth taking a look at this feature in more detail to see if much has changed since the earlier articles.

Let's start then with an ODI12c 12.1.3 install and a regular mapping set up to copy and join the DEPT and EMP tables from one Oracle database into a denormalized table in another Oracle database. Both are Oracle Database 11gR2 (11.2.0.3) and the initial mapping looks like this:

NewImage

One thing that many ODI developers don't know about the 12c release is that it comes with a set of "component-style" knowledge modules built into the tool, which you can use straightaway to get a mapping running without having to select and import IKMs, LKMs and other KMs from the ODI Studio filesystem. In my case the Physical mapping looks like the screenshot below, with two execution units (one for each Oracle database server) and a number of built-in component-style KMs available for selection. I choose the LKM SQL to SQL (Built-in) load knowledge module, which uses a generic JDBC connection to load source records into the staging table on the target server, and then the IKM Oracle Insert integration knowledge module to take that staging data and integrate it into the target table.

NewImage

I then run this mapping and see that ODI extracted data from the source database using a Java routine and batch transfers into the Oracle staging table, and then integrated the contents of that staging table into the target Oracle table. I could of course improve this by using the LKM Oracle to Oracle (DBLink) knowledge module and thereby avoid loading in two steps, but what I’d instead like to do is use Oracle Streams to trickle-feed new and changed data from my source tables over to the target database server, as shown in the diagram below. 

 

Asynch

In the OWB and Asynchronous Change Data Capture article I linked to earlier in the post, setting up change data capture involved quite a few steps; the database had to be put into archivelog mode, the GLOBAL_NAMES parameter had to be set and a whole bunch of PL/SQL procedures had to be called to set up the source-to-target connection. Once it's running, Streams takes transactions off the redo log files on the source database and sends them across the network to the target database server, in a similar way to how GoldenGate sends transactions in the trail file across to target database servers – except it's Oracle-to-Oracle only and in my experience is a lot more fragile than GoldenGate, which is why we and most other customers switched to GoldenGate when it came out.

ODI12c comes with a number of change data capture or "journalizing" knowledge modules that use either database triggers, UPDATE_DATE fields, Oracle Streams or GoldenGate to replicate data from the source system to the target data warehouse. The journalizing knowledge module we'll use, JKM Oracle 11g Consistent (Streams), is a template-style KM that needs to be imported first from the filesystem where ODI Studio was installed, as shown in the screenshot below – note also when you do this yourself that there's a big "deprecated" notice next to it, saying that it could be removed at any time (presumably in favour of the GoldenGate-based ones).

NewImage

ODI and the JKM Oracle 11g Consistent (Streams) KM take a more "f*ck it, let's just do it" approach to database configuration than my OWB10gR2 version did, automatically configuring the source database for Streams by running all of the setup PL/SQL routines, leaving you just to put the database in archivelog mode if it's not already and to grant the connecting user (in my case, SYSTEM) Streams administrator privileges. Moving over to SQL*Plus on the source database I therefore run the setup commands listed in the KM notes like this:

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 26 06:34:44 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> grant dba to system;

Grant succeeded.

SQL> begin
  2  dbms_streams_auth.grant_admin_privilege(
  3       grantee          => 'system',
  4       grant_privileges => true);
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2471931904 bytes
Fixed Size                  2230872 bytes
Variable Size             570426792 bytes
Database Buffers         1895825408 bytes
Redo Buffers                3448832 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database open;

Database altered.

Next I'll go back to ODI Studio and enable the source model for journalising by double-clicking on the model and then selecting the JKM from the Journalizing tab, like this:

NewImage

I then right-click on the EMP and DEPT tables within the source model and select Changed Data Capture > Add to CDC, where I can fine-tune the replication order so that new departments (DEPTNO) that employees link to will always have been created before the employee record arrives.

NewImage

It’s at this next stage, when I enable journalizing, that Streams is set-up on the source and target database servers and all the supporting tables and views are created. To enable journalizing I click on the model, not the individual tables, and select Changed Data Capture > Start Journal, like this:

NewImage

If you've read any of our previous posts on ODI and changed data capture you'll realise that this setup process is the same regardless of the underlying replication technology, which makes it easy to start with database-centric CDC technologies such as this and then move to GoldenGate later on without lots of rework or re-training. For now though, let's run this setup process using the local agent and then check the Operator navigator to see what it did (and whether it worked…)

NewImage

And it did work. Enabling journalising with the Oracle Streams JKM involves quite a few setup steps, including checking that all the database settings and parameters are configured correctly, then running the various DBMS_STREAMS and other packages to set up the capture and transmission process. Then, as with all of the ODI JKMs, a set of J$ tables are created to hold the primary keys of new and changed records coming from the source system, along with JV$ views that join those primary keys to the full incoming replicated rows – this blog post by the Oracle ODI team explains the background to JKMs very well if you want to understand them in more detail. Looking at the source SCOTT schema in SQL*Developer I can see the CDC and J$/JV$ tables and views created in the schema; if I didn't want these objects created in the actual data schema I could have specified a different schema as the WORK schema when I created the database connection in ODI Studio prior to this exercise.

NewImage

Next I have to define one or more "subscribers" to the journals; with these more advanced "consistent set" JKMs, of which the Oracle Streams one is an example, you can define multiple consumers or "subscribers" to the changed data so that one can be further down the queue than the other (simple JKMs only allow a single subscriber). I call my subscriber "SUNOPSIS" as this is the default subscriber name ODI adds to the mappings downstream.

NewImage

Pressing OK after adding the subscriber again brings up the prompt to select an agent, and going over to the Operator navigator I can see that another set of steps have run, again doing some streams setup but also adding details of the journal subscribers to the tables created on the source database.

NewImage

I can check that journalising is now working by using the View Data… feature in the Designer navigator to insert new rows into the EMP and DEPT tables, and then checking the J$ tables in the source database schema to see if the rows’ primary keys have been added – which they have been.
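
An equivalent check from SQL*Plus might look like this – a sketch only, assuming the standard SCOTT demo tables and ODI's default J$ table naming; since the exact journal column list varies by JKM, I just count the rows here:

-- Insert a test department and employee on the source
INSERT INTO scott.dept (deptno, dname, loc) VALUES (50, 'TESTING', 'LONDON');
INSERT INTO scott.emp (empno, ename, job, deptno) VALUES (9999, 'RITTMAN', 'ANALYST', 50);
COMMIT;

-- Once the capture has caught up, the new rows' primary keys show up in the journal tables
SELECT COUNT(*) FROM scott.j$dept;
SELECT COUNT(*) FROM scott.j$emp;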

NewImage

To now read just the journalised new and changed data into my mapping, all I do is go to the Physical mapping diagram, select the first source table and check the Use Journalized Data Only checkbox, then do the same for the other table (note that it's the table source you select, not the access point for that table in the target execution block).

NewImage

So now I’ll run the mapping and check the results in the Operator navigator … but instead, I get an error:

NewImage

This is because, given the way that ODI handles CDC and journalising, it can't allow two journalised tables to be directly joined in a mapping – we think this is because ODI can't guarantee both tables are in the same update "state", and it therefore makes you copy their data into staging or intermediary tables before you can do the join. I therefore amend the mapping to load the journalised tables into staging tables on the target database server, amend the joins and filter to reference the staging tables, and then join their contents to filter and load into the target reporting table, like this:

NewImage

Within the Physical mapping details, the two incoming Oracle staging tables are loaded by the same LKM SQL to SQL component-style KM we used before to extract data from the two journalised tables, and the Journalized Data Only flag is still set for the source tables, as you can see below.

NewImage

What this also highlights is a key difference between the way I trickle-fed transactions across from my source database in the OWB and Changed Data Capture article back at the start of this post, and the way ODI's JKMs do it; in the OWB example I set up an actual trickle-feed process outside of OWB which transferred changed data across the network to my target data warehouse, whereupon I then read those change tables and used them to update the target DW tables in my data warehouse.

And this is actually how the GoldenGate KMs work with ODI – a GoldenGate replication process copies new and changed data from the source database to the target and ODI then reads from these change tables, whereas the Streams (and other non-GoldenGate) JKMs create the change capture tables back on the source database (the various J$ and JV$ tables I reviewed using SQL*Developer earlier on), with ODI then reading from those remote change tables and bringing across the new and changed data to the target database server. This I guess makes things easier to set up – you don't have to worry about configuring the target database for Streams replication – but it does mean that you still incur the network traffic every time you micro-batch the changes across the network, rather than spreading that traffic transaction-by-transaction.

Anyway, back to ODI and this time, when I run the mapping it works, though looking at the Operator navigator again I can see that no new data came across, and the journal data is still waiting to be consumed afterwards. Why is this?

NewImage

If you've only used the simple CDC JKMs from Oracle before, this is normally all you need to do; but with consistent set ones such as this one, or the GoldenGate JKMs, you need to lock the subscriber view of the journalised data and extend the CDC window before you can access the journal records. For the simple JKMs the IKM (Integration Knowledge Module) takes care of the unlock, extend, purge and lock operations for you automatically in the background, whereas with consistent set ones you typically do this as part of a wider ODI package as shown in the screenshot below, with the first and last tasks created by dragging and dropping the journalized model onto the package canvas and selecting Journalizing Model as the type (the subscriber name, "SUNOPSIS" in this case, is typed in below those settings and is off-screen in the screenshot).

NewImage

Now when I run this package, as opposed to the mapping on its own, I get the row of new data I was expecting and the journal table is now empty, as I was the only subscriber.

NewImage

Finally, if I was looking for real-time continuous loading into this target table, I could wrap the package in an event-detection loop that waits (in this case) ten seconds for three journal rows to be written, then either processes the three as soon as the third arrives or loops around again every ten seconds (obviously in reality you'd want to put in a mechanism to halt the loop if needed, but in my case I'll just kill the job from the Operator navigator when I want it to stop).

NewImage

So that's the basics of using ODI with Oracle Streams for Oracle-to-Oracle changed data capture; if you're just copying data between two Oracle databases and you're on 11gR2 or earlier this might be an option, but long-term you'll need to think about GoldenGate as Oracle aren't developing Streams beyond the 11gR2 release. Note also that all the Streams activity happens over on the source database server, so you still need the additional step to copy the journalised data across to the target data warehouse, but it's still a fairly non-invasive way to capture changes on the source Oracle database and it does have the considerable advantage (compared to GoldenGate) of being free to use.

Categories: BI & Warehousing

Security patches released for OBIEE 11.1.1.7/11.1.1.9, and ODI DQ 11.1.1.3

Wed, 2015-07-15 00:14

Oracle issued their quarterly Critical Patch Update yesterday, and with it notice of several security issues of note:

  • The most serious for OBIEE (CVE-2013-2186) rates 7.5 (out of 10) on the CVSS scale, affecting the OBIEE Security Platform on both 11.1.1.7 and 11.1.1.9. The access vector is by the network, there’s no authentication required, and it can partially affect confidentiality, integrity, and availability.
    • The patch for users of OBIEE 11.1.1.7 is to install the latest patchset, 11.1.1.7.150714 (3GB, released – by no coincidence I’m sure – just yesterday too).
    • For OBIEE 11.1.1.9 there is a small patch (64Kb), number 21235195.
  • There's also an issue affecting BI Mobile on the iPad prior to 11.1.1.7, with a partial impact on integrity.
  • For users of ODI DQ 11.1.1.3 there’s a whole slew of issues, fixed in CPU patch 21418574.
  • Exalytics users who are on ILOM versions earlier than 3.2.6 are also affected by two issues (one of which is rated 10/10 on the CVSS scale).

The CPU document also notes that it is the final patch date for 10.1.3.4.2. If you are still on 10g, now really is the time to upgrade!

Full details of the issues can be found in the Critical Patch Update document, and information about patches on My Oracle Support, Doc ID 2005667.1.

Categories: BI & Warehousing

Options for Enabling ODI11g+12c Standalone Agents for High-Availability (or … Why JEE Agents are the Best Option)

Sat, 2015-07-11 08:31

A few years ago we posted some articles on the blog around Oracle Data Integrator ETL restartability and resilience, and making ODI's agents highly available. The context for those posts was a large data integration project we were doing for a client who was deploying Oracle Fusion Middleware across the entire enterprise, and they wanted us to use the new ability within ODI11g to deploy agents within WebLogic Server Managed Servers to provide true high-availability for their ETL platform; ODI11g in this topology uses Oracle Coherence to provide a middleware-based data grid that holds a shared record of the agent schedule, so that another agent can pick up and resume tasks in the schedule if the main agent falls over mid-way through execution. The diagram below shows this topology, which was then brought forward into ODI12c and can be used now through a feature called "JEE Agents".

NewImage

And this is our standard recommendation to clients who want to make their ODI agents as highly available as possible, so that there's always an active agent ready to receive execution or schedule requests, and schedules keep running even if the agent that's running them falls down mid-way through execution. But the JEE Agent approach has some significant prerequisites:

  • You need to deploy them within Oracle WebLogic Server Enterprise Edition, with at least two machines in a cluster within the WebLogic domain, which can be expensive for clients without the required WebLogic Server licenses or who haven’t got suitable hardware available
  • You also need Oracle Coherence for the shared record of ODI schedules that each JEE agent can get access to, and the client may not have a license for Coherence in their current WebLogic/Fusion Middleware license deal
  • The customer might also, understandably, want to properly understand what they can do with just Standalone Agents before they shell out for WebLogic Server Enterprise Edition, Coherence and a bunch of new servers

For anyone new to Oracle Data Integrator, agents are Java processes that either sit within WebLogic Server clusters or directly on source and target servers, and take the instructions within your ODI mappings, packages and load plans and execute them as SQL instructions sent to databases, web service calls or whatever other interactions your ETL processes require. Agents can either receive instructions ad-hoc from the ODI Studio application or via command-line or web service calls, or they can be assigned one or more schedules that they then execute at the appropriate time to move data into your data warehouse or between various applications. ODI11g has two types of agents: JEE agents, as mentioned earlier, and what are called Standalone Agents – agents that run in a Java JVM and are started from the command line, that don't require WebLogic Server and are mainly used because of their small footprint. You can run them directly on the database server that'll be performing the Load and Transform part of an ELT job, for example, but they don't have the HA capabilities of agents running in WebLogic Server and they don't have access to all the "enterprise" Java features provided to agents running within WebLogic Server.
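
For reference, starting an ODI11g Standalone Agent from the command line looks roughly like this – a sketch only, with the install path, agent name and port shown as example values rather than anything mandated:

cd /home/oracle/Middleware/Oracle_ODI1/oracledi/agent/bin
./agent.sh "-NAME=ODI11_Standalone" "-PORT=20910"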

Borkur and I worked with just such a customer recently and it was a useful opportunity to dig into exactly how Standalone Agents handle failover, and how close you can get to JEE agents if you'd like some degree of high availability for your ODI setup. As this was a new implementation of ODI for the customer and they were currently using OBIEE11g, their first thought was to deploy ODI11g until such time as they could upgrade their entire Fusion Middleware stack to 12c; as we were only talking about Standalone Agents though, my advice was that there was no harm in deploying ODI12c instead of the 11g version as it didn't really involve any WebLogic integration, and they could then benefit from the improved developer features in ODI12c such as flow-based mappings and deployment specifications.

High-Availability for ODI11g Standalone Agents – OPMN and “Cross-Wiring” to Create a Single Physical/Logical Standalone Agent

So let's start with Standalone Agents within an ODI11g deployment. In this instance, the customer had two datacenters and wanted to put an agent in each to provide some degree of HA and disaster recovery, with one agent at any one time being the "primary" agent that received job execution requests and was assigned the scheduled daily load. We were also able to deploy OPMN (Oracle Process Manager and Notification Server), installed as part of the Oracle Web Tier Utilities download and the recommended way to provide some degree of high availability for ODI11g Standalone Agents.

What we thought would be interesting here though was to try and follow John Goodwin's instructions on cross-wiring two Standalone Agent + OPMN installs to provide active/passive failover across two server nodes. In John's case he's talking about agents running Hyperion data loading jobs, but it's just as applicable to our situation, so we went with the topology below to try out active/passive failover for our ODI11g Standalone Agents across two nodes.

NewImage

To start with then, each Standalone Agent is now registered with its local OPMN instance so that opmnctl can be used to stop, start and restart it. In our setup, the primary ODI node (odi11g-node1.rittmandev.com) currently has the active Standalone Agent instance running, as I can see by using the opmnctl command-line utility:

[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   21891 | Alive   
ohs1                             | OHS                |     N/A | Down

If I switch over to the second node (odi11g-node2.rittmandev.com), I'm currently leaving that agent disabled, as it's the "passive" part of my active/passive arrangement.

[oracle@odi11g-node2 ~]$ cd /home/oracle/Middleware/Oracle_WT1/instances/instance1/bin
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down

The clever part now, as detailed in John Goodwin's article, is registering each OPMN server with the other server in the active/passive setup, so that if one goes down, OPMN brings up the other one. This method doesn't seem to be documented in the ODI11g documentation, and later on I'll explain why I think this is, but to "cross-wire" the OPMN servers in this way you have to make a few manual changes to each OPMN server's opmn.xml configuration file, in my case located at /home/oracle/Middleware/Oracle_WT1/instances/instance1/config/OPMN/opmn/opmn.xml:

NewImage

1. First you create a new <topology></topology> section within the file that lists out the hosts that have OPMN installed that you want to include in the failover cluster (use the remote 6701 port rather than the local 6700 port for this)

2. Next you add the service-failover="1" and service-weight settings to the <process-type></process-type> section for the agent you want to fail over, as sketched below
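
A rough sketch of the two opmn.xml fragments is shown below – the hostnames are the ones used in this example, and the exact set of attributes on the process-type element may differ in your install; only service-failover and service-weight are added to what's already there.

<!-- (1) topology section listing both OPMN servers, using the remote 6701 port -->
<topology>
  <nodes list="odi11g-node1.rittmandev.com:6701,odi11g-node2.rittmandev.com:6701"/>
</topology>

<!-- (2) failover settings added to the agent's existing process-type element;
     the node with the higher service-weight runs the active instance -->
<process-type id="odiagent" module-id="odiagent" service-failover="1" service-weight="101">
  ...
</process-type>

With both opmn.xml files updated and each OPMN instance restarted to pick up the changes, I can test the failover by repeatedly killing the agent process on the primary node; once OPMN gives up restarting it locally, the component is marked as Down on that node: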

[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   22064 | Alive   
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node1 bin]$ kill -9 22064
[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   22147 | Init    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node1 bin]$ kill -9 22147
[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down

and then moving over to the secondary node’s OPMN instance, you can see that its agent has now changed from disabled to running:

[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10263 | Alive   
ohs1                             | OHS                |     N/A | Down

… which is pretty cool. But there are a couple of drawbacks with this approach. The first is that whilst OPMN can move the active agent between two or more server nodes, it doesn’t update the agent’s hostname in the ODI Master Repository. This means you either need to hack around with the ODI Master Repository tables to update its record of the hostname each time the agent fails over, or you do what I did and create a CNAME or VIP entry in the DNS server’s records so that the agent is always reached by a virtual hostname, in our case odi11agent.rittmandev.com.
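
Purely as an illustration – how you actually maintain this will depend on who manages your DNS, and the record below is a hypothetical BIND-style zone entry rather than anything taken from our own setup – the CNAME simply aliases the virtual agent hostname to whichever node is currently active:

; rittmandev.com zone – virtual agent hostname pointing at the currently-active node
odi11agent    IN    CNAME    odi11g-node1.rittmandev.com.

; after a failover, repoint the alias at the secondary node and reload the zone
odi11agent    IN    CNAME    odi11g-node2.rittmandev.com.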

NewImage

The problem, though, is that you need a means of updating this CNAME or VIP record when OPMN fails over the agent to the secondary node, otherwise any further requests to this agent through ODI Studio will be directed to the failed node, not the one that’s now running the active Standalone agent. In practice you could set up something that pings both agents and updates the CNAME record in your DNS zone when the primary agent moves, but this involves working with your organization’s network or sysadmin team, which can sometimes be a challenge.

The other issue is what happens when an ad-hoc or scheduled job is running when the active agent goes down. Let’s check this now by creating a Load Plan and then running it through what’s currently the active agent:

NewImage

I’ll now kill the process, and let OPMN restart it on that server node:

[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10263 | Alive   
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10263
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10452 | Alive   
ohs1                             | OHS                |     N/A | Down

Notice how the pid (process ID) has now changed for the agent, which indicates that OPMN has restarted it on that node. Going back to ODI Studio, after a while I can see that the restarted agent has marked the load plan execution as having failed:

NewImage

If I restart the load plan execution though, a new load plan execution instance is created which automatically skips the steps that ran successfully before:

NewImage

If I kill the agent’s process three times in a row though, like this:

[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10452 | Alive   
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10452
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10557 | Init    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10557
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10602 | Init    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10602
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down

the other agent will become the active agent, but it may take some time before it syncs up correctly with the Master Repository, because the CNAME created to point to the active node might take a while to change or be updated. In my case, the load plan continues to display as “running” for a couple of minutes more, until it’s marked as failed.

NewImage

What is good to see, though, is that the agent that OPMN has failed-over to still has the schedule assigned to it; as far as the Master Repository is concerned it’s still the same logical/physical agent, even though OPMN has moved it to a new node – this is because we access it via the special odi11agent.rittmandev.com virtual host name.

NewImage

So using OPMN and this active/passive failover-setup has the advantage of making sure there’s always an agent ready to receive ad-hoc job requests or run jobs on the schedule, but it requires you to maintain a CNAME or Virtual IP for the agent and restart failed mappings to make it work. So how does this all look with ODI12c?

High-Availability for ODI12c Standalone Agents – Node Manager and an Active/Active Setup

ODI12c has the same Standalone Agents and JEE Agents that ODI11g does, but introduces a third type of agent, the Co-Located Agent, which confuses things a bit more. Moreover, ODI12c Standalone Agents are managed by Node Manager rather than OPMN, which has been removed from the overall Fusion Middleware 12c architecture; Node Manager in this case runs within a lightweight pseudo-domain created especially for the ODI12c Standalone Agent, while the new Co-Located Agents run within a full WebLogic domain but outside of a WebLogic Server Managed Server.

So what this means in practice is that each node’s Standalone Agent in an ODI12c install is managed by Node Manager rather than OPMN, but as a Node Manager outside of a full WebLogic domain can’t link up with another Node Manager for an active/passive high-availability setup, each Standalone Agent has to be active all the time (or as much as possible) in what is now an active/active HA setup.

NewImage

Within ODI Studio and the Master Repository topology, then, each agent is registered separately and it’s the choice of the operator which agent jobs are sent to, and which agent runs the daily ETL schedule. You could of course set these agents up in a load-balancing cluster with a parent or master agent accepting all the job execution requests, but then you’d need to make that master agent highly available – where would it go? – and you’re back to the same issue again.

NewImage

Each agent in this instance can either be started via the agent.sh utility, or via Node Manager like this (edited for brevity):

[oracle@odi12c-node1 bin]$ ./startComponent.sh ODI12_Primary_Standalone
Starting system Component ODI12_Primary_Standalone ...
 
Initializing WebLogic Scripting Tool (WLST) ...
 
...
Welcome to WebLogic Server Administration Scripting Shell
 
Type help() for help on available commands
 
Reading domain from /home/oracle/odi12c/user_projects/domains/base_domain
 
 
Please enter Node Manager password:
Connecting to Node Manager ...
Successfully Connected to Node Manager.
Starting server ODI12_Primary_Standalone ...
Successfully started server ODI12_Primary_Standalone ...
Successfully disconnected from Node Manager.
 
 
Exiting WebLogic Scripting Tool.
 
Done

By default, as with ODI11g Standalone Agents and OPMN, Node Manager won’t automatically restart an agent under its control if it falls over. If, however, you set the CrashRecoveryEnabled setting to “true” in the /home/oracle/odi12c/user_projects/domains/base_domain/nodemanager/nodemanager.properties file as shown in the screenshot below, ODI12c Standalone Agents that were started by Node Manager will get auto-restarted if the process fails.

NewImage
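
For reference, the only change needed in that file was the single property below (shown in isolation here – the rest of the generated nodemanager.properties stays at its defaults), followed by a restart of Node Manager for it to take effect:

# /home/oracle/odi12c/user_projects/domains/base_domain/nodemanager/nodemanager.properties
CrashRecoveryEnabled=true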

As you can see in the Terminal session screenshot below, Node Manager picks up the fact that the ODI12c Standalone Agent process it started is no longer running (by checking running processes against the lock files the system component created on startup), then automatically restarts it so that it’s running again at the end.

NewImage

As with ODI11g’s Standalone Agents and OPMN, load plans that are running when an ODI12c Standalone Agent fails then fail themselves, as shown in the screenshot below:

NewImage

You can in fact restart failed load plans with either active agent – in this case I’m restarting it with the secondary Standalone Agent, simulating a complete failure of the first node rather than a process failure that Node Manager could automatically deal with.

NewImage

Where things do get complicated with this active/active setup is when the agent you’ve assigned the schedule to fails. In this case, unlike ODI11g where there was just a single logical agent, the second agent is completely independent of the first, and will need to have the schedule assigned to it if the primary agent fails; the primary agent, once its server node is running again, will need to have the schedule taken away from it, otherwise you’ll have two agents acting as the scheduler, each running its own copy of the daily data load. One way around this would be to have the scheduling done by a third-party scheduler, which would either access the two active/active agents through a load balancer and round-robin approach, or try to use one agent where possible, failing over to the other one.

Conclusions – And why JEE Agents, WebLogic and Coherence Aren’t Such Overkill, After All…

Once you’ve been through all of this and looked at the limitations of a Standalone Agent-only setup, you can sort of see why Oracle went down the WebLogic and Coherence route for their fully HA-capable agent topology; Node Manager can work across multiple servers when you run it within a WebLogic Server domain, and using Coherence to store a local (middleware-level) copy of the schedule means that all of the agents can keep track of what’s been executed and where they are in the schedule. Coupled with a load balancer and a Virtual IP that provides a consistent address for all of the JEE agents in the cluster, it’s a bit more engineering, but you can see why it was needed. In the meantime, the customer went with 12c in the end and put manual processes in place to deal with Standalone Agent failure in this active/active setup, which should be enough to tide them over until the next budget round, when they can put the additional hardware and the JEE Agent setup into place.

Categories: BI & Warehousing

Essbase Studio checkModelInSyncExternal Error

Wed, 2015-07-08 02:12

This week I am back at one of my favourite clients to assist with some issues since the latest upgrade to 11.1.1.9. These guys are seriously technical and know their stuff, so when issues pop up I’m always ready for an interesting and challenging day in the office.

As with all the recent 11g versions, they had to decide between in-place and out-of-place upgrades, and here they opted for an in-place upgrade because they have a fairly settled configuration using Answers, Publisher and Essbase. They were trying to avoid the need to reconfigure components like:

  • SQL Group Providers
  • Customisations
  • Application Roles used in OBIEE and Essbase Filters

Plus, when reconfiguring the above you also run the risk of missing something and it could take a long time to track down where it went wrong.

The Problem

In the end this was not a very complicated problem or solution but seeing as we couldn’t find anything on the web or Oracle Support regarding this error, I thought it might be useful to share in case others run into this same issue.

After performing the upgrade from OBIEE 11.1.1.7.140114 to 11.1.1.9 the client was unable to view or edit the Essbase Model properties in Essbase Studio. In fact, they couldn’t even see their Essbase Model at all. Only the Cube Schema was visible.

cube_schema_only1

When we tried to select or edit the Schema the following message appeared:

Error in the external procedure 'checkModelInSyncExternal'. Line = 136.

checkinsyncerror1

Oracle Support Came To The Party

After trying several different options to fix it, none of which made any difference, the client raised a P1 query with Oracle Support. After going through the initial standard questions and a few messages between the client and Oracle Support, they came back with a solution. All of this within 24 hours…

The Reason

After applying the patch the catalog version is not synchronised with the new version of Essbase Studio Server.

The Solution

Even though we couldn’t find any reference to this in the post-patching section of the documentation, the Oracle Essbase Studio 11.1.2.4.000 Readme does have a section describing the problem and the solution.

To fix the problem we followed these simple steps:

  1. Navigate to
    <ORACLE_BI_HOME>\products\Essbase\EssbaseStudio\Server\scripts.template\
  2. Copy startCommandLineClient.bat.template to
    <ORACLE_BI_HOME>\products\Essbase\EssbaseStudio\Server\
  3. Rename the new startCommandLineClient.bat.template to startCommandLineClient.bat
  4. Edit startCommandLineClient.bat so it looks like this:
    NOTE – Update the paths according to your environment AND use full paths
    @echo off
    
    REM Point at the local JRE, then launch the Essbase Studio command-line client
    set JAVA_HOME=C:\oracle\Oracle_BI1\jdk\jre
    "%JAVA_HOME%\bin\java" -Xmx128m %JAVA_OPTIONS% -jar "C:\oracle\Oracle_BI1\products\Essbase\EssbaseStudio\Server\client.jar"
    
    if %errorlevel% == 0 goto finish
    
    echo .
    echo Press any key to finish the client session
    pause > nul
    
    :finish
  5. Open a CMD window and start startCommandLineClient.bat
  6. Enter the requested information (Host, Username and Password)
  7. Enter the following command and wait for it to complete
    reinit
  8. Type Exit to leave the script and close the CMD window

You should now be able to open the Model Properties in Essbase Studio without any issues.

cube_schema_and_essb_props1

Categories: BI & Warehousing

How Engaged Are Your OBIEE Users?

Thu, 2015-06-25 10:58

Following on from Jon’s blog post “User Engagement: Why does it matter?”, I would like to take this one step further by talking about measurement. At Rittman Mead we believe that if you can’t measure it, you can’t improve it. So how do you measure user engagement?

Metrics

User engagement for OBIEE is like that of most web-based products or services:

  • both have users who access the product or service and then take actions.
  • users of both use it repeatedly if they get value from those actions.

A lot of thought has gone into measuring the customer experience and engagement for web based products and services. Borrowing some of these concepts will help us understand how to measure user engagement for BI solutions.

We look at three metrics:

  • Frequency of use
  • Recency of use
  • Reach of the system

Usage Tracking Data

OBIEE offers visibility of what its users are doing through its Usage Tracking feature; we can use this to drive our metrics.

Figure 1

UT UE LDM

As we can see from Figure 1, the usage tracking data can support our three metrics.

Frequency of use
  • Number of times a user or group of users visit in a specific period (Day / Month / Year)
  • Number of times a dashboard / report is accessed in a specific period.
  • How are these measures changing over time?

Recency of use

  • How recently was a report / dashboard used by relevant user groups?
  • What is the average number of days between uses of each report / dashboard by relevant user group?
  • Number of dashboards / reports used or not used in a specific period (Day / Month / Year)
  • Number of users that have used or not used OBIEE in a specific period (Day / Month / Year)
  • How are these changing over time?

Reach of the system

  • Overall number of users that have used or not used OBIEE. This can be further broken down by user groups.
  • How is it changing over time?

User engagement KPI perspective

We have compared BI solutions to web-based products and services earlier in this post. Let’s look at some popular KPIs that many web-based products use to measure engagement and how they can be used to measure OBIEE engagement.

  • Stickiness: Generally defined as the amount of time spent at a site over a given period.
  • Daily Active Users (DAU): Number of unique users active in a day.
  • Monthly Active Users (MAU): Number of unique users active in a month.

DAU and MAU are also used as a ratio (DAU / MAU) to give an approximation of utility.
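
As a sketch of how these numbers can be pulled straight out of Usage Tracking (assuming the standard S_NQ_ACCT table and its USER_NAME and START_TS columns – adjust the schema and table names for your own install), a daily active user count and a rough DAU/MAU ratio might look something like this:

-- Daily active users, one row per day
SELECT TRUNC(start_ts)           AS activity_date,
       COUNT(DISTINCT user_name) AS daily_active_users
FROM   s_nq_acct
GROUP  BY TRUNC(start_ts)
ORDER  BY activity_date;

-- Yesterday's DAU divided by the trailing 30-day MAU, as a single ratio
SELECT (SELECT COUNT(DISTINCT user_name)
        FROM   s_nq_acct
        WHERE  start_ts >= TRUNC(SYSDATE) - 1
        AND    start_ts <  TRUNC(SYSDATE)) /
       NULLIF((SELECT COUNT(DISTINCT user_name)
               FROM   s_nq_acct
               WHERE  start_ts >= TRUNC(SYSDATE) - 30), 0) AS dau_mau_ratio
FROM   dual;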

The R&D division of Rittman Mead has developed the Rittman Mead User Engagement Toolkit, a set of tools and reports to capture and visualise user engagement metrics. The example charts given below have been developed using the R programming language.

Figure 2 – DAU over time with a trailing 30-day average (Red line)

 MAU trailing 30 day average V0.3

Figure 3 – Forecast DAU/MAU for 30 days after the data was generated

MAU

What Can You Do With These Insights?

Recall that Jon’s blog post points out the following drivers of user engagement:

  • User interface and user experience
  • Quality, relevance, and confidence in data
  • Performance
  • Ability to use the system
  • Accessibility – is the system available in the right way, at the right time?

There are several actions you can take to influence the drivers as a result of monitoring the aforementioned metrics.

  • Identify users or groups that are not using the system as much as they used to. Understand their concerns and address the user engagement drivers that are causing this.
  • Verify usage of any significant enhancement to the BI solution over time.
  • Analyse one of the key drivers, performance, from usage data.
  • Determine peak usage to project future hardware needs.

Conclusion

User engagement is key to users getting value from their OBIEE systems. Measuring user engagement on an ongoing basis is important, and it can be monitored with some standard metrics and KPIs.

Future blog posts in this series will address some of the key drivers behind user engagement in addition to providing an overview of the Rittman Mead User Engagement Toolkit.

If you are interested in hearing more about User Engagement please sign up to our mailing list below.


Other posts in this series
Categories: BI & Warehousing

Rittman Mead at ODTUG KScope’15, Hollywood Florida

Fri, 2015-06-19 05:50

NewImage

ODTUG KScope’15 is running in Hollywood, Florida next week and Rittman Mead are running a number of sessions during the week on OBIEE, Essbase, ODI and Big Data. I’ve personally been attending ODTUG KScope (or “Kaleidoscope”, as it used to be known) for many years now and it’s the best developer-centric conference we go to, coupled with amazing venues and a great community atmosphere.

Sessions we’re running over the week include:

  • Gianni Ceresa : 2-in-1: RPD Magic and Hyperion Planning “Adapter”
  • Jerome : Manage Your Oracle Data Integrator Development Lifecycle
  • Michael Rainey : Practical Tips for Oracle Business Intelligence Applications 11g Implementations
  • Michael Rainey : GoldenGate and Oracle Data Integrator: A Perfect Match
  • Mark Rittman : Bringing Oracle Big Data SQL to OBIEE and ODI
  • Mark Rittman : End-to-End Hadoop Development Using OBIEE, ODI, and Oracle Big Data
  • Mark Rittman : Thursday Deep Dive – Business Intelligence: Bringing Oracle Tools to Big Data
  • Andy Rocha & Pete Tamisin : OBIEE Can Help You Achieve Your GOOOOOOOOOALS!

We’ll also be taking part in various “Lunch and Learn” sessions, community and ACE/ACE Director events, and you can also talk to us about our new OBIEE “User Engagement” initiative and how you can get involved as an early adopter. Details and agenda for KScope’15 can be found on the event website, and if you’re coming we’ll look forward to seeing you in sunny Hollywood, Florida!

Categories: BI & Warehousing

Replicating Hive Data Into Oracle BI Cloud Service for Visual Analyzer using BICS Data Sync

Thu, 2015-06-18 15:26

In yesterday’s post on using Oracle Big Data Discovery with Oracle Visual Analyzer in Oracle BI Cloud Service, I mentioned mid-way through the article that I had to copy the Hadoop data into BI Cloud Service so that Visual Analyzer could use it; at present Oracle Visual Analyzer is only available as part of Oracle BI Cloud Service (BICS), so at some point the data prepared by Big Data Discovery has to be moved into BICS for Visual Analyzer (VA) to access it. In the future, once Visual Analyzer is available on-premise as part of Oracle Business Intelligence 12c, we’ll be able to connect the on-premise RPD directly to Hadoop via the Cloudera Impala ODBC driver, but for now, to get early access to VA features, we have to copy the data up to BICS and report on it from there. So how does this work?

With this second release of BICS there are actually a number of ways to get on-premise data up into BICS’s accompanying database service:

  • As before, you can export data as CSV or an Excel Spreadsheet, and upload it manually into BICS using the Data Load feature (the approach I took in this recent Oracle Magazine article)
  • You can use SQL*Developer to SFTP “carts” of Oracle database data up into BICS, where it’ll then be unpacked and used to create Oracle tables in the accompanying database
  • You can now also connect BICS to the full Oracle Database-as-a-Service, a full database rather than a single schema that also provides a SQL*Net connection that ETL tools can connect to, for example ODI or Informatica
  • And there’s now a new utility called “Data Sync” that we’ll use in this example, to replicate tables or files up into BICS’s database store with options for incremental refresh, drop-and-reload and so forth

In our case the situation is a bit more complicated in that our data sits in a Hadoop cluster, as Hive tables that we access through the Cloudera Impala MPP engine. OBIEE 11.1.1.9 can actually connect directly to Impala, and if we were just using Answers and Dashboards we wouldn’t have any more work to do; but as we’re using VA through BICS, and BICS can’t access on-premise data sources, we need some way of copying the data up into BICS so VA can access it. Again, there are many ways you can get data out of Hive on Hadoop and into databases and files, but the approach I took is this:

  1. First export each of the Hive tables I accessed through the on-premise RPD into CSV files, in my case using the Hue web-based user interface in CDH5
  2. Then use Data Sync to upload the contents of those CSV files to BICS’s database store, selecting the correct Oracle datatypes for each of the columns
  3. Do any modeling on those tables to add any sequences or keys that I’m going to need when working with BICS’s more simplistic RPD modeller
  4. Then create a replica (or as close to a replica as possible) RPD model in BICS to support the work I’m going to want to do with VA
Again, there are also other ways to do this – another option is to just lift-and-shift the current 11.1.1.9 RPD up into BICS, replicate the Hive/CSV data into Oracle Database-as-a-Service and then repoint the uploaded RPD to that service – but I haven’t got a DBaaS instance to hand, and I think it’d be more useful to replicate using BICS and recreate the RPD manually, as that’s what most customers using BICS will end up doing. So the first step is to export the Hive data out into CSV files using Hue, by first running a SELECT * FROM … for each table, then using the menu option to export the query results to a CSV file on my workstation.
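
If you’d rather script this export than click through Hue, something along the lines of the sketch below does the same job (run on a node where the Hive CLI is available; the table and file names are just placeholders, and it assumes the data itself contains no embedded tabs, commas or newlines):

# Hive CLI output is tab-delimited by default, so convert the tabs to commas
hive -e "SELECT * FROM apachelog_parsed" | sed 's/\t/,/g' > apachelog_parsed.csv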

NewImage

Then it’s a case of setting up BICS Data Sync to first connect to my BICS cloud instance, and then selecting one-by-one the CSV files that I’ll be uploading into BICS via this tool.

NewImage

Of course anyone who’s been around Oracle BI for a while will recognise Data Sync as being built on the DAC, the ETL orchestration tool that came with the 7.9.x releases of BI Apps and worked in conjunction with Informatica PowerCenter to load data into the BI Apps data warehouse. The DAC is actually a pretty flexible tool (disclaimer – I know the development PMs at Redwood Shores and think they’re a pretty cool bunch of people), and more recently it gained the ability to replicate BI Apps DW data into TimesTen for use with Exalytics, so its pluggable architecture and active development team made it a useful platform for delivering something in between BICS’s APEX data uploader and fully-fledged ODI loading into Oracle DBaaS. The downside of using something built on the DAC is that the DAC had some UI “quirks”, but equally the upside is that if you know the DAC, you can pretty much pick up Data Sync and guess how it works.

As part of uploading each CSV file, I also get to sample the file contents and confirm the datatype choices that Data Sync has suggested; these can of course be amended, and if I’m bringing in data from Oracle, for example, I wouldn’t need to go through such an involved process. 

NewImage

Then it’s a case of uploading the data. In my case one of the tables uploaded OK first time, but an issue I hit was where Hive tables had numeric columns containing NULLs that got exported as the text “NULL” and then caused the load to fail when trying to insert them into numeric columns. Again, a bit of knowledge of how the DAC worked came in useful as I went through the log files and then restarted parts of the load – in the end I replaced the word NULL with an empty string and the loads then succeeded. 
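
If you hit the same issue, a quick way to blank out those NULL markers across the exported files is something like the rough sketch below – it treats NULL as a whole comma-delimited field and doesn’t understand quoted fields, so check a copy of your data first (file names here are placeholders):

# Turn whole-field NULLs into empty strings, writing a cleaned copy of each file
for f in *.csv; do
  awk 'BEGIN{FS=OFS=","} {for (i=1;i<=NF;i++) if ($i=="NULL") $i=""; print}' "$f" > "${f%.csv}_clean.csv"
done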

NewImage

Now the data should be uploaded to BICS, and you can check out the new tables and their contents either from within BICS’s Data Modeller function, or from within the APEX console that comes with BICS’s database part.

NewImage

One thing I did know I’d have to give some thought to was how to do the types of RPD modelling I’d done in the on-premise RPD, within the more constrained environment of the BICS data modeller. Looking back at the on-premise RPD I’ve made a lot of use of aliases to create fact and dimension versions of key log tables (posts, tweets) and multiple versions of the date dimensions, whereas in BICS you don’t get aliases but you can create database views. What was more worrying was that I’d used columns from the main webserver log table to populate both the main logical fact table and another dimension whilst still keeping a single source table as their physical source, but in BICS I’d have to create these two sources as views and then join them on a common key, which would be tricky as the log table in Hive didn’t have an obvious primary key. In the end I “cheated” a bit and created a derived copy of the incoming log file table with a sequence number added to it, so that I could then join both the derived fact table and dimension table on this synthetic unique key column.
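
A minimal sketch of that workaround in the BICS database schema (the table and column names below are purely illustrative, not the ones from my model) is just a CTAS that stamps each uploaded log row with a ROWNUM-based key, plus a pair of views playing the fact and dimension roles:

-- Derived copy of the uploaded log table with a synthetic unique key
CREATE TABLE access_log_keyed AS
SELECT ROWNUM AS log_key, l.*
FROM   access_log_upload l;

-- Fact-role and dimension-role views that the BICS modeller can then join on LOG_KEY
CREATE OR REPLACE VIEW v_log_fact AS
  SELECT log_key, page_url, bytes_sent FROM access_log_keyed;

CREATE OR REPLACE VIEW v_log_visitor_dim AS
  SELECT log_key, ip_address, visitor_country, visitor_city FROM access_log_keyed;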

NewImage

Now it’s a case of modelling out the various incoming tables uploaded via Data Sync into the facts and dimensions that the BICS data model will use; again something to be aware of is that each of these tables will need to join to its relevant dimensions or facts, so you need to leave the joining keys in the fact table rather than remove them as you’d do when creating logical fact tables in on-premise OBIEE.

NewImage

Tables that only perform one role – for example the IMP_RM_POSTS_VA table that contains details of all blog posts and web pages on our site – can be brought into the model as they are, without creating views. When I add the time dimension table in for a second time though, this time to create a time dimension role table for the Tweets fact table, I have to create a view over the table that performs a similar role to an alias table in on-premise OBIEE, and I’m then directed to create a fact or dimension object in the model from that view.

NewImage

Once this is all done, I end up with a basic BICS data model that’s starting to look like the one I had with the on-premise OBIEE install.

NewImage

Then finally, once I’d amended all the column names and brought in all of the additional columns and lookup tables to provide, for example, lists of Twitter user handles, I could view the model in BICS’s Visual Analyzer and start producing data visualisation projects off of it.

NewImage

So – it’s admittedly a bit convoluted in the first iteration, but once you’ve set up the BICS data model and the Data Sync upload process, you can use Data Sync to refresh the replicated Hive table data in the BICS database instance and keep the two systems in sync. As I said, OBIEE12c will come with Visual Analyzer as part of the on-premise install, but until then this is the way we link VA to Big Data Discovery on Hadoop to give Visual Analyzer access to BDD’s enriched datasets.

Categories: BI & Warehousing

Introducing Rittman Mead’s New User Engagement Service

Thu, 2015-06-18 05:15

NewImage

Although you’ve been hearing a lot on the blog recently about extending Oracle BI&DW with Hadoop and big data technologies, another initiative Rittman Mead have been working on over the past couple of months is user engagement. All too often developers create BI systems that meet all of an organisation’s technical goals, but then fail to get adopted by users for reasons we can’t understand; all we know is that our brilliantly-architected system just doesn’t seem to engage users, and that’s just as fatal to a project as the ETL not working or the underlying technology being out-of-date.

Surveys have shown that adoption rates for new BI systems are often as low as 25%, because we don’t focus enough on the user experience (UX) or user interface, queries run too slowly, the data or reports just aren’t relevant, or the overall experience just isn’t up to the standard that internet users now expect. If you’re a BI manager or a CIO for your organisation, it’s essential that you know whether the BI systems you’re providing for your users are actually being used. It’s quite often the case that a BI system provided by the IT department is thought to be well-used by the end-users, but when you check the usage stats you’ll find that engagement has really fallen off since the initial rollout of the system – and the worst thing is, you probably don’t really know why; all you know is that users aren’t happy and they’re now looking to implement some self-service tools that’ll break your data governance model.

To help you improve user engagement for your BI system and increase the return on your investment in Oracle technology, Rittman Mead are developing a User Engagement Service to address these issues, along with a User Engagement Toolkit developed by the likes of Robin Moffatt, Jordan Meyer, Tom Underhill and other OBIEE and UX experts within Rittman Mead. Initial details of the service are on our Rittman Mead User Engagement Service homepage and the Rittman Mead User Engagement Service Datasheet; look out for more information on these services over the coming days and weeks, and if you’re interested in getting involved in our early-adopter program you can sign up using the form below and we’ll get back to you shortly.


Categories: BI & Warehousing

Combining Oracle Big Data Discovery and Oracle Visual Analyzer on BICS

Wed, 2015-06-17 13:11

So now that Oracle Visual Analyzer is out as part of Oracle BI Cloud Service, and Visual Analyzer (VA) is due to ship on-premise as part of OBIEE12c sometime in the next twelve months, several of our customers have asked us whether they need both VA and Oracle Big Data Discovery when they’re looking to analyse Hadoop data as part of a BI project. It’s an interesting question, so I thought it’d be useful to go through my thoughts on how the two tools work together, when to use one, and when to use the other.

Taking our standard “big data” dataset of website log activity, Twitter mentions and page details from our WordPress blogging software, before Visual Analyzer came along the two usual ways we’d want to analyze these datasets were a traditional BI metrics analysis-type scenario, and a data discovery/visualization scenario where we’re more interested in the content of the data than in precise metrics. My half of the recent BI Forum 2015 Masterclass goes through these two scenarios in detail (presentation slides in PDF format here), and it’s Big Data Discovery that provides the more “Tableau”-type experience, with fast point-and-click access to both datasets joined together on their common website page URL details.

NewImage

Now we have Visual Analyzer though, things get interesting; in my article on Visual Analyzer within BICS I showed a number of data visualisations that look pretty similar to what you’d get with Big Data Discovery, and when we have VA available on-site as part of OBIEE12c we’ll be able to connect it directly to Hadoop via Cloudera Impala, potentially analyzing the whole dataset rather than the (representative) sample that Big Data Discovery loads into its Endeca Server-based engine.

NewImage

So if the customer is looking to analyze data held in Hadoop and Visual Analyzer is available, where’s the value in Big Data Discovery (BDD)? To my mind there are three areas where BDD goes beyond what VA can do, or helps you perform tasks that you’ll need to do before you can work with your data in VA:

  • The initial data discovery, preparation and cleansing that you’d otherwise have to do using HiveQL or an ETL tool such as ODI12c
  • Providing you with a high-level overview and landscape of your data, when that’s more important to you at the time than precise counts and drill-down analysis
  • Understanding how data joins together, and how best to use your datasets in terms of metrics, facts, dimensions and so forth

Taking the data preparation and cleansing part first, I’ve covered in several blogs over the past couple of years how tools such as ODI can be used to create formal, industrialized data pipelines to ingest, prepare and then summarise data coming into your Hadoop system, and how you can drop down to languages such as HiveQL, Pig and Spark to code these data transformations yourself. In the case of my webserver log, Twitter and page details datasets this work would include standardising URL formats across the three sources, geocoding the IP addresses in the access logs to derive the country and city for site visitors, turning dates and times in different formats into ones that work as Hive timestamps, and so forth. Doing all of this using ODI and/or HiveQL can be a pretty technical task, so this is where BDD comes in useful – even if VA and an OBIEE RPD are the final destination for the data.

NewImage

Datasets that you transform and enrich in Big Data Discovery can be saved back to Hive as new Hive tables, or exported out as files for you to load into Oracle using SQL*Developer, or upload into BICS to use in Visual Analyzer. Where BDD then becomes useful is in giving you a quick, easy-to-use overview of your dataset before you get into the serious business of defining facts, dimensions and aliases against these three Hive tables. The screenshots below show a couple of typical Big Data Discovery Studio data visualisation pages against the webserver logs dataset, and you can see how easy it is to create simple charts, tag clouds and maps against the data you’re working with – the aim being to give you an overview of the data and help you understand its contents and “shape”, before moving further down the curation process and applying formal structures to the data.

NewImage

Where things get harder within Big Data Discovery is when more and more formatting, complex joining and “arranging” of the data is required; for example, BDD gives you a lot of flexibility in how you join datasets, but this flexibility can be confusing for end-users if they’re then presented with every possible variation of a three-table join rather than having the data presented to them as simple facts and dimensions. But this is really how we’d expect it to be – if you go back to the logical data architecture I went through in the blog post a while ago about the updated Oracle Information Management Reference Architecture, the trade-off in using schema-on-read data reservoirs is that this data, although quick and cheap to store, requires a lot more work to be done each time you access it to get “value” from it.

NewImage

OBIEE, in contrast, makes you define your data structures in-full before you present data to end-users, dividing data in the three datasets into measures (for the fact tables) and attributes (for dimensions) and making it possible to add more dimension lookups (for a date dimension, for Twitter users in this case) and separate the overall set of data into more focused subject areas. Working with the dataset on the on-premise version of OBIEE first, the RPD that I created to present this data in a more formal, dimensional and hierarchical way to users looked like this:

NewImage

I can leave this RPD connected directly to the underlying Hive and Impala tables if I want to use just Answers and Dashboards, but for the time being I need to export the underlying Hive tables either into CSV files or into an Oracle database before uploading into Visual Analyzer; come OBIEE12c this should all be seamless. What users are then presented with when they go into Visual Analyzer is something like this:

NewImage

Notice how the various attributes of interest are grouped into fact and dimension table folders, and there’s a simple means to add calculations, change the visualisation type and swap chart settings around. Note also that the count on the screen is the actual count of records in the full dataset, not the sample that BDD takes in order to provide an overview of values and distribution in the full dataset. Whilst it’s relatively easy to create a line chart, for example, to show tweets per user within BDD, using Visual Analyzer it’s just a case of double-clicking on the relevant measures and attributes on one side of the page, selecting and arranging the visualisation and applying any filters using dialog boxes and value-selectors – all much more familiar and obvious to BI users.

NewImage

Enrichment of the data that I’ve done in Big Data Discovery should in most cases be able to come through to Visual Analyzer; for example, I used Big Data Discovery’s text enrichment features to determine the sentiment of blog post titles, tweets and other commentary data, and I could use the latitude and longitude values derived during the visitor IP address geocoding to plot site visitors on a map. Using the sentiment value derived from the post title, tweet contents and other textual data, I can create a chart of our most popular posts mentioned on Twitter and colour the bars to show how positive, or negative, the comments about each post were.

NewImage

The only thing that Visual Analyzer can’t yet do that would be useful is to include more than one subject area in a project. To analyze the number of tweets and the number of page views for posts in a scatter chart, for example, I currently have to create a separate subject area that includes both sets of facts and dimensions, though I understand VA on BICS will gain the ability to include multiple subject areas in a forthcoming release.

NewImage

So in summary, I’d say that Big Data Discovery, and Visual Analyzer as part of BI Cloud Service, are complementary tools rather than one being able to replace the other in a big data context. I find that Big Data Discovery is a great tool to initially understand, catalog and view, at a high level, the data sources going into VA, and then to do some user-driven cleaning-up, enhancing and enriching of the data before committing it to the formal dimensional model that Visual Analyzer requires.

In its BICS guise there’s the additional step of having to export the Hadoop data out of your Big Data Appliance or other Hadoop cluster and upload it in the form of files using BICS’s data load or the new Data Sync utility, but when VA comes as part of OBIEE12c in the next twelve months you’ll be able to directly connect to the Hadoop cluster using Impala ODBC and analyse the data directly, in-place.

I’ll be covering more on BICS over the next few weeks, including how I got data from Hadoop into BICS using the new Data Sync utility.

Categories: BI & Warehousing

Can Better Visual Design Impact User Engagement?

Tue, 2015-06-16 04:44

Background

For every dashboard succinctly displaying key business metrics, there’s another that is a set of unconnected graphs which don’t provide any insight to its viewers.

In order for your users to get value from your business intelligence and analytics systems they need to be engaging, they need to tell a story.

As part of its User Engagement initiative Rittman Mead has created a User Engagement Service. A key part of this is a Visual Redesign process. Through this process, we review an organisation’s existing dashboards and reports and transform them into something meaningful and engaging.

This service focuses on the user interface and user experience; here we will use our expertise in data visualisation to deliver high value OBIEE dashboards.

The process starts by prioritising your dashboards and then, taking one at a time, rebuilds them. There are 3 key concepts that lie behind this process.

Create a guided structure of information

The layout of information on a dashboard should tell the user a story. This makes it much easier for data to be consumed because users can identify related data and instantly see what is relevant to them. If a user can consume the data they need easily, they’re more likely to come back for more. ‘The founder of modern management’, Peter Drucker said “If you can’t measure it, you can’t manage it.” When users are in touch with their data, they will be more engaged with the business.

visual-redesign-sm

To achieve this, first we must consider the audience. We need to know who will be consuming the dashboards and how they will be used. Then we can begin to create a design that will satisfy the users’ needs. Secondly, we need to think about what we exclude as much as what we include on the dashboard. There is limited space, so it needs to be used effectively. Only information that adds value should be displayed on a dashboard and everything should be there for a reason. Finally, we need to consider what questions the users want to answer and what decisions will be based on this. This will enable us to guide the user to the information that will help them take action.

Choose the right visuals

One common design mistake is overcrowding of dashboards. Dashboards often develop over time to become a jumbled array of graphs and tables with no consideration of the visual design.

The choice of graphs will determine how readable the information being displayed on a dashboard is. We constantly ask ourselves “What is the best graph for the data?” Understanding how different types of graphs answer different questions, allows us to make the best visual choices. This is a vital tool for communicating messages to the users and providing them with the ability to identify patterns and relationships in the data more efficiently.

Thoughtful use of colour

The use of colour is an effective way to draw attention to something, connect related objects and evoke users’ emotions. Thoughtful use of colour can have a big impact on user engagement. To be sure we choose the best colours throughout dashboard design, the key question we need to ask ourselves is, “how will these colours make the user feel?”

Like the charts themselves, every different colour used on a dashboard should be there for a reason. Intentional use of colour could determine how a user will feel whilst consuming the information being displayed to them. Bright, unnatural colours will alarm users and attract their attention. Cool colours will give a restful, calming feel to the user and are most effective for displaying sustained trends. Through taking into consideration the most effective way to use colour, we can work towards creating an attractive visual design, which is engaging and enjoyable to use.

Applying these 3 concepts through Rittman Mead’s visual redesign process has proven to result in engaging OBIEE dashboards. Users are equipped to make the most of their data, allowing them to make informed business decisions.

Rittman Mead’s Visual Redesign process is a key part of Rittman Mead’s User Engagement Service; for more info see http://www.rittmanmead.com/user-engagement-service/.

If you are interested in hearing more about User Engagement please sign up to our mailing list below.



Categories: BI & Warehousing

Practical Tips for Oracle BI Applications 11g Implementations

Mon, 2015-06-08 06:35

As with any product or technology, the more you use it the more you learn about the “right” way to do things. Some of my experiences implementing Oracle Business Intelligence Applications 11g have led me to compile a few tips that will improve the overall process for installation and configuration and make the application more maintainable in the future. You can find me at KScope15 in Hollywood, FL beginning June 21st, presenting this exact topic. In this post I want to give you a quick preview of a couple of the topics in my presentation.

Data Extract Type – Choose Wisely

Choosing how the data is extracted from the source and loaded to the data warehouse target is an important part of the overall ETL performance in Oracle BI Applications 11g. In BI Apps, there are three extract modes to choose from:

  • JDBC mode
    This default mode will use the generic Loading Knowledge Modules (LKM) in Oracle Data Integrator to extract the data from the source and stream it through the ODI Agent, then down to the target. The records are streamed through the agent to translate datatypes between heterogeneous data sources. That makes the JDBC mode useful only when the source database is non-Oracle (since the target for BI Apps will always be an Oracle database).
  • Database link mode
    If your source is Oracle, then the database link mode is the best option. This mode uses the database link functionality built into the Oracle database, allowing the source data to be extracted across this link (a generic sketch of the mechanism follows this list). This eliminates the need for the additional translation of the data that occurs in JDBC mode.
  • SDS mode
    This should really be called “GoldenGate mode”, but I’m sure Oracle wants to keep their options open. In this mode, Oracle GoldenGate is used to replicate source transactions to the target data warehouse in what is called a Source Dependent Data Store (SDS) schema. This SDS schema mimics the source schema(s), allowing the SDE process to extract from the DW local SDS schema rather than across the network to the actual source.
    If the use of GoldenGate is an option, it’s hands-down better than JDBC mode should you be extracting data from a non-Oracle source. (Have a look at my OTN ArchBeat 2-minute Tech Tip as I attempt beat the clock while discussing when to use GoldenGate with BI Apps!)

OBIA architecture

Let’s go into a bit more detail about using GoldenGate with BI Applications. Because the SDS is set up to look exactly like the source schema, the Oracle Data Integrator pre-built Interfaces can change which source they use from within the Loading Knowledge Module (LKM) by evaluating a variable (IS_SDS_DEPLOYED) at various points throughout the LKM. Using this approach, the GoldenGate integration can be easily enabled at any point, even after initial configuration. The Oracle BI Applications team did a great job of utilizing the features of ODI that allow the logical layer to be abstracted from the physical layer and data source connection. For further information about how to implement Oracle GoldenGate with Oracle BI Applications 11g, check out the OTN Technical article I wrote which describes the steps for implementation in detail.

Disaster Recovery

If the data being reported on in BI Applications is critical to your business, you probably want a disaster recovery process. This will involve an entirely separate installation on a full server stack located somewhere away from the production servers. Now, there are many different approaches to DR with each of the products involved in BI Applications – OBIEE, ODI, databases, etc. – but I think this approach is simpler than many others.

BI Apps DR Architecture

The installation of BI Apps would occur on each site (primary and standby) as standalone installations. It’s critical that you have a well defined, hopefully scripted and automated, process for installation and configuration, since everything will need to be exactly the same between the two sites. Looking at the architecture diagram above, you can see the data warehouse, ODI repository, and BIACM repository schemas will be replicated from primary to standby via Oracle Dataguard. The OBIEE metadata repositories are not replicated due to much of the configuration information being stored in files rather than the database schema.

With the installation and configuration identical, any local, internal URLs will be set up to use the local site URL (e.g. http://biapps.rittmanmead-primary.com). The external URLs, such as the top-level site (e.g. http://biapps.rittmanmead.com/biacm) or database JDBC connection URLs, will all use canonical names (CNAMEs) as the URL. A CNAME is really just an alias used in the DNS, allowing an easy switch when redirecting from one site to another. For example, the CNAME biapps.rittmanmead-primary.com will have an alias of biapps.rittmanmead.com. This alias will switch to point to biapps.rittmanmead-standby.com during the failover / switchover process.

We can now run through a few simple steps to perform the failover or switchover to the standby server.

  • Update Global CNAMEs
  • Switch primary database via DataGuard
  • Update the Web Catalog and Application Role assignments
  • Start NodeManager, OHS, WebLogic AdminServer
  • Update Embedded LDAP User GUID in ODI (if necessary)
  • Start BI and ODI Managed Servers
  • Update and Deploy the RPD
  • Start the BI Services

Looks pretty straightforward, right? With the appropriate attention to detail up front during the installation and configuration, it becomes simple to maintain and perform the DR switchover and failover. I’ll go into more detail on these topics and others, such as installation and configuration, LDAP integration, and high availability, during my presentation at KScope15 later this month. I hope to see you there!

 

Categories: BI & Warehousing

New In Oracle BI Cloud Service – Oracle Visual Analyzer, and Data Mashups in VA

Fri, 2015-06-05 13:27

Oracle released an update to Oracle BI Cloud Service a few weeks ago that included Oracle Visual Analyzer, along with some other improvements including support for full Oracle Database-as-a-Service as the database backend, the ability to upload RPDs and run them in the cloud, and support for a new utility called Data Sync. In this post, though, I want to take a quick look at Visual Analyzer, and in particular at the data-mashup feature it provides.

Visual Analyzer is of course one of the tentpole features in OBIEE12c that we’ve all been looking forward to, as is 12c’s ability to allow users to upload spreadsheets of data and join them to existing subject areas in Answers. I’m covering Visual Analyzer in an upcoming edition of Oracle Magazine so I won’t go into too much detail on the product at a high-level here, but in summary Visual Analyzer provides a single-pane-of-glass, Tableau-type environment for analysing and visualising datasets stored in Oracle Cloud Database and modelled in BICS’s cut-down web-based data-modeller. In the Oracle Magazine article I take the Donors Choose dataset that we featured at the recent Rittman Mead BI Forum 2015, and create a range of visualizations as I explore the dataset and pick the type of project I’d most like to donate to.

NewImage

Visual Analyzer differs from Answers in that all of the available data items are listed down one side of the page, there’s no flicking backwards and forwards between the Criteria tab and the Results tab, filters are set by just right-clicking on the column you wish to filter by, and the visualisation builds up in front of your eyes as you add more columns, move things around and arrange the data to get the most appropriate view of it.

From an IT manager’s perspective, where Visual Analyzer improves on desktop analysis tools such as Tableau and Spotfire is that the data you work with is the same governed dataset that Answers and Dashboards users work with, and the same security rules and auditing apply to you as to other Presentation Services and Catalog users; but those “self-service” users who just want to play around with and explore the data – rather than create reports and dashboards for mass consumption – can now work with the type of tool they’ve up to now had to look elsewhere for.

NewImage

One of the other headline features for OBIEE12c announced at last year’s Oracle Openworld is “Model Extensibility and Data Mashup”. Announced as part of Paul Rodwick’s “Business Analytics and Strategy Roadmap” session and described in the slide below, this feature extends the capabilities of the BI Server to now handle data the user uploads from the Answers (and now Visual Analyzer) report creation page, joining that data as either “fact extensions” or “measure extensions” to an existing Presentation Services subject area. 

NewImage

I won’t go into the technical details of how this works at this point, but in terms of how it looks to the end-user, let’s consider a situation where I’ve got a spreadsheet of additional state-level data that I’d like to use in this Visual Analyzer (VA) project – in this case to colour the states in the map based on the income level of the people living there. The spreadsheet of data that I’ve got looks like this:

NewImage

Note the cunningly-named columns in the first row – they don’t have to match the column names in your VA data model, but if they do, as you’ll see in a moment, it speeds the matching process up. To add this spreadsheet of data to my VA project I therefore switch the menu panel on the left to the Data Sources option, right-click and then choose Add Data Source…

NewImage

Then, using the Add Data Source dialog, I upload the XLSX file from my workstation. In my instance, because I named the columns in the top row of the spreadsheet to match the column names already in the BICS data model, it’s matched the SCHOOL_STATE column in the spreadsheet to the corresponding column in the SCHOOLS table and worked out that I’m adding measures, joined on that SCHOOL_STATE column.

NewImage

If my spreadsheet contained other text fields matched to the existing model via a dimension attribute, the upload wizard would assume I’m adding dimension attributes; if it detects them incorrectly, I can match the columns myself and specify whether the new file contains measures or attributes. BICS then confirms the join between the two datasets and I can start selecting from the new measures to add to my project.

NewImage

My final step then is to add the HOUSEHOLD_INCOME measure to my visualisation, so that each state is now shaded by the household income level, allowing me to see which states might benefit most from my school project donation.

NewImage

One thing to bear in mind when using mashups, though, is that what you’re effectively doing is adding a new fact table that joins to the existing one on one or more dimension levels. In my case, my HOUSEHOLD_INCOME and POPULATION measures only join to the DONATIONS dataset on the SCHOOL dimension, and then only at the STATE level, so if I try to reference a column from another dimension (to add, for example, a filter on the FUNDING STATUS column within the PROJECTS dimension) the project will error, as that dimension isn’t conformed across both facts.

NewImage

My understanding is that Oracle will fix this in a future release by setting all of the non-conformed dimensions to “Total”, as you can do yourself with the on-premise version of OBIEE, but for now this restricts mashups to datasets that use fully-conformed dimensions, and to filters that only use those conformed dimensions from the join level up; the conceptual sketch below illustrates the join that’s effectively being made.
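To make that grain issue a bit more concrete, here’s a conceptual sketch, in plain SQL, of the kind of join the mashup is effectively performing. The table and column names (DONATIONS, SCHOOLS, UPLOADED_XLSX and so on) are purely illustrative assumptions, not the actual objects BICS creates behind the scenes for an uploaded spreadsheet.

-- Conceptual sketch only: the uploaded spreadsheet behaves like a second fact
-- table joined to the existing model at the SCHOOL_STATE level. All names here
-- are illustrative, not the real BICS-generated objects.
SELECT s.school_state,
       SUM(d.donation_amount)   AS total_donations,
       MAX(x.household_income)  AS household_income
FROM   donations     d
JOIN   schools       s ON s.school_id    = d.school_id
JOIN   uploaded_xlsx x ON x.school_state = s.school_state
GROUP  BY s.school_state;
-- A filter on a column from a non-conformed dimension (for example a PROJECTS
-- attribute) fails because UPLOADED_XLSX has no join path to that dimension.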

So that’s VA on BICS in a nutshell, with this article drilling down further into the very interesting new data mashup feature. Look out for more on this new release of BICS soon as I cover the new DataSync feature, RPD uploads and connecting BICS to the full Oracle Database-as-a-Service.

 

Categories: BI & Warehousing

User Engagement – Why Does it Matter?

Mon, 2015-06-01 09:25

The value of any system can be measured by how often you interact with it. The perfect example is Slack, which is now seen as the most successful collaboration app ever and is valued at $1.2bn. The key metric for this is DAU/MAU (the ratio of daily active users to monthly active users), which scales from 0 (no visits) to 1 (every user visits daily) depending on how active your users are.

How does this relate to your BI or analytics system? We believe that the same concept applies. The more you interact with your BI/analytics system, the more valuable it is to your organisation.
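If you wanted to put a rough number on this for OBIEE, you could derive a DAU/MAU-style figure from usage tracking. The query below is only a sketch: it assumes usage tracking is enabled and writing to the standard S_NQ_ACCT table, and it treats “active” simply as running at least one query on a given day.

-- A sketch only: approximate DAU/MAU for OBIEE over the last month, assuming
-- usage tracking is enabled and writing to S_NQ_ACCT, and that "active" means
-- running at least one query on a given day.
SELECT ROUND(d.avg_dau / m.mau, 2) AS dau_mau_ratio
FROM   (SELECT AVG(dau) AS avg_dau
        FROM  (SELECT TRUNC(start_ts)           AS activity_day,
                      COUNT(DISTINCT user_name) AS dau
               FROM   s_nq_acct
               WHERE  start_ts >= ADD_MONTHS(SYSDATE, -1)
               GROUP  BY TRUNC(start_ts))) d,
       (SELECT COUNT(DISTINCT user_name) AS mau
        FROM   s_nq_acct
        WHERE  start_ts >= ADD_MONTHS(SYSDATE, -1)) m;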

Return on investment (ROI) of BI/analytics systems has always been hard to measure. There is a widespread belief that the battle is either won or lost during the development stage of the lifecycle, so agile project approaches, stakeholder involvement and effective change management programs are employed to align the system more closely with users’ true requirements. However, even that may not drive users to use the system.

What can you do if your system is already live?

Focus on user engagement. BI Scorecard describes typical BI/analytics systems as having only 22% user adoption, while Gartner, in its paper The Consumerization of BI Drives Greater Adoption, puts the typical user adoption rate at 30%. This leaves a lot of room for improvement.

We believe that there are 5 drivers for user engagement:

  • User interface and user experience
  • Quality, relevance, and confidence in data
  • Performance
  • Ability to use the system
  • Accessibility – is the system available in the right way, at the right time?

Addressing these can drive low-cost, low-risk improvements to the user engagement of your system and, in turn, increase its value.

Over the next few months, we are focusing on user engagement and will be launching a set of articles and initiatives to assist organisations. We will also launch a User Engagement Service that baselines your current user engagement levels, makes a series of recommendations for improvements based on the drivers above, implements them and then measures their impact on the system. The result is a demonstrable increase in the ROI of your BI/analytics system.

Alongside this we will be developing a set of tools that extend OBIEE with technologies such as D3. If you are interested in hearing more, please sign up to our mailing list below.



Categories: BI & Warehousing

OBIEE 11.1.1.9: New Features for Front-End Developers and Business Users

Tue, 2015-05-26 11:13

OBIEE 11.1.1.9 was released this May, and Robin and Mark wrote great posts about its New Features for System Administrators and the new support for HiveServer2 and Cloudera Impala respectively. In this post, we will look at some useful new features for front-end development. Some of these features, such as the treemap and the new preview styles, were included in the OBIEE 11.1.1.7.10 version, but I’m including them here anyway because if you are moving from an earlier release of 11.1.1.7 you probably haven’t seen them yet.

Search & Sort options inside the Subject Area

There is a new option to search inside the subject area when we create an analysis. If we press the magnifying glass icon in the Subject Areas pane, a search box appears and OBIEE returns all the objects that contain the text we entered. We can also use the % wildcard.

search_option

In addition, there is an option to sort the objects inside the subject area, ordering both the folders and the presentation columns within them. The default is to show the objects in the order in which they are saved in the repository.

sort_subj_area

Save Calculated Items in the Web Catalog

A very useful feature for business users is the ability to create calculated items in an analysis and save them in the catalog, so they can be reused in other analyses for the same subject area.

In the Results tab there is a new icon for creating a new measure. The calculation is created in the same way as column formulas.

new_calc_measure

After the new measure is created you can go to the Criteria tab and, in the column pop-up menu, select the Save Column As option. In this way you save the new measure as a new object in the catalog so it can be reused in other analyses. The ability to create a new calculated measure for tables and pivot tables was included in the 11.1.1.7.10 version, but the option to save the column in the catalog is new in OBIEE 11.1.1.9.

savenew_colcalculated_measure4

You can also change the properties of the new column in the Criteria tab, but be aware that conditional formatting and conditional action links for the column are not saved to the catalog. Regular action links are saved for the column, as are the format properties.

The saved column can be edited from the catalog pane and the catalog page. Any changes you make in the catalog are applied to every analysis that uses the column, whereas if you modify it inside a particular analysis, those changes are valid only for the analysis you are working on.

editnew_measure

To be able to save new measures as columns you need the Save Column privilege.

calculatedmeasure_priv

Global Variables

In this version we find a new type of variable: the global variable. Global variables can be defined in the context of one analysis and then used in other analyses, which makes them useful for intermediate, reusable calculations.

To create a new global variable, select the Edit Formula option for the required column. When you press the Variable button in the Formula Editor you will see the new Global option. The Insert Global Variable dialog appears and you can either select an existing global variable to use in the formula or create a new one. To create a new one you need to enter a name, type and value. If you want to use an expression as the value, such as the Min(Revenue) in the example, the data type should be Text.

global_variable

To reference a global variable you need to use the fully qualified name including the context: @{global.variables.variablename}.
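For instance, assuming a global variable named gv_min_revenue has been created with the value Min(Revenue), a column formula that shows each row’s revenue above that minimum might look like the line below; both the column reference and the variable name here are illustrative assumptions rather than objects from any particular subject area.

"Base Facts"."Revenue" - @{global.variables.gv_min_revenue}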

global_var_result

Also, to manage global variables you need the Manage Global Variables privilege.

New Visualisation: Treemap

A new visualisation called Treemap has been available since OBIEE 11.1.1.7.10. A treemap groups the dimensional data selected in the analysis into tiles. By default, the tile size is based on the first measure of the analysis and the tile colour on the second measure. If you have only one measure in your analysis, it is used for both size and colour.

treemap

You can edit a treemap like any other view, changing the grouping, the measures that drive the size and colour of the tiles, and the range of colours you want to use. You can also choose between displaying the information in coloured bins or using a continuous colour fill; if you choose coloured bins, you can also set how many differently-coloured bins to use.

treemap2

New Preview Styles

When you preview an analysis from the Results tab, instead of immediately showing how the analysis will look, OBIEE now offers different Styles in which to view it. This feature has also been available since the 11.1.1.7.10 version. If you want to create a custom style, Oracle recommends using the new Skyros style as a starting point.

preview

Horizontal Layout for Radio Button and Check Box Dashboard Prompts

Radio button and check box dashboard prompts can now be shown horizontally. When you create a new dashboard prompt using a check box or radio button, an option to choose between a Horizontal and a Vertical layout appears under Options.

horizontal_prompts

horizontal_prompt1

Enhancements in Export & Print Options

In this version you will find more options in the Print and Export Options dialog. You can choose whether to include charts, images and formatting, and specify column properties such as the column width and whether to wrap text in columns.

dashboardprintoptions

In summary, these features are simple but quite useful for business users and front-end developers, and they give more flexibility for creating better data visualisations.

 

Categories: BI & Warehousing

Deploying Oracle BI Mobile HD within Good, Oracle OMSS and Other Secure MDM Containers

Sun, 2015-05-24 10:53

A few months ago ODTUG announced their ODTUG Mobile Day in Utrecht, Netherlands, and asked if I’d like to put forward an abstract for a presentation. Something that’s been on my mind for a while is the issues some of our customers have been reporting when trying to run Oracle BI Mobile HD with the Good Mobile Device Management toolkit, so I thought this would be a good opportunity to look into the topic in more detail and try to come up with some recommendations on how to get it working. As the ODTUG Mobile Day was for all Oracle developers, not just BI ones, I also thought it’d be a good opportunity to look into iOS and Android development in general as it relates to Oracle, and in particular what’s involved in deploying into the “container” environments that many large enterprise customers insist on when deploying mobile applications. The presentation is available on Slideshare if you want to download it, but I thought I’d expand on some of the concepts and tips in a separate blog post as it’s all actually quite interesting.

As a bit of background, OBIEE itself can of course display its standard dashboards through mobile web browsers, with graphics and charts switching to iOS- and Android-friendly formats rather than trying to render in Flash. But standard dashboards and analyses look too small and are hard to work with in mobile browsers, so using this approach to mobile-enable your dashboards isn’t generally recommended. Instead, as most readers will already be aware, Oracle has released a number of mobile clients for OBIEE over the years, with Oracle BI Mobile and Oracle BI Mobile App Designer available in the latest 11.1.1.9 release and suiting most end-user and customer needs for mobile access to their business analytics data.

 NewImage

Oracle BI Mobile App Designer apps can be deployed using SSL and SSO, whilst BI Mobile HD takes advantage of iOS and Android platform security features such as password storage in encrypted on-device keystores and remote wipe of devices. But for some organizations that need higher levels of device control and data security, this platform-level security isn’t enough, as:

  • Unknown and unsecured devices accessing their networks (for example, via VPN) are considered an unacceptable security risk
  • For Android devices in-particular, there is known malware out on the internet that can compromise “rooted” devices
  • It’s possible to cut-and-paste sensitive information from BI Mobile applications into other applications
  • Users don’t always set secure passwords, and lost or stolen devices can potentially expose BI data stored on the device to unauthorised users

To address these issues, a number of Mobile Device Management (MDM) vendors provide enterprise-level solutions, typically by having the customer specially sign the applications they wish to deploy, to indicate they’ve been authorised for use, and by deploying those applications within managed containers that keep them separate from the general mobile apps on the user’s device. As well as specially signed and deployed applications such as BI Mobile HD, these MDM suites also typically provide secure, containerised web browsers and email clients, both of which need to work with Oracle BI and Oracle BI Mobile App Designer. Vendors in this space include Good Technology and MobileIron, and Oracle have had their own MDM solution since they acquired Bitzer Mobile back in 2013. So how do these solutions work, why do some customers have trouble getting them working, and what’s the “preferred” approach if a customer asks you to just get it working for them?

To start with the simplest approach: the tested and recommended way to run Oracle BI Mobile HD, and to run Oracle BI Mobile App Designer through an MDM-supplied web browser, is to use Oracle Mobile Security Suite (OMSS), based on the technology Oracle acquired from Bitzer Mobile. OMSS is a complete platform for deploying mobile apps in a secure, managed container environment; it takes customer-signed iOS and Android applications and deploys them onto enterprise users’ mobile devices using a centralized console and management service.

NewImage

To support deployment of Oracle BI Mobile into OMSS and other vendors’ MDM solutions, Oracle have made an unpackaged and unsigned version of Oracle BI Mobile available for download on OTN as the “Oracle Business Intelligence Mobile Security Toolkit”; taking the iOS version as an example, you can then compile this in Apple Xcode and sign it for deployment in your organisation’s OMSS installation. For some organisations this can be a bit of a challenge as they don’t generally use Macs and don’t do iOS development, but for Rittman Mead testing this out was pretty straightforward, as we all use Macs for our consulting work and some of us play around with Xcode and iOS development in our spare time. What can also be a challenge is setting up an iOS Developer account so that you can sign the BI Mobile HD application with your organization’s own certificate, but we set up such an account a couple of years ago and were able to get it all set up with just a couple of hours’ work. What you will need to get this running (apart from the OMSS part, which I’ll cover in a moment) are the following bits of hardware and software, in this case for the iOS version of BI Mobile HD:

  • An Apple Mac that can run recent versions of Xcode
  • An Apple Developer account that can develop and sign iOS applications, in your organization’s name and not an individual developer’s (note that you generally need to be registered with Dun and Bradstreet’s business verification service to set this up, which can take a few weeks if your entry is out of date or doesn’t match your current company details)
  • Oracle Business Intelligence Mobile Security Toolkit
  • Apple Xcode

NewImage

Then it’s a case of setting up a new project in Xcode, selecting Single View Application and Universal as the device type, entering your project and organization’s information, and then merging the Oracle Mobile Security Toolkit’s project files with the empty project you just created and setting any other project properties.

NewImage

At this point you should be able to run the application within the Xcode project and test it using Xcode’s built-in iPhone and iPad simulators, so that you can check all the features still work as expected.

NewImage

Now comes the point where you sign the app you’ve just created, so that you can then deploy it into OMSS for distribution to your users. This point is important to security-conscious customers, as it’s now the customer, rather than Oracle, that has signed the app, so you can be much more certain that no malware or other backdoors have been introduced into the code before compilation and distribution. Signing of the app takes place within Xcode, with a similar process being used for the Android version of BI Mobile HD within the Android Studio IDE.

NewImage

The resulting compiled .app and .apk files are then uploaded into OMSS’s catalog for distribution to users, with provisioning taking place through emails sent to corporate users that provide download links to these containerised, managed and secured versions of Oracle’s BI Mobile HD app.

NewImage

So all of this looks pretty straightforward, albeit maybe a bit more complicated for organisations that don’t use Macs and don’t generally develop iOS applications, although Oracle partners such as ourselves can help if you’d like to offload this part of the process to a specialist team. Where things do sometimes get more complicated is when other MDM vendors’ technologies are used, particularly the Good Dynamics MDM solution, which works in a slightly different way to Oracle Mobile Security Suite.

Unlike OMSS’s approach, where you compile and sign BI Mobile HD yourself in Xcode, Good requires you to build and export the unsigned Oracle Mobile Security Toolkit project in Xcode as an .ipa file, and then copy it, along with your iOS Developer Program certificate and the certificate password, into Good’s own Good Control management console. There your application is combined with Good’s security libraries, signed with your certificate password and deployed as a “wrapped application”, to be distributed to users using a similar method to the one OMSS takes. However, all mobile application access then goes through a Good proxy server, typically placed outside the main company network, which provides secure communications between these managed applications running outside of the company firewall and that company’s secure servers, in this case OBIEE11g.

NewImage

There’s nothing inherently wrong with this compared to how OMSS does it, and organisations often pick Good Dynamics over other MDM solutions because of the extra functionality Good’s approach provides through the insertion of its security SDK into your mobile application; but it’s when organisations take advantage of these features to provide custom integration with their own security platform that problems can sometimes occur.

For example, a common issue we hear about when deploying Oracle BI Mobile HD using Good is when the customer tries to integrate their SSO solution into the user authentication process. Good’s security SDK makes it possible to intercept user login events and route the request to the customer’s SSO server, but it’s essential that control is passed back to the BI Server as if this re-routing hadn’t taken place, returning the authentication details the BI Server expects; if the custom login process doesn’t quite do this, authentication fails. Another issue we heard about recently was recent versions of iOS (iOS 7) deprecating synchronous API calls while BI Mobile HD was still making them; in this case Oracle supplied a patch and all calls are now made asynchronously, but until then deployment in the Good environment mysteriously failed.

What makes these issues doubly tricky to identify and resolve is the restrictions most security-conscious enterprise customers place around disclosing details of their network setup, to the point where they often aren’t allowed to tell partners or Oracle Support any of the details needed to work out how traffic passes around the network and over to OBIEE and the Good MDM environment. What troubleshooting often comes down to, then, is good old-fashioned packet-sniffing and investigation by someone familiar with OBIEE (in particular, the BI Server’s authentication process), Good’s security SDK and the customer’s network setup.

NewImage

So given all of this, what is our recommendation for customers looking to implement mobile OBIEE11g clients within an MDM container solution? My advice would be that, where the customer doesn’t currently have an MDM solution and wants the easiest way to deploy Oracle BI Mobile within a secure container, go for the Oracle Mobile Security Suite option: it’s what Oracle support and test for, and as an Oracle solution it’s obviously going to be easier to go down this route than to try and troubleshoot another vendor’s solution if things go wrong. But if an organization is insisting on deploying Oracle BI Mobile in a secure container, it’s unlikely this is the first time they’ve had to do it, so there’s most probably already an MDM solution in place and it’s likely to be from Good.

In this case, first of all remember that it should work and that the two products are compatible; what you need to be aware of, though, is the correct way of linking BI Mobile HD in this environment to your corporate SSO and security platform, and you should work together with your network security team to quickly identify where the issue is coming from if things don’t work first time. Engaging with an experienced OBIEE partner such as Rittman Mead can of course help, and Oracle’s own product development and support teams have most probably seen most of the issues that can come up and can help if things get tricky. The team here at Rittman Mead have several customers successfully using Good and other vendors’ MDM solutions along with Oracle BI Mobile, and of course we can help with the app signing and deployment process if your organization doesn’t usually work with Macs or have experience with Xcode and the Oracle Mobile Security Toolkit.

Finally, thanks to Chris Redgrave from the Rittman Mead team, and to Oracle’s Matt Milella and Jacques Vigeant, who helped me with the background research for this article and the ODTUG BI Mobile Day presentation. As I mentioned earlier, the presentation from the ODTUG event is available on Slideshare, and there are also walkthroughs for deploying BI Mobile HD within Oracle OMSS on iOS and Android on the OTN website.

Categories: BI & Warehousing

New Oracle Magazine article on Oracle BI Cloud Service

Sun, 2015-05-24 07:42

NewImage

The May/June 2015 edition of Oracle Magazine is now out, and my Business Intelligence article in this edition is on Oracle BI Cloud Service (BICS). In “Upload, Model, Analyze and Report” I focus on the “departmental power-user” use case, where someone with a small team wants to share data and dashboards with others in the department, is familiar with OBIEE, but wants to get something up and running quickly without having to involve IT or the formal development process. By just uploading a spreadsheet of data and quickly modeling it into a star schema using BICS’s simple-to-use web-based tools, you can create reports that can be shared with others in your team using OBIEE’s familiar dashboard interface.

Rittman Mead offer a number of services around Oracle BI Cloud Service and now have our own BICS “pod” for demonstration and evaluation purposes. Now that Oracle have released Visual Analyzer early for BICS we’re expecting a lot of interest in and demand for services, support and training around Oracle’s cloud version of OBIEE, so if you’re interested in moving part of your OBIEE estate into the cloud, or you’re a departmental BI lead looking to run OBIEE within your department without the need to get IT involved, drop us a line at enquiries@rittmanmead.com and we’ll be pleased to help.

Categories: BI & Warehousing

Connecting OBIEE 11.1.1.9 to Hive, HBase and Impala Tables for a DW-Offloading Project

Fri, 2015-05-22 07:28

In two previous posts this week I talked about a client request to offload part of their data warehouse to Hadoop, taking data from a source application and loading it into Hive tables on Hadoop for subsequent reporting by OBIEE11g. In the first post I talked about hosting the offloaded data warehouse elements on Cloudera Hadoop CDH5.3, how I used Apache Hive and Apache HBase to support insert/update/delete activity on the fact and dimension tables, and how we’d copy the Hive-on-HBase fact table data into optimised Impala tables stored in Parquet files to make sure reports and dashboards ran fast.

NewImage

In the second post I got into the detail of how we’d keep the Hive-on-HBase tables up-to-date with new and changed data from the source system, using HiveQL bulk-inserts to load up the initial table data and a Python script to handle subsequent inserts, updates and deletes by working directly with the HBase Client and the HBase Thrift Server. Where this leaves us, then, is with a set of fact and dimension tables stored as optimised Impala tables and updatable Hive-on-HBase tables, and our final step is to connect OBIEE11g to them and see how they work for reporting.

NewImage

As I mentioned in another post a week or so ago, the new 11.1.1.9 release of OBIEE11g supports Cloudera Impala connections from Linux servers to Hadoop, with the Linux Impala drivers shipped by Oracle as part of the Linux download and the Windows ones, used for the Admin Tool workstation, downloadable directly from Cloudera. Once you’ve got all the drivers and OBIEE software set up, it’s then just a case of setting up the ODBC connections on the Windows and Linux environments, and you should then be in a position to connect it all up.

NewImage

On the Impala side, I first need to create a copy of the Hive-on-HBase table I’ve been using to load the fact data into from the source system, after running the invalidate metadata command to refresh Impala’s view of the Hive metastore.

[oracle@bigdatalite ~]$ impala-shell
 
[bigdatalite.localdomain:21000] > invalidate metadata;
 
[bigdatalite.localdomain:21000] > create table impala_flight_delays
                                > stored as parquet
                                > as select * from hbase_flight_delays;

Next I import the Hive-on-HBase tables and the Impala table through the Impala ODBC connection; even though only one of the tables (the main fact table snapshot copy) was created using Impala, I still get the Impala speed benefit for the other three tables created in Hive (against the HBase source, no less). Once the table metadata is imported into the RPD physical layer, I can then create a business model and subject area as I would normally, so my final RPD looks like this:

NewImage

Now it’s just a case of saving the repository online and creating some reports. If you’re using an older version of Impala you may need to disable the setting that requires a LIMIT clause for every GROUP BY (see the docs for more details; recent (CDH5+) versions work fine without this). Something you’ll also need to do back in Impala is compute statistics for each of the tables, like this:

[bigdatalite.localdomain:21000] > compute stats default.impala_flight_delays;
Query: compute stats default.impala_flight_delays
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 8 column(s). |
+-----------------------------------------+
Fetched 1 row(s) in 2.73s
[bigdatalite.localdomain:21000] > show table stats impala_flight_delays;
Query: show table stats impala_flight_delays
+---------+--------+---------+--------------+---------+-------------------+
| #Rows   | #Files | Size    | Bytes Cached | Format  | Incremental stats |
+---------+--------+---------+--------------+---------+-------------------+
| 2514141 | 1      | 10.60MB | NOT CACHED   | PARQUET | false             |
+---------+--------+---------+--------------+---------+-------------------+
Fetched 1 row(s) in 0.01s

Apart from being good practice in general and giving the Impala query optimizer better information with which to form a query plan, computing statistics matters because you might hit the error below in OBIEE if you don’t do it.

NewImage

If you do hit this error, go back to the Impala Shell or Hue and compute statistics, and it should go away next time. Then, finally, you can go and create some analyses and dashboards and you should find the queries run fine against the various tables in Hadoop, and moreover the response time is excellent if you use Impala as the main query engine.

NewImage

I did a fair bit of testing of OBIEE 11.1.1.9 running against Cloudera Impala, and my findings were that all of the main analysis features worked (prompts, hierarchies, totals and subtotals etc.) and the response time was comparable with a well-tuned data warehouse, maybe even Exalytics-level speed. If you take a look at the nqquery.log file for the SQL queries OBIEE is sending to Impala, you can see they get fairly complex (which is good, as I didn’t hit any errors when running the dashboards), and you can also see where the BI Server takes a simpler approach to creating subtotals, nested queries and so on compared with the GROUP BY … GROUPING SETS you get when using a full Oracle database.

select D1.c1 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c6 as c6,
     D1.c7 as c7,
     D1.c8 as c8,
     D1.c9 as c9,
     D1.c10 as c10,
     D1.c11 as c11,
     D1.c12 as c12
from 
     (select 0 as c1,
               D1.c3 as c2,
               substring(cast(NULL as  STRING ), 1, 1 ) as c3,
               substring(cast(NULL as  STRING ), 1, 1 ) as c4,
               substring(cast(NULL as  STRING ), 1, 1 ) as c5,
               'All USA' as c6,
               substring(cast(NULL as  STRING ), 1, 1 ) as c7,
               1 as c8,
               substring(cast(NULL as  STRING ), 1, 1 ) as c9,
               substring(cast(NULL as  STRING ), 1, 1 ) as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3
                    from 
                         
                              hbase_carriers T43925 inner join 
                              impala_flight_delays T44037 On (T43925.key = T44037.carrier)
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc
               ) D1
          union all
          select 1 as c1,
               D1.c3 as c2,
               substring(cast(NULL as  STRING ), 1, 1 ) as c3,
               substring(cast(NULL as  STRING ), 1, 1 ) as c4,
               D1.c4 as c5,
               'All USA' as c6,
               substring(cast(NULL as  STRING ), 1, 1 ) as c7,
               1 as c8,
               substring(cast(NULL as  STRING ), 1, 1 ) as c9,
               D1.c4 as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3,
                         T43928.dest_state as c4
                    from 
                         
                              
                                   hbase_carriers T43925 inner join 
                                   impala_flight_delays T44037 On (T43925.key = T44037.carrier) inner join 
                              hbase_geog_dest T43928 On (T43928.key = T44037.dest)
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc, T43928.dest_state
               ) D1
          union all
          select 2 as c1,
               D1.c3 as c2,
               substring(cast(NULL as  STRING ), 1, 1 ) as c3,
               D1.c4 as c4,
               D1.c5 as c5,
               'All USA' as c6,
               substring(cast(NULL as  STRING ), 1, 1 ) as c7,
               1 as c8,
               D1.c4 as c9,
               D1.c5 as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3,
                         T43928.dest_city as c4,
                         T43928.dest_state as c5
                    from 
                         
                              
                                   hbase_carriers T43925 inner join 
                                   impala_flight_delays T44037 On (T43925.key = T44037.carrier) inner join 
                              hbase_geog_dest T43928 On (T43928.key = T44037.dest and T43928.dest_state = 'Georgia')
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc, T43928.dest_city, T43928.dest_state
               ) D1
          union all
          select 3 as c1,
               D1.c3 as c2,
               D1.c4 as c3,
               D1.c5 as c4,
               D1.c6 as c5,
               'All USA' as c6,
               D1.c4 as c7,
               1 as c8,
               D1.c5 as c9,
               D1.c6 as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3,
                         T43928.dest_airport_name as c4,
                         T43928.dest_city as c5,
                         T43928.dest_state as c6
                    from 
                         
                              
                                   hbase_carriers T43925 inner join 
                                   impala_flight_delays T44037 On (T43925.key = T44037.carrier) inner join 
                              hbase_geog_dest T43928 On (T43928.key = T44037.dest and T43928.dest_city = 'Atlanta, GA')
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc, T43928.dest_airport_name, T43928.dest_city, T43928.dest_state
               ) D1
     ) D1
order by c1, c6, c8, c5, c10, c4, c9, c3, c7, c2 limit 65001

Not bad, though, for a data warehouse offloaded entirely to Hadoop; it’s good to see such a system handling full updates and deletes to data as well as insert-appends, and it’s also good to see OBIEE working against an Impala datasource with such good response times. If any of this interests you as a potential customer, feel free to drop me an email at mark.rittman@rittmanmead.com, or check out our Big Data Quickstart page on the website.

Categories: BI & Warehousing