
BI & Warehousing

More Details on the Lars George Cloudera Hadoop Masterclass – RM BI Forum 2014, Brighton & Atlanta

Rittman Mead Consulting - Tue, 2014-04-29 08:39


It’s just over a week until the first of the two Rittman Mead BI Forum 2014 events takes place in Brighton and Atlanta, and one of the highlights of both events for me will be the Hadoop Masterclass from Cloudera’s Lars George. Hadoop and Big Data are two technologies that are becoming increasingly important to the worlds of Oracle BI and data warehousing, so this is an excellent opportunity to learn the basics, pick up some tips from an expert in the field, and then use the rest of the week to relate it all back to core OBIEE, ODI and Oracle Database.

Lars’ masterclass runs on the Wednesday before each event: on May 7th at the Hotel Seattle in Brighton, and then the week after, on Wednesday 14th May, at the Renaissance Atlanta Midtown Hotel. Attendance for the masterclass is just £275 + VAT for the Brighton event, or $500 for the Atlanta event, but you can only book it as part of the overall BI Forum – so book up now while there are still places left! In the meantime, here are details of what’s in the masterclass:

Session 1 – Introduction to Hadoop 

The first session of the day sets the stage for the ones that follow. We will look into the history of Hadoop, where it comes from, and how it made its way into the open-source world. This overview also covers the basic building blocks of Hadoop: the file system HDFS and the batch processing system MapReduce. 

Then we will look into the flourishing ecosystem that is now the larger Hadoop project, with its various components and how they help form a complete data processing stack. We will also briefly look at how Hadoop-based distributions help tie the various components together in a reliable manner, based on predictable release cycles. 

Finally we will pick up the topic of cluster design and planning, talking about the major decision points when provisioning a Hadoop cluster. This includes hardware considerations for specific use-cases, as well as how to deploy the framework on the cluster once it is operational.

Session 2 – Ingress and Egress

The second session dives into the use of the platform as part of an Enterprise Data Hub, i.e. the central storage and processing location for all of the data in a company (large, medium, or small). We will discuss how data is acquired into the data hub and provisioned for further access and processing. There are various tools that allow data to be imported from sources ranging from event-based systems to relational database management systems. 

As data is stored, the user has to decide how to store it for further processing, since this can have considerable performance implications. State-of-the-art data processing pipelines usually take a hybrid approach, combining lightweight LT (no E for “extract” needed), i.e. transformations, with optimised data formats as the final location for fast subsequent processing. Continuous and reliable data collection is vital for productionising the initial proof-of-concept pipelines.

Towards the end we will also look at the lower level APIs available for data consumption, rounding off the set of available tools for a Hadoop practitioner.

Session 3 – NoSQL and Hadoop

For certain use-cases there is an inherent need for something more “database-like” than the features offered by the original Hadoop components, i.e. the file system and batch processing. Especially for slowly changing dimensions, and entities in general, there is a need to update previously stored data as time progresses. This is where HBase, the Hadoop database, comes in, allowing random reads and writes to existing rows of data, or entities in a table. 

We will dive into the architecture of HBase to derive the need for proper schema design, one of the key tasks when implementing an HBase-backed solution. Similar to the file formats from session 2, HBase lets you design table layouts freely, which can lead to suboptimal performance. This session will introduce the major access patterns observed in practice and explain how they can play to HBase’s strengths. 

Finally, a set of real-world examples will show how fellow HBase users (e.g. Facebook) have gone through various modifications of their schema design before arriving at their current setup. Available open-source projects show further schema designs that will help in coming to terms with this involved topic.

Session 4 – Analysing Big Data

The final session of the day tackles the processing of data, since so far we have learned mostly about the storage and preparation of data for subsequent handling. We will look into the existing frameworks that sit on top of Hadoop and how they offer distinct (but sometimes overlapping) functionality. There are frameworks that run as separate instances, as well as higher-level abstractions on top of those, that help developers and data wranglers of all kinds find the right weapon of choice.

Using all that has been learned, the user will then see how the various tools can be combined to build the promised reliable data processing pipelines, landing data in the Enterprise Data Hub and kicking off the subsequent processing automatically, without any human intervention. The closing part of this session will look at the external interfaces, such as JDBC/ODBC, that enable the visualisation of the computed and analysed data in appealing UI-based tools.

Detailed Agenda:

  • Session 1 – Introduction to Hadoop
    • Introduction to Hadoop
      • Explain pedigree, history
      • Explain and show HDFS, MapReduce, Cloudera Manager
    • The Hadoop Ecosystem
      • Show need for other projects within Hadoop
      • Ingress, egress, random access, security
    • Cluster Design and Planning
      • Introduce concepts on how to scope out a cluster
      • Typical hardware configuration
      • Deployment methods 
  • Session 2 – Ingress and Egress
    • Explain Flume and Sqoop to load data record-based or in bulk
    • Data formats and serialisation
      • SequenceFile, Avro, Parquet
    • Continuous data collection methods
    • Interfaces for data retrieval (lower level)
       
  • Session 3 – NoSQL and Hadoop
    • HBase Introduction
    • Schema design
    • Access patterns
    • Use-cases examples
       
  • Session 4 – Analysing Big Data
    • Processing frameworks
      • Explain and show MapReduce, YARN, Spark, Solr
    • High level abstractions
      • Hive, Pig, Crunch, Impala, Search
    • Data pipelines in Hadoop
      • Explain Oozie, Crunch
    • Access to data for existing systems
      • ODBC/JDBC

Full details of both BI Forum events can be found at the Rittman Mead BI Forum 2014 web page, and full agenda details for Brighton and Atlanta are on this blog post from last week.

Categories: BI & Warehousing

Preview of the Rittman Mead BI Forum in Atlanta

Rittman Mead Consulting - Fri, 2014-04-25 07:20

Mark has done a great job of previewing the upcoming content for both BI Forums, the one running locally for us in Atlanta, as well as the one in Brighton, UK. We have an exceptional Master Class this year with Lars George from Cloudera, including an introduction to the Cloudera Big Data stack with full details on building, loading and analyzing Hadoop clusters. The exact details of what’s covered, as well as the timetable for all speaker presentations, are listed here. Additionally, Mark posted on the two special presentations occurring at the two events: Maria Colgan on the In-Memory database option in Atlanta, and myself and Andrew Bond covering the latest iteration of Oracle’s Information Management Reference Architecture in Brighton. And finally, Mark covered three presentations at the Atlanta event on Advanced Visualizations and Mobility. Instead of rehashing all of that, I wanted to do a blog post diving a bit more into the Atlanta event, and some of the content not previously mentioned, especially the sessions from Oracle. We’ve always had incredible representation from Oracle at the BI Forum, and we are very appreciative that the different teams consider our event to be so important in the community.

I wanted to start off by discussing the venue a bit: the Renaissance Hotel in Midtown Atlanta. It’s a modern, upscale Atlanta hotel in Midtown that also has the amazing Rooftop 866 bar with incredible views of the city (those of you that have “socialized” with me over the years know I’ll be spending some time up there). I’m confident this will be our best venue to date.


Before diving into the sessions that Oracle will be presenting in Atlanta, it seems prudent to give those folks a “warm and fuzzy” feeling, show our appreciation, and make them feel safe and sound. So here’s an image that many of our readers will already recognize; for those who don’t, I’m sure you’ll know it by heart when the two events conclude:

(Oracle’s “safe harbor” slide)

Philippe Lions will be back again this year previewing the newest version of Sample App. For customers and partners who are like us at Rittman Mead, Sample App is a pivotal part of your OBIEE methodology. It gives us the ability to demonstrate anything from simple OBIEE analyses and dashboards, to some of the crazy mad-scientist stuff that Philippe’s team comes up with. If Oracle and Philippe didn’t design and build Sample App and keep it current, then we would have to build it ourselves. From my understanding, this will be the first time Philippe has previewed this content outside Oracle, so we are pleased and honored that he chose the BI Forum as the venue for this. It’s also worth noting that Philippe is a BI Forum veteran… he has never missed the Atlanta event since its inception four years ago.

We also have Jack Berkowitz, VP of Product Management for Business Analytics at Oracle, speaking on “Analytics Applications and the Cloud”. He’ll be discussing Oracle BI Applications (OBIA) in detail and the roadmap Oracle has for deploying those applications in the Cloud. I imagine that Jack will be giving the Wednesday night keynote (as he did last year with Philippe), which is always a crowd-pleaser. Jack also spoke on the new Mobile Application Designer last year, so I imagine he will also be able to update us on that product even though his focus at Oracle has shifted. Also from Oracle we have Matt Bedin (another BI Forum veteran) talking about Oracle BI and the Cloud, with a focus on Oracle’s roadmap for running regular Oracle Analytics in the Cloud – which equates to having a Cloud-optimized OBIEE running in Oracle’s Public Cloud. As this product is not yet generally available, attendees will get the scoop on where it is going… and we might even get some hints on when to expect it.

We are excited to have Chris Lynskey, Senior Director, Product Management and Strategy at Oracle, making his first appearance at the BI Forum. He’ll be speaking on “Endeca Information Discovery for Self-Service and Big Data”, so we’ll see Endeca’s positioning for structured and unstructured reporting on an ad-hoc basis. We’ll have several presentations that delve into Endeca, but it will be good to hear from Chris on this topic, as he was with Endeca prior to the acquisition by Oracle, and has been deeply involved with the 3.1 release. Rounding out Oracle’s participation is BI Forum newcomer Susan Cheung, Vice President of Product Management for Oracle TimesTen. Susan will be speaking on “TimesTen In-Memory Database for Analytics – Best Practices and Use Cases”. It will be interesting to have both Susan and Maria Colgan at the Forum, as attendees will have a chance to see Oracle’s complete In-Memory strategy and roadmap in one sitting.

The final session I’d like to discuss is an entry from yours truly on “ExtremeBI: Agile, Real-Time BI with Oracle Business Intelligence, Oracle Data Integrator and Oracle GoldenGate”. I know… it’s an incredibly long title… but I had to get in all the buzz words. I also rely heavily on the Information Management Reference Architecture that Andrew Bond and I are presenting at the UK BI Forum, so my Atlanta session will be based around this newest release. I love this content, and I think it shows with my excitement level every time I present it. I describe an Agile methodology that utilizes Oracle’s BI stack to the fullest: integrating OBIEE, ODI and perhaps the most beneficial element, Oracle GoldenGate. If your organization is investigating ways to deliver content rapidly while making the end user central to the development process, then this session is for you.


There are still slots available at both venues, so feel free to contact me directly if you have questions about either event.

Categories: BI & Warehousing

Simple Data Manipulation and Reporting using Hive, Impala and CDH5

Rittman Mead Consulting - Thu, 2014-04-24 13:54

Although I’m pretty clued-up on OBIEE, ODI, Oracle Database and so on, I’m relatively new to the worlds of Hadoop and Big Data, so most evenings and weekends I play around with Hadoop clusters on my home VMWare ESXi rig and try to get some experience that might then come in useful on customer projects. A few months ago I went through an example of loading-up flight delays data into Cloudera CDH4 and then analysing it using Hive and Impala, but realistically it’s unlikely the data you’ll analyse in Hadoop will come in such convenient, tabular form. Something more realistic is analysing log files from web servers or other high-volume, semi-structured sources, so I asked Robin to download the most recent set of Apache log files from our website, and I thought I’d have a go at analysing them using Pig and Hive, and maybe visualise the output using OBIEE (if possible, later on).

As I said, I’m not an expert in Hadoop and the Cloudera platform, so I thought it’d be interesting to describe the journey I went through, and also give some observations on when to use Hive and when to use Pig, when products like Cloudera Impala could be useful, and the general state-of-play with the Cloudera Hadoop platform. The files I started off with were Apache weblog files, ten in total, with sizes ranging from around 2MB up to 350MB.


Looking inside one of the log files, they’re in the standard Apache log file format (or “combined log format”), where the visitor’s IP address is recorded, the date of access, some other information and the page (or resource) they requested:


What I’m looking to do is count the number of visitors each day, see which was the most popular page, work out what time of day we’re busiest, and so on. I’ve got a Cloudera Hadoop CDH5.0 6-node cluster running on a VMWare ESXi server at home, so the first thing to do is log into Hue, the web-based developer admin tool that comes with CDH5, and upload the files to a directory on HDFS (Hadoop Distributed File System), the Unix-like clustered file system that underpins most of Hadoop.


You can, of course, SFTP the files to one of the Hadoop nodes and use the “hadoop fs” command-line tool to copy the files into HDFS, but for relatively small files like these it’s easier to use the web interface to upload them from your workstation. Once I’ve done that, I can then view the log files in the HDFS directory, just as if they were sitting on a regular Unix filesystem.


At this point though, the files are still “unstructured” – just a single log entry per line – and I’ll therefore need to do something with them before I can count things like the number of hits per day, which pages were requested and so on. At this beginner’s level, there are two main options you can use – Hive, a SQL interface over HDFS that lets you select from, and do set-based transformations with, files of data; or Pig, a more procedural language that lets you manipulate file contents as a series of step-by-step tasks. For someone like myself with a relational data warehousing background, Hive is probably easier to work with, but it comes with some quite significant limitations compared to a database like Oracle – we’ll see more on this later.

Whilst Hive tables are, at the simplest level, mapped onto comma- or otherwise-delimited files, another neat feature in Hive is that you can use what’s called a “SerDe”, or “Serializer-Deserializer”, to map more complex file structures into regular table columns. In the Hive DDL script below, I use this SerDe feature to have a regular expression parse the log file into columns, with the data source being an entire directory of files, not just a single one:

CREATE EXTERNAL TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE
LOCATION '/user/root/logs';

Things to note in the above DDL are:

  • EXTERNAL table means that the datafile used to populate the Hive table sits somewhere outside Hive’s usual /user/hive/warehouse directory, in this case in the /user/root/logs HDFS directory.
  • ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ tells Hive to use the Regular Expressions Serializer-Deserializer to interpret the source file contents, and 
  • WITH SERDEPROPERTIES … gives the SerDe the regular expression to use, in this case to decode the Apache log format.

Probably the easiest way to run the Hive DDL command to create the table is to use the Hive query editor in Hue, but there are a couple of things you’ll need to do before this particular command will work:

1. You’ll need to get hold of the JAR file in the Hadoop install that provides this SerDe (hive-contrib-0.12.0-cdh5.0.0.jar) and then copy it to somewhere on your HDFS file system, for example /user/root. In my CDH5 installation, this file was at /opt/cloudera/parcels/CDH/lib/hive/lib/, but it’ll probably be at /usr/lib/hive/lib if you installed CDH5 using the traditional packages (rather than parcels) route. Also, if you’re using a version of CDH prior to 5, the filename will be versioned accordingly. This JAR file then needs to be accessible to Hive, and whilst there are various more-permanent ways you can do this (see the ADD JAR sketch after this list), the easiest is to point to the JAR file in an entry in the query editor File Resources section as shown below.

2. Whilst you’re there, un-check the “Enable Parameterization” checkbox, otherwise the query editor will interpret the SerDe output string as parameter references.
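For reference, if you’re working from the Hive CLI rather than Hue, the equivalent of the File Resources entry is an ADD JAR command – a minimal sketch, assuming the parcel path mentioned above:

-- make the RegexSerDe class available to the current Hive session
ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-0.12.0-cdh5.0.0.jar;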


Once the command has completed, you can click over to the Hive Metastore table browser, and see the columns in the new table. 


Behind the scenes, Hive maps its table structure onto all the files in the /user/root/logs HDFS directory, and when I run a SELECT statement against it, for example to do a simple row count, MapReduce mappers, shufflers and sorters are spun-up to return the count of rows to me.

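For reference, that row count is just a single HiveQL statement – a minimal sketch of roughly what I ran:

-- simple row count over the external table; Hive turns this into a MapReduce job
SELECT COUNT(*) FROM apachelog;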

But in its current form, this table still isn’t all that useful – I’ve just got raw IP addresses for page requesters, and the request date is in a format that’s not easy to work with. So let’s do some further manipulation, creating another table that splits out the request date into year, month, day and time, using Hive’s CREATE TABLE AS SELECT command to transform and load in one command:

CREATE TABLE apachelog_date_split_parquet
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS 
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
AS
SELECT host,
       identity,
       user,
       substr(time,9,4)  year,
       substr(time,5,3)  month,
       substr(time,2,2)  day,
       substr(time,14,2) hours,
       substr(time,17,2) mins,
       substr(time,20,2) secs,
       request,
       status,
       size,
       referer,
       agent
FROM   apachelog
;

Note the Parquet Hive SerDe I’m using in this table’s row format definition – Parquet is a compressed, column-store file format developed by Cloudera originally for Impala (more on that in a moment), which from CDH4.6 onwards is also available for Hive and Pig. By using Parquet we potentially gain speed and space-saving advantages over regular files, so let’s use that feature now and see where it takes us. After creating the new Hive table, I can then run a quick query to count web server hits per month:

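The hits-per-month query was along these lines – a sketch, with the grouping columns taken from the table definition above:

-- count web server hits per year and month using the split-out date columns
SELECT year, month, COUNT(*) AS hits
FROM   apachelog_date_split_parquet
GROUP  BY year, month;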

So – getting more useful, but it’d be even nicer if I could map the IP addresses to actual countries, so I can see how many hits came from the UK, how many from the US, and so on. To do this, I’d need to use a lookup service or table to map my IP addresses to countries or cities, and one commonly-used such service is the free GeoIP database provided by MaxMind, where you turn your IP address into an integer via a formula, and then do a BETWEEN to locate that IP within ranges defined within the database. How best to do this though?

There are several ways that you can enhance and manipulate data in your Hadoop system like this. One way, and something I plan to look at on this blog later in this series, is to use Pig, potentially with a call-out to Perl or Python to do the lookup on a row-by-row (or tuple-by-tuple) basis – this blog article on the Cloudera site goes through a nice example. Another way, and again something I plan to cover in this series on the blog, is to use something called “Hadoop Streaming” – the ability within MapReduce to “subcontract” the map and reduce parts of the operation to external programs or scripts, in this case a Python script that again queries the MaxMind database to do the IP-to-country lookup.

But surely it’d be easiest to just calculate the IP address integer and join my existing Hive table to this GeoIP lookup table, and do it that way? Let’s start by trying to do this, first by modifying my final table design to include the IP address integer calculation defined on the MaxMind website:

CREATE TABLE apachelog_date_ip_split_parquet
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS 
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
AS
SELECT host,
      (cast(split(host,'\\.')[0] as bigint) * 16777216) 
     + (cast(split(host,'\\.')[1] as bigint) * 65536) 
     + (cast(split(host,'\\.')[2] as bigint) * 256) 
     + (cast(split(host,'\\.')[3] as bigint)) ip_add_int,
       identity,
       user,
       substr(time,9,4)  year,
       substr(time,5,3)  month,
       substr(time,2,2)  day,
       substr(time,14,2) hours,
       substr(time,17,2) mins,
       substr(time,20,2) secs,
       request,
       status,
       size,
       referer,
       agent
FROM   apachelog
;

Now I can query this from the Hive query editor, and I can see the IP address integer calculations that I can then use to match to the GeoIP IP address ranges.

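For example, a quick query like the one below shows the calculated integers alongside the original addresses (a sketch – 81.137.212.183 is just an arbitrary address to illustrate the arithmetic):

-- e.g. 81.137.212.183 -> (81 * 16777216) + (137 * 65536) + (212 * 256) + 183 = 1367987383
SELECT host, ip_add_int
FROM   apachelog_date_ip_split_parquet
LIMIT  10;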

I then upload the IP Address to Countries CSV file from the MaxMind site to HDFS, and define a Hive table over it like this:

create external table geo_lookup (
  ip_start      string,
  ip_end        string,
  ip_int_start  int,
  ip_int_end    int,
  country_code  string,
  country_name  string
  )
row format DELIMITED 
FIELDS TERMINATED BY '|' 
LOCATION '/user/root/lookups/geo_ip';

Then I try some variations on the BETWEEN clause, in a SELECT with a join:

select a.host, l.country_name
from apachelog_date_ip_split_parquet a join geo_lookup l 
on (a.ip_add_int > l.ip_int_start) and (a.ip_add_int < l.ip_int_end)
group by a.host, l.country_name;

select a.host, l.country_name
from apachelog_date_ip_split_parquet a join geo_lookup l 
on a.ip_add_int between l.ip_int_start and l.ip_int_end;

… which all fail, because Hive only supports equi-joins. One option is to use a Hive UDF (user-defined function) such as this one here to implement a GeoIP lookup, but something that’s probably a bit more promising is to switch over to Impala, which can handle non-equality joins through its cross join feature (Hive can in fact also do cross joins, but they’re not very efficient). Impala also has the benefit of being much faster for BI-type queries than Hive, and it’s designed to work with Parquet, so let’s switch over to the Impala query editor, run the “invalidate metadata” command to re-sync its view of the Hive table metastore, and then try the join in there:

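The Impala query was roughly as follows – a sketch using the cross join approach described above, with the range check applied in the WHERE clause (the LIMIT is there because older Impala releases require one alongside ORDER BY):

-- re-sync Impala's view of the Hive metastore, then count hits per country
INVALIDATE METADATA;

SELECT l.country_name, COUNT(*) AS hits
FROM   apachelog_date_ip_split_parquet a
CROSS JOIN geo_lookup l
WHERE  a.ip_add_int BETWEEN l.ip_int_start AND l.ip_int_end
GROUP  BY l.country_name
ORDER  BY hits DESC
LIMIT  50;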

Not bad. Of course this is all fairly simple stuff, and we’re still largely working with relational-style set-based transformations. In the next two posts in the series though I want to get a bit deeper into Hadoop-style transformations – first by using a feature called “Hadoop Streaming” to process data on its way into Hadoop, in parallel, by calling out to Python and Perl scripts; and then taking a look at Pig, the more “procedural” alternative to Hive – with the objective being to enhance this current dataset to bring in details of the pages being requested, filter out the non-page requests, and do some work with author, tag and clickstream analysis.

Categories: BI & Warehousing

Previewing Three Oracle Data Visualization Sessions at the Atlanta US BI Forum 2014

Rittman Mead Consulting - Tue, 2014-04-22 04:30

Many of the sessions at the UK and US Rittman Mead BI Forum 2014 events in May focus on the back-end of BI and data warehousing, with for example Chris Jenkins’ session on TimesTen giving us some tips and tricks from TimesTen product development, and Wayne Van Sluys’s session on Essbase looking at what’s involved in Essbase database optimisation (full agendas for the two events can be found here). But two areas within BI that have got a lot of attention over the past couple of years are (a) data visualisation, and (b) mobile, so I’m particularly pleased that our Atlanta event has three of the most innovative practitioners in these areas – Kevin McGinley from Accenture (left in the pictures below), Christian Screen from Art of BI (centre), and Patrick Rafferty from Branchbird (right) – talking about what they’ve been doing.


If you were at the BI Forum a couple of years ago you’ll of course know Kevin McGinley, who won the “best speaker” award the previous year and has most recently gone on to organise the BI track at ODTUG KScope and write for OTN and his own blog, Oranalytics.blogspot.com. Kevin also hosts, along with our own Stewart Bryson, a video podcast series on iTunes called “Real-Time BI with Kevin & Stewart”, and I’m excited that he’s joining us again at this year’s BI Forum in Atlanta to talk about adding 3rd party visualisations to OBIEE. Over to Kevin…

“I can’t tell you how many times I’ve told someone that I can’t precisely meet a certain charting requirement because of a lack of configurability or variety in the OBIEE charting engine.  Combine that with an increase in the variety and types of data people are interested in visualizing within OBIEE and you have a clear need.  Fortunately, OBIEE is a web-based tool and can leverage other visualization engines, if you just know how to work with the engine and embed it into OBIEE.

In my session, I’ll walk through a variety of reasons you might want to do this and the various approaches for doing it.  Then, I’ll take two specific engines and show you the process for building a visualization with them right in an OBIEE Analysis.  In both examples, you’ll come away with a capability you’ve never been able to do directly in OBIEE before.”


Another speaker, blogger, writer and developer very well known to the OBIEE community is Art of BI Software’s Christian Screen, co-author of the Packt book “Oracle Business Intelligence Enterprise Edition 11g: A Hands-On Tutorial” and developer of the OBIEE collaboration add-in, BITeamwork. Last year Christian spoke to us about developing plug-ins for OBIEE, but this year he’s returned to a topic he’s very passionate about – mobile BI, and in particular, Oracle’s Mobile App Designer. According to Christian:

“Last year Oracle marked its mobile business intelligence territory by updating its Oracle BI iOS application with a new look and feel. Unbeknownst to many, they also released the cutting-edge Oracle BI Mobile Application Designer (MAD). These are both components available as part of the Oracle BI Foundation Suite. But it is where they are taking the mobile analytics platform that is most interesting at the moment as we look at the mobile analytics consumption chain. MAD is still in its 1.x release and there is a lot of promise with this tool to satisfy the analytical cravings growing in the bellies of many enterprise organizations. There is also quite a bit of discussion around building new content just for mobile consumption compared to viewing existing content through the mobile applications native to major mobile devices.

The “Oracle BI Got MAD and You Should be Happy” session will discuss these topics and I’ll be sharing the stage with Jayant Sharma from Oracle BI Product Development where we’ll also be showing some cutting edge material and demos for Oracle BI MAD.  Because MAD provides a lot of flexibility for development customizations, compared to the Oracle BI iOS/Android applications, our session will explore business use cases around pre-built MAD applications, HTML5, mobile security, and development of plug-ins using the MAD SDK.  One of the drivers for this session is to show how many of the Oracle Analytics components integrate with MAD and how an Oracle BI developer can quickly leverage the capabilities of MAD to show the tool’s value within their current Oracle BI implementation.

We will also discuss the common concern of mobile security by touching on the BitzerMobile acquisition and using the central mobile configuration settings for Oracle BI Mobile. The crowd will hopefully walk away with a better understanding of Oracle BI mobility with MAD and a desire to go build something.”


As well as OBIEE and Oracle Mobile App Designer, Oracle also have another product, Oracle Endeca Information Discovery, that combines a data aggregation and search engine with dashboard visuals and data discovery. One of the most innovative partner companies in the Endeca space is Branchbird, and we’re very pleased to have Branchbird’s Patrick Rafferty join us to talk about “More Than Mashups – Advanced Visualizations and Data Discovery”. Over to Patrick …

“In this session, we’ll explore how Oracle Endeca customers are moving beyond simple dashboards and charts and creating exciting visualizations on top of their data using Oracle Endeca Studio. We’ll discuss how the latest trends in data visualization, especially geospatial and temporal visualization, can be brought into the enterprise and how they drive competitive advantage.

This session will show in-production real-life examples of how extending Oracle Endeca Studio’s visualization capabilities to integrate technology like D3 can create compelling discovery-driven visualizations that increase revenue, cut cost and enhance the ability to answer unknown questions through data discovery.”


The full agendas for the Atlanta and Brighton BI Forum events can be found on this blog post, and full details of both events, including registration links, links to book accommodation and details of the Lars George Cloudera Hadoop masterclass, can be found on the Rittman Mead BI Forum 2014 home page.

Categories: BI & Warehousing

Preview of Maria Colgan, and Andrew Bond/Stewart Bryson Sessions at RM BI Forum 2014

Rittman Mead Consulting - Wed, 2014-04-16 02:11

We’ve got a great selection of presentations at the two upcoming Rittman Mead BI Forum 2014 events in Brighton and Atlanta, including sessions on Endeca, TimesTen, OBIEE (of course), ODI, GoldenGate, Essbase and Big Data (full timetable for both events here). Two of the sessions I’m particularly looking forward to, though, are one by Maria Colgan, product manager for the new In-Memory Option for Oracle Database, and another by Andrew Bond and Stewart Bryson on an update to Oracle’s reference architecture for Data Warehousing and Information Management.

The In-Memory Option for Oracle Database was of course the big news item from last year’s Oracle Openworld, promising to bring in-memory analytics and column-storage to the Oracle Database. Maria is well known to the Oracle BI and Data Warehousing community through her work with the Oracle Database Cost-Based Optimizer, so we’re particularly glad to have her at the Atlanta BI Forum 2014 to talk about what’s coming with this new feature. I asked Maria to jot down a few words for the blog on what she’ll be covering, so over to Maria:


“At Oracle Open World last year, Oracle announced the upcoming availability of the Oracle Database In-Memory option, a solution for accelerating database-driven business decision-making to real-time. Unlike specialized In-Memory Database approaches that are restricted to particular workloads or applications, Oracle Database 12c leverages a new in-memory column store format to speed up analytic workloads. Given this announcement, and the performance improvements promised by this new functionality, is it still necessary to create a separate access and performance layer in your data warehouse environment, or to run your Oracle data warehouse on an Exadata environment?

This session explains in detail how Oracle Database In-Memory works and will demonstrate just how much performance improvement you can expect. We will also discuss how it integrates into the existing Oracle Data Warehousing Architecture and with an Exadata environment.”

The other session I’m particularly looking forward to is one being delivered jointly by Andrew Bond, who heads up Enterprise Architecture at Oracle and was responsible, along with Doug Cackett, for the various data warehousing, information management and big data reference architectures we’ve covered on the blog over the past few years, including the first update to include “big data” a year or so ago.


Back towards the start of this year, Stewart, Jon Mead and I met up with Andrew and his team to work together on an update to this reference architecture, and Stewart carried on with the collaboration afterwards, bringing some of our ideas around agile development, big data and data warehouse design into the final architecture. Stewart and Andrew will be previewing the updated reference architecture at the Brighton BI Forum event, and in the meantime, here’s a preview from Andrew:

“I’m very excited to be attending the event and unveiling Oracle’s latest iteration of the Information Management reference architecture. In this version we have focused on a pragmatic approach to “Analytics 3.0” and in particular looked at bringing in an agile methodology to break down the IT / business barrier. We’ve also examined the exploitation of in-memory technologies and the Hadoop ecosystem, and provided guidance on the plethora of new technology choices.

We’ve worked very closely with a number of key customers and partners on this version – most notably Rittman Mead and I’m delighted that Stewart and I will be able to co-present the architecture and receive immediate feedback from delegates.”

Full details of the event, running in Brighton on May 7th-9th 2014 and in Atlanta on May 14th-16th 2014, can be found on the Rittman Mead BI Forum 2014 homepage, and the agendas for the two events are on this blog post from earlier in the week.

Categories: BI & Warehousing

Final Timetable and Agenda for the Brighton and Atlanta BI Forums, May 2014

Rittman Mead Consulting - Mon, 2014-04-14 07:00

It’s just a few weeks now until the Rittman Mead BI Forum 2014 events in Brighton and Atlanta, and there are still a few spaces left at both events if you’d like to come – check out the main BI Forum 2014 event page, and the booking links for Brighton (May 7th – 9th 2014) and Atlanta (May 14th – 16th 2014).

We’re also now able to publish the timetable and running order for the two events – session order can still change between now and the events, but this is what we’re planning to run, first of all in Brighton.

Brighton

Brighton BI Forum 2014, Hotel Seattle, Brighton

Wednesday 7th May 2014 – Optional 1-Day Masterclass, and Opening Drinks, Keynote and Dinner

  • 9.00 – 10.00 – Registration
  • 10.00 – 11.00 : Lars George Hadoop Masterclass Part 1
  • 11.00 – 11.15 : Morning Coffee
  • 11.15 – 12.15 : Lars George Hadoop Masterclass Part 2
  • 12.15 – 13.15 : Lunch
  • 13.15 – 14.15 : Lars George Hadoop Masterclass Part 3
  • 14.15 – 14.30 : Afternoon Tea/Coffee/Beers
  • 14.30 – 15.30 : Lars George Hadoop Masterclass Part 4
  • 17.00 – 19.00 : Registration and Drinks Reception
  • 19.00 – Late :  Oracle Keynote and Dinner at Hotel

Thursday 8th May 2014

  • 08.45 – 09.00 : Opening Remarks Mark Rittman, Rittman Mead
  • 09.00 – 10.00 : Emiel van Bockel : Extreme Intelligence, made possible by …
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Chris Jenkins : TimesTen for Exalytics: Best Practices and Optimisation
  • 11.30 – 12.30 : Robin Moffatt : No Silver Bullets : OBIEE Performance in the Real World
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Adam Bloom : Building a BI Cloud
  • 14.30 – 14.45 : TED: Paul Oprea : “Extreme Data Warehousing”
  • 14.45 – 15.00 : TED : Michael Rainey :  “A Picture Can Replace A Thousand Words”
  • 15.00 – 15.30 : Afternoon Tea/Coffee/Beers
  • 15.30 – 15.45 : Reiner Zimmerman : About the Oracle DW Global Leaders Program
  • 15.45 – 16.45 : Andrew Bond & Stewart Bryson : Enterprise Big Data Architecture
  • 19.00 – Late: Depart for Gala Dinner, St Georges Church, Brighton

Friday 9th May 2014

  • 9.00 – 10.00 : Truls Bergensen – Drawing in a New Rock on the Map – How Will Endeca Fit in to Your Oracle BI Topography
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Nicholas Hurt & Michael Rainey : Real-time Data Warehouse Upgrade – Success Stories
  • 11.30 – 12.30 : Matt Bedin & Adam Bloom : Analytics and the Cloud
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Gianni Ceresa : Essbase within/without OBIEE – not just an aggregation engine
  • 14.30 – 14.45 : TED : Marco Klaassens : “Speed up RPD Development”
  • 14.45 – 15.00 : TED : Christian Berg : “Neo’s Voyage in OBIEE”
  • 15.00 – 15.30 : Afternoon Tea/Coffee/Beers
  • 15.30 – 16.30 : Alistair Burgess : “Tuning TimesTen with Aggregate Persistence”
  • 16.30 – 16.45 : Closing Remarks (Mark Rittman)

Then directly after Brighton we’ve got the US Atlanta event, running the week after, Wednesday to Friday.


Atlanta BI Forum 2014, Renaissance Mid-Town Hotel, Atlanta

Wednesday 14th May 2014 – Optional 1-Day Masterclass, and Opening Drinks, Keynote and Dinner

  • 9.00-10.00 – Registration
  • 10.00 – 11.00 : Lars George Hadoop Masterclass Part 1
  • 11.00 – 11.15 : Morning Coffee
  • 11.15 – 12.15 : Lars George Hadoop Masterclass Part 2
  • 12.15 – 13.15 : Lunch
  • 13.15 – 14.15 : Lars George Hadoop Masterclass Part 3
  • 14.15 – 14.30 : Afternoon Tea/Coffee/Beers
  • 14.30 – 15.30 : Lars George Hadoop Masterclass Part 4
  • 16.00 – 18.00 : Registration and Drinks Reception
  • 18.00 – 19.00 : Oracle Keynote & Dinner

Thursday 15th May 2014

  • 08.45 – 09.00 : Opening Remarks Mark Rittman, Rittman Mead
  • 09.00 – 10.00 : Kevin McGinley : Adding 3rd Party Visualization to OBIEE
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Richard Tomlinson : Endeca Information Discovery for Self-Service and Big Data
  • 11.30 – 12.30 : Omri Traub : Endeca and Big Data: A Vision for the Future
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Dan Vlamis : Capitalizing on Analytics in the Oracle Database in BI Applications
  • 14.30 – 15.30 : Susan Cheung : TimesTen In-Memory Database for Analytics – Best Practices and Use Cases
  • 15.30 – 15.45 : Afternoon Tea/Coffee/Beers
  • 15.45 – 16.45 : Christian Screen : Oracle BI Got MAD and You Should Be Happy
  • 18.00 – 19.00 : Special Guest Keynote : Maria Colgan : An introduction to the new Oracle Database In-Memory option
  • 19.00 – leave for dinner

Friday 16th May 2014

  • 09.00 – 10.00 : Patrick Rafferty : More Than Mashups – Advanced Visualizations and Data Discovery
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 12.30 : Matt Bedin : Analytic Applications and the Cloud
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Philippe Lions : What’s new on 2014 HY1 OBIEE SampleApp
  • 14.30 – 15.30 : Stewart Bryson : ExtremeBI: Agile, Real-Time BI with Oracle Business Intelligence, Oracle Data Integrator and Oracle GoldenGate
  • 15.30 – 16.00 : Afternoon Tea/Coffee/Beers
  • 16.00 – 17.00 : Wayne Van Sluys : Everything You Know about Oracle Essbase Tuning is Wrong or Outdated!
  • 17.00 – 17.15 : Closing Remarks (Mark Rittman)

Full details of the two events, including more on the Hadoop Masterclass with Cloudera’s Lars George, can be found on the BI Forum 2014 home page.

Categories: BI & Warehousing

The Riley Family, Part III

Chet Justice - Thu, 2014-04-10 20:44


That's Mike and Lisa, hanging out at the hospital. Mike's in his awesome cookie monster pajamas and robe...must be nice, right? Oh wait, it's not. You probably remember why he's there, Stage 3 cancer. The joys.

In October, we helped to send the entire family to Game 5 of the World Series (Cards lost, thanks Red Sox for ruining their night).

In November I started a GoFundMe campaign, to date, with your help, we've raised $10,999. We've paid over 9 thousand dollars to the Riley family (another check to be cut shortly).

In December, Mike had surgery. Details can be found here. Shorter: things went fairly well, then they didn't. Mike spent 22 days in the hospital and lost 40 lbs. He missed Christmas and New Years at home with his family. But, as I've learned over the last 6 months, the Riley family really knows how to take things in stride.

About 6 weeks ago Mike started round 2 of chemo, he's halfway through that one now. He complains (daily, ugh) about numbness, dizziness, feeling cold (he lives in St. Louis, are you sure it's not the weather?), and priapism (that's a lie...I hope).

Mike being Mike though, barely a complaint (I'll let you figure out where I'm telling a lie).

Four weeks ago, a chilly (65) Saturday night, Mike and Lisa call. "Hey, I've got some news for you."

"Sweet," I think to myself. Gotta be good news.

"Lisa was just diagnosed with breast cancer."

WTF?

ARE YOU KIDDING ME? (Given Mike's gallows humor, it's possible).

"Nope. Stage 1. Surgery on April 2nd."

FFS

(Surgery was last week. It went well. No news on that front yet.)

Talking to the two of them that evening you would have no idea they BOTH have cancer. Actually, one of my favorite stories of the year...the hashtag for the Riley Family campaign was #fmcuta. Fuck Mike's Cancer (up the ass). I thought that was hilarious, but I didn't think the Rileys would appreciate it. They did. They loved it. I still remember Lisa's laugh when I first suggested it. They've dropped the latest bad news and Lisa is like, "Oh, wait until you hear this. I have a hashtag for you."

"What is it?" (I'm thinking something very...conservative. Not sure why, I should know better by now).

#tna

I think about that for about .06 seconds. Holy shit! Did you just say tna? Like "tits and ass?"

(sounds of Lisa howling in the background).

Awesome. See what I mean? Handling it in stride.

"We're going to need a bigger boat." All I can think about now is, "what can we do now?"

First, I raised the campaign goal to 50k. This might be ambitious, but that's OK: cancer treatments are expensive enough for one person, and 10K (the original amount) was on the low side. So...50K.

Second, Scott Spendolini created a very cool APEX app, ostensibly called the Riley Support Group (website? gah). It's a calendar/scheduling app that allows friends and family to coordinate things like meals, young human (children) care and other things that most of us probably take for granted. Pretty cool stuff. For instance, Tim Gorman provides pizza on Monday nights (Dinner from pizza hut...1 - large hand-tossed cheese lovers, 1 - large thin-crispy pepperoni, 1 - 4xpepperoni rolls, 1 - cheesesticks).

Third. There is no third.

So many of you have donated your hard earned cash to the Riley family, they are incredibly humbled by, and grateful for, everyone's generosity. They aren't out of the woods yet. Donate more. Please. If you can't donate, see if there's something you can help out with (hit me up for details, Tim lives in CO, he's not really close). If you can't do either of those things, send them your prayers or your good thoughts. Any and all help will be greatly appreciated.
Categories: BI & Warehousing

Data Warehouse for Big Data: Scale-Up vs. Scale-Out

Dylan Wan - Thu, 2014-01-02 15:33

Found a very good paper: http://research.microsoft.com/pubs/204499/a20-appuswamy.pdf


This paper discusses whether using Hadoop as the analytics infrastructure is the right approach.


It is hard to argue with the industry trend.  However, Hadoop is not new any more.  It is time for people to calm down and rethink the real benefits.

Categories: BI & Warehousing