
BI & Warehousing

Six Months an ACE… “Boy, that escalated quickly”

Rittman Mead Consulting - Mon, 2014-12-08 12:34

Boy, that escalated quickly. By which I mean my path to becoming an Oracle ACE since joining Rittman Mead!

When I began with Rittman Mead back in March 2012, I wasn’t planning on joining the ACE program. In fact, I really wasn’t sure what an Oracle ACE even was. If you’re not familiar, the Oracle ACE program “highlights excellence within the global Oracle community by recognizing individuals who have demonstrated both technical proficiency and strong credentials as community enthusiasts and advocates”. That’s quite a mouthful! Now let’s find out how I got to this point in my career.

I wrote my first blog post ever shortly after joining Rittman Mead, sharing an innovative way to integrate Oracle Data Integrator and GoldenGate for data warehousing. This led to submitting my first abstracts to several Oracle User Group conferences. My goal was to challenge myself to grow as a public speaker, something I had never been very good at in the past, and to share my data integration know-how, of course. To my amazement, the first abstract submitted, for UKOUG 2012, was accepted. Then, the other conferences continued to accept my abstracts! Over the first couple of years I kept blogging and speaking, wrote an article for RMOUG SQL>UPDATE Magazine, and joined in on a couple of ODTUG ODI expert webcasts and OTN ArchBeat podcasts. Before I knew it, I had taken advantage of many knowledge-sharing opportunities and built a decent list for my Oracle ACE nomination. I've found that networking and sharing your experiences are both wonderful ways to get noticed by other Oracle experts.

BI Forum 2014

So now what is an Oracle ACE to me? I think it is someone knowledgeable about an Oracle product (or products) who enjoys sharing their knowledge and loves contributing to the greater Oracle community. It's that simple. One thing about the Rittman Mead organization: sharing what we know is part of the ethos of the company. If you're an avid reader of this blog, you're well aware that Mark Rittman and others love to share their technical knowledge. We also have an internal system that allows us to share our know-how with each other. That's one of the great benefits of working at Rittman Mead. If I don't know the answer to a question, I can easily reach out to the entire company and get a response from experts like Mark, James Coyle, Andy Rocha or others within minutes. I love that about this organization!

Well, around the March or April timeframe, it was mentioned in a company meeting that anyone with the goal of joining the ACE program should reach out to one of our current ACEs or ACE Directors and work on a plan to get there (again, a testament to how the RM organization works and thinks). I talked with Stewart Bryson, Oracle ACE Director. After reviewing my contributions to the Oracle community, he agreed to submit the nomination for me. Again, to my amazement, and very much my delight, my nomination was accepted and I became an Oracle ACE! What began as several small goals of improving my writing and speaking skills, sharing technical content, and having an article published, led to the overall end goal (though not known to me at the time) of an Oracle ACE program invitation. I worked hard to earn the nomination, but I also know I was very fortunate to have been surrounded by, and mentored by, some great individuals (and I still am!).


Since joining the ACE program, I’ve gained some additional Twitter followers, a great new ribbon on my conference badges, and some sweet ACE gear – but really it feels like not too much else has changed. Well, I do have an additional group of experts available to help me when I’m in a bind, because that’s what ACEs tend to do. I can now send a question via a tweet or email to someone in the ACE program and I’ll typically get an expert answer. It’s a community of like-minded individuals that love their specific Oracle technology – and love to share knowledge and help others. If this sounds like you, it’s time to work towards that ACE nomination!

Here at Rittman Mead we have several ACEs: myself, Venkat Janakiraman, Edelweiss Kammermann, and our newest ACE, Robin Moffatt, as well as an ACE Director, Mark Rittman. And our ACEs don't just sit back and blog or speak at conferences (not that blogging and speaking isn't hard work – it is!). We also provide all of the services Rittman Mead offers as consultants: training, consulting services, managed services support – all of it! Which means that when you hire Rittman Mead, the 5 folks in the ACE program are included, along with the other 100+ BI experts throughout the world. If you need some help on your Business Intelligence, Data Integration, or Advanced Analytics project, drop us a line at info@rittmanmead.com and we'll be happy to have a chat.

Now that I’ve met the goal of Oracle ACE, what’s ahead for me? Well, I’m constantly working to keep up on the latest and greatest from the Oracle Data Integration team, while also consulting full-time. I plan to continue speaking and writing blog posts, but also work towards some additional published content in newsletters and magazines for the Oracle community. And I’m going to keep learning, because education is a never-ending journey. “They’ve done studies, you know. 60 percent of the time, it works every time.” — Brian Fantana 

By the way, if you think Rittman Mead is the type of company you’d like to work for, give us a shout at careers@rittmanmead.com.

Categories: BI & Warehousing

Going Beyond MapReduce for Hadoop ETL Pt.2: Introducing Apache YARN and Apache Tez

Rittman Mead Consulting - Mon, 2014-12-08 02:00

In the first post in this three part series on going beyond MapReduce for Hadoop ETL, I looked at how a typical Apache Pig script gets compiled into a series of MapReduce jobs, and those MapReduce jobs pass data between themselves by writing intermediate resultsets to disk (HDFS, the Hadoop cluster file system). As a reminder, here’s the Pig script we’re working with:

register /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar
raw_logs = LOAD '/user/mrittman/rm_logs' USING TextLoader AS (line:chararray);
logs_base = FOREACH raw_logs
GENERATE FLATTEN
  (REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')
)AS
  (remoteAddr: chararray, remoteLogname: chararray, user: chararray,time: chararray, request: chararray, status: chararray, bytes_string: chararray,referrer:chararray,browser: chararray);
logs_base_nobots = FILTER logs_base BY NOT (browser matches '.*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*');
logs_base_page = FOREACH logs_base_nobots GENERATE SUBSTRING(time,0,2) as day, SUBSTRING(time,3,6) as month, SUBSTRING(time,7,11) as year, FLATTEN(STRSPLIT(request,' ',5)) AS (method:chararray, request_page:chararray, protocol:chararray), remoteAddr, status;
logs_base_page_cleaned = FILTER logs_base_page BY NOT (SUBSTRING(request_page,0,3) == '/wp' or request_page == '/' or SUBSTRING(request_page,0,7) == '/files/' or SUBSTRING(request_page,0,12) == '/favicon.ico');
logs_base_page_cleaned_by_page = GROUP logs_base_page_cleaned BY request_page;
page_count = FOREACH logs_base_page_cleaned_by_page GENERATE FLATTEN(group) as request_page, COUNT(logs_base_page_cleaned) as hits;
page_count_sorted = ORDER page_count BY hits DESC;
page_count_top_10 = LIMIT page_count_sorted 10;
posts = LOAD '/user/mrittman/posts.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() as (post_id:int,title:chararray,post_date:chararray,post_type:chararray,author:chararray,url:chararray,generated_url:chararray);
posts_cleaned = FOREACH posts GENERATE CONCAT(generated_url,'/') as page_url,author as author, title as title;
pages_and_post_details = JOIN page_count by request_page, posts_cleaned by page_url;
pages_and_posts_trim = FOREACH pages_and_post_details GENERATE page_count::request_page as request_page, posts_cleaned::author as author, posts_cleaned::title as title, page_count::hits as hits;
pages_and_posts_sorted = ORDER pages_and_posts_trim BY hits DESC;
pages_and_post_top_10 = LIMIT pages_and_posts_sorted 10;
store pages_and_post_top_10 into 'top_10s/pages';

When you submit the script for execution using the Pig client, it parses the Pig Latin script, logically optimizes it and then compiles it into MapReduce programs; these programs are then sent in turn to the Hadoop 1.0 JobTracker, which sends them for execution on the Hadoop cluster – in the case of our Pig script, five MapReduce programs are generated in total.


Each one of these MapReduce programs is what's called a "directed acyclic graph", or DAG. A directed acyclic graph is a programming style within distributed computing where processing is broken down into functions that can be run independently of one another, as long as one is not an ancestor of another. In MapReduce terms, this means that all mappers can run independently of each other (and therefore on different nodes in a cluster) and it's only the reducers that have to wait for their ancestors to finish before they can also start their work, independently of the other reducers. It's a great programming model for processing large amounts of data with fault tolerance across a cluster, and it was the key insight that made MapReduce, and the "big data" systems we work with today, possible.


A Pig script that generates five MapReduce jobs therefore effectively creates five DAGs, with each one using files (HDFS) to persist and hand off data between them, and JVMs being spun up for the individual map and reduce jobs. As such, there's no way for this version of MapReduce and the Hadoop framework it runs on to consider the overall dataflow, and each DAG has to run in isolation.


To address this issue, version 2.0 of Hadoop introduced a new feature called Apache YARN, or "Yet Another Resource Negotiator". YARN took on the resource management and job scheduling/monitoring parts of Hadoop, effectively becoming an "operating system" for Hadoop on which other frameworks can run: initially MapReduce2 (MapReduce reworked to run on YARN), but since then others such as Apache Tez and Apache Spark.


Crucially, YARN also supports frameworks that use DAGs describing the entire dataflow, not just individual MapReduce jobs, and a new framework that came out of this and uses that capability is Apache Tez. Tez is a generalisation of the MapReduce distributed compute framework that supports these dataflow-style DAGs; it either runs MapReduce code unaltered or offers its own API for describing DAGs using vertices (processing logic and resources) and edges (data connections). Both Hive and Pig have been ported to run on Tez, and in Pig's case this means another compiler is added alongside the existing MapReduce one. A Pig script executed on Tez can now submit a single DAG that encompasses all stages in the dataflow, with the stages linked together via in-memory passing of intermediate results rather than having to write those intermediate steps to disk.


In practice, what this means is that if your version of Pig or Hive has been updated to run on Tez, you can run your code unaltered and typically see a 2-3x performance improvement without any changes to your application logic. Hortonworks have been the main Hadoop vendor backing Tez, and their new Hortonworks Data Platform 2.2 comes with Tez support, so I took my Pig script and ran it on a five-node cluster to get an initial timing: it took 4 mins 17 secs to run, again generating the same five MapReduce jobs that I saw on the Cloudera CDH5 cluster. Then, running it using Tez as the execution engine (pig -x tez analyzeblog.pig), the same script took 2 mins 25 secs to run, around twice as fast as when we ran it using regular MapReduce.
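
For reference, switching execution engines is just a matter of the -x (exectype) flag on the pig command line; assuming the script is saved as analyzeblog.pig as above, the two runs would look something like this:

# classic MapReduce execution engine
pig -x mapreduce analyzeblog.pig

# same, unaltered script on the Tez execution engine (needs a Tez-enabled Pig build, e.g. HDP 2.2)
pig -x tez analyzeblog.pig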


Tez, then, is great for getting existing Pig and Hive scripts to run faster – if your platform supports it, you should use it instead of MapReduce as your execution engine. MapReduce code submitted "as is" will benefit from better YARN container re-use, and Hive and Pig scripts that run on the Tez execution engine can run as a single DAG and use memory, rather than disk, to pass data between jobs in the DAG. For ETL and analytic jobs that you're creating from scratch, though, Apache Spark is arguably the framework you should look to use instead of Tez, and tomorrow we'll find out why.

Categories: BI & Warehousing

Going Beyond MapReduce for Hadoop ETL Pt.1: Why MapReduce Is Only for Batch Processing

Rittman Mead Consulting - Sun, 2014-12-07 08:28

Over the past few months I've been looking at the various ways you can load data into Hadoop, process it and then report on it using Oracle tools. We've looked at Apache Hive and how it provides a SQL layer over Hadoop, making it possible for tools like ODI and OBIEE to use their usual SQL set-based processing approach to access Hadoop data; later on, we looked at another Hadoop tool, Apache Pig, which provides a more dataflow-type language over Hadoop for when you want to create step-by-step data pipelines for processing data. Under the covers, both Hive and Pig generate Java MapReduce code to actually move data around, with MapReduce then working hand-in-hand with the Hadoop framework to run your jobs in parallel across the cluster.

But MapReduce can be slow; it’s designed for very large datasets and batch processing, with overall analysis tasks broken-down into individual map and reduce tasks that start by reading data off disk, do their thing and then write the intermediate results back to disk again.


Whilst this approach means the system is extremely fault-tolerant and effectively infinitely scalable, this writing of each step in the process to disk means that MapReduce jobs typically take a long time to run and don't really take advantage of the RAM that's available in today's commodity servers. Whilst this is a limitation most early adopters of Hadoop were happy to live with (in exchange for being able to analyse data cheaply on a scale previously unheard of), over the past few years, as Hadoop adoption has broadened, there have been a number of initiatives to move Hadoop past its batch-processing roots and into something more real-time that does more of its processing in-memory. Whilst there are a whole bunch of projects and products out there that claim to improve the speed of Hadoop processing and bring in-memory capabilities – Apache Drill, Cloudera Impala and Oracle's Big Data SQL are just some examples – the two that are probably of most interest to Hadoop customers working in an Oracle environment are Apache Spark and Apache Tez. But before we get into the details of Spark, Tez and how they improve over MapReduce, let's take a look at why MapReduce can be slow.

MapReduce and Hadoop 1.0 – Scalable, Fault-Tolerant, but Aimed at Batch Processing

Going back to MapReduce and what’s now termed “Hadoop 1.0”, MapReduce works on the principle of breaking larger jobs down into lots of smaller ones, with each one running independently and persisting its results back to disk at the end to ensure data doesn’t get lost if a server node breaks down. To take an example, the Apache Pig script below reads in some webserver log files, parses and filters them, aggregates the data and then joins it to another Hadoop dataset before outputting the results to a directory in the HDFS storage layer:

register /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar
raw_logs = LOAD '/user/mrittman/rm_logs' USING TextLoader AS (line:chararray);
logs_base = FOREACH raw_logs
GENERATE FLATTEN
  (REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')
)AS
  (remoteAddr: chararray, remoteLogname: chararray, user: chararray,time: chararray, request: chararray, status: chararray, bytes_string: chararray,referrer:chararray,browser: chararray);
logs_base_nobots = FILTER logs_base BY NOT (browser matches '.*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*');
logs_base_page = FOREACH logs_base_nobots GENERATE SUBSTRING(time,0,2) as day, SUBSTRING(time,3,6) as month, SUBSTRING(time,7,11) as year, FLATTEN(STRSPLIT(request,' ',5)) AS (method:chararray, request_page:chararray, protocol:chararray), remoteAddr, status;
logs_base_page_cleaned = FILTER logs_base_page BY NOT (SUBSTRING(request_page,0,3) == '/wp' or request_page == '/' or SUBSTRING(request_page,0,7) == '/files/' or SUBSTRING(request_page,0,12) == '/favicon.ico');
logs_base_page_cleaned_by_page = GROUP logs_base_page_cleaned BY request_page;
page_count = FOREACH logs_base_page_cleaned_by_page GENERATE FLATTEN(group) as request_page, COUNT(logs_base_page_cleaned) as hits;
page_count_sorted = ORDER page_count BY hits DESC;
page_count_top_10 = LIMIT page_count_sorted 10;
posts = LOAD '/user/mrittman/posts.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() as (post_id:int,title:chararray,post_date:chararray,post_type:chararray,author:chararray,url:chararray,generated_url:chararray);
posts_cleaned = FOREACH posts GENERATE CONCAT(generated_url,'/') as page_url,author as author, title as title;
pages_and_post_details = JOIN page_count by request_page, posts_cleaned by page_url;
pages_and_posts_trim = FOREACH pages_and_post_details GENERATE page_count::request_page as request_page, posts_cleaned::author as author, posts_cleaned::title as title, page_count::hits as hits;
pages_and_posts_sorted = ORDER pages_and_posts_trim BY hits DESC;
pages_and_post_top_10 = LIMIT pages_and_posts_sorted 10;
store pages_and_post_top_10 into 'top_10s/pages';

Pig works by defining what are called "relations" or "aliases", similar to tables in SQL, that contain data or pointers to data. You start by loading data into a relation from a file or other source, and then progressively define further relations that take that initial dataset and apply filters, use transformations, re-orientate the data or join it to other relations until you've arrived at the final set of data you're looking for. In this example we start with raw log data, parse it, filter out bot and spider activity, project just the columns we're interested in, remove further "noise" from the logs, then join it to reference data and finally return the top ten pages over that period based on total hits.


Pig uses something called "lazy evaluation", where relations you define don't necessarily get created at the point they're defined in the script; instead they're used as pointers to data and instructions on how to produce it if needed, with the Pig interpreter only materializing a dataset when it absolutely has to (for example, when you ask it to store a dataset on disk or output it to the console). Moreover, all the steps leading up to the final dataset you've requested are considered as a whole, giving Pig the ability to merge steps, miss out steps completely if they're not actually needed to produce the final output, and otherwise optimize the flow of data through the process.
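
As a minimal sketch of this lazy evaluation (a hypothetical fragment, not part of the log-analysis script above), the first four statements below only build up a logical plan; nothing actually executes until the DUMP at the end forces Pig to compile and run the necessary MapReduce job(s):

raw_lines = LOAD '/user/mrittman/rm_logs' USING TextLoader AS (line:chararray);
error_lines = FILTER raw_lines BY line matches '.*error.*';
grouped = GROUP error_lines ALL;
error_count = FOREACH grouped GENERATE COUNT(error_lines);
-- nothing has run yet; DUMP (or STORE) triggers compilation and execution
DUMP error_count;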

Running the Pig script and then looking at the console output from the Grunt command-line interpreter, you can see that five separate MapReduce jobs were generated to load in the data, filter, join and transform it, and then produce the output we requested at the end.

JobId                  Maps Reduces Alias                                                                                                               Feature Outputs
job_1417127396023_0145 12   2       logs_base,logs_base_nobots,logs_base_page,logs_base_page_cleaned,logs_base_page_cleaned_by_page,page_count,raw_logs GROUP_BY,COMBINER 
job_1417127396023_0146 2    1       pages_and_post_details,pages_and_posts_trim,posts,posts_cleaned                                                     HASH_JOIN 
job_1417127396023_0147 1    1       pages_and_posts_sorted                                                                                              SAMPLER 
job_1417127396023_0148 1    1       pages_and_posts_sorted                                                                                              ORDER_BY,COMBINER 
job_1417127396023_0149 1    1       pages_and_posts_sorted                                                                                              hdfs://bdanode1....pages2,

Pig generated five separate MapReduce jobs that loaded, parsed, filtered, aggregated and joined the datasets as part of an overall data “pipeline”, with the intermediate results staged to disk before the next MapReduce job took over. On my six-node CDH5.2 VM cluster it took just over five minutes to load, process and aggregate 5m records from our site’s webserver.


Now the advantage of this approach is that it's more or less infinitely scalable and certainly resilient, but whilst Pig can look at your overall dataflow "graph" and come up with an optimally efficient way to get to your end result, MapReduce treats every step as atomic and separate and insists on writing every intermediate step to disk before moving on.

What this means in practice is that ETL routines that use Pig, Hive and MapReduce, whilst they scale well, never really get to the point where you can run them as micro-batches or in real-time. For that type of scenario we need to look at moving away from MapReduce and breaking the link between Hadoop (the platform, the cluster management and resource handling part) and the processing that runs on it, so that we can run alternative execution engines on the Hadoop platform such as Apache Tez, which we'll cover in tomorrow's post.

Categories: BI & Warehousing

All You Need, and Ever Wanted to Know About the Dynamic Rolling Year

Rittman Mead Consulting - Thu, 2014-12-04 09:50

DRY_1

Among the many reporting requirements that have comprised my last few engagements has been what, from now on, I'll be referring to as the 'Dynamic Rolling Year'. That is, through the use of a dashboard prompt, the user is able to pick a year-month combination and the chosen view, in this case a line graph, will dynamically display a year's worth of figures with the chosen year-month as the most current point. The picture above demonstrates this feature. You can see that we've chosen the '201212' year-month combination and, as a result, our view displays revenue data from 201112 through to 201212. Let's choose another combination to make sure things work properly and to demonstrate the functionality.

DRY_2

From the results in the picture above, it seems we've got our view to do as expected. We've chosen '201110' as our starting year-month combination and the view has changed to display the revenue trend back to '201010'. Now how do we do this? A quick web search will offer up myriad complex and time-consuming solutions which, while completely feasible, can be onerous to implement. While it was tempting to reap the benefits of someone else's perhaps hard-fought solution, I just knew there had to be an easier way. As a note of caution, I'll preface the next bit by saying that this only works on a dashboard, using a dashboard prompt in tandem with the target analysis. This solution was tested via a report prompt; however, the desired results were not obtained, even when 'hard coding' default values for our presentation variables. As well, there are a few assumptions I'm making about what kinds of columns you have in your subject area's time dimension, namely that you have some sort of Year and Month column within your hierarchy.

1. Establish Your Analysis and Prompt

DRY_3

I've brought in the Month column (renamed Year – Month) from Sample App's Sample Sales subject area along with the Revenue fact. For the sake of this exercise, we're going to keep things fairly simple and straightforward. Our Month column exists in the 'YYYY / MM' format so we're going to have to dice it up a little bit to get it to where we need it to be. To get our dynamic rolling year to work, all we're going to do is a little math. In a nutshell, we're going to turn our month into an integer as 'YYYYMM' and simply subtract a hundred in a filter that uses the presentation variables corresponding to our new month column. To make the magic happen, bring in a dummy column from your time dimension, concatenate the year and month together, cast them as a char and then the whole statement as an integer (as seen in the following formula: CAST(substring("Time"."T02 Per Name Month", 1, 4)||substring("Time"."T02 Per Name Month", -2, 2) AS INT) ). Then, apply a 'between' filter to this column using two presentation variables.

DRY_4

Don't worry that we've yet to establish them on our dashboard prompt, as that will happen later. In the picture above, I've set the filter to be between two presentation variables entitled 'YEARMONTH_VAR'. As you can see, I've subtracted a hundred because, in this case, my year-month concatenation exists in the 'YYYYMM' format. You may alter this number as needed. For example, a client already had this column established in their Time dimension, yet it was in the 'YYYYMMM' format with a leading zero before the month. In that case, we had to subtract 1000 in order to establish the rolling year. Moving on… after you've set up the analysis, save it off and let's build the dashboard prompt. (The results tab will error out at this point; this is normal.) We're going to be doing essentially the same thing there.
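
Spelled out, the column expression and filter from the steps above look roughly like this (a sketch only; the @{YEARMONTH_VAR} syntax is the standard OBIEE presentation-variable notation, but adjust the variable name and the offset, here -100, to your own prompt and month format):

Column formula (the dummy Year-Month column cast to an integer, e.g. 201212):
  CAST(substring("Time"."T02 Per Name Month", 1, 4)||substring("Time"."T02 Per Name Month", -2, 2) AS INT)

Filter on that column, using "is between":
  @{YEARMONTH_VAR}-100   and   @{YEARMONTH_VAR}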

We need to get our analysis and prompt to work together (like they should) and so we’ll need to, once again, format the year – month column on the prompt the same way we did it on our analysis through the ‘prompt for column’ link. You can simply copy and paste the formula you used in the ‘Edit Formula’ column on your analysis into this screen. Don’t forget to assign a presentation variable to this column with whatever name you used on the filter in your analysis (in this case it was YEARMONTH_VAR).

DRY_6

DRY_7

DRY_8

2. Build the Dashboard


Starting a new dashboard, bring in your prompt and analysis from the Shared Folder. In this exercise, I've placed the prompt below the analysis and have done some custom formatting. If you'd like to recreate this example exactly, then in addition to the default table view you'll need to create a line graph view that groups by our year-month column and uses the revenue column as the measure. And voila: save and run the dashboard. Note that if you didn't set a default value for your dashboard prompt, it will error out until you actually pick and apply a value from the prompt drop-down. Admittedly, the closer your time dimension already is to having a concatenated year-month (YYYYMM) column, the easier the implementation of this method will be. You could even set your default value to an appropriate server/repository variable so that your dashboard opens to the most current and/or a desired year-month combination. However, now that you've seen this, hopefully it's sparked some inspiration into how you might apply this technique to other facets of your work and future reporting.

DRY_1

Categories: BI & Warehousing

Private or Public Training? (and a Discount Code)

Rittman Mead Consulting - Thu, 2014-12-04 08:22

Rittman Mead offers world class training that covers various Business Intelligence, Data Warehousing and Data Integration tools.

Rittman Mead currently offers two types of training, Public and Private – so what is the best option for your organization?

Both Private and Public options are staffed by the same highly qualified, experienced instructors. One thing to note is that our instructors aren't merely technology instructors; they are real-world consultants.

What does that mean for you? It means questions aren’t answered by the instructor flipping to a slide in a deck and reading off the best practice. You’ll probably hear an answer more like “I was on site with a company a few weeks ago who had the same issue. Let me tell you how we helped that client!”

Let’s start with Public courses. Our offerings include an OBIEE Bootcamp, an ODI Bootcamp, Exalytics for System Administrators, RPD Modeling and our all new OBIEE Front End Development, Design and Best Practices.

Why Public? If you have four or fewer attendees, the public classes are typically the most cost-effective choice. You'll attend the course along with individuals from various public and private sector organizations. There's plenty of time to learn the concepts, get your hands dirty by completing labs, and get all of your questions answered by our highly skilled instructors.

Now, our Private courses. There are primarily two reasons a company would choose private training: cost and the ability to customize.

Cost: Our Private Training is billed based on the instructor's time, not the number of attendees. If you have five or more individuals, it is most likely going to be more cost-effective for us to come to you. Also, you'll be covering travel costs for one instructor rather than your entire team.

Customization: Once you select the course that is best for your team, we'll set up a call with a trainer to review your team's experience level and your main goals. Based on that conversation, we can tweak the syllabus to make sure the areas where you need extra focus are thoroughly covered.

One thing to note is we do offer a few courses (OWB 11gR2 & ODI 11g) privately that aren’t offered publicly.

We also offer a Custom Course Builder (http://www.rittmanmead.com/training/customobiee11gtraining/) which allows you to select modules out of various training classes and build a customized course for your team – this flexibility is a great reason to select Private Training.

Some of our clients love having their team get away for the week and focus strictly on training while others prefer to have training on site at their offices with the flexibility to cut away for important meetings or calls. Whichever program works best for your team, we have got you covered!

For more information, reach out to us at info@rittmanmead.com. If you’ve made it all the way to the end of this blog post, congratulations! As a way of saying thanks, you can mention the code “RMTEN” to save 10% on your next private or public course booking!

Categories: BI & Warehousing

VirtualBox and Windows driver verifier

Amardeep Sidhu - Wed, 2014-12-03 04:34

I was troubleshooting some Windows hangs on my desktop system running Windows 8 and had enabled Driver Verifier. Today when I tried to start VirtualBox it failed with this error message:

Failed to load VMMR0.r0 (VERR_LDR_MISMATCH_NATIVE)

Most of the online forums suggested reinstalling VirtualBox to fix the issue, but one of the threads mentioned that it was being caused by Windows Driver Verifier. I disabled it, restarted Windows and VirtualBox worked like a charm. I didn't have time to do more research as I quickly wanted to test something. Maybe we can exclude some particular drivers from Driver Verifier's checks and VirtualBox will then work.
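
For what it's worth, the quick way I mean by "disabled it" is clearing all Driver Verifier settings from an elevated command prompt and then rebooting; something like the standard verifier.exe usage below (a sketch, so double-check on your own system):

verifier /reset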

Categories: BI & Warehousing

Rittman Mead Founder Member of the Red Expert Alliance

Rittman Mead Consulting - Thu, 2014-11-20 05:24

At Rittman Mead we’re always keen to collaborate with our peers and partners in the Oracle industry, running events such as the BI Forum in Brighton and Atlanta each year and speaking at user groups in the UK, Europe, USA and around the world. We were pleased therefore to be contacted by our friends over at Amis in the Netherlands just before this year’s Openworld with an idea they had about forming an expert-services Oracle partner network, and to see if we’d be interested in getting involved.

Fast-forward a few weeks and thanks to the organising of Amis’ Robbrecht van Amerongen we had our first meeting at San Francisco’s Thirsty Bear, where Amis and Rittman Mead were joined by the inaugural member partners Avio Consulting, E-VIta AS, Link Consulting, Opitz Consulting and Rubicon Red.

Gathering of the expert alliance partners at #oow14 @Oracle open world. pic.twitter.com/PNK8fpt7iU

— Red Expert Alliance (@rexpertalliance) November 3, 2014

The Red Expert Alliance now has a website, news page and partner listing, and as the "about" page details, it's all about collaboration between first-class Oracle consulting companies:


“The Red Expert Alliance is an international  network of first-class Oracle consulting companies. We are  working together to deliver the maximum return on our customers investment in Oracle technology. We do this by collaborating, sharing and challenging each other to improve ourselves and our customers.

Collaborating with other companies is a powerful way to overcome challenges of today’s fast-paced world and improve competitive advantage. Collaboration provides participants mutual benefits such as shared resources, shared expertise and enhanced creativity; it gives companies an opportunity to improve their performance and operations, achieving more flexibility thanks to shared expertise and higher capacity. Collaboration also fuels innovation by providing more diversity to the workplace which can result in better-suited solutions for customers.”

There’s also a press release, and a partner directory listing out our details along with the other partners in the alliance. We’re looking forward to collaborating with the other partners in the Red Expert Alliance, increasing our shared knowledge and collaborating together on Oracle customer projects in the future.

Categories: BI & Warehousing

Bordering Text

Tim Dexter - Tue, 2014-11-18 16:08

A tough little question appeared on one of our internal mailing lists today that piqued my interest. A customer wanted to place a border around all data fields in their BIP output. Something like this:


Naturally you think of using a table, embedding the field inside a cell and turning the cell border on. That will work but will need some finessing to get the cells to stretch or shrink depending on the width of the runtime text. Then things might get a bit squirly (technical term) if the text is wide enough to force a new line at the page edge. Anyway, it will get messy. So I took a look at the problem to see if the fields can be isolated in the page as far as the XSL-FO code is concerned. If the field can be isolated in its own XSL block then we can change attribute values to get the borders to show just around the field. Sadly not.

This is an embedded field YEARPARAM in a sentence.

translates to

 <fo:inline height="0.0pt" style-name="Normal" font-size="11.0pt" style-id="s0" white-space-collapse="false" 
  font-family-generic="swiss" font-family="Calibri" 
  xml:space="preserve">This is an embedded field <xsl:value-of select="(.//YEARPARAM)[1]" xdofo:field-name="YEARPARAM"/> in a sentence.</fo:inline>


If we change the border on this, it will apply to the complete sentence, not just the field.
So how could I isolate that field? Well, we could actually do anything to the field: embolden, italicize, etc. I settled on changing the background color (it's easy to change it back with a single attribute call). Using the highlighter tool on the Home tab in Word, I changed the field to have a yellow background. I now have:

 This gives me the following code.

<fo:block linefeed-treatment="preserve" text-align="start" widows="2" end-indent="5.4pt" orphans="2"
 start-indent="5.4pt" height="0.0pt" padding-top="0.0pt" padding-bottom="10.0pt" xdofo:xliff-note="YEARPARAM" xdofo:line-spacing="multiple:13.8pt"> 
 <fo:inline height="0.0pt" style-name="Normal" font-size="11.0pt" style-id="s0" white-space-collapse="false" 
  font-family-generic="swiss" font-family="Calibri" xml:space="preserve">This is an embedded field </fo:inline>
  <fo:inline height="0.0pt" style-name="Normal" font-size="11.0pt" style-id="s0" white-space-collapse="false" 
   font-family-generic="swiss" font-family="Calibri" background-color="#ffff00">
    <xsl:attribute name="background-color">white</xsl:attribute> <xsl:value-of select="(.//YEARPARAM)[1]" xdofo:field-name="YEARPARAM"/> 
  </fo:inline> 
 <fo:inline height="0.0pt" style-name="Normal" font-size="11.0pt" style-id="s0" white-space-collapse="false" 
  font-family-generic="swiss" font-family="Calibri" xml:space="preserve"> in a sentence.</fo:inline> 
</fo:block> 

Now that we have the field isolated, we can easily set other attributes that will be applied to the field and nothing else. I added the following to my YEARPARAM field:

<?attribute@inline:background-color;'white'?> >>> turn the background back to white
<?attribute@inline:border-color;'black'?> >>> turn on all borders and make black
<?attribute@inline:border-width;'0.5pt'?> >>> make the border 0.5 point wide
<?YEARPARAM?> >>> my original field

The @inline tells the BIP XSL engine to apply the attribute values only to the immediate 'inline' code block, i.e. the field. Collapse all of this code into a single line in the field.
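
Collapsed like that, the field contents are simply the four commands above run together on one line:

<?attribute@inline:background-color;'white'?><?attribute@inline:border-color;'black'?><?attribute@inline:border-width;'0.5pt'?><?YEARPARAM?>
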
When I run the template now, I see the following:

 


It's a little convoluted, but if you ignore the geeky code explanation and just highlight and copy'n'paste, it's pretty straightforward.

Categories: BI & Warehousing

OBIEE SampleApp v406 Amazon EC2 AMI – available for public use

Rittman Mead Consulting - Thu, 2014-11-13 08:34

I wrote a while ago about converting Oracle’s superb OBIEE SampleApp from a VirtualBox image into an EC2-hosted instance. I’m pleased to announce that Oracle have agreed for us to make the image (AMI) on Amazon available publicly. This means that anyone who wants to run their own SampleApp v406 server on Amazon’s EC2 cloud service can do so.


Important caveats

Before getting to the juicy stuff, there are some important points to note about access to the AMI, which you are implicitly bound by if you use it:

  1. In accessing it you’re bound by the same terms and conditions that govern the original SampleApp
  2. SampleApp is only ever for use in your own development/testing/prototyping/demonstrating with OBIEE. It must not be used as the basis for any kind of Productionisation.
  3. Neither Oracle nor Rittman Mead provide any support for SampleApp or the AMI, nor warranty to any issues caused through their use.
  4. Once launched, the server will be accessible to the public and it's your responsibility to secure it as such.
How does it work?
  1. Create yourself an AWS account, if you haven’t already. You’ll need your credit card for this. Read more about getting started with AWS here.
  2. Request access to the AMI (below)
  3. Launch the AMI on your AWS account
  4. Everything starts up automagically. After 15-20 minutes, enjoy your fully functioning SampleApp v406 instance, running in the cloud!
How much does it cost?

You can get an estimate of the cost involved using the Amazon Calculator.

As a rough guide, as of November 2014 an “m3.large” instance costs around $4 a day  – but it’s your responsibility to check pricing and commitments.

Be aware that once a server is created you’ll incur costs on it right through until you “terminate” it. You can “stop” it (in effect, power it off) which reduces the running costs but you’ll still pay for the ‘disk’ (EBS volume) that holds it. The benefit of this though is that you can then power it back up and it’ll be as you left it (just with a different IP).

You can track your AWS usage through the AWS page here.

Security
  • Access to the instance's command line is over SSH, as the oracle user, using SSH keys only (provided by you when you launch the server) – no password access (see the example command after this list)
    • You cannot ssh to the server as root; instead connect as oracle and use sudo as required.
    • The ssh key does not get set up until the very end of the first boot sequence, which can be 20 minutes. Be patient!
  • All the OBIEE/WebLogic usernames and passwords are per the stock SampleApp v406 image, so you are well advised to change them. Otherwise, if someone finds your instance running, they'll be able to access it.
  • There is no firewall (iptables) running on the server. Since this is a public server you’d be wise to make use of Amazon’s Security Group functionality (in effect, a firewall at the virtual hardware level) to block access on all ports except those necessary.
    For example, you could block all traffic except 7780, and then enable access on port 22 (SSH) and 7001 (Admin Server) just when you need to access it for admin.
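
As an illustration, connecting with the key pair you supplied at launch would look something like the following (the key file name and IP address are placeholders for your own values):

# connect as the oracle user using the private key downloaded from AWS
ssh -i my-sampleapp-key.pem oracle@42.42.42.42
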
Using the AMI
  1. You first need to get access to the AMI, through the form below. You also need an active AWS account.
  2. Launch the server:
    1. From the AWS AMI page locate the SampleApp AMI using the details provided when you request access through the form below. Make sure you are on the Ireland/eu-west-1 region. Click Launch.
    2. Select an Instance Type. An “m3.large” size is a good starting point (this site is useful to see the spec of all instances).
    3. Click through the Configure Instance Details, Add Storage, and Tag Instance screens without making changes unless you need to.
    4. On the Security Group page select either a dedicated security group if you have already configured one, or create a new one.
      A security group is a firewall that controls traffic to the server regardless of whether a software firewall is configured on the instance. By default only port 22 (SSH) is open, so you'll need to open at least 7780 for analytics, and 7001 too if you want to access WLS/EM as well.
      Note that you can amend a security group’s rules once the instance is created, but you cannot change which security group it is bound to. For ad-hoc purposes I’d always use a dedicated security group per instance so that you can change rules just for your server without impacting others on your account.
    5. Click on Review and Launch, check what you've specified, and then click Launch. You'll now need to either specify an existing SSH key pair, or generate a new one. It's vital that you get this bit right, otherwise you'll not be able to access the server. If you generate a new key pair, make sure you download it (it'll be a .pem file).
    6. Click Launch Instances
      You'll get a hyperlinked Instance ID; click on that and it'll take you to the Instances page filtered for your new server.
      Shortly you’ll see the server’s public IP address shown.
  3. OBIEE is configured to start automagically at boot time along with the database. This means that in theory you don't need to access the server directly at all. It does take 15-20 minutes on first boot for everything to fire up though, so be patient.
  4. The managed server is listening on port 7780, and admin server on 7001. If your server IP is 42.42.42.42 the URLs would be:
    • Analytics: http://42.42.42.42:7780/analytics
    • WLS: http://42.42.42.42:7001/console
    • EM: http://42.42.42.42:7001/em
On the server

The server is a stock SampleApp v406 image, with a few extras:

  • obiee and dbora services configured and set to run at bootup. Control obiee using:
    sudo service obiee status
    sudo service obiee stop
    sudo service obiee start
    sudo service obiee restart
  • screen installed with a .screenrc setup
Accessing the AMI

To get access to the AMI, please complete this short form and we will send you the AMI details by email.

By completing the form and requesting access to the AMI, you are acknowledging that you have read and understood the terms and conditions set out by Oracle here.

Categories: BI & Warehousing

Rittman Mead Announces its Course Catalogue in Spanish and Portuguese

Rittman Mead Consulting - Wed, 2014-11-12 07:55



We are pleased to announce that Rittman Mead now offers its complete course catalogue in Spanish and Portuguese. This is great news for those living in Latin America, Spain and Portugal, who can now receive the best training, including the theory and hands-on materials, in their local language.

Courses are delivered both remotely, as live virtual classes through our web platform, and on site. Either way, each student receives the course materials in electronic form and has access throughout the course to a dedicated virtual machine for the hands-on labs.

We offer a wide variety of courses on Oracle Business Intelligence and Data Warehousing technologies: from intensive 5-day bootcamps to sets of role-specific courses. See the full catalogue in Spanish, Portuguese or English.

Interested in our courses, or would you like to become a training partner? Don't hesitate to contact us at training@rittmanmead.com.

Categories: BI & Warehousing

BI Applications in Cloud

Dylan's BI Notes - Mon, 2014-10-06 18:28
Prepackaged analytics applications are available as cloud services. The idea is that the client company does not need to use its own hardware and does not need to install the software or apply patches itself. All it needs is a browser. For the end users, there should not be much difference. The BI apps built […]
Categories: BI & Warehousing

Working Capital and Inventory Management

Dylan's BI Notes - Sun, 2014-09-28 23:24
The importance of inventory management can be best described via working capital management. Cash is the lifeblood of the company. The cash conversion cycle (CCC) is measured as ((Accounts Receivable + Inventory – Accounts Payable) / (Total Revenue / 365)). These YouTube videos from Mark West of Redstack Analytics describe this CCC metric very […]
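To make the arithmetic concrete with purely illustrative numbers (not taken from the post): with Accounts Receivable of 400, Inventory of 600, Accounts Payable of 300 and annual Total Revenue of 3,650, the metric works out as (400 + 600 – 300) / (3,650 / 365) = 700 / 10 = 70 days.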
Categories: BI & Warehousing

Supply Chain Questions

Dylan's BI Notes - Sun, 2014-09-28 22:42
The key questions are about inventory and orders. What is our inventory status, by product and by location? Do we have enough inventory on hand and in process to meet customer demand? How can we maximize customer satisfaction while minimizing the cost and size of inventory on hand? Both overstocking and understocking are bad […]
Categories: BI & Warehousing

Data Warehouse for Big Data: Scale-Up vs. Scale-Out

Dylan Wan - Thu, 2014-01-02 15:33

Found a very good paper: http://research.microsoft.com/pubs/204499/a20-appuswamy.pdf


This paper discusses whether using Hadoop as the analytics infrastructure is the right approach.


It is hard to argue with the industry trend. However, Hadoop is not new any more. It is time for people to calm down and rethink the real benefits.

Categories: BI & Warehousing