Rittman Mead Consulting
Well, it’s been awhile since I’ve posted one of our Rittman Mead Data Integration Tips, so I thought a recent challenge might be the next great candidate. I was working through the Kimball ETL Subsystems and the Error Event Schema, Subsystem 5 if you’re familiar with the methodology, and attempting to build the schema from Oracle Data Integrator 12c (ODI 12c) metadata tables. If you caught my presentation “A Walk Through the Kimball ETL Subsystems with Oracle Data Integration” at Oracle OpenWorld 2015, then you’ll know a bit more about where I’m coming from. You can still catch my presentation on this topic at both IOUG Collaborate16 and ODTUG KScope16 (being invited to speak at each of these events is always an honor). Now this Data Integration Tip isn’t solely related to the Kimball ETL Subsystems, though the solution did prove rather useful for my presentation (and upcoming article in the IOUG Select Journal). It’s actually an interesting twist in how Oracle Data Integrator mapping metadata is stored differently between the ODI 11g and ODI 12c repositories.
The challenge is to find the target table, or Datastore, for a given Mapping in ODI 12c, or Interface in ODI 11g. When I say “find”, I mean query the ODI work repository tables and return the target table, or tables in 12c, for a given Mapping. Luckily, I’m equipped with some guidance from our friends at My Oracle Support. If you look at support document Oracle Data Integrator 11g and 12c Repository Description (Doc ID 1903225.1) you’ll find the data dictionaries for both ODI11g (188.8.131.52.0+) and ODI12c (12.1.2, 12.1.3, & 12.2.1). These are invaluable resources from Oracle – though the may be works in progress and somewhat incomplete!
Let’s take a look first at ODI 11g and how simple things used to be. Back when Interfaces were the mechanism for extracting, transforming, and loading, we were allowed only 1 single target Datastore. Ahh, those were the good ‘ol days! Digging into the repository we really only had one place to look for the target table – SNP_POP. This table in the ODI 11g repository, SNP_POP (which essentially stands for Synopsis – Populate), contains a column I_TABLE. This identifying column represents the target table for that particular Interface. Here’s a query that ties it all together.
select p.pop_name interface_name, t.table_name target_table, m.cod_mod model_code from snp_pop p inner join snp_table t on p.i_table = t.i_table inner join snp_model m on t.i_mod = m.i_mod;
As you can see, the key to capturing the target table for an Interface is simply in the SNP_POP.I_TABLE column. Because there is only one target, we can easily figure it out.
Now, ODI 12c is where the real challenge lies. As you may know, with the move from Interfaces in 11g to flow based Mappings in 12c, we were allowed to do new and exciting things, such as load multiple target tables from a single Mapping. We may also have a case where a Datastore component maps to a Filter component, which then maps to another Datastore component, etc. As you can see in the image below, we can have lots of tables, and lots of tables that may be sources or targets, but we’re only interested in the final target table (or tables, for that matter!).
Ok, so let’s dig into the Work Repository now. It seems an ODI Mapping can’t be that much more difficult than an Interface, right? Well…
First, there are quite a few tables related to the Mapping itself.
SNP_MAPPING SNP_MAP_ATTR SNP_MAP_ATTR_INFO SNP_MAP_COMP SNP_MAP_COMP_TYPE SNP_MAP_CONN SNP_MAP_CP SNP_MAP_CP_ROLE SNP_MAP_DATA_TYPE SNP_MAP_EXPR SNP_MAP_EXPR_REF SNP_MAP_PROP SNP_MAP_PROP_DEF SNP_MAP_REF SNP_MAP_REF_PP
Whoa…we’ve got some work to do. Lucky for you, I’ve already done the work. Let’s look at how it all fits together. We can start with the Mapping (SNP_MAPPING) table. We also have different components on the Mapping, such as Lookups, etc, so we can join in the table SNP_MAP_COMP as well. Here’s the information we’ll be able to see with that simple join.
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping
That’s interesting, we’ve captured all components in the mapping. But there are still quite a lot of non-targets here. Ok, maybe if we add the connection points for each component we can find the input and output for each. Components each have an input connector point, allowing an output from a different component to flow into it.
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp
That just added the connection point information for each component and whether it is an INPUT or OUTPUT. Not extremely useful by itself, so let’s dig a bit deeper. How about adding the SNP_MAP_REF table? This table seems to contain a reference to all types of other attributes about the Mapping and its Components. We also need to consider that Datastores, just as any other Component, will have both an input and output. Right now, the dataset shows both the input and output connectors for each Component. Let’s remove the input connection points to limit our result set.
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp inner join snp_map_ref mr on mc.i_map_ref = mr.i_map_ref where cp.direction = 'O' --output connection point only
Joining the reference table has now allowed us to focus on the Datastores in the Mapping and their OUTPUT connection point. What I really want is to see only the Datastores that do not have their OUTPUT connection point connected to an INPUT connector. Therefore, if the OUTPUT is empty, it must be the target table!
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp inner join snp_map_ref mr on mc.i_map_ref = mr.i_map_ref where cp.direction = ‘O' and --output connection point only. cp.i_map_cp not in (select i_start_map_cp from snp_map_conn) --not a starting connection point
The SNP_MAP_CONN, which stores the mapping connections, will allow me to limit the query to the components that only have an output, but not an input. The connections table will contain all component connections in the ODI 12c mappings. Here’s what we get as a result.
Hey, now we’re onto something here. In fact, this is what I was looking for! Target table(s) in a single Mapping in ODI12c. Not quite as simple as ODI11g, but with a bit of SQL and understanding of the repository tables, you can do it. Here’s the final query again, joining in the SNP_TABLE & SNP_MODEL tables to complete the dataset.
select m.name mapping_name, mr.qualified_name, mc.name datastore_alias, t.table_name target_table, mdl.cod_mod model_code from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp inner join snp_map_ref mr on mc.i_map_ref = mr.i_map_ref inner join snp_table t on mr.i_ref_id = t.i_table inner join snp_model mdl on t.i_mod = mdl.i_mod where cp.direction = 'O' and --output connection point cp.i_map_cp not in (select i_start_map_cp from snp_map_conn) --not a starting connection point ;
Please let me know if you find this Data Integration Tip useful or if you have a better way of accessing the target table in a mapping. One of my Rittman Mead colleagues asked, “why not just use the ODI Java API?”. For accessing the ODI repository, I do prefer using some Groovy script and the API. But in this case, I’m interested in building out a dimensional schema and writing ETL to load the dimensions and facts, which lends itself to SQL rather than Groovy script.
As always, if you’re team needs help around Oracle Data Integrator, or Oracle Data Integration Solutions in general, drop us a line at firstname.lastname@example.org. Or feel free to reach out to me directly via email (email@example.com) or Twitter (@mRainey). Cheers!
The post Data Integration Tips: ODI 12c Repository Query – Find the Mapping Target Table appeared first on Rittman Mead Consulting.
Here at Rittman Mead we’ve been working with OBIEE 12c for some time now, as part of the beta programme and more recently with clients looking to get the most out an upgrade to OBIEE 12c. We’ve also been hard at work on our brand new OBIEE 12c training course. What we’ve seen in terms of the stability of OBIEE 12c has been pleasantly surprising. Anyone who’s worked with software long enough will be familiar with the reputation that first releases in general have for nasty bugs, and it’s probably fair to say that with the first release of 11g (184.108.40.206) this was proven out. With the first release of OBIEE 12c, however, we’re seeing a stable tool with very few issues so far.
That said…I’m going to demonstrate an issue here that is a bit of a nasty one. It’s nasty because the trigger for it appears innocuous, and what it breaks is one of the really new things in OBIEE 12c—the way in which the RPD is stored on disk and accessed by the BI Server. This makes it a bit of a tough one to get to the bottom of at first, but it’s a good excuse to go digging!Summary
If you open the RPD in online mode (use File –> Copy As and then use the Save option), the password on the server gets corrupted.
[nQSError: 43113] Message returned from OBIS [nQSError: 13042] Repository password is wrong
From this point on you cannot checkin any changes, and when you restart the BI Server it will fail to start up.Details
In OBIEE (12c, and before) it is possible to open the RPD as a straight binary file on disk (“Offline” mode), or by connecting directly to the BI Server and opening the copy that it is currently running (“Online mode”). Offline mode suits larger changes and development, with the RPD then being deployed onto the server once the development work is ready to be tested. Online mode is a good way for making changes on a dedicated dev server, minor changes on a shared server, or indeed just for viewing the RPD that’s currently being run.
Here’s what we’ve seen as the problem:
- Open RPD in online mode
- File -> Copy As
- Enter a password with which to protect the RPD being saved on disk.
- Do one of:
- File -> Close, and then when prompted to save changes click Yes
- File -> Save
- Click the Save icon on the toolbar
What happens now is two-fold:
- You cannot check in any changes made online—the check in fails with an error from the Administration Tool:
[nQSError: 43113] Message returned from OBIS [nQSError: 13042] Repository password is wrong
- The BI Server will fail on restart with the same error:
Opening latest versioned cached RPD for : /app/oracle/biee/bi/bifoundation/server/empty.rpd which is /app/oracle/biee/user_projects/domains/bi/bidata/service_instances/ssi/metadata/datamodel/customizations/liverpd.rpd_5 [nQSError: 13042] Repository password is wrong. [[
We saw this on SampleApp v511, as well as on vanilla installations of OBIEE. Versions on both were 220.127.116.11.0.
After we reported this to Oracle, they agreed it was a bug and have logged it as bug number 22682937, with no patch currently (February 10, 2016) available.Workaround
If you open the RPD online and use File -> Copy As, don’t hit save or check in, even if prompted by the Admin Tool. Close the RPD straightaway.
Often people will use File -> Copy As to take a copy of the current live RPD before doing some changes to it. At Rittman Mead, we’d always recommend using source control such as git to store all code including the RPD, and using this approach you obviate the need to open the RPD online simply to get the latest copy (because the latest copy is in source control).
You can also use the data-model-cmd downloadrpd option to download the actual live RPD—that’s exactly what this option is provided for.Solution – if BI Server (OBIS) has not yet been restarted
If you’ve hit this bug and are hitting “Repository password is wrong” when you try to checkin, and if the BI Server is still running, then redeploy the RPD using the
data-model-cmd uploadrpd tool. By redeploying the RPD the password appears to get sorted out.
If the BI Server is down, then this is not an option because it has to be running in order for
data-model-cmd uploadrpd to work.
At this point using
data-model-cmd uploadrpd is not possible because OBIS is not running and so the
data-model-cmd uploadrpd will fail with the error:
[oracle@demo ~]$ /app/oracle/biee/user_projects/domains/bi/bitools/bin/data-model-cmd.sh uploadrpd -I /home/oracle/rmoff.rpd -W Password01 -U weblogic -P Admin123 -SI ssi Service Instance: ssi Operation failed. An exception occurred during execution, please check server logs.
The only option from this point is to use
importServiceInstance to reset the service instance, either to an empty, SampleAppLite, or an existing
.bar export of your environment. For example:
This will enable OBIS to start up correctly, from which point the desired RPD can then be re-uploaded if required using
The easiest thing is to simply not use File -> Copy As in online mode. Whilst this on its own is fine, the UI means it’s easy to accidentally use the Save option, which then triggers this problem. Instead, use
data-model-cmd downloadrpd, and/or use source control so that you can easily identify the latest RPD that you want to develop against.
If you do hit this repository password corruption problem, then keep calm and don’t restart the BI Server—just re-upload the RPD using
data-model-cmd uploadrpd. If you have already uploaded the RPD, then you need to use
importServiceInstance to restore things to a working state.
As part of the diagnostics that we did to get to the bottom of this issue, we found some interesting things in OBIEE 12c, such as a web service endpoint for RPD upload/download, as well as the detailed workings of the RPD upload process and that infamous liverpd.rpd file. Stay tuned for a blog post on this and more soon! And in the meantime, be sure to get in touch with us to discuss how we can help you with your OBIEE systems, including OBIEE 12c upgrade, and OBIEE 12c training.
Greetings, readers! My name is Nick Padgett, and this is my first blog post with Rittman Mead. I started with Rittman Mead as an intern over a year ago while I was attending Kennesaw State, a local university, but I’ve graduated from both school and the internship. Since then I’ve been working on several interesting applications. Today, I would like to take some time to discuss developing one such application, our OBIEE Commentary tool, as well as my learning experiences working at Rittman Mead.
When I first started with Rittman Mead in October 2014, I was beginning my final year in a Computer Science program. When I joined, I knew very little about Business Intelligence but was eager to learn everything I could. Learning BI while still in school created a powerful synergy between my class studies and the real world. As I learned web development and security in class, I learned about data warehouses and agile development at work. In addition, I had the opportunity to practice my new knowledge and skills on a daily basis.
The most important lesson I learned during my time as an intern was the importance of developing primarily to meet the user requirements. This subject was often ignored in school, except for the obligatory Software Engineering course. But now, as a formal developer, I was always being reminded of the users we were developing for. All features for all projects were first qualified by asking “Is this what the user wants or needs?” or “Will this add value for the users?”
The first project I was assigned was the Rittman Mead commentary tool. As I assisted in development, the team always looked back to the initial requirements: “Users want commentary in OBIEE.” All development was focused in that direction, and any feature which hindered the goal, despite the technical impressiveness, was removed from the application.
Now, after spending many hours developing the commentary tool, the team is confident it has effectively answered the primary user requirements. We’re all very excited to provide this tool to our customers, and we are confident it will meet all the requirements we spent so long accumulating and developing for.The Primary Use-case: OBIEE Commentary
In a typical environment, users are required to leave the application to communicate with their peers. This often involves screenshots, long text descriptions of the subject, and a service such as email or Slack. However, forcing users to leave the application in order to do their job is less than optimum. What’s to keep them using the expensive tool you purchased for them if they have to leave it every few minutes?
Many organizations recognize this issue and seek to implement commentary themselves. Frequent approaches typically use either a text object on a dashboard, a write-back enabled input, or even strange iFrame solutions to achieve the desired capabilities. Obviously, these implementations have serious drawbacks, and may not be maintainable or feasible for many organizations.
The commentary tool we have developed uses none of the above approaches, but instead uses a standard web deployment, hosted in WebLogic, to provide similar capabilities. This means no RPD modifications, no requirement for dashboard designers to become involved, and no security threats with iFrames or Cross-Site Scripting (XSS) attacks, while still fulfilling the primary use case of adding commentary to your OBIEE dashboards.
Our tool allows for users to create several different types of comments, all displayed through standard HTML in a web browser. Comments can be added on a dashboard to provide commentary for the page as a whole, while other comments can be placed on reports, which will be shown anywhere the report is located. You can even submit replies to existing comments, allowing for comment threads over a particular subject. All of these comments are available for use without having to leave the application.A Second Use-case: Documentation
Some comments are highly structured and curated, containing a wealth of information. Some users like to add comments explaining how a measure is calculated, or a post-mortem for the launch of a new product. These comments are best classified as documentation, as they don’t serve to engage in a conversation, but to present information. In addition, there may be formal documentation that is required to be on a dashboard, and relates directly to a series of reports. The most common way to provide this is to supply links to external applications specifically suited for documentation. Among a myriad of issues pertaining to accessibility concerns, this approach forces users to leave the application, some of whom may never return.
As an alternative, documentation can be placed directly on a dashboard or report, but this leads to design problems. Having a twenty-page document on a dashboard is hardly an acceptable solution, even though this is the best place to put your documentation. In addition, users will have to write their documentation in HTML, or at least be able to understand it.
The commentary tool also allows for this use case. Documentation can be created or edited using a WYSIWYG (What You See Is What You Get) editor, similar to any standard document editor, and added to a “Table of Contents” on a dashboard. The Table of Contents lists all current documents available to view and displays them in a dedicated area. This provides access to any pertinent content directly on a dashboard, rather than forcing users to migrate to an external application, which is the entire point for having native commentary in the first place.A Third Use-case: Integration
A common concern may be the introduction of yet another channel for communication. For example, many organizations use Slack as a communication tool or Atlassian Confluence for documentation. Rittman Mead does not desire to replace these tools or make them obsolete to your organization. We acknowledge the fact you spend money on these applications, and they may be widely adopted within your organization.
Instead of forcing you to maintain several channels of communication, our commentary tool allows integrations with applications such as JIRA and Confluence. Comments can be submitted as JIRA issues, and documents can be synchronized with Confluence pages. Several other integrations are available by default, while additional implementations are available on a per-customer basis. Now, if users absolutely must leave the application, they can do it on their own terms, with the convenience of a few clicks.
As I have grown during my time here with Rittman Mead, so has our commentary application. While I learned how to cater to user requirements, the tool evolved into an extension of this knowledge and was developed with the intention to meet all customer needs. I am incredibly excited to see this product released soon, and I can’t wait for users to start getting more out of their BI applications.
More information on this tool will be coming very soon, so keep an eye on our blog and website for updates!
In addition to being a product manager at Rittman Mead, I consider myself to be a nerd of the highest order. My love of comic books, fantasy, sci-fi and general geekery began long before the word ‘Stark’ had anything to do with Robert Downey Jr or memes about the impending winter. As such, any chance to incorporate my hobbies into my work is a welcomed opportunity. For my next few blog entries, I’ve decided to construct a predictive classification model using comic book sales data whose eventual goal will be to build a model that can accurately predict whether a comic will rocket off the shelves or if it will be a sales dud. The first blog of the series shows some of the pitfalls that can come up when preparing your data for analysis. Data preparation, or data wrangling as it has come to be known, is an imperfect process that usually takes multiple iterations of transformation, evaluation and refactoring before the data is “clean” enough for analysis.
While the steps involved in data wrangling vary based on the state and availability of the raw data, for this blog I have chosen to focus on the gathering of data from disparate sources, the enrichment of that data by merging their attributes and the restructuring of it to facilitate analysis. Comic book sales data is readily available on the interwebs, however finding that data in a usable format proved to be a more difficult task. In the end, I had to resort to dreaded process of screen scraping the data from a comic research site. For those of you who are lucky enough be unfamiliar with it, screen scraping is the process of programmatically downloading HTML data and stripping away that formatting to make it suitable for use. This is generally used as a last resort because web sites are volatile creatures that are prone to change their appearance as often as my teenage kids do preparing to leave the house. However, for the purposes of this series, as my friend Melvin the handyman would often say, “We works with what we gots.”
This leads us to the first issue you may run into while wrangling your data. You have access to lots of data but it’s not pretty. So make it pretty. Working with raw data is like being a sculptor working with wood. Your job is not to change the core composition of the data to suit your purposes but to chip away at the excess to reveal what was there all along, a beautiful horse… er I mean insight. Sorry, I got lost in my analogy. Actually to expand on this analogy a bit, the first tool I pulled out of my toolbox for this project was Python, the Leatherman of programming languages. Python is fast, plays well with other technologies and most importantly in this case, Python is ubiquitous. Used for tasks ranging from process automation and ETL to gaming and academic pursuits, Python is truly a multipurpose tool. As such, if you have a functional need, chances are there is a native module or someone has already written a public library to perform that function. In my case, I needed some scripts to “scrape” HTML tables containing comic sales data and combine that data with other comic data that I retrieved elsewhere. The “other” data is metadata about each of the issues. Metadata is just data about data. In this case, information about who authored it, how it was sold, when it was published, etc… More on that later.
Luckily for me, the format of the data I was scraping was tabular, so extracting the data and transforming it into Python objects was a relatively simple matter of iterating through the table rows and binding each table column to the designated Python object field. There was still a lot of unnecessary content on the page that needs to be ignored, like the titles and all of the other structural tags, but once I found the specific table holding the data, I was able to isolate it. At that point, I wrote the objects to to a CSV file, to make the data easy to transport and to facilitate usability by other languages and/or processes.
The heavy lifting in this process was performed by three different Python modules: urllib2, bs4 and csv. Urllib2, as the name implies, provides functions to open URLs. In this case, I found a site that hosted a page containing the estimated issue sales for every month going back to the early 1990’s. To extract each month without manually updating the hardcoded URL over and over, I created a script that accepted MONTH and YEAR as arguments, month_sales_scraper.py
The response from the urlopen(url) function call was the full HTML code that is typically rendered by a web browser. In that format, it does me very little good, so I needed to employ a parser to extract the data from the HTML. In this context, a parser is a program that is used to read in a specific document format, break it down into its constituent parts while preserving the established relationships between those parts, and then finally provide a means to selectively access said parts. So an HTML parser would allow me to easily access all the <TD> column tags for a specific table within an HTML document. For my purposes, I chose BeautifulSoup, or bs4.
BeautifulSoup provided search functions that I used to find the specific HTML table containing the sales data and loop through each row, while using the column values to populate a Python object.
This Python object, named data, contains fields populated with data from different sources. The year and month are populated using the arguments passed to the module. The format field is dynamically set based on logic related to the rankings and the remaining fields are set based on their source’s position in the HTML table. As you can see, there is a lot of hard coded logic that would need to be updated, should the scraped site change their format. However, for now this logic gets it done.
The final step of this task was to write those Python objects to a CSV file. The python module, CSV, provides the function writerow(), which accepts an array as a parameter and writes each of the array elements as columns in the CSV.
My first pass raised the an exception because the title field contained unicode characters that the CSV writer could not handle.
To rectify this, I had to add a check for unicode and encoded the content as UTF-8. Unicode and UTF-8 are character encodings; meaning they provide a map computers use to identify characters. This includes alphabets and logographic symbols from different languages as well as common symbols like ®.
Additionally, there was the matter of reformatting the values of some of the numeric fields to allow math to be performed on them later(ie stripping ‘$’ and commas). Other than that, the data load went pretty smoothly. A file named (MONTH)_(YEAR).CSV was generated for each month. Each file turned out like so:
While this generated tens of thousands of rows of comic sales data, it was not enough. Rather, it had the volume but not the breadth of information I needed. In order to make an accurate prediction, I needed to feed more variables to the model than just the comic’s title, issue number, and price. The publisher was not relevant as I decided to limit this exercise to only Marvel comics and passing in the the estimated sales would be cheating, as rank is based on sales. So to enhance my dataset, I pulled metadata about each of the issues down from “the Cloud” using Marvel’s Developer API. Thankfully, since the API is a web service, there was no need for screen scraping.
Retrieving and joining this data was not as easy as one might think. My biggest problem was that the issue titles from the scraped source were not an exact match to the titles stored in the Marvel database. For example, the scraped dataset lists one title as ‘All New All Different Avengers”. Using their API to search the Marvel database with that title retrieved no results. Eventually, I was able to manually find it in their database listed as “All-New All-Different Avengers”. In other cases, there were extra words like “The Superior Foes of Spider-Man” vs “Superior Foes of Spider-Man”. So in order to perform a lookup by name, I needed to know the titles as they expected them. To do this I decided to pull a list of all the series titles whose metadata was modified during the timeframes for which I had sales data. Again, I ran into a roadblock. The Marvel API only allows you to retrieve up to 100 results per request and Marvel has published thousands of titles. To get around this I had to perform incremental pulls, segmented alphabetically.
Even then there were a few minor issues, as some letters like ‘S’ had more than 100 titles. To get around that I had to pull the list for ‘S’ titles sorted ascending and descending then combine the results, making sure to remove duplicates. So my advice on this one is be sure to read up on the limitations of any API you are using. It may enforce limits but you may be able to work around the limits using creative querying.
At this point I have my list of Marvel series titles, stored in some CSV files that I eventually combined into a single file, MarvelSeriesList.csv, for ease of use. Actually, I have more than that. While retrieving the series titles, I also pulled down the ID for the series and an appropriateness rating. Searching the API by ID will be much more accurate than name and the appropriateness rating may be useful when building out the prediction model. The next step was to iterate through each row of the CSVs we created from the sales data, find the matching ID from MarvelSeriesList.csv and use that ID to retrieve its metadata using the API.
If you remember, the point of doing that last step was that the titles stored in the sales data files don’t match the titles in the API, so I needed to find a way to join the two sources. Rather than writing cases to handle each of the scenarios (e.g. mismatched punctuation, extra filler words), I looked for a python library to perform some fuzzy matching. What I found was a extremely useful library called, Fuzzy Wuzzy. Fuzzy Wuzzy provides a function called extractOne() that allows you to pass in a term and compare it with an array of values. The extractOne() function will then return the term in the array that has the highest match percentage. Additionally, you can specify a lower bound for acceptable matches (ie. only return result where the match is >= 90%).
Again, it took a few passes to get the configuration to work effectively. The first time through about only about 65% of the titles in the sales file found a match. That was throwing away too much data for my liking so I had to look at the exceptions and figure out why the matches were falling through. One issue that I found was titles that tacked on the publication years in the Marvel database, like “All-New X-Men (2012)”, had a match score in the 80’s when matched against a sales title like, “All New X-Men”. This was a pretty consistent issue, so rather than lowering the match percentage, which could introduce some legitimate mismatches, I decided to strip the year, if present, on mismatches and run it through that matching process again. This got me almost there. The only other issue I ran into was Fuzzy Wuzzy had trouble matching acronyms/acrostics. So ‘S.H.E.I.L.D.’ had a match score in the 50’s when matching ‘SHIELD’. That’s because half the characters (periods) were missing. Since there were only two titles affected, I built a lookup dictionary of special cases that needed to be translated. For the purposes of this exercise, I would still have had enough matches to skip that step, but by doing it brought us up to 100% matching between the two sources. Once the matching function was working, I pulled out urllib2 and retrieved all the metadata I could for each of the issues.
The resulting files contained not only sales data (title, issue number, month rank, estimated sales), but information about the creative team, issue descriptions, characters, release dates and associated story arcs. This would be enough to get us started with building our predictive classification model.
That being said, there was still a lot of structural rearranging required to make it ready for the type of analysis I wanted to do, but we will deal with that in the next blog. Hopefully, you picked up some useful tips on how to combine data from different sources or at the very least found solace in knowing that while you may not be the coolest person in the world, somewhere out there is a grown man who still likes to read comic books enough to write a blog about it. Be sure to tune in next week, True Believers, as we delve into The Mysteries of R!
During a recent project installing Oracle BI Applications, I became much better acquainted with OPatch, Oracle’s standard tool for managing application patches. By acquainted, I mean how to troubleshoot when OPatch patching fails. Since, at last count, there are around 50 patches total for Oracle BI Applications 18.104.22.168.2, the first patching attempt may not apply all patches successfully. There are any number of reasons for a failure, like an extra slash at the end of a path, a misspelled word, Weblogic or NodeManager still running, or some other reason. We will take a look at the logs for each section, learn where additional logs can be found, and learn how to turn on OPatch debugging to better understand the issue. Then, following the ideas from a previous OPatch post by Robin, I’ll describe how to manually apply the patches with OPatch at the command line for any patches that weren’t applied successfully using the provided perl script.
*Disclaimers – Please read the readme files for patches and follow all Oracle recommendations. Patch numbers are subject to change depending on OS and OBIA versions. Commands and paths here are of the linux/unix variety, but there are similar commands available for Windows OS.Perl Script patching
Unzip the patch files to a patch folder. I have included the OBIEE patch as well.
unzip pb4biapps_22.214.171.124.2_.zip -d patches/
unzip pb4biapps_126.96.36.199.2_generic_1of2.zip -d patches/
unzip pb4biapps_188.8.131.52.2_generic_2of2.zip -d patches/
unzip p20124371_111170_.zip -d patches/
While installing the Oracle BI Applications versions 184.108.40.206. and up, patches get applied with a perl script called APPLY_PATCHES.pl. Following Oracle’s install documentation for 220.127.116.11 version of Oracle BI Applications here, there is a text file to modify and pass to the perl script. Both the perl script and the text file reside in the following directory: $ORACLE_HOME/biapps/tools/bin. In the text file, called apply_patches_import.txt, parameters are set with the path to the following directories:
WINDOWS_UNZIP_TOOL_EXE (only needed if running on Windows platforms)
Some pro tips to modifying this text file:
1. Oracle recommends you use the JDK in the ORACLE_BI1 directory.
2. Use ORACLE_BI1 as the ORACLE_HOME.
3. Ensure WORKDIR and PATCH_ROOT_DIR are writeable directories.
4. Don’t add a path separator at the end of the path.
5. Commented lines are safe to remove.
Then you run the APPLY_PATCHES.pl passing in the apply_patches_import.txt. If everything goes well, at the end of the perl script, the results will look similar to the following:
If this is the case, CONGRATULATIONS!!, you can move on to the next step in the install documentation. Thanks for stopping by and come back soon! However, if any patch or group of patches failed, the rest of this post is for you.Log file location
First, the above patching report does not tell you where to find the logs, regardless of success or failure. If you remember though, you set a WORKDIR path in the text file earlier. In that directory is where you will find the following log files:
Open the final_patching_report.log to determine first if all patches were applied and identify ones that were not successful. For example, looking that this log may show that the Oracle Common patches failed.
However, this doesn’t tell you what caused the failure. Next we will want to look into the oracle_common_generic_patches.log to gather more information.
From the $WORKDIR:
Here you will see the error, that a component is missing. Patch ######## requires component(s) that are not installed in OracleHome. These not-installed components are oracle.jrf.thirdparty.jee:18.104.22.168.0. Notice also that in this log there is a path to another log file location. The path is in the $COMMON_ORACLE_HOME/cfgtoollogs/opatch/ directory. This directory has more detailed logs specific to patches applied to oracle_common. Additionally, there are logs under $ORACLE_HOME/cfgtoollogs/opatch/, $WL_HOME/cfgtoollogs/opatch/, and $ODI_HOME/cfgtoollogs/opatch/. These locations are very helpful to know, so you can find the logs for each group of patches in the same relative path.
Going back to the above error, we are going to open the most recent log file listed in the $COMMON_ORACLE_HOME/cfgtoollogs/opatch/ directory.
The beginning of this log file has two very interesting pieces of information to take note of for use later. It has the actual OPatch command used, and it has a path to a Patch History file. Looks like we will have to page down in the file to find the error message.
Now we see our missing component error. Once the error occurs, the java program starts backing out and then starts cleanup by deleting the files extracted earlier in the process. This log does have more detail, but still doesn’t say much about the missing component. After some digging around on the internet, I found a way to get more detailed information out of OPatch by setting
export OPATCH_DEBUG=TRUE. After turning OPatch debugging on, run the OPatch command we found earlier that was at the top of the log. A new log file will be generated and we want to open this most recent log file.
Finally, the results now get me detailed information about the component and the failure.
Side Note: If you are getting this specific error, I’ll refer you back to a previous blog post that happened to mention making sure to grab the correct version of OBIEE and ODI. If you have a wrong version of OBIEE or ODI for the Oracle BI Apps version you are installing, unfortunately you won’t start seeing errors until you get to this point.Manually running Oracle BI Application patches
Normally, the error or reason behind a patch or group of patches failing doesn’t take that level of investigation, and the issue will be identified in the first one or two logs. Once the issue is corrected, there are a couple of options available. Rerunning the perl script is one option, but it will cycle through all of the patches again, even the ones already applied. There is no harm in this, but it does take longer than running the individual patches. The other option is to run the OPatch command at the command line. To do that, first I would recommend setting the variables from the text file. I also added the Oracle_BI1/OPatch directory to the PATH variable.
Next, unzip the patches in the required directory. For example, the $PATCH_FOLDER/oracle_common/generic might look like this after unzipping files:
Below are the commands for each group of patches:Oracle Common Patches:
$COMMON_ORACLE_HOME/OPatch/opatch napply $PATCH_FOLDER/oracle_common/generic -silent -oh $COMMON_ORACLE_HOME -id 16080773,16830801,17353546,17440204,18202495,18247368,18352699,18601422,18753914,18818086,18847054,18848533,18877308,18914089,19915810BIApps Patches:
opatch napply $PATCH_FOLDER/biappsshiphome/generic -silent -id 16913445,16997936,19452953,19526754,19526760,19822893,19823874,20022695,20257578ODI Patches:
/$ODI_HOME/OPatch/opatch napply $PATCH_FOLDER/odi/generic -silent -oh $ODI_HOME -id 18091795,18204886Operating Specific Patches:
opatch napply $PATCH_FOLDER/ -silent -id ,,Weblogic Patches:
$JAVA_HOME/bin/java -jar $PATCH_FOLDER/suwrapper/generic/bsu-wrapper.jar -prod_dir=$WL_HOME -install -patchlist=JEJW,LJVB,EAS7,TN4A,KPFJ,RJNF,2GH7,W3Q6,FKGW,6AEJ,IHFB -bsu_home=$MW_HOME/utils/bsu -meta=$PATCH_FOLDER/suwrapper/generic/suw_metadata.txt -verbose > $PATCH_FOLDER/weblogic_patching.log
Even though this is a very specific error as an example, understanding the logs and having the break-down of all of the patches will help with any number of patch errors at this step of the Oracle BI Applications installation. I would love to hear your thoughts if you found this helpful or if any part was confusing. Keep an eye out for the next Becky’s BI Apps Corner where I move on from installs and start digging into incremental logic and Knowledge Modules.
The post Becky’s BI Apps Corner: OBIA install Perl Script Patching and troubleshooting when they fail appeared first on Rittman Mead Consulting.
At Rittman Mead, we believe that people are more important than profit.
This manifests itself in two ways. First, we want to impact the world beyond data and analytics, and secondly, we want our employees to be able to contribute to organizations they believe are doing impactful work.
This year, we’ve put a Community Service requirement in place for all of our full-time employees.
We’ll each spend 40 hours this year serving with various nonprofits. Most of our team are already involved with some amazing organizations, and this “requirement” allows us to not only be involved after hours and on the weekends, but even during normal business hours.
We want to highlight a few team members and show how they’ve been using their Community Service hours for good.
Beth is our Finance Manager and she has been serving with Sawnee Women’s Club. Most of her work has been around getting sponsorship and donations for their annual silent auction. She’s also helped with upgrading a garden at the local high school, collecting toys and gift wrap for their Holiday House, and collecting prom dresses and accessories for girls in need.
Charles is the Managing Director of North America. He recently ran in the Dopey Challenge down at Disney World which means he ran a 5k, 10k, half marathon, and full marathon in 4 days. He did the run to raise funds for Autism Speaks. Charles was recognized as the third largest fundraiser for Autism Speaks at the Dopey Challenge!
David is our U.S. Business Development rep. He recently served with the nonprofit Hungry For A Day for their Thanksgiving Outreach. He flew up to Detroit the week of Thanksgiving and helped serve over 8,000 Thanksgiving dinners to the homeless and needy in inner city Detroit.
Andy is our Consulting Manager. Andy is a regular volunteer and instructor with Vine City Code Crew. VC3 works with inner city youth in Atlanta to teach them about electronics and coding.
Pete is a Principal Consultant. He is also involved as a volunteer and instructor with the aforementioned Code Crew. Pete has taught a course using Makey Makey electronic kits for VC3.
This is just a sample of what our team has done, but engaging in our local communities is something that Rittman Mead is striving to make an integral piece of our corporate DNA.
We can’t wait to show you how we’ve left our communities better in 2016!
The post Corporate Social Responsibility (Where Can We Serve?) appeared first on Rittman Mead Consulting.
With the release of OBIEE 12c we got a number of interesting new features on the front-end. We’re all talking about the cleaner look-and-feel, Visual Analyzer, and the ability to create data-mashups, etc.
While all this is incredibly useful, it’s one of the small changes you don’t hear about that’s truly got me excited. I can’t tell you how thrilled I am that we can finally save a column back to the web catalog as an independent object (to be absolutely fair, this actually first shipped with 22.214.171.124).
For the most part, calculations should be pushed back to the RPD. This reduces the complexity of the reports on the front-end, simplifies maintenance of these calculations, and ultimately assures that the same logic is used across the board in all dashboards and reports… all the logic should be in the RPD. I agree with that 110%… at least in theory. In reality, this isn’t always practical. When it comes down to it, there’s always some insane deadline or there’s that pushy team (ahem, accounting…) riding you to get their dashboard updated and migrated in time for year end, or whatever. It’s quite simply just easier sometimes to just code the calculation in the analysis. So, rather than take the time to modify the RPD, you fat finger the calculation in the column formula. We’ve all done it. But, if you spend enough time developing OBIEE reports and dashboards, sooner or later you’ll find that this is gonna come back to bite you.
Six months, a year from now, you’ll have completely forgotten about that calculation. But there will be a an org change, or a hierarchy was updated… something, to change the logic of that calculation and you’ll need make a change. Only now, you know longer remember the specifics of the logic you coded, and even worse you don’t remember if you included that same calculation in any of the other analyses you were working on at the time. Sound familiar? Now, a change that should have been rather straightforward and could have been completed in an afternoon takes two to three times longer as you dig through all your old reports trying to make sense of things. (If only you’d documented your development notes somewhere…)
Saving columns to the web catalog is that middle ground that gives us the best of both worlds… the convenience of quickly coding the logic on the front-end but the piece of mind knowing that the logic is all in one place to ensure consistency and ease maintenance.
After you update your column formula, click OK.
From the column dropdown, select the Save Column As option.
Save the column to the web catalog. Also, be sure to use the description field. The description is a super convenient place to store a few lines of text that your future self or others can use to understand the purpose of this column.
As an added bonus, this description field is also used when searching the web catalog. So, if you don’t happen to remember what name you gave a column but included a little blurb about the calculation, all is not lost.
Saved columns can be added from the web catalog.
Add will continue to reference the original saved column, so that changes to made to the saved column will be reflected in your report. Add Copy will add the column to your report, but future changes to the saved column will not be reflected.
One thing to note, when you add a saved column to a report it can no longer be edited from within the report. When you click on Edit Formula you will still be able to see the logic, but you will need to open and edit that saved column directly to make any changes to the formula.
Try out the saved columns, you’ll find it’s invaluable and will greatly reduce the time it takes to update reports. And with all that free time, maybe you’ll finally get to play around with the Visual Analyzer!
The post The best OBIEE 12c feature that you’re probably not using appeared first on Rittman Mead Consulting.