Skip navigation.

Feed aggregator

Logging Actual Application User Names for Oracle E-Business Suite, SAP, PeopleSoft, and OBIEE

Knowing which person, not just which database account, has been a challenge for database logging and auditing when working with enterprise software applications such as the Oracle E-Business Suite, SAP, PeopleSoft, and OBIEE.  Knowing which application user did what and when is now much easier because of adoption of standard Oracle functionality.

Standard functionality of Oracle database is the CLIENT_IDENTIFER attribute.  The CLIENT_IDENTIFIER is a predefined attribute of the built-in application context namespace, USERENV, and can be used to capture the application user name.

CLIENT IDENTIFIER is set using the DBMS_SESSION.SET_IDENTIFIER procedure to store the application username.  The CLIENT IDENTIFIER attribute is one the same as V$SESSION.CLIENT_IDENTIFIER.  Once set, you can query V$SESSION or select sys_context('userenv','client_identifier') from dual.

The table below offers examples of how CLIENT_IDENTIFIER is now being used by the Oracle E-Business Suite, SAP, and PeopleSoft. If you are running one of these software packages, Integrigy highly recommends that you incorporate the information that the CLIENT_IDENTIFIER provides into your logging and auditing solution.



Application Usage


E-Business Suite

As of Release 12, the Oracle E-Business Suite automatically sets and updates client_identifier to the FND_USER.USERNAME of the user logged on.  Prior to Release 12, follow Support Note How to add DBMS_SESSION.SET_IDENTIFIER(FND_GLOBAL.USER_NAME) to FND_GLOBAL.APPS_INITIALIZE procedure (Doc ID 1130254.1)



Starting with PeopleTools 8.50, the PSOPRID is now additionally set in the Oracle database CLIENT_IDENTIFIER attribute. 


With SAP version 7.10 above, the SAP user name is stored in the CLIENT_IDENTIFIER.

Oracle Business Intelligence Enterprise Edition(OBIEE)

When querying an Oracle database using OBIEE the connection pool username is passed to the database.  To also pass the middle-tier username, set the user identifier on the session.  To do this in OBIEE, open the RPD, edit the connection pool settings and create a new connection script to run at connect time.  Add the following line to the connect script:


If you have questions, please contact us at

Reference Tags: AuditingOracle DatabaseOracle E-Business SuiteOracle PeopleSoftSAPOracle Business Intelligence (OBIEE)
Categories: APPS Blogs, Security Blogs

Digital Learning: LVC customer quotes

The Oracle Instructor - Mon, 2014-11-03 03:34


Here’s a collection of customer quotes as a follow-up to my last post about the importance of attitude towards Live Virtual Classes (LVCs). They are from courses that I have taught personally this year with an average delivery score of about 96%:

Oracle Database 12c: Managing Multitenant Architecture

Oracle Grid Infrastructure 11g: Manage Clusterware and ASM

Oracle Database 12c: New Features for Administrators

Exadata Database Machine Administration Workshop

Oracle 11g: RAC and Grid Infrastructure Administration Accelerated

“My first experience of an LVC. Pleased to say it was very positive. Introduction and start on Monday morning was smooth. I would recommend to my colleagues.”

Always important to make a good first impression!

“The whole LVC package just worked. From the comfort of my own environment with a great instructor makes for happy learning :)”

And that is exactly what we strive to deliver.

“Both, host and producer were very professional and guided the students through the course.”

An LVC producer takes care for all technical aspects apart from the course itself, like access to the learning platform. The instructor appears as “host” on the learning platform.

“Instructor professionally answered students’ questions and kept up a positive mood in the community!”

LVCs can be funny too :-)

“I appreciate the way how the course was presented. Very well controlled time, organization of presentation, exercises. Interaction with us was great. Always ready to answer a question, give an examples to difficult topic, illustrating topics.”

So much about allegedly missing interaction in LVCs.

“I work few years on RAC databases, my knowledge was not so clear regarding some topic on RAC and Grid after completing this training I’m sure that I will handle our RAC and Grid environment differently and for sure will have positive impact in our production environment. Great thank!”

You cannot top that with a classroom course either :-)

“LVC is offering great and flexible way to gain knowledge without travel or hotels etc.” “LVCs reduce travel costs and help the students to manage their time on their own, i.e. to join the classes from home and focus on the presented content.”

Trust me, I didn’t make up the last two although they may sound like your manager talking – or mine, for that matter ;-)

Tagged: Digital Learning, LVC
Categories: DBA Blogs

Annonce : Web Séminaire DBaaS

Jean-Philippe Pinte - Mon, 2014-11-03 03:32

Accélérer considérablement vos délais de mise à disposition des services DB avec une infrastructure de Cloud Privé pour vos bases de données
(Database as a service).

Mardi 18 novembre 2014 ; de 11h à 12h

Les solutions Oracle pour bâtir un Cloud Privé permettent de transformer les organisations IT en apportant une architecture agile, évolutive et mutualisée permettant de répondre à leurs besoins et objectifs IT.
Inscrivez-vous au webinar du 18 novembre 2014 pour comprendre la solution de cloud privé Database, les caractéristique, les enjeux, les bénéfices, etc...
En effet, lors de ce webinar, vous nous présenterons:
  • Les principales fonctionnalités de la solution : automatisation, portail libre service, catalogue de services, snapclone, mesure et facturation de l’usage.
  • Les principaux bénéfices :  Réduction du délai de mise à disposition des services DB : jours/semaines è minutes/heures, réduction drastique des volumétries Hors Production, Plus d’agilité, Meilleur contrôle des ressources, Consolidation.
  • Une démonstration de notre solution DBaaS Private Cloud.
  • Des exemples Clients.
Ce webinar s’adresse à toute personne impliquée dans la planification, le déploiement et l’administration d’un cloud privé mais également les équipes Hors Production exprimant le besoin de services DB : DBAs, Administrateurs systèmes, Architectes techniques, Ingénieurs qualité, Développeurs, Chefs de projet applicatif/Web, Testeurs etc...
Inscrivez-vous dès maintenant en envoyant un e-mail à .

MdsMetadataResourceProvider Error in ADF

Andrejus Baranovski - Mon, 2014-11-03 02:59
If you have migrated ADF application to ADF version recently, you may see error in the log related to the internal MetadataResourceXmlImpl class usage. Runtime behaviour will not be affected, but there could be many redundant error messages in the log, like this one:

Error UNEXPECTED_CLIENT_OBJECT_TYPE found oracle.adf.controller.internal.metadata.xml.MetadataResourceXmlImpl, points to the ADF internal class usage, instead of public. You must look through the source code and you should find illegal internal imports for MetadataService and TaskFlowDefinition classes. Typically ADF developer would use such classes in the previous ADF versions to check if specific ADF TF exists:

In ADF you can use public classes for MetadataService and TaskFlowDefinition, instead of private. To fix the error, you only need to change class import:

Here you can download sample application with correct API usage -

Oracle BI Cloud Service for SaaS Application Reporting Part 1: Integrating BICS to using REST APIs

Rittman Mead Consulting - Mon, 2014-11-03 01:30

Last month Mark Rittman covered a series of posts detailing the Oracle BI Cloud Service (BICS), aimed at departmental users who want the power of OBIEE 11g without the need to stand-up their own infrastructure. If you’re coming in late, here’s the link to the series.

Before the GA Rittman Mead participated in the beta program for release one of Oracle’s Business Intelligence Cloud Service (BICS), the global aim of the beta was both to understand the capabilities and to identify potential use cases of the new BICS platform.  As Mark wrote, an excellent use case for BICS is to report on top of any SaaS application that expose the data (stored in the cloud) using REST APIs by taking advantage of the ApEx capabilities hosted in the Database Schema Service that comes with BICS. SaaS applications like Oracle’s Fusion CRM, Taleo or – that was used during the beta program – can be easily integrated and queried with BICS.

The technical goal of our beta program has been to check the features, options and limitations of Oracle BICS by connecting it to a instance, accessing the data exposed through REST APIs by using the ApEx native functions,  storing the data in the Database Schema Service, creating the Repository using the new Model Editor and showing the data in dashboard by keeping the same data security settings configured in the source platform. to Oracle BI Cloud Service

This series of post is going to explain all the details about our successful PoC, over the next few days we’ll be covering the following topics, and we’ll update the list with hyperlinks once the articles are published:

  • Oracle BI Cloud Service for SaaS Application Reporting Part 1: Integrating BICS to using REST APIs
  • Oracle BI Cloud Service for SaaS Application Reporting Part 2: BICS ApEx components
  • Oracle BI Cloud Service for SaaS Application Reporting Part 3: BICS Repository and Front-end configurations

In the first post of the series we’re going to define the steps required to setup a demo environment that can be accessed with REST APIs. Later in the post we intend to analyse the most interesting REST APIs that are used to extract the data from the platform. Before starting digging into the integration, some definitions may be needed:

  • RESTful APIs: A Web service API can be defined as RESTful if it conforms to the REST architectural constraints. Some more details about RESTful APIs can be found here.
  • is company specialised in software as a service (SaaS) and one of the major providers of CRM in the cloud. In addition to CRM offers a cloud platform as a service (PaaS) that developers can use in order to build multitenant applications hosted on servers. All the applications can be accessed by using RESTful APIs.
Environments Setup

In a new BI project it is never safe to directly extract data from a live (production) environment. Thus the first step needed in order to test connection is to obtain a environment that we can use without the risk of slowing it down, or even worse crashing and impacting users. A free demo platform can be obtained by subscribing for the developers program and can be customised as needed. After requesting the demo environment the creation of a security token is needed, the token is associated with the user invoking the RESTful API. To generate the token: login to with the specific user -> click on the username -> My Settings -> Reset My Security Token. The new token should then be sent to the user’s email.

Salesforce Reset token Procedure

The third step in order to access data with REST APIs is to define a Connected App. The creation of a Connected App generates a Consumer Key and a Consumer Secret that is going to be used later during the REST login calls. Once defined a Connected App, the last bit missing is to populate the instance since it’s empty by default, it can be populated manually or by writing population scripts based on the create REST API statements explained here. An Oracle BICS instance is needed in order to analyse the data, all the info regarding Oracle BICS and how to activate one instance can be found at this link. Having covered all the basics, it’s time to start analysing how the data can be extracted. REST APIs

In the following sections we are going to analyse some of the REST APIs used in our process. The REST APIs will be called with cURL commands, cURL is a command line tool and library for transferring data with URL syntax. Authentication

The first step in order to download the data from any instance is to execute the authentication and retrieve the access token, the access token will then be used in all the following REST calls. There are various authentication mechanisms that can be used against, for the aim of the beta program we used the one called “Username-Password OAuth Authentication Flow” that is described in detail here and in the image below. User Password OAuth

The REST API authentication command is

curl -v -d "grant_type=password" \
    -d "client_id=CLIENT_ID" -d "client_secret=CLIENT_SECRET" \
    -d "username=USERNAME" -d "password=PASSWORD_TOKEN"

where the following parameters must be assigned:

  • CLIENT_ID:  The client ID generated during the creation of the Connected App (step described in Basic Setup)
  • CLIENT_SECRET: The client Secret generated during the creation of the Connected App (step described in Basic Setup)
  • USERNAME: The username you want to use in order to access data
  • PASSWORD_TOKEN: The concatenation of password and user security token (e.g. if the password is ABC and the security token is 123 then PASSWORD_TOKEN is ABC123)

The response (if all the parameters are correct) should be like the following


where the most interesting parameters are:

  • instance_url: defines the salesforce instance to use.
  • access_token: defines the time-limited security token to use for all the following calls. List of Objects and Retrieve Metadata for an Object

Two of the REST APIs were very useful when trying to build a general ETL that could be applied to any customised instance:

  • Get list of Objects: retrieves the list of all objects available in (custom or not) with some additional information for each object.
  • Retrieve Metadata for an Object: retrieves the list of columns for the selected object with related data types and additional information.

The Get list of Objects call is the following

curl INSTANCE/services/data/VERSION/sobjects/ -H "Authorization: Bearer TOKEN"


  • INSTANCE is the instance_url parameter retrieved from the authentication call.
  • TOKEN is the access_token parameter retrieved from the authentication call.
  • VERSION is the REST APIs version used, in our beta test we used v29.0.

The response should be similar to the following in which you could see for each object (identified by the “name” field) all the metadata available.


The Retrieve Metadata of an Object can be called for each object listed in Get list of Object response, the code is the following:

curl INSTANCE/services/data/VERSION/sobjects/OBJECT/ \
    -H "Authorization: Bearer TOKEN"


  • INSTANCE and TOKEN  and VERSION are the same parameters defined for Get list of objects call
  • OBJECT is the object to retrieve

The response for the Account object is similar to the following


Interesting parameters for each column are:

  • name: is the columns name.
  • retrievable: if true it means that the particular column can be part of a query
  • soapType and length or byteLength: these fields provide the field type information and the length of the field.

With the three REST APIs analysed we could recreate all the objects (custom or not) in any (or instance as tables in the Oracle Database Schema Service. In the next section we will see how to extract the data from the list of objects by using the query capabilities. query

Salesforce provides two methods of querying the objects: including or excluding deleted object. The difference in the code is minimal, the first uses the /queryAll suffix while the second uses the /query. A REST API query call is:

curl INSTANCE/services/data/VERSION/query/?q=SELECT+LIST_OF_COLUMNS+from+OBJECT \
    -H "Authorization: Bearer TOKEN"


  • INSTANCE and TOKEN parameters are the ones retrieved from the Authentication call
  • VERSION is the REST API version used
  • OBJECT is the object to query
  • LIST_OF_COLUMNS is the comma delimited list of columns to retrieve

The response of a query “select+Name+from+Account” is:

         "Name":"Test 1"
         "Name":"Test 2"

In the JSON result you can find:

  • totalsize: the number of records retrieved
  • Name: for each record the name of the Account

If the resultSet is too big, will start paging results. If the initial query returns only part of the results, the end of the response will contain a field called nextRecordsUrl. For example:

"nextRecordsUrl" : "/services/data/v20.0/query/01gD0000002HU6KIAW-2000"

In order to get the next page of data a call like the following is needed passing the access token as a parameter.

curl INSTANCE/services/data/v20.0/query/01gD0000002HU6KIAW-2000 \
    -H "Authorization: Bearer TOKEN" limits

It’s important to be aware of the limit set by on the number of REST API calls per day. In order to check at any time the remaining number of calls available execute the following code.

curl INSTANCE/services/data/VERSION/limits/ -H "Authorization: Bearer TOKEN "X-PrettyPrint:1"

The DailyApiRequest -> Remaining field contained in the response shows the amount of calls still available.


Resultset output format default output format is JSON, which is a common format for most of the cloud application. However there is the opportunity to also retrieve the result in XML format by using the HTTP ACCEPT header set to “application/xml”. This first release of the tool is based on Oracle DB11g which supports native XML and not JSON. For this reason during our beta program with Oracle BI Cloud Service we decided to use the XML output. Once BICS will be bundled with the Oracle DB 12c that supports JSON and XML native, the resultset output format could be kept as default in JSON. In this post we defined the steps required in order to setup a demo environment that can be accessed with REST APIs and which are the most interesting REST APIs that we used to extract the data in order to analyse it with Oracle BI Cloud Service platform. In the next post we will look more in detail at the BICS ApEx part of it by analysing how the REST APIs can be called from ApEx.

Categories: BI & Warehousing

Science needs to explain this?

FeuerThoughts - Sun, 2014-11-02 08:57
Christopher Nolan of Dark Knight fame releasing new sci-fi movie: Interstellar.

In a Chicago Tribune interview, he says:
I could be wrong, but science needs to cross a threshold and explain why a monkey typing infinitely would never type the works of Shakespeare.
Well, I could be wrong, but maybe Nolan is a bit of an idiot when it comes to science.
Please, Mr. Nolan, tell me which scientists make this claim?
I guess he read somewhere about infinity and how incredibly awesome and big and never-ending it is, and so eventually anything would be done by anybody or anything and so even monkeys would "eventually" write Shakespeare and and and....
Produce a movie called Interstellar with Matthew McConaughey. 
In fact, maybe Chris Nolan is actually a monkey who crossed over from that obelisk in 2001, and got super smart and so a monkey already has produced a movie called Interstellar.
Damn, that is just so cool and so weird and it's like, that's never going to happen, man, no way.
So scientists had better figure out WHY that is not going to happen when they obviously really believe that it WILL happen (go, monkey, go!).
And to do that, they are going to have a cross a threshold, 'cause clearly science has hit its limit here. Just like with souls. Science can't explain souls, so I guess scientists had better cross over - maybe into a parallel universe -
Because really what could be cooler than parallel universes?

Categories: Development

Notes on predictive modeling, November 2, 2014

DBMS2 - Sun, 2014-11-02 05:49

Following up on my notes on predictive modeling post from three weeks ago, I’d like to tackle some areas of recurring confusion.

Why are we modeling?

Ultimately, there are two reasons to model some aspect of your business:

  • You generally want insight and understanding.
    • This is analogous to why you might want to do business intelligence.
    • It commonly includes a search for causality, whether or not “root cause analysis” is exactly the right phrase to describe the process.
  • You want to do calculations from the model to drive wholly or partially automated decisions.
    • A big set of examples can be found in website recommenders and personalizers.
    • Another big set of examples can be found in marketing campaigns.
    • For an example of partial automation, consider a tool that advises call center workers.

How precise do models need to be?

Use cases vary greatly with respect to the importance of modeling precision. If you’re doing an expensive mass mailing, 1% additional accuracy is a big deal. But if you’re doing root cause analysis, a 10% error may be immaterial.

Who is doing the work?

It is traditional to have a modeling department, of “data scientists” or SAS programmers as the case may be. While it seems cool to put predictive modeling straight in the hands of business users — some business users, at least — it’s rare for them to use predictive modeling tools more sophisticated than Excel. For example, KXEN never did all that well.

That said, I support the idea of putting more modeling in the hands of business users. Just be aware that doing so is still a small business at this time.

“Operationalizing” predictive models

The topic of “operationalizing” models arises often, and it turns out to be rather complex. Usually, to operationalize a model, you need:

  • A program that generates scores, based on the model.
  • A program that consumes scores (for example a recommender or fraud alerter).

In some cases, the two programs might be viewed as different modules of the same system.

While it is not actually necessary for there to be a numerical score — or scores — in the process, it seems pretty common that there are such. Certainly the score calculations can create a boundary for loose-coupling between model evaluation and the rest of the system.

That said:

  • Sometimes the scoring is done on the fly. In that case, the two programs mentioned above are closely integrated.
  • Sometimes the scoring is done in batch. In that case, loose coupling seems likely. Often, there will be ETL (Extract/Transform/Load) to make the scores available to the program that will eventually use them.
  • PMML (Predictive Modeling Markup Language) is good for some kinds of scoring but not others. (I’m not clear on the details.)

In any case, operationalizing a predictive model can or should include:

  • A process for creating the model.
  • A process for validating and refreshing the model.
  • A flow of derived data.
  • A program that consumes the model’s outputs.

Traditional IT considerations, such as testing and versioning, apply.

What do we call it anyway?

The term “predictive analytics” was coined by SPSS. It basically won. However, some folks — including whoever named PMML — like the term “predictive modeling” better. I’m in that camp, since “modeling” seems to be a somewhat more accurate description of what’s going on, but I’m fine with either phrase.

Some marketers now use the term “prescriptive analytics”. In theory that makes sense, since:

  • “Prescriptive” can be taken to mean “operationalized predictive”, saving precious syllables and pixels.
  • What’s going on is usually more directly about prescription than prescription anyway.

Edit: Ack! I left the final paragraph out of the post, namely:

In practice, however, the term “prescriptive analytics” is a strong indicator of marketing nonsense. Predictive modeling has long been used to — as it were — prescribe business decisions; marketers who use the term “prescriptive analytics” are usually trying to deny that very obvious fact.

Categories: Other

Analytics for lots and lots of business users

DBMS2 - Sun, 2014-11-02 05:45

A common marketing theme in the 2010s decade has been to claim that you make analytics available to many business users, as opposed to your competition, who only make analytics available to (pick one):

  • Specialists (with “PhD”s).
  • Fewer business users (a thinner part of the horizontally segmented pyramid — perhaps inverted — on your marketing slide, not to be confused with the horizontally segmented pyramids — perhaps inverted — on your competition’s marketing slides).

Versions of this claim were also common in the 1970s, 1980s, 1990s and 2000s.

Some of that is real. In particular:

  • Early adoption of analytic technology is often in line-of-business departments.
  • Business users on average really do get more numerate over time, my three favorite examples of that being:
    • Statistics is taught much more in business schools than it used to be.
    • Statistics is taught much more in high schools than it used to be.
    • Many people use Excel.

Even so, for most analytic tools, power users tend to be:

  • People with titles or roles like “business analyst”.
  • More junior folks pulling things together for their bosses.
  • A hardcore minority who fall into neither of the first two categories.

Asserting otherwise is rarely more than marketing hype.

Related link

Categories: Other

Does Increasing An Oracle Background Process OS Priority Improve Performance?

Does Increasing An Oracle Background Process OS Priority Improve Performance?
Does increasing an Oracle Database background process operating system priority improve performance? As you might expect, the answer is, "It depends."

In this posting I will explain the results of an experiment where I increase the Oracle Database 12c log writer background processes operating system priority.

In my experiment I created a clear CPU bottleneck and the top wait event was log file parallel write. I gathered some data. Next I increased all the log writer background process priorities. Again, I gathered some data and then I analyzed the two data sets. The results were disappointing, not surprising, but a fundamental rule in performance tuning was demonstrated and reinforced.

You can download the "analysis pack" which contains the raw experimental data, histogram, statistical R results and the R statical package commands HERE. You can download total time delta reporting script (ttpctx.sql) I show below, which is contained within my OraPub System Monitor (OSM) Toolkit HERE.

Before I could gather some data, I needed to create the appropriate system load, the data gather scripts and the data analysis scripts. Here's a quick overview of each.

Increasing The LGWR Processes PriorityIf you are not familiar with changing Oracle Database background OS process priority, I blogged about how to do this HERE.

My experiment consisted of creating two situations and statistically comparing them to see if increasing the LGWR background process OS priority increased performance. The only difference in the "two situations" was the instance parameter, _high_priority_processes. For the "normal priority" part of the experiment, the default "LMS*|VKTM" was used. For the "high priority" part of the experiment the parameter was changed to "LMS*|VKTM|LG*". The "LG*" caused the increase in the Linux OS priority of all log writer processes from the default 19 to 41.

Ready for some version specifics? Oracle continues to make OS priority related instance parameter changes... even within 12c releases. Since this experiment was done with Oracle Database version the parameter default was "LMS*|VKTM" not "LMS*" as with version Also, in VKTM is placed into a new parameter, _highest_priority_processes.

Generating The DML With CPU Bottleneck Load
To generate the DML workload, I used my OP Load Generator. You can download it HERE. It's quick, easy and I'm familiar with it. I kept increasing the number of DML processes until the CPU utilization was pegged at 100%. While the database server was clearly out of CPU power, the log writer background processes were rarely seen as the top CPU consuming process. I used the "top" program to monitor the process activity. The top CPU consuming processes were almost always the Oracle server/foreground/shadow processes. As I'll explain in the Results section below, this has significant ramifications on the results.

Oracle Time Based Analysis Summary
The data collection script was similar to the one I used back in 2012 when collecting data regarding Oracle's commit write facility. Essentially, I collected multiple three minute samples of the delta elapsed time, user commits, total non-idle wait time and CPU consumption. In both the normal and high priority runs, around 99% of the database time was CPU consumption and 1% of the time Oracle non-idle wait time. Also the top wait event (~70%) was log file parallel write (display name: log file redo write).

If you want master an Oracle Time Based Analysis, check out my online seminar, Tuning Oracle Using An AWR Report. It will teach you how to using an AWR report to optimize Oracle performance so users will feel the difference.

The OraPub System Monitor Toolkit script ttpctx.sql report below was taken during the "normal priority" log writer load.

SQL> @ttpctx

Database: prod35 16-MAY-14 06:07pm
Report: ttpctx.sql OSM by OraPub, Inc. Page 1
Total Time Activity (39 sec interval)

Avg Time Time Wait
Time Component % TT % WT Waited (ms) (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs 98.91 0.00 0.000 238.716 0
log file redo write 0.73 67.56 8.082 1.770 0
control file parallel write 0.18 16.41 33.077 0.430 0
target log write size 0.05 4.20 6.111 0.110 0
oracle thread bootstrap 0.03 3.05 40.000 0.080 0
os thread creation 0.02 1.53 20.000 0.040 0
commit: log file sync 0.01 0.76 10.000 0.020 0
enq: CR - block range reuse ckpt 0.01 0.76 20.000 0.020 0
Disk file operations I/O 0.00 0.00 0.000 0.000 0

The OSM report below was taken during the "high priority" log writer load.

SQL> @ttpctx

Database: prod35 16-MAY-14 09:25pm
Report: ttpctx.sql OSM by OraPub, Inc. Page 1
Total Time Activity (41 sec interval)

Avg Time Time Wait
Time Component % TT % WT Waited (ms) (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs 98.92 0.00 0.000 238.733 0
log file redo write 0.83 77.01 8.272 2.010 0
control file parallel write 0.08 7.28 14.615 0.190 0
target log write size 0.05 4.98 5.909 0.130 0
oracle thread bootstrap 0.03 3.07 40.000 0.080 0
os thread creation 0.02 1.92 25.000 0.050 0
commit: log file sync 0.01 0.77 10.000 0.020 0
enq: CR - block range reuse ckpt 0.01 0.77 20.000 0.020 0
enq: RO - fast object reuse 0.00 0.38 10.000 0.010 0

Data Collection
For the normal priority load 27 three minute samples where collected. For the high priority situation there were 30 three minute samples collected. (I forgot why there was only 27 samples collected for the normal priority.)  I collected the elapsed time, total non-idle wait time, total CPU consumption (v$sys_time_model: db_cpu + background cpu time) and total user commits.

In this experiment more user commits processed per second means better performance.

Experimental Results
I used the free statistics package "R" ( to analyze the data. I demonstrate how to get, install and use "R" in my online video seminar, Using Skewed Data To Your Advantage.

With the normal log writer process priority, an average of 984.5 commits/sec and a median of 983.0 commits/sec occurred. With the LG* high process priority, an average of 993.6 commits/sec and a median of 991.0 commits/sec occurred. While the "high priority" situation was able to process more commits per second, is this statistically significant?

The red "smoothed" histogram is the normal priority situation and the blue smoothed histogram is when the log writers were set to the higher priority. The more separated the two histograms the more "different" the sample sets, the more likely there is a statistically significant difference and the more likely a user would feel the difference. Looking at the above histograms plot, there does not appear to be a real difference. But let's do a proper significance test!

Because both sample sets are normally distributed (details are in the Analysis Pack), I could use a simple t-test. R produced a p-value of 0.04451. To be statistically "different" I want the p-value to be less than 0.05 and it is. What does this mean?

While statistically and numerically the commit rates are different, I wouldn't expect any special performance tuning award! In fact, the hassles with cycling a production instance and setting underscore/hidden parameters would make it very unlikely I would increase the OS priority of the log writer background processes. I want to see a big performance difference.

To Summarize... What I Learned... Again
This situation is a perfect example of focusing on the wrong thing! While there is a clear operating system CPU bottleneck and the top wait event is about redo, the log writers are not suffering from a want/need of CPU resources. I suspect the server processes want more CPU resources, but they are NOT the processes we increased their OS priority.

If the log writers were suffering from a lack of CPU resources and fighting for CPU resources, I would expect to see them consuming CPU resources along with the Oracle server processes. And I would definitely expect to see them near the top of the "top" process monitor... especially when their priority has been increased!

Because of this "misguided" tuning effort, this experiment does not build a case for or against changing the log writer priority. What it reinforces is in our quest to optimize performance, make sure we focus on the right thing.

As a side note, this is a great statistical analysis example for two reasons. First, our samples sets look similar, but statistically they are not. Second, while they are statistically different, the performance impact will not be very different. And my guess is the users won't feel a thing... except frustration and anger.

To Super Summarize
When considering increasing a background process's operating system priority, make sure the process is in need of CPU and is not able to get it. In this situation, the DBA could have been mislead by the CPU bottleneck. But upon closer inspection of the log writers from an operating system perspective and knowing the wait event "log file parallel write" is probably more about IO than CPU (Oracle perspective) it would be unlikely that increasing the log writer processes OS priority would help increase the commits per second.

Thanks for reading!


Categories: DBA Blogs

StatsPack and AWR Reports -- Bits and Pieces -- 2

Hemant K Chitale - Sat, 2014-11-01 08:52
This is the second post in a series on reading StatsPack and AWR reports.
(The first is available here)

Comparing Reports :

Here are two 9.2 StatsPack extracts from one database:

Extract A  : 9.2 StatsPack
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 56,031.63 3,084.68
Logical reads: 68,286.24 3,759.32
Block changes: 314.88 17.33
Physical reads: 842.92 46.40
Physical writes: 134.76 7.42
User calls: 271.32 14.94
Parses: 146.46 8.06
Hard parses: 7.37 0.41
Sorts: 93.83 5.17
Logons: 0.33 0.02
Executes: 296.70 16.33
Transactions: 18.16

Extract B : 9.2 StatsPack
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 89,615.76 2,960.48
Logical reads: 210,302.81 6,947.42
Block changes: 541.83 17.90
Physical reads: 1,465.04 48.40
Physical writes: 161.68 5.34
User calls: 213.82 7.06
Parses: 125.28 4.14
Hard parses: 6.13 0.20
Sorts: 104.31 3.45
Logons: 0.35 0.01
Executes: 664.81 21.96
Transactions: 30.27

Extract B shows a higher volume of activity -- Transactions per second, Redo Size per second, Logical Reads per second.
Actually, although these are both from the same database but for two different time-windows on different dates.  The fact that the time-windows aren't the same make the comparisons difficult.  Extract A is for an 11-hour window on day 1 and Extract B is for a 6-hour window on day 6.  The two windows aren't strictly comparable.  You have to be very careful when comparing StatsPack or AWR reports, even when from the same database if they don't cover comparable time-windows with similar loads.  The first Report includes non-peak hours, the second is for Peak hours only. The level of activity averaged out over 11hours that include non-peak hours isn't strictly comparable with a 6hour peak window.

Extract C : 10.2 AWR
Load Profile
Per SecondPer TransactionRedo size: 520,776.15 50,948.36Logical reads: 353,525.71 34,585.98Block changes: 1,854.93 181.47Physical reads: 14,285.23 1,397.55Physical writes: 295.84 28.94User calls: 1,265.14 123.77Parses: 822.64 80.48Hard parses: 15.40 1.51Sorts: 168.09 16.44Logons: 0.16 0.02Executes: 1,040.31 101.78Transactions: 10.22 
Extract D : 10.2 AWRLoad Profile
Per SecondPer TransactionRedo size: 517,862.01 54,681.39Logical reads: 288,341.09 30,446.12Block changes: 1,879.27 198.43Physical reads: 12,820.96 1,353.77Physical writes: 323.90 34.20User calls: 1,115.78 117.82Parses: 719.39 75.96Hard parses: 17.28 1.82Sorts: 95.74 10.11Logons: 0.15 0.02Executes: 935.33 98.76Transactions: 9.47 
Extracts C and D are from the same database for the same time-window on 2 consecutive workdays.  They are comparable.  A difference of 10% for some statistic may not be significant.  Understand that every business, every activity, every system will have some "normal" fluctuations day-to-day.


Categories: DBA Blogs

The First PASS Summit Bloggers’ Meetup

Pythian Group - Fri, 2014-10-31 13:02

We are stoked to announce the first ever PASS Summit Bloggers’ Meetup!

What: PASS Summit Bloggers’ Meetup 2014
When: Thursday, November 6th, 5pm – 7pm
Where: Sports Restaurant & Bar, 140 4th Ave North, Suite 130, Seattle, WA 98109
How: Please comment with “COUNT ME IN” if coming — we need to know attendance numbers.

We’re excited to meet old friends, and make new ones in Seattle this year. Pythian will sponsor munchies and drinks. There will be a networking contest with some cool prizes, plus everyone will get their very own PASS Summit Bloggers Meetup shirt! Meetups are a great way to make new friends in the community, so come on out — all bloggers are welcome!

If you are planning to attend, please comment here with the phrase “COUNT ME IN”. This will help us ensure we have the attendance numbers right. Please provide your blog URL with your comment — it’s a Bloggers Meetup after all! Make sure you comment here if you are attending so that we have enough room, food, and drinks.

The location is perfect to get ready for the appreciation event — minutes walk from EMP Museum! Snacks and drinks before the big event and mingle with fellow bloggers. What can be better?

Of course, do not forget to blog and tweet about this year’s bloggers meetup using #Summit14 #sqlpass. See you there!

Categories: DBA Blogs


Jonathan Lewis - Fri, 2014-10-31 11:31

No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation.  Here’s some code to build a couple of sample tables:


create table t1
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
	generator	v1,
	generator	v2
	rownum <= 1e6

create index t1_n1 on t1(id, n1);

create table t2
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
	rownum					id,
	trunc(dbms_random.value(10001,20001))	x1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
	generator	v1,
	generator	v2
	rownum <= 1e6

create index t2_i1 on t2(x1);

		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'

		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'


create or replace view  v1
	id, n1, small_vc, padding
from	t1 
where	n1 between 101 and 300
union all
	id, n1, small_vc, padding
from	t1 
where	n1 between 501 and 700

The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:

	/*+ gather_plan_statistics */
where =
and	t2.x1 = 15000

I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of (the same thing happens in

first_rows_10 plan
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
|   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
|   3 |    UNION-ALL                         |       |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |

all_rows plan
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |

first_rows_10 plan hinted under all_rows optimisation
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
|   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
|*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
|   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
|   3 |    UNION-ALL                 |       |       |       |       |            |          | 
|*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
|*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
|   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |

I’m not sure why the first_rows_10 plan uses “table access by rowid batched”, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may have a small benefit on the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.

In the first_rows_10 plan the optimizer has decided building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it’s made that decision based on the assumption that it will only get 10 rows from each copy of the table – and at run-time it HAS to get all the relevant t1 rows to build the hash table before it can get any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – it’s a lot of redundant data and a very expensive hash table build.

In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimum amount of work. To demonstrate the error more clearly I’ve re-run the experiment for the first two plans from SQL*PLus, setting the arraysize to 11, the pagesize to 5, and stopping after the first 10 rows. Here are the plans showing the rowsource execution stats:

first_rows_10 plan
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
|   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
|*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
|   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
|   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
|*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |

all_rows plan
| Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
|   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
|   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |

If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I’ve managed to pay a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan was much more efficient.

Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in order to be able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order without sorting rather than doing a (faster) full tablescan with sort. This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole has table before it can take any further steps – we don’t get 10 rows quicker and the rest more slowly; we just get 10 very slow rows.


It’s possible that this is an example of bug 9633142: (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW) but that’s reported as fixed in 12c, with a couple of patches for However, setting “_fix_control”=’4887636:off’, does bypass the problem. (The fix control, introduced in has description: “remove restriction from first K row optimization”)

Data Warehouse Appliance Offerings

Chris Foot - Fri, 2014-10-31 11:15


Information Technology units will continue to be challenged by the unbridled growth of their organization’s data stores. An ever-increasing amount of data needs to be extracted, cleansed, analyzed and presented to the end user community. Data volumes that were unheard of a year ago are now commonplace. Day-to-day operational systems are now storing such large amounts of data that they rival data warehouses in disk storage and administrative complexity. New trends, products, and strategies, guaranteed by vendors and industry pundits to solve large data store challenges, are unveiled on a seemingly endless basis.

Choosing the Large Data Store Ecosystem

Choosing the correct large data store ecosystem (server, storage architecture, OS, database) is critical to the success of any application that is required to store and process large volumes of data. This decision was simple when the number of alternatives available was limited. With the seemingly endless array of architectures available, that choice is no longer as clear cut. Database administrators now have more choices available to them than ever before. In order to correctly design and implement the most appropriate architecture for their organization, DBAs must evaluate and compare large data store ecosystems and not the individual products.

Traditional Large Data Store Technologies

Before we begin our discussion on the various advanced vendor offerings, we need to review the database features that are the foundation of the customized architectures we will be discussing later in this article. It is important to note that although each vendor offering certainly leverages the latest technologies available, the traditional data storage and processing features that DBAs have been utilizing for years remain critical components of the newer architectures.


Partitioning data into smaller disk storage subsets allows the data to be viewed as a single entity while overcoming many of the challenges associated with the management of large data objects stored on disk.

Major database vendor products offer optimizers that are partition aware and will create query plans that access only those partitioned objects needed to satisfy the query’s data request (partition pruning). This feature allows administrators to create large data stores and still provide fast access to the data.

Partitioning allows applications to take advantage of “rolling window” data operations. Rolling windows allow administrators to roll off what is no longer needed. For example, a DBA may roll off the data in the data store containing last July’s data as they add this year’s data for July. If the data is ever needed again, administrators are able to pull the data from auxiliary or offline storage devices and plug the data back into the database.

Query Parallelism

Query parallelism improves data access performance by splitting work among multiple CPUs. Most database optimizers are also parallel aware and are able to break up a single query into sub queries that access the data simultaneously.

Without parallelism, a SQL statement’s work is performed by a single process. Parallel processing allows multiple processes to work together to simultaneously process a single SQL statement or utility execution. By dividing the work necessary to process a statement among multiple CPUs, the database can execute the statement more quickly than if the work was single-threaded.

The parallel query option can dramatically improve performance for data-intensive operations associated with decision support applications or very large database environments. Symmetric multiprocessing (SMP), clustered, or massively parallel systems gain the largest performance benefits from the parallel query option because query processing can be effectively split up among many CPUs on a single system.

Advanced Hardware and Software Technologies

Let’s continue our discussion by taking a high-level look at the advanced data warehouse offerings from the three major database competitors, Oracle, Microsoft and IBM. Each of the vendors’ offerings are proprietary data warehouse ecosystems, often called appliances, that consist of hardware, OS and database components. We’ll complete our review by learning more about Hadoop’s features and benefits.

Oracle Exadata

Oracle’s Exadata Machine combines their Oracle database with intelligent data storage servers to deliver very high performance benchmarks for large data store applications. Exadata is a purpose-built warehouse ecosystem consisting of hardware, operating system and database components.

Oracle Exadata Storage Servers leverage high speed interconnects, data compression and intelligent filtering and caching features to increase data transfer performance between the database server and intelligent storage servers. In addition, the Exadata architecture is able to offload data intensive SQL statements to the storage servers to filter the results before the relevant data is returned to the database server for final processing.

Exadata uses PCI flash technology rather than flash disks. Oracle places the flash memory directly on the high speed PCI bus rather than behind slow disk controllers and directors. Each Exadata Storage Server includes 4 PCI flash cards that combine for a total of 3.2 TB of flash memory. Although the PCI flash can be utilized as traditional flash disk storage, it provides better performance when it is configured as a flash cache that sits in front of the disks. Exadata’s Smart Flash Cache will automatically cache frequently accessed data in the PCI cache, much like its traditional database buffer cache counterpart. Less popular data will continue to remain on disk. Data being sent to the PCI Flash cache is also compressed to increase storage capacity.

Exadata also offers an advanced compression feature called Hybrid Columnar Compression (HCC) to reduce storage requirements for large databases. Exadata offloads the compression/decompression workload to the processors contained in the Exadata storage servers.

These technologies enable Exadata to deliver high performance for large data stores accessed by both decision support and online operational systems. The Exadata machine runs an Oracle database which allows Oracle-based applications to be easily migrated. Oracle describes the Exadata architecture as “scale out” capable meaning multiple Exadata servers can be lashed together to increase computing and data access horsepower . Oracle RAC, as well as Oracle’s Automatic Storage Management (ASM), can be leveraged to dynamically add more processing power and disk storage.

Microsoft SQL Server PDW

SQL Server Parallel Data Warehouse (PDW) is a massively parallel processing (MPP) data warehousing appliance designed to support very large data stores. Like Oracle’s Exadata implementation, the PDW appliance’s components consist of the entire database ecosystem including hardware, operating system and database.

Database MPP architectures use a “shared-nothing” architecture, where there are multiple physical servers (nodes), with each node running an instance of the database and having its own dedicated CPU, memory and storage.

Microsoft PDW’s architecture consists of:

  • The MPP Engine
    • Responsible for generating parallel query execution plans and coordinating the workloads across the system’s compute nodes
    • Uses a SQL Server database to store metadata and configuration data for all of the databases in the architecture
    • In essence, it acts as the traffic cop and the “brain” of the PDW system
  • Computer Nodes
    • Each compute node also runs an instance of the SQL Server database
    • The compute nodes’ databases are responsible for managing the user data

As T-SQL is executed in the PDW system, the queries are broken up to run simultaneously over multiple physical nodes, which utilizes parallel execution to provide high performance data access. The key to the success when using PDW is to select the appropriate distribution columns that are used to intelligently distribute the data amongst the nodes. The ideal distribution column is one that is accessed frequently, is able to evenly distribute data based on the column’s values and has low volatility (doesn’t change a lot).

Microsoft Analytics Platform (APS)- Hadoop and SQL Server Integration

Microsoft’s Analytics Platform System (APS) combines massively parallel processing offering (PDW) with HDInsight, their version of Apache Hadoop. Microsoft has partnered with Hortonworks, a commercial Hadoop software vendor that provides a Windows-based, 100% Apache Hadoop distribution. Please see section below for more detailed information on the Hadoop engine.

Integrating a Hadoop engine into SQL Server allows Microsoft to capture, store, process and present both structured (relational) and unstructured (non-relational) data within the same logical framework. Organizations wanting to process unstructured data often turned to Apache Hadoop environments which required them to learn new data storage technologies, languages and an entirely new processing architecture.

Microsoft’s Polybase provides APS users with the ability to query both structured and non-structured data with a single T-SQL based query. APS application programmers are not required to learn MapReduce or HiveQL to access data stored in the APS platform. Organizations using APS do not incur the additional costs associated with to re-training their existing staff or hiring personnel with experience in Hadoop access methods.

IBM PureData Systems for Analytics

Not to be outdone by their database rivals, IBM also provides a proprietary appliance called IBM PureData System for Analytics. The system, powered by Netezza, once again, combines the hardware, database and storage into a single platform offering. Pure Data Analytics is an MPP system utilizing IBM Blade Servers and dedicated disk storage servers that, like its competitors, is able to intelligently distribute workloads amongst the processing nodes.

IBM leverages field-programmable gate arrays (FPGAs) which are used in their FAST engines. IBM runs the FAST engine on each node to provide compression, data filtering and ACID compliance on the Netezza systems. The real benefit of FAST is that the FPGA technology allows the engines to be custom tailored to the instructions that are being sent to them for processing. The compiler divides the query plan into executable code segments, called snippets, which are sent in parallel to the Snippet Processors for execution. The FAST engine is able to customize the filtering according to the snippet being processed.

IBM’s Cognos, Data Stage, and InfoSphere Big Insights software products are included in the offering. IBM’s goal is to provide a total warehouse solution, from ETL to final data presentation, to Pure Data Analytics users.

In addition, IBM also provides industry-specific warehouse offerings for banking, healthcare, insurance, retail and telecommunications verticals. IBM’s “industry models” are designed to reduce the time and effort needed to design data warehousing systems for the organizations in these selected business sectors. IBM provides the data warehouse design and analysis templates to accelerate the data warehouse build process. IBM consulting assists the customer to tailor the architecture to their organization’s unique business needs.

Non-Database Vendor Technologies

New “disruptive” products that compete with the traditional database vendor offerings continue to capture the market’s attention. The products range the spectrum from No-SQL products that provide easy access to unstructured data to entirely new architectures like Apache’s Hadoop.

Major database vendors will make every effort to ensure that disruptive technologies gaining market traction become an enhancement, not a replacement, for their traditional database offerings. Microsoft’s APS platform is an excellent example of this approach.

Apache Hadoop

Apache’s Hadoop is a software framework that supports data-intensive distributed applications under a free license. The Hadoop software clusters’ commodity servers offer scalable and affordable large-data storage and distributed processing features in a single architecture.

A Hadoop cluster consists of a single master and multiple worker nodes. The master provides job control and scheduling services to the worker nodes. Worker nodes provide storage and computing services. The architecture is distributed, in that the nodes do not share memory or disk.

A distributed architecture allows computing horsepower and storage capacity to be added without disrupting on-going operations. Hadoop’s controlling programs keep track of the data located on the distributed servers. In addition, Hadoop provides multiple copies of the data to ensure data accessibility and fault tolerance.

Hadoop connects seamlessly to every major RDBMS through open-standard connectors providing developers and analysts with transparent access through tools they are familiar with. When used simply as a large-data storage location, it is accessible through a variety of standards-based methods such as FUSE or HTTP. Hadoop also offers an integrated stack of analytical tools, file system management and administration software to allow for native exploration and mining of data.

The Hadoop architecture is able to efficiently distribute processing workloads amongst dozens and hundreds of cost-effective worker nodes. This capability dramatically improves the performance of applications accessing large data stores. Hadoop support professionals view hundreds of gigabytes as small data stores and regularly build Hadoop architectures that access terabytes and petabytes of structured and unstructured data.

One of Hadoop’s biggest advantages is speed. Hadoop is able to generate reports in a fraction of the time required by traditional database processing engines. The reductions can be measured by orders of magnitude. Because of this access speed, Hadoop is quickly gaining acceptance in the IT community as a leading alternative to traditional database systems when large data store technologies are being evaluated.


As stated previously, there is an endless array of offerings that focus on addressing large data store challenges. Large data store architecture selection is the most important decision that is made during the warehouse development project. A correctly chosen architecture will allow the application to perform to expectations, have the desired functionality and be easily monitored and administered. Incorrect architecture decisions may cause one or more of the following problems to occur: poor performance, limited functionality, high total cost of ownership, complex administration and tuning, lack of scalability, poor vendor support, poor reliability/availability and so on. All market- leading database vendors understand the importance of addressing the challenges inherent with large data stores and have released new products and product enhancements designed to simplify administration and improve performance.

The post Data Warehouse Appliance Offerings appeared first on Remote DBA Experts.

USA War Casualties

Nilesh Jethwa - Fri, 2014-10-31 10:35 maintains documented list of all fatalities for Iraq and Afghanistan wars.

Analysing the dataset for Afghanistan, we summarize the results by the year

NOTE: This contains only Afghanistan metrics. We will later update the visuals to reflect Iraq war.



USA war fatalities by year

We are approaching the levels of 2002 and hope for the best that we don’t have to suffer another wars.

Here is another view by year and month


InfoCaptor : Analytics & dashboards


The dataset contains the age of each person died in the war so summarizing by Age


War Deaths by Age

Checking it against the year


Why so many young deaths between age 20 and 30 for the year 2014?


Where did most of the deaths occur?



Where were the soldiers from?


Deaths by Rank


InfoCaptor : Analytics & dashboards


Cause of Death

Attack Types




Helicopter Crash is the one of the top death cause in Non Hostile situations

Speaking at the Spanish Virtual PASS Chapter

Pythian Group - Fri, 2014-10-31 10:09

Title : Recuperación de desastres y soluciones de alta disponibilidad con SQL Server
Event link:
Event description: “Esta presentación presenta las soluciones de recuperacion de desastres (Disaster Recovery) y alta disponibilidad (High Availability) con SQL Server y ofrece escenarios creativos por usar las soluciones para reportages (Reporting), BI y almacen de datos (Datawarehouse). ”

Please feel free to register!

Categories: DBA Blogs

Log Buffer #395, A Carnival of the Vanities for DBAs

Pythian Group - Fri, 2014-10-31 07:44

This Log Buffer Edition ventures out in the fields of Oracle, SQL Server and MySQL and picks some of the coolest blog posts for this week.


In Oracle, if we add a column to a table which is NOT NULL, we are allowed to do it directly, in a single statement, as long as we supply a DEFAULT value to populate any pre-existing rows.

There have been rumblings from the HPC community indicating a general suspicion of and disdain for Big Data technology which would lead one to believe that whatever Google, Facebook and Twitter do with their supercomputers is not important enough to warrant seriousness—that social supercomputing is simply not worthy.

Work-around Instance Migration Limits of BPM Suite 11g.

Oracle Event Processing 12c: java errors when deploying a new OEP project.

Creating a WebLogic 12c Data Source Connection to Pivotal GemFireXD 1.3.

SQL Server:

It sounds simple enough. Either your column will always have a value or it may not. Yet somehow such a seemingly simple decision can become a never-ending debate where database schema begins to resemble superstition and designing effective tables seems more contentious than you expected it to be.

There’s a joke doing the rounds at SQL conferences and seminars: three DBAs walk into a NoSQL bar and leave when they can’t find a table.

Stairway to AlwaysOn Level 3: Infrastructure 101.

How to compare two databases and email the results to every one who needs to know.

Free eBook: Troubleshooting SQL Server: A Guide for the Accidental DBA.


Set up an SSL-encrypted connection between Sphinx and MySQL.

Sometimes users ask for something that doesn’t really make sense. On the first glance. But then you start asking and realize that the user was right, you were wrong, and it is, actually, a perfectly logical and valid use case.

PECL/mysqlnd_ms needs updates for MySQL Group Replication.

Why should you migrate from MySQL to MariaDB?

The Perfect Server – CentOS 7 (Apache2, Dovecot, ISPConfig 3).

Categories: DBA Blogs

Open Source Virtualization Project at Risk [VIDEO]

Chris Foot - Fri, 2014-10-31 04:02


Hi, welcome to RDX. Virtualization and cloud technology pretty much go hand-in-hand.

Many popular cloud providers, such as Amazon and Rackspace, use Xen, an open-source virtualization platform to optimize their environments.

According to Ars Technica, those behind the Xen Project recently released a warning to those using its platform. Apparently, a flaw within the program's hypervisor allows cybercriminals to corrupt a Xen virtual machine, or VM. From there, perpetrators could read information stored on the VM, or cause the server hosting it to crash. Monitoring databases hosted on Xen VMs is just one necessary step companies should take. Reevaluating access permissions and reinforcing encryption should also be priorities. 

Thanks for watching! Be sure to visit us next time for any other advice on security vulnerabilities. 

The post Open Source Virtualization Project at Risk [VIDEO] appeared first on Remote DBA Experts.

OSB12c: Errorhandling in REST

Darwin IT - Fri, 2014-10-31 03:39
Yesterday, I had an OSB consulting day at a customer. We looked into a REST service that was to be extended with update functionality. Since calling an update service of an EIS (Enterprise Information System) can go wrong with all sorts of errors, it is important to be able to return a fault-message with the errors, jason format.

Now in OSB12c it's very apparent how you define possible fault-messages and even how the should be formatted in JSON:

In this sample case we created a more or less simple xsd for faults (dutch: fouten). To test with different fault messages we simply duplicated the 'fouten' element in the xsd to 'fouten2'. You can assign different HTTP-status codes to the different fault.

So this is configuration is pretty simple and straight forward. But it is not quite clear in the documents how you would return a specific fault within your error-handlers in the pipeline.

Internally OSB works not only 'XML'-based but actually SOAP-based. So the trick in the end is to replace the body with a soap-fault message and the selection of the REST/JSON errormessage is done based on the structure of the document in the details-section of the SOAP-Fault. In the screen above, you would define for each fault message an xsd-element and apparently it validates the soap-fault-details content against each XSD defined, and the xsd against which the detail-content is valid points to the returned fault, with the corresponding HTTP Status.

So we created a XQuery transformation as follows:
xquery version "1.0" encoding "utf-8";

(:: OracleAnnotationVersion "1.0" ::)

declare namespace ns2="";
(:: import schema at "../PS/Schemas/fouten.xsd" ::)
declare namespace ns1="";
(:: import schema at "../BS/Schemas/XMLSchema_-130439696.xsd" ::)
declare namespace soap-env="";

declare variable $input as element() (:: schema-element(ns1:ServiceErrorMessage) ::) external;

declare function local:func($input as element() (:: schema-element(ns1:ServiceErrorMessage) ::)) as element() (:: schema-element(ns2:fouten) ::) {
for $detail in $input/ns1:detail
<ns2:ErrorMessage>{fn:concat("ERROR: ", fn:data($detail/ns1:message))}</ns2:ErrorMessage></ns2:ErrorMessages>

Of course the actual fault detail must follow the xsd for that particular fault. We tested but the faultcode or fault string does not have any affect in selection of the REST-fault or HTTP statuscode.
With the xquery above we got the 'fault' returned as defined in the REST definition, as shown in the screendump above.
 In our example, if we would changed the contents of this xquery and replace the tag <ns2:fouten> to <ns2:fouten2> then we got the other fault (fault2), with the corresponding HTTP-status.
A detail with contents that does not correspond to any of the defined fault-xsd's would result in HTTP-status 500: internal server error. So it is important to have a proper transformation where the returning fault-detail is valid to at least one of the fault-xsd's.

Another conclusion is that since the fault selection is apparently based on the detail-contents against the registered fault-xsd-elements, you apperently can't have different faults with the same xsd. Since JSON is 'namespace-less', you probably can solve this by defining several copies of the same xsd with a different namespace, one for each fault. The choosen namespace in the xquery would then be the selector for the fault. But since the underlying element names are the same, it would not differ in the resulting JSON-message. Of course in an XML result it would differ.

MySQL : What management tools do you use?

Tim Hall - Fri, 2014-10-31 02:35

A quick question out to the world. What management tools do you use for MySQL?

We currently have:

  • MySQL Workbench : It’s OK, but I don’t really like it. It feels like half a product compared to tools I’ve used for other database engines…
  • phpMyAdmin : I’ve used this on and off for over a decade for my own website. While I’m typing this sentence, they’ve probably released 4 new versions. :) We have an installation of this which we use to access our MySQL databases should the need arise.
  • mysql Command Line : I use the command line and a variety of scripts for the vast majority of the things I do.

When I’m working with Oracle, my first port of call for any situation is to use SQL*Plus along with a variety of scripts I’ve created over the years. The performance stuff in Cloud Control (if you’ve paid for the Diagnostics and Tuning option) is the big exception to that of course.

I still consider myself a newbie MySQL administrator, but I’ve found myself spending more and more time at the command line, to the point where I rarely launch MySQL Workbench or phpMyAdmin these days. I’m wondering if that is common to other MySQL administrators, or if it is a carry over from my Oracle background…

Enquiring minds need to know!



MySQL : What management tools do you use? was first posted on October 31, 2014 at 9:35 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Index Advanced Compression vs. Bitmap Indexes (Candidate)

Richard Foote - Thu, 2014-10-30 23:59
A good question from Robert Thorneycroft I thought warranted its own post. He asked: “I have a question regarding bitmapped indexes verses index compression. In your previous blog titled ‘So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2)’ you came to the conclusion that ‘500,000 distinct values in a 1 […]
Categories: DBA Blogs