
Feed aggregator

Flipkart and Focus 3 - There’s Something (Profitable) About Your Privacy

Abhinav Agarwal - Wed, 2015-05-20 09:45
The third in my series on Flipkart and focus appeared in DNA on April 18th, 2015.

Part III – There’s Something (Profitable) About Your Privacy
Why do so many companies hanker after apps? Smartphone apps, tablet apps, iOS apps, Android apps, app-this, app-that….
Leave aside for a moment the techno-pubescent excitement that accompanies the launch of every new technology (if you are not old enough to remember words like “client-server[1]”, then “SOA[2]” will surely sound familiar enough). Every Marketing 101 course drills into its students that acquiring a new customer is far costlier than retaining an existing one. Loyal customers (leaving aside the pejorative connotation the word “loyal” carries, implying that customers who shop elsewhere for a better deal are of dubious moral character) are what you should aspire to – customers who keep buying from you over a longer period of time[3] – which lets you redirect your marketing and advertising dollars towards acquiring newer customers, faster. If you spend less on unnecessary discounts and expensive retention schemes, margins from existing customers are automatically higher.

Customers can stay loyal if you can build a bond of affinity with them. You should aspire to be more like the local kirana owner (only infinitely richer), who in a perfect world knew everything about you – your likes, dislikes, which festivals you celebrated, and therefore which sweets you would buy, when your relatives came over to stay and what their likes were, what exotic food items you wanted, and so on. And who knew your name. Hence the marketer’s love for loyalty programs[4], no matter that customer loyalty is notoriously difficult to guarantee[5].

In the world of online retailing (actually, it applies just as well to any kind of retailing), how do you get to acquire a deep level of intimacy with your customer? Smartphone apps provide a degree of intimacy that desktop / laptop browsers cannot. This is by simple virtue of the fact that the smartphone travels with the user, the user is constantly logged on to the app, and the app knows where you go and where you are. So no wonder that in December 2011, Amazon offered a “brazen[6]” deal to its customers in brick-and-mortar stores to do an “in-store” price-check of items using the Amazon Price Check app[7], and if the same product was available on Amazon, get it at a discount off the store’s price. Though termed “not a very good deal[8]”, it nonetheless angered[9] the Retail Industry Leaders Association, and elsewhere was described as “Evil But It's the Future[10]”. The combination of availability – the app was installed on the smartphone that was with the user – and the integrated capabilities in the device – a camera that fed into a barcode scanner app – made this possible. The appeal of apps is undeniable.

The magical answer is – “app”. Your best-thing-since-sliced-bread app is installed on the customer’s smartphone (or tablet or phablet), is always running (even when it is not supposed to be running), knows everyone in your contacts (from your proctologist to the illegal cricket bookie), can hear what you speak (even your TV can do this now[11]), knows where you are, who you call, what text messages you send and receive, knows what other apps you have installed on your smartphone (presumably so it can see how potentially disloyal you could be), which Wi-Fi networks you connect to, can access the photos and videos you have taken (naughty!), and so on and so forth. All this the better to hear you with, the better to see you with, and ultimately the better to eat you (your wallet) with – with due apologies to Little Red Riding Hood[12]. You may want to take a closer look at the permissions your favorite app wants when you install it – like Amazon India[13], eBay[14], Flipkart[15], Freecharge[16], HomeShop18[17], Jabong[18], MakeMyTrip[19], Myntra[20], SnapDeal[21]. Great minds do seem to think alike, don’t they?

[Technical aside: I covered the red herrings thrown in favour of apps in the first part, but here is some more… You can store more data, more effectively, and process that data better using an app than you can with a plain browser-based approach. True. But not quite. The ever-evolving world of HTML5 (the standard that underpins how information is structured and presented on the web) has progressed to make both these points moot – with offline storage[22] and local SQL database support[23]. Yes, there are arguments to be made about handling large amounts of data offline with browser-based mechanisms, but these are for the most part edge-cases. To be fair, there are some high-profile cases of companies switching to native apps after experimenting with HTML5-based apps (hybrid apps that wrapped a browser-based UI with a native shell), like LinkedIn[24] and Facebook[25]. The appeal of apps therefore is undeniable. But, as I argued earlier, the appeal of apps does not negate the utility of browser-based interfaces.]

What is all this useful for? Your app now knows that Ram, Shyam, and Laxman in your contacts have birthdays coming up, and it can suggest an appropriate gift for them. Convenient, isn’t it? While driving to work, you can simply tell your app – speak out the commands – to search for the latest perfume that was launched last week and have it gift-wrapped and delivered to your wife. The app already has your credit card details, and it knows your address. Your app knows that you are going on a vacation next week (because it can access your calendar, your SMS-es, and perhaps even your email) to Sikkim; it helpfully suggests a wonderful travel book and some warm clothing that you may need. The imagined benefits are immense.

But there is a distinctly dark side to apps – as it relates to privacy – that should be a bigger cause for concern for customers and smartphone users alike. Three sets of examples should suffice.
You get a flyer from your favourite brick-and-mortar store, letting you know that you can buy those items that your pregnant daughter will need in the coming weeks. You head over to the store, furious – because your daughter is most certainly not pregnant. Later you find out that she is, and that the store hadn’t made a mistake. It turns out the truth is a little subtler than that[26], and a little more sedate than what tabloid-ish coverage – with headlines like “How Companies Learn Your Secrets[27]” – made it out to be (the original presentation made at the PAW Conference is also available online[28]).

There are enough real dangers in this world without making it easier to use technology to make it even more unsafe. Considering how unsafe[29] air travel can be for women[30] and even girls[31], one has to question the wisdom of making it even[32] more so[33]. If this does not creep you out, then perhaps the Tinder app – which uses your location and “displays a pile of snapshots of potential dates in a user’s immediate area”[34], to as close as within 100 feet[35] - may give you pause for thought.

Do apps need all the permissions they ask for? No. But, … no! Would they work if they didn’t have all those permissions? 99% of the time, yes – they would work without a problem. For example, an app would need to access your camera if you wanted to scan a barcode to look up a product. The app would need access to your microphone if you wanted to speak your query rather than type it in the app. What if you don’t particularly care about pointing your camera at the back of books to scan their barcodes, or speaking like Captain Kirk into your phone? Sorry, you are out of luck. You cannot selectively deny certain permissions to an app – at least on a device running the Android mobile operating system. In other words, it is a take-it-or-leave-it world, where the app developer is in control. Not you. And wanting to know your location? Even if you are a dating app, it’s still creepy.

But surely app makers will ask you before slurping your very personal, very private information to its servers in the cloud? Yes, of course – you believe that to be true, especially if you are still in kindergarten.

A few weeks before its IPO[36], JustDial’s app was removed from the Google Play Store[37]. It was alleged that the updated version of the JustDial app had “started retrieving and storing the user’s entire phone book, without a warning or disclaimer.[38],[39]” Thereafter, JustDial’s mobile “Terms and Conditions” were updated to include the following line: “You hereby give your express consent to Justdial to access your contact list and/or address book for mobile phone numbers in order to provide and use the Service.”[40]

In 2012, US-based social networking app Path was caught as it “secretly copied all its users’ iPhone address books to its private servers.”[41] Action was swift. The FTC investigated and reached a settlement with Path, which required “Path, Inc. to establish a comprehensive privacy program and to obtain independent privacy assessments every other year for the next 20 years. The company also will pay $800,000 to settle charges that it illegally collected personal information from children without their parents’ consent.”[42] In the US, a person’s address book “is protected under the First Amendment[43].” When the controversy erupted, it was also reported that “A person’s contacts are so sensitive that Alec Ross, a senior adviser on innovation to Secretary of State Hillary Rodham Clinton, said the State Department was supporting the development of an application that would act as a “panic button” on a smartphone, enabling people to erase all contacts with one click if they are arrested during a protest[44].” Of course, politics is not without its de-rigueur dose of irony. That dose was delivered in 2015, when it emerged that Hillary Clinton had maintained a private email account even as she was Secretary of State in the Barack Obama presidency, and refused to turn over those emails[45].

So what happened to Just Dial for allegedly breaching its users’ privacy? Nothing. No investigation. No fine. No settlement. No admission. No mea-culpa. In short, nothing. It was business as usual.
Apps can be incredibly liberating in eliminating friction from the buying process. But hitching your strategy to an app-only world is needless. It is an expensive choice – from many, many perspectives, not just the monetary one. The biggest cost is that of looking immature should you have to reverse direction. As a case in point, one can point to the entirely avoidable brouhaha over Flipkart, Airtel, and Net Neutrality[46]. In this battle, no one came out smelling of roses, least of all Flipkart, which attracted mostly negative attention[47] from the ill-advised step, notwithstanding after-the-fact attempts to bolt the stable door[48].

Let me end with an analogy. The trackpad on your laptop is very, very useful. Do you then disable the use of an externally connected mouse?

Disclaimer: views expressed are personal.

[1] "Computerworld - Google Books",
[2] "SOA: Hype vs. Reality - Datamation",
[3] "How Valuable Are Your Customers? - HBR",
[4] "Loyalty programmes: Are points that consumers stockpile juicy enough to keep them coming back? - timesofindia-economictimes",
[5] "What Loyalty? High-End Customers are First to Flee — HBS Working Knowledge",
[6] "Amazon's Price Check App Undercuts Brick-and-Mortar Stores Prices |",
[7] " Help: About the Amazon Price Check App",
[8] "Amazon pushing Price Check app with controversial online discounts | The Verge",
[9] "Retail association pissed about's Price Check app - GeekWire",
[10] "Amazon Price Check May Be Evil But It's the Future - Forbes",
[11] "Samsung smart TV issues personal privacy warning - BBC News",
[12] "Little Red Riding Hood - Wikipedia, the free encyclopedia",
[22] "Web Storage",
[23] "Offline Web Applications",
[24] "Why LinkedIn dumped HTML5 & went native for its mobile apps | VentureBeat | Dev | by J. O'Dell",
[25] "Mark Zuckerberg: Our Biggest Mistake Was Betting Too Much On HTML5 | TechCrunch",
[26] "Did Target Really Predict a Teen’s Pregnancy? The Inside Story",
[27] "How Companies Learn Your Secrets -",
[28] "Predictive Analytics World Conference: Agenda - October, 2010",
[29] "Federal judge upholds verdict that North Bergen man molested woman on flight ‹ Cliffview Pilot",
[30] "Man accused of groping woman on flight to Newark - NY Daily News",
[31] "Man jailed for molesting girl, 12, on flight to Dubai | The National",
[32] "Virgin is Going to Turn Your Flight Into a Creepy Bar You Can't Leave",
[33] "KLM Introduces A New Way To Be Creepy On An Airplane - Business Insider",
[34] "Tinder Dating App Users Are Playing With Privacy Fire - Forbes",
[35] "Include Security Blog | As the ROT13 turns….: How I was able to track the location of any Tinder user.",
[36], accessed April 11, 2015
[37] "Updated: JustDial App Pulled From Google Play Store; Privacy Concerns? - MediaNama",
[38] "Updated: JustDial App Pulled From Google Play Store; Privacy Concerns? - MediaNama",
[39] "Bad App Reviews for Justdial JD",, accessed April 09, 2015
[40] "Terms Of Use”,, accessed April 09, 2015
[41] "The Path Fiasco Wasn't A Privacy Breach, It Was A Data Ownership Breach - The Cloud to Cloud Backup Blog",
[42] "Path Social Networking App Settles FTC Charges it Deceived Consumers and Improperly Collected Personal Information from Users' Mobile Address Books | Federal Trade Commission",
[43] "Anger for Path Social Network After Privacy Breach -",
[44] Ibid.
[45] "Hillary Clinton deleted 32,000 'private' emails, refuses to turn over server - Washington Times",
[46] "Flipkart Pulls Out of Airtel Deal Amid Backlash Over Net Neutrality",
[47] "Flipkart's stand on net neutrality - The Hindu",

[48] "Our Internet is headed in the right direction: Amod Malviya - Livemint",

© 2015, Abhinav Agarwal (अभिनव अग्रवाल). All rights reserved.

Another Take on Maker Faire 2015

Oracle AppsLab - Wed, 2015-05-20 09:05

Editor’s note: Here’s another Maker Faire 2015 post, this one from Raymond. Check out Mark’s (@mvilrokx) recap too for AppsLab completeness.

I went to the Maker Faire 2015 Bay Area show over the weekend. There was a lot of similarity to last year, but also a few new things.

In place of our spot from last year, there were HP Sprout demo stations. I guess HP is the main sponsor this year.


Sprout is an HP product that pairs a large touch mat and a projector, attached to an HP computer. It is a kind of combination of projector, extended screen, touch screen, and work surface – it blends physical things with virtual computer objects, for example by capturing physical objects as 3D graphics.

TechHive’s Mole-A-Whack is quite a good station too – it is a reversal of the classic Whack-A-Mole.


Here’s a video of it in action:

They use an Arduino-controlled mole to whack kids, who hide in the mole holes but need to raise their heads out of the hole covers (which are Arduino-monitored) and reach out to push a button (connected via MaKey MaKey) to earn points.

The signals go into a Scratch program on a computer to tally the winner.

This pipe organ is an impressive build:


As usual, lots of 3D printers, CNC mills, etc. and lots of drones flying.

Also, I saw many college groups attending the event this year, bringing in all kinds of small builds for various applications.

Troubleshooting ASM Proxy instance startup

Oracle in Action - Wed, 2015-05-20 08:53

RSS content

Recently, I had trouble starting the ASM proxy instance on one of the nodes in my 2-node flex cluster comprising nodes host01 and host02. As a result, I could not access the volume I had created on an ASM diskgroup. This post explains how I resolved it.

While connected to host01, I created a volume VOL1 on the DATA diskgroup, with the corresponding volume device /dev/asm/vol1-106.

[grid@host01 root]$ asmcmd volcreate -G DATA -s 300m VOL1

[grid@host01 root]$ asmcmd volinfo -G DATA VOL1

Diskgroup Name: DATA

Volume Name: VOL1
Volume Device: /dev/asm/vol1-106
Size (MB): 320
Resize Unit (MB): 32
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS

I created an ACFS file system on the newly created volume:

[root@host01 ~]# mkfs -t acfs /dev/asm/vol1-106

I also created corresponding mount point /mnt/acfsmounts/acfs1 on both the nodes in the cluster.

[root@host01 ~]# mkdir -p /mnt/acfsmounts/acfs1

[root@host02 ~]# mkdir -p /mnt/acfsmounts/acfs1

When I tried to mount the volume device, I could mount it on host01 but not on host02.

[root@host01 ~]#mount -t acfs /dev/asm/vol1-106 /mnt/acfsmounts/acfs1

[root@host01 ~]# mount | grep vol1

/dev/asm/vol1-106 on /mnt/acfsmounts/acfs1 type acfs (rw)

[root@host02 ~]# mount -t acfs /dev/asm/vol1-106 /mnt/acfsmounts/acfs1

mount.acfs: CLSU-00100: Operating System function: open64 failed with error data: 2
mount.acfs: CLSU-00101: Operating System error message: No such file or directory
mount.acfs: CLSU-00103: error location: OOF_1
mount.acfs: CLSU-00104: additional error information: open64 (/dev/asm/vol1-106)
mount.acfs: ACFS-02017: Failed to open volume /dev/asm/vol1-106. Verify the volume exists.

The corresponding volume device was visible on host01 but not on host02

[root@host01 ~]# cd /dev/asm
[root@host01 asm]# ls

[root@host02 ~]# cd /dev/asm
[root@host02 asm]# ls

Since ADVM / ACFS utilizes an ASM proxy instance in a flex cluster to access metadata from a local or remote ASM instance, I checked whether the ASM proxy instance was running on both nodes. I realized that whereas the ASM proxy instance was running on host01, it was not running on host02:

[root@host01 ~]# ps -elf | grep pmon | grep APX

0 S grid 27782 1 0 78 0 – 350502 – 10:09 ? 00:00:00 apx_pmon_+APX1

[root@host02 asm]# ps -elf | grep pmon | grep APX

[root@host01 ~]# srvctl status asm -proxy

ADVM proxy is running on node host01

[root@host01 ~]# crsctl stat res ora.proxy_advm -t
Name Target State Server State details
Local Resources

I tried to start the ASM proxy instance manually on host02:

[grid@host02 ~]$ . oraenv
ORACLE_SID = [grid] ? +APX2
The Oracle base has been set to /u01/app/grid

[grid@host02 ~]$ sqlplus / as sysasm

SQL*Plus: Release Production on Sat May 2 10:31:45 2015

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup

ORA-00099: warning: no parameter file specified for ASMPROXY instance
ORA-00443: background process "VUBG" did not start

SQL> ho oerr ORA 00443

00443, 00000, "background process \"%s\" did not start"
// *Cause: The specified process did not start.
// *Action: Ensure that the executable image is in the correct place with
// the correct protections, and that there is enough memory.

I checked the memory allocated to the VM for host02 – it was 1.5 GB, as against the 2.5 GB assigned to the VM for host01. I increased the memory of host02 to 2.5 GB, and the ASM proxy instance started automatically.

[root@host01 ~]# crsctl stat res ora.proxy_advm -t
Name Target State Server State details
Local Resources

Hope it helps!


Oracle documentation


Related Links :


12c RAC Index

12c RAC: ORA-15477: cannot communicate with the volume driver


Copyright © ORACLE IN ACTION [Troubleshooting ASM Proxy instance startup], All Right Reserved. 2015.


Categories: DBA Blogs

Irrecoverable full backup part II : reporting

Laurent Schneider - Wed, 2015-05-20 08:34

After my post Can you restore from a full online backup ?, I needed to come up with a report.

Assuming that each backup goes in a different directory, I just wrote two reports.

  1. Report gaps in v$backup_redolog (or rc_backup_redolog if you use the catalog)
    DIR     FIRST_CHANGE# NEXT_CHANGE#
    ------- ------------- ------------
    /bck01/        284891       285140
    /bck01/        285140       285178
    /bck02/        284891       285140
    === GAP ===
    /bck02/        285178       285245 
    /bck03/        285178       285245
    /bck03/        285245       286931
    /bck03/        286931       287803
    /bck03/        287803       288148

    This could be done with analytics, by checking where the previous next_change# does not match the current first_change#, within a directory:

    SELECT dir,
      prev_next_change# missing_from_change#,
      first_change# missing_to_change#
    FROM (
      SELECT REGEXP_REPLACE (handle, '[^/\]+$') dir,
        first_change#,
        LAG(next_change#) OVER (
          PARTITION BY REGEXP_REPLACE (handle, '[^/\]+$')
          ORDER BY first_change#
        ) prev_next_change#
      FROM v$backup_piece p
      JOIN v$backup_redolog l 
        USING (set_stamp, set_count))
    WHERE prev_next_change# != first_change#;
    DIR     MISSING_FROM_CHANGE# MISSING_TO_CHANGE#
    ------- -------------------- ------------------
    /bck02/               285140             285178
  2. Report directories where the archivelogs don’t cover the changes (backup redolog) from the earliest to the latest checkpoint (backup datafile)

    SELECT
      REGEXP_REPLACE (handle, '[^/\]+$') dir,
      MIN (checkpoint_change#),
      MAX (checkpoint_change#),
      MIN (first_change#),
      MAX (next_change#)
    FROM v$backup_piece p
      LEFT JOIN v$backup_datafile f 
        USING (set_stamp, set_count)
      LEFT JOIN v$backup_redolog l 
        USING (set_stamp, set_count)
    WHERE handle IS NOT NULL
    GROUP BY REGEXP_REPLACE (handle, '[^/\]+$')
    HAVING MIN (checkpoint_change#) < MIN (first_change#)
        OR MAX (checkpoint_change#) > MAX (next_change#);
    ------- ---------- ---------- ---------- ----------
    /bck04/     954292     954299     959487    1145473

    The archives for the changes from 954292 to 959487 are missing.
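Outside the database, both reports reduce to simple interval logic. Here is a hedged Python sketch of the two checks, using illustrative SCN values modelled on the sample output above (not from a real catalog):

```python
# Report 1 logic: within one directory, sort archivelog backup ranges by
# first_change# and flag any spot where the previous next_change# does not
# line up with the current first_change#.
def find_gaps(ranges_by_dir):
    gaps = []
    for d, ranges in ranges_by_dir.items():
        prev_next = None
        for first, nxt in sorted(ranges):
            if prev_next is not None and prev_next != first:
                gaps.append((d, prev_next, first))  # missing from/to change#
            prev_next = nxt
    return gaps

# Report 2 logic: the archivelogs in a directory must span from the earliest
# to the latest datafile checkpoint for the directory to be self-contained.
def covers_checkpoints(checkpoints, log_ranges):
    return (min(f for f, _ in log_ranges) <= min(checkpoints)
            and max(n for _, n in log_ranges) >= max(checkpoints))

backups = {
    "/bck01/": [(284891, 285140), (285140, 285178)],
    "/bck02/": [(284891, 285140), (285178, 285245)],
}
print(find_gaps(backups))  # -> [('/bck02/', 285140, 285178)]
print(covers_checkpoints([954292, 954299], [(959487, 1145473)]))  # -> False
```

The second call mirrors the /bck04/ row above: the earliest checkpoint (954292) precedes the first archived change (959487), so the directory cannot recover on its own.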

If some archive backups are missing in one directory, it does not mean the database is irrecoverable – the archive backups could be in another directory. But it does mean that that single directory alone would no longer permit you to restore or duplicate.

Another approach with RESTORE PREVIEW was provided by Franck in my previous post : List all RMAN backups that are needed to recover.

Usual disclaimer: there are plenty of other irrecoverability causes, from hardware defects to backup “optimization”, that are beyond the scope of this post.

Tabular Form - Add Rows Top - Universal Theme

Denes Kubicek - Wed, 2015-05-20 06:20
This old example shows how to add rows to the top of a tabular form. Unfortunately, this doesn't work with the new Universal Theme. In order to make it work, some small changes are required. See this example to learn how to do it using the new Universal Theme.


Categories: Development

I’m Iouri Chadour and this is how I work

Duncan Davies - Wed, 2015-05-20 06:00

May’s entry in the ‘How I Work’ series is PeopleSoft blogger Iouri “Yury” Chadour. Yury has been sharing his knowledge on his Working Scripts blog for 7 years, so he is a valuable and consistent member of our community. Yury’s site is full of tips, particularly new tools to try and techniques ‘around the edges’ of PeopleSoft. Thanks, and keep up the good work Yury!


Name: Iouri Chadour

Occupation: Vice President at Lazard Freres
Location: In the office in midtown NYC
Current computer: At work I use either a standard Lenovo laptop or my VM client; my own machine is a Lenovo X1 Carbon
Current mobile devices: Samsung Galaxy S3, iPad Air 2, Kindle Fire (Original)
I work: best when I have a set goal in mind – I like being able to check off my achievements from the list (more on that below.) As many other fellow bloggers have mentioned, challenge and the ability to learn new things on the job are very important as well.

What apps/software/tools can’t you live without?
I use all of these Software Development Tools:

Application Designer
Notepad++ with lots of plugins – PeopleCode User Defined Language, Compare, AutoSave, NppExport, and Explorer, to name a few
Firefox with Firebug, AdBlock and Hootsuite
Feedly – this my main tool for following all the blogs and keeping up to date on the news
LastPass – very convenient password management for desktop and phone
KeePass – open source password manager
Toad for Oracle 12
Oracle jDeveloper
Aptana Studio
PeopleSoft TraceMagic
Wunderlist – Android app and desktop, for task management
Microsoft Project or Project Libre
MS Excel
Greenshot Screen Capture
Gimp – basic image editing

Besides your phone and computer, what gadget can’t you live without?
I like my original Kindle Fire – I use it for reading more than any other device.

What’s your workspace like?

What do you listen to while you work?
Listening really depends on my mood at the time of day. I mostly use Slacker Radio to listen to everything from contemporary and classic jazz, to classical, to Parisian electro and house music.

What PeopleSoft-related productivity apps do you use?

App Designer
PeopleSoft Query Client for writing queries
Toad 12
Notepad++ to write and examine code and logs
TraceMagic for more advanced log review
Firefox with Firebug for HTML and JavaScript issues
On occasion Aptana Studio for JavaScript and HTML

Do you have a 2-line tip that some others might not know?
If I am stuck on a very difficult problem and can’t seem to find a good solution, I usually leave it and do something else – at some point the solution, or the correct direction, usually comes to my mind on its own.

What SQL/Code do you find yourself writing most often?
I work with a lot of Financials modules, so mostly everything related to those modules. I also write some tools-related SQL when I need to examine Process Scheduler tables.

What would be the one item you’d add to PeopleSoft if you could?
Code completion and Code/Project navigator – I use Notepad++ for now.

What everyday thing are you better at than anyone else?
I do not think I do anything in particular better than anyone else, but I believe that I can be more efficient at some things than some people.

What’s the best advice you’ve ever received?
My family and my friends provided me with a lot of advice and support and I am greatly thankful for them being present in my life. But I do like the following quote:
“The more that you read, the more things you will know. The more that you learn, the more places you’ll go.” – Dr. Seuss

MemSQL 4.0

DBMS2 - Wed, 2015-05-20 03:41

I talked with my clients at MemSQL about the release of MemSQL 4.0. Let’s start with the reminders:

  • MemSQL started out as an in-memory OLTP (OnLine Transaction Processing) DBMS …
  • … but quickly positioned with “We also do ‘real-time’ analytic processing” …
  • … and backed that up by adding a flash-based column store option …
  • … before Gartner ever got around to popularizing the term HTAP (Hybrid Transaction and Analytic Processing).
  • There’s also a JSON option.

The main new aspects of MemSQL 4.0 are:

  • Geospatial indexing. This is for me the most interesting part.
  • A new optimizer and, I suppose, query planner …
  • … which in particular allow for serious distributed joins.
  • Some rather parallel-sounding connectors to Spark, Hadoop, and Amazon S3.
  • Usual-suspect stuff including:
    • More SQL coverage (I forgot to ask for details).
    • Some added or enhanced administrative/tuning/whatever tools (again, I forgot to ask for details).
    • Surely some general Bottleneck Whack-A-Mole.

There’s also a new free MemSQL “Community Edition”. MemSQL hopes you’ll experiment with this but not use it in production. And MemSQL pricing is now wholly based on RAM usage, so the column store is quasi-free from a licensing standpoint as well.

Before MemSQL 4.0, distributed joins were restricted to the easy cases:

  • Two tables are distributed (i.e. sharded) on the same key.
  • One table is small enough to be broadcast to each node.

Now arbitrary tables can be joined, with data reshuffling as needed. Notes on MemSQL 4.0 joins include:

  • Join algorithms are currently nested-loop and hash, and in “narrow cases” also merge.
  • MemSQL fondly believes that its in-memory indexes work very well for nested-loop joins.
  • The new optimizer is fully cost-based (but I didn’t get much clarity as to the cost estimators for JSON).
  • MemSQL’s indexing scheme, skip lists, had histograms anyway, with the cutesy name skiplistogram.
  • MemSQL’s queries have always been compiled, and of course have to be planned before compilation. However, there’s a little bit of plan flexibility built in based on the specific values queried for, aka “parameter-sensitive plans” or “run-time plan choosing”.
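To make the reshuffling idea concrete, here is a toy Python sketch of reshuffle-then-local-hash-join – my illustration of the general technique, not MemSQL's actual implementation (table names and partition count are made up):

```python
# Toy distributed hash join: both tables are re-partitioned ("shuffled") on
# the join key so matching rows land on the same node, then each node runs a
# local hash join. Illustrates the general technique only.
def shuffle(rows, key_idx, n_nodes):
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key_idx]) % n_nodes].append(row)
    return parts

def local_hash_join(left, right, lkey, rkey):
    table = {}
    for row in left:                      # build side
        table.setdefault(row[lkey], []).append(row)
    return [l + r for r in right for l in table.get(r[rkey], [])]  # probe side

def distributed_join(left, right, lkey, rkey, n_nodes=4):
    lparts = shuffle(left, lkey, n_nodes)
    rparts = shuffle(right, rkey, n_nodes)
    out = []
    for lp, rp in zip(lparts, rparts):    # each pair runs on one "node"
        out.extend(local_hash_join(lp, rp, lkey, rkey))
    return out

orders = [(1, "cpu"), (2, "ram"), (3, "cpu")]   # (order_id, item)
items  = [("cpu", 99), ("ram", 49)]             # (item, price)
print(sorted(distributed_join(orders, items, 1, 0)))
# -> [(1, 'cpu', 'cpu', 99), (2, 'ram', 'ram', 49), (3, 'cpu', 'cpu', 99)]
```

Because both sides are hashed with the same function on the join key, every matching pair is guaranteed to meet in the same partition, which is the whole point of the reshuffle step.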

To understand the Spark/MemSQL connector, recall that MemSQL has “leaf” nodes, which store data, and “aggregator” nodes, which combine query results and ship them back to the requesting client. The Spark/MemSQL connector manages to skip the aggregation step, instead shipping data directly from the various MemSQL leaf nodes to a Spark cluster. In the other direction, a Spark RDD can be saved into MemSQL as a table. This is also somehow parallel, and can be configured either as a batch update or as an append; intermediate “conflict resolution” policies are possible as well.

In other connectivity notes:

  • MemSQL’s idea of a lambda architecture involves a Kafka stream, with data likely being stored twice (in Hadoop and MemSQL).
  • MemSQL likes and supports the Spark DataFrame API, and says financial trading firms are already using it.

Other application areas cited for streaming/lambda kinds of architectures are — you guessed it! — ad-tech and “anomaly detection”.

And now to the geospatial stuff. I thought I heard:

  • A “point” is actually a square region less than 1 mm per side.
  • There are on the order of 2^30 such points on the surface of the Earth.

Given that Earth’s surface area is a little over 500,000,000 square meters, I’d think 2^50 would be a better figure, but fortunately that discrepancy doesn’t matter to the rest of the discussion. (Edit: As per a comment below, that’s actually square kilometers, so unless I made further errors we’re up to the 2^70 range.)
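A quick back-of-the-envelope check of the corrected figure (using Earth's surface area of roughly 510 million square kilometers):

```python
import math

# Earth's surface area is about 510 million square kilometers.
area_km2 = 5.1e8
mm2_per_km2 = (10 ** 6) ** 2        # 1 km = 10^6 mm, so 1 km^2 = 10^12 mm^2
squares = area_km2 * mm2_per_km2    # number of 1 mm x 1 mm cells
print(math.log2(squares))           # ~68.8, i.e. on the order of 2^69
```

So the count of millimeter-scale "points" is indeed in the 2^70 range, as the edit note says.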

Anyhow, if the two popular alternatives for geospatial indexing are R-trees or space-filling curves, MemSQL favors the latter. (One issue MemSQL sees with R-trees is concurrency.) Notes on space-filling curves start:

  • In this context, a space-filling curve is a sequential numbering of points in a higher-dimensional space. (In MemSQL’s case, the dimension is two.)
  • Hilbert curves seem to be in vogue, including at MemSQL.
  • Nice properties of Hilbert space-filling curves include:
    • Numbers near each other always correspond to points near each other.
    • The converse is almost always true as well.*
    • If you take a sequence of numbers that is simply the set of all possibilities with a particular prefix string, that will correspond to a square region. (The shorter the prefix, the larger the square.)

*You could say it’s true except in edge cases … but then you’d deserve to be punished.
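The locality property can be checked with the textbook Hilbert-curve mapping (the standard public-domain algorithm, not MemSQL's code):

```python
# Textbook Hilbert-curve mapping on an n x n grid (n a power of two).
# d2xy converts a distance along the curve into (x, y) coordinates.
def d2xy(n, d):
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Consecutive curve positions are always adjacent grid cells:
n = 16
pts = [d2xy(n, d) for d in range(n * n)]
adjacent = all(abs(x1 - x0) + abs(y1 - y0) == 1
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
print(adjacent)  # -> True
```

This verifies the first "nice property" above: numbers near each other on the curve always correspond to points near each other on the grid (the converse, as noted, holds only almost always).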

Given all that, my understanding of the way MemSQL indexes geospatial stuff — specifically points and polygons — is:

  • Points have numbers assigned to them by the space-filling curve; those are indexed in MemSQL’s usual way. (Skip lists.)
  • A polygon is represented by its vertices. Take the longest prefix they share. That could be used to index them (you’d retrieve a square region that includes the polygon). But actually …
  • … a polygon is covered by a union of such special square regions, and indexed accordingly, and I neglected to ask exactly how the covering set of squares was chosen.

As for company metrics — MemSQL cites >50 customers and >60 employees.

Related links

Categories: Other

Indexing and Transparent Data Encryption Part II (Hide Away)

Richard Foote - Wed, 2015-05-20 02:03
In Part I, I quickly ran through how to setup an encrypted tablespace using Transparent Data Encryption and to take care creating indexes outside of these tablespaces. Another method of encrypting data in the Oracle database is to just encrypt selected columns. Although the advantage here is that we can just encrypt sensitive columns of interest (and that the […]
Categories: DBA Blogs

Row Store vs Column Store in SAP HANA

Yann Neuhaus - Wed, 2015-05-20 00:00

The SAP HANA database allows you to create your tables in Row Store or Column Store mode. In this blog, I will demonstrate that each method has its advantages and disadvantages and should be used for specific cases.

Using two kinds of tests, I will show you that the Row Store mode should be used for simple SELECT SQL queries without aggregation, and the Column Store mode for complex SELECT queries containing aggregations.

If you want more information regarding the Column Store or in-memory technologies, don't hesitate to attend the next dbi services event:

Test 1: Simple SELECT query

Goal of the test

This test will show you the performance difference between using a Row Store and a Column Store table in a simple SQL query.

Description of the test

A SELECT query will be sent to the database and we will check the server response time.

SQL Query

Using a Row Store table

The SQL is the following:


Using a Column Store table

The SQL is the following:


Tables

Row Store Table

You can find here information regarding the Row Store table used in the test.

Name:                 SALES_ROW

Table type:          Row Store

Row count:         10 309 873

Index:                1

Partition:            0 (SAP HANA does not allow partitioning of Row Store tables)




Column Store Table

You can find here information regarding the Column Store table used in the test.

Name:                  SALES_COLUMN

Table type:           Column Store

Row count:          10 309 873

Index:                 0 (SAP HANA automatically applies an index if needed)

Partition:             1 RANGE partition on CUST_ID


Result of the test

Using the Row Store table


Using the Column Store table


Test 2: Complex SELECT query

Goal of the test

This test will show you the performance difference between using a Row Store and a Column Store table in a complex SQL query.

Description of the test

A SELECT query will be sent to the database and we will check the server response time.

SQL Query

Using a Row Store table

The SQL is the following:


Using a Column Store table

The SQL is the following:


Tables

Row Store Fact Table

You can find here information regarding the Row Store table used in the test.

Name:                  SALES_ROW

Table type:          Row Store

Row count:         10 309 873

Index:                   2

Partition:             0 (SAP HANA does not allow partitioning of Row Store tables)

Column Store Fact Table

You can find here information regarding the Column Store table used in the test.

Name:                  SALES_COLUMN

Table type:          Column Store

Row count:         10 309 873

Index:                   0 (SAP HANA automatically applies an index if needed)

Partition:             1 RANGE partition on CUST_ID

Result of the test

Using the Row Store tables


Using the Column Store tables



Row and Column Store modes in SAP HANA should be used in two different contexts:

 - Tables in Row Store mode should be used in SELECT queries without any aggregation functions

 - Tables in Column Store mode are powerful when used in analytical queries or views with aggregation functions (GROUP BY, …)

The performance can be highly optimized if the tables selected in the queries have the right store mode.
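The conclusion above can be illustrated with a toy sketch of the two layouts. This is a conceptual illustration in Python with made-up sample data, not how HANA stores data internally: summing one column in a row store means walking every row, while a column store reads a single contiguous array.

```python
# The same small SALES-like table held row-wise and column-wise
# (hypothetical sample data, not the 10-million-row test tables).
rows = [
    {"cust_id": 1, "amount": 10.0, "region": "EU"},
    {"cust_id": 2, "amount": 25.5, "region": "US"},
    {"cust_id": 3, "amount": 7.25, "region": "EU"},
]

columns = {
    "cust_id": [1, 2, 3],
    "amount": [10.0, 25.5, 7.25],
    "region": ["EU", "US", "EU"],
}

# SELECT SUM(amount): the row layout touches every field of every row,
# while the column layout scans just the 'amount' array sequentially.
row_store_sum = sum(r["amount"] for r in rows)
column_store_sum = sum(columns["amount"])
assert row_store_sum == column_store_sum == 42.75
```

A simple single-row lookup, by contrast, favors the row layout: all fields of one row sit together, whereas the column store must probe one array per selected column.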




2 minute Tech Tip: Working with JSON in APEX

Dimitri Gielis - Tue, 2015-05-19 16:30
On Monday Bob Rhubart did a video call with me in his series of 2MTT (2 Minute Tech Tip) on YouTube. You can find my 2MTT here.

I talked about using JSON in APEX and gave two examples where we use it.
In previous blog posts I gave more details on those techniques. Here's a quick overview:
Categories: Development

Using HBase and Impala to Add Update and Delete Capability to Hive DW Tables, and Improve Query Response Times

Rittman Mead Consulting - Tue, 2015-05-19 16:21

One of our customers is looking to offload part of their data warehouse platform to Hadoop, extracting data out of a source system and loading it into Apache Hive tables for subsequent querying using OBIEE11g. One of the challenges that the project faces though is how to handle updates to dimensions (and in their case, fact table records) when HDFS and Hive are typically append-only filesystems; ideally writes to fact tables should only require INSERTs and filesystem appends but in this case they wanted to use an accumulating fact snapshot table, whilst the dimension tables all used SCD1-type attributes that had their values overwritten when updates to those values came through from the source system.

The obvious answer then was to use Apache HBase as part of the design, a NoSQL database that sits over HDFS but allows updates and deletes to individual rows of data rather than restricting you just to append/inserts. I covered HBase briefly on the blog a few months ago when we used it to store webserver log entries brought into Hadoop via Flume, but in this case it makes an ideal landing point for data coming in from the source system, as we can maintain a current-state record of that data, updating and overwriting values if we need to. What was also interesting to me though was how well we could integrate this HBase data into our mainly SQL-style data processing; how much Java I’d have to use to work with HBase, and whether we could get OBIEE to connect to the HBase tables and query them directly (with a reasonable response time). In particular, could we use the Hive-on-HBase feature to create Hive tables over the HBase ones, and then query those efficiently using OBIEE, so that the data flow looked like this?


To test this idea out, I took the Flight Delays dataset from the OBIEE11g SampleApp & Exalytics demo data [PDF] and created four HBase tables to hold the data from them, using the BigDataLite 4.1 VM and the HBase Shell. This dataset has four tables:

  • FLIGHT_DELAYS – around 220m US flight records listing the origin airport, destination airport, carrier, year and a bunch of metrics (flights, late minutes, distance etc)
  • GEOG_ORIGIN – a list of all the airports in the US along with their city, state, name and so on
  • GEOG_DEST – a copy of the GEOG_ORIGIN table, used for filtering and aggregating on both origin and destination 
  • CARRIERS – a list of all the airlines associated with flights in the FLIGHT_DELAYS table

HBase is a NoSQL, key/value-store database where individual rows have a key, and then one or more column families made up of one or more columns. When you define a HBase table you only define the column families, and the data load itself creates the columns within them in a similar way to how the Endeca Server holds “jagged” data – individual rows might have different columns to each other and like MongoDB you can define a new column just by loading it into the database.
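That storage model — a row key mapping to column families, columns created on write, and each cell keeping timestamped versions — can be sketched as a nested map. The class below is a hypothetical toy model for illustration only, not the HBase client API:

```python
class ToyHBaseTable:
    """Toy model of HBase's storage shape:
    row key -> 'family:column' qualifier -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed at table creation
        self.rows = {}

    def put(self, row_key, qualifier, value, ts):
        family = qualifier.split(":")[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        # columns are created on write, so different rows can hold
        # different columns ("jagged" data)
        self.rows.setdefault(row_key, {}).setdefault(qualifier, {})[ts] = value

    def get(self, row_key, qualifier):
        # like HBase, return the most recent version of the cell by default
        cells = self.rows.get(row_key, {}).get(qualifier, {})
        return cells[max(cells)] if cells else None

t = ToyHBaseTable(["dest"])
t.put("LAX", "dest:city", "Los Angeles, CA", ts=1)
t.put("LAX", "dest:city", "LA", ts=2)  # a newer version shadows the older one
```

A `get` on that row now returns the ts=2 value, mirroring how the HBase Shell's GET shows the current version of a cell while older versions remain stored underneath.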

Using the HBase Shell CLI on the BigDataLite VM I therefore create the HBase tables using just these high-level column family definitions, with the individual columns within the column families to be defined later when I load data into them.

hbase shell
create 'carriers','details'
create 'geog_origin','origin'
create 'geog_dest','dest'
create 'flight_delays','dims','measures'

To get data into HBase tables there’s a variety of methods you can use. Most probably for the full project we’ll write a Java application that uses the HBase client to read, write, update and delete rows that are read in from the source application (see this previous blog post for an example where we use Flume as the source), or to set up some example data we can use the HBase Shell and enter the HBase row/cell values directly, like this for the geog_dest table:

put 'geog_dest','LAX','dest:airport_name','Los Angeles, CA: Los Angeles'
put 'geog_dest','LAX','dest:city','Los Angeles, CA'
put 'geog_dest','LAX','dest:state','California'
put 'geog_dest','LAX','dest:id','12892'

and you can then use the “scan” command from the HBase shell to see those values stored in HBase’s key/value store, keyed on LAX as the key.

hbase(main):015:0> scan 'geog_dest'
ROW                                    COLUMN+CELL                                                                                                     
 LAX                                   column=dest:airport_name, timestamp=1432067861347, value=Los Angeles, CA: Los Angeles                           
 LAX                                   column=dest:city, timestamp=1432067861375, value=Los Angeles, CA                                                
 LAX                                   column=dest:id, timestamp=1432067862018, value=12892                                                            
 LAX                                   column=dest:state, timestamp=1432067861404, value=California                                                    
1 row(s) in 0.0240 seconds

For testing purposes though we need a large volume of rows and entering them all in by-hand isn’t practical, so this is where we start to use the Hive integration that now comes with HBase. For the BigDataLite 4.1 VM all you need to do to get this working is install the hive-hbase package using yum (after first installing the Cloudera CDH5 repo into /etc/yum.repos.d), load the relevant JAR files when starting your Hive shell session, and then create a Hive table over the HBase table mapping Hive columns to the relevant HBase ones, like this:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler.jar;
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar;
ADD JAR /usr/lib/hive/lib/hbase-client.jar;
ADD JAR /usr/lib/hive/lib/hbase-common.jar;
ADD JAR /usr/lib/hive/lib/hbase-hadoop-compat.jar;
ADD JAR /usr/lib/hive/lib/hbase-hadoop2-compat.jar;
ADD JAR /usr/lib/hive/lib/hbase-protocol.jar;
ADD JAR /usr/lib/hive/lib/hbase-server.jar;
ADD JAR /usr/lib/hive/lib/htrace-core.jar;
CREATE EXTERNAL TABLE hbase_carriers
 (key string,
  carrier_desc string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,details:carrier_desc")
TBLPROPERTIES ("hbase.table.name" = "carriers");
CREATE EXTERNAL TABLE hbase_geog_origin
 (key string,
  origin_airport_name string,
  origin_city string,
  origin_state string,
  origin_id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,origin:airport_name,origin:city,origin:state,origin:id")
TBLPROPERTIES ("hbase.table.name" = "geog_origin");
CREATE EXTERNAL TABLE hbase_geog_dest
 (key string,
  dest_airport_name string,
  dest_city string,
  dest_state string,
  dest_id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,dest:airport_name,dest:city,dest:state,dest:id")
TBLPROPERTIES ("hbase.table.name" = "geog_dest");
CREATE EXTERNAL TABLE hbase_flight_delays
 (key string,
  year string,
  carrier string,
  orig string,
  dest string,
  flights tinyint,
  late   tinyint,
  cancelled bigint,
  distance smallint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,dims:year,dims:carrier,dims:orig,dims:dest,measures:flights,measures:late,measures:cancelled,measures:distance")
TBLPROPERTIES ("hbase.table.name" = "flight_delays");

Bulk loading data into these Hive-on-HBase tables is then just a matter of loading the source data into a regular Hive table, and then running INSERT INTO TABLE … SELECT commands to copy the regular Hive rows into the HBase tables via their Hive metadata overlays:

insert into table hbase_carriers                           
select carrier, carrier_desc from carriers;
insert into table hbase_geog_origin
select * from geog_origin;
insert into table hbase_geog_dest
select * from geog_dest;
insert into table hbase_flight_delays
select row_number() over (), * from flight_delays;

Note that I had to create a synthetic sequence number key for the fact table, as the source data for that table doesn’t have a unique key for each row – something fairly common for data warehouse fact table datasets. In fact, storing fact table data in an HBase table is not a very good idea for a number of reasons that we’ll see in a moment, and bear in mind that HBase is designed for sparse datasets and low-latency inserts and row retrievals, so don’t read too much into this approach yet.

So going back to the original reason for using HBase to store these tables, updating rows within them is pretty straightforward. Taking the geog_origin HBase table at the start, if we get the row for SFO at the start using a Hive query over the HBase table, it looks like this:

hive> select * from hbase_geog_origin where key = 'SFO'; 
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
SFO   San Francisco, CA: San Francisco   San Francisco, CA   California   14771
Time taken: 29.126 seconds, Fetched: 1 row(s)

To update that row and others, I can load a new data file into the Hive table using HiveQL’s LOAD DATA command, or INSERT INTO TABLE … SELECT from another Hive table containing the updates, like this:

insert into table hbase_geog_origin    
select * from origin_updates;

To check that the value has in fact updated I can either run the same SELECT query against the Hive table over the HBase one, or drop into the HBase shell and check it there:

hbase(main):001:0> get 'geog_origin','SFO'
COLUMN                                 CELL                                                                                                           
 origin:airport_name                   timestamp=1432050681685, value=San Francisco, CA: San Francisco International                                  
 origin:city                           timestamp=1432050681685, value=San Francisco, CA                                                               
 origin:id                             timestamp=1432050681685, value=14771                                                                           
 origin:state                          timestamp=1432050681685, value=California                                                                      
4 row(s) in 0.2740 seconds

In this case the update file/Hive table changed the SFO airport name from “San Francisco” to “San Francisco International”. I can change it back again using the HBase Shell like this, if I want:

put 'geog_origin','SFO','origin:airport_name','San Francisco, CA: San Francisco'

and then checking it again using the HBase Shell’s GET command on that key value shows it’s back to the old value – HBase actually stores X number of versions of each cell with a timestamp for each version, but by default it shows you the current one:

hbase(main):003:0> get 'geog_origin','SFO'
COLUMN                                 CELL                                                                                                           
 origin:airport_name                   timestamp=1432064747843, value=San Francisco, CA: San Francisco                                                
 origin:city                           timestamp=1432050681685, value=San Francisco, CA                                                               
 origin:id                             timestamp=1432050681685, value=14771                                                                           
 origin:state                          timestamp=1432050681685, value=California                                                                      
4 row(s) in 0.0130 seconds

So, so far so good. We’ve got a way of storing data in Hive-type tables on Hadoop and a way of updating and amending records within them by using HBase as the underlying storage, but what are these tables like to query? Hive-on-HBase tables with just a handful of HBase rows return data almost immediately, for example when I create a copy of the geog_dest HBase table and put just a single row entry into it, then query it using a Hive table over it:

hive> select * from hbase_geog_dest2;
LAX	Los Angeles, CA: Los Angeles	Los Angeles, CA	California	12892
Time taken: 0.257 seconds, Fetched: 1 row(s)

A regular Hive query would normally take 30 seconds or more to return even just that row; but when we move up to larger datasets such as the flight delays fact table itself, running a simple row count on the Hive table and then comparing that to the same query running against the Hive-on-HBase version shows a significant time penalty for the HBase version:

hive> select sum(cast(flights as bigint)) as flight_count from flight_delays;
Total jobs = 1
Launching Job 1 out of 1
Total MapReduce CPU Time Spent: 7 seconds 670 msec
Time taken: 37.327 seconds, Fetched: 1 row(s)

compared to the Hive-on-HBase version of the fact table:

hive> select sum(cast(flights as bigint)) as flight_count from hbase_flight_delays;
Total jobs = 1
Launching Job 1 out of 1
Total MapReduce CPU Time Spent: 1 minutes 19 seconds 240 msec
Time taken: 99.154 seconds, Fetched: 1 row(s)

And that’s to be expected; as I said earlier, HBase is aimed at low-latency single-row operations rather than full table scan, aggregation-type queries, so it’s not unexpected that HBase performs badly here, but the response time is even worse if I try and join the HBase-stored Hive fact table to one or more of the dimension tables also stored in HBase.

In our particular customer example though these HBase tables were only going to be loaded once-a-day, so what if we copy the current version of each HBase table row into a snapshot Hive table stored in regular HDFS storage, so that our data loading process looks like this:


and then OBIEE queries the snapshot of the Hive-on-HBase table joined to the dimension table still stored in HBase, so that the query side looks like this:


Let’s try it out by taking the original Hive table I used earlier on to load the hbase_flight_delays table, and joining that to one of the Hive-on-HBase dimension tables; I’ll start by establishing a baseline response time, joining that source Hive fact table to the source Hive dimension table (also used earlier to load the corresponding Hive-on-HBase table):

select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from flight_delays f 
join geog_origin o on f.orig = o.origin                                                             
and o.origin_state = 'California'                                                                       
group by o.origin_airport_name; 
17638	Arcata/Eureka, CA: Arcata
9146	Bakersfield, CA: Meadows Field
125433	Burbank, CA: Bob Hope
1653	Santa Maria, CA: Santa Maria Public/Capt. G. Allan Hancock Field
Time taken: 43.896 seconds, Fetched: 27 row(s)

So that’s just under 44 seconds to do the query entirely using regular Hive tables. So what if I swap-out the regular Hive dimension table for the Hive-on-HBase version, how does that affect the response time?

hive> select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from flight_delays f       
    > join hbase_geog_origin o on f.orig = o.key                                                        
    > and o.origin_state = 'California'                                                                 
    > group by o.origin_airport_name;
17638	Arcata/Eureka, CA: Arcata
9146	Bakersfield, CA: Meadows Field
125433	Burbank, CA: Bob Hope
1653	Santa Maria, CA: Santa Maria Public/Capt. G. Allan Hancock Field
Time taken: 51.757 seconds, Fetched: 27 row(s)

That’s interesting – even though we used the (updatable) Hive-on-HBase dimension table in the query, the response time only went up a few seconds to 51, compared to the 44 seconds when we used just regular Hive tables. Taking it one step further though, what if we used Cloudera Impala as our query engine and copied the Hive-on-HBase fact table into a Parquet-stored Impala table, so that our inward data flow looked like this:


By using the Impala MPP engine – running on Hadoop but directly reading the underlying data files, rather than going through MapReduce as Hive does – and in addition storing its data in column-store, query-orientated Parquet storage, we can take advantage of OBIEE’s new support for Impala and potentially bring the query response time down even further. Let’s go into the Impala Shell on the BigDataLite 4.1 VM, update Impala’s view of the Hive Metastore table data dictionary, and then create the corresponding Impala snapshot fact table using a CREATE TABLE … AS SELECT Impala SQL command:

[oracle@bigdatalite ~]$ impala-shell
[bigdatalite.localdomain:21000] > invalidate metadata;
[bigdatalite.localdomain:21000] > create table impala_flight_delays
                                > stored as parquet
                                > as select * from hbase_flight_delays;

Now let’s use the Impala Shell to join the Impala version of the flight delays table with data stored in Parquet files, to the Hive-on-HBase dimension table created earlier within our Hive environment:

[bigdatalite.localdomain:21000] > select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
                                > join hbase_geog_origin o on f.orig = o.key
                                > and o.origin_state = 'California'  
                                > group by o.origin_airport_name;
Query: select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
join hbase_geog_origin o on f.orig = o.key
and o.origin_state = 'California'
group by o.origin_airport_name
| flight_count | origin_airport_name                                              |
| 31907        | Fresno, CA: Fresno Yosemite International                        |
| 125433       | Burbank, CA: Bob Hope                                            |
| 1653         | Santa Maria, CA: Santa Maria Public/Capt. G. Allan Hancock Field |
Fetched 27 row(s) in 2.16s

Blimey – 2.16 seconds, compared to the best time of 44 seconds we got earlier when we just used regular Hive tables, let alone the join to the dimension table stored in HBase. Let’s crank it up a bit and join another dimension table in, filtering on both origin and destination values:

[bigdatalite.localdomain:21000] > select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
                                > join hbase_geog_origin o on f.orig = o.key
                                > join hbase_geog_dest d on f.dest = d.key
                                > and o.origin_state = 'California'  
                                > and d.dest_state = 'New York'
                                > group by o.origin_airport_name;
Query: select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
join hbase_geog_origin o on f.orig = o.key
join hbase_geog_dest d on f.dest = d.key
and o.origin_state = 'California'
and d.dest_state = 'New York'
group by o.origin_airport_name
| flight_count | origin_airport_name                                   |
| 947          | Sacramento, CA: Sacramento International              |
| 3880         | San Diego, CA: San Diego International                |
| 4030         | Burbank, CA: Bob Hope                                 |
| 41909        | San Francisco, CA: San Francisco International        |
| 3489         | Oakland, CA: Metropolitan Oakland International       |
| 937          | San Jose, CA: Norman Y. Mineta San Jose International |
| 41407        | Los Angeles, CA: Los Angeles International            |
| 794          | Ontario, CA: Ontario International                    |
| 4176         | Long Beach, CA: Long Beach Airport                    |
Fetched 9 row(s) in 1.48s

Even faster. So that’s what we’ll be going with as our initial approach for the data loading and querying: load data into HBase tables as planned at the start, taking advantage of HBase’s CRUD capabilities, but bulk-load and initially read the data using Hive tables over the HBase ones; then, before we make the data available for querying by OBIEE, copy the current state of the HBase fact table into a Parquet-stored Impala table, using Impala’s ability to work with Hive tables and metadata, and to join across both Impala and Hive tables, even when one of the Hive tables uses HBase as its underlying storage.

Categories: BI & Warehousing

List all RMAN backups that are needed to recover

Yann Neuhaus - Tue, 2015-05-19 09:49

This blog post is something I had in draft, and Laurent Schneider's blog post reminded me to publish it. With the right RMAN configuration you should not have to manage backup files yourself. The RMAN catalog knows them and RMAN should be able to access them. If you want to keep a backup for a long time, you just tell RMAN to keep it.
But sometimes, RMAN is not connected to your tape backup software, or the backups are not shared on all sites, and you have to restore or copy the set of files that is needed for a restore database or a duplicate database.

A customer was in that case, identifying the required files from their names, because they are all timestamped with the beginning of the backup job. It's our DMK default. In order to rely on that, 'backup database plus archivelog' was run. And in order to be sure to have all archived logs in those backup sets, any concurrent RMAN jobs were blocked during that database backup, because if a concurrent job were backing up archived logs, those backups would be timestamped differently.

RPO and availability

I don't like that. I don't want anything to be able to block the backup of archived logs.
They are critical for two reasons:

  • The Recovery Point Objective is not fulfilled if some archivelog backups are delayed
  • The frequency of archivelog backup is also defined to prevent a full FRA

But if we allow concurrent backups of archived logs, we need something else to identify the whole set of files that is needed to restore the database to that point in time. So my suggestion was to generate the list of those files after each database backup, and keep that list. When we need to restore that backup, we can send the list to the backup team and ask them to restore those files.

The script

Here is my script, I'll explain later:

echo "restore controlfile preview; restore database preview;" | rman target / | awk '
/Finished restore at /{timestamp=$4}
/Recovery must be done beyond SCN /{if ($7>scn) scn=$7 }
/^ *(Piece )?Name: / { sub(/^ *(Piece )?Name: /,"") ; files[$0]=1 }
END{ for (i in files) print i > "files-"timestamp"-SCN-"scn".txt" }
'

This script generates a file which lists the backup files needed to do a RESTORE/RECOVER UNTIL SCN 47682382860.

The content of the file is:

oracle@dbzhorap01:/home/oracle/ [DB01PP1] sort files-20150519019910-SCN-47682382860.txt
and lists the backup pieces for the incremental 0, incremental 1 and archivelog backups needed to recover to a consistent state that can be opened. The script lists only backup sets, so we are supposed to have backed up the latest archived logs (with 'backup database plus archivelog' for example).
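For environments where awk is not to hand, the same extraction the awk one-liner performs can be sketched in Python; the sample lines below are taken from the preview output shown in this post:

```python
import re

def parse_restore_preview(output):
    """Collect the backup piece names, the highest 'must recover beyond' SCN
    and the finish timestamp from RMAN 'restore ... preview' output
    (the same logic as the awk one-liner above)."""
    pieces, scn, timestamp = set(), 0, None
    for line in output.splitlines():
        if "Finished restore at " in line:
            timestamp = line.split()[-1]
        m = re.search(r"Recovery must be done beyond SCN (\d+)", line)
        if m:
            scn = max(scn, int(m.group(1)))
        m = re.match(r"\s*(?:Piece )?Name: (.+)", line)
        if m:
            pieces.add(m.group(1))
    return pieces, scn, timestamp

sample = """\
        Piece Name: /u00/app/oracle/admin/DB01PP/backup/20150516_023003_inc0_DB01PP_961537327_s168278_p1.bck
Recovery must be done beyond SCN 47682382860 to clear datafile fuzziness
Finished restore at 20150501390440
"""
pieces, scn, ts = parse_restore_preview(sample)
```

From those three values you can write out the same `files-<timestamp>-SCN-<scn>.txt` list that the awk version produces.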

You can add an 'until scn' clause, but my primary goal was to run it just after a database backup in order to know which files have to be restored to get that backup back (restore or duplicate).

Restore preview

The idea is to rely on RMAN to find the files that are needed to restore and recover rather than doing it ourselves from the recovery catalog. RMAN provides the PREVIEW restore for that:

RMAN> restore database preview
Starting restore at 20150501390436
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3
using channel ORA_DISK_4

List of Backup Sets

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
166388  Incr 0  10.53G     DISK        00:52:56     20150516031010
        BP Key: 166388   Status: AVAILABLE  Compressed: YES  Tag: WEEKLY
        Piece Name: /u00/app/oracle/admin/DB01PP/backup/20150516_023003_inc0_DB01PP_961537327_s168278_p1.bck
  List of Datafiles in backup set 166388
  File LV Type Ckp SCN    Ckp Time       Name
  ---- -- ---- ---------- -------------- ----
  1    0  Incr 47581173986 20150516023945 +U01/DB01Pp/datafile/system.329.835812499
  2    0  Incr 47581173986 20150516023945 +U01/DB01Pp/datafile/undotbs1.525.835803187
  10   0  Incr 47581173986 20150516023945 +U01/DB01Pp/datafile/cpy.676.835815153
  17   0  Incr 47581173986 20150516023945 +U01/DB01Pp/datafile/cpy.347.835815677
  23   0  Incr 47581173986 20150516023945 +U01/DB01Pp/datafile/cpy.277.835814327
  25   0  Incr 47581173986 20150516023945 +U01/DB01Pp/datafile/cpy.342.835811161
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
167586  Incr 1  216.09M    DISK        00:01:34     20150519012830
        BP Key: 167586   Status: AVAILABLE  Compressed: YES  Tag: DAYLY
        Piece Name: /u00/app/oracle/admin/DB01PP/backup/20150519_010013_inc1_DB01PP_961537327_s169479_p1.bck
  List of Datafiles in backup set 167586
  File LV Type Ckp SCN    Ckp Time       Name
  ---- -- ---- ---------- -------------- ----
  43   1  Incr 47681921440 20150519012700 +U01/DB01Pp/datafile/cpy_idx.346.835815097

List of Backup Sets

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
167594  105.34M    DISK        00:00:23     20150519015400
        BP Key: 167594   Status: AVAILABLE  Compressed: YES  Tag: DAYLY
        Piece Name: /u00/app/oracle/admin/DB01PP/backup/20150519_010013_arc_DB01PP_961537327_s169481_p1.bck

  List of Archived Logs in backup set 167594
  Thrd Seq     Low SCN    Low Time       Next SCN   Next Time
  ---- ------- ---------- -------------- ---------- ---------
  3    59406   47681333097 20150519010239 47682617820 20150519014652
  4    46800   47681333143 20150519010240 47682617836 20150519014652
  1    76382   47681333188 20150519010240 47682618254 20150519014655
  2    60967   47681333315 20150519010242 47682385651 20150519013711

Media recovery start SCN is 47681637369
Recovery must be done beyond SCN 47682382860 to clear datafile fuzziness
Finished restore at 20150501390440
You see the list of datafile backup sets and archivelog backup sets, and at the end you have information about SCNs. Let me explain what those SCNs are.

Recovery SCN

Because it is an online backup, the datafiles are fuzzy. We need to apply the redo generated during the backup.

The 'media recovery start SCN' is the beginning of the archived logs to be applied:

SQL> select scn_to_timestamp(47681637369) from dual;

19-MAY-15 AM

The 'recovery must be done beyond SCN' is the last redo that must be applied to have datafiles consistent:

SQL> select scn_to_timestamp(47682382860) from dual;

19-MAY-15 AM

In my example, the backup (incremental level 1 + archivelog) started at 01:00:00 and was completed at 01:35:00


And I have a file with the list of backups that are needed to restore or duplicate the database at that point in time. Why do I need that when RMAN is supposed to be able to retrieve them itself? Because sometimes we back up to disk and the disk is backed up to tape without RMAN knowing it. Of course RMAN can connect directly to the tape backup software, but that is not free. Or we want to duplicate to another site where backups are not shared, and we need to know which files to bring there. And as that sometimes requires a request to another team, it's better to have the list of all the files we need.

As usual, don't hesitate to comment if you see something to improve in my small script.

Maker Faire 2015

Oracle AppsLab - Tue, 2015-05-19 09:17

This weekend the 10th Annual Maker Faire Bay Area took place in my backyard and rather than fighting traffic for 2 days with the +130,000 attendees I decided, as I have for the last 9 years, to join them.

Unlike last year, Oracle had no presence at the Maker Faire itself, so I had plenty of time to walk around the grounds and attend sessions.  This post is an overview of what I saw and experienced in the 2 day madness that is called the Maker Faire.

For those of you who have never been to the Maker Faire, the easiest way to describe it is as a mix of Burning Man and a completely out-of-control hobbyist’s garage, where the hobbyist’s hobbies include, but are not limited to: everything tech related, everything food related, everything engineering related and everything art related, all wrapped up in a family-friendly atmosphere. My kids love the Maker Faire.

You can find the tech giants of the world next to the one-person startup, beer brewers next to crazy knitting contraptions, bus-sized, fire-breathing rhinos next to giant cardboard robots, etc.  And nobody takes themselves too seriously, e.g. Google was handing out Google Glasses to everybody … Google Safety Glasses that is :-)

Google Safety Goggles

My new Google Glasses :-)

The first thing I noticed was that the Faire expanded . . . again.  A huge tent housing the Make:Labs was erected on what was a parking lot last year. I didn’t actually get to spend any time in there, but it contained an exploratorium, startup stuff and a section for Young Makers.

Which brings me to the first trend I observed, makers are getting younger and younger and the faire is doubling down on these young folk.

Don’t get me wrong, the faire has always attracted young kids, and some of them were making stuff, but there seem to be more and more of them, the projects they bring are getting more and more impressive, and the faire’s expansions all seem to cater to these younger makers.

One of the sessions I attended was called “Meet Some Amazing Young Makers,” where a 14-year-old girl showed off a semi-autonomous robot that could map the inside of caves.  She was showing us the second iteration; she built the first version . . . when she was 8!  Another young man, 13, built a contraption that solves a Rubik’s cube in under 90 seconds.  It wasn’t just that they built these things, they gave solid presentations to a majority-adult audience talking about their builds and future plans.

Another trend that was hard to ignore is that the Internet of Things (IoT) is getting huge, and it’s definitely here to stay.  Not only were there many, many vendors promoting their brand of IoT hardware, but a whole ecosystem is developing around them.

From tools that let you visualize all the data collected by your “things” to remote configuration and customization.  This trend will not just Cross the Chasm, it’s going to rocket right past it.

I attended a panel discussion with Dominic Pajak (Director IoT Segments, ARM), Paul Rothman (Director of R&D at littleBits Electronics), Andrew Witte (CTO, Pebble), Alasdair Allan (scientist, tinkerer) and Pierre Roux (Atmel) about the current state of IoT and the challenges that lay ahead.

One of the interesting points raised during the discussions is that there currently is no such thing as the Internet of Things!  All these “things” have to be tethered to a phone or other internet-capable device (typically using BLE); they cannot connect to the internet directly.

Furthermore, they cannot communicate with each other directly.  So it’s not really an IoT but rather the regular “human internet” with regular computers/phones connecting to it, which in turn happen to have some sensors attached to them that use the internet as a communication vehicle, but that doesn’t really roll off the tongue that well.

There is no interoperability standard at the moment, so you can’t really have one device talk to a random other device.  This is one of the challenges the panel felt has to be solved in the short term.  This could happen with the adoption of IP in BLE or some other mechanism like Fog Computing.

Another challenge brought up was securing IoT devices, especially given that some of the devices could be broadcasting extremely personal information.  This will have to be solved at the manufacturing level as well as at the application level.

Finally, they also mentioned that lowering power consumption needs to be a top priority for these devices.  Even though they have already come a long way, there still is a lot of work to be done.  The ultimate goal would be self sufficient devices that need no external power at all but can harvest the energy they need from their environment.

One such example mentioned is a button/switch that when pressed, uses the energy you put in to press it to generate enough power to send a on/off signal to another device.

Massimo Banzi, co-founder of the Arduino Project, also gave a talk (as he does every year) about the State of Arduino.  It seems that a lot of that state is in legal limbo at the moment, as there are now seemingly 2 Arduino companies with different views of the future of the project.

As part of his vision, Massimo introduced a partnership with Adafruit to let them produce Arduinos in the USA.  Also, as a result of the legal issues with the Arduino brand name, he introduced a new “sister” brand called Genuino (get it? Genuine Arduino), which will allow them to keep producing, at least in the US.

Other announcements included the release of the Arduino Gemma, the smallest Arduino ever; the Modulino, an Arduino-like product designed and produced in their Bangalore, India, office; and a focus on online tools to manage and program Arduinos.

I also attended a few sessions that talked about the BeagleBone board.  I am interested in this board because it bridges the gap between the Raspberry Pi and the Arduino: on the one hand it runs a Linux OS, but on the other hand it also has real-time GPIO pins, making it interesting for IoT projects that require them.

It can also be easily programmed using JavaScript (it comes with a Node server built in), which is something I am currently working with. I’ll probably write another blog post about my findings with that board when I get some time to play with it (yes, I got one at the Maker Faire :-).

And finally, some other things you can find at the Maker Faire:

Game of Drones:

Fire and Art:


Robots that solve Rubik’s cubes:


Mark.

Writing tips

Amardeep Sidhu - Tue, 2015-05-19 03:14

Tim Hall has written some brilliant posts about getting going with writing (blogs, whitepapers, etc.). This post is itself a result of that inspiration. Tim’s advice is to just get started with whatever you have ;-)

If you are into blogging but not so active, or even if you aren’t blogging at all, you may want to take a look at all the posts to get some inspiration to document the knowledge you gain on a day-to-day basis.

Here is an index of all of Tim’s posts so far.

Enjoy !

Categories: BI & Warehousing

SQL Server 2014: First Service Pack (SP1) is available

Yann Neuhaus - Tue, 2015-05-19 01:48

On May 14th, Microsoft released the first Service Pack (SP1) for SQL Server 2014, more than thirteen months after the RTM version.
SQL Server 2014 Service Pack 1 includes all of the CU from 1 to 5.

Which issues are fixed in this SP1?

There are 29 hotfixes:

  • 19 for the Engine
  • 6 for SSRS
  • 3 for SSAS
  • 1 for SSIS



Some improvements are:

  • Performance improvement of Column store with batch mode operators and a new Extended Event
  • Buffer pool extension improvement
  • New cardinality estimator to boost query performance

History of SQL Server 2014

The build version of SQL Server 2014 SP1 is 12.0.4100.1.
Here is a quick overview of SQL Server 2014 releases since CTP1:

Date          | SQL Server 2014 version
------------- | --------------------------------------
June 2013     | Community Technology Preview 1 (CTP1)
October 2013  | Community Technology Preview 2 (CTP2)
April 2014    | RTM
April 2014    | Cumulative Update 1 (CU1)
June 2014     | Cumulative Update 2 (CU2)
August 2014   | Cumulative Update 3 (CU3)
October 2014  | Cumulative Update 4 (CU4)
December 2014 | Cumulative Update 5 (CU5)
May 2015      | Service Pack 1 (SP1)

If you need more information about SQL Server 2014 SP1 or to download it, click here.

As a reminder, Service Packs are very important from a bug-fixing and product-upgrade point of view, so take care to install it quickly ;-)
See you.

Change first day of week in APEX 5.0 Calendar

Dimitri Gielis - Tue, 2015-05-19 00:52
APEX 5.0 comes with a new calendar region, which is way nicer than the previous calendar in APEX. It has more features, looks better and is also responsive. Behind the scenes you'll see the calendar region is based on Full Calendar.

In Belgium we use Monday as the first day of the week, whereas in the US they seem to use Sunday as the start of the week in the calendar overview. I've integrated Full Calendar before, so I knew that library had an option to set the first day of the week. You could either specify an option called firstDay and set that to 1, or you could change the language, and depending on the language it would adjust the start day of the week.

In APEX 5.0 I looked for that option, but there's not a specific attribute to set the first day of the week, instead it's based on the language of your application. If you go to Shared Components > Globalization Attributes by default it's set to en, which has Sunday as start of the week. If you set it to en-gb it will have Monday as start of the week.
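Outside APEX, when using the Full Calendar library directly, that firstDay option can be set at initialization time. A minimal sketch of that jQuery-plugin-style setup, where the `#calendar` element id is a hypothetical placeholder:

```javascript
// Full Calendar (jQuery plugin) initialization with Monday as first day of week.
// firstDay: 0 = Sunday (the default), 1 = Monday, ... 6 = Saturday.
$(document).ready(function () {
  $('#calendar').fullCalendar({
    firstDay: 1
  });
});
```

In APEX 5.0 itself there is no such attribute, so the application language is the lever to use.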

I searched some time to find how to do it, so hopefully this post will help others to find it more easily. Thanks to Patrick for sharing the way it was implemented.

Categories: Development

SQL Saturday Lisbon: from Francesinha to Bacalhau

Yann Neuhaus - Mon, 2015-05-18 23:45

Last week-end, I was at SQL Saturday 369, which was held in Lisbon. If you take a look at the agenda, you'll probably see that there are a lot of interesting sessions with a lot of famous speakers. Unfortunately, I was not able to attend all the sessions, so I decided to focus only on those with a direct correlation to my work.

First, 2 "headache" sessions given by Paul White (aka @SQL_Kiwi) about the query optimizer and some internal stuffs. The QO is definitely a very interesting topic and I'm always willing to discover more and more with guys like Paul to improve my skills.

Then, 2 sessions about In-Memory features in SQL Server 2016. In fact, I'm already aware of potential new features in the next SQL Server version, but attending a session given by Niko Neugebauer about columnstore and discussing upcoming features always adds a certain value for sure. Thanks Niko and Murilo Miranda for your sessions! 

Finally, another "headache" session to finish the day, about batch mode and CPU architectures, given by Chris Adkin. We got a very deep-dive explanation of batch mode and how it improves performance with CPU savings.  


Moreover, it was also the opportunity to meet some of my SQL Server MVP friends like Jean-Pierre Riehl and Florian Eiden ...




... and have a good dinner with the SQL Saturday staff and other speakers. A lot of countries represented here: Portugal, Germany, UK, New Zealand, France and probably others.




A beautiful city, a good weather, a lot of very good speakers and a very good staff ... maybe the secret sauce of a successful SQL Server event!

I'm pretty sure it will be the same for the next SQL Saturday in Paris, and I will be there (maybe as a speaker this time).

Indexing and Transparent Data Encryption Part I (The Secret Life of Arabia)

Richard Foote - Mon, 2015-05-18 23:42
Database security has been a really hot topic recently so I thought I might write a few posts in relation to indexing and Transparent Data Encryption (TDE) which is available as part of the Oracle Advanced Security option. To protect the database from unauthorized “backed-door” accesses, the data within the actual database files can be encrypted. […]
Categories: DBA Blogs

First Impression for Evodesk Desktop Unboxing

Michael Dinh - Mon, 2015-05-18 18:53

Disclaimer: I am not being paid by anyone to write positive or negative review.

Opinions are my own based on my limited engineering background.

First, the packaging is somewhat poor and could be much better for a desk costing close to $1,000 ($886 for my configuration).

Tape coming off.


I hope my desktop is okay.


Taking a look inside. Is that a tiny scratch I see?


After opening the desktop, this is the torn location – not enough foam.


Look at how much love I give it.

Desktop should be shipped in bubble wrap to prevent damage and scratch.

Cable Pass Through is way too small for 30” x 72”.


Most standing desks I was looking at are 1 inch thick.

By no means is this best in class, as the Evodesk desktop is 3/4 inch thin.

You won’t find this information anywhere in Evodesk’s technical specifications.


This is the programmer controller.

The ziplock bag was already opened; was this a returned item that was repackaged?


My picture does not look as good as Evodesk’s –

I do like the Posi-Loc, and it was the final selling point.

Hope this is secure and does not spin.



It looks like Evodesk has updated the information for the desktop. Either that, or I was blind as a bat the first go-round.

Renew™ Desktops
  • 100% reclaimed/recycled wood composite desktop
  • EvoGuard™ durable & stylish non-VOC seamless coating
  • Soft comfort edges eliminate nerve compression and pressure fatigue
  • Corners are slightly rounded for improved safety and style
  • Oversized 3” x 6” Cable Pass Through
  • Pre-drilled for quick and easy setup
  • Available sizes: 48″ (30” x 48″ x .75”), 60″ (30” x 60” x .75”), 72″ (30” x 72” x .75”)
  • Meets California Air Resources Board’s (CARB 2) stringent emission standard
  • Backed by a no-nonsense 2-year limited warranty

Webcast - Digital Mobile Cloud Business Opportunities for Partners

Simplify Enterprise Mobile Connectivity. Mobility has been penetrating the enterprise for the last couple of years, and there is no sign of it slowing down.  In...

We share our skills to maximize your revenue!
Categories: DBA Blogs