Skip navigation.

Pythian Group

Syndicate content
Love Your Data
Updated: 11 hours 55 min ago

Does your business need to become data-driven?

Sun, 2015-11-01 18:52

Find out how the experts answered at Pythian’s Velocity of Innovation event in NYC

This summer I had the pleasure of moderating panel discussion that brought some of IT’s most innovative thought leaders together with some of North America’s top CIOs. This is the second in our series of posts that outline the discussions that took place at our Velocity of Innovation event this summer in New York. Our panel of experts consisted of: Paul Vallé, Pythian’s founder and CEO; Gene Leganza, vice-president, principal analyst serving enterprise architecture professionals at Forrester Research; and Otto Toth, CTO at Huffington Post. Event attendees included IT leaders from across a range of industries who supplied the questions that formed the discussion.

This series of blog posts focuses on topics covered in the New York Velocity of Innovation discussion this past summer. This post concentrates on a discussion between Gene and Paul about the importance of data as a driver of success. The question was: Do we think that every organization must become data-driven or is this just for customer facing marketing organizations?”

Here’s just a sample of the discussion:
Paul: Gene is one of the world’s leading experts on that exact subject. I’m just dying to hear what he’s going to say about this.

Gene: I’m a believer. It’s a really interesting space. But the question is how to get there, and the culture that needs to exist. I’ve had a lot of discussions with vendors doing interesting things with data. It’s funny, if I ask what their software technology does — how it’s being implemented, and the success their customer are having — somehow they always end up saying, “Well, but that’s a technology, the culture is a lot slower to change.”
Somehow the culture needs to change in order to implement a lot of these technologies and make organizations really data-driven. It is not something that’s going to change overnight. It’s largely why, I think, organizations are not yet embracing the notion of a chief data officer role. When I first did surveys on this maybe two years ago, four or five percent of organizations said they had or were planning on hiring a chief data officer in the next year. Now, it’s up to 50 to 60 percent depending on the industry and region. People don’t always do what they say they’re going to do in surveys. But I think what’s behind these responses is an understanding that someone has to lead the charge, not only on the technology and analytic strategy side, but on the cultural side — making the move to change the organization.

There’s so much data out there, and increasingly so, that there’s probably data to support any decision you want to make relating to your business. If you’re not using data to make the decision, you’re just going to be behind the curve. For a while, you can say, “Oh, we have really smart people who have been in the industry forever. They go by their gut and I really trust that, and that’s really cool.” That will be good a little while longer, but sooner or later as your competition starts getting better at using data to understand what to do and get predictive about what to do, you’re just going to be behind the curve. It takes a while to get there so I think people need to get started now.

When you’re writing business decisions, having business conversations, wondering which way the market was going, what people are buying, what the competition is buying, what your customer friends’ friends are buying, are all things you can now find out. Making assumptions about that is really unfounded. I mean, there’s just too much data. It’s really a race to see who can be better at finding the relevant information and getting the answers out of data than the next guy. It’s what it’s coming down to, making it easier and easier for the business people to do that without having to ask IT.

The old school here is if you need to analyze something, you need to ask IT to provision something in the data warehouse so we can do analytics. Now, you need to play with the data for a while before you know what you really want to look at. The ability to monkey with data from multiple sources to explore the answers to questions has become easy to do. If you don’t do it, your competition will, and you’ll be behind the curve. I think that’s really just the bottom line.
But also, you have to be an evangelist and a cheerleader. Being opportunistic and having a cheerleader explicitly lead the charge to get the people to think differently about it so that you can eventually change processes. It eventually gets done.

I recently I talked with someone who felt extremely behind the curve and they only recently did things like looking at data design as integral to the design phase of their system development. They had always, of course, reviewed design of the applications before implementing them, but data was always an afterthought. Applications create data, so what?

Now that they’re trying to get better at data, they had to actually explicitly make sure you got a data person in there designing the data, looking at the impact of what this application’s doing with the landscape or the information architecture they have and reviewing that before going off and actually writing a code.
That was new for them. It’s important for them that they instantiate that in the process. That starts to win over the hearts and minds of the developers who are doing this sort of stuff. In terms of getting all the various players to get it and do their part, using that data instead of flying off the seat of their pants, now we have a lot of socializing and conversation, but baking data-driven things into processes.

It’s not something that you can do upfront and necessarily just expect a process to take over because then it looks like bureaucracy and people hate you for slowing down their life. You have to be very careful about introducing ideas into the organization and then try to bake them into the fabric of the organization before people see the actual value and they’re actually bought into the concept.

Paul: I have something to chime in, it’s actually in contradiction to Gene, but it will prompt, I think, an interesting discussion right along the lines of your question.

I find that one of the nicest ways to look at a business at a holistic level is to put a lens in front of it. I’m just going to use Fresh Direct as an example because I’m in New York City and I’m seeing the trucks all the time. You all live here, you might order their groceries. It’s useful to think of FreshDirect and look at the business through the lens of a grocer. You’re going to talk about the thickness of the steaks, you’re going to talk about the freshness of the vegetables and that’s great. It’s a great way to look at that business. You can also look at that business and think of it as a technology business. To what degree is FreshDirect a technology business that happens to deliver groceries the way is a technology business that happens to deliver books?

That’s a useful way to look at the business. To me, this whole data idea and the chief data scientist idea is the new lens and that’s what’s powerful about it. It lets you look at FreshDirect and say, “Hey, to what degree is Fresh Direct a data business?”

As for Huffington Post, it’s a media business. You need to be able to look through the media lens to you look at the quality of the content. You need to look at your reach and your audience, but to what degree is the Huffington Post a technology business? That’s a new lens. To what degree is the Huffington Post a data business?

That’s where sometimes, not always and not for every business, but sometimes you realize, “Holy cow! I am a data business.” Sometimes you realize that the most game-changing investments that you can make in your business involve adopting insights that emerge from the data. If you look at Fresh Direct and you say, “Well, Fresh Direct is a logistics business.” It is. Maybe when you look at Fresh Direct through that lens, you have a ‘holy cow’ moment and you realize that the investment in your supply chain management or your logistics allows you to compete better. That’s where your most efficient investments are. That’s okay.
What’s powerful about data is that we never used to talk about the power of our data as a lens through which to look at our businesses holistically and then to quantify our investments. To rephrase your question, I think it’s a really good question, but I wanted to just disagree with Gene just a little bit. I don’t know whether the crowd will lie, but in general, you should listen to Gene.

I think one of the interesting twist on your question is which of us, which of you are looking at your businesses through the data lens and seeing opportunities for investments that you feel that pressure in your belly, in your gut that you need to claim that opportunity?

Attendee # 1: Data can make an enormous difference to your business. Deciding which data to pay attention to, which to collect, where to source data, and what insights you’re looking for is almost more of an art than a science at this point.
That’s why the whole data scientist’s movement, it’s like the most sought-after position, it’s a huge growth space and it’s because there isn’t any prescriptive method. You have to bring judgement to the table. Choosing the data that’s important is probably the harder part right now. If you choose the wrong data, then it doesn’t make sense.

Attendee # 2: I think you need to invest not just in good talent, but also in good knowledge and optimization especially for a legacy business model.

Gene: That’s a really big challenge. We just got some results from a survey that asked what biggest obstacle was to getting more data-driven? There were a bazillion answers that all have some weight. At the top of the list is hiring the right data talent. Obviously, that’s a big deal for everybody. It’s not just data scientists. It’s always interesting to look at how you define what a data scientist is. The best definition of data scientist is somebody that would totally impossible to find because they would bring together PhD level education, a hard science level of math and algorithm knowledge, combined with the skills of a software engineer. All this would be combined with someone who has a domain knowledge we were just talking about it, it’s not just numbers.

This is just a sample of the our discussion on data at the New York event this summer. More of these sessions are planned for the coming weeks. To request an invitation to a Velocity of Innovation event in a city near you, visit

Velocity of Innovation is a series of thought-leadership events for senior IT management hosted by Pythian.The format is a panel discussion in which Pythian leads conversations around today’s disruptive technologies. Topics range from big data and cloud to advanced analytics and DevOps. These events are by invitation only. If you’re interested in attending one of our upcoming Velocity of Innovation events, contact

Categories: DBA Blogs

RMAN Full Backup vs. Level 0 Incremental

Thu, 2015-10-29 11:23

Perhaps you’ve wondered about this. What is the difference between taking an RMAN full backup and a level 0 incremental backup?

If you read the documentation the following explanation will be found here: Incremental Backups

The only difference between a level 0 incremental backup and a full backup is that a full backup is never included in an incremental strategy. Thus, an incremental level 0 backup is a full backup that happens to be the parent of incremental backups whose level is greater than 0.

Logically that is true, but for many folks that is not enough. Are they different physically, and if so exactly how are they different physically?

If you google for an explanation you will find a number of ideas, most based on what the documentation says.

Here are some examples of what I found:

  • The only difference between level 0 and full is that a level 0 is marked in the RMAN catalog as a level 0.
  • Level 0 incremental and full backups are physically identical.

If you are an experienced IT professional then by now you have learned not to assume that articles found on the WWW are always correct.It seemed like it might be an interesting exercise to find out what the differences might be.

This called for some experimentation; here is the general procedure followed:

  • Put the database in mount state so there is no activity in the database.
  • Create two RMAN ‘full’ backups.
  • Create two RMAN incremental level 0 backups.
  • Open the database and perform some DML and/or DDL.
  • Create an incremental level 1 backup.
  • Do more DML/DDL activity.
  • Create another incremental level 1 backup.

This will provide some backup files to examine. You may be wondering why two of each backup type was made. The reason for that is to filter out changes in the backup file that are not relevant to the difference in backup types. That should become more clear with examples.

The test environment is as follows:

  • Test database: Oracle CDB on Linux 6.5.
  • RMAN Catalog database: Oracle on Linux 5.5.

The following backup commands were used to create two full and two level 0 backups:


rman target / catalog rman12c/rman12c@oravm <<-EOF
run {
allocate channel ch1 device type disk format '/mnt/oracle-backups/ora12c/full-01/%U';
backup database tag 'full-01' plus archivelog tag 'full-01';

rman target / catalog rman12c/rman12c@oravm <<-EOF
run {
allocate channel ch1 device type disk format '/mnt/oracle-backups/ora12c/full-02/%U';
backup database tag 'full-02' plus archivelog tag 'full-02';

rman target / catalog rman12c/rman12c@oravm <<-EOF
run {
allocate channel ch1 device type disk format '/mnt/oracle-backups/ora12c/lvl0-01/%U';
backup incremental level 0 database tag 'lvl0-01' plus archivelog tag 'lvl0-01';

rman target / catalog rman12c/rman12c@oravm <<-EOF
run {
allocate channel ch1 device type disk format '/mnt/oracle-backups/ora12c/lvl0-02/%U';
backup incremental level 0 database tag 'lvl0-02' plus archivelog tag 'lvl0-02';

Let’s take a look at the size of the files:

[root@lestrade ora12c]# ls -l full-01/*
-rw-r----- 1 root root    6878208 Oct 27 15:15 full-01/0sqkp2dl_1_1
-rw-r----- 1 root root  761749504 Oct 27 15:16 full-01/0tqkp2dn_1_1
-rw-r----- 1 root root 1696112640 Oct 27 15:17 full-01/0uqkp2eq_1_1
-rw-r----- 1 root root  735043584 Oct 27 15:17 full-01/0vqkp2g7_1_1
-rw-r----- 1 root root  623837184 Oct 27 15:17 full-01/10qkp2h0_1_1

[root@lestrade ora12c]# ls -l full-02/*
-rw-r----- 1 root root    6878208 Oct 27 15:18 full-02/12qkp2hm_1_1
-rw-r----- 1 root root  761749504 Oct 27 15:18 full-02/13qkp2ho_1_1
-rw-r----- 1 root root 1696112640 Oct 27 15:19 full-02/14qkp2ir_1_1
-rw-r----- 1 root root  735043584 Oct 27 15:19 full-02/15qkp2k8_1_1
-rw-r----- 1 root root  623837184 Oct 27 15:20 full-02/16qkp2l2_1_1

[root@lestrade ora12c]# ls -l lvl0-01/*
-rw-r----- 1 root root    6878208 Oct 27 15:20 lvl0-01/18qkp2lm_1_1
-rw-r----- 1 root root  761749504 Oct 27 15:21 lvl0-01/19qkp2lo_1_1
-rw-r----- 1 root root 1696112640 Oct 27 15:21 lvl0-01/1aqkp2mr_1_1
-rw-r----- 1 root root  735043584 Oct 27 15:22 lvl0-01/1bqkp2o8_1_1
-rw-r----- 1 root root  623837184 Oct 27 15:22 lvl0-01/1cqkp2p2_1_1

[root@lestrade ora12c]# ls -l lvl0-02/*
-rw-r----- 1 root root    6878208 Oct 27 15:23 lvl0-02/1eqkp2q4_1_1
-rw-r----- 1 root root  761749504 Oct 27 15:23 lvl0-02/1fqkp2q6_1_1
-rw-r----- 1 root root 1696112640 Oct 27 15:24 lvl0-02/1gqkp2r9_1_1
-rw-r----- 1 root root  735043584 Oct 27 15:25 lvl0-02/1hqkp2sm_1_1
-rw-r----- 1 root root  623837184 Oct 27 15:25 lvl0-02/1iqkp2tf_1_1

The number and sizes of the files appear the same regardless of the backup type, lending some credence to the idea that they may be physically identical. So now let’s dump some of the files to determine the difference.

For purposes of this experiment we are going to examine the backup files that contain datafile 1 from the database, the SYSTEM tablespace datafile.

We can find these backup files with the RMAN list command:

RMAN> list backup of datafile 1;

List of Backup Sets

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
1259    Full    1.58G      DISK        00:00:39     2015-10-27 15:15:13
        BP Key: 1267   Status: AVAILABLE  Compressed: NO  Tag: FULL-01
        Piece Name: /mnt/oracle-backups/ora12c/full-01/0uqkp2eq_1_1
  List of Datafiles in backup set 1259
  File LV Type Ckp SCN    Ckp Time            Name
  ---- -- ---- ---------- ------------------- ----
  1       Full 50957811   2015-10-27 18:08:29 +DATA/JS03/DATAFILE/system.258.856612879

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
1341    Full    1.58G      DISK        00:00:39     2015-10-27 15:17:22
        BP Key: 1349   Status: AVAILABLE  Compressed: NO  Tag: FULL-02
        Piece Name: /mnt/oracle-backups/ora12c/full-02/14qkp2ir_1_1
  List of Datafiles in backup set 1341
  File LV Type Ckp SCN    Ckp Time            Name
  ---- -- ---- ---------- ------------------- ----
  1       Full 50957811   2015-10-27 18:08:29 +DATA/JS03/DATAFILE/system.258.856612879

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
1433    Incr 0  1.58G      DISK        00:00:39     2015-10-27 15:19:30
        BP Key: 1441   Status: AVAILABLE  Compressed: NO  Tag: LVL0-01
        Piece Name: /mnt/oracle-backups/ora12c/lvl0-01/1aqkp2mr_1_1
  List of Datafiles in backup set 1433
  File LV Type Ckp SCN    Ckp Time            Name
  ---- -- ---- ---------- ------------------- ----
  1    0  Incr 50957811   2015-10-27 18:08:29 +DATA/JS03/DATAFILE/system.258.856612879

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
1531    Incr 0  1.58G      DISK        00:00:42     2015-10-27 15:21:55
        BP Key: 1539   Status: AVAILABLE  Compressed: NO  Tag: LVL0-02
        Piece Name: /mnt/oracle-backups/ora12c/lvl0-02/1gqkp2r9_1_1
  List of Datafiles in backup set 1531
  File LV Type Ckp SCN    Ckp Time            Name
  ---- -- ---- ---------- ------------------- ----
  1    0  Incr 50957811   2015-10-27 18:08:29 +DATA/JS03/DATAFILE/system.258.856612879

Now dump the backup pieces to hex files. Just the first 100M is being dumped from each, resulting in ~300M text files.

mkdir -p $DUMPDIR

for f in full-01/0uqkp2eq_1_1 full-02/14qkp2ir_1_1 lvl0-01/1aqkp2mr_1_1 lvl0-02/1gqkp2r9_1_1
        tag=$(dirname $f)
        filename=$(basename $f)
        echo $filename
        # dump first 100M
        dd if=$f bs=512 count=204800 2>/dev/null | hexdump -C > $filename

[root@lestrade hexdump]# ls -l full* lvl0*
-rw-r--r-- 1 root root 305807309 Oct 27 16:11 full-01_0uqkp2eq_1_1.txt
-rw-r--r-- 1 root root 305807309 Oct 27 16:11 full-02_14qkp2ir_1_1.txt
-rw-r--r-- 1 root root 305807309 Oct 27 16:11 lvl0-01_1aqkp2mr_1_1.txt
-rw-r--r-- 1 root root 305807309 Oct 27 16:11 lvl0-02_1gqkp2r9_1_1.txt

We can compare the dump of the two full backup files. As there were no changes to the database during this time (you may recall the database is in MOUNT state) the only difference in the files should be changes to metadata.

[root@lestrade hexdump]# diff full-01_0uqkp2eq_1_1.txt full-02_14qkp2ir_1_1.txt
< 00002010  46 7d 00 00 00 02 10 0c  00 02 10 0c c8 29 05 07  |F}...........)..|
< 00002020  4a 53 30 33 00 00 00 00  de 19 01 00 00 00 00 00  |JS03............|
> 00002010  dd 7e 00 00 00 02 10 0c  00 02 10 0c c8 29 05 07  |.~...........)..|
> 00002020  4a 53 30 33 00 00 00 00  fd 19 01 00 00 00 00 00  |JS03............|
< 00002060  01 00 00 00 da 89 4c 35  1e 00 00 00 04 00 00 00  |......L5........|
< 00002070  01 00 00 00 46 55 4c 4c  2d 30 31 00 00 00 00 00  |....FULL-01.....|
> 00002060  01 00 00 00 5b 8a 4c 35  24 00 00 00 04 00 00 00  |....[.L5$.......|
> 00002070  01 00 00 00 46 55 4c 4c  2d 30 32 00 00 00 00 00  |....FULL-02.....|

These two full backup files differ only the by these four lines. These differences are in the 17th OS block of the files and can be identified as metadata by the tags FULL-0[12] and the database name JS03. There are other differences that do not appear as printable characters. I did spend a little time trying to decode their meanings; they may be backup piece#, backup piece keys, backup set keys, and similar. Determining the meaning of these didn’t really seem necessary to understand the difference between full and incremental level 0 backups, so I did not continue. If you do know what these values represent, please, leave a comment.

Doing a diff on the level 0 hex dump files shows changes in the same lines:

[root@lestrade hexdump]# diff lvl0-01_1aqkp2mr_1_1.txt lvl0-02_1gqkp2r9_1_1.txt
< 00002010  bc 02 00 00 00 02 10 0c  00 02 10 0c c8 29 05 07  |.............)..|
< 00002020  4a 53 30 33 00 00 00 00  1a 1a 01 00 00 00 00 00  |JS03............|
> 00002010  34 03 00 00 00 02 10 0c  00 02 10 0c c8 29 05 07  |4............)..|
> 00002020  4a 53 30 33 00 00 00 00  39 1a 01 00 00 00 00 00  |JS03....9.......|
< 00002060  01 00 00 00 db 8a 4c 35  2a 00 00 00 04 00 00 00  |......L5*.......|
< 00002070  01 00 00 00 4c 56 4c 30  2d 30 31 00 00 00 00 00  |....LVL0-01.....|
> 00002060  01 00 00 00 69 8b 4c 35  30 00 00 00 04 00 00 00  |....i.L50.......|
> 00002070  01 00 00 00 4c 56 4c 30  2d 30 32 00 00 00 00 00  |....LVL0-02.....|

Now it is time to compare a full backup file to a level 0 backup file:

[root@lestrade hexdump]# diff full-01_0uqkp2eq_1_1.txt lvl0-01_1aqkp2mr_1_1.txt
< 00002010  46 7d 00 00 00 02 10 0c  00 02 10 0c c8 29 05 07  |F}...........)..|
< 00002020  4a 53 30 33 00 00 00 00  de 19 01 00 00 00 00 00  |JS03............|
> 00002010  bc 02 00 00 00 02 10 0c  00 02 10 0c c8 29 05 07  |.............)..|
> 00002020  4a 53 30 33 00 00 00 00  1a 1a 01 00 00 00 00 00  |JS03............|
< 00002060  01 00 00 00 da 89 4c 35  1e 00 00 00 04 00 00 00  |......L5........|
< 00002070  01 00 00 00 46 55 4c 4c  2d 30 31 00 00 00 00 00  |....FULL-01.....|
> 00002060  01 00 00 00 db 8a 4c 35  2a 00 00 00 04 00 00 00  |......L5*.......|
> 00002070  01 00 00 00 4c 56 4c 30  2d 30 31 00 00 00 00 00  |....LVL0-01.....|
< 00002090  00 00 00 00 04 04 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|
> 00002090  00 00 00 00 05 04 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|

Notice the new line 2090? This is the line that contains the difference between a full and a level 0 backup file.

Before continuing it is time to create a couple of level 1 backups:

SQL> create user dilbert identified by phb;

User created.

SQL> alter user dilbert quota 10m on users;

User altered.

SQL> create table dilbert.t1 as select systimestamp t1 from dual;

Table created.

rman target / catalog rman12c/xxx@oravm <<-EOF
run {
allocate channel ch1 device type disk format '/mnt/oracle-backups/ora12c/lvl1-01/%U';
backup incremental level 1 database tag 'lvl1-01' plus archivelog tag 'lvl1-01';

Create another table in dilbert schema

SQL> create table dilbert.t2 as select * from dba_users;

Table created.

rman target / catalog rman12c/xxx@oravm <<-EOF
run {
allocate channel ch1 device type disk format '/mnt/oracle-backups/ora12c/lvl1-03/%U';
backup incremental level 1 database tag 'lvl1-03' plus archivelog tag 'lvl1-03';

Note:  lvl1-02 backup was discarded as I forgot to make db changes before making it.

If we were to go on and compare all of the backup files that contain datafile 1:

[root@lestrade hexdump]# grep ^00002090 *.txt
full-01_0uqkp2eq_1_1.txt:00002090  00 00 00 00 04 04 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|
full-02_14qkp2ir_1_1.txt:00002090  00 00 00 00 04 04 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|
lvl0-01_1aqkp2mr_1_1.txt:00002090  00 00 00 00 05 04 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|
lvl0-02_1gqkp2r9_1_1.txt:00002090  00 00 00 00 05 04 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|
lvl1-01_1mqkp7jk_1_1.txt:00002090  00 00 00 00 05 00 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|
lvl1-03_22qks49l_1_1.txt:00002090  00 00 00 00 05 00 00 00  5a 08 00 00 c3 2c 04 00  |........Z....,..|

A bitmask emerges that is peculiar to each backup type:

Starting at offset 0x2090
     00 01 02 03 04 05 06 07
Full 00 00 00 00 04 04 00 00  
LVL0 00 00 00 00 05 04 00 00
LVL1 00 00 00 00 05 00 00 00

What might these mean?
Here’s a possible explanation:

0x2094 = 0x04 = FULL
0x2094 = 0x05 = Incremental

0x2095 = 0x04 = level 0 if the previous byte is 0x05
0x2095 = 0x00 = level 1 if the previous byte is 0x05

Of course this doesn’t explain the purpose of 0x04 at 0x205 if the backup is ‘FULL’.

To fully understand these flags would require dumping and examining many more backup files, not just of datafiles but of archive logs, controlfiles, etc.

One thing is clear from this experiment; there are differences in the metadata contained in the file headers when comparing a full backup to a level 0 incremental backup. It is not simply a matter of the one backup being cataloged differently by RMAN; the backup pieces can be identified as either full or incremental level backups based on the metadata in the file header.

So what good is all this effort for a little bit of knowledge?

How about the next time you are asked about the difference between a level 0 and full backup? Now you can say something other that quoting the documentation or some article on the internet (one that is lacking anything to backup the assertions made).

If the question is posed to you during an interview, this information becomes even more valuable.

Now another question comes to mind: Can backup files made with a full backup be modified to work as incremental level 0 backups?

That would be an interesting experiment. This is not something one would normally do, but it is not too hard to imagine a scenario where only a full backup, an incremental level 1 backup and the following archive logs were available to recover a database. This would be a desperate measure and a last resort, but sometimes last resorts are necessary.

Do you think this can be done? If so please give it a try, but only test it on an expendable test system please.

Please be sure to leave a comment if you try this experiment, it would be quite interesting to discuss.


Discover more about our expertise in Oracle.

Categories: DBA Blogs

Pythian More Innovative Than GE?

Tue, 2015-10-27 08:41


GE is known as an innovative company and according to Forbes is one of the worlds most valuable brands. Late summer they made headlines with the announcement that they were retiring performance management reviews.

No big deal? Think again. GE has built a reputation on Jack Welch’s ridgid performance management programs, cutting the bottom 10 percent of under performers each year.

First, I applaud GE for the bold move. Any company wide change in an organization the size of GE’s would be challenging. Second, what took you so long? In 2011, Pythian acted against the crowd and ditched the prevailing zeitgeist by implementing a performance feedback program.

At the time, with approximately one hundred employees and a brand new HR team we were just beginning to establish our HR programs. Like many small companies, we did not have a structured performance management program. We were growing rapidly and identified a need to provide employees with useful feedback and career discussions.

Ratings, rankings and bell curves, “Oh My!” We didn’t even consider them. We designed a program that removed standard performance management elements like numerical rankings. Our program focus was created to facilitate formal, organized feedback discussions, in a comfortable environment, between an employee and their manager. The idea was to base the discussion on each team member’s career aspirations and journey. During a new hire orientation, the first steps of the career journey begin. Following every six months we schedule time to sit down and have focused career feedback discussions. During these discussions, goals are established, recent successes reviewed, progress on goals updated, and challenges chronicled with suggestions to overcome. Furthermore, career aspirations and plans for professional development are discussed and established.

The feedback program is constantly evolving and improving to meet the changing needs and scale of the company.  Of course we listen to employee feedback about the program and implement changes after a review of the suggestions. Change can be difficult for people. Initially, employees more accustomed to traditional performance management were hesitant, but they quickly responded to the easy and relaxed format of our program.

Regular feedback is key. We encourage two way feedback: up and/or down, across the organization, in real time, all the time. We are always working to improve our programs, processes and ideas, e.g. “upping our game” as a company.   We believe it’s important to build a culture of constant feedback. A culture of two way feedback built on trust and transparency is a team effort by all members of the Pythian Team.

During orientation I enjoy encouraging and empowering all new employees with permission to ask their leaders for feedback anytime. I encourage them to not wait to share what’s going well and to disclose where they need extra support/further direction, etc. In my own team meetings I inquire what I could be doing more of and less of. How can I be a better communicator and leader. I create a safe environment for the team to provide feedback so we can collectively improve.

It will be interesting to see if GE’s announcement encourages more companies to re-evaluate their approach to Performance Management systems and encourage more effective dialogue and feedback discussions with their employees.

Categories: DBA Blogs

SQL Server Row Level Security

Tue, 2015-10-27 08:27


Row Level Security (RLS) has been implemented in SQL Server 2016 for both on-premise and v12 of Azure instances.

The problem this solves is: a company with multiple applications accessing sensitive data in one or more tables.

How do you ensure the data being read or written is only the data that login is authorized to see? In the past, this has been accomplished with a complicated series of views or functions, and there’s no guarantee a bug or malicious user wouldn’t be able to bypass those measures. With Row Level Security, it doesn’t matter what privileges you have (including sysadmin) or how you try to access the data.

How it Works

Row Level Security has two options: You can either FILTER the rows or BLOCK the operation entirely. The BLOCK functionality is not yet implemented in CTP 2.4, but the FILTER logic works like a charm.

The steps are very simple:
1 – Figure out what you’re going to associate with your users and data. You will need to create some link between your data and a login’s or user’s properties. Something that will allow the engine to say This Row is ok for This User.

2 – Create a Function defining the relationship between users and the data.

3 – Create a Security Policy for the function and table(s). You can use the same policy on multiple tables or views.

Once the Security Policy has been created, every query or DML operation on the tables or views you’re filtering will automatically have the function applied to the WHERE or HAVING clause. You can see the filter working by reviewing the execution plan as well. SQL Server will generate the Plan Hash value with the filtering logic in place. This allows Plan Re-Use with Row Level Security, and it’s a big improvement over Oracle’s implementation which doesn’t do this (as of Oracle 10g, the last time I worked with it) :-). See the bottom of this post for an example of the same query with RLS turned on & off.

What is particularly nice about these policies is that the data is filtered regardless of the user’s privileges. A sysadmin or other superuser who disables the policy is just an Audit log review away from having to explain what they were doing.

Row Level Security Walk Through

This is an example of setting up an RLS system for the Credit Card data in the AdventureWorks database. After this is completed, only users associated with a Business Entity in the Person.Person table will be able to see or update any credit card information, and the data they can touch will be limited to just their business.

Step 1: Add user_name column to Person.Person table

In this example, I’m associating the user_name() function’s value for each login with the BusinessEntityID. Of course, you can use any value you want, as long as you can access it from a SELECT statement in a Schema-Bound function. This means many system tables are off-limits.

USE AdventureWorks

ALTER TABLE person.person
ADD UserName nvarchar(128) NULL;

— Associate some person.person rows with a login too.
UPDATE person.person
SET UserName = ‘Business1’
BusinessEntityID IN (301, 303, 305);

Step 2: Create Users to Test

I’m just creating a login named Business1 to demonstrate this. Note that the user has db_owner in AdventureWorks

USE [master] GO
USE [AdventureWorks] GO
CREATE USER [business1] FOR LOGIN [business1] GO
USE [AdventureWorks] GO
ALTER ROLE [db_owner] ADD MEMBER [business1] GO

Step 3: Create Function to Filter Data

This function finds all credit cards for the user_name() running the query. Any values not returned by this function will be inaccessible to this user.

CREATE FUNCTION [Sales].[fn_FindBusinessCreditCard] (@CreditCardID INT)

1 AS result
person.person p INNER JOIN
sales.PersonCreditCard pcc ON p.BusinessEntityID = pcc.BusinessEntityID INNER JOIN
sales.CreditCard cc ON pcc.CreditCardID = cc.CreditCardID
cc.CreditCardID = @CreditCardID AND
p.UserName = user_name();

Step 4: Create a Security Policy

This creates a security policy on the Sales.CreditCard table.

CREATE SECURITY POLICY sales.RestrictCreditCardToBusinessEntity
ADD FILTER PREDICATE sales.fn_FindBusinessCreditCard(CreditCardID)
ON Sales.CreditCard

Step 5: Test Away

For all of the following examples, You should be logged in as the Business1 user who can only see 3 credit cards. In reality, there are 19,118 rows in that table.

--Will return three records
pcc.*, cc.*
Sales.PersonCreditCard pcc INNER JOIN
Sales.CreditCard cc ON cc.CreditCardID = pcc.CreditCardID

— Will only update three records
UPDATE Sales.CreditCard
ExpYear = ‘2020’

These are the execution plans for the above query with Row Level Security turned on and off:

Turned On: (and missing an index…)

Execution Plan With Security Turned On

Execution Plan With Security Turned On.

Turned Off:

Execution Plan With Security Turned Off

Execution Plan With Security Turned Off.


Discover more about our expertise in SQL Server.

Categories: DBA Blogs

A Cassandra Consistency Use Case

Tue, 2015-10-27 08:07


I recently completed a project where I worked with a company using Cassandra to keep metadata about objects stored in an Object Store. The application keeps track of individual objects as rows within a partition based on user id. In an effort to save space there is also a mechanism to track duplicate references to the objects in another table. Object writes take place as background activity and the time it takes to complete those writes is invisible to the applications end users. The time it takes to retrieve an object though is very visible to the end user. The keyspace was defined as network topology using two data centers (actual data centers here about 50 ms apart) with replication factor 3 in both data centers.

Initially the application was set up to use consistency ONE for both writes and reads. This seemed to be working okay until we started doing failure testing. At which point objects would come up missing due to the delay time in pushing hinted hand offs from one node to another. A simple solution to this was to make all writes and reads LOCAL_QUORUM. In fact doing so did resolve pretty much all of the testing errors but at a much increased latency, about 3 times longer than with consistency ONE, on both reads and writes. Even so, the latencies were deemed to be acceptable since they were still well under anticipated network latencies outside of the data centers which is what the users would be seeing.

Could we have done better than that though?

The writes are a background activity not visible to the end user. The increased write latency is probably reasonable there. The read latency is visible to the user. There is an option which guarantees finding the stored object references while still keeping the latency to a minimum. This is what I propose, the default read consistency is set back to ONE and most of the time a read to Cassandra will find the object reference as was clear in the initial testing. But, if a read returns no object reference then a second read is issued using LOCAL_QUORUM. This way most, more than 99%, of all reads are satisfied with the much lower latency consistency of ONE only occasionally needing the second read. This can be extended further to a full QUORUM read if the LOCAL_QUORUM read fails.

It is important to note that this approach only works if there are no row versions. E.G. rows only exist or do not exist. If a row may have different versions over time as you might have if the row were updated rather than just inserted and then later deleted. It is also important to note that its possible to find a deleted row this way. For this use case these qualifications are not issues.


Discover more about our expertise in Cassandra.

Categories: DBA Blogs

Running SQL Server Jobs Against an Azure SQL Database

Tue, 2015-10-27 07:50


I recently had a client ask how to run SQL Server jobs against an Azure SQL Database, and because SQL DB doesn’t have an SQL Agent like most other versions of SQL Server, it isn’t always obvious how to implement. Fortunately, we have several options in Azure and within a “normal” instance.

Options to Run Jobs

The first three options require a non-Azure version of SQL Server to be running and to have connectivity to the Azure SQL DB.

  1. Linked Server
  2. Maintenance Plan w/new connection
  3. Powershell Script
  4. Azure Services
Elastic DB Jobs

As mentioned by Warner Chaves in the comments, and currently (Nov 2015) in preview, the new Elastic Database Jobs might also work well.

Linked Server

To create a linked server to an Azure DB, you just need to get the ODBC connection string from the Azure portal and use it as the “Provider String” when defining the LS.




Maintenance Plan w/New Connection

Simply create your maintenance plan with all the required logic for your job, then click the “Manage Connections” menu at the top of the screen and define the new connection to be used for this plan.




Powershell Script

You can use the Powershell Invoke-SqlCmd functionality and run it as a Powershell job step.

Invoke-Sqlcmd -Query "SELECT GETDATE() AS TimeOfQuery;" -ServerInstance "AzurreInstance" -U "UserName" -P "Password"

Azure Services

These change, it seems like, almost daily, so I’ve listed some of the common ways this is currently done.

  1. The Azure Scheduler used in conjunction with mobile services.
  2. Azure Automation


Discover more about our expertise in Cloud and SQL Server.

Categories: DBA Blogs

Log Buffer #446: A Carnival of the Vanities for DBAs

Fri, 2015-10-23 15:25

This Log Buffer Edition covers some more Oracle, SQL Server and MySQL blog posts from across the world.


  • Should I monitor the MGMTDB database?
  • Connecting MAF Applications to Mobile Cloud Service
  • JRE 6, 7, and 8 now certified with EBS 11i, 12.1, and 12.2
  • The database writer copies data blocks from the buffer cache onto disk. The db_writer_processes initialization parameter determines how many processes will do this task. Its default value is 1 or cpu_count / 8, whichever is greater. I found an Oracle 9 database on a Tru64 server with cpu_count set to 1
  • How To Install Latest Verisign G5 Root Certificates

SQL Server:

  • Dynamic Pivot, Passing Temp Table Variables to Dynamic SQL
  • Oracle offers a results cache in the database (from 11.2 onwards) and in PL/SQL (again, from 11.2 onwards) which can greatly reduce execution time of repeated statements when insert/update/delete activity is not heavy.
  • Maintaining a grouped running MAX (or MIN)
  • Harnessing SQL Server Metadata- Disabling and Rebuilding Indexes
  • Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator


  • ClusterControl Tips & Tricks: Updating your MySQL Configuration
  • Become a MySQL DBA blog series – Optimizer Hints for faster query execution
  • Loading Arbitary XML documents into MySQL tables with p_load_xml
  • Use MySQL and Perl to automatically find, follow and unfollow twitter users
  • Great Advice on Monitoring Amazon RDS For MySQL From Datadog

Learn more about Pythian’s expertise in Oracle SQL Server & MySQL.

Categories: DBA Blogs