Skip navigation.

DBA Blogs

A Repeatable Test Environment with Terraform and Chef-solo

Pythian Group - Mon, 2016-02-29 08:57

I have been refreshing my Chef skills, and have been looking for ways to incorporate simple strategies for deployment of temporary lab environments in the cloud, that provide a secure, flexible, repeatable, and cost-effective set of resources for testing code. One of the challenges of working in a Vagrant environment is how to protect our client’s intellectual property (i.e. the code we wish to test, in our case a Chef cookbook) when it is distributed around the globe on our consultants’ laptops. I wanted to tackle the problem simply, one step at a time.

As I see it, cloud services offer a flexible and secure environment in which policies can be managed, costs can be minimized, and access can be controlled. Three things are needed:

  1. A standard, flexible, easy to use deployment framework and template / example environment library
  2. An access control and environmental separation framework that allows lab resources to be deployed securely
  3. A set of templates and policies to manage and control shared infrastructure dependencies and to retire idle resources

Number one is how I spent my Sunday afternoon last week, with this as the result:

Following the README should get you started with deploying the host if you are familiar with Terraform already. This template uses AWS services, but the concept is portable to any of the multiple providers supported by Terraform.

What this is good for:

I think that the benefits of this approach to testing deployment code are clear, but to highlight:

  • We can add the cloud provider’s management framework and services to our test framework, providing a secure location and environmental separation as needed for testing of intellectual property (i.e. code).
  • We can store our test data in the cloud securely, and access it when required by keeping it on persistent storage (EBS in this case).
  • We use templated environments to ensure consistent and repeatable deployments for developers and engineers to collaborate on, as we would with a locally deployed Vagrant test environment.
  • We leverage scale and flexibility in the cloud to create practical testing environments that match our needs and those of our customers.
  • We leverage automation and environmental policies to ensure resource usage is minimized to that required for testing activities; systems are shut down and minimal storage costs incurred when the environments are not in use.


In preparation, you require an Amazon Web Services account with:
– an EC2 keypair with the private key available on your local host (usually in ~/.ssh/ somewhere).
– an IAM access key and secret access key for the AWS account that will be used to deploy.

To deploy the lab:

  1. Install Terraform ( from Hashicorp ( and ensure that the ‘terraform’ directory is somewhere in your path.
  2. Create a build directory and change into it. Clone the repository and change to the project directory. I always set up a remote (origin) at this point in git. We’ll be using the master branch.
  3. Per the README, ensure that you create a ‘terraform.tfvars’ file with 600 permissions and add your authentication token data in the format provided. (This file is excepted in the ‘.gitignore’ to prevent accidental upload of authentication tokens to the repository).
  4. Run ‘terraform plan’ and review the output. If no errors, run ‘terraform apply’. Your environment should start to build in AWS, and Terraform will create some local state files in the working project directory.
  5. Provisioning output will be displayed as the build and bootstrap progresses, with output of the commands run in the remote-exec provisioner section of template file ‘’. (The Seeker‘ is a song by The Who if you are wondering, and IMO the best name ever given to a test host). Review this file to see what’s happening.
  6. When provisioning completes, we should have a functioning instance running Amazon Linux in US-West/Oregon, by default. If you change the region (in ‘’) then don’t forget that you must also update your AMIs (also in ‘’) to match the proper region. I’ve used this AMI in my example, specifically the HVM EBS-backed image.
  7. The public DNS address is output by Terraform. Connect to this using your private EC2 key using ‘ssh -i ~/.ssh/myec2key -l ec2-user’.


If you inspect the template noted in step 5 you will notice that the host is configured to not delete the EBS volume on termination, meaning that after instance termination the EBS volume will be preserved and can be booted from again (so we don’t lose our chef cookbook work between deployments). Do note however that for each node that is freshly provisioned using Terraform, a new volume will be created. If you wish to destroy the volume on creation, update the following section and change the value to ‘true’, however keep in mind that terminating the instance will terminate the data in your ‘chef-repo’.

root_block_device {
delete_on_termination = false

Chef-solo usage:

Now that we’ve logged into our host, let’s change directory to ‘chef-repo’ and have a look…

  1. From the ‘chef-repo’ directory, knife and knife-solo are already configured. You can see the configuration in ‘.chef/knife.rb’.
  2. The ‘cookbooks’ and ‘site-cookbooks’ directories are in the default cookbook path. No roles are configured. There is one node, ‘localhost’, under the nodes directory.
  3. To see the node configuration, run ‘knife node –local-mode list’… ‘localhost‘ is returned.
  4. Let’s have a look at the run list for this node, using ‘knife node –local-mode show localhost’… ‘recipe[ntp::default]‘ is returned in the Run List.
  5. We can see from the output that the run list for the node contains the ‘ntp::default’ recipe as configured by the remote-exec provisioner in the template.
  6. If we wanted to add additional recipes to the run list for the node, we can use ‘knife node –local-mode run_list add localhost ‘recipe[cookbook::recipe]”.
  7. Finally, we can apply the ntp recipe to the node localhost using ‘knife solo cook ec2-user@localhost -i ~/.ssh/mykey’. Note that a second run of the same command will yield a different result, as the new configuration has already been applied by Chef.
  8. When done with the environment, log out of the host and issue a ‘terraform destroy’ command. Type ‘yes’ in response to the confirmation message and Terraform will dismantle the lab and terminate the instance.

Other notes:

Don’t forget that you can now boot the instance from the EC2 console by selecting the volume we created as the new instance’s root (boot) volume.

Chef-solo also supports Berkshelf for dependency management, just install to your ‘chef-repo’ directory and chef-solo will act accordingly.

It’s worth noting that in my research I found that the the chef client now supports local mode operation and will eventually replace chef-solo for headless operation.

A great next step to get going with Terraform templates would be to create a new template file to deploy a second aws instance and then configure it with knife-solo from our first host. To ensure connectivity between nodes, ssh access must be available. To accomplish this simply, we can add the new host to the existing host group, ‘sg_theseeker_access’ which allows ssh login from anywhere by default. Again, refer to the instance template in step 5 in the deployment section above for an example of this.

In my next blog I will be describing a similar, templated Terraform environment that uses Ansible and Ambari Blueprints to deploy a 6+ node Hortonworks Hadoop cluster in ~45 minutes, ready to receive data and code for testing. For a sneak preview, see this github repository and dive into the README.

Have fun!

Categories: DBA Blogs

Links for 2016-02-28 []

Categories: DBA Blogs

Compression -- 1b : (more on) BASIC Table Compression

Hemant K Chitale - Sun, 2016-02-28 04:38
In the previous blog, I demonstrated creating an empty table with BASIC Compression and then populating it.

What if you have a pre-existing table with data that you want to compress ?

Here I start with a table that already has rows but doesn't have compression configured.

PDB1@ORCL> select tablespace_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name = 'REGULAR_1';

------------------------------ ------------------ -------------

PDB1@ORCL> select pct_free, blocks, num_rows, blocks*8192/1048576
2 from user_tables
3 where table_name = 'REGULAR_1';

---------- ---------- ---------- -------------------
10 6224 364496 48.625


I then proceed to rebuild it as a COMPRESSed table.

PDB1@ORCL> alter table regular_1 move compress;

Table altered.

PDB1@ORCL> exec dbms_stats.gather_table_stats('','REGULAR_1');

PL/SQL procedure successfully completed.

PDB1@ORCL> select tablespace_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name = 'REGULAR_1';

------------------------------ ------------------ -------------

PDB1@ORCL> select pct_free, blocks, num_rows, blocks*8192/1048576
2 from user_tables
3 where table_name = 'REGULAR_1';

---------- ---------- ---------- -------------------
0 1448 364496 11.3125


Note how not only did the table size shrink to less than 12MB, the PCT_FREE also got  reset to 0 !

Categories: DBA Blogs

One Million Page Views

Oracle in Action - Fri, 2016-02-26 02:06

RSS content

Yesterday page views of  my blog touched the magical figure of 1 Million.  I had started blogging in November 2012 with the aim of saving whatever I learn, on the web in a language simple enough to be understood by even a beginner. Well, I myself was a beginner then and even now I am a beginner.  I think I am a bit skeptical – not easily convinced until I get a proof. So, in most of my posts , I try to verify the facts mentioned in oracle documentation and I presume that’s what has clicked with the readers of my blog.

I would like to thank all the readers of my blog  who spent their precious time in going through my posts and giving their valuable feedback.

Keep visiting my blog…


Comments:  16 comments on this item
You might be interested in this:  
Copyright © ORACLE IN ACTION [One Million Page Views], All Right Reserved. 2016.

The post One Million Page Views appeared first on ORACLE IN ACTION.

Categories: DBA Blogs

Links for 2016-02-25 []

Categories: DBA Blogs

Oracle’s CREATE INDEX command can take a hint

Pythian Group - Thu, 2016-02-25 15:39

Here’s something really useful I discovered by accident when playing with Auto DOP (parallel_degree_policy) in 12c.

The “create index” command can take a hint – in particular the PARALLEL hint. The syntax is as you would expect:

create /*+PARALLEL*/ index tab_ind1 on tab(col1);

Of course, you can specify the number of parallel servers to be used by specifying PARALLEL(24) for example for 24 threads. The really interesting thing about using a hint vs. the documented syntax ("create index tab_ind1 on tab(col1) parallel 24;") is that once created – the index doesn’t have a default degree of parallelism. So you don’t need a second command to make the index noparallel.

Note that if you put the hint and use the “noparallel” attribute like so:

create /*+PARALLEL*/ index tab_ind1 on tab(col1) noparallel;

Then no parallelism will be used.

I tried using hints like FULL(t) to force an index create to use a full table scan instead of an existing index – but that doesn’t seem to work.

I discovered this under really interesting circumstances. I was testing some unrelated functionality that required some indexes created on my play table called CKK.

Here’s the SQL for the CKK table, which will create a 40 GB table with 2 rows per block:

create table ckk nologging tablespace ckk as
select rownum id, mod(rownum,5) mod5_id, mod(rownum,5000) mod5000_id, sysdate dt_fixed, sysdate - rownum/24/60 dt_dec, sysdate + rownum/24/60 dt_pos, sysdate + ora_hash(rownum,65,535)/24 dt_rand, sysdate+mod(rownum,10) dt_mod10, rpad('x',3500,'x') filler
from (select rownum r from dual connect by level <= 10000) r1, (select rownum r from dual connect by level <= 1000)

Then when I attempted to create an index on the table in parallel, Oracle refused to do so:

create index ckk$id on ckk(id) parallel 24; --DOES NOT run in parallel

Instead it created the index with 1 thread only, and then set the parallel degree policy to 24. I have tracked this problem down to the Auto DOP feature. If I turn it off via parallel_degree_policy=manual – the problem goes away. But I never expected this feature to turn off parallelism for index creation when explicitly requested.

Here’s the kicker – once any index is created on the table, future index creations will be automatically done in parallel, regardless if parallel was requested.

For example, this index would be now created in parallel:

create index ckk$mod5_id on ckk(mod5_id);

While before creating the index “ckk$id” – this index would refuse to get created in parallel – when using the parallel attribute.

That’s when I said to myself, “it’s almost like there’s a hint.” I took the hint, and discovered it does work, and it works more consistently than the attribute.

Categories: DBA Blogs

Log Buffer #462: A Carnival of the Vanities for DBAs

Pythian Group - Thu, 2016-02-25 15:07

This Log Buffer Edition covers Oracle, SQL Server and MySQL blog posts listing down few new tricks, tips and workarounds plus the news.


Displaying CPU Graphs For Different Time Ranges in Enterprise Manager

One of the cool things in 12c is that (finally after all these years) a sequence can be assigned as the default value for a column.

Jonathan Lewis demonstrates connect by after receiving an email.

Oracle 12c – PL/SQL “White List” via ACCESSIBLE BY

Oracle DBA, IT Manager, or Something Else

SQL Server:

Using R Machine Learning Script as a Power BI Desktop Data Source

SQL Authentication Via AD Groups Part II: Who has what access?

Using R Machine Learning Script as a Power BI Desktop Data Source

Connect to on premise data sources with Power BI Personal Gateway

Exploring SQL Server 2016 Dynamic Data Masking – Part One – Creating a Table that uses Dynamic Data Masking


Data Encryption at Rest

Planets9s: Download the new ClusterControl for MySQL, MongoDB & PostgreSQL

Automate your Database with CCBot: ClusterControl Hubot integration

Loading JSON into MariaDB or even MySQL – mysqljsonimport 2.0 is available

Privileges in MySQL and MariaDB: The Weed Of Crime Bears Bitter Fruit

Categories: DBA Blogs

delete noprompt obsolete archive log - RMAN

Learn DB Concepts with me... - Thu, 2016-02-25 10:25

RMAN> report obsolete;

using target database control file instead of recovery catalog
RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
Report of obsolete backups and copies
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Archive Log          183    16-FEB-16          /u01/app/oracle/oraarch/ORCLSTB1_1_140_896707677.arc
Archive Log          189    16-FEB-16          /u01/app/oracle/oraarch/ORCLSTB1_1_141_896707677.arc
Archive Log          190    16-FEB-16          /u01/app/oracle/oraarch/ORCLSTB1_1_145_896707677.arc
Archive Log          191    16-FEB-16        

RMAN> delete noprompt obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=32 device type=DISK
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Archive Log          183    16-FEB-16          /u01/app/oracle/oraarch/ORCLSTB1_1_140_896707677.arc
Archive Log          189    16-FEB-16          /u01/app/oracle/oraarch/ORCLSTB1_1_141_896707677.arc
Archive Log          190    16-FEB-16          /u01/app/oracle/oraarch/ORCLSTB1_1_145_896707677.arc
Archive Log          191    16-FEB-16          
deleted archived log
archived log file name=/u01/app/oracle/oraarch/ORCLSTB1_1_140_896707677.arc RECID=183 STAMP=903955885
deleted archived log
archived log file name=/u01/app/oracle/oraarch/ORCLSTB1_1_141_896707677.arc RECID=189 STAMP=903956192
deleted archived log
archived log file name=/u01/app/oracle/oraarch/ORCLSTB1_1_145_896707677.arc RECID=190 STAMP=903956192
deleted archived log
archived log file name=/u01/app/oracle/oraarch/ORCLSTB1_1_148_896707677.arc RECID=191 STAMP=903956192

Deleted 4 objects

Categories: DBA Blogs

Restore and recover database from RMAN backup - DB in No archive log mode

Learn DB Concepts with me... - Thu, 2016-02-25 10:19
Simple example for restore and recover database from RMAN backup.

Assuming that : I have a need to restore my DB as some important tables/data was dropped/removed and its really mission critical. I have already got some RMAN backups from past.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1728053248 bytes
Fixed Size                  3046512 bytes
Variable Size            1224737680 bytes
Database Buffers          486539264 bytes
Redo Buffers               13729792 bytes
Database mounted.

C:\Users\oracle>rman target /

Recovery Manager: Release - Production on Fri Feb 19 12:46:12 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCLDB (DBID=1196XXXX70)

RMAN> list backup summary;

using target database control file instead of recovery catalog

List of Backups
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
20      B  F  A DISK        05-FEB-15       1       1       NO         TAG20160205T155541
21      B  F  A DISK        05-FEB-15       1       1       YES        FULL_ORCLDB
22      B  F  A DISK        05-FEB-15       1       1       YES        FULL_ORCLDB
23      B  F  A DISK        05-FEB-15       1       1       YES        FULL_ORCLDB
24      B  F  A DISK        05-FEB-15       1       1       NO         TAG20160205T160507

RMAN> restore database from TAG='FULL_ORCLDB';

Starting restore at 19-FEB-15
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00003 to E:\APP\ORADATA\ORCLDB\UNDOTBS01.DBF
channel ORA_DISK_1: restoring datafile 00119 to E:\APP\ORADATA\TLAPP.DBF
channel ORA_DISK_1: reading from backup piece E:\APP\BACKUP\ORCLDB_BKP_0UQT63BQ_1_1
channel ORA_DISK_1: piece handle=E:\APP\BACKUP\ORCLDB_BKP_0UQT63BQ_1_1 tag=FULL_ORCLDB
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:03:15
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00120 to E:\APP\ORADATA\TLLARGE.DBF
channel ORA_DISK_1: restoring datafile 00121 to E:\APP\ORADATA\TLWORK.DBF
channel ORA_DISK_1: restoring datafile 00122 to E:\APP\ORADATA\WAAPP.DBF
channel ORA_DISK_1: restoring datafile 00123 to E:\APP\ORADATA\ORCLDB\PSMATVW.DBF
channel ORA_DISK_1: reading from backup piece E:\APP\BACKUP\ORCLDB_BKP_0VQT63ER_1_1
channel ORA_DISK_1: piece handle=E:\APP\BACKUP\ORCLDB_BKP_0VQT63ER_1_1 tag=FULL_ORCLDB
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:04:36
Finished restore at 19-FEB-15

Now lets try opening DB (I know it wont work but lets try..)

SQL> alter database open;
alter database open
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'E:\APP\ORADATA\ORCLDB\SYSTEM01.DBF'

Now lets Recover DB

RMAN> recover database;

Starting recover at 19-FEB-15
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 3390 is already on disk as file E:\APP\ORADATA\ORCLDB\REDO03.LOG
archived log for thread 1 with sequence 3391 is already on disk as file E:\APP\ORADATA\ORCLDB\REDO01.LOG
archived log for thread 1 with sequence 3392 is already on disk as file E:\APP\ORADATA\ORCLDB\REDO02.LOG
RMAN-08187: WARNING: media recovery until SCN 69107847 complete
Finished recover at 19-FEB-15

SQL> alter database open;
alter database open
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open


We must open the database in open resetlogs mode. in order for recover to be completed. (WE CAN DO OPEN RESET LOGS FROM RMAN or SQLPLUS PROMPT)
RMAN> alter database open resetlogs;

Statement processed

RMAN> exit

Recovery Manager complete.

Categories: DBA Blogs

Learning To Be Present

Pythian Group - Thu, 2016-02-25 10:06


“Realize deeply that the present moment is all you ever have.”  – Eckhart Tolle

You’re writing an email, reading instant messages, updating your LinkedIn status, listening to music, taking a call, and thinking about what you’re going to order for dinner tonight, all within the same five minutes.

You think, “Gosh I’m productive!” when in fact, you’re really quite the opposite. We now live in a world where information is at our finger tips and we are no longer spending countless hours reading and/or watching TV. Instead, the majority of us are getting lost in the vortex of surfing the web, and as a result it’s becoming increasingly challenging to remain present.

Being present took on a new meaning for me when I became a parent. Throughout his small but vast life to-date, my son has been inadvertently teaching me to keep me calm, clear, and focused on the present moment. Kids don’t care what time it is, or that you have an important call to make. They’re oblivious to your fast approaching project deadline, or that your favorite TV show is starting in five minutes. They exist in their own ‘now’ and generally focus on one moment and experience at a time…even if that moment or experience only lasts 60 seconds.

My eighteen-month-old son consistently reminds me of the importance of being truly present.  He requires me to be focused on one thing, and hearing what he is trying to express within the scope of his current 30 word vocabulary. He doesn’t waste time when he expresses his wants, needs, and feelings. He’ll call me out on my distractions when other people in my life won’t. He acts with purpose when he does things that he knows he shouldn’t, just to bring me back to the moment that we’re in. It’s a very effective method (to say the least), and has helped me gain perspective on being mindful, while also reminding me how important my time is with him. This valuable lesson has spilled over into my day-to-day life.

Now, when I’m on a call, or have one of my team members in my office, I’m listening and hearing them. I make a conscious effort to turn my computer monitor, turn off my email notifications and transfer my phone calls to voicemail until I’ve completed the task at hand. When I’m writing an email, I don’t reply to instant messages until I click ‘send’. When I’m hosting a meeting, I’m fully present in it, which often leads to a reasonably early wrap-up, and a beneficial gift of time to those in attendance.

I still have a distance to go in my efforts to master being present. My level of self-awareness and my relationship with my son are what give me the focus to continue to up my game. The next time you are juggling five tasks at once, challenge yourself to focus on only one, with a heartfelt intent to be truly present.

“Wherever you are, be all there.” – Jim Elliot

Categories: DBA Blogs

database switch over using dgmgrl

Learn DB Concepts with me... - Thu, 2016-02-25 09:59
Perform a switch over test:


SQL> select name,open_mode,db_unique_name ,DATABASE_ROLE from v$database;

--------- -------------------- ------------------------------ ----------------


SQL> select name,open_mode,db_unique_name ,DATABASE_ROLE from v$database;

--------- -------------------- ------------------------------ ----------------


DGMGRL> switchover to 'ORCLSTB1';
Performing switchover NOW, please wait...
Operation requires a connection to instance "ORCLSTB1" on database "ORCLSTB1"
Connecting to instance "ORCLSTB1"...
New primary database "ORCLSTB1" is opening...
Operation requires startup of instance "ORCLPRIM" on database "ORCL"
Starting instance "ORCLPRIM"...
Unable to connect to database
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Warning: You are no longer connected to ORACLE.

Please complete the following steps to finish switchover:
start up and mount instance "ORCLPRIM" of database "ORCL"


I have to start it manually coz dgmgrl was unable to connect to lsnr after role transfer.

SQL> startup mount;
ORACLE instance started.

Total System Global Area  885211136 bytes
Fixed Size    2258320 bytes
Variable Size  566233712 bytes
Database Buffers  310378496 bytes
Redo Buffers    6340608 bytes
Database mounted

SQL> select name,open_mode,db_unique_name ,DATABASE_ROLE from v$database;

--------- -------------------- ------------------------------ ----------------


SQL> select name,open_mode,db_unique_name ,DATABASE_ROLE from v$database;

--------- -------------------- ------------------------------ ----------------

Categories: DBA Blogs

Links for 2016-02-23 []

Categories: DBA Blogs

Partner Webcast - Oracle Digital Transformation: We know why, let’s talk about how

Digital Transformation is being talked about everywhere. It is the hot topic not only in IT Industry but across Industries! Why? Because those companies who master the transformation … digital...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Security in the age of cloud computing: How IT leaders are mitigating risks

Pythian Group - Tue, 2016-02-23 13:37


As the world becomes more connected, and more systems are moving to the cloud, security and data privacy are major considerations for IT leaders. But with the ever-present possibility of security threats, vulnerabilities and privacy exposures, how are today’s CIOs and IT leaders securing their systems?


According to a recent survey by the Cloud Security Alliance, despite concerns about the security of corporate data moving to the cloud, just 35.0 percent of IT leaders said that cloud-based systems of record are less secure than their on-premises counterparts. And 64.9 percent said that the cloud is either more secure than on-premises software or equally secure.


This is not to say that these same IT leaders don’t see security as a major concern as they evolve their systems onto the cloud. While businesses have to focus on innovation and agility to gain competitive advantage, the question of security has to be addressed alongside innovation. But the key is to address security without slowing the business down.


So what are IT leaders doing to secure their systems as they move forward with business and IT innovations? I had the opportunity to discuss this with IT leaders from RMS and Mozilla, during Pythian’s CIO panel discussion in November 2015.


Cory Isaacson, CTO at RMS has been working on bringing big data and scalable systems together to create a new cloud-based platform, and says his customers — some of the world’s largest insurance companies — are constantly concerned about threats to their data. This is an industry rife with concerns over privacy exposures because of the nature of data being handled. RMS runs catastrophe models for their customers, like scenarios that will tell insurance companies what a disaster like an earthquake or hurricane might cost them.


One of the biggest fears on the minds of Isaacson’s customers is about the security of their data. “The best idea is to not have anything that’s worth stealing. We’re looking at techniques that will keep the data encrypted from the user’s browser all the way into the database. If we can solve the data security issue simply by not having anything worth stealing, then that’s much better and much safer. Just take all the confidential and proprietary information and encrypt it end-to-end, and work with it on an encrypted basis,” he said.


RMS is betting on this encryption strategy for the longer term. But, it’s not an easy one to implement. Isaacson admits that it’s going to take some doing, and he hopes that after following ISO standards, going through quality gates and adhering to all of the industry prescribed protections and processes, that he will have some robust security in place.


Sean Rich, director of IT at Mozilla, is leading their application services group, and is facing the question of how to automate security within their day to day processes.  “Just like agile development found ways to build quality assurance into the process and DevOps found ways to build operations into the process, we now need a way to build greater security into the process. The definition of working software has evolved to include all three: quality, runtime and security,” said Rich.


Aaron Lee, the Chief Data Officer at Pythian, believes that we all need to think about automating security, just as we do with things like QA.  “When it comes to security, the cost of inaction is super high and the risk of inaction is super high,”  Lee said.


According to Lee, many IT leaders think the idea of automating security is not feasible. “I think the idea of depending on humans to do the same thing over and over again is nuts,” he said, referring to the manual effort that goes into security.


“The idea that a static security design can secure a dynamic engineering environment is an incredible mismatch,” he added.


Lee’s team at Pythian spends a lot of time with clients trying to figure out how to parse the regulatory requirements to automate as much as possible.


And Lee asserted that companies don’t always know what they’re getting into when they host their data with a third party.


“My favorite example is single tenant versus multi tenant. Single tenant sounds safer because all your data is in one place. But it’s all multi tenant in the end, you’re all sharing the building, you’re all sharing the earth so it’s hard to make that distinction. For a certain part of the SaaS industry, this is an important marketing distinction. But in reality, it’s meaningless. A data isolation strategy that might be implemented hypothetically could involve my business being a tenant on some multi tenant infrastructure in which I have a set of keys that I control and that are specific to my business. My third party doesn’t have access to those keys, and they are not shared by any other tenant. But in reality, are we all sharing the same set of keys?  And how are those keys and secrets managed? Are they sitting on a high risk security module somewhere on somebody’s laptop? Is it some combination of all that? When you start looking at what vendors promise from a security standpoint and what the industry talks about, it doesn’t always match reality,” he said.


These are the issues Pythian faces every day when assessing the vulnerability of a company’s IT systems.


Companies that are serious about security aren’t just checking compliance boxes to make auditors happy, according to Lee.  They’re getting experts like Pythian to address their issues in a constructive way.


“My observation is that the enterprise world at large has gone from needing to have people in place to check boxes to being at true risk of fundamental intellectual property leakage and for that matter, business continuity problems,” Lee said.


But most companies aren’t putting resources on automating. They’re hiring. And according to the Cloud Security Alliance survey, IT leaders see the biggest barrier to mitigating threats to their systems and data is not a limitation in security technology — it’s a human resource limitation. The survey results suggested that companies just can’t find enough security professionals to fill their hiring needs.


In keeping with that trend, The Wall Street Journal recently reported that JPMorgan expected to spend $500 million on cyber security in 2015, double its 2014 budget of $250 million.


While companies continue to spend their security budgets on hiring, Mozilla’s Sean Rich agrees that there’s a need for automation to deal with vulnerability.


“This need has driven transformation. We’ve moved from maintaining compliance to actually dealing with practical threats and with real consequences, looking at every single attack vector and how to mitigate them,” said Rich.


“Finding the answer to mitigating IT risks won’t be easy, and will continue to be costly,” said Pythian’s Aaron Lee.


“The best, most sophisticated zero day vulnerability attacks are utterly widespread in a month, so we all eventually get to feel the enormity of those attacks. Security is the biggest expense and it’s rising. It’s only going to continue to get worse because there are a lot of people who make a lot of money by hacking into systems, and they keep getting better at it.”

Categories: DBA Blogs

Oracle ADF - Getting started with Maven

As a developer you will find yourself working on commercial project which requires you to use build tools. If you want to have Continuous Integrated Delivery of your application on Oracle Cloud...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Links for 2016-02-22 []

Categories: DBA Blogs

Google Cloud Dataproc in ETL pipeline – Part 1 (Logging)

Pythian Group - Mon, 2016-02-22 11:16


Google Cloud Dataproc, now generally available, provides access to fully managed Hadoop and Apache Spark clusters, and leverages open source data tools for querying, batch/stream processing, and at-scale machine learning. To get more technical information on the specifics of the platform, refer to Google’s original blog post and product home page.


Having access to fully managed Hadoop/Spark based technology and powerful Machine Learning Library (MLlib) as part of Google Cloud Platform makes perfect sense as it allows you to reuse existing code and helps many to overcome the fear of being “locked into” one specific vendor while taking a step into big data processing in the cloud. That said, I would still recommend evaluating Google Cloud Dataflow first while implementing new projects and processes for its efficiency, simplicity and semantic-rich analytics capabilities, especially around stream processing.


When Cloud Dataproc was first released to the public, it received positive reviews. Many blogs were written on the subject with few taking it through some “tough” challenges on its promise to deliver cluster startup in “less than 90 seconds”. In general the product was well received, with the overall consensus that it is well positioned against the AWS EMR offering.


Being able, in a matter of minutes, to start Spark Cluster without any knowledge of the Hadoop ecosystem and having access to a powerful interactive shell such as Jupyter or Zeppelin is no doubt a Data Scientist’s dream. But with extremely fast startup/shutdown, “by the minute” billing and widely adopted technology stack, it also appears to be a perfect candidate for a processing block in bigger ETL pipelines. Orchestration, workflow engine, and logging are all crucial aspects of such solutions and I am planning to publish a few blog entries as I go through evaluation of each of these areas starting with Logging in this blog.


Cloud Dataproc Logging

Cluster’s system and daemon logs are accessible through cluster UIs as well as through SSH-ing to the cluster, but there is a much better way to do this. By default these logs are also pushed to Google Cloud Logging consolidating all logs in one place with flexible Log Viewer UI and filtering. One can even create custom log-based metrics and use these for baselining and/or alerting purposes. All cluster logs are aggregated under a “dataproc-hadoop” tag but “structPayload.filename” field can be used as a filter for specific log file.


In addition to relying on Logs Viewer UI, there is a way to integrate specific log messages into Cloud Storage or BigQuery for analysis. Just to get an idea on what logs are available by default, I have exported all Cloud Dataproc messages into BigQuery and queried new table with the following query:



structPayload.filename AS file_name,

count(*) AS cnt





AND metadata.labels.value = ‘cluster-2:205c03ea-6bea-4c80-bdca-beb6b9ffb0d6’




  • hadoop-hdfs-namenode-cluster-2-m.log
  • yarn-yarn-nodemanager-cluster-2-w-0.log
  • container_1455740844290_0001_01_000004.stderr
  • hadoop-hdfs-secondarynamenode-cluster-2-m.log
  • hive-metastore.log
  • hadoop-hdfs-datanode-cluster-2-w-1.log
  • hive-server2.log
  • container_1455740844290_0001_01_000001.stderr
  • container_1455740844290_0001_01_000002.stderr
  • hadoop-hdfs-datanode-cluster-2-w-0.log
  • yarn-yarn-nodemanager-cluster-2-w-1.log
  • yarn-yarn-resourcemanager-cluster-2-m.log
  • container_1455740844290_0001_01_000003.stderr
  • mapred-mapred-historyserver-cluster-2-m.log


Google Cloud Logging is a customized version of fluentd – an open source data collector for unified logging layer. In addition to system logs and its own logs, fluentd is configured (refer to /etc/google-fluentd/google-fluentd.conf on master node) to tail hadoop, hive, and spark message logs as well as yarn application logs and pushes them under “dataproc-hadoop” tag into Google Cloud Logging.

Application Logging

You can submit a job to the cluster using Cloud Console, Cloud SDK or REST API. Cloud Dataproc automatically gathers driver (console) output from all the workers, and makes it available through Cloud Console. Logs from the job are also uploaded to the staging bucket specified when starting a cluster and can be accessed from there.


Note: One thing I found confusing is that when referencing driver output directory in Cloud Dataproc staging bucket you need Cluster ID (dataproc-cluster-uuid), however it is not yet listed on Cloud Dataproc Console. Having this ID or a direct link to the directory available from the Cluster Overview page is especially critical when starting/stopping many clusters as part of scheduled jobs. One way to get dataproc-cluster-uuid and a few other useful references is to navigate from Cluster “Overview” section to “VM Instances” and then to click on Master or any worker node and scroll down to “Custom metadata” section. Indeed, you can also get it using “gcloud beta dataproc clusters describe <CLUSTER_NAME> |grep clusterUuid” command but it would be nice to have it available through the console in a first place.


The job (driver) output however is currently dumped into console ONLY (refer to  /etc/spark/conf/ on master node) and although accessible through Dataproc Job interface, it is not currently available in Cloud Logging.


The easiest way around this issue, which can be easily implemented as part of Cluster initialization actions, is to modify /etc/spark/conf/ by replacing  “log4j.rootCategory=INFO, console” with “log4j.rootCategory=INFO, console, file” and add the following appender:


# Adding file appender




log4j.appender.file.layout.conversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n


Existing Cloud Dataproc fluentd configuration will automatically tail through all files under /var/log/spark directory adding events into Cloud Logging and should automatically pick up messages going into /var/log/spark/spark-log4j.log.


You can verify that logs from the job started to appear in Cloud Logging by firing up one of the examples provided with Cloud Dataproc and filtering Logs Viewer using the following rule:




If after this change messages are still not appearing in Cloud Logging, try restarting fluentd daemon by running “/etc/init.d/google-fluentd restart” command on master node.

Once changes are implemented and output is verified you can declare logger in your process as:

import pyspark

sc = pyspark.SparkContext()

logger =

and submit the job redefining logging level (INFO by default) using “–driver-log-levels”.

Learn more here.

Categories: DBA Blogs

PRKO-2002 : Small Things, Large Shadows

Pythian Group - Mon, 2016-02-22 09:28

This quick and short blog post is a reminder to remember the syntax. I was just trying to bring down local listeners at half rack of Exadata (11gR2) to perform some maintenance. The best way to shutdown resources in RAC is to use srvctl so was trying to run following after setting Grid home as Oracle home, from where the local listeners were running:

[oracle@mytestexa1:~]$ srvctl stop listener LISTENER
PRKO-2002 : Invalid command line option: LISTENER

[oracle@mytestexa1:~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): mytestexa2,mytestexa4,mytestexa1,mytestexa3

Hmmmm. ok, then I tried this:

[oracle@mytestexa1:~]$ srvctl stop listener LISTENER -n mytestexa1
PRKO-2002 : Invalid command line option: LISTENER

Well, ummm ok, then I tried this one:

[oracle@mytestexa1:~]$ srvctl stop listener -n mytestexa1
[oracle@mytestexa1:~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): mytestexa2,mytestexa4,mytestexa3

And it worked. Yes, if the default listener name is not in use, then -l listener_name should be used.

So need to remember the syntax :) . Small things, but in emergencies they cast large shadows for a while.

Categories: DBA Blogs

A View From Down Under: What IT Innovation and Agility Mean to IT Pros in Australia

Pythian Group - Mon, 2016-02-22 06:38

The impact of today’s disruptive technologies on how we do business led Pythian to introduce Velocity of Innovation, a series of panel discussions for senior IT professionals. These are Pythian’s exclusive thought-leadership events where a moderator-led panel engages in lively conversations around today’s emerging technologies.

We recently took our Velocity of Innovation event series to Sydney, Australia to find out what’s on the minds of CIOs and other IT leaders in that region. Our moderator for this event was Tom McCann, senior customer experience analyst with Forrester. You may be thinking: Why would a customer experience analyst be moderating a CIO panel? But here’s why we chose to put a customer experience expert at the centre of this discussion. With the commoditization of products and services, delivering a quality customer experience is one of the few ways that companies now have of truly differentiating themselves from their competitors. From attracting prospects, to acquiring customers, to onboarding and ongoing support, companies need to have the right tools and processes in place to ensure products or services perform as promised, and that every interaction customers have with them is satisfying. They also need to be able to use data to respond to customer needs and market trends. IT plays a critical role in of all of these areas.

The panel consisted of IT leaders with a range of backgrounds and experiences:

  • Francisco Alvarez, vice president, APAC at Pythian
  • Chris Mendes, executive consultant big data and analytics from Industrie IT
  • Tim Sheedy, principal analyst with Forrester Research

As we do to start off many of these events, we began the Sydney discussion by asking this question: Innovation and agility are very common buzzwords that seem to describe what everyone wants. What have you explicitly done to get better at innovating or to make your organization or your customer’s more agile? How has this push for agility impacted your enterprise architecture?

Here’s an excerpt from the discussion that followed.

Chris: I actually don’t think there’s enough innovation in Australia, in particular. There’s a lot of talk about it, people are actually doing a lot of experiments and there are some companies who’ve set up business purely giving them the tool sets to use their data to innovate. You’ve got a few things that seem to be working against that at the moment and I think one of the things working against it is that it doesn’t stand on its own.

It doesn’t matter what the technology is, and in fact what ends up happening is it all comes down to the “Why?” Because you really need to have the technologists interacting with business people so that when they come up with an idea they get immediate feedback. I’ve found that a lot of businesses don’t have time for innovation. They run pretty lean, they are very focused on day-to-day operations and don’t have time to come up with new ideas or to use their data to actually innovate. There are a lot of roadblocks there. I don’t see it as a technology problem, I actually see one of the roadblocks as a management issue. The other one is sort of the culture of no failure, and I think we’ve got a big issue with that in Australia.

Tim: As you are saying, I’ve presented a lot on innovation and Forrester actually has a really good innovation model. We spoke to about 130 companies around the globe who did bits of innovation well and we took all those bits and made them into an end-to-end process that can be used by our clients. We learned a lot from putting this together. We spoke to Telstra in Australia and did the piece on deciding what to innovate with, and where to invest. Other companies told us about how they come up with ideas. I speak to a lot of clients who told us that they had an innovation program in place where they gathered ideas from staff. But now those employees don’t trust their companies anymore because they never got back to them on how they would implement their ideas.

Chris: I think the other thing is I think there’s also a very big misunderstanding at board levels about innovation because boards are there to actually stop you changing your business. The fundamental tenant is: “We’ve got a great business model here, it’s running well, we’ve got to make sure that any change to it doesn’t damage that.” There’s a natural caution at board levels and it’s totally understandable.

Audience: One of the issues I guess that you just raised is that the boards are risk adverse, however if their business is failing, then there is obviously a need for them to do something about it.

Tim: But that’s a bad time to be innovating, right? When everything is going wrong, “We should try to do things differently.” The market’s probably left you behind by that point.

Francisco: The main problem that most of you have hit upon is that innovations equal risk. But take a step back and look at the companies that are really doing well in the market. They’re doing really well because of one factor that differentiates them: they were not afraid to try to innovate. And because of that innovation they are getting their share of the market and gaining ground. Just look at the financial market. CBA was considered crazy a few years ago for all the investment they were doing in technology, social media, apps and so on. They got ahead. And now everybody is trying to do the same.The problem is they need to have a balance. Even with choosing vendors, companies will avoid risk. The will keep going with IBM because they don’t want to take a risk. Or they keep Oracle and let Oracle do everything for them because there might be a risk with moving to another vendor. If they already own a technology, it doesn’t matter if they are not getting good service. They think that for risk capacity, there is insurance. Sometimes you’re not saving money or not improving, but you don’t want to innovate simply because you don’t want to increase the risk to the business.

Tim: Innovation is risk management, this is it. The discussion went further on this topic, and explored areas like the future of IT, security and more.

Interested in being a part of a discussion like this one? Velocity of Innovation is a series of thought-leadership events for senior IT management hosted by Pythian. Pythian invites leading IT innovators to participate in discussions about today’s disruptive technologies: big data, cloud, advanced analytics, DevOps, and more. These events are by invitation only.

If you are interested in attending an upcoming Velocity of Innovation event in a city near you, please contact To view our schedule of upcoming events visit out Velocity of Innovation page.

Categories: DBA Blogs

Compression -- 1 : BASIC Table Compression

Hemant K Chitale - Sun, 2016-02-21 08:30
Now, starting a new series of blog posts on Compression.

Beginning with  BASIC Table Compression.

Basic Table Compression was introduced in 9i Release 2 (9.2.0).  It is free with the Enterprise Edition.  Basic Compression works only with Direct Path (Bulk Load) INSERTs.  It does not apply to "normal" INSERT and UPDATE operations.
Basic Compression is actually "de-duplication".  It identifies values that are repeated within the same database block and replaces them with tokens mapped to the list of values.   Note the section that I have underlined.  De-duplication does not span database blocks.

Here is a first demo of Basic Compression using INSERT /*+ APPEND */  (for Direct Path Insert).

I start with a table that is a multiplied copy of DBA_OBJECTS, named as SOURCE_DATA.

PDB1@ORCL> show user
PDB1@ORCL> select count(*) from source_data;


PDB1@ORCL> select tablespace_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name = 'SOURCE_DATA';

------------------------------ ------------------ -------------


I then create table to hold compressed data.  Note (let me repeat again : Basic Compression works only with DIRECT PATH INSERT).

PDB1@ORCL> create table compressed_1 tablespace hemant compress as select * from source_data where 1=2;

Table created.

PDB1@ORCL> insert /*+ APPEND */ into compressed_1
2 select * from source_data;

364496 rows created.

PDB1@ORCL> commit;

Commit complete.

PDB1@ORCL> col segment_name format a30
PDB1@ORCL> select segment_name, segment_type, bytes/1048576
2 from user_Segments
3 where tablespace_name = 'HEMANT';

------------------------------ ------------------ -------------


So, a 49MB table is compressed down to 12MB.  What if I UPDATE these rows ?

PDB1@ORCL> update compressed_1
2 set owner = owner;

364496 rows updated.

PDB1@ORCL> commit;

Commit complete.

PDB1@ORCL> select segment_name, segment_type, bytes/1048576
2 from user_segments
3 where tablespace_name = 'HEMANT';

------------------------------ ------------------ -------------


Updating only a single column without changing the length of the data in that column has increased the size of the table.

Let me continue the UPDATE experiment further, without increasing the length of data in any columns.

PDB1@ORCL> update compressed_1
2 set owner = owner, object_name = object_name, subobject_name = subobject_name, object_id = object_id,
3 data_object_id = data_object_id, object_type = object_type, created = created
4 /

364496 rows updated.

PDB1@ORCL> commit;

Commit complete.

PDB1@ORCL> select segment_name, segment_type, bytes/1048576
2 from user_segments
3 where tablespace_name = 'HEMANT';

------------------------------ ------------------ -------------


No further degradation ? (unless my UPDATE expands the size of data in one or more column(s)).

Note a difference between the two tables :

PDB1@ORCL> select table_name, compression, compress_for, pct_free
2 from user_tables;

------------------------------ -------- ------------------------------ ----------


The Compressed table is created with PCT_FREE=10.  PCT_FREE=0  (Note that this can also have an impact when you issue UPDATEs subsequently, UPDATEs that increase the size of data in one or more columns).

UPDATE : In 12c the syntax for BASIC Compression is "ROW STORE COMPRESS BASIC". Simply specifying "COMPRESS" or "COMPRESS BASIC" is supported for backward compatibility.


Categories: DBA Blogs