Feed aggregator

Oracle Security Training In York

Pete Finnigan - Wed, 2016-04-27 14:50

We ran a five day Oracle Security training event in York, England from September 21st to September 25th at the Holiday Inn hotel. This proved to be very successful and good fun. The event included back to back teaching by....[Read More]

Posted by Pete On 22/10/15 At 08:49 PM

Categories: Security Blogs

New Presentation - Building Practical Oracle Audit Trails

Pete Finnigan - Wed, 2016-04-27 14:50

I wrote a presentation on designing and building practical audit trails back in 2012, presented it once, and then never again. By chance I did not post the PDFs of these slides at that time. I did though some....[Read More]

Posted by Pete On 01/10/15 At 05:16 PM

Categories: Security Blogs

Protect Your APEX Application PL/SQL Source Code

Pete Finnigan - Wed, 2016-04-27 14:50

Oracle Application Express is a great rapid application development tool where you can write your application's functionality in PL/SQL and create the interface easily in the APEX UI using all of the tools available to create forms and reports and....[Read More]

Posted by Pete On 21/07/15 At 04:27 PM

Categories: Security Blogs

How to recover space from already deleted files

Pythian Group - Wed, 2016-04-27 13:15

Wait, what? Deleted files are gone, right? Well, not so if they’re currently in use, with an open file handle by an application. In the Windows world, you just can’t touch it, but under Linux (if you’ve got sufficient permissions), you can!

In the systems administration and site reliability engineering world, we often encounter a reported disk space issue where there’s very little we can do to recover the space. Everything is critically important! We then check for deleted files and find massive amounts of space consumed because someone has previously deleted Catalina, Tomcat, or Weblogic log files while Java had them in use, and we can’t restart the processes to release the handles due to the critical nature of the service. Conundrum!

Here at Pythian, we Love Your Data, so I thought I’d share some of the ways we deal with situations like this.

How to recover

First, we grab a list of PIDs with files still open, but deleted. Then iterate over the open file handles, and null them.

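# PIDs ($2) holding deleted files whose size ($7, SIZE/OFF) is non-zero
PIDS=$(lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | sort -u)
# List the deleted file handles of each of those processes
for PID in $PIDS; do ls -l /proc/$PID/fd | grep deleted; done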

With great care, this could be scripted to null all deleted files automatically.
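As a rough illustration of what such a script might look like, here is a minimal sketch that reads the /proc file descriptor links directly instead of parsing lsof output. The helper names list_deleted_fds and truncate_fd are made up for this example, it assumes a Linux /proc filesystem, and the truncation is a dry run unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the original post.

# Print "FD TARGET" for every descriptor of PID whose link target is marked deleted.
list_deleted_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Descriptors can vanish between the glob and the readlink; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}

# Truncate one descriptor of a process. Dry run by default; set DRY_RUN=0 to act.
truncate_fd() {
    pid=$1 fd=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would truncate /proc/$pid/fd/$fd"
    else
        # Same effect as: cat /dev/null > /proc/PID/fd/N
        : > "/proc/$pid/fd/$fd"
    fi
}
```

A cautious way to use it is to review the output of list_deleted_fds for one PID first, then call truncate_fd only for the descriptors you have checked by hand.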

Worked example

1. Locating deleted files:

[root@importantserver1 usr]# lsof | head -n 1 ; lsof | grep -i deleted
 vmtoolsd  2573  root   7u  REG  253,0  9857     65005  /tmp/vmware-root/appLoader-2573.log (deleted)
 zabbix_ag 3091  zabbix 3wW REG  253,0  4        573271 /var/tmp/ (deleted)
 zabbix_ag 3093  zabbix 3w  REG  253,0  4        573271 /var/tmp/ (deleted)
 zabbix_ag 3094  zabbix 3w  REG  253,0  4        573271 /var/tmp/ (deleted)
 zabbix_ag 3095  zabbix 3w  REG  253,0  4        573271 /var/tmp/ (deleted)
 zabbix_ag 3096  zabbix 3w  REG  253,0  4        573271 /var/tmp/ (deleted)
 zabbix_ag 3097  zabbix 3w  REG  253,0  4        573271 /var/tmp/ (deleted)
 java      23938 tomcat 1w  REG  253,0  0        32155  /opt/log/tomcat/catalina.out (deleted)
 java      23938 tomcat 2w  REG  253,0  45322216 32155  /opt/log/tomcat/catalina.out (deleted)
 java      23938 tomcat 9w  REG  253,0  174      32133  /opt/log/tomcat/catalina.2015-01-17.log (deleted)
 java      23938 tomcat 10w REG  253,0  57408    32154  /opt/log/tomcat/localhost.2016-02-12.log (deleted)
 java      23938 tomcat 11w REG  253,0  0        32156  /opt/log/tomcat/manager.2014-12-09.log (deleted)
 java      23938 tomcat 12w REG  253,0  0        32157  /opt/log/tomcat/host-manager.2014-12-09.log (deleted)
 java      23938 tomcat 65w REG  253,0  363069   638386 /opt/log/archive/athena.log.20160105-09 (deleted)

2. Grab the PIDs:

[root@importantserver1 usr]# lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq

3. Show the deleted files that each process still has open (and is consuming space):

[root@importantserver1 usr]# export PIDS=$(lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq)
[root@importantserver1 usr]# for PID in $PIDS; do ll /proc/$PID/fd | grep deleted; done
 lrwx------ 1 root root 64 Mar 21 21:15 7 -> /tmp/vmware-root/appLoader-2573.log (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/ (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/ (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/ (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/ (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/ (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/ (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 1 -> /opt/log/tomcat/catalina.out (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 10 -> /opt/log/tomcat/localhost.2016-02-12.log (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 11 -> /opt/log/tomcat/manager.2014-12-09.log (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 12 -> /opt/log/tomcat/host-manager.2014-12-09.log (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 2 -> /opt/log/tomcat/catalina.out (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 65 -> /opt/log/archive/athena.log.20160105-09 (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 9 -> /opt/log/tomcat/catalina.2015-01-17.log (deleted)

4. Null the specific files (here, we target the catalina.out file):

[root@importantserver1 usr]# cat /dev/null > /proc/23938/fd/2

Alternative ending

Instead of deleting the contents to recover the space, you might be in the situation where you need to recover the contents of the deleted file. If the application still has the file descriptor open on it, you can then recover the entire file to another one (dd if=/proc/23938/fd/2 of=/tmp/my_new_file.log) – assuming you have the space to do it!


While it’s best not to get into this situation in the first place, you’ll sometimes find yourself cleaning up after someone else’s good intentions. Now, instead of trying to find a window of “least disruption” to the service, you can recover the situation nicely. Or, if the alternative solution is what you’re after, you’ve recovered a file that you thought was long since gone.

Categories: DBA Blogs

Deploy Docker containers using AWS Opsworks

Pythian Group - Wed, 2016-04-27 12:51

This post is about how to deploy Docker containers on AWS using Opsworks and Docker Compose.
AWS and Docker need no introduction. So, let’s quickly introduce Opsworks and Docker Compose.


Opsworks is a great tool provided by AWS, which runs Chef recipes on your Instances. If the instance is an AWS instance, you don’t pay anything for using Opsworks, but you can also manage instances outside of AWS with a flat cost just by installing the Agent and registering the instance on Opsworks.

Opsworks instance types

We have three different types of instances on Opsworks:

1. 24x7x365
Run without stopping.

2. Time based
Run at predefined times, such as work hours.

3. Load based
Scale up and down according to preconfigured metrics.

You can find more details here.

Custom JSON

Opsworks provides Chef Databags (variables to be used in your recipes) via Custom JSON, and that’s the key to this solution. We will manage everything just by changing a JSON file. This file can easily become part of your development pipeline.

Life cycle

Opsworks has five life cycle events:
1. Setup
2. Configure
3. Deploy
4. Undeploy
5. Shutdown
We will use setup, deploy, and shutdown. You can find more details about Opsworks life cycle here.

Docker Compose

Docker Compose was originally developed under the Fig project. Nowadays, Fig is deprecated, and docker-compose is a built-in component of Docker.
Using docker-compose, you can manage all containers and their attributes (links, shared volumes, etc.) in a Docker host. Docker-compose can only manage containers on the local host where it is deployed. It cannot orchestrate Docker containers between hosts.
All configuration is specified inside a YML file.

Chef recipes

Using Opsworks, you will manage all hosts using just one small Chef cookbook. All the magic is in translating the Custom JSON from Opsworks into a YML file to be used by docker-compose.
The cookbook will install all components (Docker, pip, and docker-compose), translate the Custom JSON into a YML file, and send commands to docker-compose.
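The cookbook's actual output is not shown in this post, so the following is only a sketch of the kind of YML file the translation step could produce and the docker-compose commands that would then act on it. The helper name write_compose_file, the service name web, and the httpd image are illustrative assumptions, not taken from the cookbook.

```shell
#!/bin/sh
# Sketch only: names and values below are illustrative assumptions.

# Write a minimal compose file with one service that maps a host port to port 80.
write_compose_file() {
    out=$1 image=$2 port=$3
    cat > "$out" <<EOF
version: '2'
services:
  web:
    image: $image
    ports:
      - "$port:80"
EOF
}

# Example usage (requires Docker and docker-compose on the host):
#   write_compose_file docker-compose.yml httpd:2.4 80
#   docker-compose -f docker-compose.yml up -d   # create and start the containers
#   docker-compose -f docker-compose.yml ps      # list the containers
#   docker-compose -f docker-compose.yml stop    # stop them, e.g. at shutdown
```

Keeping the generated file this small is what makes it practical to drive everything from one JSON document: each value in the YML maps to one value in the Custom JSON.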

Hands on

Let’s stop talking and see things happen.

We can split it into five steps:

  1. Resources creation
    1. Opsworks Stack
        1. Log into your AWS account
        2. Go to Services -> Management Tools -> Opsworks
          Accessing Opsworks menu
        3. Click on Add stack (if you already have stacks on Opsworks) or Add your first stack (if it’s the first time you are creating stacks on Opsworks)
        4. Select type Chef 12 stack
          Note: The Chef cookbook used in this example only supports Chef 12
        5. Fill out stack information
          – You can use any name as stack name
          – Make sure the selected VPC is properly configured
          – This solution supports Amazon Linux and Ubuntu
          – Repository URL
        6. Click on advanced if you want to change something. Changing “Use OpsWorks security groups” to No can be a good idea when you need to communicate with instances which are running outside of Opsworks
        7. Click on “Add stack”
    2. Opsworks layer
        1. Click on “Add a layer”
        2. Set Name, Short name and Security groups. I will use webserver
          Note: Use a simple name because we will use this name in the next steps
          Note: The name web is reserved for AWS internal use
        3. Click on “Add layer”


    3. Opsworks Instance
        1. Click on “Instances” on the left panel
        2. Click on “Add an instance”
        3. Select the size (instance type)
        4. Select the subnet
        5. Click on “Add instance”


  2. Resources configuration
    1. Opsworks stack
        1. Click on “Stack” on the left panel
        2. Click on “Stack Settings”
        3. Click on “Edit”
        4. Find the Custom JSON field and paste the content of the file below


        5. Click on “Save”
    2. Opsworks layer
        1. Click on “Layers” on the left panel
        2. Click on “Recipes”
        3. Type docker-compose and press Enter in the Setup field
        4. Type docker-compose::deploy and press Enter in the Deploy field
        5. Type docker-compose::stop and press Enter in the Shutdown field
        6. Click on “Save”


  3. Start
    1. Start instance
        1. Click on start


  4. Tests
    Note: Wait until the instance reaches the online state

      1. Open your browser and you should be able to see “It works!”
      2. Check the running containers


  5. Management
      1. Change the Custom JSON to the file below (see Resources configuration => Opsworks stack)


      2. Click on “Deployments” on the left panel
      3. Click on “Run Command”
      4. Select “Execute Recipes” as “Command”
      5. Type “docker-compose::deploy” as “Recipes to execute”
      6. Click on “Execute Recipes”

    Note: Wait until the deployment finishes

      7. Check the running containers


Categories: DBA Blogs

Stats History

Jonathan Lewis - Wed, 2016-04-27 06:09

From time to time we see a complaint on OTN about the stats history tables being the largest objects in the SYSAUX tablespace and growing very quickly, with requests about how to work around the (perceived) threat. The quick answer is – if you need to save space then stop holding on to the history for so long, and then clean up the mess left by the history that you have captured; on top of that you could stop gathering so many histograms because you probably don’t need them, they often introduce instability to your execution plans, and they are often the largest single component of the history (unless you are using incremental stats on partitioned objects***).

For many databases it’s the histogram history – using the default Oracle automatic stats collection job – that takes the most space. Here’s a sample query that the sys user can run to get some idea of how significant this history can be:

SQL> select table_name , blocks from user_tables where table_name like 'WRI$_OPTSTAT%HISTORY' order by blocks;

TABLE_NAME                           BLOCKS
-------------------------------- ----------
WRI$_OPTSTAT_AUX_HISTORY                 80
WRI$_OPTSTAT_TAB_HISTORY                244
WRI$_OPTSTAT_IND_HISTORY                622

5 rows selected.

As you can see the “histhead” and “histgrm” tables (histogram header and histogram detail) are the largest stats history tables in this (admittedly very small) database.

Oracle gives us a couple of calls in the dbms_stats package to check and change the history setting, demonstrated as follows:

SQL> select dbms_stats.get_stats_history_retention from dual;


1 row selected.

SQL> execute dbms_stats.alter_stats_history_retention(7)

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;


1 row selected.

Changing the retention period doesn’t reclaim any space, of course – it simply tells Oracle how much of the existing history to eliminate in the next “clean-up” cycle. This clean-up is controlled by a “savtime” column in each table:

SQL> select table_name from user_tab_columns where column_name = 'SAVTIME' and table_name like 'WRI$_OPTSTAT%HISTORY';


5 rows selected.

If all you wanted to do was stop the tables from growing further you’ve probably done all you need to do. From this point onwards the automatic Oracle job will start deleting the oldest saved stats and re-using space in the existing table. But you may want to be a little more aggressive about tidying things up, and Oracle gives you a procedure to do this – and it might be sensible to use this procedure anyway at a time of your own choosing:

SQL> execute dbms_stats.purge_stats(sysdate - 7);

Basically this issues a series of delete statements (including a delete on the “stats operation log (wri$_optstat_opr)” table that I haven’t previously mentioned) – here’s an extract from an 11g trace file of a call to this procedure (output from a simple grep command):

delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_tab_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_ind_history h        where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_aux_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_opr                  where start_time < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history     where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histgrm_history      where savtime < :1 and rownum <= NVL(:2, rownum)

Two points to consider here: although the appearance of the rownum clause suggests that there’s a damage limitation strategy built into the code I only saw one commit after the entire delete cycle, and I never saw a limiting bind value being supplied. If you’ve got a large database with very large history tables you might want to delete one day (or even just a few hours) at a time. The potential for a very long, slow, delete is also why you might want to do a manual purge at a time of your choosing rather than letting Oracle do the whole thing on auto-pilot during some overnight operation.
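One way to purge a day at a time is to generate a short SQL*Plus script that steps the cut-off forward from the oldest history to the target retention. The sketch below does only that; the helper name gen_purge_script is made up for this example, and the generated calls would still need to be run by a suitably privileged user at a time of your choosing.

```shell
#!/bin/sh
# Sketch only: gen_purge_script is a hypothetical helper name.

# Emit one dbms_stats.purge_stats call per day, oldest cut-off first,
# so that each call has roughly one day of history to delete.
# $1 = age in days of the oldest history, $2 = target retention in days
gen_purge_script() {
    d=$1
    while [ "$d" -ge "$2" ]; do
        printf 'execute dbms_stats.purge_stats(sysdate - %d);\n' "$d"
        d=$((d - 1))
    done
}

# Example: step from 31 days of history down to 7
#   gen_purge_script 31 7 > purge_stats.sql
```

Fractional values of sysdate could be used in the same way if even one day at a time turns out to be too large a delete.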

Secondly, even though you may have deleted a lot of data from these tables you still haven’t reclaimed the space – so if you’re trying to find space in the sysaux tablespace you’re going to have to rebuild the tables and their indexes. Unfortunately a quick check of v$sysaux_occupants tells us that there is no official “move” procedure:

SQL> execute print_table('select occupant_desc, move_procedure, move_procedure_desc from v$sysaux_occupants where occupant_name = ''SM/OPTSTAT''')

OCCUPANT_DESC                 : Server Manageability - Optimizer Statistics History
MOVE_PROCEDURE                :

So we have to run a series of explicit calls to alter table move and alter index rebuild. (Preferably not when anyone is trying to gather stats on an object). Coding that up is left as an exercise to the reader, but it may be best to move the tables in the order of smallest table first, rebuilding indexes as you go.


*** Incremental stats on partitioned objects: I tend to assume that sites which use partitioning are creating very large databases and have probably paid a lot more attention to the details of how to use statistics effectively and successfully; that’s why this note is aimed at sites which don’t use partitioning and therefore think that the space taken up by the stats history is significant.

Server Problems : Any ideas?

Tim Hall - Wed, 2016-04-27 02:54

I’m pretty sure last night’s problem was caused by a disk failure in the RAID array. The system is working now, but it might go down sometime today to get the disk replaced. Hopefully they won’t do what they did last time and wipe the bloody lot!

metered vs un-metered vs dedicated services

Pat Shuff - Wed, 2016-04-27 01:07
One of the newest concepts that has been introduced for cloud services is the concept of un-metered or dedicated services. Before we dive into this subject, let's review what a cloud service really is. When you boil it all down, you are basically leasing computer resources on a computer that you don't own. You are taking a slice of a compute engine, slice of a disk drive, part of a network connection. You are renting space. Think of it as living in an apartment. Yes, this is a silly analogy but if you think about it, it makes sense. You can rent an efficiency, one, two, or three bedroom apartment. You can get parking with or without a roof over your car. You can get a storage closet or a garage to store stuff that you are not using but want to keep around.

There are benefits to apartments. You don't have to cut the grass. You typically have access to a pool but don't need to maintain it. If the toilet backs up or the gas stops working you call the super and they come fix it. You still have to replace your own light bulbs that burn out. You still need to clean your own bathroom and kitchen and take out your trash on a regular basis. On the grander scale, you don't need to drop 10% down and get a mortgage to live there. Monthly rents are typically cheaper than paying down a mortgage. Your taxes are bundled into your rent cost. You basically show up, use the apartment and go on with your life.

On the flip side, there are drawbacks to apartments. If your upstairs neighbor likes to play heavy metal at 2am or throw wild parties on the weekends it does make it hard to sleep. Someone might park in your parking spot so you need to park farther away from your front door. You can't pull into your garage to unload your groceries and have to potentially carry them in the rain across the parking lot and up the stairs to your third floor apartment. The super might decide that Tuesday they are going to repaint all bathrooms another color and you need to be out of the way for a day and put up with the smell even though you planned a dinner party the next night. It is difficult to grill on your balcony and you can't really sit out without sharing the space with all of your neighbors.

The true downfall is that twenty years from now, you will still be renting your apartment (and the rent probably went up every other year) while your college buddies are celebrating a mortgage burning party and the only thing that they owe on a monthly basis is the taxes that the government takes annually.

Yes, our analogy is silly. Yes, our analogy is relevant. It is easy to decide that you want another job in another city so you hire a mover, pack up all your stuff, and move to another apartment. This is where our analogy breaks down. Cloud vendors charge you for every piece of furniture that you take out of the building. They charge you to use the stairs or elevator. They charge you every time a moving van exits the building full of furniture and boxes of clothes. It is free to bring stuff in because it locks you into the apartment. Just don't try to take anything out. Remember that storage closet or garage that you got with your apartment, you can open the door and put stuff in for free but if you carry anything out (even if you just relocate it to your apartment) you get charged per item that you carry across the threshold.

If you look at storage from any cloud vendor they offer a metered storage service. The same is true for compute services. You can lease a virtual processor and memory and grind on data all that you want. The catch is when you want to transfer your files or report results of your analysis to your desktop computer, you get charged per gigabyte transferred across the internet. Cost calculators help you calculate these costs, but outbound data charges are a little hard to estimate. Amazon, for example, has calculators that you can use. The AWS pricing calculator allows you to look at the cost of all cloud services.

Let's walk through the cost of Amazon Glacier. The price list says that you should pay $0.007/GB/month or $7/TB/month to keep things in cold storage. We will use 120 TB as our basis for analysis. We put this as the amount to store and see the cost of storing the data is $860/month.
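The storage part of that figure is simple arithmetic, sketched below. It assumes 1 TB is counted as 1024 GB, which is what makes the $860 come out, and it covers storage only, not retrieval or outbound transfer. The helper name monthly_storage_cost is made up for this example.

```shell
#!/bin/sh
# Sketch only: monthly_storage_cost is a hypothetical helper name.

# $1 = terabytes stored, $2 = price in dollars per GB per month
# Assumes 1 TB = 1024 GB.
monthly_storage_cost() {
    awk -v tb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", tb * 1024 * price }'
}

# Example: 120 TB at $0.007/GB/month
#   monthly_storage_cost 120 0.007    # prints 860.16
```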

If we plan on reading back 10% of this data during the month the price goes up to $2217. The bulk of these charges are the outbound charges. The cost goes to $921 if we read the data to an EC2 instance and not all the way back to our data center or desktop computer. To use our apartment analogy, you are paying $860 to get a storage garage. You pay $61 every time you take something out and move it to your apartment. You can put all you want into the storage area (as long as you don't exceed the space of the storage unit) but taking something out will cost you. If you put your recently retrieved item in your car or a truck and drive it out the gate you get a surcharge of $1300. It is important to remember that pulling more stuff out of your storage will cost you more. Putting this in terms of computer archive, you can store all of your emails, contracts, customer transactions, patient records in long term storage. If your on-site storage fails for some reason or if you get a legal request to review five years of data, you can pull the data back from cold storage. It will cost you to pull the data back but it is still cheaper than keeping seven or more years of data on spinning disk in your data center (estimate $3K/TB plus 10% per year for spinning disk in your data center).

We can do the same calculation for cloud storage using S3. We can store 120 TB for roughly $3950/month. If we want to read back 10%, or 12 TB, of that data, it will cost us $5150 or $1200 additional.

We can reduce the cost by using lower speed storage in the cloud. We put the S3 data into the infrequent category to save money. This drops the cost to just over $3K which does save us about $2100/month. We agree to pay a lower cost to get higher latency and longer retrieval times. It is better than using tape in the cloud and we can save some money with this option.

We can opt for reduced redundancy storage (aka non-mirrored and non-replicated data) but we risk data loss since we will only have a single copy in the cloud. This drops the cost to $4300 with the data retrieval but we have to weigh the cost vs data loss risks.

Let's not pick on Amazon. How does this compare to Azure? Unfortunately, we can't start with Microsoft tape in the cloud, they don't offer the service. We must start with block storage in the Azure cloud. Microsoft has an Azure pricing calculator that you can use to perform the same calculations. The calculator and pricing is a little difficult to use when you first get into it. You basically need to put together the calculation a piece at a time. You need to factor in the cost of the storage and the cost of transferring the data from the cloud to your data center. This is done in two different pieces. An example of what we are looking for can be seen below.

We need to piece together the calculator. First we add the storage component then the bandwidth component. There is a transaction component but this amount is trivial and we are going to ignore it for simplicity.

If we look at the options for Azure storage, we can basically select blob storage in different zones. In the grand scheme of things, the cost is not significantly higher one way or another. The basic cost is about the same.

The third class of storage that we are going to look at is the Oracle Storage Cloud Service. We can look at Oracle Storage Cloud Service as well as Oracle Archive Cloud Service. The Archive service compares directly to the Amazon Glacier service except that it is $1/TB/month and suffers the same transfer charges for outbound data. The Oracle Storage Cloud Service is similar to Amazon S3 and Azure Blob Storage but it is offered both as a metered service (as are S3 and Blobs) and as an un-metered service. Unfortunately, Oracle does not provide a cost calculator for general use. The Value Added Distributors are given a copy of the calculator but it is not generally available. The key difference with the Oracle storage services is that they come in two significantly different flavors: metered and non-metered. The metered services are charged just like the Microsoft and Amazon services. You pay for what you use on a per GB basis and pay for outbound data transfer. An example of the pricing calculator is shown below. Note that we do need to have a good estimate of how much data we will transfer outbound across the internet. These charges are not incurred if you are reading the data to a compute engine in the cloud, unlike S3, which still charges just for reading the data off the disk.

The most significant differential in storage offerings is the non-metered storage. Oracle offers storage in blocks that you reserve and allocate for 12 months. This is different from the metered storage in that metered can start with 10 TB and grow to 120 TB over the year. With the non-metered storage, you start with 120 TB and end with 120 TB. You can extend your contract and grow storage but you basically extend a new contract for more storage. You can not shrink your storage and pay for less. The benefit of this is that you don't have to pay for outbound data transfer. You can read and write as much as you want and not get charged for transferring the data across the internet. A pricing calculator for this is simple. How much do you need and how long do you need it?

If we piece all of this together and look at a price comparison between the three service providers, the answer to which is cheapest comes down to "it depends". Oracle non-metered storage has a significant advantage if you are planning on reading back your data at high or unpredictable rates. Amazon S3 infrequent is the cheapest if you don't plan on reading back your data and want it as an insurance policy only. I honestly would go with Glacier or Oracle Archive if this is the case since it is an order of magnitude cheaper. The chart below compares 120 TB of storage and the variable charge for reading back this data on a monthly basis. If you have 120 TB of storage and plan on reading back 120 TB on a monthly basis, the Oracle non-metered storage is significantly cheaper. If you are only planning on reading back 12 to 24 TB per month the cost is about the same for all of the services.

In summary, one option is not clearly better than the other (except for high read rates) and this blog is intended to help you decide on what fits your needs best. Pricing calculators can help with the cost based on transfer rates. It is important to remember that storage transfer is a significant part of the calculation. It is also important to look at your usage model. We assumed that you started with 120 TB and ended with 120 TB for our analysis. If you start with 12 TB and grow to 120 TB, the pricing calculation will be a little different. Neither the Amazon nor Azure calculators will help you run this simulation and you will have to calculate everything on a month by month basis. It is also interesting to take 120 TB of on-premise storage and assume that each TB can be purchased at $3K/TB. If we assume 10% annual hardware maintenance and a three year amortization, the charge for on-premise storage works out to about $13,000/month ($360K of hardware over 36 months is $10,000/month, plus $36K/year of maintenance is $3,000/month), which might be more or less than cloud based storage. Your results might vary.

Percona Live Data Performance Conference 2016 Retrospective

Pythian Group - Tue, 2016-04-26 07:39


Last week the annual Percona Live Data Performance Conference was held in Santa Clara, California. This conference is a great time to catch up with the industry, and be exposed to new tools and methods for managing MySQL and MongoDB.


The highlights from this year’s sessions and tutorials centered around a few technologies:

  • The typical sessions for Galera Cluster and Performance Schema are always getting better, along with visualization techniques.
  • Oracle MySQL’s new Document Store blurs the line between RDBMS and NoSQL.
  • Facebook’s RocksDB is getting smaller and faster.
  • ProxySQL, the new proxy kid on the block, promises to address MySQL scalability issues.
  • If security is a concern, which it should be, Hashicorp’s Vault project would be something to look into for managing MySQL secrets or encrypting data in transit.
  • MongoDB was a hot topic as well, with a number of sessions addressing management of environments and design patterns.

I expect to see an influx of articles regarding ProxySQL and MySQL’s Document Store in the next few months.


The evenings were also great events for networking and socializing, giving attendees the chance to rub shoulders with some of the most successful ‘WebScale’ companies to hear stories from the trenches. Events included the Monday Community Networking Reception and Wednesday’s Game Night.

Thank you to all those who attended the Annual Community Dinner at Pedro’s organized by Pythian on Tuesday night! We had a blast and we hope you did as well.

Community Dinner At Pedro's

Thank you!

Pythian sponsored and provided a great range of sessions this year, and we want to thank all those who stopped by our booth or attended our sessions.

I’d like to give a huge shout-out to Percona for continuing to organize a high-quality MySQL user conference focused on solving some of the toughest technical issues that can be thrown at us, and an equal shout-out to the other sponsors and speakers that play a huge part in making this conference happen.

I am looking forward to what PerconaLive Europe will bring this fall, not to mention what we can expect next year when Percona Live Santa Clara rolls around again.

Categories: DBA Blogs

Optimize ADF HTTP Response Size with ChangeEventPolicy

Andrejus Baranovski - Tue, 2016-04-26 02:18
You should read this post if you are looking for ways to reduce ADF HTTP response size. This can be important for ADF application performance tuning, to improve PPR request response time. By default in ADF 12.2.1, an iterator is assigned ChangeEventPolicy = ppr. This works great for refreshing UI component bindings, with no need to set individual partial triggers. On the other hand, it generates extra content in the ADF HTTP response and eventually increases response size. I would recommend using ChangeEventPolicy = ppr only when it's really needed: dynamic forms with unknown refresh dependencies. Otherwise set ChangeEventPolicy = none to generate a smaller response.

Below I demonstrate the difference in the ADF HTTP response with ChangeEventPolicy = ppr and with ChangeEventPolicy = none. First, let's take a look at the page load response size:

The page contains a list component and a form. They are based on two different iterators, both set with ChangeEventPolicy = ppr. This generates an AdfPage.PAGE.updateAutoPPRComponents call for each UI item that references attributes from the iterator. In complex screens this adds a significant amount of extra text to the response and could increase its size by as much as half:

The partial response also contains the same calls. Select a list item:

Each item from the form is referenced by a JavaScript call that registers it for refresh:

Let's change it to ChangeEventPolicy = none. Set it for both iterators:

We now have to set refresh dependencies manually. The form block must be set with a PartialTrigger referencing the list component, so that it refreshes when the list selection changes:

The dependency on the Next/Previous buttons must also be set, to change the form row:

With refresh dependencies set manually, no extra calls are added to the ADF HTTP response, which reduces overall response size:

The same applies to the PPR response:

Download sample application -

Please be patient!

Tim Hall - Tue, 2016-04-26 01:46

It’s extremely nice to have a big audience. It’s very flattering that people care enough about what I say to be bothered to read it. The problem with having a large audience is people can get very demanding at times.

making backup better

Pat Shuff - Tue, 2016-04-26 01:07
Yesterday we looked at backing up our production databases to cloud storage. One of the main motivations behind doing this was cost. We were able to reduce the cost of storage from $3K/TB capex plus $300/TB/year opex to $400/TB/year opex. This is a great solution but some customers complain that it is not generic enough and latency to cloud storage is not that great. Today we are going to address both of these issues with the cloud storage appliance. First, let's address both of the typical customer complaints.
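To make the cost comparison concrete, here is a small Python sketch that totals both models over a number of years. The dollar figures are the ones quoted above; the function names and the simple linear model (no hardware refresh, growth, or discounting) are assumptions made for illustration only.

```python
def on_prem_cost(tb, years, capex_per_tb=3000.0, opex_per_tb_year=300.0):
    """Owned storage: one-time capex plus yearly opex, per TB."""
    return tb * (capex_per_tb + opex_per_tb_year * years)


def cloud_cost(tb, years, opex_per_tb_year=400.0):
    """Cloud storage: yearly opex only, per TB."""
    return tb * opex_per_tb_year * years


if __name__ == "__main__":
    # Compare 10 TB of backup storage over a few planning horizons.
    for years in (1, 3, 5):
        print(years, on_prem_cost(10, years), cloud_cost(10, years))
```

Under this simplified model the two totals only meet when 3000 + 300y = 400y, that is after 30 years, which is well beyond the useful life of most storage hardware.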

The database backup cloud service is just that. It backs up a database. It does it really well and it does it efficiently. You replace one of the backup library modules with one that translates writes of backup data into cloud REST API calls rather than sending them to a tape driver. The software works well with commercial products like Symantec or Legato and integrates well into that solution. Unfortunately, the critics are right. The database backup cloud service does that and only that. It backs up Oracle databases. It does not back up MySQL, SQL Server, DB2, or other databases. It is a single use tool. A very useful single use tool, but a single use tool. We need to figure out how to make it more generic and back up more than just databases. It would be nice if we could have it back up home directories, email servers, virtual machines, and other stuff that is used in the data center.

The second complaint is latency. If we are writing to an SSD or spinning disk attached to a server via high speed SCSI, iSCSI, or SAS, we should expect 10ms access time or less. If we are writing to a server halfway across the country we might experience 80ms latency. This means that a typical read or write takes eight times longer when we read and write cloud storage. For some applications this is not an issue. For others this latency makes the system unusable. We need to figure out how to read and write at 10ms latency but leverage the expandability and lower cost of cloud storage.

Enter stage left the Oracle Cloud Storage Appliance. The appliance is a software component that listens on the data center network using the NFS protocol and talks to the cloud services using the storage REST API. Local disks are used as a cache front end to store data that is written to and read from the network shares exposed by the appliance. These directories map directly to containers in the Oracle Storage Cloud Service and can be clear text or encrypted when stored. Data written from network servers is accepted and released quickly as it is written to local disk and slowly trickled to the cloud storage. As the cache fills up, data is aged and migrated from the cache storage into cloud storage. The metadata representing the directory structure and storage location is updated to show that the data is no longer stored locally but stored in the cloud. If a read occurs from the file system, the metadata helps the appliance locate where the data is stored, and it is presented to the network client from the cache or pulled from the cloud storage and temporarily stored in the local cache as long as there is space. A block diagram of this architecture is shown below.

The concept of how to use this device is simple. We create a container in our cloud storage and we attach to it with the cloud storage appliance. This attachment is exposed via an NFS mount to clients on our corporate network, and any client on that network can read or write files in the cloud storage. Operations happen at local disk speed using the network security of the local network and group/owner rights in the directory structure. It looks, smells, and feels just like NFS storage that we would spend thousands of dollars per TB to own and operate.
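The caching behaviour described above can be modelled in a few lines of Python. This is a toy sketch, not the appliance's actual algorithm: the class name is hypothetical, eviction is plain least-recently-used, the "cloud" is an in-memory dictionary standing in for a storage container, and uploads happen at eviction time rather than being trickled out in the background.

```python
from collections import OrderedDict


class CachedShare:
    """Toy model of a local disk cache in front of cloud storage."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of files kept locally
        self.cache = OrderedDict()    # local disk cache, oldest first
        self.cloud = {}               # stand-in for the cloud container

    def write(self, name, data):
        # Writes are accepted into the local cache first (fast path).
        self.cache[name] = data
        self.cache.move_to_end(name)
        self._age_out()

    def read(self, name):
        if name in self.cache:        # served at local disk speed
            self.cache.move_to_end(name)
            return self.cache[name]
        data = self.cloud[name]       # slow path: pull from the cloud
        self.cache[name] = data       # keep a temporary local copy
        self._age_out()
        return data

    def _age_out(self):
        # Migrate the least recently used files to the cloud when full.
        while len(self.cache) > self.capacity:
            old_name, old_data = self.cache.popitem(last=False)
            self.cloud[old_name] = old_data
```

With a capacity of two files, writing three files pushes the oldest one out to the cloud dictionary; reading it back pulls it into the cache again and ages out the next oldest file.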

For the rest of this blog we are going to go through the installation steps on how to configure the appliance. The minimum requirements for the appliance are:

  • Linux 7 (3.10 kernel or later)
  • Docker 1.6.1 or later
  • two dual core x86 CPUs
  • 4 GB of RAM
We will be installing our configuration on a Windows desktop running VirtualBox. We will not go through the installation of Oracle Enterprise Linux 7 because we covered this a long time ago. We do need to configure the OS to have 4 GB of RAM and at least 2 virtual cores as shown in the images below.

We also need to configure a network. We configure two networks. One is for the local desktop console and the other is for the public internet. We could configure a third interface to represent our storage network but for simplicity we only configure two.

We can boot our Linux 7 system and will need to select the 3.10 kernel. By default it will want to boot to the 3.8 kernel which will cause problems in how the appliance works.

What we would like to do is remove the 3.8 kernel from our installation. This is done by removing the packages with the rpm -e command. We then update the grub.cfg file to list only the 3.10 kernels.

Once we have removed the kernels, we update the grub loader and enable additional options for the yum update.

The next step that we need to take is to install docker. This is done with the yum install command.

Once we have the docker package installed, we need to make sure that we have the nfs-client and nfs-server installed and started.

It is important to note that the tar bundle is not generally available. It does require product manager approval to get a copy of the software for installation. The file that I got was labeled oscsa-1.0.5.tar.gz. I had to unzip and untar this file after loading it on my Linux VirtualBox instance. I did not do a screen capture of the download but did go through the installation process.

We start the service with the oscsa command. When we start it, it brings up a management web page so that we can make the connection to the cloud storage service. To see this page we need to start firefox and connect to the page.

One of the things that we need to know is the end point of our storage. We can find this by looking at the management console for our cloud services. If we click on the storage cloud service details link we can find it.

Once we have the end point we will need to enter this into the management console of the appliance as well as the cloud credentials.

We can add encryption and a container name for our network share and start reading and writing.

We can verify that everything is working from our desktop by mounting the nfs share or by using cloudberry to examine our cloud storage containers. In this example we use cloudberry just like we did when we looked at the generic Oracle Storage Cloud Services.

We can examine the properties of the container and network share from the management console. We can look at activity and resources available for the caching.

In summary, we looked at a solution to two problems with our database backup solution. The first was that it is single purpose and the second was latency. By providing a network share to the data center we can back up not only our Oracle database but all of the databases by having the backup software write to the network share. We can back up other files like virtual machines, email boxes, and home directories. Disk latency operates at the speed of the local disk rather than the speed of the cloud storage. This software does not cost anything additional and can be installed on any virtual platform that supports Linux 7 with kernel 3.10 or greater. When we compare this to the Amazon Storage Gateway, which requires 2x the processing power and $125/month to operate, it looks significantly better. We did not compare it to the Azure solution because it is an iSCSI hardware solution and not easy to get a copy of for testing.

A Moment of Clarity on the Role of Technology in Teaching

Michael Feldstein - Mon, 2016-04-25 21:26

By Phil Hill

The following excerpt is based on a post first published at The Chronicle of Higher Education.

With all of the discussion around the role of online education for traditional colleges and universities, over the past month we have seen reminders that key concerns are about people and pedagogy, not technology. And we can thank two elite universities that don’t have large online populations — MIT and George Washington University — for this clarity.

On April 1, the MIT Online Education Policy Initiative released its report, “Online Education: A Catalyst for Higher Education Reforms.” The Carnegie Corporation-funded group was created in mid-2014, immediately after an earlier initiative looked at the future of online education at MIT. The group’s charter emphasized a broader policy perspective, however, exploring “teaching pedagogy and efficacy, institutional business models, and global educational engagement strategies.”

While it would be easy to lament that this report comes from a university with few online students and yet dives into how online learning fits in higher education, it would be a mistake to dismiss the report itself. This lack of “in the trenches” experience with for-credit online education helps explain the report’s overemphasis on MOOCs and its underemphasis on access and nontraditional learner support. Still, the MIT group did an excellent job of getting to some critical questions that higher-education institutions need to address. Chief among them is the opportunity to use online tools and approaches to instrument and enable enhanced teaching approaches that aren’t usually possible in traditional classrooms.

The core of the report, in fact, is based on the premise that online education and online tools can enable advances in effective pedagogical approaches, including constructivism, active learning, flipped classrooms, problem-based learning, and student-centered education. It argues that the right way to use technology is to help professors teach more effectively:

“Technology can support teachers in the application of the relevant principles across a group of students with high variability. In fact, technology can help tailor lessons to the situation in extremely powerful ways.

The instrumentation of the online learning environment to sense the student experience and the ability to customize content on a student-by-student basis may be the key to enabling teachers to provide differentiated instruction, informed by a solid foundation in cognitive science. Modern online courses and delivery platforms already implement some of these concepts, and provide a framework for others.”

But there is value in seeing what happens when that advice is ignored. And that’s where an incident at George Washington University comes in. If technology is just thrown at the problem with no consideration of helping educators to adopt sound pedagogical design, then we can see disasters.

On April 7, four students who took an online program for a master’s degree in security and safety leadership from George Washington’s College of Professional Studies filed a class-action lawsuit against the university for negligence and misleading claims. As reported by The GW Hatchet, a student newspaper:

For a non-paywall version of the full article, good through 4/26, follow this link.

Update: What interesting timing! See Michelle Pacansky-Brock’s post on a very similar topic.

The nature of online classes varies dramatically, much like face-to-face classes. But, in both scenarios, the teacher matters and the teaching matters. When an online class is taught by an engaged and empathetic instructor who seeks to be aware of the needs of her students, the asynchronous nature of online learning may become a benefit to students, not a disadvantage. This is contingent upon the design of the course, which is where instructional designers or “learning engineers” can play an important role. Many instructors, however, play both roles — and those who do are often the professors who experience deep transformations in their face-to-face classes as a result of what they learned from teaching online.

The post A Moment of Clarity on the Role of Technology in Teaching appeared first on e-Literate.

Improve Parsing and Query Performance – Fix Oracle’s Fixed Object Statistics

Pythian Group - Mon, 2016-04-25 19:50

What do I mean by ‘fix’ the fixed object statistics?  Simply gather statistics to help the optimizer.

What are ‘fixed objects’?  Fixed objects are the x$ tables and associated indexes that data dictionary views are based on.  In this case we are interested in the objects that make up the v$sqlstats and v$sql_bind_capture views.

If you’ve never before collected statistics on Oracle fixed objects, you may be wondering why you should bother with it, as everything appears to be fine in your databases.

After seeing an example you may want to schedule a time to collect these statistics.

Searching for SQL

Quite recently I was looking for recently executed SQL, based on the most recently captured bind variables.

select  sql_id, sql_fulltext
from v$sqlstats
where sql_id in (
   select  distinct sql_id
   from (
      select sql_id, last_captured
      from (
         select sql_id, last_captured
         from V$SQL_BIND_CAPTURE
         order by last_captured desc nulls last
      )
      where rownum <= 20
   )
)
/

I ran the query and was distracted for a few moments.  When I next looked at the terminal session where this SQL was executing, no rows had yet been returned.

Thinking that maybe ‘SET PAUSE ON’ had been run, I pressed ENTER.  Nothing.

From another session I checked for waits in v$session_wait.  Nothing there either.  If the session is not returning rows, and not registering an event in v$session_wait, then it must be on CPU.

This didn’t seem an ideal situation, and so I stopped the query with CTRL-C.

The next step was to run the query on a smaller and not very busy database.  This time I saw that rows were being returned, but very slowly.

So now it was time to trace the execution and find out what was going on.

alter session set tracefile_identifier='JKSTILL';

set linesize 200 trimspool on

alter session set events '10046 trace name context forever, level 12';

select  sql_id, sql_fulltext
from v$sqlstats
where sql_id in (
   select  distinct sql_id
   from (
      select sql_id, last_captured
      from (
         select sql_id, last_captured
         from V$SQL_BIND_CAPTURE
         order by last_captured desc nulls last
      )
      where rownum <= 20
   )
)
/

alter session set events '10046 trace name context off';


Coming back to this several minutes later, the resulting trace file was processed with the Method R Profiler to find out just where the time was going.




The ‘SQL*Net message from client’ event can be ignored; most of that time was accumulated waiting for me to come back and exit sqlplus.  While the script example shows that the 10046 trace was turned off and the session exited, I had forgotten to include those two lines for this first run.

No matter, as the interesting bit is the next line, ‘CPU: FETCH dbcalls’.  More than 6 minutes was spent fetching a few rows, so clearly something was not quite right. The SQL plan in the profile showed what the problem was, as the execution plan was far less than optimal. The following is the execution plan from AWR data:


  1  select *
  2  from TABLE(
  3     dbms_xplan.display_awr(sql_id => :sqlidvar, plan_hash_value => 898242479, format => 'ALL ALLSTATS LAST')
  4*    )
sys@oravm1 SQL- /

SQL_ID 4h7qfxa9t1ukz
select  sql_id, sql_fulltext from v$sqlstats where sql_id in (  select
distinct sql_id         from (          select sql_id, last_captured            from (
   select sql_id, last_captured from V$SQL_BIND_CAPTURE order by
last_captured desc nulls last           )               where rownum <= 20      ) )

Plan hash value: 898242479

| Id  | Operation                 | Name         | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |
|   0 | SELECT STATEMENT          |              |        |       |     1 (100)|          |
|   1 |  FILTER                   |              |        |       |            |          |
|   2 |   FIXED TABLE FULL        | X$KKSSQLSTAT |      1 |  2023 |     0   (0)|          |
|   3 |   VIEW                    |              |      1 |     8 |     1 (100)| 00:00:01 |
|   4 |    COUNT STOPKEY          |              |        |       |            |          |
|   5 |     VIEW                  |              |      1 |     8 |     1 (100)| 00:00:01 |
|   6 |      SORT ORDER BY STOPKEY|              |      1 |    43 |     1 (100)| 00:00:01 |
|   7 |       FIXED TABLE FULL    | X$KQLFBC     |      1 |    43 |     0   (0)|          |


Query Block Name / Object Alias (identified by operation id):

   1 - SEL$88122447
   2 - SEL$88122447 / X$KKSSQLSTAT@SEL$4
   3 - SEL$6        / from$_subquery$_002@SEL$5
   4 - SEL$6
   5 - SEL$FEF91251 / from$_subquery$_003@SEL$6
   6 - SEL$FEF91251
   7 - SEL$FEF91251 / X$KQLFBC@SEL$10

   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

39 rows selected.


While useful, this plan is not giving much information about why this took so long.  If pressed I would just whip up a Bash and Awk one-liner  to parse the trace files and find out where this time was going.  In this case though I could just consult the Method R profile again.
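For readers without a profiler at hand, the kind of trace-file summary mentioned above can be sketched in Python instead of Bash and Awk. This is a minimal illustration, not the author's script: it assumes database call lines of the form `FETCH #<cursor>:c=<cpu>,e=<elapsed>,...,r=<rows>,...` with times in microseconds, and the sample lines are invented for the example rather than taken from the real trace.

```python
import re
from collections import defaultdict

# Matches PARSE/EXEC/FETCH call lines and captures cpu, elapsed, and rows.
CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH) #\d+:c=(\d+),e=(\d+),.*?\br=(\d+)")


def summarize_calls(lines):
    """Return {call: [cpu, elapsed, rows]} summed over all cursors."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        match = CALL_RE.match(line)
        if match is None:
            continue  # WAIT, STAT, and other line types are ignored here
        call, cpu, elapsed, rows = match.groups()
        totals[call][0] += int(cpu)
        totals[call][1] += int(elapsed)
        totals[call][2] += int(rows)
    return dict(totals)


# Invented sample lines in the assumed format (not from the real trace).
SAMPLE_LINES = [
    "PARSE #140234:c=1000,e=1200,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1000",
    "EXEC #140234:c=0,e=50,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=2000",
    "FETCH #140234:c=250000,e=260000,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=3000",
    "WAIT #140234: nam='SQL*Net message from client' ela= 500 tim=3500",
    "FETCH #140234:c=150000,e=155000,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,tim=4000",
]

if __name__ == "__main__":
    for call, (cpu, elapsed, rows) in summarize_calls(SAMPLE_LINES).items():
        print(call, cpu, elapsed, rows)
```

A summary like this only shows which call type consumed the time; finding the row source responsible still requires the STAT lines or a profiler.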




Yikes!  There were 106.3E6 rows returned from X$KQLFBC.

Collecting the Fixed Object Statistics

Rather than spend time analyzing this further, it seemed that here was a clear case for collecting statistics on fixed objects in the database.  The following SQL was run:


exec dbms_stats.gather_fixed_objects_stats


The next step was to rerun the query.  This time it ran so quickly I wondered if it had even worked.  As before, tracing had been enabled, and a profile generated from the trace. There was now quite an improvement seen in the execution plan:




The 0.0047 seconds required to return 442 rows from X$KQLFBC was quite a reduction from the previously seen time of nearly 396 seconds.

Why This Is Important

This issue came to light due to a custom query I was running. The optimizer will probably never run that same query, but it was clear that the fixed object statistics needed to be updated.

Now imagine your customers using your application; they may be waiting on the database for what seems like an eternity after pressing ENTER on a web form.  And what are they waiting on?  They may be waiting on the optimizer to evaluate a SQL statement and determine the best plan to use.  The reason for the waiting in this case would simply be that the DBA has not taken steps to ensure the optimizer has the correct information to effectively query the database’s own metadata.   Until the optimizer has the correct statistics, performance of query optimization will be sub-optimal.  In a busy system this may result in mutex waits suddenly showing as a top event in AWR reports.  Troubleshooting these waits can be difficult as there are many possible causes of them.

Do your customers, your database and yourself a favor – include updates of fixed tables statistics in your regular database maintenance schedule, and avoid a possible source of performance problems.

Categories: DBA Blogs

Searching in PeopleSoft

PeopleSoft Technology Blog - Mon, 2016-04-25 17:50

Many customers have asked us about PeopleSoft’s search strategy.  They may have seen that Oracle’s Secure Enterprise Search (SES) isn’t on the price list any longer, so they wonder about PeopleSoft’s continued use of that product. 

First and foremost, Search is an important part of PeopleSoft’s overall user experience, and we are continuing to invest in a consumer grade search experience for the enterprise.  When we re-architected our approach to Search several releases ago, we built a PeopleSoft search framework that provides a great Search UI that enables users not only to search enterprise information, but to refine their search results with facets and filters.  We also made it possible for users to act directly on search results through our Related Actions framework.  In many cases, users do not even have to navigate to a transaction page to complete a task.  This provides a rich Search UI that users have come to expect.  Further improvements using PeopleSoft’s Fluid UI include Pivot Grids in search pages that allow users better visualization of search results.  Oracle’s Secure Enterprise Search is the engine for that search in PeopleSoft. 

In an effort to provide choice, PeopleSoft is working on providing an alternative search engine that customers can use in their PeopleSoft ecosystem.  We are planning on offering Elastic Search as an option with PeopleTools 8.56, and our applications that are delivered on 8.56 are planned to contain Elastic indexes.  Note that Oracle will continue to support SES with PeopleSoft for some time yet, and customers can continue to use SES if they have deployed it.  We think that Elastic will be an attractive option for many customers, however.  It is important to note that whichever search engine you use, the PeopleSoft search framework will work with either search engine, and the search UI will be essentially the same regardless of your choice.  Here are a few key points about the Elastic Search option for PeopleSoft:

  • Planning on Elastic availability in 8.56, back porting to 8.55
  • Leverage PeopleTools Search Framework
    - SES or Elastic is a deployment Choice
    - Deploy Elastic on the separate search host instead of SES
  • Take PeopleSoft Images for application fixes and Elastic indexes
  • Will require a one-time full index build with Elastic using the new process
  • We plan to provide a migration guide to help deploy Custom indexes on Elastic
  • PeopleTools DPK for Elastic infrastructure
    - Supported on Linux and Windows
  • We plan to provide a deployment guide to help with performance tuning, load balance, and failover

Our initial testing indicates that Elastic will require fewer resources and will perform better—and will be easier to install than SES.

There is also some discussion of our plans for Search in this PeopleSoft talk video.  The Search discussion is at the 5:50 point.

Oracle OpenWorld 2016 - Call for Papers Deadline is May 9!

WebCenter Team - Mon, 2016-04-25 14:07
If you’re an Oracle technology expert, conference attendees want to hear it straight from you. So don’t wait—proposals must be submitted by May 9. You have two weeks left to submit!

Oracle OpenWorld 2016

Wanted: Outstanding Oracle Experts

The Oracle OpenWorld 2016 Call for Proposals is now open. Attendees at the conference are eager to hear from experts on Oracle business and technology. They’re looking for insights and improvements they can put to use in their own jobs: exciting innovations, strategies to modernize their business, different or easier ways to implement, unique use cases, lessons learned, the best of best practices.

If you’ve got something special to share with other Oracle users and technologists, they want to hear from you, and so do we. Submit your proposal now for this opportunity to present at Oracle OpenWorld, the most important Oracle technology and business conference of the year.

We recommend you take the time to review the General Information, Submission Information, Content Program Policies, and Tips and Guidelines pages before you begin. We look forward to your submissions.

Submit Your Proposal

By submitting a session for consideration, you authorize Oracle to promote, publish, display, and disseminate the content submitted to Oracle, including your name and likeness, for use associated with the Oracle OpenWorld and JavaOne San Francisco 2016 conferences. Press, analysts, bloggers and social media users may be in attendance at OpenWorld or JavaOne sessions.

Submit Now.

Driving Content Experience at Modern Marketing Experience Conference 2016

WebCenter Team - Mon, 2016-04-25 12:49

Heading to the Modern CX Conference this week? Looking to participate in the Modern Marketing Experience 2016 conference? Or just happen to be in Las Vegas this week? Then don't miss connecting with us!

Where the best of marketing minds and thought leaders collect, Modern Marketing Conference 2016 is expecting thousands of attendees this year as they immerse in demos, solution showcases, industry keynotes, product strategy sessions, customer presentations, and more. The conference is taking place at MGM Grand, Las Vegas from April 26-28, 2016.

You, as marketers, know the importance of content. And yet, the reality is that while about a quarter of our budget is spent building content, 70% of it goes unused. How can we maximize content usage, consumption, distribution and optimal presentation? How do we get the most mileage from our content?

Join our Oracle executive, Mariam Tariq, and industry thought leader Tina Miletich, CEO and founder of HEEDGroup, a creative strategy consulting practice that designs and implements successful customer engagement initiatives, as they discuss the importance of Marketing Asset Management and the role technology plays in driving brand consistency and content optimization. Mariam has extensive experience in product management for digital experience solutions and sites experience, working on Oracle WebCenter solutions. Tina has had an extensive digitally focused career, and her teams have won many awards, including an Effie, several IAC awards, and a Silver IDSA, and were short-listed for a Cannes Cyber Lion. Tina also teaches Digital Strategy at NYU - SPS in the Masters of Marketing Program. So, don't miss their session:

Create, Collaborate, and Distribute Marketing Assets to Ensure Brand Consistency
Thursday, April 28, 2016
8:00 a.m. - 8:45 a.m.
MGM Grand Conference Center, Room 318

And while at the conference, do check our solutions live in action by visiting us in the Exhibit Hall at the Digital Experience kiosk. We are there through the exhibit hours and during dedicated exhibition schedule.

So, see you there?

When the default value is not the same as the default

Pythian Group - Mon, 2016-04-25 10:39

I was working on a minor problem recently where controlfile autobackups were written to the wrong location during rman backups. Taking controlfile autobackups is generally a good idea, even if you configure controlfile backups yourself. Autobackups also include an spfile backup which, though not critical for restore, is still convenient to have. And autobackups are taken not only after backups but, more importantly, every time you change the physical structure of your database, such as adding or removing datafiles and tablespaces, which would make a restore with an older controlfile a lot harder.

What happened in this case was that the CONTROLFILE AUTOBACKUP FORMAT parameter was changed from the default ‘%F’ to the value ‘%F’. Yes, the values are the same. But setting a value and not leaving it at the default changed the behaviour of those autobackups. Where by default ‘%F’ means writing to the flash recovery area, explicitly setting the format parameter to ‘%F’ will save the autobackup to the folder $ORACLE_HOME/dbs/.

See for yourself. This shows an autobackup while the parameter is set to the default and as expected, the autobackup is written to the flash recovery area. So that is the correct location but the filename is a bit off. It should be c-DBID-YYYYMMDD-SERIAL.


RMAN configuration parameters for database with db_unique_name CDB1 are:

RMAN> backup spfile;

Starting backup at 18-APR-16
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 18-APR-16
channel ORA_DISK_1: finished piece 1 at 18-APR-16
piece handle=/u01/app/oracle/fast_recovery_area/CDB1/backupset/2016_04_18/o1_mf_nnsnf_TAG20160418T172428_ckb62f38_.bkp tag=TAG20160418T172428 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 18-APR-16

Starting Control File and SPFILE Autobackup at 18-APR-16
piece handle=/u01/app/oracle/fast_recovery_area/CDB1/autobackup/2016_04_18/o1_mf_s_909509070_ckb62gko_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 18-APR-16

Now we set the format string to ‘%F’ and observe that the autobackup is not written to the FRA but to $ORACLE_HOME/dbs. At least it has the filename we were expecting.


new RMAN configuration parameters:
new RMAN configuration parameters are successfully stored

RMAN> backup spfile;

Starting backup at 18-APR-16
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 18-APR-16
channel ORA_DISK_1: finished piece 1 at 18-APR-16
piece handle=/u01/app/oracle/fast_recovery_area/CDB1/backupset/2016_04_18/o1_mf_nnsnf_TAG20160418T172447_ckb62z7f_.bkpx tag=TAG20160418T172447 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 18-APR-16

Starting Control File and SPFILE Autobackup at 18-APR-16
piece handle=/u01/app/oracle/product/ comment=NONE
Finished Control File and SPFILE Autobackup at 18-APR-16


RMAN configuration parameters for database with db_unique_name CDB1 are:

This is like Schrödinger’s parameter, where you can either get the correct location or the correct name, but not both. To be fair, not assigning the right name to the autobackup in the FRA does not matter much because the files will be found during a restore anyway.
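To tell the two naming styles apart in a script, the expected %F pattern can be checked with a regular expression. The Python sketch below assumes the c-DBID-YYYYMMDD-SERIAL layout mentioned earlier, with the serial taken to be a two-digit hexadecimal sequence; the function name and the first sample filename are made up for illustration.

```python
import re
from datetime import date, datetime

# Assumed %F layout: c-<DBID>-<YYYYMMDD>-<two hex digits>
AUTOBACKUP_RE = re.compile(r"^c-(\d+)-(\d{8})-([0-9A-Fa-f]{2})$")


def parse_autobackup_name(name):
    """Return (dbid, backup_date, sequence) or None if the name does not match."""
    match = AUTOBACKUP_RE.match(name)
    if match is None:
        return None
    dbid, day, seq = match.groups()
    return int(dbid), datetime.strptime(day, "%Y%m%d").date(), int(seq, 16)


if __name__ == "__main__":
    # Hypothetical %F-style name, then the OMF-style name seen in the FRA above.
    print(parse_autobackup_name("c-1234567890-20160418-0a"))
    print(parse_autobackup_name("o1_mf_s_909509070_ckb62gko_.bkp"))
```

The second call returns None because the name generated in the FRA does not follow the %F layout.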

At this point it is good to remember how to use CLEAR to reset a parameter to its default instead of just setting the default value.


old RMAN configuration parameters:
RMAN configuration parameters are successfully reset to default value


RMAN configuration parameters for database with db_unique_name CDB1 are:

I have tested this in several versions, including 10g and 11g, with the same result. The behaviour is also not unknown. In fact, bug 4248670 was logged against this in 2005 but has not been resolved so far. My Oracle Support does mention the above workaround of clearing the parameter in note 1305517.1 though.

Categories: DBA Blogs

MySQL Query Best Practices

Pythian Group - Mon, 2016-04-25 10:30

You can get many results from a Google search for “MySQL Query Best Practices” or “MySQL Query Optimization.” The drawback is that too many rules can provide confusing or even conflicting advice. After doing some research and tests, I outlined the essential and important ones below:

1) Use proper data types

1.1) Use the smallest data types if possible

MySQL tries to load as much data as possible into memory (innodb-buffer-pool, key-buffer), so a smaller data type means more rows fit in memory, which improves performance. Smaller data sizes also reduce disk I/O.

1.2) Use Fixed-length Data Types if Possible

MySQL can quickly calculate the position of a fixed-length column in a specific row of a table.

With a flexible-length data type, the row size is not fixed, so every time it needs to do a seek, MySQL might have to consult the primary key index. On the other hand, a flexible-length data type can reduce the data size and the disk space required.

In practice, if the column data size varies a lot, then use a flexible-length data type (e.g., varchar); if the data length is short or length barely changes, then use a fixed data type.

1.3) Use not null unless there is reason not to

It is harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL.

When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (e.g., an index on a single integer column) to be converted to a variable-sized one in MyISAM.
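The three data type guidelines above can be sketched in a single table definition. This is only an illustration: the table `customer_profile` and all of its columns are hypothetical, and the right types depend on your own data.

```sql
-- Hypothetical table illustrating sections 1.1 to 1.3.
CREATE TABLE customer_profile (
    id           INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    country_code CHAR(2)          NOT NULL,  -- always two characters, so fixed length
    age          TINYINT UNSIGNED NOT NULL,  -- 0 to 255 fits in a single byte
    status       ENUM('active', 'suspended', 'closed') NOT NULL DEFAULT 'active',
    email        VARCHAR(255)     NOT NULL,  -- length varies a lot, so variable length
    created_at   DATETIME         NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;
```

Every column is declared NOT NULL because nothing in this sketch needs to distinguish "unknown" from a real value; keep a column nullable when that distinction matters.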

2) Use indexes smartly

2.1) Make primary keys short and on meaningful fields

A shorter primary key will benefit your queries, because the smaller your primary key, the smaller the index, and the fewer pages it takes up in the cache. In addition, a numeric type is preferred because numeric types are stored in a much more compact format than character types, which makes the primary key shorter.

Another reason to keep the primary key short is that we usually use the primary key to join with other tables.

It is a good idea to put the primary key on a meaningful field, because MySQL (InnoDB) uses a clustered index on the primary key. Often a query needs only the information in the primary key, especially when joining with other tables, so MySQL can answer it from the index without reading the data file on disk, which benefits performance. When you use a meaningful field as the primary key, make sure its values are unique and will not change; otherwise, changing the primary key might affect every table that uses it as a foreign key.

2.2) Index on the search fields only when needed

Usually we add indexes on the fields that frequently show up in a where clause — that is the purpose of indexing. But while an index will benefit reads, it can make writes slower (inserting/updating), so index only when you need it and index smartly.

2.3) Index and use the same data types for join fields

MySQL can do joins on different data types, but the performance is poor as it has to convert from one type to the other for each row. Use the same data type for join fields when possible.
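As a minimal sketch of matching join types, the two hypothetical tables below (`account` and `payment` are names made up for this example) declare the join column with the same type and signedness on both sides, and index it on the child table.

```sql
-- Hypothetical parent table.
CREATE TABLE account (
    account_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name       VARCHAR(100) NOT NULL,
    PRIMARY KEY (account_id)
) ENGINE=InnoDB;

-- Hypothetical child table: account_id has the same type and signedness
-- as account.account_id, so no per-row conversion is needed in the join.
CREATE TABLE payment (
    payment_id INT UNSIGNED  NOT NULL AUTO_INCREMENT,
    account_id INT UNSIGNED  NOT NULL,
    amount     DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (payment_id),
    KEY idx_payment_account (account_id)
) ENGINE=InnoDB;

SELECT a.name, p.amount
FROM account AS a
JOIN payment AS p ON p.account_id = a.account_id
WHERE a.account_id = 42;
```

If `payment.account_id` were declared as VARCHAR instead, the join would still work, but MySQL would have to convert values while comparing and might not be able to use the index on that column.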

2.4) Use a composite index if your query has more than one field in the where clause

When the query needs to search on multiple columns of a table, it might be a good idea to create a composite index on those columns. With a composite index on multiple columns, the search can narrow down the result set by the first column, then the second, and so on.

Please note that the order of the columns in the composite index affects performance, so put the columns that narrow down the search most effectively first.
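The effect of column order can be sketched with a hypothetical `shop_order` table (the table, columns, and index name are assumptions for illustration only).

```sql
-- Hypothetical table with a composite index on (customer_id, status).
CREATE TABLE shop_order (
    order_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    status      ENUM('new', 'paid', 'shipped', 'cancelled') NOT NULL,
    created_at  DATETIME     NOT NULL,
    PRIMARY KEY (order_id),
    KEY idx_customer_status (customer_id, status)
) ENGINE=InnoDB;

-- Can use both columns of the index.
SELECT order_id FROM shop_order
WHERE customer_id = 42 AND status = 'shipped';

-- Can use the leftmost column of the index on its own.
SELECT order_id FROM shop_order
WHERE customer_id = 42;

-- Cannot seek directly on status, because it is not the leftmost column;
-- at best the whole index is scanned.
SELECT order_id FROM shop_order
WHERE status = 'shipped';
```

Here `customer_id` comes first on the assumption that it narrows the search far more than `status`, which has only four possible values.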

2.5) Covering index for most commonly used fields in results

In some cases, we can put all the required fields into an index (i.e., a covering index), with only some of the fields in the index used for searching and the others for data only. This way, MySQL only needs to access the index and does not need to read the table rows.
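A minimal sketch of a covering index, using a hypothetical `article` table: the index below is searched by `author_id` and carries `published_on` and `title` purely so that the query never has to touch the rows themselves.

```sql
-- Hypothetical table; body is large, so avoiding row reads is worthwhile.
CREATE TABLE article (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id    INT UNSIGNED NOT NULL,
    published_on DATE         NOT NULL,
    title        VARCHAR(100) NOT NULL,
    body         TEXT         NOT NULL,
    PRIMARY KEY (id),
    KEY idx_author_cover (author_id, published_on, title)
) ENGINE=InnoDB;

-- All selected columns are in idx_author_cover, so the index alone
-- can answer the query.
EXPLAIN
SELECT published_on, title
FROM article
WHERE author_id = 7;
```

When the index covers the query, the Extra column of the EXPLAIN output typically shows "Using index". Adding `body` to the select list would force MySQL back to the table rows.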

2.6) Partial index for long strings or TEXT, BLOB data types by index on prefix

There is a size limitation for index keys (by default, 1000 bytes for MyISAM, 767 bytes for InnoDB). If the prefix part of the string already covers most of the unique values, it is good to just index the prefix part.
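One way to pick a prefix length is to compare how selective the prefix is against the full column before creating the index. The sketch below assumes a hypothetical `document` table and an arbitrary prefix length of 30 characters.

```sql
-- Hypothetical table with a long string column.
CREATE TABLE document (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(500) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Compare the selectivity of a 30-character prefix with that of the
-- whole column; the closer the two ratios, the less the prefix loses.
SELECT COUNT(DISTINCT LEFT(title, 30)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT title) / COUNT(*)           AS full_selectivity
FROM document;

-- Index only the first 30 characters of title.
ALTER TABLE document ADD INDEX idx_title_prefix (title(30));
```

If the prefix selectivity is much lower than the full selectivity, try a longer prefix and run the comparison again.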

2.7) Avoid over-indexing

Don’t index low-cardinality columns. MySQL will choose a full table scan instead of using the index if it would have to scan more than about 30% of the index.

If a field is already the first field of a composite index, you may not need an extra index on that single field. If it appears in a composite index but not as the leftmost field, it needs a separate index of its own, which you should add only if your queries require it.

Bear in mind that indexing will benefit in reading data but there can be a cost for writing (inserting/updating), so index only when you need it and index smartly.

3) Others

3.1) Avoid SELECT *

There are many reasons to avoid select * from… queries. First, reading every field wastes time if you don’t need all the columns. Second, even if you do need all the columns, it is better to list all the field names to make the query more readable. Finally, if you alter the table by adding or removing fields and your application uses select * queries, you can get unexpected results.

3.2) Prepared Statements

Prepared statements treat the values you bind to them as data rather than as part of the SQL text, which is great for protecting your application against SQL injection attacks.

When the same query is being used multiple times in your application, you can assign different values to the same prepared statement, yet MySQL will only have to parse it once.
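Most applications use prepared statements through their client library, but the same idea can be sketched directly in SQL with MySQL's PREPARE, EXECUTE, and DEALLOCATE PREPARE statements. The table `member` and the statement name `find_member` are hypothetical.

```sql
-- Hypothetical table for the example.
CREATE TABLE member (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- The statement text is sent once; ? marks the bound value.
PREPARE find_member FROM 'SELECT id, email FROM member WHERE id = ?';

-- Execute the same statement with different values.
SET @member_id = 42;
EXECUTE find_member USING @member_id;

SET @member_id = 43;
EXECUTE find_member USING @member_id;

-- Release the statement when it is no longer needed.
DEALLOCATE PREPARE find_member;
```

The bound value never becomes part of the statement text, so a value such as '42 OR 1=1' is compared as data and cannot change the structure of the query.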

3.3) If you want to check the existence of data, use exists instead of SELECT COUNT

To check whether data exists in a table, select exists (select * …) performs better than select count(*), because the first returns a result as soon as it finds one matching row, while the second has to count every matching row in the table or index.
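The two approaches can be compared side by side. This is a sketch against a hypothetical `ticket` table; the names are assumptions for illustration.

```sql
-- Hypothetical table for the example.
CREATE TABLE ticket (
    ticket_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id   INT UNSIGNED NOT NULL,
    PRIMARY KEY (ticket_id),
    KEY idx_ticket_user (user_id)
) ENGINE=InnoDB;

-- Returns 1 or 0 and can stop at the first matching row.
SELECT EXISTS (SELECT 1 FROM ticket WHERE user_id = 42) AS has_tickets;

-- Returns the exact number of matches, so every matching entry is counted.
SELECT COUNT(*) AS ticket_count FROM ticket WHERE user_id = 42;
```

Use the COUNT form only when the application actually needs the number of rows rather than a yes/no answer.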

3.4) Use select limit [number]

Select… limit [number] returns only the requested number of rows. Including the limit keyword in your SQL queries can improve performance.
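As a short sketch, the queries below fetch only the ten most recent rows, and then the next ten, from a hypothetical `event_log` table instead of returning the whole table to the application.

```sql
-- Hypothetical table for the example.
CREATE TABLE event_log (
    event_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    message    VARCHAR(255) NOT NULL,
    created_at DATETIME     NOT NULL,
    PRIMARY KEY (event_id),
    KEY idx_event_created (created_at)
) ENGINE=InnoDB;

-- The ten most recent events.
SELECT event_id, message, created_at
FROM event_log
ORDER BY created_at DESC
LIMIT 10;

-- The next ten (rows 11 to 20).
SELECT event_id, message, created_at
FROM event_log
ORDER BY created_at DESC
LIMIT 10 OFFSET 10;
```

The index on `created_at` is an assumption that lets MySQL read rows in the requested order and stop early; without a suitable index it may still have to sort the whole table before applying the limit.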

3.5) Be careful with persistent connections

Persistent connections can reduce the overhead of re-creating connections to MySQL. When a persistent connection is created, it stays open even after the script finishes running. The drawback is that the server might run out of connections if too many of them remain open in sleep status.

3.6) Review your data and queries regularly

MySQL will choose the query plan based on the statistics of the data in the tables. When the data size changes, the query plan might change, and so it is important to check your queries regularly and to make optimizations accordingly. Check regularly by:

3.6.1) EXPLAIN your queries

3.6.2) Get suggestions with PROCEDURE ANALYSE()

3.6.3) Review slow queries
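The three checks above can be sketched as follows. The table `visit` is hypothetical, the one-second threshold is an arbitrary choice, and the SET GLOBAL statements need administrative privileges.

```sql
-- Hypothetical table for the example.
CREATE TABLE visit (
    visit_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT UNSIGNED NOT NULL,
    page       VARCHAR(100) NOT NULL,
    visited_at DATETIME     NOT NULL,
    PRIMARY KEY (visit_id),
    KEY idx_visit_user (user_id)
) ENGINE=InnoDB;

-- 3.6.1: show the plan MySQL would choose for a query.
EXPLAIN
SELECT page, visited_at FROM visit WHERE user_id = 42;

-- 3.6.2: ask MySQL to suggest data types based on the stored values.
-- PROCEDURE ANALYSE() exists in the MySQL 5.x releases but was removed
-- in MySQL 8.0.
SELECT page, user_id FROM visit PROCEDURE ANALYSE();

-- 3.6.3: log statements that run longer than one second.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Find out where the slow query log is written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```

A global change to long_query_time applies to connections opened after the change, so existing sessions keep their previous threshold.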

Categories: DBA Blogs

Partition Storage -- 5 : Partitioned Table versus Non-Partitioned Table ? (in 12.1)

Hemant K Chitale - Mon, 2016-04-25 09:13
Reviewing my second blog post in this series, I found it strange that Partition P_100 (populated by Serial Inserts of 1 row, 100,000 rows, 500,000 rows and 500,000 rows) had such a High Water Mark.

For 1.1million rows of an Average Row Length of 11, the High Water Mark was 3,022 blocks.

In the fourth blog post, a simple ALTER TABLE MOVE PARTITION had brought the High Water Mark to 2,482 blocks !

This needs further investigation.

Let's compare a single Partition of a Partitioned Table with a Non-Partitioned Table for exactly the same data and same pattern of INSERT statements.

Starting with a new Partitioned Table.

SQL> l
1 create table new_part_tbl (id_column number(6), data_column varchar2(100))
2 partition by range (id_column)
3 (partition p_100 values less than (101),
4 partition p_200 values less than (201),
5 partition p_300 values less than (301),
6 partition p_400 values less than (401),
7* partition p_max values less than (maxvalue))
SQL> /

Table created.

SQL> insert into new_part_tbl values (51,'Fifty One');

1 row created.

SQL> insert into new_part_tbl
2 select 25, 'New Row'
3 from dual
4 connect by level < 100001
5 /

100000 rows created.

SQL> insert into new_part_tbl
2 select 45, 'New Row'
3 from dual
4 connect by level < 500001
5 /

500000 rows created.

SQL> /

500000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','NEW_PART_TBL',granularity=>'ALL');

PL/SQL procedure successfully completed.

SQL> select avg_row_len, num_rows, blocks
2 from user_tab_partitions
3 where table_name = 'NEW_PART_TBL'
4 and partition_name = 'P_100'
5 /

AVG_ROW_LEN   NUM_ROWS     BLOCKS
----------- ---------- ----------
         11    1100001       3022

SQL> REM Let's MOVE the Partition
SQL> alter table new_part_tbl move partition P_100;

Table altered.

SQL> exec dbms_stats.gather_table_stats('','NEW_PART_TBL',granularity=>'ALL');

PL/SQL procedure successfully completed.

SQL> select avg_row_len, num_rows, blocks
2 from user_tab_partitions
3 where table_name = 'NEW_PART_TBL'
4 and partition_name = 'P_100'
5 /

AVG_ROW_LEN   NUM_ROWS     BLOCKS
----------- ---------- ----------
         11    1100001       2484

SQL> l
1 select extent_id, blocks
2 from dba_extents
3 where segment_name = 'NEW_PART_TBL'
4 and segment_type = 'TABLE PARTITION'
5 and partition_name = 'P_100'
6 and owner = 'HEMANT'
7* order by 1
SQL> /

 EXTENT_ID     BLOCKS
---------- ----------
         0       1024
         1       1024
         2       1024


As expected (see the first blog post), the Extents are still 8MB each.  But the High Water Mark has "magically" shrunk from 3,022 blocks to 2,484 blocks.

Let's create a Non-Partitioned Table with the same columns and rows.

SQL> create table non_part_tbl (id_column number(6), data_column varchar2(100));

Table created.

SQL> insert into non_part_tbl values (51,'Fifty One');

1 row created.

SQL> insert into non_part_tbl
2 select 25, 'New Row'
3 from dual
4 connect by level < 100001
5 /

100000 rows created.

SQL> insert into non_part_tbl
2 select 45, 'New Row'
3 from dual
4 connect by level < 500001
5 /

500000 rows created.

SQL> /

500000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','NON_PART_TBL');

PL/SQL procedure successfully completed.

SQL> select avg_row_len, num_rows, blocks
2 from user_tables
3 where table_name = 'NON_PART_TBL'
4 /

AVG_ROW_LEN   NUM_ROWS     BLOCKS
----------- ---------- ----------
         11    1100001       2512

SQL> REM Let's MOVE the Table
SQL> alter table non_part_tbl move;

Table altered.

SQL> select avg_row_len, num_rows, blocks
2 from user_tables
3 where table_name = 'NON_PART_TBL'
4 /

AVG_ROW_LEN   NUM_ROWS     BLOCKS
----------- ---------- ----------
         11    1100001       2512

SQL> l
1 select extent_id, blocks
2 from dba_extents
3 where segment_name = 'NON_PART_TBL'
4 and segment_type = 'TABLE'
5 and owner = 'HEMANT'
6* order by 1
SQL> /

 EXTENT_ID     BLOCKS
---------- ----------
0 8
1 8
2 8
3 8
4 8
5 8
6 8
7 8
8 8
9 8
10 8
11 8
12 8
13 8
14 8
15 8
16 128
17 128
18 128
19 128
20 128
21 128
22 128
23 128
24 128
25 128
26 128
27 128
28 128
29 128
30 128
31 128
32 128
33 128
34 128

35 rows selected.


The Non-Partitioned Table had a High Water Mark of 2,512 blocks.  This did not change with a MOVE.  The allocation of Extents is also expected in AutoAllocate.

Why, then, does the Partition behave differently ?  It started with a High Water Mark of 3,022 blocks, which shrank to 2,484 blocks after a MOVE.

Is the Average Row Length or the actual data a factor ?  (Note : I am *not* using Table Compression).

To be explored further with a larger row size ...........

Possibly, to be explored with a different pattern of INSERT statements  ......

Possibly to be compared in 11.2 as well. ......
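For these follow-up comparisons, one option is to read the High Water Mark directly from the segment with DBMS_SPACE.UNUSED_SPACE rather than from the BLOCKS column populated by DBMS_STATS. The block below is only a sketch, not part of the tests above: it assumes the owner HEMANT used in the dba_extents queries, and the local variable names are made up for the example.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Hypothetical local variables to receive the OUT parameters.
  l_total_blocks  NUMBER;
  l_total_bytes   NUMBER;
  l_unused_blocks NUMBER;
  l_unused_bytes  NUMBER;
  l_file_id       NUMBER;
  l_block_id      NUMBER;
  l_last_block    NUMBER;
BEGIN
  DBMS_SPACE.UNUSED_SPACE(
    segment_owner             => 'HEMANT',
    segment_name              => 'NEW_PART_TBL',
    segment_type              => 'TABLE PARTITION',
    total_blocks              => l_total_blocks,
    total_bytes               => l_total_bytes,
    unused_blocks             => l_unused_blocks,
    unused_bytes              => l_unused_bytes,
    last_used_extent_file_id  => l_file_id,
    last_used_extent_block_id => l_block_id,
    last_used_block           => l_last_block,
    partition_name            => 'P_100');

  -- Blocks allocated to the segment minus blocks above the High Water Mark.
  DBMS_OUTPUT.PUT_LINE('Allocated blocks      : ' || l_total_blocks);
  DBMS_OUTPUT.PUT_LINE('Blocks below the HWM  : ' || (l_total_blocks - l_unused_blocks));
END;
/
```

For the Non-Partitioned Table, the same call would use segment_name => 'NON_PART_TBL' and segment_type => 'TABLE', and omit partition_name. The figure can differ slightly from the BLOCKS statistic because it also counts segment overhead blocks, but it does not depend on when statistics were last gathered.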

Categories: DBA Blogs