
Syed Jaffar

Whatever topic is discussed on this blog reflects my own findings and views, which may not necessarily match those of others. I strongly recommend you test any advice given on this blog before implementing it.

Oracle 11g upgrade and SYS.AUD$ obstacles

Sun, 2014-02-16 06:44
During the manual upgrade of one of our Oracle 10gR2 databases to Oracle 11gR2 a couple of days ago, to our surprise, the upgrade procedure took nearly 5.34 hours to finish. Although we had previously completed hundreds of manual database upgrades comfortably within a 2-hour window per database, the delay genuinely surprised us this time. Unfortunately, there was no log that could help us understand what was really going on, so we started investigating the delay once the upgrade exceeded its anticipated duration. The only new addition in this upgrade was a standby database in place this time. But that shouldn't be the genuine reason for the delay, right?
When we queried v$session, we found the following statement executing:

update SYS.AUD$ set dbid = 223249361 where dbid is null and rownum <= 10000
We learned that during the Oracle Server component upgrade to 11gR2, the upgrade procedure performs the following actions on the SYS.AUD$ table:

  • Adds 11 new columns
  • Updates the NTIMESTAMP column
  • Updates the DBID column with the appropriate value, 10,000 records at a time, in a loop
Unfortunately, the SYS.AUD$ table in this database had over 6.4 million records, and the update statement, processing 10,000 records at a time, took 4.30 hours to complete. The upgrade of the remaining components then took another hour.
When we did our earlier upgrades, the auditing option was not enabled on those databases, so we had never encountered such a delay before. To shorten the upgrade window when SYS.AUD$ contains a huge number of records, the recommended practice is to archive the audit data you need to retain and then truncate SYS.AUD$ before starting the upgrade, so the upgrade scripts have far fewer rows to process (a sketch follows below).
This would certainly help us when we plan the Oracle 12c upgrade in the future.
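As a rough illustration of that housekeeping step (a minimal sketch only: the archive table and tablespace names below are made-up examples, and the archive step applies only if the audit data must be retained):

-- Optionally preserve the existing audit records before the upgrade
-- (AUD_ARCHIVE_1402 and AUDIT_ARCH_TS are example names, not from this post)
CREATE TABLE aud_archive_1402 TABLESPACE audit_arch_ts AS
  SELECT * FROM sys.aud$;

-- With the data archived, empty SYS.AUD$ so the upgrade's new-column and
-- DBID/NTIMESTAMP updates have almost no rows to process
TRUNCATE TABLE sys.aud$;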



Bug 18223021 : DISK FILE OPERATIONS I/O ON HP-UX.

Tue, 2014-02-11 01:45
We have been dealing with strange ASM behavior over the past few months across all ASM instances in our multiple Oracle 11gR2 (11.2.0.2) cluster environments on HP-UX 11.31. Even simple, routine ASM tasks, such as adding a new disk to an ASM disk group, mounting/dismounting a disk group, or querying for CANDIDATE ASM disks, were taking a minimum of 20 minutes, and sometimes never completed. This behavior caused significant performance degradation across all database instances in the cluster, and most of the databases suffered from 'Disk file operations I/O' waits, among other consequences.
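For context, these are the kinds of day-to-day ASM statements that were crawling (illustrative only: the disk group name and disk path below are examples, not from our environment):

-- List disks visible to ASM but not yet assigned to any disk group
SELECT path, header_status, os_mb
FROM   v$asm_disk
WHERE  header_status = 'CANDIDATE';

-- Add an example disk to an example disk group; with this bug the statement
-- could sit for 20+ minutes, or apparently never return
ALTER DISKGROUP data_dg ADD DISK '/dev/rdisk/oracle/asm/disk_21';

-- Dismount/mount a disk group on the local ASM instance
ALTER DISKGROUP data_dg DISMOUNT;
ALTER DISKGROUP data_dg MOUNT;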

I must say, Oracle Support made a few unsuccessful attempts to address the issue, suggesting we increase the ASM instance SGA, enable async I/O at the OS level, reduce the number of disks, and so on. Finally, they logged bug 18223021 for our issue, and we are yet to receive the fix.

Below are a few of the consequences of this behavior that we are confronting:

  • Oracle 10g database instances running on the node where a new ASM disk was being added to an existing disk group suffered control file locking and database hang issues
  • We now ensure that the disk group to which a new disk is being added is not mounted on the non-related ASM instances (on the other nodes)
  • When adding a disk took an indefinite amount of time on one of the ASM instances, the only way to speed up and complete the procedure was to shut down that instance, thereby dismounting the ASM disk group on it
  • Most databases are suffering from 'Disk file operations I/O' waits (a quick check is sketched below)
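A simple way to gauge how widespread that wait is across the cluster (a minimal sketch; run from any instance with access to the GV$ views):

-- Sessions currently waiting on the problematic event, per instance
SELECT inst_id, COUNT(*) AS waiting_sessions
FROM   gv$session
WHERE  event = 'Disk file operations I/O'
AND    state = 'WAITING'
GROUP  BY inst_id;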

If you have MOS access, you may refer to the bug for more details.
Stay tuned for the fix and solution...

CRS-1615:No I/O has completed after 50% of the maximum interval

Mon, 2014-01-27 03:01
A few days back, the clusterware on one of the production cluster nodes became unhealthy and caused the instances on that node to terminate. Had it been a node eviction, everything would have come back automatically after the node reboot; in this scenario, however, only the cluster components were in an unhealthy state, so no instance was started on the node.

Upon checking the ocssd.log file, we found the following error:

cssd(21532)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/rdisk/oracle/ocr/ora_ocr_01 will be considered not functional in 13169 milliseconds

The node was unable to communicate with the rest of the cluster nodes. If so, why wasn't the node crashed/evicted? Isn't that the question that comes to your mind?

Well, from the error it is clear that the node suffered I/O issues: it was unable to access the voting disk and started complaining (in the ocssd.log) that it couldn't see the other nodes in the cluster. When we contacted the storage and OS teams, the OS team was quick to identify an issue with a PCI card. For some reason, the I/O channel from the node to the storage was suspended for 10 seconds, after which the connection was re-established. In the meantime, the cluster stack on the node became unhealthy and couldn't proceed with any further action.

The only workaround we had to perform was restarting the CSS component using the following command:

crsctl start res ora.cssd -init

Once CSS was started, everything was back to a normal state. Luckily, we didn't have to do a lot of research into why the cluster became unhealthy or why the instances crashed.
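If you need to confirm the stack is healthy again after such a restart, a couple of quick checks are shown below (standard 11gR2 crsctl commands; output formatting varies by version):

# Lower-stack (init) resources, including ora.cssd, should show ONLINE
crsctl stat res -t -init

# Overall health of the CRS, CSS and EVM daemons
crsctl check crs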

AUTO Vs Manual PSU patch - when and why?

Sun, 2014-01-26 03:46
The purpose of this blog entry is to share my thoughts on AUTO vs. manual PSU patch deployment, with the when and why, and also a success story of how we recently reduced overall patching time by almost 50%. I thought this would help you as well.

From my own perspective, I think we are one of the organizations in the Middle East with a large and complex Oracle cluster setup in place. With six (06) cluster environments, covering production, non-production and DR, maintaining them is always going to be an uphill and challenging task. One of the tasks that requires the most attention and effort is none other than PSU patch deployment across all six environments at our premises.

We are currently in the process of applying the 11.2.0.2.11 PSU patch in all our environments, and the challenge in front of us is to bring down the patching time on each server. Our past patching experience says that AUTO patching needs a minimum of 2 hours per node, so if you are going to patch a 10-node cluster, you need a time frame of at least 22 hours to complete the deployment.

AUTO Patching
No doubt AUTO patching is the coolest enhancement from Oracle for automating the entire patching procedure smoothly and, more importantly, without much human intervention. The downside is that in the following situations AUTO patching of the GI and RDBMS homes together either fails or simply isn't possible:

  1. When you have multiple Oracle homes with different software owners; for example, we have an Oracle E-Business Suite home and typical RDBMS homes (10g and 11g) under different ownership.
  2. When you have multiple versions of Oracle databases running; for example, we have Oracle 10g and Oracle 11g databases.
  3. When, during the course of patching, one of the files fails to be copied or rolled back because it is busy with another OS process; in that case the patch is rolled back, and the cluster and the other services on the node are automatically restarted.
Perhaps in the above circumstances one may choose to go with AUTO patching separately for the GI home and then for the RDBMS homes. However, when you hit point 3 above, the same thing is going to happen, which is of course time consuming. While patching one particular node, AUTO patching of the GI home failed 3 times due to an unsuccessful cluster shutdown, and we ended up rebooting the node 3 times.
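For reference, the AUTO method is essentially a single invocation per node, run as root (the patch location and home path below are placeholders, not our actual staging paths):

# Patch the GI and RDBMS homes on this node in one shot (run as root)
opatch auto /u01/stage/psu_11.2.0.2.11

# Or restrict the run to a specific home with -oh
opatch auto /u01/stage/psu_11.2.0.2.11 -oh /u01/app/11.2.0.2/grid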
Manual Patching
In contrast, manual patching requires heavy human intervention during the course of patch deployment; a set of steps needs to be followed carefully (a rough per-node outline is sketched at the end of this post).
Since we were challenged to look at all possibilities for reducing the overall patching time, we started off analyzing the various options between AUTO and manual patch deployment, and where the time was being consumed or wasted. We figured out that after each successful or unsuccessful AUTO patching attempt, the cluster and the services on the node have to restart, and this was the time-consuming factor. In a complex/large cluster environment with many instances, ASM disks and disk groups, it certainly takes a good amount of time to start everything up. This caught our attention, and we thought of giving manual patching a try.
When we tried the manual patching method, we managed to patch the GI and RDBMS homes in about 1 hour, almost 50% less than the AUTO patching time frame. Imagine: we finished 6 nodes in about 7 hours, in contrast to a 12-13 hour time frame.
In a nutshell, if you have a small cluster environment, say a 2-node cluster, you may feel you don't gain much in terms of time savings; however, if you are going to patch a large/complex cluster environment, consider the manual method, which could save a pretty huge amount of patching downtime. At the same time, keep in mind that this method requires more DBA intervention than the AUTO patching method.
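For completeness, this is roughly the per-node sequence we follow with the manual method (a sketch only: the home paths, database/instance names and patch location are placeholders, and the exact steps in the patch README and the relevant MOS notes take precedence):

# 1. Stop the instances running from the RDBMS home on this node
srvctl stop instance -d MYDB -i MYDB1        # example database/instance names

# 2. Unlock the GI home for patching (as root)
$GRID_HOME/crs/install/rootcrs.pl -unlock

# 3. Apply the PSU to the GI home locally (as the GI software owner)
opatch napply -oh $GRID_HOME -local /u01/stage/psu_11.2.0.2.11/<patch_id>

# 4. Apply the PSU to the RDBMS home locally (as the RDBMS software owner)
opatch apply -oh $ORACLE_HOME -local /u01/stage/psu_11.2.0.2.11/<patch_id>

# 5. Re-configure and restart the stack on this node (as root)
$GRID_HOME/crs/install/rootcrs.pl -patch

# 6. Once all nodes are patched, load the post-patch SQL into each database,
#    e.g. sqlplus / as sysdba @?/rdbms/admin/catbundle.sql psu apply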