Pythian Group
Testing the 19c Grid Infrastructure by Performing a Dry-Run Upgrade
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. We originally published today’s post on August 14, 2019.
While reviewing the Grid Infrastructure Installation and Upgrade Guide for Linux, I discovered the dry-run upgrade mode for testing upgrades. Since dry-run is a new feature, I tested it first using the GUI.
Below I will step you through this process:
[oracle@racnode-dc2-1 grid]$ /u01/app/19.3.0.0/grid/gridSetup.sh -dryRunForUpgrade
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:
 /u01/app/19.3.0.0/grid/install/response/grid_2019-08-06_00-20-31AM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/GridSetupActions2019-08-06_00-20-31AM/gridSetupActions2019-08-06_00-20-31AM.log
[oracle@racnode-dc2-1 grid]$
From the response file generated above (grid_2019-08-06_00-20-31AM.rsp), create dryRunForUpgradegrid.rsp:
[oracle@racnode-dc2-1 grid]$ grep -v "^#" /u01/app/19.3.0.0/grid/install/response/grid_2019-08-06_00-20-31AM.rsp | grep -v "=$" | awk 'NF' > /home/oracle/dryRunForUpgradegrid.rsp

[oracle@racnode-dc2-1 ~]$ cat /home/oracle/dryRunForUpgradegrid.rsp
oracle.install.responseFileVersion=/oracle/install/rspfmt_crsinstall_response_schema_v19.0.0
INVENTORY_LOCATION=/u01/app/oraInventory
oracle.install.option=UPGRADE
ORACLE_BASE=/u01/app/oracle
oracle.install.crs.config.scanType=LOCAL_SCAN
oracle.install.crs.config.ClusterConfiguration=STANDALONE
oracle.install.crs.config.configureAsExtendedCluster=false
oracle.install.crs.config.clusterName=vbox-rac-dc2
oracle.install.crs.config.gpnp.configureGNS=false
oracle.install.crs.config.autoConfigureClusterNodeVIP=false
oracle.install.crs.config.gpnp.gnsOption=CREATE_NEW_GNS
oracle.install.crs.config.clusterNodes=racnode-dc2-1:,racnode-dc2-2:
oracle.install.crs.configureGIMR=true
oracle.install.asm.configureGIMRDataDG=false
oracle.install.crs.config.storageOption=FLEX_ASM_STORAGE
oracle.install.crs.config.sharedFileSystemStorage.ocrLocations=
oracle.install.crs.config.useIPMI=false
oracle.install.asm.diskGroup.name=CRS
oracle.install.asm.diskGroup.AUSize=0
oracle.install.asm.gimrDG.AUSize=1
oracle.install.asm.configureAFD=false
oracle.install.crs.configureRHPS=false
oracle.install.crs.config.ignoreDownNodes=false
oracle.install.config.managementOption=NONE
oracle.install.config.omsPort=0
oracle.install.crs.rootconfig.executeRootScript=false
[oracle@racnode-dc2-1 ~]$
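The grep/awk pipeline above simply drops comment lines, "key=" lines with no value, and blank lines. A quick sanity check of the same filter on a throwaway file (the file name and keys here are invented for illustration):

```shell
# Build a throwaway response file containing a comment, a blank line,
# and an empty "key=" entry (names here are made up for illustration)
cat > /tmp/sample.rsp <<'EOF'
# a comment line
oracle.install.option=UPGRADE

ORACLE_BASE=/u01/app/oracle
oracle.install.config.emAdminPassword=
EOF

# Same filter as above: drop comments, drop empty "key=" lines, drop blanks
grep -v "^#" /tmp/sample.rsp | grep -v "=$" | awk 'NF'
```

Only the two populated settings survive the filter, which is exactly what we want in the trimmed-down response file.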
Next, create directory /u01/app/19.3.0.0/grid for all nodes:
[root@racnode-dc2-1 ~]# id oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54318(asmdba),54322(dba),54323(backupdba),54324(oper),54325(dgdba),54326(kmdba)

[root@racnode-dc2-1 ~]# mkdir -p /u01/app/19.3.0.0/grid
[root@racnode-dc2-1 ~]# chown oracle:oinstall /u01/app/19.3.0.0/grid
[root@racnode-dc2-1 ~]# chmod 775 /u01/app/19.3.0.0/grid
[root@racnode-dc2-1 ~]# ll /u01/app/19.3.0.0/
total 4
drwxrwxr-x 2 oracle oinstall 4096 Aug  6 02:07 grid
[root@racnode-dc2-1 ~]#
Extract grid software for node1 ONLY:
[oracle@racnode-dc2-1 ~]$ unzip -qo /media/swrepo/LINUX.X64_193000_grid_home.zip -d /u01/app/19.3.0.0/grid/

[oracle@racnode-dc2-1 ~]$ ls /u01/app/19.3.0.0/grid/
addnode     clone  dbjava     diagnostics  gpnp          install        jdbc  lib      OPatch   ords  perl     qos       rhp            rootupgrade.sh  sqlpatch  tomcat  welcome.html  xdk
assistants  crs    dbs        dmu          gridSetup.sh  instantclient  jdk   md       opmn     oss   plsql    racg      root.sh        runcluvfy.sh    sqlplus   ucp     wlm
bin         css    deinstall  env.ora      has           inventory      jlib  network  oracore  oui   precomp  rdbms     root.sh.old    sdk             srvm      usm     wwg
cha         cv     demo       evm          hs            javavm         ldap  nls      ord      owm   QOpatch  relnotes  root.sh.old.1  slax            suptools  utl     xag

[oracle@racnode-dc2-1 ~]$ du -sh /u01/app/19.3.0.0/grid/
6.0G    /u01/app/19.3.0.0/grid/
[oracle@racnode-dc2-1 ~]$
Run gridSetup.sh -silent -dryRunForUpgrade:
[oracle@racnode-dc2-1 ~]$ env|grep -i ora
USER=oracle
MAIL=/var/spool/mail/oracle
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/oracle/.local/bin:/home/oracle/bin
PWD=/home/oracle
HOME=/home/oracle
LOGNAME=oracle

[oracle@racnode-dc2-1 ~]$ date
Tue Aug  6 02:35:47 CEST 2019

[oracle@racnode-dc2-1 ~]$ /u01/app/19.3.0.0/grid/gridSetup.sh -silent -dryRunForUpgrade -responseFile /home/oracle/dryRunForUpgradegrid.rsp
Launching Oracle Grid Infrastructure Setup Wizard...

[WARNING] [INS-13014] Target environment does not meet some optional requirements.
   CAUSE: Some of the optional prerequisites are not met. See logs for details. /u01/app/oraInventory/logs/GridSetupActions2019-08-06_02-35-52AM/gridSetupActions2019-08-06_02-35-52AM.log
   ACTION: Identify the list of failed prerequisite checks from the log: /u01/app/oraInventory/logs/GridSetupActions2019-08-06_02-35-52AM/gridSetupActions2019-08-06_02-35-52AM.log. Then either from the log file or from installation manual find the appropriate configuration to meet the prerequisites and fix it manually.

The response file for this session can be found at:
 /u01/app/19.3.0.0/grid/install/response/grid_2019-08-06_02-35-52AM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/GridSetupActions2019-08-06_02-35-52AM/gridSetupActions2019-08-06_02-35-52AM.log

As a root user, execute the following script(s):
        1. /u01/app/19.3.0.0/grid/rootupgrade.sh

Execute /u01/app/19.3.0.0/grid/rootupgrade.sh on the following nodes:
[racnode-dc2-1]

Run the script on the local node.

Successfully Setup Software with warning(s).
[oracle@racnode-dc2-1 ~]$
Run rootupgrade.sh for node1 ONLY and review log:
[root@racnode-dc2-1 ~]# /u01/app/19.3.0.0/grid/rootupgrade.sh
Check /u01/app/19.3.0.0/grid/install/root_racnode-dc2-1_2019-08-06_02-44-59-241151038.log for the output of root script

[root@racnode-dc2-1 ~]# cat /u01/app/19.3.0.0/grid/install/root_racnode-dc2-1_2019-08-06_02-44-59-241151038.log
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/19.3.0.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Performing Dry run of the Grid Infrastructure upgrade.
Using configuration parameter file: /u01/app/19.3.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/racnode-dc2-1/crsconfig/rootcrs_racnode-dc2-1_2019-08-06_02-45-31AM.log
2019/08/06 02:45:44 CLSRSC-464: Starting retrieval of the cluster configuration data
2019/08/06 02:45:52 CLSRSC-729: Checking whether CRS entities are ready for upgrade, cluster upgrade will not be attempted now. This operation may take a few minutes.
2019/08/06 02:47:56 CLSRSC-693: CRS entities validation completed successfully.
[root@racnode-dc2-1 ~]#
Next, check grid home for node2:
[oracle@racnode-dc2-2 ~]$ du -sh /u01/app/19.3.0.0/grid/
6.6G    /u01/app/19.3.0.0/grid/
[oracle@racnode-dc2-2 ~]$
Then, check oraInventory for ALL nodes:
[oracle@racnode-dc2-2 ~]$ cat /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2019, Oracle and/or its affiliates. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>12.2.0.7.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraGI12Home1" LOC="/u01/app/12.2.0.1/grid" TYPE="O" IDX="1" CRS="true"/>
<HOME NAME="OraDB12Home1" LOC="/u01/app/oracle/12.2.0.1/db1" TYPE="O" IDX="2"/>
<HOME NAME="OraGI19Home1" LOC="/u01/app/19.3.0.0/grid" TYPE="O" IDX="3"/>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>
[oracle@racnode-dc2-2 ~]$
Check the CRS active version, which is still 12.2.0.1.0:
[oracle@racnode-dc2-1 ~]$ . /media/patch/gi.env
The Oracle base has been set to /u01/app/oracle
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
GRID_HOME=/u01/app/12.2.0.1/grid
ORACLE_HOME=/u01/app/12.2.0.1/grid
Oracle Instance alive for sid "+ASM1"

[oracle@racnode-dc2-1 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.2.0.1.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [927320293].
[oracle@racnode-dc2-1 ~]$
Check log location:
[oracle@racnode-dc2-1 ~]$ cd /u01/app/oraInventory/logs/GridSetupActions2019-08-06_02-35-52AM/
[oracle@racnode-dc2-1 GridSetupActions2019-08-06_02-35-52AM]$ ls -alrt
total 17420
-rw-r-----  1 oracle oinstall     129 Aug  6 02:35 installerPatchActions_2019-08-06_02-35-52AM.log
-rw-r-----  1 oracle oinstall       0 Aug  6 02:35 gridSetupActions2019-08-06_02-35-52AM.err
drwxrwx---  3 oracle oinstall    4096 Aug  6 02:35 temp_ob
-rw-r-----  1 oracle oinstall       0 Aug  6 02:39 oraInstall2019-08-06_02-35-52AM.err
drwxrwx--- 17 oracle oinstall    4096 Aug  6 02:39 ..
-rw-r-----  1 oracle oinstall     157 Aug  6 02:39 oraInstall2019-08-06_02-35-52AM.out
-rw-r-----  1 oracle oinstall       0 Aug  6 02:43 oraInstall2019-08-06_02-35-52AM.err.racnode-dc2-2
-rw-r-----  1 oracle oinstall     142 Aug  6 02:43 oraInstall2019-08-06_02-35-52AM.out.racnode-dc2-2
-rw-r-----  1 oracle oinstall 9341920 Aug  6 02:43 gridSetupActions2019-08-06_02-35-52AM.out
-rw-r-----  1 oracle oinstall   13419 Aug  6 02:43 time2019-08-06_02-35-52AM.log
-rw-r-----  1 oracle oinstall 8443087 Aug  6 02:43 gridSetupActions2019-08-06_02-35-52AM.log
drwxrwx---  3 oracle oinstall    4096 Aug  6 02:56 .
[oracle@racnode-dc2-1 GridSetupActions2019-08-06_02-35-52AM]$
Since I have not yet performed an actual upgrade, I don't know whether leaving the 19.3.0.0 grid home registered in oraInventory will cause problems. It did cause a problem when I ran a silent-mode test after the initial GUI test. To resolve the issue, detach the 19.3.0.0 grid home:
export ORACLE_HOME=/u01/app/19.3.0.0/grid
$ORACLE_HOME/oui/bin/runInstaller -detachHome -silent ORACLE_HOME=$ORACLE_HOME
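Before and after detaching, it's easy to confirm which homes the central inventory knows about. A minimal sketch (the default path matches this environment; the grep pattern is just an illustration, not an official tool):

```shell
# Path to the central inventory (default matches this environment;
# override INV to point elsewhere)
INV=${INV:-/u01/app/oraInventory/ContentsXML/inventory.xml}

# Pull out each registered home's name and location from the HOME entries
grep -o 'NAME="[^"]*" LOC="[^"]*"' "$INV"
```

Running this after a successful detach should no longer show the OraGI19Home1 entry.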
In conclusion, testing a grid upgrade using dryRunForUpgrade functionality is a good method to identify any possible issues before performing an actual upgrade.
Save Time With This Quick OCI CLI Installation
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. We originally published today’s post on April 15, 2019.
This is more of a personal reminder than anything else, but hopefully it helps save some time for others trying to install the OCI CLI on their local Linux machines.
The idea is to have a very basic set of instructions to install the oci CLI on a Linux box and test it. This includes installing a newer version of Python and setting the system to use it by default. This is easy enough, but introduces an issue that has to be worked around.
The idea

I would usually keep simple notes like these in a text file for my own reference. However, lately I've discovered that what I consider very basic Linux knowledge is either not-so-basic or not as widespread among Oracle DBAs as I thought. So, this is not only to jog my own memory, but also to help others out there who may struggle a bit with Linux-related tasks.
Installation

Both the automated and the manual installation procedures are well-documented, but there's an interesting caveat I discovered while installing in my Oracle Linux 7.6 laboratory VM. I tried the automated procedure because I'm one for automating stuff and didn't want to deep-dive into this particular procedure.
So, simply following the instructions, I proceeded to execute the curl command to download and install the CLI.
Please note the DEPRECATION warning in the output below:
[oracle@orapex ~]$ bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6283  100  6283    0     0  14095      0 --:--:-- --:--:-- --:--:-- 14087
Downloading Oracle Cloud Infrastructure CLI install script from
https://raw.githubusercontent.com/oracle/oci-cli/6dc61e3b5fd2781c5afff2decb532c24969fa6bf/scripts/install/install.py
to /tmp/oci_cli_install_tmp_LA0m.
######################################################################## 100.0%
Python3 not found on system PATH
Running install script.
python /tmp/oci_cli_install_tmp_LA0m < /dev/tty
-- Verifying Python version.
-- Python version 2.7.5 okay.
-- Verifying native dependencies.
-- Unable to verify native dependencies. dist=oracle linux server, version=7.5. Continuing...
===> In what directory would you like to place the install? (leave blank to use '/home/oracle/lib/oracle-cli'):
-- Creating directory '/u01/userhome/oracle/lib/oracle-cli'.
-- We will install at '/u01/userhome/oracle/lib/oracle-cli'.
===> In what directory would you like to place the 'oci' executable? (leave blank to use '/home/oracle/bin'):
-- The executable will be in '/u01/userhome/oracle/bin'.
===> In what directory would you like to place the OCI scripts? (leave blank to use '/home/oracle/bin/oci-cli-scripts'):
-- Creating directory '/u01/userhome/oracle/bin/oci-cli-scripts'.
-- The scripts will be in '/u01/userhome/oracle/bin/oci-cli-scripts'.
-- Downloading virtualenv package from https://github.com/pypa/virtualenv/archive/15.0.0.tar.gz.
-- Downloaded virtualenv package to /tmp/tmpRjNdhT/15.0.0.tar.gz.
-- Checksum of /tmp/tmpRjNdhT/15.0.0.tar.gz OK.
-- Extracting '/tmp/tmpRjNdhT/15.0.0.tar.gz' to '/tmp/tmpRjNdhT'.
-- Executing: ['/usr/bin/python', 'virtualenv.py', '--python', '/usr/bin/python', '/u01/userhome/oracle/lib/oracle-cli']
Already using interpreter /usr/bin/python
New python executable in /u01/userhome/oracle/lib/oracle-cli/bin/python
Installing setuptools, pip, wheel...done.
-- Executing: ['/u01/userhome/oracle/lib/oracle-cli/bin/pip', 'install', '--cache-dir', '/tmp/tmpRjNdhT', 'oci_cli', '--upgrade']
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Collecting oci_cli
  Downloading https://files.pythonhosted.org/packages/83/aa/922fc63808ffc79d114474bef9af8b4a066e4653898b6df427cfcf98f166/oci_cli-2.5.5-py2.py3-none-any.whl (3.8MB)
    100% |------------------------------| 3.8MB 6.4MB/s
(...)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
    100% |------------------------------| 163kB 25.2MB/s
Building wheels for collected packages: arrow, configparser, terminaltables, retrying, PyYAML, pycparser
  Building wheel for arrow (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/ce/4f/95/64541c7466fd88ffe72fda5164f8323c91d695c9a77072c574
  Building wheel for configparser (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/a3/61/79/424ef897a2f3b14684a7de5d89e8600b460b89663e6ce9d17c
  Building wheel for terminaltables (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/30/6b/50/6c75775b681fb36cdfac7f19799888ef9d8813aff9e379663e
  Building wheel for retrying (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
  Building wheel for PyYAML (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/ad/da/0c/74eb680767247273e2cf2723482cb9c924fe70af57c334513f
  Building wheel for pycparser (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/f2/9a/90/de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Successfully built arrow configparser terminaltables retrying PyYAML pycparser
Installing collected packages: idna, six, python-dateutil, arrow, jmespath, configparser, enum34, pycparser, cffi, asn1crypto, ipaddress, cryptography, pyOpenSSL, httpsig-cffi, terminaltables, click, cx-Oracle, retrying, pytz, PyYAML, certifi, oci, oci-cli
Successfully installed PyYAML-3.13 arrow-0.10.0 asn1crypto-0.24.0 certifi-2019.3.9 cffi-1.12.2 click-6.7 configparser-3.5.0 cryptography-2.4.2 cx-Oracle-7.0.0 enum34-1.1.6 httpsig-cffi-15.0.0 idna-2.6 ipaddress-1.0.22 jmespath-0.9.3 oci-2.2.4 oci-cli-2.5.5 pyOpenSSL-18.0.0 pycparser-2.19 python-dateutil-2.7.3 pytz-2016.10 retrying-1.3.3 six-1.11.0 terminaltables-3.1.0
===> Modify profile to update your $PATH and enable shell/tab completion now? (Y/n): Y
===> Enter a path to an rc file to update (leave blank to use '/home/oracle/.bashrc'):
-- Backed up '/u01/userhome/oracle/.bashrc' to '/u01/userhome/oracle/.bashrc.backup'
-- Tab completion set up complete.
-- If tab completion is not activated, verify that '/u01/userhome/oracle/.bashrc' is sourced by your shell.
--
-- ** WARNING: Other 'oci' or 'bmcs' executables are on your $PATH. **
-- Conflicting paths: /home/oracle/bin/oci, /home/oracle/bin/bmcs, /home/oracle/bin/oci, /home/oracle/bin/bmcs, /home/oracle/bin/oci, /home/oracle/bin/bmcs, /home/oracle/bin/oci, /home/oracle/bin/bmcs
-- You can run this installation of the CLI with '/u01/userhome/oracle/bin/oci'.
--
-- ** Run `exec -l $SHELL` to restart your shell. **
--
-- Installation successful.
-- Run the CLI with /u01/userhome/oracle/bin/oci --help
The DEPRECATION line warns that the currently available Python version (2.7) is near end of life, so I decided to install Python 3 for the CLI to use.
Python installation

This is a simple call to yum to install the python36 package:
[root@orapex ~]# yum install -y python36
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package python36.x86_64 0:3.6.6-2.el7 will be installed
--> Processing Dependency: python36-libs(x86-64) = 3.6.6-2.el7 for package: python36-3.6.6-2.el7.x86_64
--> Processing Dependency: libpython3.6m.so.1.0()(64bit) for package: python36-3.6.6-2.el7.x86_64
--> Running transaction check
---> Package python36-libs.x86_64 0:3.6.6-2.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package             Arch       Version         Repository               Size
================================================================================
Installing:
 python36            x86_64     3.6.6-2.el7     ol7_developer_EPEL       65 k
Installing for dependencies:
 python36-libs       x86_64     3.6.6-2.el7     ol7_developer_EPEL      8.5 M

Transaction Summary
================================================================================
Install  1 Package (+1 Dependent package)

Total download size: 8.6 M
Installed size: 36 M
Downloading packages:
(1/2): python36-libs-3.6.6-2.el7.x86_64.rpm                | 8.5 MB  00:00:01
(2/2): python36-3.6.6-2.el7.x86_64.rpm                     |  65 kB  00:00:03
--------------------------------------------------------------------------------
Total                                              2.6 MB/s | 8.6 MB  00:00:03
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : python36-libs-3.6.6-2.el7.x86_64                           1/2
  Installing : python36-3.6.6-2.el7.x86_64                                2/2
  Verifying  : python36-3.6.6-2.el7.x86_64                                1/2
  Verifying  : python36-libs-3.6.6-2.el7.x86_64                           2/2

Installed:
  python36.x86_64 0:3.6.6-2.el7

Dependency Installed:
  python36-libs.x86_64 0:3.6.6-2.el7

Complete!

[root@orapex ~]# python -V
Python 2.7.5
[root@orapex ~]# python36 -V
Python 3.6.6
As you may have noted, to use the new version of Python we must call the python36 command, while python still invokes version 2.7. This is a problem because the OCI CLI calls python, and we want that to be the newer version.
To solve this little problem, I decided to use Linux alternatives, a command that maintains the symbolic links determining default commands system-wide. I emphasize "system-wide" for a good reason we'll see below.
Using alternatives is pretty straightforward. It requires a simple call defining the new symbolic link and the command it points to, plus a priority value. In my case, I opted for creating the same link for Python 2 and Python 3, giving a higher priority to Python 3. This means the system will first try Python 3, then Python 2 if the first one is not available.
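Conceptually, alternatives just maintains one symlink pointing at the highest-priority registered target. A toy sketch of that behavior using plain symlinks (the stand-in scripts and paths here are invented for illustration; the real alternatives commands follow below):

```shell
# Toy model of what alternatives does: register candidates with priorities,
# then point a single symlink at the highest-priority candidate.
dir=$(mktemp -d)
printf '#!/bin/sh\necho Python 2 stand-in\n' > "$dir/python2"
printf '#!/bin/sh\necho Python 3 stand-in\n' > "$dir/python36"
chmod +x "$dir"/python2 "$dir"/python36

# Candidates as "priority path" pairs; the highest priority wins,
# just like alternatives' auto mode
best=$(printf '1 %s\n2 %s\n' "$dir/python2" "$dir/python36" | sort -rn | head -1 | cut -d' ' -f2)
ln -sf "$best" "$dir/python"

"$dir/python"   # runs the Python 3 stand-in, since it has the higher priority
```

If the priority-2 candidate were removed, rerunning the selection would fall back to the priority-1 one, which is exactly the behavior described above.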
[root@orapex ~]# alternatives --install /usr/bin/python python /bin/python36 2
[root@orapex ~]# alternatives --install /usr/bin/python python /bin/python 1
[root@orapex ~]# alternatives --display python
python - status is auto.
 link currently points to /bin/python36
/bin/python36 - priority 2
/bin/python - priority 1
Current 'best' version is /bin/python36.
[oracle@orapex ~]$ python -V
Python 3.6.6

System-wide change means trouble
At this point, we have the OCI CLI installed and using Python 3 for its calls, so we're fine here. Fortunately, I was following the steps in this blog post to configure alternatives properly, and it includes an appendix describing a problem that appears in yum once the system-wide switch to Python 3 is complete. Also fortunately, the post provides the workaround, which I immediately applied. The solution is simple: modify the yum Python scripts to use the older Python version. There are two files to modify:
[root@orapex ~]# yum search python
  File "/bin/yum", line 30
    except KeyboardInterrupt, e:
                            ^
SyntaxError: invalid syntax

[root@orapex ~]# head -1 /usr/bin/yum
#!/usr/bin/python
[root@orapex ~]# vi /usr/bin/yum
[root@orapex ~]# head -1 /usr/bin/yum
#!/usr/bin/python2.7
[root@orapex ~]# yum search python |more
Loaded plugins: langpacks, ulninfo
============================= N/S matched: python ==============================
GitPython.noarch : Python Git Library
(...)

[root@orapex ~]# head -1 /usr/libexec/urlgrabber-ext-down
#! /usr/bin/python
[root@orapex ~]# vim /usr/libexec/urlgrabber-ext-down
[root@orapex ~]# head -1 /usr/libexec/urlgrabber-ext-down
#! /usr/bin/python2.7
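The manual vi edits above can also be scripted. A hedged sketch using sed, shown against a scratch copy rather than the live files (in practice the targets are /usr/bin/yum and /usr/libexec/urlgrabber-ext-down; adapt paths carefully):

```shell
# Illustrate against a scratch file standing in for /usr/bin/yum
printf '#!/usr/bin/python\nimport sys\n' > /tmp/yum.copy

# Pin the shebang to python2.7 so the script keeps working
# after the system-wide alternatives switch to Python 3
sed -i '1s|^#!/usr/bin/python$|#!/usr/bin/python2.7|' /tmp/yum.copy
head -1 /tmp/yum.copy    # now reads #!/usr/bin/python2.7
```

Note that urlgrabber-ext-down uses the variant "#! /usr/bin/python" with a space, so the sed pattern would need a small adjustment for that file.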
With these simple steps, we have the OCI CLI installed and working with the latest Python version, at least until a Python 4 comes along; but now we know how to use alternatives to handle that, too.
The next step is to use the CLI itself to create a configuration file to access our OCI tenancy resources and start working, but that will be in another blog post.
Thank you for reading.
Weighing the Pros and Cons of Oracle Autonomous Database
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 20, 2019.
The next generation of managed database services is upon us, offering organizations of all sizes the ability to scale database workloads quickly without forking out huge investments in either hardware or specialized expertise. One of these services is Oracle Autonomous Database (ADB), marketed by the company as the “world’s first self-driving database” that uses machine learning to replace previously manual tasks.
Sounds pretty cool – but is ADB a good choice for you and your organization?
What exactly is the Oracle Autonomous Database?

If you're in any way used to working in an on-premises Oracle system, you'll be at home in ADB, as it's built on the same underlying (and well-known) technology: Oracle Exadata hardware combined with software like Pluggable Database (PDB), which isolates each client to its own data neighborhood by restricting its ability to see outside its own PDB. These all run on Real Application Clusters (RAC) in a container database in one of Oracle's cloud data centers, with Oracle managing the whole thing on either shared or dedicated underlying hardware.
It’s also worth pointing out that ADB is a relational database. That’s a clear differentiator between it and, say, Google BigQuery, a fantastic big data solution with the ability to scale to petabytes. ADB only scales to 128TB, but comes with relational database management system (RDBMS) functionality such as referential integrity (primary key / foreign key relationships) and server-side code objects such as procedures, functions, and triggers.
One important thing to keep in mind, however, is that ADB isn’t a standalone product or service. It’s actually an umbrella term for the combination of two underlying Oracle services: Autonomous Data Warehouse (largely tuned for data warehouse workloads with data stored in columnar format) and the Autonomous Transaction Processing database (tuned for Online Transaction Processing-type workloads using the more traditional row storage format).
Makes sense. But what else is so great about it?

Think of it as essentially a fully fledged Oracle database that’s far easier and less costly to procure and scale up and down than an on-premises counterpart. And instead of your IT team managing various services, ADB automatically deals with things such as:
- Monitoring
- Addressing underlying component errors
- Fine-tuning and query stabilization
- Backups and recoveries
It also provides many features users have come to expect from one of the most powerful RDBMS systems in the world. These include everything from simple tables to the more advanced features such as server-side code (procedures, functions, triggers, types) and complex table and index types. That’s along with the ability to monitor query performance using tools to which users have grown accustomed.
Sounds impressive. But what are the drawbacks?

We all know the saying “you’ve got to give to get,” and unfortunately, even though new features are constantly being added, it’s just as applicable to ADB as to many other things in life. Any managed database is great for scaling up and improving efficiency, but in exchange ADB sacrifices certain benefits and features typically found in an on-premises Oracle system, for example:
- Database Vault, OLAP, Text, Multimedia, Workspace Manager and other features are not available.
- There’s no control over database initialization parameters or tablespaces.
- There’s no ability to control database instance memory sizes or configurations.
- You can’t monitor via Oracle Enterprise Manager.
- PDB lockdown profiles also restrict certain commands, such as ALTER TABLESPACE.
Interested in learning more about Oracle Autonomous Database? This post only scratches the surface of the deep dive that is our new white paper on ADB and whether it might be a good fit at your organization.
Download the Oracle Autonomous Database White Paper to continue your evaluation of one of the most powerful cloud databases in the world.
Issues With Database Service Names Starting With Pluggable Database (PDB) Name
In this post, I’ll describe a problem that occurs when creating and using a database service whose name starts with the pluggable database (PDB) name followed by a dot (“.”): “<pdb_name>.<optional_suffix>”. I noticed this issue while trying to reset an expired user’s password by connecting to that service.
I’ll also describe other anomalies, for example, being able to stop the service, but not start it, as well as the service remaining active even if the PDB gets dropped. I also acknowledge there may be other side effects in addition to those I describe here. The workaround is relatively simple — don’t create services with the format “<pdb_name>.<optional_suffix>.” Or, said differently, don’t start the service name with “<pdb_name>.”
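The naming rule behind the workaround is easy to enforce mechanically. A small sketch that flags candidate service names beginning with “<pdb_name>.” (the service names here are invented for illustration):

```shell
# Flag candidate service names that start with "<pdb_name>." and
# would therefore hit the issue described in this post
pdb=pdb1
for svc in pdb1.test1 test2.pdb1 pdb1_A custom_service.app; do
  case $svc in
    "$pdb".*) echo "AVOID: $svc" ;;   # starts with "<pdb_name>." - problematic
    *)        echo "ok:    $svc" ;;   # any other pattern is fine
  esac
done
```

Note that names merely containing the PDB name (test2.pdb1) or using it as a prefix without the dot (pdb1_A) are fine; only the “<pdb_name>.” prefix is the trigger.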
It’s worth noting that the problem reproduces on 18c and 19c, but not on 12.1 and 12.2.
As of now, the development team has logged a defect, “Bug 32009641: ORA-01017 WHEN CONNECTING TO A PDB USING A SERVICE NAME WHICH STARTS WITH THE PDB,” though at the time of writing this post it isn’t published yet, as the team still has to confirm whether it is a bug and whether it can be published.
While it’s unlikely that many users will run into the issue described in this post, it’s good to be aware of it, and to avoid creating service names starting with “<pdb_name>.” until this issue is fixed. Mine is only a simple use case, and there could be other side effects not discussed here.
Setting up the test environment to reproduce the problem

I performed the steps below on a 19.8.0 DB; no ASM or GI-managed resources are present. I start by creating a new PDB named PDB1 and opening it:
SQL> create pluggable database pdb1 admin user pdb_adm identified by oracle create_file_dest='/u02/oradata/CDB1';

Pluggable database created.
The default CDB services are started, as is PDB1’s default service “pdb1”, associated with CON_ID=3, which identifies the initial current container for the service:
CDB$ROOT:
SQL> select con_id, name from v$services order by con_id, name;

    CON_ID NAME
---------- ---------------
         1 CDB2
         1 CDB2XDB
         1 SYS$BACKGROUND
         1 SYS$USERS
         3 pdb1

SQL> select con_id, name from cdb_services order by con_id, name;

    CON_ID NAME
---------- -----------------
         1 CDB2
         1 CDB2.localdomain
         1 CDB2XDB
         1 SYS$BACKGROUND
         1 SYS$USERS
         3 PDB1

6 rows selected.
PDB1:
SQL> select con_id, name from v$services order by con_id, name;

    CON_ID NAME
---------- -----
         3 pdb1

SQL> select con_id, name from cdb_services order by con_id, name;

    CON_ID NAME
---------- -----
         3 PDB1
Next, I create and start services in PDB1. I chose service names to include various combinations of names using “pdb1” and “pdb1.”:
SQL> alter session set container=pdb1;

SQL> exec dbms_service.create_service(service_name=>'pdb1.test1',network_name=>'pdb1.test1');
SQL> exec dbms_service.create_service(service_name=>'test2.pdb1',network_name=>'test2.pdb1');
SQL> exec dbms_service.create_service(service_name=>'test3.pdb1.app',network_name=>'test3.pdb1.app');
SQL> exec dbms_service.create_service(service_name=>'pdb1_A',network_name=>'pdb1_A');
SQL> exec dbms_service.create_service(service_name=>'pdb1_B.app',network_name=>'pdb1_B.app');
SQL> exec dbms_service.create_service(service_name=>'custom_service.app',network_name=>'custom_service.app');

SQL> exec dbms_service.start_service(service_name=>'pdb1.test1');
SQL> exec dbms_service.start_service(service_name=>'test2.pdb1');
SQL> exec dbms_service.start_service(service_name=>'test3.pdb1.app');
SQL> exec dbms_service.start_service(service_name=>'pdb1_A');
SQL> exec dbms_service.start_service(service_name=>'pdb1_B.app');
SQL> exec dbms_service.start_service(service_name=>'custom_service.app');
I’ll be using the below script to display the services’ status:
SQL> !cat s.sql
set lin 500 pagesize 9999
col name for a50
show con_name

prompt
prompt ** Output from v$active_services: **
select con_id, name from v$active_services where lower(name) like '%pdb%' or lower(name) like '%app%' order by con_id, name;

prompt ** Output from cdb_services: **
select con_id, name from cdb_services where lower(name) like '%pdb%' or lower(name) like '%app%' order by con_id, name;
Let’s check the status of the above-created services:
CDB$ROOT:
** Output from v$active_services: **

    CON_ID NAME
---------- ------------------
         1 pdb1.test1          <<--
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- ------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1          <<--
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app
PDB1:
** Output from v$active_services: **

    CON_ID NAME
---------- ------------------
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

6 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- ------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.
As you can see, the “pdb1.test1” service is apparently associated with the root container (v$active_services.con_id=1), and not with PDB1 as configured (cdb_services.con_id=3). All the other created services are reported as running in PDB1 (v$active_services.con_id=3).
Why is this a problem?

Let’s create two local users in PDB1, u1 and u2, and set u1’s password to expired:
SQL> grant create session to u1 identified by oracle;
SQL> grant create session to u2 identified by oracle;
SQL> alter user u1 password expire;
If I try to connect as u2, everything works without problems:
$ sqlplus u2/oracle@localhost/pdb1.test1 SQL> select sys_context('userenv','cdb_name') cdb_name, sys_context('userenv','con_name') con_name, sys_context('userenv','service_name') service_name from dual; CDB_NAME CON_NAME SERVICE_NAME -------------------- -------------------- -------------------- CDB2 PDB1 pdb1.test1
However, if I try to connect to u1 and change its password, this is what happens:
$ sqlplus u1/oracle@localhost/pdb1.test1 SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 11:57:56 2020 Version 19.8.0.0.0 Copyright (c) 1982, 2020, Oracle. All rights reserved. ERROR: ORA-28001: the password has expired Changing password for u1 New password: Retype new password: ERROR: ORA-01017: invalid username/password; logon denied Password unchanged Enter user-name:

Analysis
Tracing the session with oradebug (events 10046 and 10079) before entering the new password shows that the session is established via the pdb1.test1 service and traced under “CDB$ROOT(1)”:
*** 2020-10-26T12:01:31.706897+01:00 (CDB$ROOT(1)) <<-- traced at "CDB$ROOT(1)" level *** SESSION ID:(269.54910) 2020-10-26T12:01:31.706934+01:00 *** SERVICE NAME:(pdb1.test1) 2020-10-26T12:01:31.706937+01:00 <<-- service pdb1.test1 was used when connecting *** MODULE NAME:(sqlplus@hol.localdomain (TNS V1-V3)) 2020-10-26T12:01:31.706940+01:00 *** ACTION NAME:() 2020-10-26T12:01:31.706943+01:00 *** CONTAINER ID:(1) 2020-10-26T12:01:31.706945+01:00
Without going into the details of the trace, we can get a clue about what’s happening by checking the audit log.
From PDB1:
SQL> select * from ( select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text from sys.aud$ where userid = 'U1' and returncode <> 0 order by ntimestamp# desc ) where rownum = 1; NTIMESTAMP# USERID USERHOST TERMINAL ACTION# RETURNCODE CURRENT_USER COMMENT$TEXT ------------------------------ ---------- -------------------- --------------- --------- ---------- --------------- ---------------------------------------------------------------------- 26-OCT-20 10.57.56.687258 AM U1 hol.localdomain pts/6 100 28001 U1 Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=58922))
From CDB$ROOT:
SQL> select * from ( select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text from sys.aud$ where userid = 'U1' and returncode <> 0 order by ntimestamp# desc ) where rownum = 1; 2 3 4 5 6 7 NTIMESTAMP# USERID USERHOST TERMINAL ACTION# RETURNCODE CURRENT_USER COMMENT$TEXT ------------------------------ -------- ---------------- ---------- ------- ---------- --------------- ----------------------------------------------------------------- 26-OCT-20 11.01.32.725820 AM U1 hol.localdomain pts/6 100 1017 U1 Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=58922))
As you can see, when entering the password for the u1 user, we get “ORA-28001: the password has expired,” which is recorded in the PDB’s audit log. Then, after changing the password, an attempt is made to connect to the root container as u1. Since u1 is not a common user, the login fails, and we get the “ORA-01017: invalid username/password; logon denied” error.
What happens if I repeat the same steps using the “test2.pdb1” service name which, according to v$active_services, maps to CON_ID=3, i.e. PDB1?
This time the password change succeeds:
$ sqlplus u1/oracle@localhost/test2.pdb1 SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 12:38:04 2020 Version 19.8.0.0.0 Copyright (c) 1982, 2020, Oracle. All rights reserved. ERROR: ORA-28001: the password has expired Changing password for u1 New password: Retype new password: Password changed Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.8.0.0.0 SQL>
With tracing enabled on the session before the initial password is provided, this time the trace is generated in the context of “PDB1(3).” Notice the difference from the previous trace header, which reported “CDB$ROOT(1)”:
*** 2020-10-26T12:39:18.045405+01:00 (PDB1(3)) <<-- traced at "PDB1(3)" level *** SESSION ID:(269.12918) 2020-10-26T12:39:18.045547+01:00 *** SERVICE NAME:(test2.pdb1) 2020-10-26T12:39:18.045557+01:00 <<-- service test2.pdb1 was used when connecting *** MODULE NAME:(sqlplus@hol.localdomain (TNS V1-V3)) 2020-10-26T12:39:18.045564+01:00 *** ACTION NAME:() 2020-10-26T12:39:18.045570+01:00 *** CONTAINER ID:(3) 2020-10-26T12:39:18.045574+01:00
Checking PDB1’s audit log contents, there’s an initial ORA-28001, followed by RETURNCODE=0, thus a successful login:
SQL> select * from ( select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text from sys.aud$ where userid = 'U1' order by ntimestamp# desc ) where rownum <=2; 2 3 4 5 6 NTIMESTAMP# USERID USERHOST TERMINAL ACTION# RETURNCODE CURRENT_USER COMMENT$TEXT ------------------------------ -------- ---------------- ---------- ------- ---------- --------------- ----------------------------------------------------------------- 26-OCT-20 11.39.18.069634 AM U1 hol.localdomain pts/6 100 0 U1 Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59244)) 26-OCT-20 11.38.04.992001 AM U1 hol.localdomain pts/6 100 28001 U1 Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59244))
There are no new entries for the u1 user in CDB$ROOT’s audit log.
Other anomalies

Start / stop service

There are also other anomalies; e.g. the “pdb1.test1” service can be stopped from PDB1, but then cannot be started again:
From PDB1:
SQL> exec dbms_service.stop_service(service_name=>'pdb1.test1'); PL/SQL procedure successfully completed.
Querying v$active_services from PDB1 or CDB$ROOT, the service is not displayed. However, if I try to start it:
SQL> exec dbms_service.start_service(service_name=>'pdb1.test1'); BEGIN dbms_service.start_service(service_name=>'pdb1.test1'); END; * ERROR at line 1: ORA-44773: Cannot perform requested service operation. ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 63 ORA-06512: at "SYS.DBMS_SERVICE", line 486 ORA-06512: at line 1 SQL>

Closing and / or dropping PDB1
Going back to the initial setup, where all of the services I defined, except “pdb1.test1”, are mapped to PDB1:
SQL> @s CON_NAME ------------------------------ CDB$ROOT ** Output from v$active_services: ** CON_ID NAME ---------- ------------------- 1 pdb1.test1 3 custom_service.app 3 pdb1 3 pdb1_A 3 pdb1_B.app 3 test2.pdb1 3 test3.pdb1.app 7 rows selected. ** Output from cdb_services: ** CON_ID NAME ---------- ------------------- 3 PDB1 3 custom_service.app 3 pdb1.test1 3 pdb1_A 3 pdb1_B.app 3 test2.pdb1 3 test3.pdb1.app 7 rows selected.
Let’s close PDB1, and check the services status:
SQL> show con_name CON_NAME ------------------------------ CDB$ROOT SQL> alter pluggable database pdb1 close immediate; Pluggable database altered. SQL> @s CON_NAME --------- CDB$ROOT ** Output from v$active_services: ** CON_ID NAME ---------- ----------- 1 pdb1.test1 3 pdb1 ** Output from cdb_services: ** no rows selected SQL>
All of the user-created services were stopped, except for pdb1.test1. If I try to connect to PDB1 as user u2, I get “ORA-01109: database not open.”
Let’s drop the PDB:
SQL> drop pluggable database pdb1 including datafiles; Pluggable database dropped. SQL> @s CON_NAME --------- CDB$ROOT ** Output from v$active_services: ** CON_ID NAME ---------- ----------- 1 pdb1.test1 ** Output from cdb_services: ** no rows selected SQL>
Connecting to PDB1 as user u2 now returns “ORA-01017: invalid username/password; logon denied” which indicates that an authentication attempt is performed. Since only CDB$ROOT is open, let’s see if I can connect as system:
$ sqlplus system/oracle@localhost/pdb1.test1 SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 13:05:59 2020 Version 19.8.0.0.0 Copyright (c) 1982, 2020, Oracle. All rights reserved. Last Successful login time: Mon Oct 26 2020 11:38:01 +01:00 Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.8.0.0.0 SQL> show con_name CON_NAME ------------------------------ CDB$ROOT SQL>

Conclusion
The above are relatively simple use cases. Until this issue is fixed, it’s advisable to avoid creating service names starting with “&lt;pdb_name&gt;.”, as there could be other implications not discussed here. You may also want to keep an eye on the status of the defect tracked as Bug 32009641.
Reducing Contention on Hot Cursor Objects (Cursor: Pin S)
First, let me offer a little explanation about the wait event “cursor: pin S.”
Oracle states: “A session waits on this event when it wants to update a shared mutex pin and another session is currently in the process of updating a shared mutex pin for the same cursor object.” In other words, two or more sessions are trying to concurrently run the same statement (the same cursor in library cache), which forces them to compete to update a shared mutex pin for the same cursor object.
This wait event provides very useful information for identifying why sessions are competing: its P1 parameter holds the hash value of the statement protected by the mutex, so you can tell exactly which cursor the sessions are fighting over.
Here’s how a mutex works:
If a session wants to use a cursor, that cursor must not disappear from the library cache while in use. The session therefore uses a mutex to ensure the cursor cannot be changed or deleted: it registers its interest by incrementing the mutex usage count by one. This is called taking a shared lock.
The process for taking a shared lock:

- A session wants to run a cursor and so checks the owning cursor pin mutex to see if there is a session waiting to change the mutex (e.g. performing a hard parse). It does this by checking the high-order bits to see if they are zero or hold a session ID.
- If the high-order bits are zero, then it locks and increments by one (this is an atomic action). Waiting to lock and increment causes the “cursor: pin S” wait event. This increment is done on the low-order bits of the mutex.
- If the lock and increment fails, then some other session must be updating the mutex, so it’s necessary to sleep and try again, i.e. lock and increment. The “cursor: pin S” wait event will be longer. This can cause extra CPU load on the server as it spins attempting to update the mutex.
- If the high-order bits are not zero then there is a session waiting to change the mutex. The current interested session waits on the event “cursor: pin S wait on X.” If this is the case then it sleeps and tries again.
- Once the cursor is closed and finished, the shared lock on the mutex must be released by performing a lock and decrementing by one. Once again, if there is a failure to lock and decrement the next step is to sleep and try again.
If a session wants to perform a hard parse on a cursor already existing in the library cache it must acquire the mutex in exclusive mode.
The process for taking an exclusive lock:

- A session wants to perform a hard parse on a statement, so it checks the cursor pin mutex to see if it’s in use.
- It checks the high-order bits and, if zero, updates the high-order bits to the current session ID (this compare-and-swap routine is a CPU atomic action).
- If the high-order bits are already set, the process has to wait on the event “cursor: pin X.” The session then sleeps and tries again.
- Once the high-order bits are set to the current session ID, it checks the low-order bits to see if the cursor is currently in use.
- If the low-order bits are not zero, it must wait for the counter to decrement to zero (Note: the counter cannot be incremented once the high-order bits are set to the session ID).
- Once the low-order bits are set to zero then the hard parse can proceed.
- The session removes the exclusive mutex lock by resetting the high-order bits to zero.
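The shared and exclusive protocols above can be condensed into a toy model. This is an illustrative sketch of the description, not Oracle’s actual implementation: the bit layout and the `threading.Lock` standing in for the CPU’s atomic compare-and-swap are assumptions made for clarity.

```python
import threading

# Toy model of the cursor pin mutex described above: a single integer whose
# high-order bits hold the session ID of an exclusive requestor and whose
# low-order bits hold the shared reference count.
HIGH_SHIFT = 32
LOW_MASK = (1 << HIGH_SHIFT) - 1

class CursorPinMutex:
    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()  # stands in for the CPU's compare-and-swap

    def _cas(self, expected, new):
        """Atomic compare-and-swap: succeed only if nobody changed the value."""
        with self._atomic:
            if self._value == expected:
                self._value = new
                return True
            return False

    def try_pin_shared(self):
        """One shared-lock attempt; a False return is one sleep-and-retry
        cycle, i.e. one 'cursor: pin S' (or 'pin S wait on X') wait."""
        snap = self._value
        if snap >> HIGH_SHIFT:              # high-order bits hold a session ID
            return False                    # -> 'cursor: pin S wait on X'
        return self._cas(snap, snap + 1)    # atomic lock-and-increment

    def unpin_shared(self):
        while True:                         # failed lock-and-decrement: retry
            snap = self._value
            if self._cas(snap, snap - 1):
                return

    def try_lock_exclusive(self, sid):
        """Exclusive (hard parse) protocol: claim the high-order bits, then
        wait for the shared count in the low-order bits to drain to zero."""
        snap = self._value
        if snap >> HIGH_SHIFT:
            return False                    # another session got there first: 'cursor: pin X'
        if not self._cas(snap, snap | (sid << HIGH_SHIFT)):
            return False
        while self._value & LOW_MASK:       # no new shared pins can arrive now
            pass
        return True

    def unlock_exclusive(self):
        with self._atomic:
            self._value &= LOW_MASK         # reset the high-order bits to zero
```

A failed `try_pin_shared` maps to the spin-and-sleep cycle that shows up as “cursor: pin S”; a nonzero high-order field is what turns the wait into “cursor: pin S wait on X.”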
Oracle states that the wait event should be very rare, since updating the shared mutex pin is extremely fast.
This is true in most cases, but we recently ran into a scenario where a vast number of sessions were waiting on “cursor: pin S” while running different statements (v$session showed different sql_id). As Oracle states, the process to take a shared lock was very fast, but it still generated a big impact on performance. As the load from the application grew (in this case load grew exponentially, as this was an online retailer with increased traffic due to COVID-19), this impact became more and more evident.
In this scenario, we found a number of statements that shared the same characteristic; they all invoked the same function which, among other tasks, ran a simple select statement. As they were all trying to run the select statement in the function, they were all competing for access to the shared mutex pin for that specific select statement.
The best approach to reducing contention on the shared mutex for the statement is to mark it as hot. Once the statement is marked as hot, Oracle stops running the original cursor and instead splits executions among several “SQL-copies” of the cursor object. Since each “SQL-copy” has its own shared mutex, the load on the single mutex is divided among the copies generated by this mechanism.
Even though it doesn’t eliminate the waits on “cursor: pin S,” this technique proved very effective, dramatically reducing waits and improving performance in a real production environment with around 9,000 executions of the same SQL statement per second.
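The effect of splitting one hot cursor into N copies can be shown with a short sketch. How Oracle routes a session to a particular copy is internal; the session-id modulo used here is a hypothetical stand-in, chosen only to show how the load on a single mutex divides across copies.

```python
from collections import Counter

# Illustrative sketch (not Oracle internals): with _kgl_hot_object_copies = N,
# executions of a hot statement spread over N cursor copies, each protected
# by its own mutex. The sid % n_copies routing is an assumption for the demo.
def mutex_load(session_ids, n_copies):
    """Return how many executions land on each of the n_copies mutexes."""
    return Counter(sid % n_copies for sid in session_ids)

sessions = range(200)                 # ~200 concurrent sessions, as in the test below
single = mutex_load(sessions, 1)      # original: one mutex takes everything
spread = mutex_load(sessions, 8)      # hot statement: eight copies share the load
# single[0] == 200, while each of the eight copies sees 25 executions.
```

With eight copies, the worst-case queue on any one mutex is roughly an eighth of the original, which is exactly the relief observed in the test runs later in this post.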
Below is a step-by-step guide to create a scenario very similar to the issue we encountered, along with the solution implemented to overcome its impact on performance. I also cover some important topics to keep in mind when you reboot the database, and when dealing with RAC environments.
Creating a test environment to reproduce waits on “cursor: pin S”

To build the test environment, I created a dummy table, a function, a shell script and five SQL scripts.
1. Create a dummy table (code_table) and populate it with some simple values using a for loop.
SQL> create table code_table (code_name char(1), low_value number, high_value number);

Table created.

SQL> declare
  2    letters char(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  3    v_num number := 1;
  4  begin
  5    for i in 1..26 LOOP
  6      insert into code_table values (substr(letters,i,1), i*v_num, i*(v_num+1000));
  7      v_num := v_num + 1001;
  8    end loop;
  9    commit;
 10  end;
 11  /

PL/SQL procedure successfully completed.
2. Create a function (fx_num) containing the select statement that will force the sessions to wait on “cursor: pin S.”
SQL> create or replace function fx_num (v_name varchar) return number is
  2    v_low number;
  3    v_high number;
  4  begin
  5    select low_value, high_value into v_low, v_high from code_table where code_name=v_name;
  6    return(DBMS_RANDOM.value(low => v_low, high => v_high));
  7  end;
  8  /

Function created.
3. Create a shell script (launch_test.sh) that will spawn multiple concurrent sessions to mimic high concurrency on the cursor object.
::::::::::::::::
launch_test.sh
::::::::::::::::
#!/bin/bash
export ORACLE_SID=proddb2
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/db
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
for i in {1..50}
do
  nohup sqlplus -s user/pass @launch_sql1.sql >> launch_sql1.log 2>&1 &
  nohup sqlplus -s user/pass @launch_sql2.sql >> launch_sql2.log 2>&1 &
  nohup sqlplus -s user/pass @launch_sql3.sql >> launch_sql3.log 2>&1 &
  nohup sqlplus -s user/pass @launch_sql4.sql >> launch_sql4.log 2>&1 &
done
exit 0
4. Create four SQL scripts (launch_sql[1-4].sql), each containing a simple query that calls the function created previously. Each script is executed 50 times, simulating 200 concurrent sessions calling the function (fx_num).
::::::::::::::
launch_sql1.sql
::::::::::::::
select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual;
exit

::::::::::::::
launch_sql2.sql
::::::::::::::
select object_name from user_objects where object_id < fx_num(substr(object_id,1,1));
exit

::::::::::::::
launch_sql3.sql
::::::::::::::
select name, count(*) from user_source where line < fx_num(substr(name,1,1)) group by name;
exit

::::::::::::::
launch_sql4.sql
::::::::::::::
select trunc(last_analyzed), sample_size, count(*) from user_tab_cols where column_id < fx_num(substr(column_name,1,1)) group by trunc(last_analyzed), sample_size;
exit
5. Finally, create a SQL script (check_waits.sql) to list the sessions waiting on “cursor: pin S.” As stated before, the P1 column of the gv$session view shows the hash value of the statement protected by the shared mutex. Displaying this value helps identify the root cause of the waits on “cursor: pin S.”
:::::::::::::::::
check_waits.sql
:::::::::::::::::
set lines 200 pages 200
prompt - Waits on cursor pin S:
select inst_id, substr(event,1,30) event, p1, sql_id, count(*)
from gv$session
where event = 'cursor: pin S'
group by inst_id, substr(event,1,30), p1, sql_id;
exit

Running the test scenario
First, let’s generate the waits on “cursor: pin S”:
1. In order to check for session waiting on the wait event “cursor: pin S,” I run the SQL script check_waits.sql on an endless while loop as follows:
[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @check_waits.sql; sleep 1; done - Waits on cursor pin S: no rows selected - Waits on cursor pin S: no rows selected
2. On a second terminal, I execute the shell script launch_test.sh to generate 200 sessions by executing the four SQL scripts 50 times each. This step forces the database to receive 200 executions of different select statements that invoke the function called fx_num.
[oracle@oradb02 ddml]$ ./launch_test.sh
3. On the first terminal, we can now see many sessions waiting on “cursor: pin S.” As the following output shows, instance two registers up to 157 sessions waiting on this event at one point in time.
[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @check_waits; sleep 1; done - Waits on cursor pin S: no rows selected - Waits on cursor pin S: no rows selected - Waits on cursor pin S: INST_ID EVENT P1 SQL_ID COUNT(*) ------- ------------- --------- ------------- -------- 2 cursor: pin S 356306711 2ndpau148y2by 14 2 cursor: pin S 356306711 7tr4jwnamtmsr 55 2 cursor: pin S 356306711 23 2 cursor: pin S 356306711 a3xkbsayc47kq 13 - Waits on cursor pin S: INST_ID EVENT P1 SQL_ID COUNT(*) ------- ------------- --------- ------------- -------- 2 cursor: pin S 356306711 5n3qfbb42gfdr 1 2 cursor: pin S 356306711 2ndpau148y2by 11 2 cursor: pin S 356306711 7tr4jwnamtmsr 84 2 cursor: pin S 356306711 42 2 cursor: pin S 356306711 a3xkbsayc47kq 19 - Waits on cursor pin S: INST_ID EVENT P1 SQL_ID COUNT(*) ------- ------------- --------- ------------- -------- 2 cursor: pin S 356306711 2ndpau148y2by 1 2 cursor: pin S 356306711 7tr4jwnamtmsr 4 2 cursor: pin S 356306711 a3xkbsayc47kq 6
4. The view gv$sql shows the statements the sessions were executing while waiting on “cursor: pin S.” You can see both the actual problematic SQL statement (SQL_ID 7tr4jwnamtmsr) and the calling statements from the SQL scripts (launch_sqlX.sql).
SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql 2 where hash_value=356306711 3 or sql_id in ('2ndpau148y2by','7tr4jwnamtmsr','a3xkbsayc47kq','5n3qfbb42gfdr') 4 group by sql_id, hash_value, sql_text order by 1; SQL_ID HASH_VALUE EXECUTIONS SQL_TEXT ------------- ---------- ---------- ----------------------------------------------------------------------------------------------------- 2ndpau148y2by 1217333630 800 select job_name, count(*) from job_masters where job_id > fx_num(substr(owner,1,1)) group by job_name 5n3qfbb42gfdr 3358046647 800 select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual 7tr4jwnamtmsr 356306711 3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 a3xkbsayc47kq 3166838358 800 select trunc(timestamp), completion_status, count(*) from job_masters where job_id < fx_num(substr(job_name,1,1)) group by trunc(timestamp), completion_status
5. If you take a look at column P1, you’ll notice that even though the sessions are running different statements (column SQL_ID), they are all trying to access the same shared mutex. All the sessions point to the same hash value (356306711): the select statement in the function fx_num.
6. In addition, the number of executions of the select statement in the function (sql_id 7tr4jwnamtmsr) is much higher than that of the select statements in the SQL scripts (launch_sqlX.sql). This makes sense, since the statements in the SQL scripts call the function fx_num for each row they examine. This extremely high number of concurrent executions of the function is the root cause of the waits on “cursor: pin S”: the sessions are all struggling to access the shared mutex that protects the select statement in the function.
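The execution-count gap can be sanity-checked with back-of-the-envelope arithmetic. The row counts below are hypothetical; the point is only that a function invoked once per examined row multiplies the inner SELECT’s executions far beyond the callers’ own execution counts.

```python
# Hypothetical amplification sketch: each caller statement evaluates fx_num()
# once per row it examines, so the SELECT inside the function runs
# caller_execs * avg_rows_examined times for that caller alone.
def inner_select_executions(caller_execs: int, avg_rows_examined: int) -> int:
    return caller_execs * avg_rows_examined

# e.g. 800 executions of one caller, each examining ~4,800 rows (made-up
# figure), already account for millions of executions of the inner SELECT.
total = inner_select_executions(800, 4800)   # 3,840,000
```

This is why gv$sql shows only hundreds of executions for the callers but millions for sql_id 7tr4jwnamtmsr.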
Implementing the recommended fix

The best approach to fixing this performance issue is to mark the select statement in the function as hot. As stated earlier, this tells Oracle to split the executions of this statement among a number of “SQL-copies,” which in turn splits the load among different cursor objects protected by different shared mutexes.
1. The first step is to set the hidden parameter _kgl_hot_object_copies.
This parameter governs the number of “SQL-copies” Oracle creates for each statement marked as hot. The recommendation is to set it to half the number of CPUs in the server. Keep in mind you’ll need to restart the database for this change to take effect, so plan accordingly.
SQL> alter system set "_kgl_hot_object_copies"=8 scope=spfile sid='*';

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
(...)
Database mounted.
Database opened.

SQL> select ksppinm, ksppstvl from x$ksppi a, x$ksppsv b, v$instance
  2  where a.indx=b.indx AND substr(ksppinm,1,1) = '_' AND ksppinm = '_kgl_hot_object_copies';

KSPPINM                    KSPPSTVL
-------------------------- ----------
_kgl_hot_object_copies     8
2. Once you’ve set the parameter _kgl_hot_object_copies and restarted the database, mark the SQL suffering the waits on “cursor: pin S” as hot. Oracle provides the procedure dbms_shared_pool.markhot for this purpose.
3. Confirm that the cursor you plan to mark as hot is already in the library cache. The following query confirms this and also returns the full_hash_value, which you must pass when marking the statement as hot.
SQL> select kglnahsh, kglnahsv from v$sql, x$kglob where kglhdadr=address and sql_id = '7tr4jwnamtmsr'; KGLNAHSH KGLNAHSV ---------- -------------------------------- 356306711 eb4cdceda1c495cd7cdc91e5153ccf17
4. The following statement marks the SQL as hot.
SQL> begin
  2    dbms_shared_pool.markhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /

PL/SQL procedure successfully completed.
5. You can query the view gv$db_object_cache to check on which instance the statement has been marked as hot by checking the PROPERTY column.
SQL> select inst_id, hash_value, namespace, child_latch, property, status, sum(executions) executions, name 2 from gv$db_object_cache where hash_value = 356306711 3 group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value; INST_ID HASH_VALUE NAMESPACE CHILD_LATCH PROPERTY STATUS EXECUTIONS NAME ------- ---------- --------- ----------- -------- ------ ---------- ---------------------------------------------------------------- 2 356306711 SQL AREA 0 HOT VALID 3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 356306711 SQL AREA 53015 HOT VALID 3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
6. Keep in mind the procedure only marks the statement as hot if it finds it in the library cache. That’s why gv$db_object_cache shows the statement marked as hot only in instance two, even though I executed dbms_shared_pool.markhot with the global argument set to true.
Running the test scenario one more time

1. With the statement marked as hot, let’s run the same test one more time. Again, I run the check_waits.sql script in a while loop.
[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @chk; sleep 1; done - Waits on cursor pin S: no rows selected - Waits on cursor pin S: no rows selected
2. Same as before, let’s launch the test on a separate terminal.
[oracle@oradb02 ddml]$ ./launch_test.sh
3. Instance two now shows just a few sessions waiting on “cursor: pin S.” The total number of concurrent waits dropped to just eight sessions.
[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @chk; sleep 1; done - Waits on cursor pin S: no rows selected - Waits on cursor pin S: INST_ID EVENT P1 SQL_ID COUNT(*) ------- ------------- ---------- ------------- -------- 2 cursor: pin S 1835995903 ddttvdtqqy4rz 1 2 cursor: pin S 939937426 2ndpau148y2by 2 2 cursor: pin S 4244290431 2ndpau148y2by 1 2 cursor: pin S 939937426 1 2 cursor: pin S 939937426 dg9n6z0w0cmnk 2 2 cursor: pin S 4244290431 61m38g7ygpfvz 1 6 rows selected. - Waits on cursor pin S: no rows selected - Waits on cursor pin S: no rows selected
4. While the view gv$sql shows an increase in the number of executions of the select statements in the SQL scripts (launch_sqlX.sql), the number of executions of the SQL in the function fx_num (sql_id 7tr4jwnamtmsr) remains unchanged (still 3854606).
SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql 2 where hash_value=356306711 3 or sql_id in ('2ndpau148y2by','7tr4jwnamtmsr','a3xkbsayc47kq','5n3qfbb42gfdr') 4 group by sql_id, hash_value, sql_text order by 1; SQL_ID HASH_VALUE EXECUTIONS SQL_TEXT ------------- ---------- ---------- ------------------------------------------------------------------------------------------------------ 2ndpau148y2by 1217333630 1600 select job_name, count(*) from job_masters where job_id > fx_num(substr(owner,1,1)) group by job_name 5n3qfbb42gfdr 3358046647 1600 select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual 7tr4jwnamtmsr 356306711 3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 a3xkbsayc47kq 3166838358 1600 select trunc(timestamp), completion_status, count(*) from job_masters where job_id < fx_num(substr(job_name, 1,1)) group by trunc(timestamp), completion_status
5. If we filter by the SQL signature of the statement in the function fx_num, we can see there is now a group of new statements with high execution counts; these were executed as part of the second test. As a side note, the SQL signatures of all these sql_ids match because they all share the same SQL text.
SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql 2 where force_matching_signature = (select force_matching_signature from gv$sql where hash_value=356306711) 3 group by sql_id, hash_value, sql_text order by 1; SQL_ID HASH_VALUE EXECUTIONS SQL_TEXT ------------- ---------- ---------- ------------------------------------------------------------------------------------------------------ 0u9hxt3azayv5 3589634917 125688 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 1far8p5csfrmx 1502043773 142500 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 5bacfy8mwthrj 667730673 90651 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 61m38g7ygpfvz 4244290431 140775 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 7tr4jwnamtmsr 356306711 3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 9pyrc45h3tgg9 1614593513 148272 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 arg5wx08suqm0 294476384 117839 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 ddttvdtqqy4rz 1835995903 104122 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 dg9n6z0w0cmnk 939937426 102875 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 9 rows selected.
6. The view gv$db_object_cache shows these statements have been identified as HOTCOPY of the statement in the function fx_num. We have exactly eight hot copies, as defined by the hidden parameter _kgl_hot_object_copies.
SQL> select inst_id, hash_value, namespace, child_latch, property, status, sum(executions) executions, name 2 from gv$db_object_cache where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 ) 3 group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value; INST_ID HASH_VALUE NAMESPACE CHILD_LATCH PROPERTY STATUS EXECUTIONS NAME ------- ---------- --------- ----------- -------- ------ ---------- ---------------------------------------------------------------------- 2 294476384 SQL AREA 0 HOTCOPY4 VALID 117082 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 294476384 SQL AREA 88672 HOTCOPY4 VALID 117159 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 356306711 SQL AREA 0 HOT VALID 3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 356306711 SQL AREA 53015 HOT VALID 3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 667730673 SQL AREA 0 HOTCOPY2 VALID 90188 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 667730673 SQL AREA 49905 HOTCOPY2 VALID 90218 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 939937426 SQL AREA 0 HOTCOPY8 VALID 102067 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 939937426 SQL AREA 20114 HOTCOPY8 VALID 101647 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 1502043773 SQL AREA 0 HOTCOPY5 VALID 141347 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 1502043773 SQL AREA 89725 HOTCOPY5 VALID 141440 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 1614593513 SQL AREA 0 HOTCOPY6 VALID 147205 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 1614593513 SQL AREA 48617 HOTCOPY6 VALID 147329 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 1835995903 SQL AREA 0 HOTCOPY3 VALID 103453 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE 
CODE_NAME=:B1 2 1835995903 SQL AREA 70399 HOTCOPY3 VALID 103484 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 3589634917 SQL AREA 0 HOTCOPY1 VALID 124746 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 3589634917 SQL AREA 97125 HOTCOPY1 VALID 125095 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 4244290431 SQL AREA 0 HOTCOPY7 VALID 139691 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 2 4244290431 SQL AREA 47999 HOTCOPY7 VALID 139803 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1 18 rows selected.
7. If we keep running this same test, we can confirm that the number of executions of the original sql_id (marked as hot) remains unchanged, while executions of the “SQL-copies” continue to grow. In other words, the “SQL-copies” now handle all executions of this statement.
[oracle@oradb02 ddml]$ ./launch_test.sh

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923)
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      2  294476384 SQL AREA           0 HOTCOPY4 VALID      240474 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA       88672 HOTCOPY4 VALID      240582 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA           0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA       53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA           0 HOTCOPY2 VALID      206588 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA       49905 HOTCOPY2 VALID      206662 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA           0 HOTCOPY8 VALID      228203 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA       20114 HOTCOPY8 VALID      228118 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA           0 HOTCOPY5 VALID      275321 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA       89725 HOTCOPY5 VALID      275482 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA           0 HOTCOPY6 VALID      280358 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA       48617 HOTCOPY6 VALID      279606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA           0 HOTCOPY3 VALID      201829 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA       70399 HOTCOPY3 VALID      201773 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA           0 HOTCOPY1 VALID      244295 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA       97125 HOTCOPY1 VALID      244809 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA           0 HOTCOPY7 VALID      253317 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA       47999 HOTCOPY7 VALID      253521 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

18 rows selected.

Keeping statements as hot on a RAC database
1. As mentioned earlier, a statement cannot be marked as hot unless it is first loaded into the library cache. For this reason, the statement in our example was not marked as hot on instance one.
2. So now let’s run the load test on instance one to see what happens.
[oracle@oradb01 ddml]$ ./launch_test.sh
3. Below is an example of what you’d see if the statement is used on both instances but only marked as hot on one of the instances (inst_id=2).
SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923)
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1  356306711 SQL AREA           0          VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  356306711 SQL AREA       53015          VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA           0 HOTCOPY4 VALID      240474 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA       88672 HOTCOPY4 VALID      240582 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA           0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA       53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA           0 HOTCOPY2 VALID      206588 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA       49905 HOTCOPY2 VALID      206662 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA           0 HOTCOPY8 VALID      228203 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA       20114 HOTCOPY8 VALID      228118 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA           0 HOTCOPY5 VALID      275321 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA       89725 HOTCOPY5 VALID      275482 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA           0 HOTCOPY6 VALID      280358 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA       48617 HOTCOPY6 VALID      279606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA           0 HOTCOPY3 VALID      201829 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA       70399 HOTCOPY3 VALID      201773 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA           0 HOTCOPY1 VALID      244295 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA       97125 HOTCOPY1 VALID      244809 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA           0 HOTCOPY7 VALID      253317 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA       47999 HOTCOPY7 VALID      253521 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

20 rows selected.
4. After running the procedure dbms_shared_pool.markhot once more, this time on instance one, we can see that the fix works correctly on both instances of the RAC.
SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923)
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1  294476384 SQL AREA           0 HOTCOPY4 VALID      235617 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  294476384 SQL AREA       88672 HOTCOPY4 VALID      235190 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  356306711 SQL AREA           0 HOT      VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  356306711 SQL AREA       53015 HOT      VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  667730673 SQL AREA           0 HOTCOPY2 VALID      243111 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  667730673 SQL AREA       49905 HOTCOPY2 VALID      243114 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  939937426 SQL AREA           0 HOTCOPY8 VALID      272199 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  939937426 SQL AREA       20114 HOTCOPY8 VALID      272379 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA           0 HOTCOPY5 VALID      223078 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA       89725 HOTCOPY5 VALID      223230 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA           0 HOTCOPY6 VALID      254701 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA       48617 HOTCOPY6 VALID      254807 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA           0 HOTCOPY3 VALID      238938 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA       70399 HOTCOPY3 VALID      238361 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA           0 HOTCOPY1 VALID      231836 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA       97125 HOTCOPY1 VALID      230782 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA           0 HOTCOPY7 VALID      225743 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA       47999 HOTCOPY7 VALID      225810 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA           0 HOTCOPY4 VALID      129780 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA       88672 HOTCOPY4 VALID      129861 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA           0 HOT      VALID     4565713 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA       53015 HOT      VALID     4469646 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA           0 HOTCOPY2 VALID      120900 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA       49905 HOTCOPY2 VALID      121150 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA           0 HOTCOPY8 VALID      120642 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA       20114 HOTCOPY8 VALID      120730 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA           0 HOTCOPY5 VALID      117835 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA       89725 HOTCOPY5 VALID      117268 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA           0 HOTCOPY6 VALID      140124 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA       48617 HOTCOPY6 VALID      140374 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA           0 HOTCOPY3 VALID      118871 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA       70399 HOTCOPY3 VALID      118520 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA           0 HOTCOPY1 VALID       98116 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA       97125 HOTCOPY1 VALID       98091 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA           0 HOTCOPY7 VALID      119193 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA       47999 HOTCOPY7 VALID      119279 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

36 rows selected.

Restarting the database
1. It is extremely important to note that a statement's hot status does not survive a database or instance restart. This means you need to mark it as hot again after every restart.
2. Here are the steps to follow to make sure this setup remains in place after an instance or database restart:
a. Make sure the hidden parameter _kgl_hot_object_copies is still set to the number of CPUs.
b. Load the cursor into the library cache on each instance. This means running exactly the same statement text on each instance, so that it resolves to the same full_hash_value.
c. Mark the statement as hot using the procedure dbms_shared_pool.markhot. As a side note, you can set the global argument to true if you want to mark the statement as hot on all instances but, as mentioned before, the statement will only be marked as hot on the instances where it is present in the library cache.
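Because the hot status must be re-established after every restart, the three steps above can be automated at instance startup. Below is a minimal sketch using an AFTER STARTUP trigger; the trigger name, owner, bind value and error handling are illustrative (not from this post), and the hash is the full_hash_value used elsewhere in this post, so adapt and test it carefully before relying on it:

```sql
-- Hypothetical startup trigger: reload the cursor, then mark it hot.
CREATE OR REPLACE TRIGGER trg_markhot_after_startup
AFTER STARTUP ON DATABASE
DECLARE
  l_low  VARCHAR2(4000);
  l_high VARCHAR2(4000);
BEGIN
  -- Step b: execute the exact statement text the application uses, so the
  -- cursor (same full_hash_value) is loaded into this instance's library cache.
  BEGIN
    EXECUTE IMMEDIATE
      'SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1'
      INTO l_low, l_high USING 'SOME_CODE';   -- bind value is a placeholder
  EXCEPTION
    WHEN NO_DATA_FOUND THEN NULL;
  END;
  -- Step c: mark the statement hot, with the same arguments used for
  -- unmarkhot elsewhere in this post.
  dbms_shared_pool.markhot(
    hash      => 'eb4cdceda1c495cd7cdc91e5153ccf17',
    namespace => 0,
    global    => true);
END;
/
```

On RAC, the trigger fires on each instance as it starts, which covers the per-instance library cache requirement described in step b.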
Unmarking hot statements
1. Finally, you can unmark a hot statement using the procedure dbms_shared_pool.unmarkhot.
SQL> begin
  2    dbms_shared_pool.unmarkhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /

PL/SQL procedure successfully completed.
2. You can confirm that you’ve unmarked the statement by querying the view gv$db_object_cache.
SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923)
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1  294476384 SQL AREA           0 OBSCOPY4 VALID      235617 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  294476384 SQL AREA       88672 OBSCOPY4 VALID      235190 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  356306711 SQL AREA           0          VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  356306711 SQL AREA       53015          VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  667730673 SQL AREA           0 OBSCOPY2 VALID      243111 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  667730673 SQL AREA       49905 OBSCOPY2 VALID      243114 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  939937426 SQL AREA           0 OBSCOPY8 VALID      272199 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1  939937426 SQL AREA       20114 OBSCOPY8 VALID      272379 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA           0 OBSCOPY5 VALID      223078 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA       89725 OBSCOPY5 VALID      223230 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA           0 OBSCOPY6 VALID      254701 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA       48617 OBSCOPY6 VALID      254807 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA           0 OBSCOPY3 VALID      238938 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA       70399 OBSCOPY3 VALID      238361 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA           0 OBSCOPY1 VALID      231836 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA       97125 OBSCOPY1 VALID      230782 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA           0 OBSCOPY7 VALID      225743 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA       47999 OBSCOPY7 VALID      225810 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA           0 OBSCOPY4 VALID      385506 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  294476384 SQL AREA       88672 OBSCOPY4 VALID      385414 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA           0          VALID     4565713 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  356306711 SQL AREA       53015          VALID     4469646 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA           0 OBSCOPY2 VALID      359549 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  667730673 SQL AREA       49905 OBSCOPY2 VALID      359951 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA           0 OBSCOPY8 VALID      376019 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2  939937426 SQL AREA       20114 OBSCOPY8 VALID      376026 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA           0 OBSCOPY5 VALID      420406 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA       89725 OBSCOPY5 VALID      420031 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA           0 OBSCOPY6 VALID      455769 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA       48617 OBSCOPY6 VALID      455326 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA           0 OBSCOPY3 VALID      340070 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA       70399 OBSCOPY3 VALID      339658 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA           0 OBSCOPY1 VALID      364612 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA       97125 OBSCOPY1 VALID      365127 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA           0 OBSCOPY7 VALID      395629 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA       47999 OBSCOPY7 VALID      395981 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

36 rows selected.
3. You will notice that all of the hot copies are now marked as obsolete (OBSCOPY), meaning they are no longer valid because the original statement is no longer hot.
I hope this post gives you an idea of how to work around the bottlenecks that can be caused by the "cursor: pin S" wait event. Please let me know if you have any questions, or tips of your own.
OPatchAuto Bug on 19.3 GI/RDBMS When Patching to 19.6
The other day I was doing a fresh install of Oracle 19.3, using the binaries from eDelivery, to run some tests in that version and then patch to 19.6. The installation and the tests went fine. Next, I analyzed patch 30501910 and saw that everything was OK.
## Note.- I trimmed the log for readability
[root@node2 ~]$ . oraenv
ORACLE_SID = [root] ? DB193H12
The Oracle base has been set to /u01/app/oracle
[root@node2 ~]$ export PATH=$ORACLE_HOME/OPatch:$PATH
[root@node2 ~]$ opatchauto apply /u01/patches/30501910 -analyze
OPatchauto session is initiated at Tue Oct 6 13:23:11 2020
...
Host:node2
CRS Home:/u01/app/19.3.0.0/grid
Version:19.0.0.0.0
==Following patches were SUCCESSFULLY analyzed to be applied:
Patch: /u01/patches/30501910/30489227
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log
Patch: /u01/patches/30501910/30489632
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log
Patch: /u01/patches/30501910/30655595
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log
Patch: /u01/patches/30501910/30557433
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log
OPatchauto session completed at Tue Oct 6 11:42:46 2020
Time taken to complete the session 2 minutes, 41 seconds
I then applied the patch with the command below:
opatchauto apply /u01/patches/30501910
Everything went well on node 1, with GI and RDBMS both patched successfully, but on node 2 the RDBMS patch failed with the error below:
## Note.- I trimmed the log for readability
[root@node2 ~]$ opatchauto apply /u01/patches/30501910
...
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/oracle/product/19.3.0.0/dbhome_1, host: node2.
Command failed: /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/opatchauto apply /u01/patches/30501910 -oh /u01/app/oracle/product/19.3.0.0/dbhome_1 -target_type rac_database -binary -invPtrLoc /u01/app/oracle/product/19.3.0.0/dbhome_1/oraInst.loc -jre /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/jre -persistresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_node2_rac_2.ser -analyzedresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_analyze_node2_rac_2.ser
Command failure output:
==Following patches FAILED in apply:
Patch: /u01/patches/30501910/30489227
Log: /u01/app/oracle/product/19.3.0.0/dbhome_1/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-05_17-25-22PM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: ApplySession failed in system modification phase... 'ApplySession::apply failed: java.io.IOException: oracle.sysman.oui.patch.PatchException: java.io.FileNotFoundException: /u01/app/oraInventory/ContentsXML/oui-patch.xml (Permission denied)'
After fixing the cause of failure Run opatchauto resume
...
After some research, I found MOS (My Oracle Support) note 2582139.1, which directed me to copy oui-patch.xml from node 1 to node 2.
[grid@node1 ~]$ cd /u01/app/oraInventory/ContentsXML/
[grid@node1 ContentsXML]$ scp oui-patch.xml node2:/u01/app/oraInventory/ContentsXML/
oui-patch.xml
[root@node2 ContentsXML]# ls -ltr
total 20
-rw-rw----. 1 grid oinstall 565 Oct 6 11:13 inventory.xml
-rw-r--r--. 1 oracle oinstall 174 Oct 6 12:43 oui-patch.xml.back
-rw-rw----. 1 grid oinstall 300 Oct 6 12:43 comps.xml
-rw-rw----. 1 grid oinstall 292 Oct 6 12:43 libs.xml
-rw-rw----. 1 grid oinstall 174 Oct 6 14:02 oui-patch.xml
After doing this, I reran opatchauto. This time I faced the error "Unable to create patchObject."
## Note.- I trimmed the log for readability
[root@node2 ~]$ opatchauto apply /u01/patches/30501910
...
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/oracle/product/19.3.0.0/dbhome_1, host: node2.
Command failed: /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/opatchauto apply /u01/patches/30501910 -oh /u01/app/oracle/product/19.3.0.0/dbhome_1 -target_type rac_database -binary -invPtrLoc /u01/app/oracle/product/19.3.0.0/dbhome_1/oraInst.loc -jre /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/jre -persistresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_node2_rac_2.ser -analyzedresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_analyze_node2_rac_2.ser
Command failure output:
==Following patches FAILED in apply:
Patch: /u01/patches/30501910/30489227
Log:
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Unable to create patchObject
Possible causes are:
ORACLE_HOME/inventory/oneoffs/30489227 is corrupted. PatchObject constructor: Input file "/u01/app/oracle/product/19.3.0.0/dbhome_1/inventory/oneoffs/30489227/etc/config/actions" or "/u01/app/oracle/product/19.3.0.0/dbhome_1/inventory/oneoffs/30489227/etc/config/inventory" does not exist.
After fixing the cause of failure Run opatchauto resume
...
To address this error, I found another note, 2286025.1, which advises taking any of the following actions:
- Run opatch lsinventory
- Run opatchauto with -analyze
- Restore the database home from backup on node 2
After trying options 1 and 2, I kept getting the same error, so I proceeded to patch only the GI home on node 2.
## Note.- I trimmed the log for readability
[root@node2 ~]$ . oraenv
ORACLE_SID = [root] ? +ASM2
The Oracle base has been set to /u01/app/grid
[root@node2 ~]$ export PATH=$ORACLE_HOME/OPatch:$PATH
[root@node2 ~]$ opatchauto apply /u01/patches/30501910 -oh $ORACLE_HOME
OPatchauto session is initiated at Tue Oct 6 12:28:25 2020
...
--------------------------------Summary--------------------------------
Patching is completed successfully. Please find the summary as follows:
Host:node2
CRS Home:/u01/app/19.3.0.0/grid
Version:19.0.0.0.0
Summary:
==Following patches were SUCCESSFULLY applied:
Patch: /u01/patches/30501910/30489227
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log
Patch: /u01/patches/30501910/30489632
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log
Patch: /u01/patches/30501910/30557433
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log
Patch: /u01/patches/30501910/30655595
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log
OPatchauto session completed at Tue Oct 6 12:49:02 2020
Time taken to complete the session 20 minutes, 38 seconds
For the RDBMS home, I followed my own blog post (How to Clone an Oracle Home in 19c) and cloned the Oracle home from node 1, since that home was already correctly patched with 30501910.
[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_1
[oracle@node1 ~]$ $ORACLE_HOME/runInstaller -silent -createGoldImage -destinationLocation /u01/patches
Launching Oracle Database Setup Wizard...
Successfully Setup Software.
Gold Image location: /u01/patches/db_home_2020-10-06_02-11-59PM.zip
[oracle@node1 ~]$ unset ORACLE_HOME
[oracle@node1 ~]$ unset ORACLE_SID
[oracle@node1 ~]$ unset ORACLE_RSID
[oracle@node1 ~]$ unset ORACLE_UNQNAME
[oracle@node1 ~]$ unset ORACLE_BASE
[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ export ORACLE_HOSTNAME=node1
[oracle@node1 ~]$ export ORA_INVENTORY=/u01/app/oraInventory
[oracle@node1 ~]$ export NODE1_HOSTNAME=node1
[oracle@node1 ~]$ export NODE2_HOSTNAME=node2
[oracle@node1 ~]$ export ORACLE_BASE=/u01/app/oracle
[oracle@node1 ~]$ unzip -q /u01/patches/db_home_2020-10-06_02-11-59PM.zip -d $ORACLE_HOME
[oracle@node1 ~]$ env | egrep "ORA|NODE"
ORACLE_BASE=/u01/app/oracle
NODE2_HOSTNAME=node2
ORACLE_HOSTNAME=node1
NODE1_HOSTNAME=node1
ORA_INVENTORY=/u01/app/oraInventory
ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ cd $ORACLE_HOME
[oracle@node1 dbhome_2]$ ${ORACLE_HOME}/runInstaller -ignorePrereq -waitforcompletion -silent \
> -responseFile ${ORACLE_HOME}/install/response/db_install.rsp \
> oracle.install.option=INSTALL_DB_SWONLY \
> ORACLE_HOSTNAME=${ORACLE_HOSTNAME} \
> UNIX_GROUP_NAME=oinstall \
> INVENTORY_LOCATION=${ORA_INVENTORY} \
> SELECTED_LANGUAGES=en \
> ORACLE_HOME=${ORACLE_HOME} \
> ORACLE_BASE=${ORACLE_BASE} \
> oracle.install.db.InstallEdition=EE \
> oracle.install.db.OSDBA_GROUP=dba \
> oracle.install.db.OSOPER_GROUP=dba \
> oracle.install.db.OSBACKUPDBA_GROUP=dba \
> oracle.install.db.OSDGDBA_GROUP=dba \
> oracle.install.db.OSKMDBA_GROUP=dba \
> oracle.install.db.OSRACDBA_GROUP=dba \
> oracle.install.db.CLUSTER_NODES=${NODE1_HOSTNAME},${NODE2_HOSTNAME} \
> oracle.install.db.isRACOneInstall=false \
> oracle.install.db.rac.serverpoolCardinality=0 \
> oracle.install.db.config.starterdb.type=GENERAL_PURPOSE \
> oracle.install.db.ConfigureAsContainerDB=false \
> SECURITY_UPDATES_VIA_MYORACLESUPPORT=false \
> DECLINE_SECURITY_UPDATES=true
Launching Oracle Database Setup Wizard...
...
The response file for this session can be found at:
/u01/app/oracle/product/19.3.0.0/dbhome_2/install/response/db_2020-10-06_03-06-10PM.rsp
You can find the log of this install session at:
/u01/app/oraInventory/logs/InstallActions2020-10-06_03-06-10PM/installActions2020-10-06_03-06-10PM.log
As a root user, execute the following script(s):
1. /u01/app/oracle/product/19.3.0.0/dbhome_2/root.sh
Execute /u01/app/oracle/product/19.3.0.0/dbhome_2/root.sh on the following nodes:
[node1, node2]
After doing this, the only remaining tasks were to verify that the new Oracle home was patched correctly and to switch the database to it.
[oracle@node1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)
OPatch succeeded.
[oracle@node2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)
OPatch succeeded.
[oracle@node1 ~]$ srvctl modify database -db test -oraclehome /u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ srvctl config database -db test -a
Database unique name: test
Database name: test
Oracle home: /u01/app/oracle/product/19.3.0.0/dbhome_2
Oracle user: oracle
Spfile: +DATA/test/PARAMETERFILE/spfile.270.1053084685
Password file: +DATA/test/PASSWORD/pwdtest.258.1053083711
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: RECO,DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: oper
Database instances: test1,test2
Configured nodes: node1,node2
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services:
Database is administrator managed
I plan to retry this exercise, copying oui-patch.xml before patching, to see if that avoids the error. In the meantime, I hope this blog post helps you if you face the bugs described above.
How to Enable RAM Cache in Oracle Exadata
In this post I will present a feature, which for simplicity I'll call RAM cache, introduced in Oracle Exadata Storage Server image 18.1. You've probably heard a lot about the new Oracle Exadata X8M and its Intel Optane DC Persistent Memory (PMem). That architecture allows database processes running on the database servers to remotely read and write to and from the PMem cards in the storage servers through a protocol called Remote Direct Memory Access (RDMA). You can find the detailed Oracle Exadata X8M deployment process here.
It’s true that RDMA has existed in the Oracle Exadata architecture from the beginning, as Oracle points out in their blog post titled, Introducing Exadata X8M: In-Memory Performance with All the Benefits of Shared Storage for both OLTP and Analytics:
RDMA was introduced to Exadata with InfiniBand and is a foundational part of Exadata’s high-performance architecture.
What you may not know is that there's a feature called in-memory OLTP acceleration (or simply RAM cache), introduced in Oracle Exadata Storage Server image 18.1.0.0.0 when Oracle Exadata X7 was released. It allows read access to the storage server RAM on any Oracle Exadata system (X6 or higher) running that image version or above. Although this is not the same as PMem, since RAM is not persistent, it is still very useful because it lets you take advantage of the RAM available in the storage servers.
Modern generations of Exadata storage servers come with a lot of RAM: X7 and X8 ship with 192GB by default, compared to the 128GB that came with X6.
Unfortunately, the RAM cache feature is only available on storage servers X6 or higher. These are the requirements:
- Oracle Exadata System Software 18c (18.1.0).
- Oracle Exadata Storage Server X6, X7 or X8.
- Oracle Database version 12.2.0.1 April 2018 DBRU, or 18.1 or higher.
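Before enabling the feature, it's worth confirming that the whole storage grid meets the image requirement. A quick way to do this is with the standard imageinfo utility pushed through dcli (assuming the same cell_group file used later in this post):

```
[root@exadbadm01 ~]# dcli -l root -g cell_group imageinfo -ver
```

Every cell should report 18.1.0.0.0 or higher.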
That large amount of RAM is rarely fully utilized by the Exadata storage servers. The RAM cache feature lets you use all or part of it, effectively extending the database buffer cache into the storage servers' RAM for read operations.
In the new Oracle Exadata X8M, read I/O latency is under 19µs, thanks to the PMem cache combined with the RoCE (RDMA over Converged Ethernet) network. On Oracle Exadata X7/X8, read latency with RAM cache over InfiniBand RDMA is around 100µs; without RAM cache, reads served directly from flash cache take about 250µs. The following information is from the Oracle Exadata Database Machine X8-2 data sheet:
For OLTP workloads Exadata uniquely implements In-Memory OLTP Acceleration. This feature utilizes the memory installed in Exadata Storage Servers as an extension of the memory cache (buffer cache) on database servers. Specialized algorithms transfer data between the cache on database servers and in-memory cache on storage servers. This reduces the IO latency to 100 us for all IOs served from in-memory cache. Exadata’s (sic) uniquely keeps only one in-memory copy of data across database and storage servers, avoiding memory wastage from caching the same block multiple times. This greatly improves both efficiency and capacity and is only possible because of Exadata’s unique end-to-end integration.
How I set up RAM cache in the Exadata storage servers
As I mentioned previously, recent generations of Oracle Exadata storage servers come with a lot of RAM, and it is normally not fully used by the cellsrv services and features. That said, I always take the amount of free memory in the storage servers into account. First, I pick the storage server using the most RAM and do the math: freemem * 0.7 = RAM cache value. In other words, I set the RAM cache to 70 percent of the free memory on the storage server using more RAM than the others. Note: I avoid using all the free memory for the RAM cache in case the storage server needs more memory for storage indexes or other purposes in the future.
Let's say my busiest storage server has 73GB of free memory. Applying the formula: 73 * 0.7 = 51.1GB.
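This sizing rule is easy to script. The sketch below applies my 70 percent rule of thumb (my own practice, not an Oracle recommendation) to the "free" column of free -g; run it on the storage server with the highest memory usage:

```shell
# Suggest a RAM cache size as 70% of the currently free memory (in GB),
# taken from the "free" column of free -g.
free_gb=$(free -g | awk '/^Mem:/ {print $4}')
ramcache_gb=$(awk -v f="$free_gb" 'BEGIN {printf "%.1f", f * 0.7}')
echo "Suggested ramCacheMaxSize: ${ramcache_gb}G"
```

With the 73GB from the example above, this suggests 51.1G.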
Oracle Exadata architecture was built to spread the workload evenly across the entire storage grid, so you’ll notice that the storage servers use pretty much the same amount of memory (RAM).
Now for the fun part. First, check how much memory is available in the storage servers by running this through dcli (make sure your cell_group file is up-to-date):
[root@exadbadm01 ~]# dcli -l root -g cell_group free -g
In my case, cel01 is the storage server using the most memory. Let's check some details of this storage server:
[root@exaceladm01 ~]# cellcli
CellCLI: Release 19.2.7.0.0 - Production on Thu Aug 06 07:44:59 CDT 2020
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

CellCLI> LIST CELL DETAIL
         name:                    exaceladm01
         accessLevelPerm:         remoteLoginEnabled
         bbuStatus:               normal
         cellVersion:             OSS_19.2.7.0.0_LINUX.X64_191012
         cpuCount:                24/24
         diagHistoryDays:         7
         fanCount:                8/8
         fanStatus:               normal
         flashCacheMode:          WriteBack
         flashCacheCompress:      FALSE
         httpsAccess:             ALL
         id:                      1446NM508U
         interconnectCount:       2
         interconnect1:           bondib0
         iormBoost:               0.0
         ipaddress1:              192.168.10.13/22
         kernelVersion:           4.1.12-124.30.1.el7uek.x86_64
         locatorLEDStatus:        off
         makeModel:               Oracle Corporation SUN SERVER X7-2L High Capacity
         memoryGB:                94
         metricHistoryDays:       7
         notificationMethod:      mail,snmp
         notificationPolicy:      critical,warning,clear
         offloadGroupEvents:
         powerCount:              2/2
         powerStatus:             normal
         ramCacheMaxSize:         0
         ramCacheMode:            Auto
         ramCacheSize:            0
         releaseImageStatus:      success
         releaseVersion:          19.2.7.0.0.191012
         rpmVersion:              cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
         releaseTrackingBug:      30393131
         rollbackVersion:         19.2.2.0.0.190513.2
         smtpFrom:                "exadb Exadata"
         smtpFromAddr:            exaadmin@loredata.com.br
         smtpPort:                25
         smtpServer:              mail.loredata.com.br
         smtpToAddr:              support@loredata.com.br
         smtpUseSSL:              FALSE
         snmpSubscriber:          host=10.200.55.182,port=162,community=public,type=asr,asrmPort=16161
         status:                  online
         temperatureReading:      23.0
         temperatureStatus:       normal
         upTime:                  264 days, 8:48
         usbStatus:               normal
         cellsrvStatus:           running
         msStatus:                running
         rsStatus:                running
From the output above, we can see that the parameter ramCacheMode is set to Auto while ramCacheMaxSize and ramCacheSize are 0. These are the default values, and they mean the RAM cache feature is not enabled.
This storage server has ~73GB of free / available memory (RAM):
[root@exaceladm01 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          96177       15521       72027        4796        8628       75326
Swap:          2047           0        2047
Now we can enable the RAM cache feature by changing the parameter ramCacheMode to “On”:
CellCLI> ALTER CELL ramCacheMode=on
Cell exaceladm01 successfully altered
Immediately after the change, we check the free / available memory (RAM) in the storage server operating system:
[root@exaceladm01 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          96177       15525       72059        4796        8592       75322
Swap:          2047           0        2047
Not much has changed. Although the memory is now reserved for the RAM cache, enabling the feature does not cause the storage server to allocate this memory immediately; it is consumed as the cache is populated.
Checking the cell again, we can see that only about 10 GB was defined in the ramCacheMaxSize and ramCacheSize parameters:
CellCLI> LIST CELL DETAIL
         name:                   exaceladm01
         accessLevelPerm:        remoteLoginEnabled
         bbuStatus:              normal
         cellVersion:            OSS_19.2.7.0.0_LINUX.X64_191012
         cpuCount:               24/24
         diagHistoryDays:        7
         fanCount:               8/8
         fanStatus:              normal
         flashCacheMode:         WriteBack
         flashCacheCompress:     FALSE
         httpsAccess:            ALL
         id:                     1446NM508U
         interconnectCount:      2
         interconnect1:          bondib0
         iormBoost:              0.0
         ipaddress1:             192.168.10.13/22
         kernelVersion:          4.1.12-124.30.1.el7uek.x86_64
         locatorLEDStatus:       off
         makeModel:              Oracle Corporation SUN SERVER X7-2L High Capacity
         memoryGB:               94
         metricHistoryDays:      7
         notificationMethod:     mail,snmp
         notificationPolicy:     critical,warning,clear
         offloadGroupEvents:
         powerCount:             2/2
         powerStatus:            normal
         ramCacheMaxSize:        10.1015625G
         ramCacheMode:           On
         ramCacheSize:           10.09375G
         releaseImageStatus:     success
         releaseVersion:         19.2.7.0.0.191012
         rpmVersion:             cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
         releaseTrackingBug:     30393131
         rollbackVersion:        19.2.2.0.0.190513.2
         smtpFrom:               "exadb Exadata"
         smtpFromAddr:           exaadmin@loredata.com.br
         smtpPort:               25
         smtpServer:             mail.loredata.com.br
         smtpToAddr:             support@loredata.com.br
         smtpUseSSL:             FALSE
         snmpSubscriber:         host=10.200.55.182,port=162,community=public,type=asr,asrmPort=16161
         status:                 online
         temperatureReading:     23.0
         temperatureStatus:      normal
         upTime:                 264 days, 8:49
         usbStatus:              normal
         cellsrvStatus:          running
         msStatus:               running
         rsStatus:               running
To confirm, we can run the following command from CellCLI:
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         10.1015625G     On      10.09375G
To reduce the memory used by the RAM cache feature we can simply change the ramCacheMaxSize parameter:
CellCLI> ALTER CELL ramCacheMaxSize=5G;
Cell exaceladm01 successfully altered
If we check the values of the RAM cache parameters we will see this:
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         5G      On      0
As soon as the database blocks start being copied to the RAM cache we will see the ramCacheSize value increasing:
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         5G      On      3.9250G
Let’s increase it a bit more:
CellCLI> ALTER CELL ramCacheMaxSize=15G;
Cell exaceladm01 successfully altered
When checking, you’ll notice it takes a while for the cellsrv to populate the RAM cache with blocks copied from the flash cache:
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         15G     On      0

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         15G     On      11.8125G

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         15G     On      15G
Resetting ramCacheMode to Auto clears everything again:
CellCLI> ALTER CELL ramCacheMode=Auto
Cell exaceladm01 successfully altered

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         0       Auto    0
Now we set the value we got from our calculation of 70 percent of the free memory:
CellCLI> ALTER CELL ramCacheMode=On
Cell exaceladm01 successfully altered

CellCLI> ALTER CELL ramCacheMaxSize=51G
Cell exaceladm01 successfully altered

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         51G     On      32.8125G

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         51G     On      35.2500G

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
         51G     On      51G
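The 51G figure can be reproduced from the free -m output shown earlier: 70 percent of the 75,326 MB "available" column, rounded down to whole gigabytes. A minimal awk sketch (the 70 percent factor is this post's rule of thumb, not an Oracle-documented formula):

```shell
# Derive a ramCacheMaxSize value as 70% of the "available" memory reported
# by free -m (75326 MB in the output shown earlier), truncated to whole GB.
avail_mb=75326
echo "$avail_mb" | awk '{printf "%dG\n", $1 * 0.70 / 1024}'
# prints: 51G
```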
With that configuration in place, if we want to be notified when the storage server is running out of memory, we can quickly create a threshold on the cell memory utilization (CL_MEMUT) metric that alerts us when memory utilization goes beyond 95 percent:
CellCLI> CREATE THRESHOLD CL_MEMUT.interactive comparison=">", critical=95

Conclusion
To sum up, RAM cache (aka in-memory OLTP acceleration) is a feature available only on Oracle Exadata Database Machine X6 or higher with at least the 18.1 image. In addition, it requires Oracle Database 12.2.0.1 with the April 2018 DBRU or higher. This feature extends the database buffer cache into the free RAM of the storage servers, but only for read operations, since this RAM is not persistent. For persistent memory, Oracle introduced the Persistent Memory Cache with Oracle Exadata Database Machine X8M.
It’s worth mentioning that a database will only leverage RAM cache when there is pressure on the database buffer cache. The data blocks present in the RAM cache are persistently stored in the storage server’s flash cache. When a server process on the database side requests a block that is no longer stored in the database buffer cache, but is in the RAM cache, the cellsrv will send this block from the RAM cache to the buffer cache for the server process to read it. It is faster to read from the RAM cache instead of reading it from the flash cache or disk.
While the in-memory OLTP acceleration feature is not a magic solution, it is a plus for our Exadata systems. Since we almost always see free memory in the storage servers, this is a way of optimizing resources we’ve already paid for. The feature is included in the Exadata licenses, so there is no extra-cost option, and it is not related to the Database In-Memory option. Having Exadata is all you need.
Happy caching! See you next time!
Franky
References:
- https://www.loredata.com.br/blog/exadata-how-to-enable-ram-cache-in-the-storage-servers
- https://docs.oracle.com/en/engineered-systems/exadata-database-machine/dbmmn/maintaining-exadata-storage-servers.html#GUID-344A8D7D-AFCD-4B44-ABDD-EAF65483163A
- https://docs.oracle.com/en/engineered-systems/exadata-database-machine/dbmso/new-features-exadata-system-software-release-18.html#GUID-2FA29E52-D72B-4235-8B1E-57B38966EB11
- https://blogs.oracle.com/exadata/exadata-x8m
- https://www.oracle.com/technetwork/database/exadata/exadata-x7-2-ds-3908482.pdf
- https://slideplayer.com/slide/14351522/
Resolving Error ORA-28353: Failed to Open Wallet
Full disclosure: this is a post I’ve had in draft mode for almost one and a half years.
I noticed the original error after applying the October 2018 bundle patch (BP) for 11.2.0.4. While I realize most clients are no longer in 11.2.0.4, this information remains valid for anyone upgrading from 11.2 to 12, 18 or 19c.
I had been doing several tests on my Spanish RAC (Real Application Cluster) Attack for 12.2. The goal was to patch my client to the October 2018 PSU, obtaining enough security coverage to defer further patching until their DB (database) upgrade to 18c. I created RAC VMs to enable testing. I also set up my environment to match the client’s, which had TDE with FIPS 140 enabled (I will provide more details on this later in the post).
While the patching was successful, the problem arose after applying the patch. I was unable to open the database despite having the correct password for the encryption key.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-28365: wallet is not open
SQL> alter system set encryption key identified by "xxxx";
alter system set encryption key identified by "xxxx"
*
ERROR at line 1:
ORA-28353: failed to open wallet
SQL> select * from v$encryption_wallet;
INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
------------------------
STATUS
------------------
1 file
/etc/oracle/wallets/$ORACLE_UNQNAME
CLOSED
SQL> ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY *****;
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY *****
*
ERROR at line 1:
ORA-28353: failed to open wallet
When I tried to open the database, this is what appeared in the alert.log:
Tue Feb 16 11:21:56 2018
QMNC started with pid=81, OS id=86184
kcbztek_get_tbskey: decrypting encrypted key for tablespace 16 without opening the wallet
I did a rollback of the patch, and as soon as I rolled back the patch, the database opened:
SQL> select name,open_mode ,database_role from v$database;
NAME OPEN_MODE DATABASE_ROLE
--------- -------------------- ----------------
TEST READ WRITE PRIMARY
After many days of looking for information to address the error, I noticed that FIPS 140-2 was enabled. FIPS 140-2 (Federal Information Processing Standard) is a US government standard defining security requirements for cryptographic modules.
You can see it’s enabled for SSL in the following file:
[oracle@collabn1 /u01/app/oracle/product/11.2.0.4/dbhome_1/ldap/admin ]$ cat fips.ora
#
# file to be reside in $ORACLE_HOME/ldap/admin
#
# sec-220
#
sslfips_140=true
I was able to find a document called After Applying October 2018 CPU/PSU, Auto-Login Wallet Stops Working For TDE With FIPS Mode Enabled (Doc ID 2474806.1).
This helped me discover the solution: patch the DB with the October 2018 PSU and, after patching the binaries, recreate the auto-login file cwallet.sso with version 12 compatibility. After you have done this, you will be able to open your DB normally.
[oracle@collabn1 ~]$ cd /etc/oracle/wallets/test
[oracle@collabn1 /etc/oracle/wallets/test]$ mv cwallet.sso cwallet.sso.original
[oracle@collabn1 /etc/oracle/wallets/test]$ orapki wallet create -wallet /etc/oracle/wallets/test -auto_login -compat_v12
Oracle PKI Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
Enter wallet password: ******
[oracle@collabn1 ~]$ srvctl status database -d test
Instance test1 is running on node collabn1
Instance test2 is running on node collabn2
SQL> select name,open_mode ,database_role from v$database;
NAME OPEN_MODE DATABASE_ROLE
--------- -------------------- ----------------
TEST READ WRITE PRIMARY
SQL> select * from v$encryption_wallet;
INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
------------------------
STATUS
------------------
1 file
/etc/oracle/wallets/$ORACLE_UNQNAME
OPEN
IMPORTANT: DO NOT recreate the ewallet.p12 file! This will likely cause data loss, as you will lose the master key required to decrypt your encrypted data.
Below is an example of what you DO NOT WANT TO DO:
[oracle@collabn1 ~]$ cd /etc/oracle/wallets/test
[oracle@collabn1 /etc/oracle/wallets/test]$ mv ewallet.p12 ewallet.p12.old
[oracle@collabn1 /etc/oracle/wallets/test]$ sqlplus / as sysdba
SQL> ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY Welcome1 WITH BACKUP;
keystore altered
It’s important to note that the above also applies to Jan 2019 Database BP, or to any upgrade from 11.2.0.4 to 12, 18 or 19c. This means you will face this issue for anything after October 2018 if you are using TDE and SSL with FIPS.
Note: This was originally posted in rene-ace.com.
Overcoming ORA-01722 Error While Doing an Upgrade With Different DST Versions
This will be a short entry, but hopefully, it will help if you face the error “ORA-01722: invalid number” while doing an upgrade.
Today I was doing an Oracle RDBMS upgrade from 12.2 with DST 34 to 19.7 with DST 35. The upgrade failed with the error below:
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
-------------------------------------------------
Errors in database [TEST]
Stage [DBUPGRADE]
Operation [STOPPED]
Status [ERROR]
Info [
Error: UPG-1400
UPGRADE FAILED [TEST]
Cause: This indicates that the database upgrade failed with errors.
For further details, see the log file located at /u01_diag/working/pythian/upgrade/upg_logs/TEST/102/autoupgrade_20200921_user.log]
-------------------------------------------------
Logs: [/u01_diag/working/pythian/upgrade/upg_logs/TEST/102/autoupgrade_20200921_user.log]
-------------------------------------------------
Job 102 completed
------------------- Final Summary --------------------
Number of databases [ 1 ]
Jobs finished successfully [0]
Jobs failed [1]
Jobs pending [0]
-------------------- JOBS FAILED ---------------------
Job 102 for TEST
Exiting
-- From the Upgrade Log
17:12:53 SQL> Rem Check if time zone file version used by the database exists in new home
17:12:53 SQL> SELECT TO_NUMBER('MUST_PATCH_TIMEZONE_FILE_VERSION_ON_NEW_ORACLE_HOME')
17:12:53 2 FROM sys.props$
17:12:53 3 WHERE
17:12:53 4 (
17:12:53 5 (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > &C_LTZ_CONTENT_VER)
17:12:53 6 AND
17:12:53 7 (0 = (select count(*) from v$timezone_file))
17:12:53 8 );
old 5: (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > &C_LTZ_CONTENT_VER)
new 5: (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > 32)
SELECT TO_NUMBER('MUST_PATCH_TIMEZONE_FILE_VERSION_ON_NEW_ORACLE_HOME')
*
ERROR at line 1:
ORA-01722: invalid number
While searching for the error, my first thought was that 19.7 had been patched incorrectly, but that wasn’t the case. As you can see, the 19.7 OH was patched with a higher version of the DST patch than the 12.2 OH:
([SID:TEST][oracle@hostname:/u01/app/oracle/product/12.2.0.1/dbhome_1/OPatch ] )
oracle $ ./opatch lspatches
29997937;RDBMS - DSTV34 UPDATE - TZDATA2019B
30133625;OJVM RELEASE UPDATE: 12.2.0.1.191015 (30133625)
30138470;Database Oct 2019 Release Update : 12.2.0.1.191015 (30138470)
30122814;OCW OCT 2019 RELEASE UPDATE 12.2.0.1.191015 (30122814)
OPatch succeeded.
([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/OPatch ] )
oracle $ ./opatch lspatches
30565805;RDBMS 19C REGRESSION ORA-01843 NOT A VALID MONTH ERROR WHEN EXECUTING TO_DATE(20191120,RRMMDD)
31335037;RDBMS - DSTV35 UPDATE - TZDATA2020A
30894985;OCW RELEASE UPDATE 19.7.0.0.0 (30894985)
30869156;Database Release Update : 19.7.0.0.200414 (30869156)
OPatch succeeded.
My first move was to search for the following files in the 19.7 OH:
- timezlrg_34.dat
- timezone_34.dat
They weren’t there. Because DST patches are cumulative, the DST 34 files shouldn’t be missing from a 19.7 OH patched with DST 35, so I would classify this as a bug in the DST 35 patch.
([SID:TEST][oracle@hostname:/home/oracle/working/antunez ] )
oracle $ cd $ORACLE_HOME/oracore/zoneinfo
([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ ls -ltr | egrep "timezlrg_34.dat|timezone_34.dat" | wc -l
0
I fixed this by copying the files from the 12.2 $ORACLE_HOME/oracore/zoneinfo to the 19.7 $ORACLE_HOME/oracore/zoneinfo:
([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/oracore/zoneinfo/timezone_34.dat ./
([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/oracore/zoneinfo/timezlrg_34.dat ./
([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ ls -ltr | egrep "timezlrg_34.dat|timezone_34.dat" | wc -l
2
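The same check-and-copy steps can be wrapped in a small shell function for reuse across homes. A hedged sketch (the function name is mine, and the two directory arguments stand in for the old and new $ORACLE_HOME/oracore/zoneinfo paths):

```shell
# Copy any timezone data files present in the source home's zoneinfo
# directory but missing from the target's. Both arguments are placeholder
# directory paths; on a real system they would be the two
# $ORACLE_HOME/oracore/zoneinfo directories.
sync_dst_files() {
  src_zi=$1; dst_zi=$2
  for f in "$src_zi"/timez*_*.dat; do    # matches timezone_NN.dat and timezlrg_NN.dat
    [ -e "$f" ] || continue              # glob found nothing -> skip
    base=$(basename "$f")
    if [ ! -e "$dst_zi/$base" ]; then
      cp -p "$f" "$dst_zi/$base"
      echo "copied $base"
    fi
  done
}
```

On the homes above, this would be invoked with the 12.2 zoneinfo directory as the first argument and the 19.7 one as the second.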
Once I did this, I relaunched my upgrade. This time, it finished successfully:
([SID:TEST][oracle@hostname:/home/oracle/working/antunez/TEST ] )
oracle $ $ORACLE_HOME19c/jdk/bin/java -jar $ORACLE_HOME19c/rdbms/admin/autoupgrade.jar -config TEST_config.cfg -mode DEPLOY -noconsole
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Job 102 completed
------------------- Final Summary --------------------
Number of databases [ 1 ]
Jobs finished successfully [1]
Jobs failed [0]
Jobs pending [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 102 for TEST
---- Drop GRP at your convenience once you consider it is no longer needed ----
Drop GRP from TEST: drop restore point AUTOUPGRADE_221145114461854_TEST
Hopefully, this small post will help you if you face this error, as it seems the DST 35 patch does not include the necessary DST 34 files.
Note: This was originally posted on rene-ace.com.
Using SQLCODE Function
The idea of this post is to show how to handle errors using the SQLCODE function, which returns the error code of the most recent exception raised while running PL/SQL code.
Here is a small example of code that raises an exception when two or more consecutive ORA-02292 errors occur.
--Create a table to capture how many times ORA-02292 errors occurred
CREATE TABLE tcount (x int); -- table to count the number of 2292 errors

--Parent table
create table orders (order_id int, total_sales int);
CREATE UNIQUE INDEX orders_pk on orders (order_id);
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id) ENABLE;

--Child table
CREATE TABLE products (
  product_id numeric(10) not null,
  supplier_id numeric(10) not null,
  order_id int,
  CONSTRAINT fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

--Add data to the parent and child tables
insert into orders values (1,1);
insert into products values (1,1,1);
commit;

--Verify the data in those tables
select * from orders;
select * from products;

--Create the procedure to delete data from the parent table and force error ORA-02292
CREATE OR REPLACE PROCEDURE delete_order (order_id_in IN NUMBER)
IS
  vtest2292 int := 0;
BEGIN
  delete from orders where order_id = order_id_in;
  commit;
  -- if the delete statement runs fine, the tcount table is truncated
  EXECUTE IMMEDIATE ('truncate table tcount');
EXCEPTION
  WHEN OTHERS THEN
    DECLARE
      error_code NUMBER := SQLCODE;
    BEGIN
      IF error_code = -2292 THEN
        DBMS_OUTPUT.PUT_LINE('ERROR 2292!!!!!!!!!!!!!'); -- error found
        insert into tcount values (1);
        commit;
        select count(1) into vtest2292 from tcount;
        IF vtest2292 >= 2 THEN
          DBMS_OUTPUT.PUT_LINE('ERROR 2292 >x2!!!!!!!!!!!!!!'); -- two or more consecutive errors found
          raise_application_error (-20001,'Two or more ORA-2292 errors occurred deleting an order.');
        END IF;
      ELSE
        raise_application_error (-20002,'An error has occurred deleting an order.');
      END IF;
    END;
END;
/

--Job to run the procedure delete_order and confirm that only after the second consecutive execution the error is written to the alert log
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'job_delete_order',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin delete_order(1); end;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=HOURLY;BYMINUTE=0;INTERVAL=1;',
    enabled         => TRUE);
END;
/

--Run the job job_delete_order to force the error; monitor the database alert log to confirm that only after two executions the error is visible there
set serveroutput on
select count(1) from tcount;
exec dbms_scheduler.run_job('job_delete_order',FALSE);
select count(1) from tcount;

tail -f alert_<database_name>.log
The text associated with the ORA-02292 error is “integrity constraint (%s.%s) violated – child record found.” This issue can be fixed by deleting the child rows first, and afterward deleting the parent row.
Another option is to modify the foreign key constraint to delete child rows when a parent row is deleted (a foreign key with the "delete cascade" option), but with the SQLCODE function we can capture the error and handle it as needed.
How To Get The Most Out Of Your Next Virtual Learning Event
The COVID-19 pandemic has disrupted every aspect of our lives this year. But in July, in spite of all the upheaval, the All India Oracle Users Group (AIOUG) still managed to hold its annual Oracle Groundbreakers Yatra conference. By moving the event online, the organizers were able to present more than 100 sessions over 14 days, offering over 125 hours of learning, plus networking with 100+ speakers from more than two dozen countries.
COVID-19 has made virtual learning the new normal. I know not everyone is happy about this but, to be honest, I love virtual webinars. They offer benefits you just can’t get from live events, including:
- The very best subject matter experts. Many people who deserve the title of “expert” can’t afford the time it takes to travel to a live event. But when organizers are running a virtual conference, they can invite experts from anywhere in the world, and make them available to the audience at almost any time.
- Savings in time and money. Online conferences give me all the learning of a live event without the cost of travel, food and accommodations. Plus, attending virtually means my time commitment is measured in hours instead of days.
- Learning without intimidation. Not everyone is comfortable asking questions in front of a large crowd. Online learning is a great equalizer, and anyone can raise a virtual hand to ask a question without feeling shy. These events have made it easy for me to connect with speakers and grow my network through LinkedIn and Twitter.
But we should also talk about the things that make virtual events less than ideal. During the OG Yatra conference, Oracle’s Jim Grisanzio listed some major drawbacks of the online experience:
- There’s no human contact. When your speaker is present only virtually, it’s easy to get distracted and end up learning nothing at all.
- It’s boring. Virtual presentations lack eye contact, real-time exchanges, and audience reactions that help to keep us engaged with the material.
- It’s stressful. As countless teachers have discovered during the pandemic, the attention span for online learning is much shorter than it is for the live classroom experience.
There’s no doubt that virtual learning presents real challenges. But as someone who has grown to actually prefer online events, I’m happy to offer some practical tips to help you make the most of virtual webinars and learning:
- Don’t be greedy. Don’t sign up for every session. You’ll end up exhausted and unable to learn much of anything. Instead, be selective. Choose only those sessions that really interest you and that are worthy of your complete focus.
- Do your homework. Before you join a webinar, do a bit of research on the topic. The more basic knowledge you have, the more value you’ll get from the session.
- Learn about the speakers. Review their LinkedIn profiles so that you know something about their background and career path. This way, you’ll be better able to form realistic expectations for each session.
- Focus completely on learning. Don’t try to do your job at the same time you’re attending a webinar. You’ll learn nothing, and your work will suffer. If you’re carrying a pager or have an on-call shift, hand off the pager or swap shifts with a colleague.
- Take notes. Before you log on, make sure you have a notebook and pen at the ready. Research shows that old-school note-taking leads to better retention, because writing notes by hand forces our brains to process what we’re putting down on paper in a way that typing on a computer does not. People who take screenshots of slides typically never look at them again; when it comes to learning, screenshots are pretty much useless.
I hope this post will help you make the most of virtual webinars and learning. Sure, online events have their drawbacks, but they are undeniably the way of the future — especially in light of COVID-19. Today, virtual learning offers us a great way to acquire knowledge safely and affordably from just about anywhere. In the future, I believe technology will create solutions to some of the difficulties of the online experience. But for now, organizations like the All India Oracle Users Group deserve our thanks for creating our industry’s newest and best opportunities to learn.
CreateGoldImage Followup – Is CreateGoldImage Required for Cloning Oracle Home?
On October 31, 2019, I blogged about How To Use CreateGoldImage For Cloning In 19c.
CreateGoldImage seems like a nice new feature; however, it’s also very buggy.
Here are a few bugs:
- 2641469.1
- 27432355.8
- 27432338.8
I received a suggestion to use tar to clone Grid Home; however, I was uncertain if it would work.
Why would Oracle create a new process for CreateGoldImage when traditional tar works?
Out of curiosity, I decided to test this by creating a tarball of the GRID_HOME from a 19.8 RAC, and then using that tarball to upgrade Oracle Restart 12.2 to 19.8.
==============================
Source: 19c Oracle RAC is located at /u01/app/19.0.0.0/grid
==============================
### Create tarball for 19.8 GRID_HOME from RAC instance:
[root@ol7-19-lax1 grid]# pwd
/u01/app/19.0.0/grid
[root@ol7-19-lax1 grid]# time tar -cvzpf /tmp/grid19.8_clone.tar.gz . > /tmp/clone_grid19.8.log ; echo $?
### Transfer tarball to target.
==============================
TARGET: 19c Oracle Restart is located at /u01/app/oracle/19.8.0.0/grid
==============================
I know what you are thinking. Why is GRID_HOME at two different locations?
The two vagrant boxes I have used to create test cases were created by two different individuals.
### Review current patch for 12.2 Oracle Restart:
[grid@db-asm-1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
28163235;ACFS JUL 2018 RELEASE UPDATE 12.2.0.1.180717 (28163235)
28163190;OCW JUL 2018 RELEASE UPDATE 12.2.0.1.180717 (28163190)
28163133;Database Jul 2018 Release Update : 12.2.0.1.180717 (28163133)

OPatch succeeded.
### Copy tarball to /u01/app/oracle/19.8.0.0.
[grid@db-asm-1 19.8.0.0]$ time cp -fv /media/swrepo/grid19.8_clone.tar.gz . ; echo $?
‘/media/swrepo/grid19.8_clone.tar.gz’ -> ‘./grid19.8_clone.tar.gz’

[grid@db-asm-1 19.8.0.0]$ ls -l
total 7015760
-rwxr-xr-x 1 grid oinstall 7184131622 Aug 15 22:30 grid19.8_clone.tar.gz
### Unzip tarball to /u01/app/oracle/19.8.0.0
[grid@db-asm-1 19.8.0.0]$ time gunzip grid19.8_clone.tar.gz

[grid@db-asm-1 19.8.0.0]$ ls -l
total 13542216
-rwxr-xr-x 1 grid oinstall 13867223040 Aug 15 22:30 grid19.8_clone.tar
### Create directory for GRID_HOME – /u01/app/oracle/19.8.0.0/grid
[grid@db-asm-1 19.8.0.0]$ mkdir grid
### Untar to GRID_HOME – /u01/app/oracle/19.8.0.0/grid
[grid@db-asm-1 19.8.0.0]$ cd grid
[grid@db-asm-1 grid]$ time tar xf ../grid19.8_clone.tar . ; echo $?
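Before running the upgrade, it can be worth sanity-checking that the extracted tree matches the source home. A minimal sketch (the function name is mine; in practice the two arguments would be the source grid home and the freshly untarred clone):

```shell
# Compare a source directory tree against its extracted clone.
# Both arguments are placeholder paths.
verify_clone() {
  src=$1; dst=$2
  if diff -r "$src" "$dst" > /dev/null; then
    echo "clone matches source"
  else
    echo "clone differs from source"
  fi
}
```

Note that log files or sockets that changed on the source after the tarball was taken will show up as differences, so treat a mismatch as a prompt to investigate rather than a hard failure.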
### Upgrade 12.2 Oracle HAS to 19.8 using 19.8 tarball (GUI Method):
[grid@db-asm-1 grid]$ ./gridSetup.sh
### Verify Oracle Restart is 19.8:
[grid@db-asm-1 grid]$ $ORACLE_HOME/OPatch/opatch lspatches
31335188;TOMCAT RELEASE UPDATE 19.0.0.0.0 (31335188)
31305087;OCW RELEASE UPDATE 19.8.0.0.0 (31305087)
31304218;ACFS RELEASE UPDATE 19.8.0.0.0 (31304218)
31281355;Database Release Update : 19.8.0.0.200714 (31281355)

OPatch succeeded.

[grid@db-asm-1 grid]$
In conclusion, using a tarball is an option if CreateGoldImage does not work, avoiding the need to create a ticket with Oracle Support.
Please note: I am unaware whether using tarball is supported — it would be best to check with Oracle support.
Mining Statspack and AWR Repositories for Historical Trends
There are various ways that we can approach the analysis of a past database performance problem. The initial steps usually differ based on their scope. Is it limited to a certain SQL/process/task, or is it perceived as a database-wide slowdown? Additionally, is it occurring at the moment, or is it an event that occurred in the past?
In case the starting scope of analysis is database-wide, mining the Automatic Workload Repository (AWR) is one of the methods we can use to obtain historical performance trends. However, not all customers have access to it, either because it requires the Diagnostics Pack license on Enterprise Edition, or because they are running Oracle Standard Edition, where it’s not present.
In such cases, we can still use Statspack as a free alternative to AWR, even though it’s not as sophisticated. One of Statspack’s shortcomings is that it doesn’t store Active Session History data, which we could otherwise use to drill down into the activity of particular sessions over time. With Statspack, we’re missing this session-level granularity.
In this post, I’m going to present a script I use to get an overview of the workload dynamics of a database querying the Statspack repository. There’s also an AWR counterpart, as I’ll mention later in the post.
statspack_load_trends.sql

1. Script’s Properties

Let’s summarize the script’s main properties:
- It queries the Statspack repository directly and doesn’t create any (temporary) objects, nor relies on any pre-created Statspack reports.
- We can use it to analyze a Statspack repository imported from another DB, containing data with a different DBID than the DB hosting the repository.
- We can analyze a period spanning instance restart(s). The script considers only adjacent snapshot pairs having the same startup_time value. Here, “adjacent” denotes two snapshots which belong to the same [DBID, instance number] pair, and whose SNAP_IDs are closest to each other when ordered. That’s just to emphasize that the difference between two consecutive SNAP_IDs is not always one (think RAC with cached sequence values, an instance restart, or pressure on the shared pool).
The script provides a quick high-level overview of the DB workload dynamics. It reports a combination of historical OS utilization statistics (stats$osstat), system time model statistics (stats$sys_time_model), and aggregated instance-level statistics (stats$sysstat) for a given period. Currently, it doesn’t query stats$system_event for wait event details. Several time-based metrics are presented in a form of Average Active Sessions (AAS), calculated by simply dividing the observed time-based metric by the elapsed time in the observed period.
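To make the AAS arithmetic concrete: a hypothetical 5,400 seconds of DB time accumulated over a one-hour snapshot interval corresponds to 1.5 average active sessions. A one-line sketch (the figures are illustrative, not taken from the script's output):

```shell
# AAS = time-based metric (seconds) / elapsed seconds between the two snapshots
db_time_s=5400    # hypothetical DB time accumulated between two snapshots
elapsed_s=3600    # one-hour snapshot interval
awk -v t="$db_time_s" -v e="$elapsed_s" 'BEGIN {printf "AAS = %.2f\n", t/e}'
# prints: AAS = 1.50
```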
You can download the script here, and its AWR counterpart is available here. Before running the AWR version, make sure the database has the Diagnostics Pack license. The following explanation applies to both scripts.
Note: I got the idea for this script from various AWR mining scripts in Chapter 5 (Sizing Exadata) from the “Oracle Exadata Recipes: A Problem-Solution Approach” book. Additionally, the idea to calculate and display CPU core-normalized values for some of the CPU usage statistics originates from John Beresniewicz’s AWR1page project.
3. Calculating CPU Capacity

The script considers the number of CPU cores, and not threads (in case hyperthreading is enabled), to calculate the number of CPU seconds between two Statspack snapshots. Various publications explain the reasons for this approach, but to summarize: depending on the workload type, hyperthreading can provide up to approx. 30% higher CPU throughput compared to non-hyperthread mode.
When the number of processes running on CPU approaches the number of CPU cores, the system might become CPU-saturated/over-subscribed. At that point, its response time becomes unpredictable, and additional workload decreases the amount of useful work performed by the system.
4. Usage

The script produces a very wide output, so I recommend spooling it out to a file for easier viewing. Because Statspack tables have public synonyms, any user that has permission to select from the repository tables can run it.
Note: I’ve seen the script fail with errors like “ORA-00936: missing expression,” or “ORA-01722: invalid number” when used on databases running with cursor_sharing=FORCE. To avoid the error, I included the /*+ cursor_sharing_exact*/ hint in the script’s SELECT statement. Setting cursor_sharing=EXACT at the session-level is also a valid alternative.
SQL> spool load_trends.txt
SQL> @statspack_load_trends.sql
First, we provide the DBID and instance number we want to analyze. If we don’t provide an instance number, all of the instances for the provided DBID are considered:
Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id   |Inst Num|DB Name     |Instance    |Host
-----------|--------|------------|------------|-------------
 1558102526|       1|ORCL        |orcl1       |ol7-122-rac1
 1558102526|       2|ORCL        |orcl2       |ol7-122-rac2

Enter DBID to analyze - default "1558102526" : /* enter DBID to analyze */
Enter instance number or "all" to analyze all instancs for DBID = 1558102526 - default "all" : /* report data for a specific RAC instance or all of them */
Next, we provide a time range to analyze.
5. Sample Output

Let’s check what the output looks like. Due to its width, and to fit the blog format, it’s segmented into several sections. Also, due to some (temporary) blog formatting limitations, I recommend viewing wide code sections by clicking “Open code in new window.”
- “CPU #Cores;#Threads”: The number of CPU cores and threads (in case of enabled hyperthreading) reported by the OS.
- “Tot CPU Time Avail [Cores] (s)”: The number of CPU seconds available between the two analyzed snapshots based on the number of CPU cores.
Instance|Snap ID        |Begin Snap     |End Snap       |Elapsed|            CPU|     Tot CPU Time|
  Number|start-end      |Time           |Time           |   Mins|#Cores;#Threads|Avail [Cores] (s)|
--------|---------------|---------------|---------------|-------|---------------|-----------------|
       1|19195-19196    |16-APR-20 17:00|16-APR-20 18:00|  59.98|         24; 48|        86,376.00|
        |19196-19197    |16-APR-20 18:00|16-APR-20 19:00|  60.00|         24; 48|        86,400.00|
        |19197-19198    |16-APR-20 19:00|16-APR-20 20:00|  60.00|         24; 48|        86,400.00|
        |19198-19199    |16-APR-20 20:00|16-APR-20 21:00|  60.02|         24; 48|        86,424.00|
        |19199-19200    |16-APR-20 21:00|16-APR-20 22:00|  59.98|         24; 48|        86,376.00|
Note: One hour between snapshots is probably excessive, but that’s beyond the scope of this post.
Time Model Statistics: stats$sys_time_model

The next section reports the time spent in the database in the form of Average Active Sessions (AAS). For completeness, and to better understand what the figures represent, I’m including how various statistics from stats$sys_time_model are related:
DB time                 = CPU + Wait time spent by foreground sessions
background elapsed time = CPU + Wait time spent by background sessions
DB CPU                  = CPU time spent by foreground sessions
background cpu time     = CPU time spent by background sessions
Considering the above, we can calculate the AAS figures as follows:
Total AAS          = ("DB time" + "background elapsed time") / elapsed_time
Total AAS on CPU   = ("DB CPU" + "background cpu time") / elapsed_time
FG AAS             = "DB time" / elapsed_time
BG AAS             = "background elapsed time" / elapsed_time
FG AAS on CPU      = "DB CPU" / elapsed_time
BG AAS on CPU      = "background cpu time" / elapsed_time
Total AAS in wait  = "Total AAS" - "Total AAS on CPU"
FG AAS in wait     = ("DB time" - "DB CPU") / elapsed_time
BG AAS in wait     = ("background elapsed time" - "background cpu time") / elapsed_time
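To make the arithmetic concrete, here’s the same calculation in a few lines of Python. The input values are invented for illustration; only the formulas mirror the script:

```python
# Hypothetical time-model values (in seconds) over a 600-second
# (10-minute) snapshot interval -- illustration only.
elapsed_time = 600.0
db_time      = 1800.0   # "DB time"
db_cpu       = 300.0    # "DB CPU"
bg_elapsed   = 120.0    # "background elapsed time"
bg_cpu       = 60.0     # "background cpu time"
cpu_cores    = 8

total_aas         = (db_time + bg_elapsed) / elapsed_time   # 3.2
total_aas_on_cpu  = (db_cpu + bg_cpu) / elapsed_time        # 0.6
fg_aas_in_wait    = (db_time - db_cpu) / elapsed_time       # 2.5
total_aas_in_wait = total_aas - total_aas_on_cpu            # 2.6

# Core-normalized ("NPC") value: approaching 1 hints at CPU saturation.
total_aas_on_cpu_npc = total_aas_on_cpu / cpu_cores         # 0.075
```

With 3.2 sessions active on average but only 0.6 of them on CPU, this hypothetical system would be wait-bound rather than CPU-bound.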
Columns reporting CPU-related figures display two values: The “usual” AAS value, and the “core-normalized Average Active Sessions” value, using the acronym “NPC”. If the core-normalized value approaches (or even crosses) the value of “1,” the system could potentially be CPU saturated:
- “AAS [FG+BG]”: The number of AAS considering foreground and background sessions.
- “AAS [FG]”: The number of AAS considering foreground sessions only.
- “AAS [BG]”: The number of AAS considering background sessions only.
Continuing with the CPU component of the above three mentioned metrics:
- “AAS on CPU [FG+BG]”: The number of AAS on CPU considering foreground and background sessions, followed by core-normalized AAS on CPU (NPC).
- “AAS on CPU [FG]”: Same as above, though only considering foreground sessions.
- “AAS on CPU [BG]”: Same as above, though only considering background sessions.
The “AAS wait” figures follow the same principle as the above-mentioned columns, and “AAS RMAN CPU” represents AAS performing “RMAN CPU time (backup/restore)” activities.
Returning to our sample output:
      AAS|      AAS|      AAS|AAS on CPU| AAS on CPU|AAS on  |AAS on CPU|AAS on  |AAS on CPU| AAS wait| AAS wait| AAS wait|             |
  [FG+BG]|     [FG]|     [BG]|[FG+BG]   |[FG+BG] NPC|CPU [FG]|  [FG] NPC|CPU [BG]|[BG] NPC  |  [FG+BG]|     [FG]|     [BG]|AAS RMAN CPU |
---------|---------|---------|----------|-----------|--------|----------|--------|----------|---------|---------|---------|-------------|
     96.4|     94.3|      2.0|       8.5|        0.4|     8.3|       0.3|     0.2|       0.0|     87.9|     86.0|      1.9|          0.0|
     32.9|     31.6|      1.3|      10.3|        0.4|    10.1|       0.4|     0.2|       0.0|     22.5|     21.4|      1.1|          0.0|
     59.4|     58.9|      0.6|      23.3|        1.0|    23.2|       1.0|     0.1|       0.0|     36.2|     35.7|      0.5|          0.0|
     13.3|     12.9|      0.5|       5.8|        0.2|     5.7|       0.2|     0.1|       0.0|      7.5|      7.1|      0.4|          0.0|
     23.0|     22.2|      0.8|       6.0|        0.3|     5.9|       0.2|     0.1|       0.0|     17.0|     16.3|      0.7|          0.0|
The first line reports 94.3 foreground AAS, out of which only 8.3 were on CPU, and 86 were in various waits. Looking at the third line, the situation changes: out of the 58.9 foreground AAS, 23.2 were on CPU and 35.7 in waits. Checking the per-CPU-core-normalized value, we see it reports 1, which means the machine might be approaching or has already crossed CPU saturation. We also see that there was no RMAN activity occurring during that time. Background processes also spent most of their time in waits, rather than on CPU.
For convenience, we have displayed the number of seconds consumed by foreground sessions, breaking them further down into CPU and wait components, and reporting the relative percentages. This is basically the same information we saw in the previous section, just expressed as time instead of AAS:
  DB Time (s)       DB CPU (s)                       |
[FG CPU+WAIT] =    [FG CPU]     +    [FG WAIT]       |
-----------------------------------------------------|
   339,555.62 =   29,924.51  9% + 309,631.11 91%     |
   113,683.70 =   36,469.52 32% +  77,214.18 68%     |
   211,880.46 =   83,404.47 39% + 128,475.99 61%     |
    46,325.13 =   20,692.78 45% +  25,632.35 55%     |
    79,966.07 =   21,274.38 27% +  58,691.69 73%     |

OS Statistics: stats$osstat
Next, OS statistics from stats$osstat are displayed. “Tot OS Load@end_snap” is the recorded OS load at the time of the end snapshot creation. The other four columns represent Average Active Processes (AAP), which is simply the measured time of each named statistic divided by elapsed time in the observed period. As above, the normalized value per core is also reported here for the BUSY statistic (sum of USER+SYS). The meaning is the same: if the value approaches 1, the system might be CPU saturated.
In our sample report, the third line reports 23.9 processes on CPU, or 1 per CPU core (that’s considering all the OS processes, not only Oracle’s). That also correlates with the “AAS on CPU [FG+BG]” figure in the third line we saw in the above snippet. Because in this particular case the machine is dedicated to one Oracle instance, it used all of the available CPU:
       Tot OS|           |AAP OS  |   AAP OS|   AAP OS|   AAP OS|
Load@end_snap|AAP OS BUSY|BUSY NPC|     USER|      SYS|   IOWAIT|
-------------|-----------|--------|---------|---------|---------|
         37.0|        9.2|     0.4|      8.2|      0.9|     12.1|
         76.8|       10.8|     0.4|      9.8|      0.9|      5.3|
          9.4|       23.9|     1.0|     22.9|      0.9|      2.3|
          4.3|        6.2|     0.3|      5.7|      0.5|      1.4|
          4.8|        6.4|     0.3|      5.6|      0.7|      4.7|

System Statistics: stats$sysstat
Finally, stats$sysstat reports various system statistics. I won’t describe their meaning because that’s beyond the scope of this post. It’s worth noting that apart from “Logons Current,” almost all other statistics are expressed in units of work per second. The only exceptions are statistics related to parallel operations. Because their usage usually pertains to “heavy-duty” DDL/DML tasks, we don’t expect to see many such operations per second. Thus, the whole snapshot interval seems a more appropriate time-frame to report the number of occurrences of such events.
    Logons|        |      User|            |SQL*Net roundtrips|SQL*Net roundtrips|   Bytes received via|     Bytes sent via|   Bytes received via|
   Current|Logons/s|   calls/s|  Executes/s| to/from client/s | to/from dblink/s |SQL*Net from client/s|SQL*Net to client/s|SQL*Net from dblink/s|
----------|--------|----------|------------|------------------|------------------|---------------------|-------------------|---------------------|
     556.0|     9.5|     872.9|       692.2|             723.0|           4,575.6|          1,846,238.6|        6,305,967.2|          1,177,004.3|
     527.0|    16.2|   1,008.0|       639.4|             828.5|           5,773.2|          2,462,067.1|        7,760,807.5|          1,453,024.0|
     607.0|    18.5|     738.8|       588.3|             556.1|           5,618.1|          1,986,647.1|        3,644,026.9|          1,448,627.4|
     427.0|     9.2|     873.3|       910.0|             716.4|           5,972.3|          2,691,244.6|        4,067,039.1|          1,532,389.7|
     418.0|     7.4|     719.9|       627.8|             588.5|           7,471.6|          2,564,916.7|        3,773,344.1|          1,852,806.9|

     Bytes sent via|Cluster wait|Session logical|      DB block|    Consistent|Consistent reads  |  Physical|  Physical|Physical read|Physical write|
SQL*Net to dblink/s|      time/s|        reads/s|     changes/s|   changes/sec|undo rec applied/s|   reads/s|  writes/s|IO requests/s| IO requests/s|
-------------------|------------|---------------|--------------|--------------|------------------|----------|----------|-------------|--------------|
          576,510.8|         0.0|      339,009.8|      31,353.8|       3,062.0|           4,002.4|  47,349.4|   1,879.1|      2,621.5|         448.5|
          726,935.7|         0.0|      487,469.9|      48,874.4|         487.3|             563.9|  31,277.7|   2,127.6|      5,021.6|         526.8|
          707,648.6|         0.0|      343,665.8|      38,862.0|         379.4|             362.9|  37,057.7|     777.7|      1,949.2|         265.2|
          751,698.7|         0.0|      288,724.2|      26,163.7|         618.0|             435.8|  14,001.6|     823.3|        828.5|         274.1|
          940,096.3|         0.0|      335,631.4|      24,500.0|         198.5|             211.9|  53,625.8|     638.2|      2,451.6|         227.5|

    Parses|      Hard|     Parse|     Parse|      User|       User|     Redo size|          Redo|      Rollback changes|
   total/s|  parses/s|describe/s|failures/s| commits/s|rollbacks/s|       bytes/s|      writes/s|undo records applied/s|
----------|----------|----------|----------|----------|-----------|--------------|--------------|----------------------|
     142.0|     103.9|       0.0|       0.0|     158.2|        0.8|  46,951,095.1|         137.8|                   0.1|
     143.4|     100.1|       0.0|       0.3|     155.0|        0.8|  49,017,168.1|         170.3|                   1.2|
     135.9|      89.2|       0.0|       0.1|     143.3|        0.8|  11,513,858.2|         149.7|                   0.1|
     141.1|     109.8|       0.0|       0.0|     284.4|        0.8|   9,513,089.1|         226.2|                   0.1|
     123.0|      93.4|       0.0|       0.0|     175.5|        0.9|   7,462,206.6|         169.3|                   0.3|

           Queries|    DML statements|     PX oper not|PX oper downgraded|PX oper downgraded|PX oper downgraded|PX oper downgraded|PX oper downgraded
parallelized/Total|parallelized/Total|downgraded/Total|   to serial/Total|75 to 99 pct/Total|50 to 75 pct/Total|25 to 50 pct/Total| 1 to 25 pct/Total
------------------|------------------|----------------|------------------|------------------|------------------|------------------|------------------
           1,912.0|               3.0|         1,989.0|              38.0|               0.0|               1.0|               4.0|               1.0
           2,450.0|               6.0|         2,551.0|              10.0|               0.0|               0.0|               0.0|               0.0
           2,477.0|              13.0|         2,584.0|               9.0|               0.0|               0.0|               0.0|               0.0
           1,553.0|               3.0|         1,646.0|               9.0|               0.0|               0.0|               0.0|               0.0
           1,390.0|               2.0|         1,487.0|               8.0|               0.0|               0.0|               0.0|               0.0

6. Visualizing Results
When comparing a period where there was an issue to one where the database was running fine, or when just checking for trends, it’s more convenient to plot the results. That’s an easy way to get an overview of how certain metrics changed over time, or how they compare across nodes on a RAC database.
To ease that task, the two scripts contain two ways of formatting columns: one for plotting/charting purposes, and one displaying column headings on two lines for a more user-friendly format (the format used in the above descriptions). Depending on your needs, uncomment the appropriate block of column formatting commands in the script. You can plot the results using a third-party utility, such as Tableau, which was used to produce the graphs in the following sections.
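If a charting tool isn’t at hand, a few lines of scripting are enough to pivot the plot-friendly output into per-instance series. A stdlib-only Python sketch (the rows are hypothetical stand-ins for the spooled columns):

```python
from collections import defaultdict

# (snapshot end time, instance number, "AAS on CPU NPC") -- invented sample
# rows standing in for the script's plot-friendly output.
rows = [
    ("16-APR-20 18:00", 1, 0.4),
    ("16-APR-20 18:00", 2, 0.3),
    ("16-APR-20 19:00", 1, 1.0),
    ("16-APR-20 19:00", 2, 0.5),
]

# One time series per instance, ready to feed into any charting library.
series = defaultdict(list)
for snap_time, inst, aas_npc in rows:
    series[inst].append((snap_time, aas_npc))

# Flag intervals where the core-normalized value reaches 1
# (possible CPU saturation).
saturated = [(t, i) for (t, i, v) in rows if v >= 1.0]
print(saturated)   # [('16-APR-20 19:00', 1)]
```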
CPU Usage Distribution and Limits Across Nodes

The graph below represents the average number of OS processes on CPU, normalized to the CPU core count for each of the three nodes on a RAC system. As noted above, when the normalized value per core crosses the value of “1,” the host might be oversubscribed on CPU. Nodes 2 and 3 are usually below the value of 1. However, spikes in usage on node 1 might require further investigation. Also, there seems to be an imbalance in CPU usage across nodes:
Breakdown of Foreground and Background Sessions on CPU and Wait Components, Expressed as AAS

When investigating a problematic time period, we can quickly get a high-level overview of the relation between CPU and waits experienced by foreground and background sessions:
Observing Waits After Applying a Fix to Reduce Them

After applying “a fix” on April 16th, the time spent waiting by foreground sessions decreased substantially. CPU demand also decreased.
Comparing the Number of AAS on CPU Normalized to CPU Core Count at the OS and DB Level on a Three-Node RAC DB
The observed DB isn’t using all or most of the CPU available on the nodes, and there’s “something else” using it at the OS level. That’s visible on the graph for node 1 on May 6th between 7 and 9 PM, where CPU usage at the OS level increased, but that was not the case for DB sessions. Additionally, because we have the normalized per-CPU-core value displayed, we can see that node 1 crosses the value of 1 quite often.
Logons per Second and Number of Logged-in Sessions per Node

RAC nodes 1 and 2 already have a relatively high number of logons per second at ~5, whereas node 3 is even higher at ~23. Additionally, there’s a large imbalance in the number of established sessions on node 3 compared to nodes 1 and 2. Because each node has 8 physical cores (not visible from the below graphs), the ~2500 established sessions represent a potential risk should too many of them try to become active at the same time. Overall, it seems a connection pooling review is in order for this system.
Here’s a final example from a different server, still related to connection management: detecting a large drop in the number of established sessions and correlating it with the number of logins per second:
7. Other Mining Tools

Some time ago, Maris Elsins published a post describing a set of handy scripts to mine the AWR repository. Make sure to check it out!
To conclude, here’s a list of some free mining utilities. These are mostly created for AWR, but some are for Statspack. Some of them parse a pre-created set of AWR/Statspack reports. Others connect directly to the database and extract/analyze data from there. The script presented in this post might not offer the same functionality as those utilities. However, for some of my use-cases, it complemented them by providing a customized set of pre-calculated workload related figures.
Query Statspack’s “SQL ordered by” sections over a time period
In my previous blog post <link>, I presented the statspack_load_trends.sql script, which provides a high-level overview of workload dynamics on DB instance(s) over time. In this post, I’ll present the statspack_top_sqls.sql script, which returns SQL performance statistics by mining the Statspack repository. You can download it <here>.
The script provides the same SQLs with the same performance statistics as in Statspack’s various “SQL ordered by” sections. However, it does so by reporting the figures of all categories in a single line, and, more importantly, does it over a time range, considering each available consecutive pair of snapshots. Thus, it provides a way to quickly identify SQLs contributing the most to a particular “SQL ordered by” category over a time period. Alternatively, we can also check for specific SQL(s) behavior over time.
Its core logic is based on Oracle’s $ORACLE_HOME/rdbms/admin/sprepins.sql (Statspack Report Instance) script, but it doesn’t rely on its existence to run. As with statspack_load_trends.sql, it queries the Statspack repository directly. It doesn’t create or use any (temporary) objects, not even the global temporary table that sprepins.sql uses when producing the report. We can also use it to analyze a repository imported from another DB, and it handles periods spanning instance restart(s).
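Under the hood, the key trick is that Statspack stores cumulative counters per snapshot, so per-interval figures come from subtracting the previous snapshot’s value for the same instance — which sprepins.sql (and this script) does with SQL analytic functions. A simplified stdlib-Python analogue of that LAG-style delta logic (made-up numbers; this sketch also ignores counter resets across instance restarts, which the real script handles):

```python
# Cumulative buffer_gets per (instance, snap_id) for one SQL hash value
# -- invented figures for illustration.
snaps = [
    # (instance, snap_id, cumulative_buffer_gets)
    (1, 19195, 100_000),
    (1, 19196, 361_329),
    (1, 19197, 873_071),
    (2, 19210, 50_000),
    (2, 19211, 90_000),
]

# Per-interval deltas: subtract the previous snapshot's cumulative value
# within the same instance (the SQL LAG() equivalent).
deltas = []
prev = {}
for inst, snap_id, gets in sorted(snaps):
    if inst in prev:
        prev_snap, prev_gets = prev[inst]
        deltas.append((inst, prev_snap, snap_id, gets - prev_gets))
    prev[inst] = (snap_id, gets)

for d in deltas:
    print(d)   # e.g. (1, 19195, 19196, 261329)
```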
Important note in case you want to compare the results with Statspack reports

Testing the script by comparing its output to regular Statspack reports (created by running $ORACLE_HOME/rdbms/admin/spreport.sql or sprepins.sql), I noticed that sometimes the numbers in the “SQL ordered by” sections didn’t match between the two. Examples include SQLs reported by my script but not by the Statspack reports, or Statspack reports listing the same SQL (same hash value) multiple times in the same “SQL ordered by” section.
The root cause of those anomalies is described in the MOS note “Statspack Reports Show Different Results In “SQL ordered by …” When Taken At Different Times with Same Snapid (Doc ID 2258762.1)”: “When more than one statspack reports are taken repeatedly in the same session by non-perfstat user, some data might get mixed up in the temporary table used by the reporting script, and the result may get corrupted.”
In my case, the problem was not connecting as the owner of the Statspack repository (usually PERFSTAT) when generating multiple consecutive reports in a loop over a snapshot range. The same was true when creating a single Statspack report.
The takeaway is to always connect as the Statspack repository owner when running spreport.sql, especially if you use any helper scripts which generate Statspack reports for a series of snapshots.
Let’s see the script in action, analyzing a sample Swingbench run on a two-node RAC database. The output is wide, so I suggest spooling it to a file for easier viewing/plotting:
SQL> spool top_sqls.txt
SQL> @statspack_top_sqls.sql

List SQL by [elapsed_time | cpu_time | buffer_gets | disk_reads | executions | parse_calls | max_sharable_mem | max_version_count | cluster_wait_time]:
Enter a value - default "elapsed_time" :
First, we specify the category by which we want the SQLs ordered. We can choose one of the above-listed possibilities, which are the same categories the Statspack report’s “SQL ordered by” sections display. The script reports the same SQLs in the same order as they appear in the selected Statspack report category.
Suppose we want to order SQLs by “cpu_time”, and the corresponding Statspack report lists 10 SQLs in the “SQL ordered by CPU” section. The script lists the same ones. However, the added benefit of the script is that it reports values the Statspack report doesn’t display. For example, the “SQL ordered by CPU” section of a Statspack report doesn’t display the “Physical Reads” statistic; the “SQL ordered by Elapsed time” section lists it instead. If a SQL doesn’t qualify for the “SQL ordered by Elapsed time” section, we won’t get those values from the Statspack report at all.
Next, we provide the DBID and instance number we want to be analyzed. If we don’t provide an instance number, the script considers all which are present in the repository:
Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id    Inst Num DB Name      Instance     Host
----------- -------- ------------ ------------ --------------------------------
 1558102526        1 ORCL         orcl1        ol7-122-rac1.localdomain
 1558102526        2 ORCL         orcl2        ol7-122-rac2.localdomain

Enter DBID to analyze - default "1558102526" :
Enter instance number or "all" to analyze all instancs for DBID = 1558102526 - default "all" :
Finally, we specify the time range we’d like to analyze:
Enter begin time for report [DD-MON-YYYY HH24:MI] - default "30-APR-2020 10:54" : 15-FEB-2020 12:30
Enter end time for report [DD-MON-YYYY HH24:MI] - default "30-APR-2020 22:54" : 15-FEB-2020 13:00

Script output
Let’s check what the script output looks like. Because of the output width, I’ve broken it up into several sections. The whole spool file, along with the respective Statspack reports for comparison, can be obtained here.
Since the original spool file contains 280 lines, I’m showing an abbreviated version of the first 8 columns. The full output lists SQLs between each pair of snapshots. Because we didn’t specify an instance number, both instances are considered. Note that it’s normal and expected that consecutive snapshots don’t have consecutive snap IDs. Snapshots 4 and 21 are two consecutive snapshots on instance 1, as we can confirm by checking their snap times. The other columns are self-explanatory:
INSTANCE_NUMBER  B_SNAP_ID  E_SNAP_ID B_SNAP_TIME        E_SNAP_TIME        INTERVAL_MIN  DBTIMEMIN        AAS
--------------- ---------- ---------- ------------------ ------------------ ------------ ---------- ----------
              1          1          2 15-FEB-20 12:30:00 15-FEB-20 12:40:00           10       2.30       0.23
<removed 16 lines listing SQLs>
                         2          3 15-FEB-20 12:40:00 15-FEB-20 12:50:00           10       1.60       0.16
<removed 25 lines listing SQLs>
                         3          4 15-FEB-20 12:50:00 15-FEB-20 13:00:00           10       1.55       0.15
<etc>
                         4         21 15-FEB-20 13:00:00 15-FEB-20 13:10:00           10       1.66       0.17
                        21         22 15-FEB-20 13:10:00 15-FEB-20 13:20:00           10       1.30       0.13
                        22         23 15-FEB-20 13:20:00 15-FEB-20 13:30:00           10       1.18       0.12
              2         11         12 15-FEB-20 12:30:00 15-FEB-20 12:40:00           10       3.81       0.38
                        12         13 15-FEB-20 12:40:00 15-FEB-20 12:50:00           10       2.70       0.27
                        13         14 15-FEB-20 12:50:00 15-FEB-20 13:00:00           10       2.50       0.25
                        14         15 15-FEB-20 13:00:00 15-FEB-20 13:10:00           10       2.94       0.29
                        15         16 15-FEB-20 13:10:00 15-FEB-20 13:20:00           10       2.18       0.22
                        16         17 15-FEB-20 13:20:00 15-FEB-20 13:30:00           10       1.98       0.20
Let’s check an excerpt of the output for snapshots 1-2 and 2-3. Apart from the “HV” column (SQL old hash value), the columns are self-explanatory. For brevity, I’m showing only the first 10 SQLs per snapshot pair.
B_SNAP_ID  E_SNAP_ID         HV ELAPSED_TIME_SEC EXECUTIONS ELAPSED_PER_EXEC_SEC PERCENT_OF_DBTIME_USED CPU_TIME_SEC CPU_TIME_MS_PER_EXEC
--------- ---------- ---------- ---------------- ---------- -------------------- ---------------------- ------------ --------------------
        1          2 3565022785            80,55        483                  ,17                  58,47        14,63                 30,3
                     2319948924             55,8       5931                  ,01                  40,51         7,69                  1,3
                     1852190137            14,22       1024                  ,01                  10,32         7,75                 7,57
                     1113394757             8,17      12332                    0                   5,93         2,97                  ,24
                     4194254847              6,4        483                  ,01                   4,64          ,84                 1,73
                     1283549268             4,55        169                  ,03                    3,3          ,89                 5,28
                     2588369535             4,21         24                  ,18                   3,06         1,12                46,55
                     4212635381             4,18         24                  ,17                   3,04         1,09                45,39
                     4219272024             3,97       1396                    0                   2,88          ,86                  ,62
                     2835506982             3,74        173                  ,02                   2,71          ,57                 3,32
(..)
        2          3 3565022785            46,93        956                  ,05                  48,79        18,87                19,73
                     2319948924            22,85      11550                    0                  23,75         7,52                  ,65
                     1852190137            15,35       2158                  ,01                  15,95        11,98                 5,55
                     1283549268             6,36        380                  ,02                   6,61         1,65                 4,33
                     2835506982                6        377                  ,02                   6,24         1,03                 2,72
                     1822227481             5,32       7742                    0                   5,53         1,26                  ,16
                     4194254847             4,69        957                    0                   4,87         1,22                 1,28
                     3463613875             4,61        380                  ,01                   4,79          ,62                 1,62
                     1113394757             4,07      25794                    0                   4,23         3,28                  ,13
                     4219272024             3,89       2945                    0                   4,04         1,43                  ,49
(..)
B_SNAP_ID  E_SNAP_ID         HV PHYSICAL_READS PHYSICAL_READS_PER_EXECUTION BUFFER_GETS GETS_PER_EXECUTION ROWS_PROCESSED ROWS_PROCESSED_PER_EXECUTION PARSE_CALLS
--------- ---------- ---------- -------------- ---------------------------- ----------- ------------------ -------------- ---------------------------- -----------
        1          2 3565022785           5860                        12,13      261329             541,05            483                            1         483
                     2319948924           4614                          ,78      205925              34,72          26467                         4,46           1
                     1852190137            394                          ,38      131100             128,03           1024                            1        1025
                     1113394757            336                          ,03      124291              10,08         172648                           14           1
                     4194254847            396                          ,82        7760              16,07            483                            1           1
                     1283549268            262                         1,55        6188              36,62            169                            1         169
                     2588369535             76                         3,17       13104                546             24                            1          24
                     4212635381             76                         3,17       13104                546            737                        30,71           1
                     4219272024            167                          ,12       19979              14,31           1396                            1           1
                     2835506982            255                         1,47        3579              20,69            173                            1         173
        2          3 3565022785           1138                         1,19      511742             535,29            956                            1         957
                     2319948924            487                          ,04      402425              34,84          51879                         4,49           0
                     1852190137            164                          ,08      274493              127,2           2158                            1        2157
                     1283549268            303                           ,8       13726              36,12            380                            1         380
                     2835506982            448                         1,19        7218              19,15            377                            1         377
                     1822227481            259                          ,03       23226                  3           7742                            1           0
                     4194254847            150                          ,16       14371              15,02            957                            1           0
                     3463613875            298                          ,78        5844              15,38            380                            1           0
                     1113394757              0                            0      260084              10,08         361116                           14           0
                     4219272024             76                          ,03       42277              14,36           2945                            1           0
B_SNAP_ID  E_SNAP_ID         HV MAX_SHARABLE_MEM_KB LAST_SHARABLE_MEM_KB MAX_VERSION_COUNT LAST_VERSION_COUNT DELTA_VERSION_COUNT CLUSTER_WAIT_TIME_SEC
--------- ---------- ---------- ------------------- -------------------- ----------------- ------------------ ------------------- ---------------------
        1          2 3565022785               55,39                55,39                 1                  1                   1                 13,67
                     2319948924               47,44                47,44                 1                  1                   1                  9,38
                     1852190137               55,38                55,38                 1                  1                   1                  2,65
                     1113394757               43,41                43,41                 1                  1                   1                  2,17
                     4194254847               47,64                47,64                 1                  1                   1                  1,26
                     1283549268               59,46                59,46                 1                  1                   1                   ,83
                     2588369535               55,38                55,38                 1                  1                   1                  2,21
                     4212635381               35,51                35,51                 1                  1                   1                  2,21
                     4219272024               27,42                27,42                 1                  1                   1                  1,15
                     2835506982               55,38                55,38                 1                  1                   1                   ,39
        2          3 3565022785              113,27               113,27                 2                  2                   1                 15,91
                     2319948924               47,44                47,44                 1                  1                   0                     9
                     1852190137              113,25               113,25                 2                  2                   1                  1,32
                     1283549268              121,42               121,42                 2                  2                   1                  1,42
                     2835506982              113,27               113,27                 2                  2                   1                   ,72
                     1822227481                31,5                 31,5                 1                  1                   0                   1,5
                     4194254847               47,64                47,64                 1                  1                   0                  1,76
                     3463613875               59,63                59,63                 1                  1                   0                   ,81
                     1113394757               43,41                43,41                 1                  1                   0                   ,41
                     4219272024               27,42                27,42                 1                  1                   0                  1,77
B_SNAP_ID  E_SNAP_ID         HV CWT_PERCENT_OF_ELAPSED_TIME AVG_HARD_PARSE_TIME_MS MODULE                      SQL_TEXT
--------- ---------- ---------- --------------------------- ---------------------- --------------------------- ----------------------------------------------------------------
        1          2 3565022785                       16,97                        JDBC Thin Client            BEGIN :1 := orderentry.neworder(:2 ,:3 ,:4 ); END;
                     2319948924                        16,8                        New Order                   SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     1852190137                       18,61                                                    BEGIN :1 := orderentry.browseproducts(:2 ,:3 ,:4 ); END;
                     1113394757                       26,55                        Browse Products             SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     4194254847                       19,72                        New Order                   INSERT INTO ORDERS ( ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_
                     1283549268                       18,16                                                    BEGIN :1 := orderentry.newcustomer(:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:
                     2588369535                       52,52                                                    BEGIN :1 := orderentry.SalesRepsQuery(:2 ,:3 ,:4 ); END;
                     4212635381                        52,9                        Sales Rep Query             SELECT TT.ORDER_TOTAL, TT.SALES_REP_ID, TT.ORDER_DATE, CUSTOMERS
                     4219272024                       28,85                        New Order                   INSERT INTO ORDER_ITEMS ( ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UN
                     2835506982                       10,53                                                    BEGIN :1 := orderentry.browseandupdateorders(:2 ,:3 ,:4 ); END;
        2          3 3565022785                       33,89                        JDBC Thin Client            BEGIN :1 := orderentry.neworder(:2 ,:3 ,:4 ); END;
                     2319948924                       39,39                        New Order                   SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     1852190137                        8,63                                                    BEGIN :1 := orderentry.browseproducts(:2 ,:3 ,:4 ); END;
                     1283549268                       22,26                                                    BEGIN :1 := orderentry.newcustomer(:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:
                     2835506982                       12,06                                                    BEGIN :1 := orderentry.browseandupdateorders(:2 ,:3 ,:4 ); END;
                     1822227481                       28,14                        New Order                   SELECT CUSTOMER_ID, CUST_FIRST_NAME, CUST_LAST_NAME, NLS_LANGUAG
                     4194254847                       37,54                        New Order                   INSERT INTO ORDERS ( ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_
                     3463613875                       17,49                        Update Customer Details     INSERT INTO CUSTOMERS ( CUSTOMER_ID , CUST_FIRST_NAME , CUST_LAS
                     1113394757                        9,98                        Browse Products             SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     4219272024                       45,53                        New Order                   INSERT INTO ORDER_ITEMS ( ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UN
Final note
Oracle’s sprepins.sql script has a /*+ first_rows */ hint in the inline view containing the analytic functions used to compute the current and previous row values from the stats$sql_summary table. The hint is present, but not enabled, in statspack_top_sqls.sql. If for some reason you need to re-enable it, just search for it in the script and enable it (or use its alternative, first_rows(0)).
How to Manage the ORA-02292 Error Through the SQLCODE Function
ORA-02292 is a well-known error caused by existing foreign key constraints. It’s raised when a record is deleted from a parent table while its child table still contains records linking to the parent.
02292, 00000,"integrity constraint (%s.%s) violated - child record found"
// *Cause: attempted to delete a parent key value that had a foreign
// key dependency.
// *Action: delete dependencies first then parent or disable constraint.
When you remove the child row(s), the issue is fixed and you can successfully remove the parent row.
One of our clients was experiencing this error when running a procedure through a scheduled job. Intermittently, the first run of the scheduled job failed with ORA-02292: “Integrity constraint (%s.%s) violated – child record found.” However, subsequent executions of the job succeeded.
There are two ways to fix this issue. The first is to modify the procedure code to remove the child row(s) before removing the parent row. The second is to prevent alerts from being written to the alert log when the job fails for the first time. You can do this with the SQLCODE function.
The SQLCODE function returns the code of the most recent exception, and based on that code, the procedure can handle the error and avoid unnecessary alerts. If the same job fails twice or more in a row, the error is recorded in the alert log and an alert is generated (through the monitoring tool).
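The suppress-the-first-failure logic is easier to see stripped of the PL/SQL scaffolding. Here’s a minimal sketch of the same pattern in Python (an illustration only; the class name and threshold are mine, and the PL/SQL procedure shown in the steps that follow is the actual implementation):

```python
# Sketch of the alert-suppression pattern: swallow the first consecutive
# failure, escalate from the second one on, reset on success.
class AlertSuppressor:
    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.consecutive_failures = 0   # plays the role of the tcount table

    def record_success(self) -> None:
        self.consecutive_failures = 0   # like "truncate table tcount"

    def record_failure(self) -> bool:
        """Return True when an alert should be raised."""
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold

s = AlertSuppressor()
print(s.record_failure())   # False -- first failure is swallowed
print(s.record_failure())   # True  -- second consecutive failure alerts
s.record_success()
print(s.record_failure())   # False -- counter was reset
```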
The code below shows you how to handle the ORA-02292 error.
1. Create parent and child tables with data:
--Parent table
create table orders (order_id int, total_sales int);
CREATE UNIQUE INDEX orders_pk on orders (order_id);
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id) ENABLE;

--Child table
CREATE TABLE products (
  product_id  numeric(10) not null,
  supplier_id numeric(10) not null,
  order_id    int,
  CONSTRAINT fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

insert into orders values (1,1);
insert into products values (1,1,1);
commit;

select * from orders;
select * from products;
2. Create a table to capture how many times ORA-02292 errors occurred:
CREATE TABLE tcount (x int);
3. Add the code in the procedure which deletes rows from the parent table, and also ignores ORA-02292 for the first time:
CREATE OR REPLACE PROCEDURE delete_order (order_id_in IN NUMBER)
IS
  vtest2292 int := 0;
BEGIN
  delete from orders where order_id = order_id_in;
  commit;
  -- if the delete instruction runs fine, the tcount table is truncated
  EXECUTE IMMEDIATE ('truncate table tcount');
EXCEPTION
  WHEN OTHERS THEN
    DECLARE
      error_code NUMBER := SQLCODE;
    BEGIN
      IF error_code = -2292 THEN
        DBMS_OUTPUT.PUT_LINE('ERROR 2292!!!!!!!!!!!!!');
        -- error found
        insert into tcount values (1);
        commit;
        select count(1) into vtest2292 from tcount;
        -- two or more consecutive errors found
        IF vtest2292 >= 2 THEN
          DBMS_OUTPUT.PUT_LINE('ERROR 2292 >= X2!!!!!!!!!!!!!!');
          raise_application_error (-20001,'Attempted to delete a parent key value that had a foreign key dependency');
        END IF;
      ELSE
        raise_application_error (-20002,'An ERROR has occurred deleting an order.');
      END IF;
    END;
END;
/
4. Create a scheduled job that runs the procedure. If the error occurs twice or more consecutively, an alert appears in the alert log:
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'job_delete_order',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin delete_order(1); end;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=HOURLY;BYMINUTE=0;INTERVAL=1',
    enabled         => TRUE);
END;
/
5. By running the job job_delete_order, you can confirm that no errors are written to the Oracle alert log after the first execution:
set serveroutput on
exec delete_order(1);
select count(1) from tcount;
exec dbms_scheduler.run_job('job_delete_order',FALSE);
select count(1) from tcount;
The Brand New Exadata X8M Deployment Process Revealed
Here, we’ll see how the deployment process for the new Exadata X8M works.
RoCE issues from the factory
Exadata X8M servers come from the factory with the RoCE private network disabled. If the field engineer assigned to work on the physical setup of the Exadata didn’t enable the RoCE network, it’s your job to do so.
The RoCE network must be enabled on all compute nodes and on all storage servers.
In Exadata X8M, the private network no longer runs on InfiniBand switches, but on RoCE (RDMA over Converged Ethernet) fabric switches. The interfaces we see at the operating system level are re0 and re1.
When checking the active network interfaces, we can’t see re0 and re1:
[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54309  bytes 3744342 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279171  bytes 18019054 (17.1 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9553  bytes 1693920 (1.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff

eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 31847  bytes 2396622 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136405  bytes 6139347 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136405  bytes 6139347 (5.8 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Most of the InfiniBand-related commands/tools no longer work, but ibstat still does, so we can use that tool to check the state of the private network:
[root@ex03db01 ~]# ibstat | grep -i 'state\|rate' State: Down Physical state: Disabled Rate: 100 State: Down Physical state: Disabled Rate: 100
Checking the config of RoCE interface cards:
[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re0 #### DO NOT REMOVE THESE LINES #### #### %GENERATED BY CELL% #### DEVICE=re0 BOOTPROTO=none ONBOOT=no HOTPLUG=no IPV6INIT=no [root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re1 #### DO NOT REMOVE THESE LINES #### #### %GENERATED BY CELL% #### DEVICE=re1 BOOTPROTO=none ONBOOT=no HOTPLUG=no IPV6INIT=no
Bringing RoCE interface cards up:
[root@ex03db01 ~]# ifup re0 /sbin/ifup-local: /sbin/ifup-local re0: /sbin/ifup-local: + RoCE configuration... /sbin/ifup-local: + Matched (wildcard) interface re0. /sbin/ifup-local: + RoCE Configuration: /bin/roce_config -i re0... NETDEV=re0; IBDEV=mlx5_0; PORT=1 + RoCE v2 is set as default rdma_cm preference + Tos mapping is set + Default roce tos is set to 32 + Trust mode is set to dscp + PFC is configured as 0,1,1,1,1,1,0,0 + Congestion control algo/mask are set as expected + Buffers are configured as 32768,229120,0,0,0,0,0,0 Finished configuring "re0" ヽ(•‿•)ノ /sbin/ifup-local: + Non-RoCE Configuration... /sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re0. [root@ex03db01 ~]# ifup re1 /sbin/ifup-local: /sbin/ifup-local re1: /sbin/ifup-local: + RoCE configuration... /sbin/ifup-local: + Matched (wildcard) interface re1. /sbin/ifup-local: + RoCE Configuration: /bin/roce_config -i re1... NETDEV=re1; IBDEV=mlx5_0; PORT=2 + RoCE v2 is set as default rdma_cm preference + Tos mapping is set + Default roce tos is set to 32 + Trust mode is set to dscp + PFC is configured as 0,1,1,1,1,1,0,0 + Congestion control algo/mask are set as expected + Buffers are configured as 32768,229120,0,0,0,0,0,0 Finished configuring "re1" ヽ(•‿•)ノ /sbin/ifup-local: + Non-RoCE Configuration... /sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re1.
Now we can see that the interfaces re0 and re1 are up, but with no IPs assigned:
[root@ex03db01 ~]# ifconfig bondeth0: flags=5187<up,broadcast,running,master,multicast> mtu 1500 inet 10.201.80.54 netmask 255.255.254.0 broadcast 10.201.81.255 ether bc:97:e1:68:b2:10 txqueuelen 1000 (Ethernet) RX packets 54533 bytes 3767354 (3.5 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 14414 bytes 1349944 (1.2 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth0: flags=4163<up,broadcast,running,multicast> mtu 1500 inet 10.201.84.190 netmask 255.255.254.0 broadcast 10.201.85.255 ether 00:10:e0:ee:c5:6c txqueuelen 1000 (Ethernet) RX packets 279584 bytes 18051211 (17.2 MiB) RX errors 0 dropped 1 overruns 0 frame 0 TX packets 9727 bytes 1720009 (1.6 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device memory 0x9ca00000-9cafffff eth3: flags=6211<up,broadcast,running,slave,multicast> mtu 1500 ether bc:97:e1:68:b2:10 txqueuelen 1000 (Ethernet) RX packets 32071 bytes 2419634 (2.3 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 14414 bytes 1349944 (1.2 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth4: flags=6211<up,broadcast,running,slave,multicast> mtu 1500 ether bc:97:e1:68:b2:10 txqueuelen 1000 (Ethernet) RX packets 22492 bytes 1349520 (1.2 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 2 bytes 104 (104.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73<up,loopback,running> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10<host> loop txqueuelen 1000 (Local Loopback) RX packets 136804 bytes 6157123 (5.8 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 136804 bytes 6157123 (5.8 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 re0: flags=4163<up,broadcast,running,multicast> mtu 1500 ether 0c:42:a1:3b:45:12 txqueuelen 1000 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 re1: flags=4163<up,broadcast,running,multicast> mtu 1500 ether 0c:42:a1:3b:45:13 txqueuelen 1000 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
We can use ibstat again to confirm the interfaces are enabled:
[root@ex03db01 ~]# ibstat | grep -i 'state\|rate' State: Active Physical state: LinkUp Rate: 100 State: Active Physical state: LinkUp Rate: 100
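When enabling the fabric on several nodes, eyeballing the ibstat output gets tedious. Here is a minimal, hedged sketch of a check that flags any port not reporting Active; it only assumes the ibstat output format shown above, and the function name is my own:

```shell
# Sketch: read ibstat-style text on stdin and fail if any RoCE port is down.
# Assumes lines of the form "State: Active" / "State: Down" as shown above.
check_roce_ports() {
  local down
  down=$(grep -c 'State: Down' || true)   # count ports reporting Down
  if [ "$down" -gt 0 ]; then
    echo "WARN: $down RoCE port(s) not active"
    return 1
  fi
  echo "All RoCE ports active"
}

# Usage on a node: ibstat | check_roce_ports
```

Run it on each compute node and storage server after the ifup; a non-zero exit means at least one port still needs attention.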
To start any Exadata deployment you need the OEDA configuration files. These are a set of files generated by OEDA (Oracle Exadata Deployment Assistant), currently a web-based tool in which the client fills in all the IP addresses and hostnames the new Exadata will be assigned. This step is normally completed by the client with the support of their network team.
Configuration files needed:
- Clientname-clustername.xml
- Clientname-clustername-InstallationTemplate.html
- Clientname-clustername-preconf.csv
The OEDA tool for Linux is also needed; it can be downloaded as Patch ID 30640393. It is recommended to use the latest version available, but if the configuration files were generated with an older version, use that same version to avoid warnings during the onecommand execution.
Stage the OEDA for Linux in /u01/onecommand/ and unzip it:
[root@ex03db01 ~]# mkdir -p /u01/onecommand/ [root@ex03db01 ~]# unzip -q p30640393_193800_Linux-x86-64.zip -d /u01/onecommand/ [root@ex03db01 ~]# cd /u01/onecommand/linux-x64
Once in the correct directory, run onecommand with -l to list the steps and make sure it is working:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -l Initializing 1. Validate Configuration File 2. Setup Required Files 3. Create Users 4. Setup Cell Connectivity 5. Verify Infiniband 6. Calibrate Cells 7. Create Cell Disks 8. Create Grid Disks 9. Install Cluster Software 10. Initialize Cluster Software 11. Install Database Software 12. Relink Database with RDS 13. Create ASM Diskgroups 14. Create Databases 15. Apply Security Fixes 16. Install Autonomous Health Framework 17. Create Installation Summary 18. Resecure Machine
applyElasticConfig.sh is a script, provided by Oracle within the OEDA, that performs the initial setup of the compute nodes and storage servers: the network configuration, IP addresses, hostnames, DNS, and NTP settings. By default the script works with the factory IP range and hostnames, looking for nodes in the 172.x.x.x network, so if the client has already changed the IPs and hostnames the script will not find anything. We found a way to trick it and make it work even in that situation. It is worth mentioning that there is no documentation about this anywhere on docs.oracle.com. The closest references are:
- Configuring Oracle Exadata Database Machine
- ApplyElasticConfig failed during the execution of elasticConfig.sh (Doc ID 2175587.1)
- Bug 23064772 OEDA: applyelasticconfig.sh fails with error unable to locate rack item with ulocation
Even though these documents briefly mention the applyElasticConfig.sh script, they do not explain how to overcome the issue when the IPs and hostnames have already been changed.
Preparation

To make the script find the servers after their hostnames and IPs have been changed, edit the es.properties file located under /u01/onecommand/linux-x64/properties. Change only the parameters related to the IPs, subnets, and hostnames. The variables we care about are ROCEELASTICNODEIPRANGE, ROCEELASTICILOMIPRANGE, ELASTICSUBNETS, and SKIPHOSTNAMECHECK. Set them to the ranges found in the Clientname-clustername-InstallationTemplate.html for each network:
- ROCEELASTICNODEIPRANGE expects the range of IPs in the management network.
- ROCEELASTICILOMIPRANGE expects the range of IPs of the ILOM of the servers.
- ELASTICSUBNETS expects the subnet of the management network.
- SKIPHOSTNAMECHECK defaults to false, so if the hostnames were also changed you want to set this to true.
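These edits can also be scripted instead of made by hand. The following is a sketch under the assumptions of this post: the function name is mine, and the example values are the ones from this environment, not defaults you should reuse.

```shell
# Sketch: patch the relevant ELASTIC* / SKIPHOSTNAMECHECK values in place,
# keeping a .bak copy of the original file.
patch_es_properties() {
  local props="$1" iprange="$2" subnets="$3"
  sed -i.bak \
    -e "s|^ROCEELASTICNODEIPRANGE=.*|ROCEELASTICNODEIPRANGE=${iprange}|" \
    -e "s|^ELASTICSUBNETS=.*|ELASTICSUBNETS=${subnets}|" \
    -e "s|^SKIPHOSTNAMECHECK=.*|SKIPHOSTNAMECHECK=true|" \
    "$props"
}

# Example with this environment's values:
# patch_es_properties properties/es.properties \
#   10.201.84.190:10.201.84.206 10.201.84
```

ROCEELASTICILOMIPRANGE can be handled the same way with one more -e expression if your ILOM range also changed.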
Find some examples below:
[root@ex03db01 linux-x64]# cat properties/es.properties|grep ELASTIC #ROCEELASTICNODEIPRANGE=192.168.1.1:192.168.1.99 ROCEELASTICNODEIPRANGE=10.201.84.190:10.201.84.206 ROCEELASTICILOMIPRANGE=10.201.84.196:10.201.84.201 ELASTICCONFIGMARKERFILE=/.elasticConfig ELASTICRACKNAMES=x5,x6,sl6,x7,x8 QINQELASTICCONFIGMINVERION=20.1.0.0.0.200323 #ELASTICSUBNETS=172.16.2:172.16.3:172.16.4:172.16.5:172.16.6:172.16.7 ELASTICSUBNETS=10.201.84 [root@ex03db01 linux-x64]# grep SKIPHOST properties/es.properties #SKIPHOSTNAMECHECK=false SKIPHOSTNAMECHECK=true

Execution
Now that the ELASTIC* parameters in es.properties match your infrastructure, you are ready to execute the applyElasticConfig.sh script. Simply call it, passing the Clientname-clustername.xml configuration file:
[root@ex03db01 linux-x64]# ./applyElasticConfig.sh -cf /root/config/Client-ex03.xml Applying Elastic Config... Discovering pingable nodes in IP Range of 10.201.84.190 - 10.201.84.206..... Found 6 pingable hosts..[10.201.84.193, 10.201.84.194, 10.201.84.195, 10.201.84.190, 10.201.84.191, 10.201.84.192] Validating Hostnames.. Discovering ILOM IP Addresses.. Getting uLocations... Getting Mac Addressess.. Getting uLocations... Mapping Machines with local hostnames.. Mapping Machines with uLocations.. Checking if Marker file exists.. Updating machines with Mac Address for 6 valid machines. Creating preconf.. Writing host-specific preconf files.. Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel02_preconf.csv for ex03cel02 .... Preconf file copied to ex03cel02 as /var/log/exadatatmp/firstconf/ex03cel02_preconf.csv Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db01_preconf.csv for ex03db01 .... Preconf file copied to ex03db01 as /var/log/exadatatmp/firstconf/ex03db01_preconf.csv Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db03_preconf.csv for ex03db03 .... Preconf file copied to ex03db03 as /var/log/exadatatmp/firstconf/ex03db03_preconf.csv Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel03_preconf.csv for ex03cel03 .... Preconf file copied to ex03cel03 as /var/log/exadatatmp/firstconf/ex03cel03_preconf.csv Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel01_preconf.csv for ex03cel01 .... Preconf file copied to ex03cel01 as /var/log/exadatatmp/firstconf/ex03cel01_preconf.csv Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db02_preconf.csv for ex03db02 .... 
Preconf file copied to ex03db02 as /var/log/exadatatmp/firstconf/ex03db02_preconf.csv Running Elastic Configuration on ex03cel02.client.com Running Elastic Configuration on ex03db01.client.com Running Elastic Configuration on ex03db03.client.com Running Elastic Configuration on ex03cel03.client.com Running Elastic Configuration on ex03cel01.client.com Running Elastic Configuration on ex03db02.client.com

OEDA onecommand preparation and execution

Technical background
OEDA is a set of scripts, files, and a form we use to plan and deploy an Exadata. We sometimes refer to it as the onecommand utility, because with just one command, the install.sh script, we can deploy everything.
Preparation

To run the install.sh script we first have to prepare a few things in the environment. The prerequisites are:
- The switches must have been already set up by the Field Engineer responsible for the physical installation of the hardware.
- The applyElasticConfig.sh script must have been run and completed successfully.
- The files listed in the “Appendix B” of the Clientname-clustername-InstallationTemplate.html must be staged to /u01/onecommand/linux-x64/WorkDir.
Once staged, the WorkDir should look like this:
[root@ex03db01 ~]# ls -lh /u01/onecommand/linux-x64/WorkDir total X.9G -rwxr-xr-x 1 root root 355M Jun 9 12:34 ahf_setup -rw-r--r-- 1 root root 2.9G Jun 9 12:54 V982063-01.zip -rw-r--r-- 1 root root 2.7G Jun 9 12:57 V982068-01.zip -rw-r--r-- 1 root root 2.4G Jun 9 12:57 p30805684_190000_Linux-x86-64.zip -rw-r--r-- 1 root root 600M Jun 9 12:57 p6880880_180000_Linux-x86-64.zip -rw-r--r-- 1 root root 1.3G Jun 9 12:57 p30899722_190000_Linux-x86-64.zip
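A quick presence check before starting saves a failed step 1. This is a hedged sketch: the function name is mine, and the file list shown in the comment is this environment's, so substitute the names from your own InstallationTemplate.

```shell
# Sketch: verify every "Appendix B" file exists in the WorkDir before step 1.
check_staged_files() {
  local dir="$1"; shift
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "MISSING: $f"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "All required files staged"
  return "$missing"
}

# Example with this environment's list:
# check_staged_files /u01/onecommand/linux-x64/WorkDir \
#   ahf_setup V982063-01.zip V982068-01.zip \
#   p30805684_190000_Linux-x86-64.zip p6880880_180000_Linux-x86-64.zip \
#   p30899722_190000_Linux-x86-64.zip
```

Step 1 performs the same kind of validation, but failing fast here avoids waiting on its other checks.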
After all of this is done, run step 1 to validate the configuration files against the environment:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 1 Initializing Executing Validate Configuration File Validating cluster: ex03-clu1 Locating machines... Validating platinum... Checking Disk Tests Status.... Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version. Validating nodes for database readiness... Completed validation... SUCCESS: Ip address: 10.201.84.190 is configured correctly SUCCESS: Ip address: 10.201.80.54 is configured correctly SUCCESS: Ip address: 10.201.84.191 is configured correctly SUCCESS: Ip address: 10.201.80.55 is configured correctly SUCCESS: Ip address: 10.201.84.192 is configured correctly SUCCESS: Ip address: 10.201.80.56 is configured correctly SUCCESS: Ip address: 10.201.80.60 is configured correctly SUCCESS: Ip address: 10.201.80.62 is configured correctly SUCCESS: Ip address: 10.201.80.61 is configured correctly SUCCESS: Ip address: 10.201.80.58 is configured correctly SUCCESS: Ip address: 10.201.80.59 is configured correctly SUCCESS: Ip address: 10.201.80.57 is configured correctly SUCCESS: Validated NTP server 10.248.1.1 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip exists... SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip exists... SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip exists... SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip exists... SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip exists... SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/ahf_setup exists... SUCCESS: Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version. 
SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db01 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db02 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db03 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on ex03db01 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on ex03db02 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on ex03db03 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1 SUCCESS: DatabaseHome Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1 SUCCESS: Disk size 14000GB on cell ex03cel01.client.com matches the value specified in the OEDA configuration file SUCCESS: Disk size 14000GB on cell ex03cel02.client.com matches the value specified in the OEDA configuration file SUCCESS: Disk size 14000GB on cell ex03cel03.client.com matches the value specified in the OEDA configuration file SUCCESS: Number of physical disks on ex03cel01.client.com matches the value specified in OEDA configuration file SUCCESS: Number of physical disks on ex03cel02.client.com matches the value specified in OEDA configuration file SUCCESS: Number of physical disks on ex03cel03.client.com matches the value specified in OEDA configuration file Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed = 85395 mS [1.0 minutes] Tue Jun 09 22:51:44 PDT 2020]]
If it finishes successfully you are good to move forward.
Execution

Now we just need to execute the remaining steps. You can run them one by one or several in a row. I normally run steps 1 and 2 separately from the others because they tend to fail more easily. Running the rest in a row does no harm, since the execution stops immediately if any step fails, so how you group them is up to you.
In case you need to undo any of the steps you can use the -u and the step you would like to undo. You can use the install.sh -h to help you on that:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -h Warning: Invalid input(s) for {-h=null} ********************************** install.sh -cf <config.xml> -l [options] install.sh -cf <config.xml> -s <step #=''> | -r <num-num> install.sh ARGUMENTS: -l List all the steps that exist -cf Use to specify the full path for the config file -s <step #=''> Run only the specified step -r <num-num> Run the steps one after the other as long as no errors are encountered -u <num-num> | <step#> Undo a range of steps or a particular step For a range of steps, specify the steps in reverse order -h Print usage information -override Force to run undo steps related to celldisk and grid disk -force Delete binaries under grid home and database home when uninstalling clusterware and database software -delete Delete staging area/directories -nocalibratecell Create the installation summary file without running the calibrate cell command -noinfinicheck Create the installation summary file without running InfiniBand verification -p Prompts for root password for each or all the nodes. This option allows deployments in Exadata environments with non-default and/or different root passwords on each of the nodes in the rack -usesu Use SU with root account to run commands for grid/oracle users -sshkeys Run deployment with root SSH Keys that are setup by setuprootssh.sh or oedacli. Must be used with "-usesu" -customstep Run custom actions. Actions can be: updatecellroute: generate cellroute.ora in domUs -clustername Specify the cluster name, or All. Only used with -customstep to specify the cluster on which to run the custom action -upgradeNetworkFirmware X7 Broadcom network card Firmware upgrade Version : 200519
To undo a step simply execute this one to undo step 2:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2
Or to undo from step 2 to step 4:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2-4
Here is the execution of step 2:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 2 Initializing Executing Setup Required Files Copying and extracting required files... Required files are: /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip /u01/onecommand/linux-x64/WorkDir/V982068-01.zip /u01/onecommand/linux-x64/WorkDir/V982063-01.zip Copying required files... Checking status of remote files... Checking status of existing files on remote nodes... Getting status of local files... Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip at /u01/app/oracle/Oeda/Software/V982063-01.zip Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip at /u01/app/oracle/Oeda/Software/V982068-01.zip Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30805684_190000_Linux-x86-64.zip Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30899722_190000_Linux-x86-64.zip Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/Patches/p6880880_180000_Linux-x86-64.zip Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db02.client.com Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db02.client.com Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db02.client.com Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db03.client.com Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db03.client.com Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db03.client.com Completed copying files... Extracting required files... Copying resourcecontrol and other required files No config Keys in the configuration file.. 
Creating databasemachine.xml for EM discovery Done Creating databasemachine.xml for EM discovery Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 325110 mS [5.0 minutes] Wed Jun 10 12:16:46 CDT 2020]]
Here is the execution of steps from 3 to 8:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 3-8 Initializing Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Create Users Creating users... Creating users in cluster ex03-clu1 Validating existing users and groups... Creating required directories on nodes in cluster ex03-clu1 Updating /etc/hosts on nodes in cluster ex03-clu1 Setting up ssh for users in cluster ex03-clu1 Creating cell diag collection user CELLDIAG on cell servers.. Completed creating all users... Successfully completed execution of step Create Users [elapsed Time [Elapsed = 77818 mS [1.0 minutes] Wed Jun 10 12:20:31 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Setup Cell Connectivity Creating cellip.ora and cellinit.ora ... Creating cellip.ora for cluster ex03-clu1 Creating cellinit.ora for cluster ex03-clu1 Done creating cellip.ora and cellinit.ora... Successfully completed execution of step Setup Cell Connectivity [elapsed Time [Elapsed = 14675 mS [0.0 minutes] Wed Jun 10 12:20:52 CDT 2020]] Executing Verify Infiniband Validating infiniband network with rds-ping... Check Admin network connectivity... Running infinicheck to verify infiniband fabric for cluster ex03-clu1... Running verify topology to verify infiniband network... No Infiniband link errors found... SUCCESS: Verify topology does not report any errors on node ex03db01.client.com... ****************ex03db01***************** Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology Verify topology is not supported on RoCE ******************************************** SUCCESS: Verify topology does not report any errors on node ex03db02.client.com... 
****************ex03db02***************** Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology Verify topology is not supported on RoCE ******************************************** SUCCESS: Verify topology does not report any errors on node ex03db03.client.com... ****************ex03db03***************** Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology Verify topology is not supported on RoCE ******************************************** Successfully completed execution of step Verify Infiniband [elapsed Time [Elapsed = 280227 mS [4.0 minutes] Wed Jun 10 12:25:37 CDT 2020]] Executing Calibrate Cells Calibrating cells... Successfully completed execution of step Calibrate Cells [elapsed Time [Elapsed = 461064 mS [7.0 minutes] Wed Jun 10 12:33:18 CDT 2020]] Executing Create Cell Disks Validating Self-Signed Certificates on cell servers... Fixing Cell Certificates on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com] Reconfiguring WLS... Cell name attribute does not match hostnames Cell ex03cel03 has cell name ru06, cell name attribute will be reset to ex03cel03 Cell ex03cel01 has cell name ru02, cell name attribute will be reset to ex03cel01 Cell ex03cel02 has cell name ru04, cell name attribute will be reset to ex03cel02 Checking physical disks for errors before creating celldisks Creating cell disks... Dropping Flash Cache before enabling WriteBack on cells [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com] Enable FlashCache mode to WriteBack in [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com] Creating flashcache on cells... 
Successfully completed execution of step Create Cell Disks [elapsed Time [Elapsed = 218067 mS [3.0 minutes] Wed Jun 10 12:36:56 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Create Grid Disks Creating grid disks for cluster ex03-clu1 Checking Cell Disk status... Successfully completed execution of step Create Grid Disks [elapsed Time [Elapsed = 123858 mS [2.0 minutes] Wed Jun 10 12:39:04 CDT 2020]] [root@ex03db01 linux-x64]#
Here is the execution of steps from 9 to 16:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 9-16 Initializing Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Install Cluster Software Installing cluster ex03-clu1 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Writing grid response file for cluster ex03-clu1 Running clusterware installer... Setting up Opatch for cluster ex03-clu1 Patching cluster ex03-clu1... Successfully completed execution of step Install Cluster Software [elapsed Time [Elapsed = 667497 mS [11.0 minutes] Wed Jun 10 12:51:15 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Initialize Cluster Software Initializing cluster ex03-clu1 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Writing grid response file for cluster ex03-clu1 Running root.sh on node ex03db01.client.com Checking file root_ex03db01.client.com_2020-06-10_12-54-03-631071286.log on node ex03db01.client.com Running root.sh on node ex03db02.client.com Checking file root_ex03db02.client.com_2020-06-10_13-02-42-916817198.log on node ex03db02.client.com Running root.sh on node ex03db03.client.com Checking file root_ex03db03.client.com_2020-06-10_13-05-42-659221162.log on node ex03db03.client.com Generating response file for Configuration Tools... Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Writing grid response file for cluster ex03-clu1 Running Configuration Assistants on ex03db01.client.com Checking status of cluster... 
Cluster Verification completed successfully Successfully completed execution of step Initialize Cluster Software [elapsed Time [Elapsed = 1184567 mS [19.0 minutes] Wed Jun 10 13:11:06 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Install Database Software Installing database software ... Validating nodes for database readiness... Installing database software with database home name DbHome1 Installing database software ... Extracting Database Software file /u01/app/oracle/Oeda/Software/V982063-01.zip into /u01/app/oracle/product/19.0.0.0/dbhome_1 Running database installer on node ex03db01.client.com ... Please wait... After running database installer... Patching Database Home /u01/app/oracle/product/19.0.0.0/dbhome_1 Successfully completed execution of step Install Database Software [elapsed Time [Elapsed = 717961 mS [11.0 minutes] Wed Jun 10 13:23:11 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Relink Database with RDS Successfully completed execution of step Relink Database with RDS [elapsed Time [Elapsed = 36009 mS [0.0 minutes] Wed Jun 10 13:23:54 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Create ASM Diskgroups Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Validating ASM Diskgroups.. 
Successfully completed execution of step Create ASM Diskgroups [elapsed Time [Elapsed = 138147 mS [2.0 minutes] Wed Jun 10 13:26:20 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Create Databases Setting up Huge Pages for Database..[test] Creating database [test]... Patch 30805684 requires specific post-installation steps. Databases will be restarted ... Running datapatch on database [test] Recompiling Invalid Objects (if any) on database [test] Successfully completed execution of step Create Databases [elapsed Time [Elapsed = 1252604 mS [20.0 minutes] Wed Jun 10 13:47:19 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Apply Security Fixes Setting up Huge Pages for ASM Instance.. Bouncing clusterware to set required parameters... Checking and enabling turbo mode if required... 
ex03db03.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db03.client.com ex03db02.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db02.client.com ex03db01.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db01.client.com Copying over /root/config/client-ex03.xml to all nodes under /etc/exadata/config Successfully completed execution of step Apply Security Fixes [elapsed Time [Elapsed = 436720 mS [7.0 minutes] Wed Jun 10 13:54:43 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Install Autonomous Health Framework Copying over AHF to all nodes in the Cluster..[ex03db01, ex03db02, ex03db03] Configuring Autonomous Health Framework(AHF) on all computes nodes.. AHF has been installed on all compute nodes at: /opt/oracle.ahf . EXAchk can be run by invoking ./exachk Generating an EXAchk report... EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_13567.zip Generating the EXAchk Infrastructure Report... EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_141143_infrastructure.zip Successfully completed execution of step Install Autonomous Health Framework [elapsed Time [Elapsed = 2234216 mS [37.0 minutes] Wed Jun 10 14:32:04 CDT 2020]] [root@ex03db01 linux-x64]#
Here is the execution of step 17:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 17 Initializing Executing Create Installation Summary Getting system details... Generating Installation Summary report: /u01/onecommand2/linux-x64/ExadataConfigurations/client-Development-InstallationReport.xml... Creating Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html... Created Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html All deployment reports are stored in /u01/onecommand2/linux-x64/ExadataConfigurations/client-AK00625423-deploymentfiles.zip Generating Platinum CSV file and copying it over to /opt/oracle.SupportTools on all compute nodes Writing platinum file : /u01/onecommand2/linux-x64/WorkDir/client_null-platinum.csv Successfully completed execution of step Create Installation Summary [elapsed Time [Elapsed = 53311 mS [0.0 minutes] Wed Jun 10 14:36:07 CDT 2020]]
Just a rac-status.sh run to check how the cluster was setup (learn more about rac-status.sh here):
[root@ex03db01 ~]# ./pythian/rac-status.sh -a

Cluster ex03-clu1 is a X8M-2 Elastic Rack HC 14TB

Type       | Name       | db01     | db02     | db03     |
------------------------------------------------------------------
asm        | asm        | Online   | Online   | Online   |
asmnetwork | asmnet1    | Online   | Online   | Online   |
chad       | chad       | Online   | Online   | Online   |
cvu        | cvu        | Online   | -        | -        |
dg         | DATA       | Online   | Online   | Online   |
dg         | RECO       | Online   | Online   | Online   |
dg         | SPARSE     | Online   | Online   | Online   |
network    | net1       | Online   | Online   | Online   |
ons        | ons        | Online   | Online   | Online   |
proxy_advm | proxy_advm | Offline x| Offline x| Offline x|
qosmserver | qosmserver | Online   | -        | -        |
vip        | db01       | Online   | -        | -        |
vip        | db02       | -        | Online   | -        |
vip        | db03       | -        | -        | Online   |
vip        | scan1      | -        | Online   | -        |
vip        | scan2      | -        | -        | Online   |
vip        | scan3      | Online   | -        | -        |
------------------------------------------------------------------
x : Resource is disabled
  : Has been restarted less than 24 hours ago

Listener        | Port     | db01   | db02   | db03   | Type     |
---------------------------------------------------------------------------------
ASMNET1LSNR_ASM | TCP:1525 | Online | Online | Online | Listener |
LISTENER        | TCP:1521 | Online | Online | Online | Listener |
LISTENER_SCAN1  | TCP:1864 | -      | Online | -      | SCAN     |
LISTENER_SCAN2  | TCP:1864 | -      | -      | Online | SCAN     |
LISTENER_SCAN3  | TCP:1864 | Online | -      | -      | SCAN     |
---------------------------------------------------------------------------------
  : Has been restarted less than 24 hours ago

DB   | Version      | db01 | db02 | db03 | DB Type |
---------------------------------------------------------------------------------
test | 19.0.0.0 (1) | Open | Open | Open | RAC (P) |
---------------------------------------------------------------------------------
ORACLE_HOME references listed in the Version column
1 : /u01/app/oracle/product/19.0.0.0/dbhome_1 oracle oinstall
  : Has been restarted less than 24 hours ago

[root@ex03db01 ~]# ps -ef|grep pmon
root 362094 50259 0 14:40 pts/1 00:00:00 grep --color=auto pmon
oracle 364290 1 0 13:52 ? 00:00:00 asm_pmon_+ASM1
oracle 367756 1 0 13:53 ? 00:00:00 ora_pmon_test1
[root@ex03db01 ~]#
That’s it; the deployment is finished. Now you just need to patch the compute nodes, storage servers, RoCE switches, Grid Infrastructure, and databases up to your target versions.
You might be thinking, “what about step 18?” Well, step 18, “Resecure the machine,” hardens the servers by dropping SSH keys, enforcing password complexity, expiring current passwords, implementing a password expiration time, and so on. Those changes can make administration a bit harder, and you may prefer to implement your own security policies instead, so we normally skip this step. But again, it is up to you.
See you next time, sincerely,
Franky Faust
Oracle Database and Two-Factor Authentication (2FA)
A common question from Oracle customers is whether they can configure the Oracle Database to use:
- Two-Factor Authentication (2FA)
- Multi-Factor Authentication (MFA)
- A “Time-based One-time Password” (TOTP) code, which is usually a six-digit code generated from a hardware or software application.
The short answer is, “yes!” (though not natively). You can implement it by integrating with external authentication services, namely through the RADIUS protocol.
Previous blog posts discussed the setup and testing of the new Oracle 18c+ feature of Oracle “Centrally Managed Users” (CMU). It allows you to manage Oracle database users through Microsoft Active Directory (AD), and effectively offloads user management to AD. Oracle RADIUS authentication isn’t part of CMU. It’s a similar but slightly different implementation, and each has its own benefits and limitations.
However, by leveraging Oracle’s compatibility with the RADIUS protocol and an external directory service, you can achieve true 2FA capabilities for the Oracle database (and on a per-user basis, meaning that application and/or service accounts remain unaffected).
This post describes how to implement Oracle database 2FA using FreeRADIUS. The subsequent post extends the setup to use the commercially available Cisco Duo platform instead.
RADIUS Quick Summary

The RADIUS (Remote Authentication Dial-In User Service) protocol is based on “AAA:” Authentication, Authorization, and Accounting. This post deals mainly with the authentication part. Furthermore, RADIUS can operate in two modes: synchronous mode and challenge-response (asynchronous) mode. Oracle Database is capable of both (and has been since at least Oracle 8i). However, the most practical and applicable form is “synchronous,” and it will therefore be the focus of this post.
RADIUS can authenticate against numerous sources including its own “users” flat file, LDAP directory services, Microsoft Active Directory, and others. RADIUS refers to “clients” and “NASs” (Network Access Servers, which broker the credentials). When authenticating against a RADIUS server, the Oracle Database acts as the “RADIUS client” and “NAS.”
From the Oracle Database “Database Licensing Information User Manual” (part number E94254-18 dated April 2020):
Network encryption (native network encryption, network data integrity, and SSL/TLS) and strong authentication services (Kerberos, PKI, and RADIUS) are no longer part of Oracle Advanced Security and are available in all licensed editions of all supported releases of Oracle Database.
However, to use RADIUS authentication, the Oracle client software must include the Advanced Security option. Therefore, a full client installation is required and the InstantClient is not sufficient. Verify Oracle client software compatibility using the adapters command.
Authentication Process

The process flow is relatively simple, yet important to understand:
- The Oracle client attempts to connect to the database and provides credentials (username and password) along with a 2FA TOTP code.
- Oracle Database receives this information, finds the user in the local catalog, and determines that the authentication type is EXTERNAL.
- Oracle Database then scans the SQLNET.ORA file to determine where to authenticate EXTERNAL users.
- Using the RADIUS connectivity information from the SQLNET.ORA file, the Oracle Database passes the credential details onto the RADIUS server.
- The RADIUS server first authenticates the username/password with a directory service which could be a local file, Active Directory, an LDAP service, etc. This is “Primary Authentication.”
- If validated, the RADIUS server then authenticates the TOTP with the multi-factor authentication service (for example, the Google Authenticator PAM module, Cisco Duo, etc). This is “Secondary Authentication.”
- If also validated, the RADIUS server passes back the “Access-Accept” response to the Oracle Database which then accepts and completes the connection.
Here’s an illustration of the process:
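In the same spirit, the two-stage decision made in steps 5 through 7 can be sketched as a toy shell function. The hard-coded password and TOTP code below are hypothetical stand-ins; in reality, the primary check goes to a directory service and the secondary check to an MFA service:

```shell
# Toy sketch of the RADIUS server's two-stage decision (illustrative only).
authenticate() {
  password=$1
  totp=$2
  # Primary authentication: username/password against the directory service
  [ "$password" = "Passw0rd" ] || { echo "Access-Reject"; return 1; }
  # Secondary authentication: TOTP code against the MFA service
  [ "$totp" = "287082" ] || { echo "Access-Reject"; return 1; }
  # Both passed: the database receives Access-Accept and opens the session
  echo "Access-Accept"
}

authenticate Passw0rd 287082   # prints: Access-Accept
authenticate Passw0rd 000000   # prints: Access-Reject
```

Only when both stages pass does the database complete the connection; a failure at either stage produces the same generic rejection, which is why a wrong password and a wrong TOTP are indistinguishable to the end user.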
Prerequisite Setup

To use 2FA with a TOTP, a “RADIUS server” is required. Some common options include:
- FreeRADIUS: A RADIUS Software suite developed under the GNU General Public License: https://freeradius.org/
- Cisco Duo Security: Commercial access security software: https://duo.com/
- Okta: Cloud-based identity management: https://www.okta.com/
This post focuses on setting up and configuring Oracle Database 2FA through FreeRADIUS. FreeRADIUS is a pretty simple software suite for DBAs to set up for testing and experimentation purposes without worrying about licensing costs or third-party vendors.
The first requirement is to find a location to run FreeRADIUS. This can be a local server or a Virtual Machine (for example, created within VirtualBox or a cloud VM). For examples on how to create a cloud VM in OCI for DBA testing and experimentation, go to https://blog.pythian.com/part-1-creating-an-oracle-18c-centrally-managed-users-testbed-using-oracle-cloud-infrastructure/.
VM requirements are minimal for testing/experimentation purposes. One or two vCPUs with 2 GB of memory should be sufficient.
For a base OS, many versions of Linux are supported including CentOS 7 and Oracle Linux 7.
Oracle Database and RADIUS

For an explanation about how Oracle Database interacts with RADIUS (and specifically FreeRADIUS), see Jared Still’s blog post: https://blog.pythian.com/using-freeradius-to-authorize-oracle-connections/
A key takeaway from that post: FreeRADIUS users need to be in UPPERCASE for Oracle Database compatibility.
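If you script user creation, it can help to enforce that rule up front. A minimal sketch (the helper name is my own, not part of FreeRADIUS):

```shell
# Hypothetical helper: build an "authorize"-file entry with the username
# forced to uppercase so it matches what the Oracle Database sends.
add_radius_user() {
  name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  printf '%s Cleartext-Password := "%s"\n' "$name" "$2"
}

add_radius_user raduser Passw0rd
# prints: RADUSER Cleartext-Password := "Passw0rd"
```

The generated line can then be inserted near the top of the “authorize” file, as shown in the testing steps below.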
Installing FreeRADIUS

This section summarizes the installation steps for FreeRADIUS on CentOS 7 or Oracle Linux 7. Perform all steps as “root” user, unless otherwise indicated:
If you are installing on Oracle Linux 7, add the necessary YUM repo (not required for CentOS 7):
cat << EOF > /etc/yum.repos.d/networkradius.repo
[networkradius]
name=NetworkRADIUS-7
baseurl=http://packages.networkradius.com/centos/7/repo/
gpgcheck=0
EOF
Perform other Linux prerequisites:
sed -i 's/=enforcing/=permissive/g' /etc/selinux/config
setenforce Permissive
yum -y update
Install the core FreeRADIUS package and the utilities package. The latter adds testing tools:
yum -y install freeradius freeradius-utils
Adjust the Linux firewall (if required):
firewall-cmd --permanent --zone=public --add-port=1812/udp
firewall-cmd --permanent --zone=public --add-port=1813/udp
firewall-cmd --reload
Create a service for automatic-start (but don’t start it yet):
systemctl enable radiusd.service

Initial RADIUS Testing
To perform a basic test of the RADIUS functionality, a test user is required. FreeRADIUS is configured through several files, including an “authorize” file. This used to be a “users” file; “users” is now just a symbolic link to “authorize”.
IMPORTANT: Directives in the “authorize” file are processed sequentially. Therefore, when testing, it’s recommended you put the entries near or at the top to ensure they aren’t skipped due to some preceding and/or superseding directive.
Add a test user called “testuser” with a password of “Passw0rd” to the file:
sed -i '1itestuser Cleartext-Password := \"Passw0rd\"' /etc/raddb/mods-config/files/authorize
head -4 /etc/raddb/mods-config/files/authorize
To test, it’s recommended to manually start the FreeRADIUS software (instead of through the Linux service) in debug mode:
radiusd -X
From a different window/session, test the connection using the command:
radtest testuser Passw0rd localhost 0 testing123
IMPORTANT: The “testing123” argument is a “RADIUS secret” configured in the default “RADIUS client.” This secret is used for secure communication between RADIUS clients and the server. See Jared Still’s blog post for more information.
The result should be “Received Access-Accept …”
FreeRADIUS Linux-PAM Setup

The next step is to set up FreeRADIUS to use Linux-PAM (Pluggable Authentication Modules).
IMPORTANT: For initial proof of concept experimentation and testing, local Linux users are authenticated using RADIUS and 2FA. Therefore, in this test scenario, the RADIUS server runs with root permissions so it can read all 2FA configuration files. In a production or hardened environment (including those using external directory services) you should configure FreeRADIUS to run in a “lower-permission” environment.
Change both user and group to root (in the test environment):
grep 'user\|group' /etc/raddb/radiusd.conf | grep = | grep -v ^[[:space:]]\#
sed -i '/user =/ s/\(\#\)\{0,1\}[[:space:]]user = radius.*/\t user = root/' /etc/raddb/radiusd.conf
sed -i '/group =/ s/\(\#\)\{0,1\}[[:space:]]group = radius.*/\t group = root/' /etc/raddb/radiusd.conf
grep 'user\|group' /etc/raddb/radiusd.conf | grep = | grep -v ^[[:space:]]\#
Then, enable Pluggable Authentication Modules (PAM) including adding the required symbolic links:
grep pam /etc/raddb/sites-enabled/default
sed -i '/^\#[[:space:]]pam/ s/^\#//' /etc/raddb/sites-enabled/default
grep pam /etc/raddb/sites-enabled/default
ln -s /etc/raddb/mods-available/pam /etc/raddb/mods-enabled/pam
ls -l /etc/raddb/mods-enabled/pam
ls -l /etc/raddb/mods-available/pam
Change auth-type to PAM in the authorize file. Remember that file directives are processed sequentially, so place it near the top:
sed -i '2iDEFAULT Auth-Type := PAM' /etc/raddb/mods-config/files/authorize
head -5 /etc/raddb/mods-config/files/authorize
Finally, add a new Linux user on the same system for testing (remember that this user must be in uppercase for Oracle Database usage):
useradd RADUSER
passwd RADUSER   # Assume the set password is Passw0rd
Test FreeRADIUS and Linux-PAM authorization by running the FreeRADIUS server in debugging mode again:
radiusd -X
From another session/window, test a connection:
radtest RADUSER Passw0rd localhost 0 testing123
The result should again be “Received Access-Accept …”
Google Authenticator PAM Module Setup

Google provides an example PAM module for 2FA through https://github.com/google/google-authenticator-libpam (Cisco Duo or Okta provide alternative options).
Install the Google Authenticator PAM module:
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install google-authenticator
Perform a basic configuration for the RADUSER local Linux user:
sudo -u RADUSER google-authenticator \
  --time-based \
  --allow-reuse \
  --no-rate-limit \
  --window-size=3 \
  --force
IMPORTANT: The command above uses configuration options optimized for testing. In an actual production environment, I recommend reviewing all options and deploying in a more secure manner. For example, using “--disallow-reuse” and similar.
The command prints a QR code, a “secret key,” and emergency scratch codes, and writes the configuration to ~/.google_authenticator.
You can generate the TOTP codes using a mobile phone application such as “Google Authenticator” or “Authy.” Scan the generated QR code with the phone app or manually enter the “secret key” into the app.
However, when testing, constantly generating codes on a phone can become cumbersome. Generating the codes using a Linux command might be more efficient.
To generate codes from the Linux bash shell, install the oathtool package:
yum -y install oathtool
Then, you can generate codes using the “secret key” from the user’s ~/.google_authenticator file. For example:
oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`"
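For the curious, what oathtool computes is small enough to sketch directly in shell with openssl. This is purely illustrative (a sketch of RFC 6238 TOTP with HMAC-SHA1, six digits, and a 30-second step); the hex key shown is the well-known RFC test key “12345678901234567890,” not a real secret:

```shell
# Minimal TOTP sketch: HMAC-SHA1 over the big-endian time counter,
# then RFC 4226 "dynamic truncation" to produce a 6-digit code.
totp() {
  key_hex=$1   # shared secret, hex-encoded
  counter=$2   # floor(unix_time / 30)
  # Build the 8-byte big-endian counter as octal printf escapes
  msg=""
  i=1
  while [ "$i" -le 8 ]; do
    msg="$msg\\$(printf '%03o' $(( (counter >> (8 * (8 - i))) & 255 )))"
    i=$((i + 1))
  done
  hash=$(printf "$msg" | openssl dgst -sha1 -mac HMAC -macopt "hexkey:$key_hex" | awk '{print $NF}')
  # Low nibble of the last byte selects a 4-byte window in the 20-byte hash
  offset=$(( 0x$(printf '%s' "$hash" | cut -c40) ))
  start=$(( offset * 2 + 1 ))
  dbc=$(( 0x$(printf '%s' "$hash" | cut -c${start}-$((start + 7))) & 0x7fffffff ))
  printf '%06d\n' $(( dbc % 1000000 ))
}

# Code for the current 30-second window, using the RFC 6238 test key:
totp 3132333435363738393031323334353637383930 $(( $(date +%s) / 30 ))
```

With counter 1 (i.e., Unix time 59), this reproduces the RFC test-vector code 287082, which is a handy way to convince yourself the arithmetic is right.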
Finally, you must adjust the Linux-PAM configuration to reference the new Google Authenticator PAM Module:
cat << EOF > /etc/pam.d/radiusd
#%PAM-1.0
auth       requisite     pam_google_authenticator.so forward_pass
auth       required      pam_sepermit.so
auth       substack      password-auth
auth       include       postlogin
account    required      pam_nologin.so
account    include       password-auth
session    include       password-auth
EOF

Testing Using the Local User and One-time Password
With the FreeRADIUS and Google Authenticator PAM Module implementation of 2FA, the password must be a single string comprising the actual user password immediately followed by the TOTP code. So, the format is: <password><TOTP>.
Other implementations (such as Cisco Duo) might allow for a comma between the two, or push notification to a mobile device. However, for FreeRADIUS and the Google Authenticator PAM Module, the format is one continuous string with no additional characters.
So, to test:
MY_PASSWD=Passw0rd
MY_OTP=$(oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`")
radtest RADUSER ${MY_PASSWD}${MY_OTP} localhost 0 testing123
The result should once again be “Received Access-Accept …”
Testing from the Oracle Database
The previously referenced blog post by Jared Still covers Oracle setup for FreeRADIUS. Review this post for additional information about the Oracle setup steps in this section. Repeating the steps in a simplified format:
As “root” on the FreeRADIUS server, add a FreeRADIUS client that the Oracle Databases will use to connect:
cat << EOF >> /etc/raddb/clients.conf
client oracle-db {
        ipaddr = 192.168.1.0/24
        secret = secretoracle
        shortname = oracle
        nastype = other
}
EOF
Restart the FreeRADIUS service to ensure all changes take effect:
systemctl restart radiusd.service
systemctl status radiusd.service
As the “oracle” user (or Oracle software owner) on the Oracle Database server, save the RADIUS secret to a file:
mkdir -p ${ORACLE_HOME}/network/security
echo "secretoracle" > ${ORACLE_HOME}/network/security/radius.key
chmod 600 ${ORACLE_HOME}/network/security/radius.key
Add the required entries to the SQLNET.ORA file and verify the file (add the appropriate IP address for the FreeRADIUS server):
cat <<EOF >> ${ORACLE_HOME}/network/admin/sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES=(BEQ,TCPS,RADIUS)
SQLNET.RADIUS_AUTHENTICATION=192.168.1.212
SQLNET.RADIUS_AUTHENTICATION_PORT=1812
SQLNET.RADIUS_SECRET=${ORACLE_HOME}/network/security/radius.key
EOF
If you want, you can include other parameters such as the RADIUS timeout and retries. Alternatively, for higher availability and redundancy, use the SQLNET.RADIUS_ALTERNATE parameter to specify a backup RADIUS server to use if the primary becomes unavailable.
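For example, a hypothetical extension of the sqlnet.ora above (the alternate server address and the timeout/retry values are placeholders to adapt to your environment):

```shell
cat <<EOF >> ${ORACLE_HOME}/network/admin/sqlnet.ora
# Backup RADIUS server plus timeout/retry tuning (example values)
SQLNET.RADIUS_ALTERNATE=192.168.1.213
SQLNET.RADIUS_ALTERNATE_PORT=1812
SQLNET.RADIUS_AUTHENTICATION_TIMEOUT=10
SQLNET.RADIUS_AUTHENTICATION_RETRIES=3
EOF
```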
IMPORTANT: The SQLNET.ORA configuration above is for the database server only (RDBMS home). For Oracle clients, all that’s required is that the SQLNET.AUTHENTICATION_SERVICES parameter includes RADIUS. The other SQLNET.ORA parameters are not required/used in Oracle client homes.
Make sure that the database initialization parameters are set properly:
sqlplus -s / as sysdba << EOF
alter system set os_authent_prefix='' scope=spfile sid='*';
alter system set remote_os_authent=false scope=spfile sid='*';
shutdown immediate
startup
alter pluggable database all open;
EOF
Add the test user to the database:
sqlplus -s / as sysdba << EOF
alter session set container=PDB1;
create user raduser identified externally;
grant create session to raduser;
grant select on v_\$database to raduser;
EOF
Testing is as simple as appending the TOTP code to the password in the password field. Because the user was added to the database using “… IDENTIFIED EXTERNALLY,” the password is the Linux user’s password.
Using a connection string in the form <username>/<password><passcode>, here’s an example of a full connection using SQL*Plus:
Success! An Oracle Database connection to a database user using an external credential and 2FA!
If the password or the TOTP code is incorrect, then the standard “ORA-01017: invalid username/password; logon denied” error occurs.
Next Steps: Microsoft Active Directory

For many implementations, the next logical step is to use Microsoft Active Directory (AD) users instead of local Linux users. Switching this test setup to authenticate against AD instead of local users is relatively easy and involves the following:
- Connecting the FreeRADIUS server to the Microsoft Active Directory domain. For information about how to do this, see: https://blog.pythian.com/part-2-configuring-windows-and-linux-servers-for-cmu-testing/
- Adjusting the Linux-PAM module configuration to:
cat << EOF > /etc/pam.d/radiusd
#%PAM-1.0
auth       requisite     pam_google_authenticator.so forward_pass
auth       required      pam_sss.so use_first_pass
account    required      pam_nologin.so
account    include       password-auth
session    include       password-auth
EOF
- Logging in to the FreeRADIUS server as the AD domain user and generating the ~/.google_authenticator configuration file using the previously described process.
- Creating the domain user in the database and testing.
Configuring the Oracle Database to use 2FA is something that many inquire about but few actually do. However, doing so isn’t overly complex and allows you to leverage open source software such as FreeRADIUS. All that’s required is a small VM to act as a RADIUS server.
In a real production environment, this server could become a single point of failure and therefore might require you to set it up with reliable infrastructure and/or in a redundant configuration (by including the SQLNET.RADIUS_ALTERNATE parameter in the SQLNET.ORA file).
This environment is simple and flexible for DBA testing. However, most enterprise users will instead want to leverage a more robust and flexible solution such as the commercial offerings from Cisco Duo or Okta. Configuring Oracle Database authentication using Cisco Duo is covered in the next post in this series.
AWS RDS: 5 Must-Know Actions for Oracle DBAs
Managing Oracle on AWS has some twists. Here are five daily DBA activities that have changed on AWS:
Kill Sessions:
begin
  rdsadmin.rdsadmin_util.kill(
    sid    => &sid,
    serial => &serial,
    method => 'IMMEDIATE');
end;
/
Flush shared_pool or buffer_cache:
exec rdsadmin.rdsadmin_util.flush_shared_pool;
exec rdsadmin.rdsadmin_util.flush_buffer_cache;
Perform RMAN Operations:
BEGIN
  rdsadmin.rdsadmin_rman_util.validate_database(
    p_validation_type     => 'PHYSICAL+LOGICAL',
    p_parallel            => 4,
    p_section_size_mb     => 10,
    p_rman_to_dbms_output => FALSE);
END;
/
Grant Privileges to SYS Objects:
# Grant
begin
  rdsadmin.rdsadmin_util.grant_sys_object(
    p_obj_name  => 'V_$SESSION',
    p_grantee   => 'PYTHIAN',
    p_privilege => 'SELECT');
end;
/
# Grant with Grant Option
begin
  rdsadmin.rdsadmin_util.grant_sys_object(
    p_obj_name     => 'V_$SESSION',
    p_grantee      => 'PYTHIAN',
    p_privilege    => 'SELECT',
    p_grant_option => true);
end;
/
# Revoke
begin
  rdsadmin.rdsadmin_util.revoke_sys_object(
    p_obj_name  => 'V_$SESSION',
    p_revokee   => 'PYTHIAN',
    p_privilege => 'SELECT');
end;
/
Create Custom Functions to Verify Passwords:
begin
  rdsadmin.rdsadmin_password_verify.create_verify_function(
    p_verify_function_name => 'CUSTOM_PASSWORD_FUNCTION',
    p_min_length           => 12,
    p_min_uppercase        => 2,
    p_min_digits           => 1,
    p_min_special          => 1,
    p_disallow_at_sign     => true);
end;
/
If you want to double-check the generated code, here’s a simple trick: check DBA_SOURCE:
col text format a150
select TEXT from DBA_SOURCE where OWNER = 'SYS' and NAME = 'CUSTOM_PASSWORD_FUNCTION' order by LINE;
I hope this helps!
OATUG Forum Online (or, the 2020 Version of Collaborate Las Vegas)
Hi everyone!
It’s been a very busy year for me, so I haven’t had much time to blog. But, here we go!
This isn’t a technical post, but more of an opinion piece. I’ve split this post into two parts. Part one was written before I spoke at the OATUG (Oracle Applications and Technology Users Group) Forum online. Part two was written after I spoke.
Part 1: Before…

The global pandemic has disrupted virtually everything in our world. The same goes for the world of IT, including one of the best things about our industry: Conferences and live events.
I was supposed to attend two major conferences in the first quarter of 2020. One was cancelled and the other became an online event. Aborting major events like these is no trivial undertaking, not only for the organizers themselves but also for all the sponsors and attendees (customers and vendors included).
Huge events like Oracle Open World, Google Next, or the one that will be the focus of most of this post, Collaborate, have a long-lasting impact that extends throughout the year. At these events, a casual meeting or a chat during a coffee break can kickstart a relationship. It might develop into a large services agreement, a partnership, or a whole new successful startup.
You have to agree that these events are important, if not crucial, for our industry.
Collaborate

As for Collaborate, well, I was very excited to not only attend but also speak at the event. After attending DOAG, BIWA, and UKOUG in previous years, Collaborate is the next big Oracle-related event in which I’d be participating. Then, of course, came COVID-19.
However, Collaborate isn’t cancelled. Instead, it’s now a fully virtual event. Here I’d like to give a massive KUDOS to the event organizers for their commitment to the event and the attendees. It certainly couldn’t have been easy to make the initial decision to proceed with Collaborate. Nor could it have been easy to completely undo and transform months and months of planning and logistics into a fully virtual conference in a matter of weeks.
I was very pleased to still be selected to speak at the (now) online conference. And, a big thanks to Pythian for their speaker program. It enables me to attend these gatherings, virtual or otherwise.
So, here I am, a few hours prior to my first big online presentation to an unknown number of people, and crossing my fingers that everything works as expected. This is basically how I feel before any “typical” presentation. The main difference? Well, the presentation won’t be quite the same. Interaction with the listening public is close to zero, and there is little practical opportunity for people to reach out to you with questions afterwards.
My Thoughts…

I know, I know… there’s always social media, and so on. However, it’s not the same, and you know it. There’s no immediate feedback from the attendees, and you can’t really tell if they find the content interesting or boring. Nor is it easy to tell whether you should slow down or speed up. It definitely feels more impersonal.
This is my opinion from a “presenter” standpoint. As an “attendee,” there are two major issues for me. The first is that because it’s no longer a live in-person event, it’s awkward to ask your manager for time to attend it. The second is the time zone: I’m in the Eastern time zone, and the event is based in Las Vegas, so sessions can run from 4:30 PM to 10:30 PM EDT. This, on top of my regular eight-hour workday, is simply too much.
So, I might be able to attend a session or two and watch saved sessions at a later time. This takes me back to my previous point about lost face-to-face interactions with people.
Anyway, I’m really interested to see how this plays out. I really hope the event is a big success because, unfortunately, our current “situation” might last a long time. We need to prepare for that possibility.
Part 2: After…And… the event is over. Unfortunately, it wasn’t very satisfactory, which makes me a little sad. You see, Collaborate is one of the biggest Oracle events that I know of with a great many attendees, but only four showed up to my live presentation. Yes, the feedback was very good, and surely those people who attended were very interested in the topic I was speaking about. I thank each and every one of them.
A few days later, I checked my presentation statistics and there were a few views and downloads. But overall, the number is far below the 20 to 50 people that attended this same presentation at other on-site conferences.
Of course, I’m not a renowned speaker and the subject matter might not be of broader interest. However, the online experience is just “not the same,” and that takes a toll on the number of attendees.
Regardless, it was a good overall experience. I give my sincere congratulations to the organizers for the huge effort they clearly put into the entire event.
Hopefully, we can all meet in person soon and enjoy a truly magnificent Collaborate.
Datascape Podcast Episode 41 – Choosing a Public Cloud for Oracle Database
You can run Oracle anywhere. It’s very flexible, which means anywhere you can get a VM, you can run it, including a VirtualBox VM on your laptop. There’s even a free edition (XE) if you don’t have large data needs. In this episode, we discuss why you would choose to host your Oracle Database anywhere other than its native cloud. We look at the feasibility of doing so, along with some of the benefits and drawbacks that you’ll likely encounter moving to a public cloud.
Simon Pane, a friend of the show and Principal Consultant at Pythian, is here to walk us through this topic. As usual, he provides excellent insights. We start by discussing Oracle’s cloud (OCI) and the reasons to stay with it, which include its flexibility and very good support. However, the platform is in its relative infancy, which means there will likely be some problems.
We then move on to the big three: Azure, AWS, and Google Cloud. Moving to a public cloud provider might be part of an organizational “bigger picture,” or driven by other non-functional reasons. With Azure and AWS, clients are likely hoping to take advantage of other features, such as Azure’s Office 365 integration and AWS’s RDS. Despite being licensed for Oracle, Azure is not optimized for it because there are no special shapes or options. AWS does have managed services, but still has drawbacks, which Simon explores in depth.
We then move on to Google Cloud, the newest kid on the block. There are certainly great features (like BigQuery) that come along with it, but there is a major discrepancy: Licensing. The Oracle licensing policy does not include GC, so Simon offers some potential workarounds for this problem.
Ultimately, there is no slam-dunk solution for where to host Oracle databases. If you put your blinders on and focus on one area, it might seem that one cloud resolves all of your needs. However, it really comes down to making trade-offs and deciding which need is most important. Be sure to tune in today!
Key Points from this Episode

- Discover reasons someone might move an Oracle on-premises database to a public cloud.
- How Oracle-supported versions work and why this might be a reason for using public clouds.
- Insights into Oracle cloud (OCI), its bad name, and the two reasons people don’t use it.
- The motivators for utilizing OCI: It’s best at home, supportability, and unique needs.
- Learn why you would use Azure as a home for your Oracle Database.
- The pain points you might encounter hosting an Oracle Database on Azure.
- Developments between Microsoft and Oracle. Two former enemies now building bridges.
- Reasons to go with AWS: Brand recognition, first to market, and managed service.
- Oracle licenses are incredibly expensive and drive the cost up in cloud and on-premises.
- Discover some of the shortcomings of hosting an Oracle Database on AWS.
- An overview of the special Oracle licensing considerations to remember.
- The benefits and drawbacks of using Google Cloud for Oracle Databases.
- Cloud SQL: Another example of Oracle’s terrible product names!
- A universal cloud challenge: None of them are great with upgrades.
- Why having cloud as backup or DR can be a good way to navigate the cloud decision matrix.
- What public clouds have in store for Oracle going into the future.