Bobby Durrett's DBA Blog
I’m still working on resolving the issues caused by bug 13914613.
Oracle support recommended a parameter change to resolve the issue, but that change requires us to bounce the database, and I was looking for a resolution that does not need a bounce. The bug caused very bad shared pool latch waits when the automatic memory management feature of our 11.2 database expanded the shared pool. Oracle support recommended setting _enable_shared_pool_durations=false and I verified that changing this parameter requires a bounce. It is a big hassle to bounce this database because of the application, so I thought that I might try flushing the shared pool on a regular basis so the automatic memory management would not need to keep increasing the size of the shared pool. The shared pool was growing in size because we have a lot of SQL statements without bind variables. So, I did a test, and in my test flushing the shared pool did not slow the growth of the shared pool.
Here is a zip of the scripts I used for this test and their outputs: zip
I set the shared pool to a small value so it was more likely to grow, and I created a script to run many different SQL statements that don't use bind variables:
spool runselects.sql
select 'select * from dual where dummy=''s'||to_char(sysdate,'HHMISS')||rownum||''';' from dba_objects;
spool off
@runselects
So, the queries looked like this:
select * from dual where dummy='s0818111';
select * from dual where dummy='s0818112';
select * from dual where dummy='s0818113';
select * from dual where dummy='s0818114';
select * from dual where dummy='s0818115';
select * from dual where dummy='s0818116';
select * from dual where dummy='s0818117';
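The same kind of unique-literal statement stream can be sketched outside the database as well. This shell loop (an illustrative sketch, not the script used above) shows the idea: a timestamp plus a counter makes every statement distinct, so each one misses the shared pool and hard parses:

```shell
# Generate SQL statements with unique literals so every statement
# forces a hard parse (sketch, not the exact script from this post).
stamp=$(date +%H%M%S)            # like to_char(sysdate,'HHMISS')
for i in $(seq 1 5); do
  echo "select * from dual where dummy='s${stamp}${i}';"
done > runselects.sql
cat runselects.sql
```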
I ran these for an hour and tested three different configurations. The first two did not use the _enable_shared_pool_durations=false setting and the last did. The first test was a baseline that showed the growth of the shared pool without flushing it. The second test included a flush of the shared pool every minute. The last run included the parameter change and no flush of the shared pool. I queried V$SGA_RESIZE_OPS after each test to see how many times the shared pool grew. Here is the query:
SELECT OPER_TYPE,FINAL_SIZE Final,
to_char(start_time,'dd-mon hh24:mi:ss') Started,
to_char(end_time,'dd-mon hh24:mi:ss') Ended
FROM V$SGA_RESIZE_OPS
where component='shared pool'
order by start_time,end_time;
Here are the results.
Baseline – no flush, no parameter change:
OPER_TYPE FINAL       STARTED         ENDED
--------- ----------- --------------- ---------------
GROW      150,994,944 18-jun 05:03:54 18-jun 05:03:54
GROW      134,217,728 18-jun 05:03:54 18-jun 05:03:54
STATIC    117,440,512 18-jun 05:03:54 18-jun 05:03:54
GROW      167,772,160 18-jun 05:04:36 18-jun 05:04:36
GROW      184,549,376 18-jun 05:47:38 18-jun 05:47:38
Flush every minute, no parameter change:
OPER_TYPE FINAL       STARTED         ENDED
--------- ----------- --------------- ---------------
GROW      134,217,728 18-jun 06:09:15 18-jun 06:09:15
GROW      150,994,944 18-jun 06:09:15 18-jun 06:09:15
STATIC    117,440,512 18-jun 06:09:15 18-jun 06:09:15
GROW      167,772,160 18-jun 06:09:59 18-jun 06:09:59
GROW      184,549,376 18-jun 06:22:26 18-jun 06:22:26
GROW      201,326,592 18-jun 06:42:29 18-jun 06:42:29
GROW      218,103,808 18-jun 06:47:29 18-jun 06:47:29
Parameter change, no flush:
OPER_TYPE FINAL       STARTED         ENDED
--------- ----------- --------------- ---------------
STATIC    117,440,512 18-jun 07:16:09 18-jun 07:16:09
GROW      134,217,728 18-jun 07:16:18 18-jun 07:16:18
So, at least in this test, which I have run only twice, flushing the shared pool if anything makes the growth of the shared pool worse. But changing the parameter seems to lock the shared pool size in place.
I hacked together a query today that shows the overall I/O performance that a database is experiencing.
The output looks like this:
End snapshot time   number of IOs ave IO time (ms) ave IO size (bytes)
------------------- ------------- ---------------- -------------------
2015-06-15 15:00:59        359254               20              711636
2015-06-15 16:00:59        805884               16              793033
2015-06-15 17:00:13        516576               13              472478
2015-06-15 18:00:27        471098                6              123565
2015-06-15 19:00:41        201820                9              294858
2015-06-15 20:00:55        117887                5              158778
2015-06-15 21:00:09         85629                1               79129
2015-06-15 22:00:23        226617                2               10744
2015-06-15 23:00:40        399745               10              185236
2015-06-16 00:00:54       1522650                0               43099
2015-06-16 01:00:08       2142484                0               19729
2015-06-16 02:00:21        931349                0                9270
I’ve combined reads and writes and focused on three metrics – number of IOs, average IO time in milliseconds, and average IO size in bytes. I think it is a helpful way to compare the way two systems perform. Here is another, better, system’s output:
End snapshot time   number of IOs ave IO time (ms) ave IO size (bytes)
------------------- ------------- ---------------- -------------------
2015-06-15 15:00:25        331931                1              223025
2015-06-15 16:00:40        657571                2               36152
2015-06-15 17:00:56       1066818                1               24599
2015-06-15 18:00:11        107364                1              125390
2015-06-15 19:00:26         38565                1               11023
2015-06-15 20:00:41         42204                2              100026
2015-06-15 21:00:56         42084                1               64439
2015-06-15 22:00:15       3247633                3              334956
2015-06-15 23:00:32       3267219                0               49896
2015-06-16 00:00:50       4723396                0               32004
2015-06-16 01:00:06       2367526                1               18472
2015-06-16 02:00:21       1988211                0                8818
Here is the query:
select
to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS') "End snapshot time",
sum(after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS) "number of IOs",
trunc(10*sum(after.READTIM+after.WRITETIM-before.WRITETIM-before.READTIM)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO time (ms)",
trunc((select value from v$parameter where name='db_block_size')*
sum(after.PHYBLKRD+after.PHYBLKWRT-before.PHYBLKRD-before.PHYBLKWRT)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO size (bytes)"
from DBA_HIST_FILESTATXS before, DBA_HIST_FILESTATXS after, DBA_HIST_SNAPSHOT sn
where
after.file#=before.file# and
after.snap_id=before.snap_id+1 and
before.instance_number=after.instance_number and
after.snap_id=sn.snap_id and
after.instance_number=sn.instance_number
group by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS')
order by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS');
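A note on the arithmetic in the query, which is easy to check by hand: DBA_HIST_FILESTATXS stores READTIM and WRITETIM in centiseconds, so the factor of 10 converts the time delta to milliseconds, and the 1+ in each denominator just guards against dividing by zero on an idle interval. A rough awk sketch with made-up snapshot deltas (the numbers are assumptions for illustration):

```shell
# Recompute the two averages from made-up before/after snapshot deltas:
# 999 I/Os, 2000 centiseconds of I/O time, 25000 blocks of 8K each.
awk 'BEGIN {
  ios   = 999       # PHYRDS+PHYWRTS delta
  ctime = 2000      # READTIM+WRITETIM delta (centiseconds)
  blks  = 25000     # PHYBLKRD+PHYBLKWRT delta
  bs    = 8192      # db_block_size
  printf "ave IO time (ms) = %d\n", int(10 * ctime / (1 + ios))     # 20
  printf "ave IO size (bytes) = %d\n", int(bs * blks / (1 + ios))   # 204800
}'
```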
I hope this is helpful.
Oracle support says we have hit bug 13914613. Here is what our wait events looked like in an AWR report:
Top 5 Timed Foreground Events

Event                         Waits Time(s) Avg wait (ms) % DB time Wait Class
------------------------- --------- ------- ------------- --------- -----------
latch: shared pool            3,497  17,482          4999     38.83 Concurrency
latch: row cache objects        885  12,834         14502     28.51 Concurrency
db file sequential read   1,517,968   8,206             5     18.23 User I/O
DB CPU                                4,443                    9.87
library cache: mutex X        7,124   2,639           370      5.86 Concurrency
What really struck me about these latch waits was that the average wait time was several thousand milliseconds, which means several seconds. That's a long time to wait for a latch.
Oracle pointed to the Latch Miss Sources section of the AWR report. This is mostly gibberish to me; I assume the "Where" column lists internal kernel function names.

Latch Miss Sources

Latch Name  Where                        NoWait Misses Sleeps Waiter Sleeps
----------- ---------------------------- ------------- ------ -------------
shared pool kghfrunp: clatch: wait                   0  1,987         1,956
shared pool kghfrunp: alloc: session dur             0  1,704         1,364
Bug description says “Excessive time holding shared pool latch in kghfrunp with auto memory management” so I guess the “kghfrunp” latch miss sources told Oracle support that this was my issue.
I did this query to look for resize operations:
SELECT COMPONENT,OPER_TYPE,FINAL_SIZE Final,
to_char(start_time,'dd-mon hh24:mi:ss') Started,
to_char(end_time,'dd-mon hh24:mi:ss') Ended
FROM V$SGA_RESIZE_OPS;

COMPONENT                 OPER_TYPE     FINAL           STARTED         ENDED
------------------------- ------------- --------------- --------------- ---------------
DEFAULT 2K buffer cache   STATIC        0               12-may 04:33:01 12-may 04:33:01
streams pool              STATIC        134,217,728     12-may 04:33:01 12-may 04:33:01
ASM Buffer Cache          STATIC        0               12-may 04:33:01 12-may 04:33:01
DEFAULT buffer cache      INITIALIZING  10,401,873,920  12-may 04:33:01 12-may 04:33:08
DEFAULT 32K buffer cache  STATIC        0               12-may 04:33:01 12-may 04:33:01
KEEP buffer cache         STATIC        2,147,483,648   12-may 04:33:01 12-may 04:33:01
shared pool               STATIC        13,958,643,712  12-may 04:33:01 12-may 04:33:01
large pool                STATIC        2,147,483,648   12-may 04:33:01 12-may 04:33:01
java pool                 STATIC        1,073,741,824   12-may 04:33:01 12-may 04:33:01
DEFAULT buffer cache      STATIC        10,401,873,920  12-may 04:33:01 12-may 04:33:01
DEFAULT 16K buffer cache  STATIC        0               12-may 04:33:01 12-may 04:33:01
DEFAULT 8K buffer cache   STATIC        0               12-may 04:33:01 12-may 04:33:01
DEFAULT 4K buffer cache   STATIC        0               12-may 04:33:01 12-may 04:33:01
RECYCLE buffer cache      STATIC        0               12-may 04:33:01 12-may 04:33:01
KEEP buffer cache         INITIALIZING  2,147,483,648   12-may 04:33:02 12-may 04:33:04
DEFAULT buffer cache      SHRINK        10,334,765,056  20-may 21:00:12 20-may 21:00:12
shared pool               GROW          14,025,752,576  20-may 21:00:12 20-may 21:00:12
shared pool               GROW          14,092,861,440  27-may 18:06:12 27-may 18:06:12
DEFAULT buffer cache      SHRINK        10,267,656,192  27-may 18:06:12 27-may 18:06:12
shared pool               GROW          14,159,970,304  01-jun 09:07:35 01-jun 09:07:36
DEFAULT buffer cache      SHRINK        10,200,547,328  01-jun 09:07:35 01-jun 09:07:36
DEFAULT buffer cache      SHRINK        10,133,438,464  05-jun 03:00:33 05-jun 03:00:33
shared pool               GROW          14,227,079,168  05-jun 03:00:33 05-jun 03:00:33
DEFAULT buffer cache      SHRINK        10,066,329,600  08-jun 11:06:06 08-jun 11:06:07
shared pool               GROW          14,294,188,032  08-jun 11:06:06 08-jun 11:06:07
The interesting thing is that our problem ended right about the time the last shared pool expansion supposedly started. The latch waits hosed up our database for several minutes, and it ended right about 11:06. I suspect that the system was hung up with the bug, and once the bug finished, the normal expansion work started. Or, at least, the time didn't get recorded until after the bug finished slowing us down.
So, I guess it's just a bug. This is on Oracle 11.2 on HP-UX Itanium. I believe there is a patch set with the fix for this bug.
Maybe it will be helpful for someone to see an example.
Recently I used DBMS_RESOURCE_MANAGER.calibrate_io to measure disk I/O performance using a call like this:
DECLARE
  l_latency PLS_INTEGER;
  l_iops    PLS_INTEGER;
  l_mbps    PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.calibrate_io (
    num_physical_disks => 10,
    max_latency        => 20,
    max_iops           => l_iops,
    max_mbps           => l_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.put_line('Max IOPS = ' || l_iops);
  DBMS_OUTPUT.put_line('Max MBPS = ' || l_mbps);
  DBMS_OUTPUT.put_line('Latency = ' || l_latency);
END;
/
Using this call I got a surprising result. The test returns very different numbers when run against two different databases on the same Linux virtual machine and the same filesystem.
Max IOPS = 7459
Max MBPS = 863
Latency = 18
Max IOPS = 39921
Max MBPS = 1105
Latency = 0
Both databases use direct and asynchronous I/O. The databases differ in size and configuration. It seems that something about the databases themselves affects the results since they share the same filesystem on the same machine.
I did not get useful numbers from calibrate_io and have wasted a lot of time trying to interpret its results. You may want to focus on other tools for measuring disk I/O performance.
This is very cool:
There are a bunch of nice computer science classes online from MIT: Free online MIT computer science classes
Here is an introductory computer science class: Intro to computer science
Here is a graded version of the same class on edX: Graded version of MIT intro CS class starting June 10th.
edX does not have as many computer science classes but edX may motivate students because edX classes include grades and certificates for those who pass.
I use computer science every day in my database work but I have not taken a formal class since 1989.
I have been on a computer science kick ever since watching The Imitation Game. I downloaded Turing's 1936 paper after watching the movie. I got about halfway through it before giving up. It was dense! Maybe I will take another stab at it some day. But the MIT classes are neat because they are the way computer science is now taught, and hopefully they are easier to understand than Turing's paper.
I dug up a simple C program that I wrote years ago to test disk performance. I hesitated to publish it because it is rough and limited in scope and other more capable tools exist. But, I have made good use of it so why not share it with others? It takes a file name and the size of the file in megabytes. It sequentially writes the file in 64 kilobyte chunks. It opens the file in synchronous mode so it must write the data to disk before returning to the program. It outputs the rate in bytes/second that the program wrote to disk.
Here is a zip of the code: zip
There is no error checking so if you put in an invalid file name you get no message.
Here is how I ran it in my HP-UX and Linux performance comparison tests:
HP-UX:

$ time ./createfile /var/opt/oracle/db01/bobby/test 1024
Bytes per second written = 107374182

real    0m10.36s
user    0m0.01s
sys     0m1.79s

Linux:

$ time ./createfile /oracle/db01/bobby/test 1024
Bytes per second written = 23860929

real    0m45.166s
user    0m0.011s
sys     0m2.472s
It makes me think that my Linux system’s write I/O is slower. I found a set of arguments to the utility dd that seems to do the same thing on Linux:
$ dd if=/dev/zero bs=65536 count=16384 of=test oflag=dsync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 38.423 s, 27.9 MB/s
But I couldn't find an option like dsync on the HP-UX version of dd. In any case, it was nice to have the C code so I could experiment with various options to open(). I used tusc on HP-UX and strace on Linux to trace some activity against the system tablespace, and by grepping the traces for open I found the options Oracle uses:
HP-UX trace:

open("/var/opt/oracle/db01/HPDB/dbf/system01.dbf", O_RDWR|0x800|O_DSYNC, 030) = 8

Linux trace:

open("/oracle/db01/LINUXDB/dbf/system01.dbf", O_RDWR|O_DSYNC) = 8
So, I modified my program to use the O_DSYNC flag and it was the same as using O_SYNC. But, the point is that having a simple C program lets you change these options to open() directly.
I hope this program will be as useful to others as it has been to me.
p.s. Here is a similar program for sequentially reading through a file, but with 256 K buffers: zip
In my previous post I described how I could not explain why I got better db file parallel read wait times in a test on Linux than I got running the same test on HP-UX. I have discovered that the Linux wait times were better because Linux cached the data in the filesystem cache and HP-UX did not.
Neither system used direct I/O for the tests so both could cache data in the filesystem cache. Evidently Linux does this faster than HP-UX. I figured this out by repeatedly running the query flushing the buffer cache before each run. Flushing the buffer cache prevented the table and index from being cached within the database. On Linux the query ran for the same amount of time for all 5 executions. On HP-UX it ran much faster after running it for the first time. Apparently Linux cached the table and index before the first run and HP-UX cached them after the first run.
Here is how I ran the query:
alter system flush buffer_cache;
select /*+ index(test testi) */ sum(blocks) from test;
alter system flush buffer_cache;
select /*+ index(test testi) */ sum(blocks) from test;
alter system flush buffer_cache;
select /*+ index(test testi) */ sum(blocks) from test;
alter system flush buffer_cache;
select /*+ index(test testi) */ sum(blocks) from test;
alter system flush buffer_cache;
select /*+ index(test testi) */ sum(blocks) from test;
Here are the elapsed times for the query on Linux:
Elapsed: 00:00:09.16
Elapsed: 00:00:09.17
Elapsed: 00:00:09.28
Elapsed: 00:00:09.18
Elapsed: 00:00:09.20
Here is the same thing on HP-UX:
Elapsed: 00:01:03.27
Elapsed: 00:00:19.23
Elapsed: 00:00:19.28
Elapsed: 00:00:19.35
Elapsed: 00:00:19.43
It’s not surprising that the HP-UX times with the data cached are twice that of Linux. An earlier post found the processor that I am evaluating on Linux was about twice as fast as the one I’m using on HP-UX.
Just to double-check that the caching was really at the filesystem level I turned direct I/O on for the Linux system using this parameter:
alter system set filesystemio_options=DIRECTIO scope=spfile;
I ran the test again after bouncing the database to make the parameter take effect and the run times were comparable to the slow first run on HP-UX:
Elapsed: 00:01:12.03
Elapsed: 00:01:06.69
Elapsed: 00:01:12.98
Elapsed: 00:01:10.14
Elapsed: 00:01:07.21
So, it seems that without the filesystem cache this query takes about 1 minute to run on either system. With caching the query runs under 20 seconds on both systems.
In some ways I think that these results are not important. Who cares if Linux caches things on the first attempt and HP-UX on the second?
The lesson I get from this test is that HP-UX and Linux are different in subtle ways and that when we migrate a database from HP-UX to Linux we may see performance differences that we do not expect.
Here is a zip of my script and its logs: zip
I am still working on comparing performance between an HP-UX blade and a Linux virtual machine, and I have a strange result. I tried to come up with a simple example that would do a lot of single block I/O. The test runs faster on my Linux system than my HP-UX system and I'm not sure why. All of the parameters are the same, except the ones that contain the system name and filesystem names, and both systems run the same Oracle version. The dramatic difference in run times corresponds to an equally dramatic difference in db file parallel read wait times.
I created a table called TEST and populated it with data and added an index called TESTI. I ran this query to generate a lot of single block I/O:
select /*+ index(test testi) */ sum(blocks) from test;
Here is the result on HP:
SUM(BLOCKS)
-----------
 1485406208

Elapsed: 00:01:28.38

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
    3289143  consistent gets
     125896  physical reads
      86864  redo size
        216  bytes sent via SQL*Net to client
        248  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

select EVENT,TOTAL_WAITS,TIME_WAITED,AVERAGE_WAIT
FROM V$SESSION_EVENT a
WHERE a.SID= :monitored_sid
order by time_waited desc;

EVENT                          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
------------------------------ ----------- ----------- ------------
db file parallel read                 4096        6760         1.65
db file sequential read              14526         236          .02
events in waitclass Other                1          28        28.49
SQL*Net message from client             19           5          .28
db file scattered read                   5           3          .65
SQL*Net message to client               20           0            0
Disk file operations I/O                 1           0          .01
Here is the same thing on Linux:
SUM(BLOCKS)
-----------
  958103552

Elapsed: 00:00:09.01

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
    3289130  consistent gets
     125872  physical reads
      77244  redo size
        353  bytes sent via SQL*Net to client
        360  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

select EVENT,TOTAL_WAITS,TIME_WAITED,AVERAGE_WAIT
FROM V$SESSION_EVENT a
WHERE a.SID= :monitored_sid
order by time_waited desc;

EVENT                          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
------------------------------ ----------- ----------- ------------
db file parallel read                 4096          55          .01
events in waitclass Other                1          17        16.72
db file sequential read              14498          11            0
SQL*Net message from client             19           6          .31
db file scattered read                  15           0            0
SQL*Net message to client               20           0            0
Disk file operations I/O                 1           0            0
Something doesn't seem right. Surely there is some caching somewhere. Is it really possible that the Linux version runs in 9 seconds while the HP one runs in a minute and a half? Is it really true that db file parallel read averages 1.65 centiseconds per wait on HP-UX and .01 centiseconds on Linux?
I’m still working on this but thought I would share the result since it is so strange.
Here is a zip of my scripts and their logs if you want to check them out: zip
p.s. Here are some possibly relevant parameters, the same on both systems:
compatible                   18.104.22.168.0
cpu_count                    4
db_block_size                8192
db_cache_size                512M
db_writer_processes          2
disk_asynch_io               FALSE
dispatchers                  (PROTOCOL=TCP)(DISPATCHERS=32)
filesystemio_options         ASYNCH
large_pool_size              32M
log_buffer                   2097152
max_shared_servers           12
pga_aggregate_target         5871947670
sga_max_size                 3G
sga_target                   3G
shared_pool_size             1G
shared_servers               12
star_transformation_enabled  FALSE
I’m trying to compare two types of database servers and it looks like one has a faster CPU than the other. But, the benchmark I have used runs a complicated variety of SQL so it is hard to really pin down the CPU performance. So, I made up a simple query that eats up a lot of CPU and does not need to read from disk.
First I created a small table with five rows:
create table test (a number);

insert into test values (1);
insert into test values (1);
insert into test values (1);
insert into test values (1);
insert into test values (1);
commit;
Then I ran a query Cartesian joining that table to itself multiple times:
select
sum(t1.a)+
sum(t2.a)+
sum(t3.a)+
sum(t4.a)+
sum(t5.a)+
sum(t6.a)+
sum(t7.a)+
sum(t8.a)+
sum(t9.a)+
sum(t10.a)
from
test t1,
test t2,
test t3,
test t4,
test t5,
test t6,
test t7,
test t8,
test t9,
test t10;
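The Cartesian join is what makes this query CPU-bound: joining the five-row table to itself ten times produces 5^10 joined rows for the sums to scan, all from cached blocks. A quick shell check of the arithmetic:

```shell
# Rows produced by a 10-way Cartesian self-join of a 5-row table
rows=1
for i in 1 2 3 4 5 6 7 8 9 10; do
  rows=$((rows * 5))
done
echo "joined rows: $rows"            # 9765625
# every t.a is 1, so each SUM equals the row count, and there are 10 sums
echo "query result: $((rows * 10))"  # 97656250
```

Adding one more table to the join multiplies the work by five, which is why each extra table made the runs so much longer.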
Then I used one of my profile scripts to extract the CPU. Here is a typical output:
SUBSTR(TIMESOURCE,1,30)           SECONDS PERCENTAGE
------------------------------ ---------- ----------
TOTAL_TIME                             32        100
CPU                                    32        100
I edited the output to make it fit. The profile shows the time that the query spent on the CPU in seconds.
I tried multiple runs of the same query and kept adding tables to the join to make the query longer.
This zip includes the sql scripts that I ran and my spreadsheet with the results: zip
I was comparing an Itanium and a Xeon processor and the test query ran in about half the time on the Xeon. I realize that this is not a complete benchmark, but it is some information. My other testing is not targeted specifically to CPU but I also saw a significant CPU speed-up there as well. So, this simple query adds to the evidence that the Xeon processor that I am evaluating is faster than the Itanium one.
I need to change a view and an index on an active production system. I'm concerned that the change will fail with an "ORA-00054: resource busy" error because I'm changing things that are in use. I engaged in a Twitter conversation with @FranckPachot and @DBoriented and they gave me the idea of using DDL_LOCK_TIMEOUT with a short timeout to sneak in my changes on our production system. Really, I'm more worried about backing out the changes, since I plan to make the change at night when things are quiet. If the changes cause a problem it will be during the middle of the next day. Then I'll need to sneak in and make the index invisible, or drop it, and put the original view text back.
I tested setting DDL_LOCK_TIMEOUT to one second at the session level. This is the most conservative setting:
alter session set DDL_LOCK_TIMEOUT=1;
I created a test table with a bunch of rows in it and ran a long updating transaction against it like this:
update /*+ index(test testi) */ test set blocks=blocks+1;
Then I tried to alter the index invisible with the lock timeout:
alter index testi invisible
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Same error as with no timeout set, because the update of the entire table held its lock a lot longer than 1 second.
Next I tried the same thing with a shorter running update:
update /*+ index(test testi) */ test set blocks=blocks+1
where owner='SYS' and table_name='DUAL';
commit;
update /*+ index(test testi) */ test set blocks=blocks+1
where owner='SYS' and table_name='DUAL';
commit;

... lots more of these so the script will run for a while ...
With the default setting of DDL_LOCK_TIMEOUT=0, my alter index invisible statement usually exited with an ORA-00054 error, though eventually I could get it to work. But with DDL_LOCK_TIMEOUT=1, in my testing the alter almost always worked. I guess in some cases my transaction held the lock for more than 1 second, but usually it did not.
Here is the alter with the timeout:
alter session set DDL_LOCK_TIMEOUT=1;
alter index testi invisible;
Once I made the index invisible the update started taking 4 seconds to run. So, to make the index visible again I had to bump the timeout up to 5 seconds:
alter session set DDL_LOCK_TIMEOUT=5;
alter index testi visible;
So, if I have to back out these changes at a peak time, setting DDL_LOCK_TIMEOUT to a small value should enable me to make the needed changes.
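Since DDL_LOCK_TIMEOUT only makes a single attempt wait up to N seconds, a change that keeps losing the race can simply be retried in a loop. Here is a hedged shell sketch of that idea; the sqlplus invocation in the comment is a hypothetical placeholder, not a script from this post:

```shell
# Retry a command until it succeeds or we run out of attempts, e.g. a
# sqlplus script that sets DDL_LOCK_TIMEOUT=1 and runs the ALTER INDEX.
retry() {
  attempts=$1; shift
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      echo "succeeded on attempt $i"
      return 0
    fi
    sleep 1
  done
  echo "gave up after $attempts attempts"
  return 1
}

# Hypothetical usage:
# retry 10 sqlplus -s / as sysdba @make_index_invisible.sql
```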
Here is a zip of my scripts if you want to recreate these tests: zip
You need Oracle 11g or later to use DDL_LOCK_TIMEOUT.
These tests were all run on an 11.2 database.
Also, I verified that I studied DDL_LOCK_TIMEOUT for my 11g OCP test. I knew it sounded familiar but I have not been using this feature. Either I just forgot or I did not realize how helpful it could be for production changes.
I'm trying to compare how a query runs on two systems on the same Oracle version. One runs on HP-UX Itanium and one runs on 64 bit x86 Linux. Same query, same plan, different hash value.
SQL_ID 0kkhhb2w93cx0
--------------------
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,exts
ize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL,
:13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16,
spare1=DECODE(:17,0,NULL,:17),scanhint=:18, bitmapranges=:19 where
ts#=:1 and file#=:2 and block#=:3

Plan hash value: 1283625304

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |                |       |       |     2 (100)|          |
|   1 |  UPDATE               | SEG$           |       |       |            |          |
|   2 |   TABLE ACCESS CLUSTER| SEG$           |     1 |    65 |     2   (0)| 00:00:01 |
|   3 |    INDEX UNIQUE SCAN  | I_FILE#_BLOCK# |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
SQL_ID 0kkhhb2w93cx0
--------------------
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,exts
ize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL,
:13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16,
spare1=DECODE(:17,0,NULL,:17),scanhint=:18, bitmapranges=:19 where
ts#=:1 and file#=:2 and block#=:3

Plan hash value: 2170058777

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |                |       |       |     2 (100)|          |
|   1 |  UPDATE               | SEG$           |       |       |            |          |
|   2 |   TABLE ACCESS CLUSTER| SEG$           |     1 |    64 |     2   (0)| 00:00:01 |
|   3 |    INDEX UNIQUE SCAN  | I_FILE#_BLOCK# |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
I wonder if the endianness plays into the plan hash value calculation? Or is it just a port specific calculation?
I started using sar -d to look at disk performance on a Linux system this week and had to look up what some of the returned numbers meant. I've used sar -d on HP-UX, but the format is different.
Here is an edited output from a Linux VM that we are copying files to:
$ sar -d 30 1
Linux 2.6.32-504.3.3.el6.x86_64 (myhostname)  04/01/2015  _x86_64_  (4 CPU)

05:26:55 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
05:27:25 PM  dev253-9   7669.55      2.44  61353.93      8.00     35.39      4.61      0.03     19.80
I edited out the real host name and I removed all the lines with devices except the one busy device, dev253-9.
Earlier today I got confused and thought that rd_sec/s meant read I/O requests per second, but it does not. Here is how the Linux man page describes rd_sec/s:
Number of sectors read from the device. The size of a sector is 512 bytes.
In the example above all the activity is writing so if you look at wr_sec/s it is the same kind of measure of activity:
Number of sectors written to the device. The size of a sector is 512 bytes.
So in the example you have 61353.93 512-byte sectors written per second. Divide by 2 to get kilobytes = 30676 KB/sec. Divide by 1024 and round up to get 30 megabytes per second.
But, how many write I/O operations per second does this translate to? It looks like you can’t tell in this listing. You can get overall I/O operations per second including both reads and writes from the tps value which the man page defines as:
Total number of transfers per second that were issued to physical
devices. A transfer is an I/O request to a physical device. Multiple
logical requests can be combined into a single I/O request to the
device. A transfer is of indeterminate size.
Of course there aren't many read requests, so we can assume almost all the transfers are writes, which makes 7669.55 write IOPS. Also, you can find the average I/O size by dividing rd_sec/s + wr_sec/s by tps. This comes out to just about 8, which is the same as avgrq-sz, which the man page defines as:
The average size (in sectors) of the requests that were issued to the device.
So, avgrq-sz is kind of superfluous, since I can calculate it from the other values, but it means that our average I/O is 8 * 512 bytes = 4 kilobytes. This seems like a small I/O size considering that we are copying large data files over NFS. Hmmm.
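These conversions are easy to script. Using the sar values shown above (7669.55 tps and 61353.93 wr_sec/s, with 512-byte sectors), a small awk sketch reproduces the throughput and average I/O size:

```shell
# Turn sar -d sector counts into KB/s, MB/s, and average I/O size.
awk 'BEGIN {
  tps  = 7669.55     # transfers (I/O requests) per second
  wsec = 61353.93    # 512-byte sectors written per second
  kb   = wsec / 2    # a sector is half a kilobyte
  printf "KB/s=%.0f MB/s=%.0f avg_io_bytes=%.0f\n", kb, kb/1024, wsec/tps*512
}'
```

This prints roughly 30 MB/s and a 4096-byte average I/O, matching the numbers worked out above.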
Also, the disk device is queuing the I/O requests but the device is only in use 19% of the time. Maybe there are bursts of 4K writes which queue up and then gaps in activity? Here are the definitions for the remaining items.
avgqu-sz:

The average queue length of the requests that were issued to the device.

await:

The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

svctm:

The average service time (in milliseconds) for I/O requests that were issued to the device.

%util:

Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
The service time is good – only .03 milliseconds – so I assume that the I/Os are writing to a memory cache. But the total time is higher – 4.61 – which is mostly time spent waiting in the queue. The average queue length of 35.39 makes sense given that I/Os spend so much time waiting in the queue. But it’s weird that utilization isn’t close to 100%. That’s what makes me wonder if we are having bursts of activity.
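One sanity check on those numbers is Little's law: the average queue length should be about the arrival rate times the average time each request spends in the system. With 7669.55 requests per second each spending 4.61 ms:

```shell
# Little's law: queue length = arrival rate * time in system (seconds)
awk 'BEGIN {
  tps   = 7669.55   # I/O requests per second
  await = 4.61      # ms per request, queue plus service
  printf "predicted queue length = %.2f\n", tps * await / 1000
}'
```

That predicts about 35.36, in line with the avgqu-sz of 35.39 that sar reported, so the columns are at least self-consistent.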
Anyway, I have more to learn but I thought I would pass along my thoughts on Linux’s version of sar -d.
P.S. Here is the output on HP-UX that I am used to:
HP-UX myhostname B.11.31 U ia64    04/02/15

11:27:14   device   %busy   avque   r+w/s   blks/s   avwait   avserv
11:27:44    disk1    1.60    0.50       3       95     0.00    10.27
            disk6    0.03    0.50       1        6     0.00     0.64
           disk15    0.00    0.50       0        0     0.00     3.52
           disk16  100.00    0.50     337     5398     0.00     5.52
r+w/s on HP-UX sar -d seems to be the equivalent of tps on Linux. blks/s on HP-UX appears to be the same as rd_sec/s + wr_sec/s on Linux. The other weird difference is that in HP-UX avwait is just the time spent in the queue which I believe is equal to await – svctm on Linux. I am more accustomed to the HP-UX tool so I needed to get up to speed on the differences.
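As a concrete example of that mapping, the busy Linux device above (await 4.61 ms, svctm 0.03 ms) would show an HP-UX-style queue-only wait of:

```shell
# HP-UX avwait (queue time only) derived from Linux await and svctm
awk 'BEGIN { printf "avwait = %.2f ms\n", 4.61 - 0.03 }'
```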
I was preparing for my weekend patch of our Exadata system and I needed to back up all of our Oracle homes and inventories on our production system. On our 2 node dev and qa clusters I just ran the backups by hand like this:
login as root
cd /u01/app/oracle/product/<version>
tar -cvf - dbhome_1 | gzip > dbhome_1-20150211.tgz
cd /u01/app
cp -r oraInventory oraInventory.20150211
But the production cluster has 12 nodes, so I had to figure out how to use DCLI to run the equivalent on all 12 nodes instead of doing them one at a time. To run a DCLI command you need to go to the directory that has the list of database server host names. So, first you do this:
login as root
cd /opt/oracle.SupportTools/onecommand
The file dbs_group contains a list of the database server host names.
Next, I wanted to check how much space was free on the filesystem and how much space the Oracle home occupied so I ran these commands:
dcli -g dbs_group -l root "df|grep u01"
dcli -g dbs_group -l root "cd /u01/app/oracle/product/11.2.0.4;du -ks ."
The first command gave me how much space was free on the /u01 filesystem on all database nodes. The second command gave me how much space the 11.2.0.4 home consumed. I should have done “du -ks dbhome_1” since I’m backing up dbhome_1 instead of everything under 11.2.0.4, but there wasn’t much else under 11.2.0.4 so it worked out.
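The comparison behind those two commands can be sketched like this. The kilobyte figures are made-up illustrations; in practice free_kb comes from the available-space column of "df|grep u01" and home_kb from the du output.

```shell
# Made-up sizes standing in for the df and du results discussed above.
free_kb=50000000      # available KB on /u01, from: df | grep u01
home_kb=30000000      # size of the Oracle home in KB, from: du -ks dbhome_1

# gzip usually compresses an Oracle home well below half its size, so as a
# rough safety check require free space of at least half the home size.
need_kb=$(( home_kb / 2 ))
if [ "$free_kb" -ge "$need_kb" ]; then
  echo "enough space for backup"
else
  echo "NOT enough space for backup" >&2
fi
```

The half-the-home threshold is just a conservative rule of thumb, not anything Exadata-specific.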
Now that I knew that there was enough space I ran the backup commands using DCLI.
dcli -g dbs_group -l root "cd /u01/app/oracle/product/11.2.0.4;tar -cvf - dbhome_1 | gzip > dbhome_1-20150316.tgz"
dcli -g dbs_group -l root "cd /u01/app;cp -r oraInventory oraInventory.20150316"
I keep forgetting how to do this so I thought I would post it. I can refer back to this later and perhaps it will be helpful to others.
I saw a load of 44 on a node of our production Exadata and it worried me. The AWR report looks like this:
Host CPU
                Load Average
 CPUs     Begin       End     %User   %System      %WIO     %Idle
----- --------- --------- --------- --------- --------- ---------
   16     10.66     44.73      68.3       4.3       0.0      26.8
So, why is the load average 44 and yet the CPU is 26% idle?
I started looking at ASH data and found samples with 128 processes active on the CPU:
select
  sample_time,count(*)
from DBA_HIST_ACTIVE_SESS_HISTORY a
where
  session_state='ON CPU' and
  instance_number=3 and
  sample_time
  between
  to_date('05-MAR-2015 01:00:00','DD-MON-YYYY HH24:MI:SS')
  and
  to_date('05-MAR-2015 02:00:00','DD-MON-YYYY HH24:MI:SS')
group by sample_time
order by sample_time;

SAMPLE_TIME                    COUNT(*)
---------------------------- ----------
05-MAR-15 01.35.31.451 AM           128
... lines removed for brevity
Then I dumped out the ASH data for one sample and found all the sessions on the CPU were running the same parallel query:
select /*+ parallel(t,128) parallel_index(t,128) dbms_stats ...
So, for some reason we are gathering stats on a table with a degree of 128 and that spikes the load. But why does the CPU idle percentage sit at 26.8% when the load starts at 10.66 and ends at 44.73? Best I can tell, the load in DBA_HIST_OSSTAT is a point measurement, not an average over a long period. The 11.2 manual describes load in v$osstat in this way:
Current number of processes that are either running or in the ready state, waiting to be selected by the operating-system scheduler to run. On many platforms, this statistic reflects the average load over the past minute.
So, load could spike at the end of an hour-long AWR report interval and still CPU could average 26% idle for the entire hour? So it seems.
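Some back-of-the-envelope arithmetic shows the numbers can coexist. The ten-minute burst length below is a made-up assumption, not something from the AWR report.

```shell
# If the 128-way stats job kept the CPUs busy for only 10 minutes of the
# 60-minute AWR interval (made-up burst length), the rest of the interval
# could be idle and the hourly average would still look mostly idle even
# though the one-minute load hit 44 at the snapshot.
busy_minutes=10
interval_minutes=60
awk -v b="$busy_minutes" -v i="$interval_minutes" \
  'BEGIN { printf "max possible idle%% for the interval: %.1f\n", 100 * (i - b) / i }'
```

So a burst near the end of the interval explains both a high closing load and a modest hourly idle percentage.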
Sign up for the Arizona Oracle User Group (AZORA) meeting next week: signup url
The email that I received from the meeting organizer described the topic of the meeting in this way:
“…the AZORA meetup on March 18, 2015 is going to talk about how a local business decided to upgrade their Oracle Application from 11i to R12 and give you a first hand account of what went well and what didn’t go so well.”
Description of the speakers from the email:
Becky is the Director of Project Management at Blood Systems located in Scottsdale, AZ. Prior to coming to Blood Systems, Becky was an independent consultant for Tipton Consulting for four years.
Mike is the Vice President of Application Solutions at 3RP, a Phoenix consulting company. Mike has over 10 years of experience implementing Oracle E-Business Suite and managing large-scale projects.
I plan to attend. I hope to see you there too.
My Delphix user group presentation went well today. 65 people attended. It was great to have so much participation.
Here are links to my PowerPoint slides and a recording of the WebEx:
Also, I want to thank two Delphix employees, Ann Togasaki and Matthew Yeh. Ann did a great job of converting my text bullet points into a visually appealing PowerPoint. She also translated my hand drawn images into useful drawings. Matthew did an amazing job of taking my bullet points and my notes and adding meaningful graphics to my text-only slides.
I could not have put the PowerPoint together in time without Ann and Matthew’s help and they did a great job.
Also, for the first time I wrote out my script word for word and added it to the notes on the slides. So, you can see what I intended to say with each slide.
Thank you to Adam Leventhal of Delphix for inviting me to do this first Delphix user group WebEx presentation. It was a great experience for me and I hope that it was useful to the user community as well.
My first blog post was March 5, 2012, three years ago today.
I have enjoyed blogging. Even though I am talking about topics related to my work, blogging does not feel like work. The great thing about blogging is that it’s completely in my control. I control the content and the timetable. I pay a small amount each year for hosting and for the domain name, but the entertainment value alone is worth the price of the site. But it also has career value, because this blog has given me greater credibility both with my employer and outside the company. Plus, I think it makes me better at my job because blogging forces me to put into words the technical issues that I am working on.
It’s been three good years of blogging. Looking forward to more in the future.
I joined Twitter. I don’t really know how to use it. I’m set up as Bobby Durrett, @bobbydurrettdba if that means anything to you.
I mentioned this same effect in an earlier post about SQL profiles: link
I get different plan_hash_value values for the same query each time I run an explain plan or run the query. I see this in queries whose plan includes a system generated temporary segment like this:
|  1 | TEMP TABLE TRANSFORMATION |                             |
...
| 72 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9D668C_764DD84C |
For some reason the system generated temporary table name gets included in the plan_hash_value calculation. This makes plan_hash_value a less than perfect way to compare two plans to see if they are the same.
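One way around this when comparing plans by hand is to mask the generated names before diffing the plan text. This is just a sketch; the two sample plan lines are made up for illustration.

```shell
# Two plan fragments that differ only in the system generated temp table name
# (sample lines made up for illustration).
cat > plan_before.txt <<'EOF'
| 72 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9D668C_764DD84C |
EOF
cat > plan_after.txt <<'EOF'
| 72 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9D6690_764DD84C |
EOF

# Replace any SYS_TEMP_... name with a fixed token before comparing.
normalize() { sed -E 's/SYS_TEMP_[0-9A-F_]+/SYS_TEMP_XXX/g' "$1"; }

if diff <(normalize plan_before.txt) <(normalize plan_after.txt) >/dev/null; then
  echo "plans match (ignoring temp table names)"
else
  echo "plans differ"
fi
```

The same normalization idea could be applied before hashing plan text, giving a comparison that is stable across the generated names.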
Last week I was using my testselect package to test the effect of applying a patch to fix bug 20061582. I used testselect to grab 1160 select statements from production and got their plans with and without the patch applied on a development database. I didn’t expect many, if any, plans to change based on what the patch does. Surprisingly, 115 out of the 1160 select statements had a changed plan, but all the ones I looked at had the system generated temporary table names in their plan.
Now, I am going to take the queries that have different plans with and without the patch and execute them both ways. I have a feeling that the plan differences are mainly due to system generated temp table names and their execution times will be the same with and without the patch.
I’ve run across other limitations of plan hash value as I mentioned in an earlier post: link
I’m still using plan_hash_value to compare plans but I have a list of things in my head that reminds me of cases where plan_hash_value fails to accurately compare two plans.
P.S. After posting this I realized that I didn’t know how many of the 115 select statements with plans that differed with and without the patch had system generated temp tables. Now I know. 114 of the 115 have the string “TEMP TABLE TRANSFORMATION” in their plans. So, really, there is only one select statement for which the patch may have actually changed its plan.
P.P.S. I reapplied the patch and verified that the one sql_id didn’t really change plans with the patch. So, that means all the plan changes were due to the system generated name. Also, all the execution times were the same except for one query that took 50 seconds to parse without the patch and 0 seconds with the patch. So, one of the queries with the system generated temp table name happened to benefit from the patch. Very cool!
P.P.P.S. This was all done on an 11.2.0.4 Exadata system.