Bobby Durrett's DBA Blog
I worked with our support team to improve the performance of a PeopleSoft Financials update statement yesterday. The update statement had an index hint already in it but the index was not the best one of the available indexes.
Here is the original update statement:
UPDATE /*+ INDEX(B PSCPYMNT_VCHR_XREF) */ PS_PYMNT_VCHR_XREF B SET BANK_SETID = :1, BANK_CD = :2, BANK_ACCT_KEY = :3, PYMNT_METHOD = :4, BANK_ACCT_SEQ_NBR = :5, EFT_LAYOUT_CD = :6, STL_THROUGH = :7 WHERE B.REMIT_SETID = 'XYZ' AND B.REMIT_VENDOR = :8 AND B.PYMNT_SELCT_STATUS = 'N' AND B.PYMNT_ID = ' ' AND B.BANK_ACCT_KEY NOT LIKE ('EXP%')
I listed out the columns for the indexes on the table using the “querytuning” part of my standard sql tuning scripts.
Here are the columns for the hinted index:
REMIT_SETID REMIT_VENDOR VNDR_LOC
The where clause includes only the first two columns.
But another similar index, PSEPYMNT_VCHR_XREF, exists with these columns:
REMIT_SETID REMIT_VENDOR PYMNT_SELCT_STATUS
The where clause has all three of these columns. So, why was the original query hinted this way? Does the E index not work better than the C index? I ran this query to see how selective the condition PYMNT_SELCT_STATUS = ‘N’ is.
>select PYMNT_SELCT_STATUS,count(*) 2 from PS_PYMNT_VCHR_XREF B 3 WHERE B.REMIT_SETID = 'XYZ' 4 AND B.REMIT_VENDOR = '12345678' 5 group by PYMNT_SELCT_STATUS; P COUNT(*) - ---------- C 5 N 979 P 177343 X 5485
I included the conditions on the first two columns that both indexes share, but removed the other conditions from the original update. A count on the number of rows that meet the conditions of only these two columns shows how many rows the original index will have to use to check the remaining where clause conditions.
I grouped by PYMNT_SELCT_STATUS to see how many rows met the condition PYMNT_SELCT_STATUS = ‘N’ and how many did not. Grouping on PYMNT_SELCT_STATUS shows how many rows the new index will use to check the remaining conditions in the where clause. I ran this query to see if the second index would use fewer rows than the first.
This query showed that only 979 of the over 180,000 rows met the condition. This made me think that the E index which includes PYMNT_SELCT_STATUS has a good chance of speeding up the original update. I ran a count with a hint forcing the C index and then again forcing the E index:
> >set timing on > >select /*+ INDEX(B PSCPYMNT_VCHR_XREF) */ count(*) 2 from PS_PYMNT_VCHR_XREF B 3 WHERE B.REMIT_SETID = 'XYZ' 4 AND B.REMIT_VENDOR = '12345678' 5 AND B.PYMNT_SELCT_STATUS = 'N' 6 AND B.PYMNT_ID = ' ' 7 AND B.BANK_ACCT_KEY NOT LIKE ('EXP%'); COUNT(*) ---------- 982 Elapsed: 00:13:52.53 > >select /*+ INDEX(B PSEPYMNT_VCHR_XREF) */ count(*) 2 from PS_PYMNT_VCHR_XREF B 3 WHERE B.REMIT_SETID = 'XYZ' 4 AND B.REMIT_VENDOR = '12345678' 5 AND B.PYMNT_SELCT_STATUS = 'N' 6 AND B.PYMNT_ID = ' ' 7 AND B.BANK_ACCT_KEY NOT LIKE ('EXP%'); COUNT(*) ---------- 982 Elapsed: 00:00:01.28
The original hint caused the select count(*) query to run in 13 minutes while the new hint caused it to run in 1 second. Clearly the new E index causes the query to run faster!
The developer that I was working with found the problem update statement in some PeopleCode and was able to edit the hint forcing it to use the better index. We migrated the modified code to production and the user was able to run the update statement without the web site timing out. Prior to the change the user was not able to complete the update because the SQL statement took so long it exceeded our application server timeout.
I finished testing 1000 select statements on our new Exadata X5 to see if they would run faster or slower than on our older Exadata V2. Our current production V2 has 12 nodes and the new X5 has only 2. The memory and parallel server parameters on the X5 are 6 times are large as on the old one, since we have one sixth as many hosts and more than 6 times the memory and CPU per host. I think that memory parameters can sometimes change execution plans, and of course with the newer Exadata software who knows what other differences we might see. I wanted to see if any plan changes or other issues caused some queries to run much slower on our newer Exadata system than the old one. I picked 1000 select statements at random from our current production and tested them comparing plans and execution time. In the end I did not find any bad plan changes and on average the tested select statements ran about 4 times faster on the X5 than on the older V2.
I used my testselect package that I have mentioned in several other posts. Here are some other examples of using this package for performance tuning:
In the other posts I was using the package to test the effect of some change on query plans and performance. So, I was comparing two different situations on the same host. But, in this case I was comparing two different hosts with essentially the same data and settings. But they had different versions of Exadata hardware and larger parameters and fewer nodes on the newer host. Here are the results of my first run with all 1000 statements. I got the execution plan for all 1000 select statements but only executed the ones with different plans. Here were the results:
>execute TEST_SELECT.display_results('X5','V2'); Select statements that ran 3 times faster with X5 than with V2. T1=X5 T2=V2 SQLNUMBER T1_EXECUTE_PLAN_HASH T2_EXECUTE_PLAN_HASH T1_ELAPSED_IN_SECONDS T2_ELAPSED_IN_SECONDS --------- -------------------- -------------------- --------------------- --------------------- 3 287237826 287237826 3 34 4 1245040971 1245040971 1 11 9 36705296 2770058206 4 22 ... edited out most of the lines for brevity ... 997 2423577330 2423577330 0 9 998 2217180459 3921538090 1 13 1000 3842377551 1690646521 2 12 Number of selects=329 Select statements that ran 3 times faster with V2 than with X5. T1=V2 T2=X5 SQLNUMBER T1_EXECUTE_PLAN_HASH T2_EXECUTE_PLAN_HASH T1_ELAPSED_IN_SECONDS T2_ELAPSED_IN_SECONDS --------- -------------------- -------------------- --------------------- --------------------- 95 3919277442 3919277442 0 2 210 3508255766 3508255766 0 2 282 3946849555 3085057493 0 6 347 3278587008 789099618 19 170 375 581067860 460184496 0 3 429 534521834 534521834 1 6 569 3953904703 3484839332 0 2 681 946688683 3451337204 1 6 697 908111030 2971368043 0 1 699 3756954097 1915145267 0 1 706 1121196591 1121196591 0 2 708 581067860 460184496 0 4 797 908111030 2841065272 0 5 950 786005624 2571241212 45 460 966 3151548044 3151548044 1 5 Number of selects=15 Summary of test results TEST_NAME TOTAL_ELAPSED_IN_SECONDS SELECTS_EXECUTED AVERAGE_ELAPSED_IN_SECONDS -------------------- ------------------------ ---------------- -------------------------- X5 5545.9999999999999999999 486 11 V2 21138 486 43
Of the tested statements 329 ran 3 or more times faster on the X5. But 15 selects ran 3 or more times faster on the old V2. So, I needed to test the 15 selects again on both servers.
I’m not sure if it was smart or not, but I decided to run all the selects 5 times in a row to maximize caching. The X5 is new and not in use so there wouldn’t be any activity to stimulate caching. My test script for the X5 looked like this:
truncate table test_results; execute TEST_SELECT.execute_all('X5'); execute TEST_SELECT.execute_all('X5'); execute TEST_SELECT.execute_all('X5'); execute TEST_SELECT.execute_all('X5'); execute TEST_SELECT.execute_all('X5'); execute TEST_SELECT.reexecute_errored('X5'); execute TEST_SELECT.reexecute_errored('X5'); execute TEST_SELECT.reexecute_errored('X5'); execute TEST_SELECT.reexecute_errored('X5'); execute TEST_SELECT.reexecute_errored('X5');
After we made sure that the system had cached everything, all 15 selects ran, on average, 4 times faster on the X5 than the V2:
TE SQLNUMBER SQL_ID EXPLAIN_PLAN_HASH EXECUTE_PLAN_HASH ROWS_FETCHED ELAPSED_IN_SECONDS CPU_USED_BY_THIS_SESSION CONSISTENT_GETS DB_BLOCK_GETS PARSE_TIME_ELAPSED PHYSICAL_READS ERROR_MESSAGE -- ---------- ------------- ----------------- ----------------- ------------ ------------------ ------------------------ --------------- ------------- ------------------ -------------- ---------------------------------------------------------------- X5 95 54a8k0yhbgyfq 3919277442 1 0 12 2583 14 0 1 V2 95 54a8k0yhbgyfq 3919277442 1 1 15 2583 14 1 1 V2 210 b132ygmp743h4 3508255766 0 2 19 1592 14 0 1 X5 210 b132ygmp743h4 3508255766 0 2 8 1430 14 0 1 V2 282 aw5f12xsa8c2h 3946849555 0 0 14 3468 14 0 2 X5 282 aw5f12xsa8c2h 3946849555 0 0 8 3322 14 2 2 V2 347 8ncbyjttnq0sk 3278587008 1 3 462 1203794 14 0 61838 X5 347 8ncbyjttnq0sk 3278587008 1 2 206 1126539 14 4 51849 X5 375 4yq5jkmz2khv5 581067860 0 0 9 14530 14 0 2 V2 375 4yq5jkmz2khv5 581067860 0 0 19 14686 14 1 2 V2 429 49pyzgr4swm4p 534521834 0 2 11 1814 14 0 0 X5 429 49pyzgr4swm4p 534521834 0 0 5 1638 14 1 0 X5 569 3afmdkmzx6fw8 630418386 694 0 74 70173 14 3 0 V2 569 3afmdkmzx6fw8 3527323087 694 1 73 68349 14 0 3588 V2 681 dyufm9tukaqbz 668513927 0 0 10 6298 14 0 2 X5 681 dyufm9tukaqbz 3317934314 0 0 8 6096 14 0 2 V2 697 1fqc3xkzw8bhk 908111030 0 0 3 1406 14 0 1 X5 697 1fqc3xkzw8bhk 908111030 0 0 2 1406 14 0 1 V2 699 03qk2cjgr4q2k 1915145267 31 0 476 95922 14 1 0 X5 699 03qk2cjgr4q2k 1915145267 31 0 272 96299 14 0 0 V2 706 28fnjtdhjqwrg 1121196591 0 0 21 1355 14 0 4 X5 706 28fnjtdhjqwrg 1121196591 0 0 13 1355 14 0 4 V2 708 2yrkwqs46nju0 581067860 0 0 14 14684 14 0 0 X5 708 2yrkwqs46nju0 581067860 0 0 9 14528 14 0 0 V2 797 dc5481yn8pm85 908111030 0 0 3 1407 14 0 2 X5 797 dc5481yn8pm85 908111030 0 0 2 1407 14 0 2 V2 950 by6n1m74j82rt 786005624 6 7 2087 249736 14 1 245443 X5 950 by6n1m74j82rt 2571241212 6 0 186 90897 14 0 3 X5 966 5c2n74gfrxwxx 3151548044 12 0 24 116360 14 9 84949 V2 966 5c2n74gfrxwxx 3151548044 12 0 52 119701 14 1 88002
The summary of the results:
Select statements that ran 3 times faster with X5 than with V2. T1=X5 T2=V2 SQLNUMBER T1_EXECUTE_PLAN_HASH T2_EXECUTE_PLAN_HASH T1_ELAPSED_IN_SECONDS T2_ELAPSED_IN_SECONDS --------- -------------------- -------------------- --------------------- --------------------- 95 3919277442 3919277442 0 1 429 534521834 534521834 0 2 569 630418386 3527323087 0 1 950 2571241212 786005624 0 7 Number of selects=4 Select statements that ran 3 times faster with V2 than with X5. T1=V2 T2=X5 SQLNUMBER T1_EXECUTE_PLAN_HASH T2_EXECUTE_PLAN_HASH T1_ELAPSED_IN_SECONDS T2_ELAPSED_IN_SECONDS --------- -------------------- -------------------- --------------------- --------------------- Number of selects=0 Summary of test results TEST_NAME TOTAL_ELAPSED_IN_SECONDS SELECTS_EXECUTED AVERAGE_ELAPSED_IN_SECONDS -------------------- ------------------------ ---------------- -------------------------- X5 4 15 0 V2 16 15 1
I guess it is no surprise that the X5 is faster than the five-year older V2. But, I thought it was a good example of how to use my testselect package to do see how a set of queries will run in two different situations.
You can find the full length video of my Delphix talk that I did at OakTable World on Tuesday here: url
Also, the OakTable folks have updated the OakTable World agenda page with video of all the talks. This has lots of good material and for free. Scroll down to the bottom of the page to find the links to the videos.
This morning was my last day of Oracle OpenWorld sessions and this afternoon and evening finished off my day with Delphix Sync.
The first talk was my only NoSQL talk. It was interesting because the claim was that NoSQL was good for large numbers of simple transactions. This seems to be a theme across a couple of sessions. The funny thing is that the NoSQL code reminded me of my pre-SQL mainframe Datacom DB database programming days. You specified the table and the index and fetched rows etc. You are the optimizer! Of course, you can do the same with simple one table queries in SQL. But, Oracle’s NoSQL may have some concurrency modes that Oracle’s main RDBMS doesn’t have for what that’s worth. The fun thing was that they had examples using Python and I’ve taken Python on Edx so I could read the code. Also, they talked about the REST API and I had done a few REST commands with JSON working through a demo of the Oracle database cloud a few weeks back. So, there were synergies with things I already know.
Next I went to this packed session by someone from Tumbler describing their approach to sharding and scaling. The two packed sessions I went to this week were both MySQL sessions and both by internet companies – Tumbler and Ticketmaster. They were in kind of small rooms and it was a little warm and stuffy. But, I found both very interesting. Supporting large web apps is a pretty cool proposition. Something that in another life would be fun to work on.
Next I went to a PeopleSoft session. I’ve done PeopleSoft for 20 years and I’m bored with it but I figure I should keep up with the latest. It was actually more of a functional presentation on modules that I have never used so most of the information was of no use to me. But, the new Fluid User Interface that I had never seen before interested me so I stayed long enough to get a feel for it. It seems that Oracle built it for tablets and maybe smart phones.
Next it was off to the hip (or should I say hipster :)) Hotel Zetta for Delphix Sync. It was a very cool event with a fun venue and lots of good snacks. No dinner for me tonight. I got a chance to do a ten minutes lightning talk that I built from three slides from Tuesday’s presentation. I got positive feedback but I felt kind of intimidated by all the Delphix techies. There were a lot of Delphix leaders and developers present as well as a number of people from larger customers. I heard a great talk on Delphix performance and other customers and Delphix employees spoke as well. I learned a lot and it makes me think I need to delve back into our Delphix environment and give it a thorough check out.
So, my OpenWorld/Delphix Sync week is over and I am beat. Like always these conferences leave me with information overload. I’m back to the prioritization thing that always dogs my step. There is just too much to learn and do. Where do I put my time? We shall see.
Well, it was a long day but it ended in a fun way.
Today I was back to the normal OpenWorld sessions starting with the general session. It was eye-opening because the speakers described a new CPU chip that they were using in their latest servers. It had some custom elements to support database processing. It was strange because I have recently been studying the latest Intel x86 documentation and it was interesting to compare Intel’s chips with the latest Sun/Oracle chip. I had read about the specialized SIMD instructions in the x86 family that Intel uses to speed up graphics. So, I was not surprised that Oracle is including more complex additions to their new chip with specialized instructions. Still, I question whether people are really going to buy anything but Intel x86 at this point due to the price/performance.
Next I went to a session describing the way a company used a tool called Chef to manage their Weblogic deployments. The session topic interested me because we also use Weblogic at US Foods. But it was a little hard to follow. Maybe it would have helped if I had been exposed to Chef before hearing the talk. Still, it was good to know that there are tools out there to help automate deployment of new systems.
Next I caught a PeopleSoft in the Cloud talk. It seems that you will install the latest version of PeopleTools in a very different way than you did in the past. I got the feeling that this was just a part way step toward fully setting up PeopleSoft to run in the cloud.
Then I went to a really cool talk about how Ticketmaster sells 20,000 tickets in one minute. It was about their MySQL architecture. They have a large farm of MySQL servers supporting the queries behind their ticketing web site. But, they use Oracle for the updates related to ticket purchases.
Then I went to a talk on Oracle ZFS. I get the feeling that I need to learn more about ZFS. It seems that ZFS is an ancestor of Delphix and I know that there is a free OpenZFS that I might play with. I think that Tim Gorman, who works for Delphix, mentioned something about OpenZFS at his Ted talk at Oak Table World Tuesday so there may be some relationship.
Lastly I went to a talk about how you can use Oracle’s Enterprise Manager to support both on site and cloud databases. It sounds good but I think it still need to mature over time to support the cloud systems more fully.
Then at 5:30 pm I went to a fun bloggers party sponsored by Pythian and the Oracle Technology Network (OTN). I’m not a big party person but I had a good time. It was easy to strike up a conversation with people since we had a lot in common.
Anyway, enough for today. One more day Thursday and then my brain will overflow.
Instead of going to the normal OpenWorld events today I went to OakTable World. Now my brain is fried from information overload.
It started at 8 am with a nice talk about hash joins and Bloom filters. Toon Koppelaars had some nice moving graphics showing how bloom filters work. I’ve studied Bloom filters before but I’m not sure I understood it with the clarity that I had after this talk.
Then I did my talk at 9 am. The best part for me was that we had a number of questions. I ended up skipping several slides because of time but I felt like we helped people get what they wanted out of it by having the questions and discussion. In retrospect my talk could have used more of an introduction to Delphix itself for this audience but I think we covered the essentials in the end.
Next Kellyn Pot’Vin-Gorman did more of a community service type of talk which was a change of pace. She had a Raspberry Pi project which was a stuffed bear that would take your picture and post it on Twitter. It was an example of the type of project that kids could do to get them interested in computer technology.
My brain began to turn to mush with Marco Gralike’s talk on XML and 12c In-Memory Column store. I’m sure I’m absorbed something but I’m not that familiar with Oracle’s XML features. Still, at least I know that there are in memory features for XML which I can file away for the future.
Several amusing 10 minute Ted talks followed. Most notable to me was Jonathan Lewis’ talk about how virtual columns and constraints on virtual columns could improve cardinality estimates and thus query performance.
Cary Millsap talked about a variety of things including things like what he covered in his book. He shared how he and Jeff Holt were hacking into what I assume is the C standard library to diagnose database performance issues, which was pretty techy.
Gwen Shapira’s talk on Kafka was a departure from the Oracle database topics but it was interesting to hear about this sort of queuing or logging service. Reminds me in some ways of GGS and Tibco that we use at work but I’m sure it has different features.
Alex Gorbachev gave a high level overview of Internet of Things architectures. This boiled down to how to connect many possibly low power devices to something that can gather the information and use it in many ways.
Lastly, we went back to the Oracle database and my brain slowly ground to a halt listening to Chris Antognini’s talk on Adaptive Dynamic Sampling. I had studied this for my OCP but it has slowly leaked out of my brain and by 4 pm I wasn’t 100% efficient. But, I got a few ideas about things that I can adjust when tuning this feature.
Anyway, brief overview. I’m back to normal OpenWorld tomorrow but it was all OakTable today. It was a good experience and I appreciated the chance to speak as well as to listen.
It was a good first full day at Oracle OpenWorld. It started with the keynote led by Oracle’s CEO. Of course he was very upbeat about Oracle’s products. But, I found his comments about the economy and the way it affects IT spending more interesting than Oracle’s cloud products. My translation is that companies have a lot of older systems that they can’t easily retire or upgrade but they want to quickly add all this new functionality. I see that in my company so it rings true. I don’t really believe that the cloud is the long-term answer but it makes me wonder what the real answer is. I always come back to prioritization. I think prioritizing spending is more important than moving things to the cloud. You can’t afford to do everything so you have to make tough choices about what to spend your IT dollars on. That’s my opinion at least.
Next I went to a session on Coherence. I kind of went out of a sense of obligation since our company owns the product. But, it was a surprisingly good session. It had a lot in it about Java 8 and the new features in it for parallel processing. It made me want to dust off my Java skills and generally think about parallel processing in the Oracle database and how it relates to that in Hadoop, Scala, etc.
I went to two sessions on analytics, again out of a sense that I needed to learn about analytics and not due to any enthusiasm about it. The first session was really interesting, but the 3:30 session almost literally put me to sleep. The first session reminded me of some of the things in Craig Shallahamer’s forecasting book such as making a model of a system and doing validation of the model. Analytics seems to follow a similar process. But, by the late afternoon a non-technical session on analytics in banking nearly knocked me out.
Wedged between my two analytics sessions I went to a very cool In-Memory Option boot camp. I have not had the time or taken the time to look at the In-Memory Option and I got a nice fast hour-long exposure to it. I don’t know if the other people in the class were lost because there were a lot of explain plans flying by but it is the type of stuff I’m interested in so it was nice that it was so technical. The In-Memory Option reminded me a lot of Exadata smart scans and hybrid columnar compression.
Strangely enough multiple speakers pronounced columnar differently than I have done so I guess I will have to change. They emphasize the second syllable but I usually emphasize the first.
I also snuck in to the OakTable World presentation by Tanel Poder. It had to do with querying Hadoop clusters from Oracle databases using odbc/jdbc. Pretty cool. I also got to scope out the venue for my talk tomorrow in the process.
That’s enough for today. I got a lot of good information. I’ve slotted tomorrow for OakTable world so it will be interesting to see what all I can learn there.
I just finished reviewing my Delphix blog posts in preparation for the talks that I will give during OpenWorld. I find myself referring back to my blog to help me remember what I have done in the past. I was thinking that I needed to jog my memory so that I could answer questions when I give my talks. Some of the Delphix topics that I am speaking about occurred a year or two ago so my memories are fuzzy. But, reading my own posts brought a lot of the details back.
I thought I would list the Delphix posts, even though people can find these just by searching my web site. If you are coming to my talk and want more details or if you just want all the Delphix related information on my site I have collected it here.
I have two talks scheduled during OpenWorld next week.
The first talk is at OakTable World at 9 am on Tuesday.
The second talk is at Delphix Sync between 3 and 4 pm Thursday. The second talk is a ten minute “lightning talk” version of the first.
I hope to see you at one of the talks and if not I hope this post is valuable to you.
Here is a YouTube video of a short Toastmasters talk of mine about a DBA topic. It was probably too technical for the audience but may interest the readers of this blog.
Here was a post about a similar topic:
I will describe our journey as a new Delphix customer with its ups and downs. I tried to have the spirit of a user group talk where you get a real person’s experience that you might not get from a more sales oriented vendor presentation.
Kyle Hailey, a OakTable member and Delphix employee, will host my talk. I have been very impressed by Kyle’s technical knowledge and he will be with me to answer questions about Delphix that I could not answer. I think it will be a good combination of my real world user experience and his depth of technical background in Delphix and Oracle performance tuning.
If you are going to OpenWorld and if you want to know more about Delphix come check it out. Also, feel free to email me or post comments here if you have any questions about what the talk will cover.
The Arizona Oracle User Group has moved tomorrow’s meeting to Oracle’s offices on Camelback road:
Tuesday, September 15, we will be hosting the Arizona Oracle User Group on site at my employer US Foods at our office in Tempe. The topic will be Oracle Enterprise Manager 12c. Here is the meeting invite url: invite.
The invitation describes the content like this:
Join us for AZORA’s September meetup at US Foods. First, we’ll have Raj Chotalla from Intel sharing his team’s Oracle EM12c implementation strategy, architecture, and best-known methods. Next up will be Oracle’s Keith Anthony Marshall and Eric Dennis answering questions and providing live demos showing the power of Oracle Enterprise Manager for both Fusion Middleware and Oracle Database.
I hope to have some coworkers attend that can share US Foods experience with EM 12c as well maybe during the question and answer time.
I’m looking forward to it and I hope to see you there.
This morning a colleague of mine noticed an alert from our plan change monitor and prevented any impact on the users by forcing a SQL’s plan to its normal fast plan before the users could be affected. I want to share the monitor script and describe how we use its output.
I know this is long but I want to paste the SQL text of the monitor script here. Review my comments on each step. I can not remember the details of the script myself even though I wrote it but the comments hopefully will help:
set linesize 160 set pagesize 1000 set echo on set termout on set trimspool on set define off -- status active -- this plan isn't the most commonly -- run plan. -- average this plan in v$sql > 10 x average of -- most common plan -- status active sessions -- sql_id and plan_hash_value -- elapsed and executions -- max elapsed for the sql -- eliminate pl/sql which has -- plan hash value of 0 drop table active_sql; create table active_sql as select vs.sql_id, sq.plan_hash_value, max(sq.ELAPSED_TIME) elapsed, max(sq.executions) executions from v$session vs, v$sql sq where vs.sql_id=sq.sql_id and vs.SQL_CHILD_NUMBER=sq.child_number and vs.status='ACTIVE' and sq.plan_hash_value <> 0 group by vs.sql_id, sq.plan_hash_value; -- to get the most frequently -- used plan first get the number -- of exections by plan -- for each of the active sqls drop table plan_executions; create table plan_executions as select ss.sql_id, ss.plan_hash_value, sum(ss.executions_delta) total_executions from dba_hist_sqlstat ss, active_sql a where a.sql_id=ss.sql_id group by ss.sql_id, ss.plan_hash_value; -- use the previous table to get -- the plans that are most frequently -- used. note that two plans could -- have the same number of executions -- but this is unlikely. drop table most_frequent_executions; create table most_frequent_executions as select pe1.sql_id, pe1.plan_hash_value from plan_executions pe1 where pe1.total_executions = (select max(pe2.total_executions) from plan_executions pe2 where pe1.sql_id=pe2.sql_id); -- handle special case of two plans with -- same number of executions. -- pick one with highest plan value -- just to eliminate dups. drop table most_frequent_nodups; create table most_frequent_nodups as select mfe1.sql_id, mfe1.plan_hash_value from most_frequent_executions mfe1 where mfe1.plan_hash_value = (select max(mfe2.plan_hash_value) from most_frequent_executions mfe2 where mfe1.sql_id=mfe2.sql_id); -- get list of active sql that -- are not running the most -- frequently executed plan drop table not_most_freq; create table not_most_freq as select * from active_sql where (sql_id,plan_hash_value) not in (select sql_id,plan_hash_value from most_frequent_nodups); -- working on this part of the logic: -- average this plan in v$sql > 10 x average of -- most common plan -- get average elapsed of most -- frequently executed plans -- add 1 to handle case of 0 executions drop table avg_elapsed_most_f; create table avg_elapsed_most_f as select ss.sql_id, ss.plan_hash_value, sum(ss.ELAPSED_TIME_DELTA)/(sum(ss.executions_delta)+1) avg_elapsed from DBA_HIST_SQLSTAT ss, most_frequent_nodups nd where ss.sql_id = nd.sql_id and ss.plan_hash_value = nd.plan_hash_value group by ss.sql_id, ss.plan_hash_value; -- get list of the sqls that are running -- the plan that isn't most frequently -- executed and has an average elapsed -- more than 10 times the average of -- the most frequently executed plan -- add 1 to executions to prevent -- divide by zero drop table more_than_10x; create table more_than_10x as select n.sql_id, n.plan_hash_value from not_most_freq n, avg_elapsed_most_f m where (n.elapsed/(n.executions+1)) > 10 * m.avg_elapsed and n.sql_id=m.sql_id; spool planchangemonitor.log select name db_name from v$database; -- The listed sql_id and plan_hash_value items correspond to -- sql statements that have plans that may be -- inefficient and need to be investigated. -- The session id and username are included if a -- session is currently running the sql with the plan. select 'CHANGED '||'PLAN' flag, m.sql_id, m.plan_hash_value, s.sid, s.username from more_than_10x m, v$session s, v$sql q where m.sql_id=s.sql_id(+) and m.plan_hash_value=q.plan_hash_value(+) and s.sql_id=q.sql_id and s.SQL_CHILD_NUMBER=q.child_number order by m.sql_id, m.plan_hash_value, s.sid, s.username; spool off
If I remember correctly I think the script looks for sessions running a plan whose current run time is 10 times that of the most frequently executed plan. This script is not perfect. The join to v$sql is not perfect and in some cases you can get duplicates. People could quibble about the approach. Why 10 times the previous run time? I thought about more complex approaches but I just needed to get something in place. But, on one database with a lot of small transactions we have made good use of this script, despite its flaws.
This morning my colleague noticed emails from the script that runs this SQL listing a query whose plan had changed. The output looked like this:
FLAG SQL_ID PLAN_HASH_VALUE SID USERNAME ------------ ------------- --------------- ---------- --------- CHANGED PLAN 75ufmwrcmsuwz 2484041482 35 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 394 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 395 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 446 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 463 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 464 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 544 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 613 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 631 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 665 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 678 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 738 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 746 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 750 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 752 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 1333 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 1416 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 1573 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 1943 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 1957 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 3038 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 3445 YOURUSER CHANGED PLAN 75ufmwrcmsuwz 2484041482 3816 YOURUSER
I changed the real user to YOURUSER. This output indicates that 23 sessions were all running the same SQL – sql_id=75ufmwrcmsuwz – and that this SQL was running on a new plan that was causing the SQL to run at least 10 times the normal run time. In fact it was about 30 times as long.
To resolve the issue my colleague used our script to find the history of plans for 75ufmwrcmsuwz.
select ss.sql_id, 2 ss.plan_hash_value, 3 sn.END_INTERVAL_TIME, 4 ss.executions_delta, 5 ELAPSED_TIME_DELTA/(executions_delta*1000) "Elapsed Average ms", 6 CPU_TIME_DELTA/(executions_delta*1000) "CPU Average ms", 7 IOWAIT_DELTA/(executions_delta*1000) "IO Average ms", 8 CLWAIT_DELTA/(executions_delta*1000) "Cluster Average ms", 9 APWAIT_DELTA/(executions_delta*1000) "Application Average ms", 10 CCWAIT_DELTA/(executions_delta*1000) "Concurrency Average ms", 11 BUFFER_GETS_DELTA/executions_delta "Average buffer gets", 12 DISK_READS_DELTA/executions_delta "Average disk reads", 13 ROWS_PROCESSED_DELTA/executions_delta "Average rows processed" 14 from DBA_HIST_SQLSTAT ss,DBA_HIST_SNAPSHOT sn 15 where ss.sql_id = '75ufmwrcmsuwz' 16 and ss.snap_id=sn.snap_id 17 and executions_delta > 0 18 and ss.INSTANCE_NUMBER=sn.INSTANCE_NUMBER 19 order by ss.snap_id,ss.sql_id; SQL_ID PLAN_HASH_VALUE END_INTERVAL_TIME EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average rows processed ------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------- 75ufmwrcmsuwz 2297146707 30-AUG-15 09.00.54.472 AM 830 587.207673 49.9638554 541.617188 0 0 0 5234.01928 125.083133 332.66747 75ufmwrcmsuwz 2297146707 30-AUG-15 10.00.46.872 AM 1178 578.205867 49.3972835 532.377174 0 0 0 4870.22326 126.048387 324.50764 75ufmwrcmsuwz 2297146707 30-AUG-15 11.00.39.206 AM 1433 631.484713 49.1486392 585.826676 0 0 0 4624.11305 125.446615 299.57083 75ufmwrcmsuwz 2297146707 30-AUG-15 12.00.32.247 PM 1620 592.593823 49.5987654 546.29731 0 0 0 4744.17284 121.961728 312.735185 75ufmwrcmsuwz 2297146707 30-AUG-15 01.00.20.783 PM 1774 534.412339 51.059752 485.46836 0 0 0 4983.44983 119.564825 326.067644 75ufmwrcmsuwz 2297146707 30-AUG-15 02.00.08.843 PM 1757 447.385105 44.9345475 404.415659 0 0 0 4525.13147 107.277746 293.739328 75ufmwrcmsuwz 2297146707 30-AUG-15 03.00.57.308 PM 1626 431.718507 45.904059 388.200416 0 0 0 4462.93296 118.027675 300.724477 75ufmwrcmsuwz 2297146707 30-AUG-15 04.00.44.209 PM 1080 375.905966 44.212963 334.434835 0 0 0 4766.81574 109.157407 310.712037 75ufmwrcmsuwz 2297146707 30-AUG-15 05.00.36.753 PM 707 368.289475 44.3140028 327.166223 0 0 0 4894.20509 108.050919 315.565771 75ufmwrcmsuwz 2297146707 30-AUG-15 06.00.26.449 PM 529 341.483588 39.6408318 305.47356 0 0 0 4381.19849 96.2646503 288.030246 75ufmwrcmsuwz 2297146707 30-AUG-15 07.00.17.636 PM 356 380.733635 41.5168539 342.034876 0 0 0 4553.4691 105.272472 292.283708 75ufmwrcmsuwz 2297146707 30-AUG-15 08.00.11.170 PM 313 435.449406 37.1565495 402.636489 0 0 0 4144.30351 92.8690096 264.923323 75ufmwrcmsuwz 2297146707 30-AUG-15 09.00.07.894 PM 214 516.455509 44.5794393 477.020692 0 0 0 4567.67757 114.415888 289.607477 75ufmwrcmsuwz 2297146707 30-AUG-15 10.00.59.991 PM 182 720.749681 44.3956044 684.439467 0 0 0 3811.83516 95.2362637 239.027473 75ufmwrcmsuwz 2297146707 30-AUG-15 11.00.47.388 PM 83 1043.1503 43.7349398 1008.41358 0 0 0 3575.96386 114.289157 250.120482 75ufmwrcmsuwz 2484041482 30-AUG-15 11.00.47.388 PM 6 25314.6558 4311.66667 22971.4913 0 0 0 78533.8333 69813.3333 157.833333 75ufmwrcmsuwz 2484041482 31-AUG-15 12.00.36.033 AM 96 25173.7346 5105.20833 21475.9516 0 0 0 135242.802 62433.3125 118.395833 75ufmwrcmsuwz 2484041482 31-AUG-15 01.00.29.070 AM 39 26877.0626 5540.51282 22977.6229 0 0 0 139959.308 68478.1795 93.7179487 75ufmwrcmsuwz 2484041482 31-AUG-15 02.00.18.755 AM 38 26993.419 5998.15789 22768.4285 0 0 0 153843.342 74492 149.342105 75ufmwrcmsuwz 2484041482 31-AUG-15 03.00.09.615 AM 29 25432.5074 4952.06897 22288.7966 0 0 0 112813.552 69803.0345 187.689655 75ufmwrcmsuwz 2484041482 31-AUG-15 04.00.01.749 AM 34 27281.7339 4541.47059 24543.1609 0 0 0 95144.5 69187.3824 135.676471 75ufmwrcmsuwz 2484041482 31-AUG-15 05.00.55.876 AM 146 30512.9976 5421.43836 26984.2559 0 0 0 115531.801 71886.6644 136.321918 75ufmwrcmsuwz 2484041482 31-AUG-15 06.00.50.769 AM 405 24339.6641 4853.40741 20794.0472 0 0 0 115490.01 62004.4642 229.106173 75ufmwrcmsuwz 2484041482 31-AUG-15 07.00.43.047 AM 838 27552.3731 4903.06683 23661.2101 0 0 0 111654.558 58324.9511 176.058473 75ufmwrcmsuwz 2484041482 31-AUG-15 08.00.50.864 AM 1653 30522.8358 4550.56261 26526.2183 0 0 0 93818.418 49865.4701 137.212341
Based on this output my colleague chose 2297146707 as the good plan. She ran coe_xfr_sql_profile.sql which is found in SQLT’s utl directory like this:
coe_xfr_sql_profile.sql 75ufmwrcmsuwz 2297146707
Then she ran the generated script:
This forced the plan back to its earlier efficient plan and my coworker did all of this early this morning before we reached our peak time of usage and before the bad plan could affect our users.
P.S. This example is from an 126.96.36.199 database running on HP-UX Itanium.
The class is completely free and taught at a very high level of quality.
It teaches computer science concepts that apply in any programming language but also teaches Python programming.
It is valuable information in the increasingly computer oriented world and economy and the class is free which is remarkable given its quality.
Here is the class name:
MITx: 6.00.1x Introduction to Computer Science and Programming Using Python
Development asked me to look at ways to cut the run time of a series of PeopleSoft payroll batch jobs so I took an AWR report of the entire 4 hour period. Based on the waits, the percentage of the elapsed time spent using I/O and the buffer pool advisory I chose to double the size of the buffer cache. But, this added memory did not improve the run time of the batch jobs. Maybe the affected blocks are only read into memory once so they would not get cached no matter how big the buffer pool was.
Here is the original run on June 22 with the original memory settings:
Cache SizesBegin End Buffer Cache: 3,328M 3,424M Std Block Size: 8K Shared Pool Size: 1,600M 1,520M Log Buffer: 7,208K
Top 5 Timed Foreground EventsEvent Waits Time(s) Avg wait (ms) % DB time Wait Class db file sequential read 1,910,393 10,251 5 72.03 User I/O DB CPU 2,812 19.76 log file sync 35,308 398 11 2.80 Commit resmgr:cpu quantum 31,551 62 2 0.43 Scheduler db file scattered read 7,499 60 8 0.42 User I/O Buffer Pool Advisory
- Only rows with estimated physical reads >0 are displayed
- ordered by Block Size, Buffers For Estimate
In the SQL ordered by Elapsed Time report the top batch job SQL was 99.14% I/O
Based on this report it seems that the number of physical reads could be reduced to about 20% what they were on June 22 by doubling the size of the buffer cache. But, adding the memory did not cut the number of physical reads in any major way.
Here is yesterday’s run:
Cache SizesBegin End Buffer Cache: 6,848M 6,816M Std Block Size: 8K Shared Pool Size: 3,136M 3,136M Log Buffer: 16,572K
Top 5 Timed Foreground EventsEvent Waits Time(s) Avg wait (ms) % DB time Wait Class db file sequential read 1,789,852 10,173 6 72.15 User I/O DB CPU 2,970 21.06 log file sync 37,562 200 5 1.42 Commit resmgr:cpu quantum 24,996 59 2 0.42 Scheduler db file scattered read 5,409 54 10 0.38 User I/O Buffer Pool Advisory
- Only rows with estimated physical reads >0 are displayed
- ordered by Block Size, Buffers For Estimate
After the memory add the same top batch job SQL was 98.80% I/O. Some improvement but not nearly as much as I expected based on the buffer pool advisory.
I guess the moral of the story is that the buffer pool advisory does not apply to specific workloads and is only a general guideline. Maybe this is the same kind of fallacy that you have with buffer cache hit ratios where certain workloads make the ratio irrelevant. Here were the hit ratios: Before 98.59% After 98.82%. Basically these are the same.
I just thought I would share this to document a real case of using the buffer pool advisory and having it not produce the expected results.
We keep 6 weeks of history in the AWR on our databases, but I want to capture some information for long-term trending. What I really want to do is capture some metrics and put them in some database tables to use to generate reports, but I have not had time to build the scripts to do that. So, instead I built a simple set of scripts to capture an AWR for the previous month. Since we have 6 weeks of history if I run my report in the first week of a month all the days of the previous month should still be in the AWR. I have just finished building this script so I can not promise that there is value in keeping monthly AWR reports but I thought it was worth sharing it. Maybe something in the code will be useful to someone. Here is the script:
-- Has to be run in the first week of the month so the entire -- previous month is available. We keep 6 weeks of awr history. -- setup columns for snapshots column bsnap1 new_value bsnap1s noprint; column esnap1 new_value esnap1s noprint; column filenm new_value filenms noprint; -- get snap id for first day of previous month select min(snap_id) bsnap1 from dba_hist_snapshot where extract(month from END_INTERVAL_TIME)= extract(month from (sysdate-to_number(to_char(sysdate,'DD')))) and STARTUP_TIME= (select max(STARTUP_TIME) from dba_hist_snapshot where extract(month from END_INTERVAL_TIME)= extract(month from (sysdate-to_number(to_char(sysdate,'DD'))))); -- get snap id for last day of previous month select max(snap_id) esnap1 from dba_hist_snapshot where extract(month from END_INTERVAL_TIME)= extract(month from (sysdate-to_number(to_char(sysdate,'DD')))); -- get html file name select name|| '_'|| to_char(extract(month from (sysdate-to_number(to_char(sysdate,'DD')))))|| '_'|| to_char(extract(year from (sysdate-to_number(to_char(sysdate,'DD')))))|| '.html' filenm from v$database; -- get awr report define report_type='html'; define begin_snap = &bsnap1s; define end_snap = &esnap1s; define report_name = '&filenms'; define num_days = 0; @@$ORACLE_HOME/rdbms/admin/awrrpt.sql undefine report_type undefine report_name undefine begin_snap undefine end_snap undefine num_days
If the database bounced during the previous month we get the last set of snapshots after the last bounce.
I am not sure whether this approach will give us any benefits but I think it may help to show how to use queries to pick begin and end snapshots and then run an AWR report.
The tricky part of the code is this:
extract(month from (sysdate-to_number(to_char(sysdate,'DD'))))
It just returns the previous month as a number. It is August now so here is what it returns today:
SQL> select 2 extract(month from (sysdate-to_number(to_char(sysdate,'DD')))) 3 prev_month 4 from dual; PREV_MONTH ---------- 7
sysdate-to_number(to_char(sysdate,’DD’)) is the last day of the previous month:
SQL> select sysdate-to_number(to_char(sysdate,'DD')) last_day 2 from dual; LAST_DAY --------- 31-JUL-15
I applied the current July patch sets to a 11.2 and a 12.1 test database. Now I have a 188.8.131.52.7 and a 184.108.40.206.4 test database. It is helpful to have test databases that are on the most current patch sets and releases. If I see unexpected behavior on some other database I can try the same thing on the patched test databases to see if some patch changed the behavior to what I expect. Also, our production databases are all on 220.127.116.11 or earlier releases so I can check whether the new fully patched 12.1 release has different behavior than our older systems.
Here are the patch numbers:
6880880 – current version of opatch
20760982 – 18.104.22.168.7
20831110 – 22.214.171.124.4
My test environments are on x86-64 Linux.
I registered myself for Oracle OpenWorld and I have my hotel reserved and my flights ticketed.
I think it has been over 12 years – probably more like 15 years – since I went to OpenWorld. I went at least once between December 1994 and November 2003 when I still lived in Florida and was working on Oracle databases. But since I moved from Florida I do not believe that I have been to the conference. I have presented at Collaborate and ECOUG conferences since then. I’m thinking that maybe next year I will try to present at the RMOUG conference. I live in Arizona so RMOUG is close. ECOUG was a nice distance when I still lived near the East Coast. I like the smaller conferences and I have a better shot at getting a presentation accepted there.
But, this year it is OpenWorld and I am looking forward to it. I may get a chance to interact with some Delphix employees and customers. Also, I’m hoping to check out some technical presentations by the Oak Table members. And it does not hurt to hear from Oracle itself on its technology. No doubt there will be many of Oracle’s top technical leaders presenting. And, any interaction I get with fellow DBA’s will be great. It is always good to hear from people about their own experiences which may differ from mine.
Anyway, I’m all booked for OpenWorld. Hope to see you there.
I just finished the last program for a computer science class on edX and I urge you to try it.
I took this class:
MITx: 6.00.1x Introduction to Computer Science and Programming Using Python
I was more interested in how MIT taught the class than in the material itself because I already know the subjects covered.
The class taught the basics of programming – expressions, variables, loops, if statements, and functions.
It also had a large focus on bisection or binary search and the performance benefits of this type of search over sequentially reading through a list.
It also covered lists, hash tables, trees, stacks, and queues.
It discussed object-oriented programming.
The class concluded with the professor stating that the programming and computer science skills taught in this class are key to advancing your career, even if you do not work in a computer related job.
I interacted with a number of students in the class and found some that were in other fields and were having success taking the class. Others were in business computing or IT and yet did not have a computer science background so they were good programmers but learning new concepts. Many struggled with the class but, it is free, and is given often. The class starts up again August 26th. Nothing stops you from taking it multiple times.
I tried to think about whether I should recommend this class to the people I work with as a method of helping develop my coworkers that do not have experience in these areas. At first I thought that the subject is too academic and has no connection to their jobs. But, after thinking about it for a while, I now believe that just the opposite is true.
Searching for practical applications of the class, I first remembered the programs that we wrote that compared searching sequentially through a list to using binary search. In one test case the sequential method took 15 seconds but the binary search took less than one second. This reminded me so much of tuning Oracle SQL queries. The sequential scan of the list was like a full table scan in Oracle. The binary search was like looking up a single row using an index scan. As I tune Oracle queries my computer science knowledge of binary search and binary trees makes it easy to understand index and full table scans.
In another example, we recently had slowness on a Weblogic portal server. CPU was getting maxed out and the CPU spent most of its time in a Java ConcurrentHashMap object. I don’t know the internals of Weblogic and I have never used a ConcurrentHashMap but I know how hashing works. I know that hashing is very fast until your hash table fills up or if the hash function distributes the items in an unequal way. My knowledge of hashing helped me grasp why our portal server was using a lot of CPU despite my lack of access to its internals.
So, contrary to my original fear that the edX class was too academic and not practical I believe that the concepts covered are very practical. If you do not know how binary search works or what a binary tree is you will benefit from 6.00.1x on edX. If you can not explain how a hash table works and what causes hashing to slow down you can learn from 6.00.1x. And, if you have never written a computer program, although you may find the class difficult and have to take it more than once, you will benefit from 6.00.1x on edX.