Today was the first day of the United Kingdom Oracle User Group (UKOUG) conference - usually called Super Sunday! I had already attended today's first session three weeks ago in Geneva: my colleague Nicolas Jardot talking about Solving Critical Issues with the 12c Optimizer, covering histogram collection, temporary table statistics and adaptive query optimization.
There were around thirty people at his session, and several UKOUG speakers were clearly interested and asked plenty of questions, in typical British accents that did not unsettle Nicolas at all. Congratulations to Nicolas!
The next session, presented by Tony Hasler, was about the Cost Based Optimizer: why does the CBO go wrong?
Mr. Hasler explained that the expectations around the CBO are unreasonably high.
He walked us through a number of issues around the following points:
- Cardinality errors (correlated columns, functions and expressions, stale statistics or daft data types; see the small example after these lists)
- Partition level statistics (which might mask the need for extended statistics)
- Badly written code
- Missing information
- Transitive closure (you write the statement in another way and the plan is different)
Other issues were discussed as possible reasons for changing execution plans:
- Caching effects
- Bad physical design
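To make the correlated-columns point a little more concrete, here is a minimal sketch of how a column group of extended statistics lets the optimizer cost two correlated predicates together instead of multiplying their individual selectivities. The CUSTOMERS table and its columns are invented purely for illustration:

-- Hypothetical query where STATE and AREA_CODE are strongly correlated,
-- so treating the predicates as independent leads to a cardinality under-estimate.
select count(*) from customers where state = 'CA' and area_code = '415';

-- Create a column group (extended statistics) on the correlated pair.
select dbms_stats.create_extended_stats(
           ownname   => user,
           tabname   => 'CUSTOMERS',
           extension => '(STATE, AREA_CODE)'
       ) extension_name
from dual;

-- Re-gather statistics so the new column group gets statistics of its own.
exec dbms_stats.gather_table_stats(user, 'CUSTOMERS', method_opt => 'for all columns size auto')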
His recommendations were the following:
- Proactively tune the critical statements
- Use optimizer hints cautiously
- Carefully consider the statistics gathering strategy (for example, gather statistics less frequently)
The last session of the afternoon was a live demo: Andy Colvin applied the latest Exadata patches to one of his company's Exadata racks.
We attended a live demo of the Exadata patching process, with instructions and best practices on how to apply patches in an Exadata environment.
He used a utility named dbnodeupdate.sh, which helps with the process of updating the compute nodes in an Exadata environment. Last year Oracle introduced yum updates for Exadata compute nodes.
Before that, each Exadata server patch shipped as a bundle including a shell script that forced the new RPMs onto the compute node. If users had started installing their own packages, the installation was very likely to fail.
dbnodeupdate.sh can run a bootstrap phase to update an older image directly to the latest release with just two reboots and three commands. It analyses the free space in the compute node volume groups in order to back up the root volume, which provides a way to roll back. It can also disable CRS on reboot, relink the Oracle Homes and re-enable CRS on startup.
It really was a "Super Sunday" afternoon; now I will have to wait impatiently until tomorrow for sessions like "Exadata Management deep dive with OEM 12c" or "10 optimizer hints you can't miss"!
What is advertised by the site. And here is the disclaimer from Facebook:
What Facebook is sharing:
Who should I trust - I do not care about my public profile: all the information I put on Facebook is public and may not be accurate, so it is OK to have this shared with others for their use.
My list of friends - no way! I do not want them to get spam. What is of interest to me may not be of interest to them.
As for Facebook, I do not mind sharing my Facebook email, but that's about it.
Long-time readers know that I have had a close affiliation with the Sakai Foundation at times and served on the Board of Directors relatively recently. This year, Sakai merged with the Jasig Foundation to form the Apereo Foundation. The purpose of the new organization is to become a sort of Apache Foundation of higher education in the sense that it is an umbrella community where members of different open source projects can share best practices and find fellow travelers for higher ed-relevant open source software projects. In this merger, the whole is greater than the sum of its parts. For example, last week the foundation announced that the board of the Opencast project, which supports the Matterhorn lecture capture and video management project, voted to effectively merge with the Apereo Foundation. (The decision is subject to discussion and feedback by the Opencast community.) On the one hand, Opencast probably wouldn’t have joined the Sakai Foundation because they want to interoperate with all LMSs and wouldn’t want to be perceived as favoring one over another. On the other hand, had they joined Jasig prior to the merger, they would not have had access to the rich community of education-focused technologists that Sakai has to offer. (Jasig has historically focused mostly on tech-oriented solutions like portals and identity management solutions.) The fact that the Opencast community is interested in joining Apereo is a strong indicator that the new foundation is achieving its goal of establishing an ecumenical brand.
And in that context, I am pleased to tell you that I will be involved with the foundation in a more ecumenical role. Specifically, I will be facilitating an advisory council.
About the Council
The goal of the council is to provide the Apereo Foundation Board and community with perspective. When you are running an open source software project, it’s easy to get a little near-sighted as you focus on the hard work of shipping code, gathering requirements, coordinating volunteer developers, finding additional volunteers, and so on. You can get lost in the details that are essential to short-term viability for the project but that can distract you from issues of long-term sustainability, such as thinking about schools that have not yet adopted your project but might, or about important changes on campuses that are relevant to the ways in which your software will be perceived and used. The Advisory Council is meant to offer some of that perspective. We have invited participants who are one or several steps removed from the projects. They might work at a school that is heavily involved but not be personally heavily involved. They might be at a school that has adopted a community project but is not deeply involved with the community at the moment. They might have been active in the community in their previous job but be further removed from it at present. Or they might come from a school that has not adopted any of the projects but could be receptive to adopting the right project some time in the future.
The group will convene four times a year to provide feedback on presentations from the Board and from various project teams on their vision, goals and plans. That’s it, really. Provide perspective. It’s a simple role, but an important one. We hope that if all goes well our council members will choose to act as informal ambassadors between the Apereo community and the broader community of higher education, but that would really be a byproduct of success rather than something that we’re asking our councilors to do.
The Members
I am absolutely delighted with the group that we have assembled:
- Kimberly Arnold, Evaluation Consultant, University of Wisconsin
- Lois Brooks, Vice Provost of Information Services, Oregon State University
- Laura Cierniewicz, Director of the Centre for Educational Technology, University of Cape Town
- Tedd Dodds, CIO, Cornell University
- Kent Eaton, Provost, McPherson College
- Stuart Lee, Deputy CIO, IT Services, Oxford University
- Patrick Masson, General Manager, Open Source Initiative
- Lisa Ruud, Associate Provost, Empire Education Corporation
I feel privileged to be able to work with this group.
Diversity
If your goal is to provide perspective, then it is particularly important to get a diverse group together. Overall, I’m pleased to say that we have achieved diversity across a number of dimensions:
- Roles: I wanted to get a good balance of academic and IT stakeholders, as well as at least one ed tech researcher. We’ve achieved that.
- Institutions: We definitely achieved some diversity of institutions, particularly when you throw the Empire Education Corporation and the Open Source Initiative into the mix. In the future, though, I would like to get more representation from smaller schools that don’t typically get involved in open source projects.
- Geography: Apereo is a global community and ultimately needs global input. But we have to balance that against the need to have a workable spread of time zones among council members who will be meeting with each other mostly virtually. The compromise we struck this time around was to have one representative from Europe and one from Africa as a down payment toward that goal. Ultimately, we will probably need to have several regional advisory councils.
- Gender: In keeping with the Apereo Foundation Board’s stated goal of cultivating women leaders in the community, I am delighted that we have achieved gender balance in our council membership. By the way, this was not hard. Asking colleagues to recommend women who would be good isn’t any different from asking them to recommend academic deans or leaders from small schools.
- Race: I don’t know for certain how we did on this metric because I haven’t met some of the council members in person yet, but my sense is that it is either a near or a total failure. Going forward, I would like to shoot for more racial diversity.
* * * * *
So that’s the deal. I am looking forward to convening the first meeting of this group (probably in January) and getting to know them better. More generally, I am really happy with the direction that the new foundation is taking and am privileged to be able to play a small part in it.
There has been a significant amount of progress and interest in competency-based education, as Wisconsin has launched its Flexible Option program, federal legislators are pushing the concept, ‘patient zero’ has graduated from College for America, and resistance to the idea is emerging. I thought it might be useful to update and re-share the primer that was originally posted in August, 2012.
I’m not an expert in the academic theory and background of outcomes and competencies, so in this post I’ll summarize the key points from various articles as they relate to the current market for online programs. Links are included for those wanting to read in more depth.
What is Competency-Based Education?
SPT Malan wrote in an article from 2000 about the generally-accepted origins:
Competency-based education was introduced in America towards the end of the 1960s in reaction to concerns that students are not taught the skills they require in life after school.
Competency-based education (CBE) is based on the broader concept of Outcomes-based education (OBE), one that is familiar to many postsecondary institutions and one that forms the basis of many current instructional design methods. OBE works backwards within a course, starting with the desired outcomes (often defined through a learning objectives taxonomy) and relevant assessments, and then moving to the learning experiences that should lead students to the outcomes. Typically there is a desire to include flexible pathways for the student to achieve the outcomes.
OBE can be implemented in various modalities, including face-to-face, online and hybrid models.
Competency-based education (CBE) is a narrower concept, a subset or instance of OBE, where the outcomes are more closely tied to job skills or employment needs, and the methods are typically self-paced. Again based on the Malan article, the six critical components of CBE are as follows:
- Explicit learning outcomes with respect to the required skills and concomitant proficiency (standards for assessment)
- A flexible time frame to master these skills
- A variety of instructional activities to facilitate learning
- Criterion-referenced testing of the required outcomes
- Certification based on demonstrated learning outcomes
- Adaptable programs to ensure optimum learner guidance
The Council for Adult and Experiential Learning put out a paper last summer that examines the current state of CBE. In this paper the author, Rebecca Klein-Collins, shows that there is a spectrum of implementation of competency models.
One subset of institutions uses competency frameworks in the context of a course-based system. By course-based system, we mean that students take the same kinds of courses that have always been offered by colleges and universities: instructor-led and credit-hour based.
These may be offered on campus or off, in the classroom or online, accelerated or normally paced. These institutions define competencies that are expected of graduates, and students demonstrate these competencies by successfully completing courses that relate to the required competencies. In some cases, institutions embed competency assessments into each course. In most of the examples presented in this paper, the institution also offers the option of awarding credit for prior learning, and usually [prior learning assessments] is course-based as well.
There seems to be a fairly big jump, however, once the program moves into a self-paced model. For these self-paced CBE initiatives, which are the subject of recent growth in adoption, the current implementations of CBE tend to be:
- Flexible to allow for retaking of assessments until competency demonstrated; and
- Targeted at college completion for working adults.
What is driving the current growth and interest in competency-based models?
In a nutshell, the current emphasis and growth in CBE is driven by the desire to provide lower-cost education options through flexible programs targeted at working adults.
Playing a significant role is government at both the federal and state level. In March of 2013 the Department of Education offered guidance to encourage and support the new competency programs, and in President Obama’s higher ed plan unveiled in August, there was direct reference to competency programs:
To promote innovation and competition in the higher education marketplace, the President’s plan will publish better information on how colleges are performing, help demonstrate that new approaches can improve learning and reduce costs, and offer colleges regulatory flexibility to innovate. And the President is challenging colleges and other higher education leaders to adopt one or more of these promising practices that we know offer breakthroughs on cost, quality, or both – or create something better themselves:
- Award Credits Based on Learning, not Seat Time. Western Governors University is a competency-based online university serving more than 40,000 students with relatively low costs— about $6,000 per year for most degrees with an average time to a bachelor’s degree of only 30 months. A number of other institutions have also established competency-based programs, including Southern New Hampshire University and the University of Wisconsin system. [snip]
- Reduce Regulatory Barriers: The Department will use its authority to issue regulatory waivers for “experimental sites” that promote high-quality, low-cost innovations in higher education, such as making it possible for students to get financial aid based on how much they learn, rather than the amount of time they spend in class. Pilot opportunities could include enabling colleges to offer Pell grants to high school students taking college courses, allowing federal financial aid to be used to pay test fees when students seek academic credit for prior learning, and combining traditional and competency-based courses into a single program of study. The Department will also support efforts to remove state regulatory barriers to distance education.
Why has it taken so long for the model to expand beyond WGU?
Despite the history of CBE since the 1960s, it has only been since the early 2000s that CBE has started to take hold in US postsecondary education, with rapid growth occurring in the past year. From my earlier post:
Consider that just [three] years ago Western Governors University stood almost alone as the competency-based model for higher education, but today we can add Southern New Hampshire University, the University of Wisconsin System, Northern Arizona University, StraighterLine and Excelsior College.
Why has it taken so long? Although there is a newfound enthusiasm for CBE from the Obama administration, there have been three primary barriers to adoption of competency-based programs: conflicting policy, emerging faculty resistance, and implementation complexity.
1) Conflicting Policy
In Paul Fain’s article at Inside Higher Ed, he described some of the policy-related challenges pertaining to CBE.
Competency-based higher education’s time may have arrived, but no college has gone all-in with a degree program that qualifies for federal aid and is based on competency rather than time in class.
Colleges blame regulatory barriers for the hold-up. The U.S. Education Department and accreditors point fingers at each other for allegedly stymieing progress. But they also say the door is open for colleges to walk through, and note that traditional academics are often skeptical about competency-based degrees.
In 2005 Congress passed a law intended to help Western Governors University (WGU) and other CBE models, defining programs that qualify for federal financial aid to include:
an instructional program that, in lieu of credit hours or clock hours as the measure of student learning, utilizes direct assessment of student learning, or recognizes the direct assessment of student learning by others, if such assessment is consistent with the accreditation of the institution or program utilizing the results of the assessment.
Despite this law, WGU ended up not using it, due to policy complexity, opting instead to map its competencies to seat-time equivalents.
In fact, the first program to use this “direct assessment” clause to fully distinguish itself from the credit hour standard was Southern New Hampshire University’s College for America program in March of 2013. As described in the Chronicle:
Last month the U.S. Education Department sent a message to colleges: Financial aid may be awarded based on students’ mastery of “competencies” rather than their accumulation of credits. That has major ramifications for institutions hoping to create new education models that don’t revolve around the amount of time that students spend in class.
Now one of those models has cleared a major hurdle. The Education Department has approved the eligibility of Southern New Hampshire University to receive federal financial aid for students enrolled in a new, self-paced online program called College for America, the private, nonprofit university has announced.
Southern New Hampshire bills its College for America program as “the first degree program to completely decouple from the credit hour.”
2) Emerging Faculty and Institutional Resistance
Not everyone is enamored of the potential of competency-based education, and there is an emerging resistance from faculty and traditional institutional groups. The American Association of Colleges and Universities published an article titled “Experience Matters: Why Competency-Based Education Will Not Replace Seat Time” which argued against applying competency models to liberal arts:
Perhaps such an approach makes sense for those vocational fields in which knowing the material is the only important outcome, where the skills are easily identified, and where the primary goal is certification. But in other fields—the liberal arts and sciences, but also many of the professions—this approach simply does not work. Instead, for most students, the experience of being in a physical classroom on a campus with other students and faculty remains vital to what it means to get a college education.
In addition, I am hearing more and more discomfort from faculty members, especially as there are new calls to broadly expand competency-based programs beyond the working adult population. I have yet to see much formal resistance from faculty groups, however, and the bigger challenge is cultural barriers within the institutions or systems where CBE programs have been announced.
3) Implementation Complexity
I would add that the integration of self-paced programs not tied to credit hours into existing higher education models presents an enormous challenge. Colleges and universities have built up large bureaucracies – expensive administrative systems, complex business processes, large departments – to address financial aid and accreditation compliance, all based on fixed academic terms and credit hours. Registration systems, and even state funding models, are tied to the fixed semester, quarter or academic year – largely defined by numbers of credit hours.
It is not an easy task to allow transfer credits coming from a self-paced program, especially if a student is taking both CBE courses and credit-hour courses at the same time. The systems and processes often cannot handle this dichotomy.
I suspect this is one of the primary reasons the CBE programs that have gained traction to date tend to be separated in time from the standard credit-hour program. CBE students either take their courses, reach a certain point, and transfer into a standard program; or they enter a CBE program after they have completed a previous credit-hour based program. In other words, the transfer between the competency world and the credit-hour world happens along academic milestones. Some of the new initiatives, however, such as the University of Wisconsin's, are aiming at more of a mix-and-match flexible degree program.
The result is that the implementation of a competency-based initiative can be like a chess match. Groups need to be aware of multiple threats coming from different angles, while thinking two or three moves ahead.
It will be interesting to watch the new initiatives develop. However difficult their paths are, I think this is an educational delivery model that will continue to grow.
Full disclosure: Western Governors University has been a client of MindWires Consulting.
Update 12/1: I have bumped this link from the comments via Nick DiNardo. It is a ~20 minute interview with ‘student zero’ or ‘patient zero’ of the College for America program, giving direct feedback from a CBE student. Thanks Nick.
Along with the output, it would be helpful if you could please note what release it is and what "kind" of system (OLTP / ERP / DW etc.). If done on a 12c system using CDB/PDB, please run from a CDB connection.
Output posted back here or emailed to firstname.lastname@example.org would be much appreciated.
select round(bytes/(1024*1024),0) count, 'Shared Pool Size' info from v$sgainfo where name = 'Shared Pool Size';
select count(*), 'First copy of sql' from v$sql_shared_cursor where child_number = 0;
select count(*), 'Not first copy' from v$sql_shared_cursor where child_number >= 1;
select count(*), 'Total Execution plans' from v$sql_shared_cursor;
select count(*), 'Count from sqlarea' from v$sqlarea;
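If you would like to capture the release in the same output, something along these lines (purely a suggestion) would do:

select banner, 'Version' info from v$version;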
I have, in the past, used the dbms_rowid package to create rowids from block addresses (typically faking the first and last rowids that could appear in an extent); but I’ve just been sent a piece of information by Valentin Nikotin that’s going to make me go back and check whether what I’ve done with the package will always give me the correct results. Here’s a little demonstration code that highlights the issue:
create table t1
as
select
        rownum id,
        dbms_rowid.rowid_create(
                rowid_type    => 1,
                object_number => 250000,
                relative_fno  => 5,
                block_number  => rownum + 2330000,
                row_number    => 1
        )       basic_rowid,
        cast(
                dbms_rowid.rowid_create(
                        rowid_type    => 1,
                        object_number => 250000,
                        relative_fno  => 5,
                        block_number  => rownum + 2330000,
                        row_number    => 1
                ) as rowid
        )       cast_rowid
from
        dual
connect by
        level <= 50
;

spool rowid_trap

select count(*) from t1 where basic_rowid = cast_rowid;
select count(*) from t1 where basic_rowid != cast_rowid;

select * from t1 order by basic_rowid;
select * from t1 order by cast_rowid;

spool off
As you can see I create a table with 50 rows, generating the nominal rowids that might appear as the first rowid for each of several consecutive blocks in a file. (The choice of rownum + 2330000 is based on the example that Valentin sent me using a real physical object and references to dba_extents).
As you might expect, when I compare the rowid value returned from dbms_rowid.rowid_create() with the equivalent value cast to rowid I get a perfect match across the table.
If you look at the output from sorting the rows, though, the rows appear in a different order. Check the table definition and you’ll see that dbms_rowid.rowid_create() seems to have returned a varchar2(), holding the “readable” representation of a rowid (and if you check stdbody.sql from $ORACLE_HOME/rdbms/admin, you’ll find that the pl/sql definition of rowid is as a subtype of varchar2()). Unfortunately the varchar2() representation of a rowid doesn’t follow the same sorting rules as the internal representation of a rowid.
The upshot of this is that if you’ve been using dbms_rowid.rowid_create() to generate a list of rowid ranges that you want to use to break a big object into little pieces for “manual parallelism” then you may have managed to generate a list of ranges that didn’t cover the whole object. Consider, for example, the following 20 rows I get from sorting by the character version of the rowid:
46 AAA9CQAAFAAI42+AAB AAA9CQAAFAAI42+AAB
47 AAA9CQAAFAAI42/AAB AAA9CQAAFAAI42/AAB
36 AAA9CQAAFAAI420AAB AAA9CQAAFAAI420AAB
37 AAA9CQAAFAAI421AAB AAA9CQAAFAAI421AAB
38 AAA9CQAAFAAI422AAB AAA9CQAAFAAI422AAB
39 AAA9CQAAFAAI423AAB AAA9CQAAFAAI423AAB
40 AAA9CQAAFAAI424AAB AAA9CQAAFAAI424AAB
41 AAA9CQAAFAAI425AAB AAA9CQAAFAAI425AAB
42 AAA9CQAAFAAI426AAB AAA9CQAAFAAI426AAB
43 AAA9CQAAFAAI427AAB AAA9CQAAFAAI427AAB
44 AAA9CQAAFAAI428AAB AAA9CQAAFAAI428AAB
45 AAA9CQAAFAAI429AAB AAA9CQAAFAAI429AAB
 1 AAA9CQAAFAAI42RAAB AAA9CQAAFAAI42RAAB
 2 AAA9CQAAFAAI42SAAB AAA9CQAAFAAI42SAAB
 3 AAA9CQAAFAAI42TAAB AAA9CQAAFAAI42TAAB
 4 AAA9CQAAFAAI42UAAB AAA9CQAAFAAI42UAAB
 5 AAA9CQAAFAAI42VAAB AAA9CQAAFAAI42VAAB
 6 AAA9CQAAFAAI42WAAB AAA9CQAAFAAI42WAAB
 7 AAA9CQAAFAAI42XAAB AAA9CQAAFAAI42XAAB
 8 AAA9CQAAFAAI42YAAB AAA9CQAAFAAI42YAAB
If I had supplied the 1st and 20th rowids as a range to be used by some “chunking” code in a “between” clause those rows would have slipped through a crack in the selection code (unless they were accidentally picked up in another badly defined chunk – which might then result in lock waits and deadlocks). So if you’ve used dbms_rowid in production code it might be a good idea to go and check your code.
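To make the risk concrete, here is a minimal sketch of the kind of "chunking" query I have in mind; the bind variables stand for a low/high rowid pair that was generated with dbms_rowid.rowid_create() and picked from a list ordered as character strings:

-- Count the rows in one "chunk" of the table.
-- The BETWEEN uses binary rowid ordering, so if :lo_rowid and :hi_rowid were
-- taken from a list sorted as varchar2 (as in the output above), some blocks
-- may not fall inside any of the generated ranges and their rows are missed.
select count(*)
from   t1
where  rowid between :lo_rowid and :hi_rowid;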
I had an early idea to put in more slides than any presentation ever done by Connor McDonald ( ha ha joke just kidding not trying to start another Michigan debacle ) but do not have the guts to add in much more. Very useful in doing this presentation of course is information from Tanel Poder and Jonathan Lewis. Jonathan's chapter 7 in Oracle Core was especially good.
So here it is: Three Approaches to Shared Pool Monitoring. Comments/suggestions/feedback appreciated (please stick them in an email to email@example.com).
A wise man once said that those who fail to learn the lessons of history are doomed to repeat them.
Time for a quick history lesson…
England’s recent supremacy in Ashes contests has a common theme running through it. In none of the last three series has any Australian fast bowler sported any facial hair.
The bristling tache has been a feature of potent Australian attacks since time immemorial.
From Frederick “The Demon” Spofforth sporting the face furniture that inspired an England batting collapse which gave birth to the Ashes legend, through Dennis Lillee and (occasionally) Jeff Thompson, the tache reigned supreme.
Having reached an exuberant zenith whilst working with Merv Hughes, the tache took a lower profile for a time as the top-half of Jason Gillespie’s beard.
There was a time during the 80s when England seemed to have cottoned on to the power of the tache and managed to fight fire with fire with notable contributions evident on the upper lips of Botham, Gooch and Lamb.
More recently however, the moustache’s mystic ability to make an English batting line-up do a passable impression of a deck-chair seemed to have been lost in the mists of time.
Then along came Mitchell Johnson’s Movember Mush Mantle.
Johnson himself has quite a respectable test record, but it seems that he may have re-discovered the secret power of the mighty moustache.
As we’ve got a couple of weeks before the second test begins, the moustache has a bit of time on its handlebars, so it has kindly agreed to illustrate the fun and games to be had with assigning default values to PL/SQL parameters.
Let’s begin with the following function…
CREATE OR REPLACE FUNCTION bowler_effectiveness_fn( i_has_tache VARCHAR2 )
    RETURN VARCHAR2
AS
BEGIN
    IF i_has_tache = 'Y' THEN
        RETURN 'On for a five-for';
    ELSE
        RETURN 'None for plenty';
    END IF;
END;
/
If we now invoke this function without specifying a parameter value…
SELECT bowler_effectiveness_fn FROM dual;
ERROR at line 1: ORA-06553: PLS-306: wrong number or types of arguments in call to 'BOWLER_EFFECTIVENESS_FN'
So, it looks like the i_has_tache parameter is mandatory.
However, we can make it optional by specifying a default value to assign it if none is specified in the call…
CREATE OR REPLACE FUNCTION bowler_effectiveness_fn( i_has_tache VARCHAR2 DEFAULT 'N' )
    RETURN VARCHAR2
AS
BEGIN
    IF i_has_tache = 'Y' THEN
        RETURN 'On for a five-for';
    ELSE
        RETURN 'None for plenty';
    END IF;
END;
/
If we run the same query as before…
SQL> SELECT bowler_effectiveness_fn FROM dual;

BOWLER_EFFECTIVENESS_FN
--------------------------------------------------------------------------------
None for plenty

SQL>
Of course, if we do specify a value in the call then this will be used in place of the default…
SQL> SELECT bowler_effectiveness_fn('Y') from dual;

BOWLER_EFFECTIVENESS_FN('Y')
--------------------------------------------------------------------------------
On for a five-for

SQL>
So, the default value acts a bit like an NVL, right? I mean…
SQL> SELECT bowler_effectiveness_fn(i_has_tache => NULL) FROM dual;

BOWLER_EFFECTIVENESS_FN(I_HAS_TACHE=>NULL)
--------------------------------------------------------------------------------
None for plenty

SQL>
OK, let’s change the default value from N to Y:
CREATE OR REPLACE FUNCTION bowler_effectiveness_fn( i_has_tache VARCHAR2 DEFAULT 'Y' )
    RETURN VARCHAR2
AS
BEGIN
    IF i_has_tache = 'Y' THEN
        RETURN 'On for a five-for';
    ELSE
        RETURN 'None for plenty';
    END IF;
END;
/
Passing in NULL should now result in a confirmation of our macho bowling prowess…
SQL> SELECT bowler_effectiveness_fn(i_has_tache => NULL) FROM dual;

BOWLER_EFFECTIVENESS_FN(I_HAS_TACHE=>NULL)
--------------------------------------------------------------------------------
None for plenty

SQL>
Hmmm…maybe Mitchell has reached for the razor ? Let’s take a closer look…
CREATE OR REPLACE FUNCTION bowler_effectiveness_fn( i_has_tache VARCHAR2 DEFAULT 'Y' )
    RETURN VARCHAR2
AS
BEGIN
    IF i_has_tache IS NULL THEN
        RETURN 'Face fuzz not specified.';
    ELSIF i_has_tache = 'Y' THEN
        RETURN 'On for a five-for';
    ELSE
        RETURN 'None for plenty';
    END IF;
END;
/
Run the query again and we get…
SQL> SELECT bowler_effectiveness_fn(i_has_tache => NULL) FROM dual;

BOWLER_EFFECTIVENESS_FN(I_HAS_TACHE=>NULL)
--------------------------------------------------------------------------------
Face fuzz not specified.

SQL>
So, a stored program unit parameter can be specified as NULL, even if it has a default value defined.
If this is not what you want to happen, you need to check the parameter values in your code…
CREATE OR REPLACE FUNCTION bowler_effectiveness_fn( i_has_tache VARCHAR2 DEFAULT 'Y' )
    RETURN VARCHAR2
AS
BEGIN
    --
    -- Make sure that the mandatory i_has_tache parameter
    -- is set.
    --
    IF i_has_tache IS NULL THEN
        RAISE_APPLICATION_ERROR( -20000, 'Face fuzz not specified.');
    END IF; -- end of parameter validation

    IF i_has_tache = 'Y' THEN
        RETURN 'On for a five-for';
    ELSE
        RETURN 'None for plenty';
    END IF;
END;
/
When we run our query now, we should be told explicitly that a parameter value is required :
SQL> SELECT bowler_effectiveness_fn(i_has_tache => NULL) FROM dual;
SELECT bowler_effectiveness_fn(i_has_tache => NULL) FROM dual
       *
ERROR at line 1:
ORA-20000: Face fuzz not specified.
ORA-06512: at "MIKE.BOWLER_EFFECTIVENESS_FN", line 13
ORA-06512: at line 1

SQL>
The other thing to bear in mind is that the user can pass in any value up to the maximum length of the parameter's datatype (size constraints are not enforced on formal parameters, even when they are anchored to a table column with %TYPE). So, we may well want to make sure that the function is being called only with those values that we expect to handle :
CREATE OR REPLACE FUNCTION bowler_effectiveness_fn( i_has_tache VARCHAR2 DEFAULT 'Y' )
    RETURN VARCHAR2
AS
BEGIN
    --
    -- Make sure that the mandatory i_has_tache parameter
    -- is set to a valid value
    --
    IF NVL(i_has_tache, 'X') NOT IN ('Y', 'N') THEN
        RAISE_APPLICATION_ERROR(-20000, 'Parameter i_has_tache must be set to Y or N');
    END IF; -- end of parameter validation

    IF i_has_tache = 'Y' THEN
        RETURN 'On for a five-for';
    ELSE
        RETURN 'None for plenty';
    END IF;
END;
/
As we reach the end of November, the straw at which England Cricket Fans are clutching is that maybe Mitchell is not, after all, a keen student of the game’s history and that maybe he will indeed emerge clean-shaven at Adelaide.
If not, England may well be in for a series of torrid tache tyranny.
Oracle VirtualBox 4.3.4 has been released. The downloads and changelog are in the usual places. I thought 4.3.4 included a fix for the Mac OS X upgrade issue I posted about, but it didn’t. The install of 4.3.4 was seamless and didn’t require any fiddling about.
Tim…
The holiday season is in full swing now, and I’ll bet a lot of people out there will be getting some form of wearable device as a gift.
Personally, I’d like to see more fashion innovation, e.g. intelligent clothing. I’m not a fan of encumbrances, but since clothes aren’t optional, they might as well be smart.
As wearables gain momentum, I find myself increasingly frustrated because there are so many options, so many form factors and price points, and no clear leader where I feel comfortable investing time, money and development effort.
And that’s without even looking at the technical aspects of each device: sensors, SDKs and/or APIs, openness of data, associated ecosystems.
What a hot mess, but hey, it’s exciting too.
Bonus: Misha’s post includes an interview with our very own Anthony (@anthonyslai) about his adventures as a Google Glass Explorer. Noel (@noelportugal) is now in that club too, so expect more Glass content soon, e.g. I heard those two mad scientists got the Glass working to control the robotic arm.
Stay tuned, and as always, find the comments.
SQL Server 2014 CTP 2 has been announced and is now available for download as a community technology preview.
You can now download SQL Server 2014 CTP2 from http://technet.microsoft.com/en-US/evalcenter/dn205290.aspx. This post covers a step-by-step installation guide for SQL Server 2014 CTP 2 with screenshots. The final release may not be exactly the same, but it should be very similar to this. We will perform this installation on a virtual test machine running Windows 2008 R2 Enterprise Edition x64.
Upgrading from Microsoft SQL Server 2014 CTP 1 to Microsoft SQL Server 2014 CTP 2 is NOT supported, so make sure that you remove Microsoft SQL Server 2014 CTP 1 before proceeding with the installation.
Step 1: Select the installation type. Here we will perform a stand-alone installation, so select that option.
Step 2: We will keep the edition details at their defaults. Click Next to continue.
Step 3: Read and accept the terms and conditions. Click Next to continue.
Step 4: Setup checks the Global Rules on this page. The Product Update page was skipped.
Step 5: Setup should pass the Install Rules. You may ignore warnings and go ahead on a case-by-case basis. Click Next to continue.
Step 6: Once the check phase is completed, we proceed with the feature install selection. I have selected the default. Click Next to continue.
Step 7: Select the features that you would like to install.
Step 8: Based on the selected features, the SQL Server 2014 CTP 2 setup performs an additional Feature Rules check. Click Next to continue.
Step 9: Provide the instance name if you want to install a named instance. I have selected the default instance here. Click Next to continue.
Step 10: Specify the account names and startup types on this page. Move to the adjacent tab.
Step 11: Default collation settings for the Database Engine and Analysis Services are shown here. You can customize them. Click Next to continue.
Step 12: On the Server Configuration tab, specify the authentication type and add the users who will have administrative privileges on the Database Engine. Here I have added my account as Administrator. Switch to the adjacent tab.
Step 13: Select the locations on the Data Directories tab. I have kept them as default. Switch to the last tab.
Step 14: On the FILESTREAM tab you might want to enable the FILESTREAM feature. For now, I have kept it disabled. Click Next to continue.
Step 15: On the Server Configuration tab, add the users who will have administrative privileges on Analysis Services. Here I have added my account as Administrator. Switch to the adjacent tab.
Step 16: Select the locations on the Data Directories tab. I have kept them as the default. Click Next to continue.
Step 17: On the Reporting Services Configuration page, select the option for the Reporting Services install. I have selected ‘Install Only’ as I will configure it later. Click Next to continue.
Step 18: Some more checks on the Feature Configuration Rules page. Click Next to continue.
Step 19: We are ready to install. You might like to verify the selected configuration before proceeding. Click the Install button.
Step 20: Installation is in progress.
Step 21: Finally the setup completes successfully, with a pop-up window suggesting a reboot of the system after the successful installation of SQL Server 2014 CTP 2.
Now we have SQL Server 2014 CTP 2 installed. Let’s connect to SSMS and verify the build.
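To verify the build, a quick query from SSMS such as the following (nothing here is specific to CTP 2) shows the version details:

-- Full version string including the build number
SELECT @@VERSION AS version_info;

-- Or the individual server properties
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,
       SERVERPROPERTY('Edition')        AS edition;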
The SQL Server 2014 CTP 2 wizard is very similar to the SQL Server 2012 one. We will explore SQL Server 2014 features in upcoming blog posts.
When I first saw the suffix BATCHED in an Oracle execution plan from 12c, I was curious to see what was hidden behind it and how it works. I had some spare time for testing and wanted to share my findings with you here.
Here is what I’m talking about:
-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |           |  1000 |   990K|   148   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1        |  1000 |   990K|   148   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_Y_INDX |  1000 |       |     5   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("Y"=TO_NUMBER(:1))
Line 1 of this very simple execution plan shows how Oracle 12c added a suffix BATCHED to the table access by a B*Tree index rowsource. I was thinking about the reasons behind this change and how it could be implemented before starting my tests.
Why Oracle would want to “batch” table access
Usually, large index range/full/skip scans with subsequent table access running serially cause lots of single-block reads of the table. Depending on the clustering of the data, the number of table block reads could be as high as the number of ROWIDs fetched from the index leaf blocks and passed up to the next rowsource. In the case of a serial execution plan, this means that query performance depends on how fast a single random table read is. Say you need to read 1000 random table blocks located far away from each other, and an average read of 1 block takes 5ms; then you need about 5 seconds to execute such a query. But if the storage subsystem can handle concurrent IO requests well enough, and you were able to ask it for those 1000 blocks somehow concurrently or in parallel, transparently to the end user session, then it could take less wall clock time for the user while putting more pressure on the OS, the storage and the connectivity to the storage.
How Oracle optimizes IO already
As far as I know, Oracle can do and does a few cunning things with IO even in pre-12.1 releases. Here is a (most likely incomplete) list of example optimizations you may see:
- Within NESTED LOOP joins there are a couple of strategies Oracle uses: NL-join batching and moving the TABLE ACCESS out of the join (I have no idea what it is called exactly).
- “Prefetching” with ‘db file parallel read’ – as described by Tanel Poder here (it gives you a very nice idea of what ‘db file parallel read’s are)
- In the case of a “cold” buffer cache Oracle may choose to read ahead, and instead of reading just a single block when you think that would be enough, Oracle may opt to read multiple physically adjacent blocks on disk into the cache (aka ‘db file scattered read’). Sometimes it can hurt application performance (a lot), sometimes it doesn’t matter, but the point is: it is a “normal” thing to experience multi-block buffered reads on what should probably be single block reads.
Based on my understanding of what Oracle can possibly do, I created a test scenario which could be used to find out more about what is behind BATCHED table access. Here is the setup:
drop table t1 cascade constraints purge;

create table t1 (
        id      integer,
        x       integer,
        y       integer,
        pad     varchar2(4000)
);

insert /*+ append */ into t1
select
        rownum,
        mod(rownum, 1000),
        floor((rownum-1)/1000),
        lpad('x', 1000, 'x')
from
        all_source a1,
        all_source a2
where
        rownum <= 1e6;

create index t1_x_indx on t1(x);
create index t1_y_indx on t1(y);

exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1', cascade=>true, no_invalidate=>false);
Very easy. I have created a sufficiently wide table holding 1 million rows, with two integer columns following a very bad (T1.X) and a very good (T1.Y) clustering of data. Usually it is also important where you are creating this table. Initially I created it in a standard USERS tablespace (i.e., ASSM, non-uniform extent size), but then switched to an MSSM tablespace with uniform extents of 1MB. Looking ahead, it does not make a difference to the test results (at least I could not identify one.)
The test itself:
set linesize 180 pagesize 100 define ^ arraysize 100
col plan_table_output format a180

explain plan for select /*+ index(t1(x)) */ * from t1 where x = :1;
select * from table(dbms_xplan.display);

explain plan for select /*+ index(t1(y)) */ * from t1 where y = :1;
select * from table(dbms_xplan.display);

col spid new_value spid
col curr_date new_value curr_date
select p.spid, to_char(sysdate, 'YYYYMMDDHH24MI') curr_date
from   v$session s, v$process p
where  s.paddr = p.addr
and    s.sid = userenv('sid');

col tracefile new_value tracefile
select value tracefile from v$diag_info where name='Default Trace File';

alter system flush buffer_cache;
!sleep 1
alter system flush buffer_cache;

select object_id, data_object_id, object_name
from   dba_objects
where  owner = user and object_name like 'T1%';

set termout off
exec dbms_session.session_trace_enable(waits=>true, binds=>false)

!strace -tt -p ^spid -o trc_^spid..txt &

spool batched_^curr_date..txt

select /*+ index(t1(x)) */ * from t1 where x = 1;
select /*+ index(t1(y)) */ * from t1 where y = 2;

spool off
set termout on

!orasrp -t --sort=fchela --sys=no ^tracefile orasrp_^spid..txt
!cat orasrp_^spid..txt | grep -A 165 fvkg1sp2b73x

prompt trace: orasrp_^spid..txt
prompt strace: trc_^spid..txt
prompt tracefile: ^tracefile

exit
So the test is also really easy, except for some diagnostic & preparation steps. Basically I’m tracing two statements, which access T1 via the two indexes respectively, both at the OS and Oracle levels, and then parsing the Oracle trace file with OraSRP. You may want to use tkprof; I also used it initially, but OraSRP has one feature which helps to see the waits broken down by object, like this:
                                                                           --------- Time Per Call --------
Object/Event                                   % Time      Seconds     Calls        Avg        Min        Max
-------------------------------------------- -------- ------------ --------- ---------- ---------- ----------
TABLE T1
  db file parallel read                         68.9%      4.8404s        26    0.1862s    0.0883s    0.2614s
  db file sequential read                       29.9%      2.1023s       147    0.0143s    0.0014s    0.0636s
INDEX T1_X_INDX
  db file sequential read                        1.1%      0.0746s         5    0.0149s    0.0025s    0.0278s
  Disk file operations I/O                       0.0%      0.0034s         1    0.0034s    0.0034s    0.0034s

Testing
I was using VirtualBox with 64-bit OEL 6.4 and Oracle 11.2.0.4 & 12.1.0.1. I also ran (partial) tests on an instance in OVM with faster storage, and the results were similar to what I observed in VirtualBox.
Both instances were running with a pfile, with the following parameters specified:
-- 11.2.0.4
*._db_cache_pre_warm=FALSE
*.compatible='11.2.0.4.0'
*.control_files='/u01/app/oracle/oradata/ora11204/control01.ctl','/u01/app/oracle/fast_recovery_area/ora11204/control02.ctl'
*.db_block_size=8192
*.db_cache_size=300000000
*.db_domain=''
*.db_name='ora11204'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ora11204XDB)'
*.filesystemio_options=setall
*.pga_aggregate_target=100000000
*.open_cursors=300
*.processes=100
*.remote_login_passwordfile='EXCLUSIVE'
*.shared_pool_size=420430400
*.undo_tablespace='UNDOTBS1'

-- 12.1.0.1
*._db_cache_pre_warm=FALSE
*.compatible='12.1.0.1.0'
*.control_files='/u01/app/oracle/oradata/ora121/control01.ctl','/u01/app/oracle/oradata/ora121/control02.ctl'
*.db_block_size=8192
*.db_cache_size=300000000
*.db_domain=''
*.db_name='ora121'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ora121XDB)'
*.enable_pluggable_database=true
*.filesystemio_options='SETALL'
*.pga_aggregate_target=100000000
*.open_cursors=300
*.processes=100
*.remote_login_passwordfile='EXCLUSIVE'
*.shared_pool_size=420430400
*.undo_tablespace='UNDOTBS1'

Test Process & Observations
Initially I started testing with a default database config and filesystemio_options set to DIRECTIO. After some random tests, I realized that this cache warm-up thing is not what I am interested in right now and turned it off with a hidden parameter. Overall I think the test results can be summarized as follows:
- Test results are inconsistent. This is the most irritating thing. However, after running the test multiple times in a row I get a pretty stable outcome, so I consider the results after multiple consecutive runs of the same test. Usually it is just 2 runs, but sometimes more, especially after an instance restart. I have no idea why this happens or what is behind these decisions. Maybe it has something to do with CKPT, as Tanel mentions in his post on oracle-l, but I did not check (and honestly don’t want to :))
- Both 11g and 12c show that for table access of scattered data (via the T1_X_INDX index) Oracle may batch the table access IO using db file parallel reads; at the OS level it uses io_submit/io_getevents calls to run the IO with the async API, if that is turned on of course; with just DIRECTIO in place it issues a bunch of single block reads using pread
- Both 11g and 12c can use multi-block access of clustered data (via the T1_Y_INDX index) for index range scans (and most likely full/skip scans too). This is one of the most amusing things: even though I turned cache warm-up off, Oracle can still identify that the data I am accessing is well placed together, and it decides to read multiple adjacent table blocks at once. 12c, however, behaves differently and by default does not use buffered multi-block table reads
- The size of the multi-block IO (db file parallel read) is different between 11g and 12c: in 11g it is usually 39, sometimes 19. In 12c, by default, the number of requests depends on the client’s fetch size: it is equal to the minimum of the fetch size and 127
- It looks like the parameter _db_file_noncontig_mblock_read_count does not control the actual number of blocks read with a db file parallel read; any value greater than 1 turns the feature on and the size of the read requests stays the same (I have only tested setting it to 1, 2, 3 and 5)
- The BATCHED suffix that appears in 12c execution plans is controlled by a new hidden parameter, _optimizer_batch_table_access_by_rowid. By default the parameter is set to TRUE, so plans tend to include BATCHED in the table access rowsource. At run time this setting behaves very much like 11g, i.e. it reads scattered table data with db file parallel reads, except that the number of IO requests is min(fetch_size, 127). If _optimizer_batch_table_access_by_rowid is set to FALSE at the session level, for example, then the plans generated by Oracle do not include the BATCHED suffix in the table access rowsource, but at run time Oracle still uses multi-block IO in the same way 11g does, i.e. 39 or 19 IO requests per call, and the scattered reads of clustered table data are there as well (a quick way to check this is sketched after this list)
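For reference, the quick session-level check mentioned in the last point looks roughly like this; it relies on a hidden parameter, so it is only something to try on a test system:

-- 12c: disable the optimizer part of the feature for this session only
alter session set "_optimizer_batch_table_access_by_rowid" = false;

explain plan for select /*+ index(t1(x)) */ * from t1 where x = :1;
select * from table(dbms_xplan.display);
-- The plan now shows TABLE ACCESS BY INDEX ROWID without the BATCHED suffix,
-- while tracing the execution still shows db file parallel read waits.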
In 12c Oracle changed some internal code path that deals with batched table access. The important thing, though, is that batched table access is not new: even if you disable it, either explicitly with _optimizer_batch_table_access_by_rowid or implicitly with optimizer_features_enable, Oracle will still be able to use much the same approach as in 11g.
One important change, of course, is that by default the size of the vector IO now depends on the client fetch size, and I can imagine situations in which this could have an impact on application performance after an upgrade.
I have uploaded the test script & trace files from both 11.2.0.4 and 12.1.0.1 here, so if you would like to repeat my tests and compare results, feel free to do that.
Recently I was trying to install SQL Server 2014 CTP 1 on my Windows Server 2008 R2 Virtual Machine to explore SQL Server 2014.
What I encountered was a failure of the rule “Previous SQL product installation”:
Rule Check Result
Rule “Previous SQL product installation” failed.
A SQL product other than SQL Server 2014 CTP1 is detected. You cannot install this release until the existing instances of SQL products are uninstalled.
Microsoft SQL Server 2014 CTP1 does not support upgrades from, or side-by-side installations with, any previous version of SQL Server, including SQL Server 2012.
I found that SQL Express had been installed on the machine by a third-party software package. So if you want to install SQL Server 2014 CTP 1, you need to either uninstall previous SQL versions or, alternatively, install it on a fresh machine.
I will come up with the step-by-step installation guide soon.
When we relate file shares and databases, many people turn up their noses – but today this is a reality that we can use even in production. Microsoft has spent the last few years working on improvements to the Server Message Block (SMB) protocol, which is much more reliable today than ever before.
In the past, we were allowed to store data and log files on the network by activating trace flag 1807. However, this wasn’t supported, because of the risk of network errors compromising database integrity and the possible performance implications of using network file shares to store databases. But the reality has changed – since SQL Server 2012 it is possible to store user and system databases on the network thanks to the SMB improvements. Even better, this applies to both stand-alone SQL Server and SQL Server WFCI! Yes, today you can create an entire cluster based on SMB file shares.
Microsoft introduced SMB 3.0 with Windows Server 2012, which brought several significant changes to add functionality and improve SMB performance. Notice that each Windows version brings a revised SMB and that this is transparent to SQL Server.
Here are some of the most important changes to SMB since Windows Server 2008, which show the ongoing focus on improving it:
- Windows Server 2008 (SMB 2.0)
Durability, which helps recover from temporary network glitches.
- Windows Server 2008 R2 (SMB 2.1)
Significant performance improvements, specifically for SQL OLTP style workloads.
- Windows Server 2012 (SMB 3.0)
Support for transparent failover of file shares providing zero downtime.
- Windows Server 2012 R2 (SMB 3.02)
MTU is turned on by default, which significantly enhances performance in large sequential transfers like SQL Server data warehouse and database backup or restore.
How does it work?
To start using file shares to store your databases, some rules should be followed. The SQL Server engine and Agent service accounts must have FULL CONTROL share permissions and NTFS permissions on the SMB share folders.
Another important point is the supported Universal Naming Convention (UNC) path formats:
Loopback paths cannot be used, which means that file shares on the same server, or on nodes of the cluster where SQL Server is installed, are not allowed. The other disallowed paths are:
- Administrative shares such as \\servername\x$
- Other UNC path formats like \\?\x:\
- Mapped network drives.
The process of installing a database on SMB file shares is as simple as specifying a UNC path instead of a drive letter:
Another option is to execute the “CREATE DATABASE” command pointing the data and log files to a UNC path. Here is an example:
CREATE DATABASE [SomeDatabase] ON PRIMARY
( name = N'networked', filename = N'\\W2012SRV01\DB\SomeDatabase.mdf',
  size = 3072kb, maxsize = unlimited, filegrowth = 1024kb )
log ON
( name = N'networked_log', filename = N'\\W2012SRV01\DB\SomeDatabase_log.ldf',
  size = 1024kb, maxsize = 2048gb, filegrowth = 10%)
Notice that for cluster installations, the file share won’t be a part of the SQL Server role, or controlled by the Cluster. So if you have any failures in the file share server, the behavior will be slightly different.
When we start using SMB file shares as a storage option, network performance becomes more than critical. In a typical situation we already have plenty of traffic on our network: not only client connections, but backups, RDP connections, file transfers, etc. Because of this, consider using a dedicated and isolated network to access the share. This way we can avoid network interference in both directions.
Another word of caution concerns monitoring. As we are using file shares to store our files, a different approach needs to be taken.
We can use an analogy to illustrate this:
Imagine that you have a SQL Server instance running on a Windows virtual machine, and you receive a call from an angry client saying that SQL Server is performing very slowly. But you, as a good DBA, took all the necessary care and the server is very well configured. You also checked the server and, in general, there is nothing wrong at first glance. But wait: this is a VM, so I am possibly sharing disk, CPU and memory with several other machines, with different behaviours and used for different purposes. So how can I know whether the problem is at the VM host level or at my single machine level? I need data about the VM host performance; that way I can do a screening and understand where the problem is coming from.
For exactly the same reason we need to monitor the File Share server, which can have several other file shares, used for an infinite number of purposes, and with different load characteristics.
So, what is important to monitor in our case? Checking the physical disk counters, memory and CPU load of the file share server should be enough to track down problems.
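On the SQL Server side, a cheap complementary check is to look at the I/O latency per database file, which also covers files sitting on UNC paths. A sketch (nothing here is specific to SMB shares) could be:

-- Average read/write latency per database file
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;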
Finally, what can this be used for?
First of all, in a clustered environment a SAN is still the best option, just as on a standalone server a local disk is preferable.
However, the use of a SMB file share to store databases could be very useful in the following circumstances:
- Build low-cost clustered instances:
Yes, this is an option. Today we can configure a Windows failover cluster based on file shares only. Attention: a failover cluster using a SAN has many other advantages, so you need to think twice before deciding to use a low-cost failover cluster in production. Again, think twice!
- In non-productive or Disaster Recovery servers:
For a QA environment that requires a cluster, or for a disaster recovery solution, this approach can be a very good option.
- Lightweight databases:
A lightweight database is not necessarily a small database: it can be a huge database with few connections and operations. There are many such cases, and a careful analysis should be done before making this decision for production databases.
- Historical Data:
A file share can be a cheap way to store data files containing historical data. This kind of data is typically read-only and rarely accessed, so a file share can cover everything this scenario needs.
- Emergency/Temporary storage:
Imagine a situation where the storage is simply full and a production database has stopped. Create a file (data or log) on a file share and your problem is half solved: the database can keep working while you wait for the definitive solution (a sketch follows after this list).
- Database migrations:
This is a very particular case. If you already have a database stored on a file share and you need to migrate it to another instance, the process is as simple as detaching the database from the old instance and attaching it to the new one. The quickest migration ever (of course, you will need to migrate jobs and logins first; see the sketches below).
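For the emergency/temporary storage case above, the fix is a single command. Here is a minimal sketch, using hypothetical database, file and share names:
-- Add an extra data file on a file share so the database can keep growing
-- while its normal storage is full (all names below are hypothetical).
ALTER DATABASE [SomeDatabase]
ADD FILE
( NAME = N'emergency_data',
  FILENAME = N'\\W2012SRV01\DB\SomeDatabase_emergency.ndf',
  SIZE = 512MB,
  FILEGROWTH = 256MB );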
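And the detach/attach migration could look roughly like this (again with hypothetical names; the detach runs on the old instance, the attach on the new one):
-- On the old instance: detach the database (no active connections allowed).
EXEC sp_detach_db @dbname = N'SomeDatabase';

-- On the new instance: attach it, pointing at the same UNC paths.
CREATE DATABASE [SomeDatabase]
ON ( FILENAME = N'\\W2012SRV01\DB\SomeDatabase.mdf' ),
   ( FILENAME = N'\\W2012SRV01\DB\SomeDatabase_log.ldf' )
FOR ATTACH;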
Today we have another option to store our databases, for both clustered and non-clustered instances. However, we need to plan our strategy well before using this, as we have better, but more expensive options. It all boils down to budget, to the database/instance purposes and its expected characteristics.
I hope this has been useful for you!
Although the growing maturity of the cloud has made it easier for businesses to leverage new digital solutions, some decision-makers are hesitant to make the transition because of the loss of direct control that comes with outsourcing data. Losing complete control over information can be difficult for IT managers who are used to maintaining in-house systems. According to Cloud Tweaks, however, there are ways for enterprises to better prepare for a transition to make the absence of direct contact less stressful.
- Know who the provider is beforehand. Rather than selecting a cloud solution at random, the source noted the importance of researching providers thoroughly. Although most cloud services function the same way, every option has its own core set of features, and IT managers making the judgment call should fully understand what will happen to their information once it has been outsourced.
- Retain a copy on-premises. Decision-makers are skeptical of anything that might influence their data, and rightfully so. With news aggregators consistently reporting on security breaches and information loss, IT managers should be on guard with their own company's digital assets. For this reason, the source recommended keeping an in-house backup of enterprise information until the cloud provider has proved its trustworthiness.
- Transition slowly. Launching a cloud-based infrastructure all at once can be daunting, especially if decision-makers are skeptical of the cloud's security. Rather than deploying the cloud all at once, however, the source encouraged companies to consider a slow transition. This way, businesses will be able to determine if this option is a good fit early on.
- Flexibility for increased security. BizTech Magazine reported that cloud-optimized digital architectures are capable of hosting high-quality applications that add layers of defense against cybercrimes. Database administration, for instance, syncs an organization's data to remote DBA experts who will help IT managers monitor and maintain their information. These strategies help mitigate the threat of data loss and provide corporations with the option to reduce IT personnel, which can diminish drains on the company's budget.
As more businesses make the transition to the cloud, the features that make these services worthwhile are improving. Skeptical business owners should consider how their organizations can support a cloud-based infrastructure.
Customizing digital infrastructures to fit the unique needs of corporations is becoming easier as cloud services mature. In addition to scalable storage, for instance, cloud providers are constantly adding new features to their core offerings. Competition in the cloud market has led to new and innovative tools for IT managers to use in the construction of their digital architectures.
Increased savings and efficiency with cloud services
In addition to the cost-effectiveness of scalable storage, which enables organizations to pay-as-they-go for memory, the cloud has also made it easier for businesses to rapidly adjust their computing strategies. Rather than having to rely on the limitations of on-premises legacy options, for instance, the cloud enables IT managers to flexibly alter their infrastructures quickly and without lengthy down times during upgrades.
According to ZDNet, speed is another facet of cloud computing that has made these services ideal for big data corporations. Not only has the rate of growth in this market made it easier for firms to customize their digital architectures, the source noted that velocity is just as important as cost-savings. The ability to quickly and efficiently load data should improve a company's productivity, and as competition increases across industries, this will provide companies with an edge over corporations that are still utilizing legacy strategies.
Unique applications on the cloud
Alongside improved speed, the cloud is providing organizations with new features, such as flexible cloud-based applications. With database administration, for instance, remote DBA experts make information maintenance easier and more efficient by supporting the corporation with streamlined categorization and increased security.
Wired reported that advances in the cloud are likely to continue because of its highly flexible features. Developers can easily manipulate and transform the cloud into new services for corporations to deploy, and in doing so, IT managers will always have fresh options.
As businesses launch new cloud-based infrastructures, it's becoming easier for IT managers to customize these options to fit the organization's needs. Unlike legacy strategies, the cloud allows decision-makers to launch specific features, rather than entire suites of solutions that might be more costly.
RDX offers a full suite of cloud migration and administrative services that can be tailored to meet any customer's needs. To learn more about our full suite of cloud migration and support services, please visit our Cloud DBA Service page or contact us.
The Situation: The Way It Is...
By now, most Oracle DBAs evaluate Oracle performance based on time, that is, a time based analysis or simply TBA. For sure there are other methods. And many methods are, in part, time based. But I digress.
The beauty of a time based analysis is it helps us understand what a user is experiencing. Because users experience time, when our analysis is time based we can quantify their experience. Completely? No, but for sure in part. And in part is better than no part at all and much better than many alternatives.
Oracle processes yearn for CPU! It is paramount to understand that an Oracle server or background process either consumes CPU or does not. Currently, there is no third alternative. The “does not consume CPU” time is called wait time, and the wait time can be classified as either “idle” or “non-idle”. I like to say that Oracle processes want to burn CPU, they want to consume CPU, they yearn for CPU. If they can’t, there must be a very good reason why they must stop and wait before consuming CPU. And it’s the wait events that provide clues as to why CPU cannot be consumed.
What Is In The Time?
An Oracle time based analysis includes both Oracle CPU consumption and Oracle wait time. And usually it excludes the time between the Oracle client process and the Oracle server process. This is a shock to many, but capturing the time between the Oracle client and server process is not that simple. And it requires more than simply looking at Oracle trace files or v$ views.
The "big bar" figure below displays the "total time." That is, all the Oracle process CPU and wait time over a specific snapshot interval. This is one way to communicate the total CPU and total wait time.
There are a number of ways to collect the basic timing information. Most techniques use Oracle trace files or the Oracle v$ views.
Oracle's wait interface is amazing! Multiple database performance tool vendors will tell you Oracle’s wait interface is far superior to what DB2 and SQL Server offer. Below is a graphical example taken from the Stori product visually showing the top five wait events, that is, all the Oracle wait time for all Oracle processes over a specified snapshot interval, grouped by the top five events.
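The screenshot itself is not reproduced here, but as a rough illustration (not the actual query the Stori product uses), something like the following pulls the instance-wide top five non-idle wait events straight from the v$ views; a tool reporting on a snapshot interval would sample this twice and take the difference:
-- Illustrative only: top 5 non-idle wait events by total time waited,
-- cumulative since instance startup, from V$SYSTEM_EVENT.
SELECT *
  FROM (SELECT event,
               wait_class,
               ROUND(time_waited_micro / 1e6) AS seconds_waited,
               total_waits
          FROM v$system_event
         WHERE wait_class <> 'Idle'
         ORDER BY time_waited_micro DESC)
 WHERE ROWNUM <= 5;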
The number of wait events and the detailed data available is amazing. As of Oracle Database 12c (12.1) there are just under 1600 wait events. If this seems overwhelming, most DBAs routinely only work with a handful at a time and probably truly understand even less.
Available CPU Consumption Detail
Well... the detail is limited. However, gathering CPU consumption is very straightforward via trace files and the Oracle Database v$ views. From the v$ views, CPU consumption can be gathered for a session from v$sess_time_model and for an entire instance from v$sys_time_model. I have blogged numerous times about the quality of these CPU consumption values for both server processes and background processes here and here. (While I will not cover this here, the closer you look at Oracle CPU consumption, the more likely you are to realize that even Oracle is making some CPU consumption assumptions.)
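For example, here is a minimal sketch (not taken from the posts linked above) of pulling the instance-wide CPU figures:
-- CPU time reported by the Oracle time model; values are in microseconds,
-- cumulative since instance startup. Per-session detail is in V$SESS_TIME_MODEL.
SELECT stat_name,
       ROUND(value / 1e6) AS cpu_seconds
  FROM v$sys_time_model
 WHERE stat_name IN ('DB CPU', 'background cpu time');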
The glaring hole in Oracle time based analysis is Oracle does not provide detailed CPU consumption numbers. For an Oracle process the most detailed information we can obtain from Oracle is the total consumption.
If your boss asked you what Oracle is doing “with all that CPU” the best you can do is provide the total amount and then make inferences based on the Oracle wait events, the operating system situation, the application code and the user experience. That’s a lot of analysis information and obviously allows for a very detailed and spot-on analysis.
But still, answering the “What is Oracle doing with all that CPU?” question should, in my opinion, be much easier and clearer. And compared to the wait interface detail, the available CPU consumption detail is simply incomplete. I think as DBAs, we can do better.
What I want is the same level of CPU detail as wait event detail. Or at least something so I can reduce the amount of inferencing required when answering the question, "What is Oracle doing with all that CPU?" It would be nice to gather Oracle CPU consumption for any Oracle process grouped by Oracle kernel function. Below is an example of what I would like to see.
Once I know the Oracle function, then I can find out, with some good detail, what the process is doing while consuming the CPU...and answer “the question.”
Is This Really Worth My Time?
Perhaps not. But consider these benefits. As I mentioned previously, I want to better answer the question, "What is Oracle doing with all that CPU?" Another benefit is to confirm the performance story. For example, if there is a parsing issue, I will expect to see some type of shared pool and/or library cache related CPU consumption. Another less obvious benefit is understanding the performance situation when only a small amount of wait time is present. Without the wait time, our inferencing capability is limited; knowing the CPU consumption details can make up for that missing wait time detail. And finally, you may find an Oracle bug.
Next Steps
This is a good place to end this posting. In the next posting, we will explore the tools available to gather Oracle process CPU consumption down to the Oracle kernel function.
But if you can’t wait until the next posting (or you have lots of time), here are some resources.
1. Presentation: Detailing Oracle CPU Consumption. Download here.
2. Tool: Oracle wait time and kernel function CPU monitor. Download the fulltime.sh script.
Thanks for reading,
If you enjoy my blog, I suspect you'll get a lot out of my courses: Oracle Performance Firefighting, Advanced Oracle Performance Analysis, and my new seminar, Go Faster: Make Oracle Work For You! I teach these classes around the world multiple times each year. For the latest schedule, go to www.orapub.com. I also offer on-site training and consulting services.
P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub.com. Another option is to send an email to OraPub's general email address, which is currently info@orapub.com.
This post also applies to non-Exadata systems as hard drives work the same way in other storage arrays too – just the commands you would use for extracting the disk-level metrics would be different.
I just noticed that one of our Exadatas had a disk put into “predictive failure” mode and thought to show how to measure why the disk is in that mode (as opposed to just replacing it without really understanding the issue ;-)
SQL> @exadata/cellpd
Show Exadata cell versions from V$CELL_CONFIG....

DISKTYPE             CELLNAME             STATUS                 TOTAL_GB     AVG_GB  NUM_DISKS   PREDFAIL   POORPERF WTCACHEPROB   PEERFAIL   CRITICAL
-------------------- -------------------- -------------------- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
FlashDisk            192.168.12.3         normal                  183   23    8
FlashDisk            192.168.12.3         not present             183   23    8    3
FlashDisk            192.168.12.4         normal                  366   23   16
FlashDisk            192.168.12.5         normal                  366   23   16
HardDisk             192.168.12.3         normal                20489 1863   11
HardDisk             192.168.12.3         warning - predictive   1863 1863    1    1
HardDisk             192.168.12.4         normal                22352 1863   12
HardDisk             192.168.12.5         normal                22352 1863   12
So, one of the disks in storage cell with IP 192.168.12.3 has been put into predictive failure mode. Let’s find out why!
To find out which exact disk, I ran one of my scripts for displaying Exadata disk topology (partial output below):
SQL> @exadata/exadisktopo2 Showing Exadata disk topology from V$ASM_DISK and V$CELL_CONFIG.... CELLNAME LUN_DEVICENAME PHYSDISK PHYSDISK_STATUS CELLDISK CD_DEVICEPART GRIDDISK ASM_DISK ASM_DISKGROUP LUNWRITECACHEMODE -------------------- -------------------- ------------------------------ -------------------------------------------------------------------------------- ------------------------------ -------------------------------------------------------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------------------------------------------------------------------- 192.168.12.3 /dev/sda 35:0 normal CD_00_enkcel01 /dev/sda3 DATA_CD_00_enkcel01 DATA_CD_00_ENKCEL01 DATA "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sda 35:0 normal CD_00_enkcel01 /dev/sda3 RECO_CD_00_enkcel01 RECO_CD_00_ENKCEL01 RECO "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdb 35:1 normal CD_01_enkcel01 /dev/sdb3 DATA_CD_01_enkcel01 DATA_CD_01_ENKCEL01 DATA "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdb 35:1 normal CD_01_enkcel01 /dev/sdb3 RECO_CD_01_enkcel01 RECO_CD_01_ENKCEL01 RECO "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdc 35:2 normal CD_02_enkcel01 /dev/sdc DATA_CD_02_enkcel01 DATA_CD_02_ENKCEL01 DATA "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdc 35:2 normal CD_02_enkcel01 /dev/sdc DBFS_DG_CD_02_enkcel01 DBFS_DG_CD_02_ENKCEL01 DBFS_DG "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdc 35:2 normal CD_02_enkcel01 /dev/sdc RECO_CD_02_enkcel01 RECO_CD_02_ENKCEL01 RECO "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdd 35:3 warning - predictive CD_03_enkcel01 /dev/sdd DATA_CD_03_enkcel01 "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdd 35:3 warning - predictive CD_03_enkcel01 /dev/sdd DBFS_DG_CD_03_enkcel01 "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU" /dev/sdd 35:3 warning - predictive CD_03_enkcel01 /dev/sdd RECO_CD_03_enkcel01 "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
Ok, looks like /dev/sdd (with address 35:3) is the “failed” one.
When listing the alerts from the storage cell, indeed we see that a failure has been predicted, warning raised and even handled – XDMG process gets notified and the ASM disks get dropped from the failed grid disks (as you see from the exadisktopo output above if you scroll right).
CellCLI> LIST ALERTHISTORY WHERE alertSequenceID = 456 DETAIL; name: 456_1 alertDescription: "Data hard disk entered predictive failure status" alertMessage: "Data hard disk entered predictive failure status. Status : WARNING - PREDICTIVE FAILURE Manufacturer : HITACHI Model Number : H7220AA30SUN2.0T Size : 2.0TB Serial Number : 1016M7JX2Z Firmware : JKAOA28A Slot Number : 3 Cell Disk : CD_03_enkcel01 Grid Disk : DBFS_DG_CD_03_enkcel01, DATA_CD_03_enkcel01, RECO_CD_03_enkcel01" alertSequenceID: 456 alertShortName: Hardware alertType: Stateful beginTime: 2013-11-27T07:48:03-06:00 endTime: 2013-11-27T07:55:52-06:00 examinedBy: metricObjectName: 35:3 notificationState: 1 sequenceBeginTime: 2013-11-27T07:48:03-06:00 severity: critical alertAction: "The data hard disk has entered predictive failure status. A white cell locator LED has been turned on to help locate the affected cell, and an amber service action LED has been lit on the drive to help locate the affected drive. The data from the disk will be automatically rebalanced by Oracle ASM to other disks. Another alert will be sent and a blue OK-to-Remove LED will be lit on the drive when rebalance completes. Please wait until rebalance has completed before replacing the disk. Detailed information on this problem can be found at https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1112995.1 " name: 456_2 alertDescription: "Hard disk can be replaced now" alertMessage: "Hard disk can be replaced now. Status : WARNING - PREDICTIVE FAILURE Manufacturer : HITACHI Model Number : H7220AA30SUN2.0T Size : 2.0TB Serial Number : 1016M7JX2Z Firmware : JKAOA28A Slot Number : 3 Cell Disk : CD_03_enkcel01 Grid Disk : DBFS_DG_CD_03_enkcel01, DATA_CD_03_enkcel01, RECO_CD_03_enkcel01 " alertSequenceID: 456 alertShortName: Hardware alertType: Stateful beginTime: 2013-11-27T07:55:52-06:00 examinedBy: metricObjectName: 35:3 notificationState: 1 sequenceBeginTime: 2013-11-27T07:48:03-06:00 severity: critical alertAction: "The data on this disk has been successfully rebalanced by Oracle ASM to other disks. A blue OK-to-Remove LED has been lit on the drive. Please replace the drive."
The two alerts show that we first detected a (soon) failing disk (event 456_1) and then ASM kicked in and dropped the ASM disks from the failing disk and rebalanced the data elsewhere (event 456_2).
But we still do not know why we are expecting the disk to fail! And the alert info and CELLCLI command output do not have this detail. This is where the S.M.A.R.T monitoring comes in. Major hard drive manufacturers support the SMART standard for both reactive and predictive monitoring of the hard disk internal workings. And there are commands for querying these metrics.
Let’s find the failed disk info at the cell level with CELLCLI:
CellCLI> LIST PHYSICALDISK;
         35:0            JK11D1YAJTXVMZ   normal
         35:1            JK11D1YAJB4V0Z   normal
         35:2            JK11D1YAJAZMMZ   normal
         35:3            JK11D1YAJ7JX2Z   warning - predictive failure
         35:4            JK11D1YAJB3J1Z   normal
         35:5            JK11D1YAJB4J8Z   normal
         35:6            JK11D1YAJ7JXGZ   normal
         35:7            JK11D1YAJB4E5Z   normal
         35:8            JK11D1YAJ8TY3Z   normal
         35:9            JK11D1YAJ8TXKZ   normal
         35:10           JK11D1YAJM5X9Z   normal
         35:11           JK11D1YAJAZNKZ   normal
         FLASH_1_0       1014M02JC3       not present
         FLASH_1_1       1014M02JYG       not present
         FLASH_1_2       1014M02JV9       not present
         FLASH_1_3       1014M02J93       not present
         FLASH_2_0       1014M02JFK       not present
         FLASH_2_1       1014M02JFL       not present
         FLASH_2_2       1014M02JF7       not present
         FLASH_2_3       1014M02JF8       not present
         FLASH_4_0       1014M02HP5       normal
         FLASH_4_1       1014M02HNN       normal
         FLASH_4_2       1014M02HP2       normal
         FLASH_4_3       1014M02HP4       normal
         FLASH_5_0       1014M02JUD       normal
         FLASH_5_1       1014M02JVF       normal
         FLASH_5_2       1014M02JAP       normal
         FLASH_5_3       1014M02JVH       normal
Ok, let’s look into the details, as we also need the deviceId for querying the SMART info:
CellCLI> LIST PHYSICALDISK 35:3 DETAIL;
         name:                   35:3
         deviceId:               26
         diskType:               HardDisk
         enclosureDeviceId:      35
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_3
         makeModel:              "HITACHI H7220AA30SUN2.0T"
         physicalFirmware:       JKAOA28A
         physicalInsertTime:     2010-05-15T21:10:49-05:00
         physicalInterface:      sata
         physicalSerial:         JK11D1YAJ7JX2Z
         physicalSize:           1862.6559999994934G
         slotNumber:             3
         status:                 warning - predictive failure
Ok, the disk device was /dev/sdd, the disk name is 35:3 and the device ID is 26. And it’s a SATA disk. So I will run smartctl with the sat+megaraid device type option to query the disk SMART metrics – via the SCSI controller that the disks are attached to. Note that the ,26 at the end is the deviceId reported by the LIST PHYSICALDISK command. There’s quite a lot of output; the important lines are the Raw_Read_Error_Rate, Reallocated_Sector_Ct and Current_Pending_Sector attributes:
> smartctl -a /dev/sdd -d sat+megaraid,26 smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-400.11.1.el5uek] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Device Model: HITACHI H7220AA30SUN2.0T 1016M7JX2Z Serial Number: JK11D1YAJ7JX2Z LU WWN Device Id: 5 000cca 221df9d11 Firmware Version: JKAOA28A User Capacity: 2,000,398,934,016 bytes [2.00 TB] Sector Size: 512 bytes logical/physical Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Thu Nov 28 06:28:13 2013 CST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED Warning: This result is based on an Attribute check. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (22330) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 255) minutes. SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   028   028   016    Pre-fail  Always       -       430833663
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       103
  3 Spin_Up_Time            0x0007   117   117   024    Pre-fail  Always       -       614 (Average 624)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       69
  5 Reallocated_Sector_Ct   0x0033   058   058   005    Pre-fail  Always       -       743
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       30754
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       69
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       80
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       80
194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       23 (Min/Max 17/48)
196 Reallocated_Event_Count 0x0032   064   064   000    Old_age   Always       -       827
197 Current_Pending_Sector  0x0022   089   089   000    Old_age   Always       -       364
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description   Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error        00%            30754  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
So, from the output above we see that the Raw_Read_Error_Rate indicator for this hard drive is pretty close to the threshold of 16. The SMART metrics really are just “health indicators” following the defined standards (read the wiki article). The health indicator value can range from 0 to 253. For the Raw_Read_Error_Rate metric (which measures the physical read success rate from the disk surface), a bigger value is better (apparently 100 is the max, with the Hitachi disks at least; most of the other disks were showing around 90-100). So whenever there are media read errors the metric will drop, and the more errors there are, the more it will drop.
Apparently some read errors are inevitable (and detected by various checks like ECC), especially on high-density disks. The errors will be corrected or worked around, sometimes via ECC, sometimes by a re-read. So, yes, your hard drive performance can get worse as disks age or are about to fail. If the metric ever goes below the defined threshold of 16, the disk apparently (and consistently over some period of time) isn’t working so well, and it had better be replaced.
Note that the RAW_VALUE column does not necessarily show the number of failed reads from the disk platter. It may represent the number of sectors that failed to be read, or it may be just a bitmap – or both combined into the low- and high-order bytes of this value. For example, when converting the raw value of 430833663 to hex, we get 0x19ADFFFF. Perhaps the low-order FFFF is some sort of a bitmap and the high-order 0x19AD is the number of failed sectors or reads. There’s some more info available about Seagate disks, but in our V2 we have Hitachi ones and I couldn’t find anything about how to decode the RAW_VALUE for their disks. So, we just need to trust that the “normalized” SMART health indicators for the different metrics tell us when there’s a problem.
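For what it's worth, the hex conversion itself is easy to double-check right from SQL*Plus:
-- Convert the cumulative RAW_VALUE to hex to eyeball the high- and low-order parts.
SELECT TO_CHAR(430833663, 'XXXXXXXX') AS raw_value_hex FROM dual;
-- -> 19ADFFFF (high-order 0x19AD = 6573 decimal, low-order 0xFFFF)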
Even though, when I ran the smartctl command, I did not see the actual value (nor the worst value) crossing the threshold, 28 is still pretty close to the threshold of 16, considering that normally the indicator should be close to 100. So my guess is that the indicator actually did cross the threshold, which is when the alert got raised. It’s just that by the time I logged in and ran my diagnostic commands, the disk was working better again. It looks like the “worst” values are not remembered properly by the disks (or some SMART tool resets them every now and then). Note that we would see SMART alerts with the actual problem metric values in the Linux /var/log/messages file if the smartd service were enabled in the storage cell Linux OS – but apparently it is disabled, and probably one of Oracle’s own daemons in the cell does that monitoring instead.
So what does this info tell us? A low “health indicator” for Raw_Read_Error_Rate means that there are problems with physically reading the sequences of bits from the disk platter. This means bad sectors, or weak sectors that are probably soon to become bad sectors. Had we seen a bad health state for UDMA_CRC_Error_Count, for example, it would have indicated a data transfer issue over the SATA cable. So it looks like the reason this disk is in the predictive failure state is that it has had too many read errors from the physical disk platter.
If you look into the other highlighted metrics above – the Reallocated_Sector_Ct and Current_Pending_Sector, you see there are hundreds of disk sectors (743 and 364) that have had IO issues, but eventually the reads finished ok and the sectors were migrated (remapped) to a spare disk area. As these disks have 512B sector size, this means that some Oracle block-size IOs from a single logical sector range may actually have to read part of the data from the original location and seek to some other location on the disk for reading the rest (from the remapped sector). So, again, your disk performance may get worse when your disk is about to fail or is just having quality issues.
For reference, here’s an example from another, healthier disk in this Exadata storage cell:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   086   086   016    Pre-fail  Always       -       2687039
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   119   119   024    Pre-fail  Always       -       601 (Average 610)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       68
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   114   114   020    Pre-fail  Offline      -       38
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       30778
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       68
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       81
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       81
194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       21 (Min/Max 16/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
The Raw_Read_Error_Rate indicator still shows 86 (% from ideal?), but it is much farther away from the threshold of 16. Many of the other disks showed even 99 or 100, and apparently this metric value changed as the disk behavior changed. Some disks with a value of 88 jumped to 100, and an hour later they were at 95, and so on. So the VALUE column allows real-time monitoring of these internal disk metrics; the only thing that doesn’t make sense right now is why the WORST column gets reset over time.
For the better behaving disk, the Reallocated_Sector_Ct and Current_Pending_Sector metrics show zero, so this disk doesn’t seem to have bad or weak sectors (yet).
I hope that this post is another example that it is possible to dig deeper, but only when the piece of software or hardware is properly instrumented, of course. If it didn’t have such instrumentation, it would be way harder (you would have to take a stethoscope and record the noise of the hard drive for analysis or open the drive in dustless vacuum and see what it’s doing yourself ;-)
Note that I will be talking about systematic Exadata performance troubleshooting (and optimization) in my Advanced Exadata Performance online seminar on 16-20 December.
Next week Rittman Mead is the analytics sponsor for the UKOUG Tech13 conference up in Manchester.
We’ve got a great line up of talks, covering BI Apps, Endeca, Hadoop, OBIEE and mobile – Stewart and Charles are both flying in from the US to present alongside Mark, Adam and myself. The full list is as follows:
- Deep dive into Oracle BI Applications using Oracle Data Integrator [Mark Rittman] – Sunday 12.30-2pm
- Leveraging Hadoop / Map Reduce OBIEE 11g and ODI 11g [Mark Rittman] – Sunday 3pm – 4.30pm
- BI Across Any Data Source with OBIEE and Oracle Endeca Discovery [Mark Rittman & Adam Seed] – Monday 5.05-5.50pm
- The Changing World of Business Intelligence [Jon Mead] – Tuesday 11.20 – 12.20
- Oracle BI Multi User Development (MDS XML vs MUDE) [Stewart Bryson] – Tuesday 11.30 -12.30
- Designing for a mobile World using OBIEE [Charles Elliot] – Wednesday 1.45 – 2.45
On Monday night we are teaming up with our friends at Pythian to host some drinks and nibbles at Taps Bar, which is just around the corner from the conference venue. We’ll be there from 6.30-9.00pm, so come and join us for a free drink.
Analytics Sponsor
As part of being the analytics sponsor for the event we are looking to collate as much real time, social media and demographic information about the event as possible. We will be displaying an analytics dashboard in the conference venue detailing statistics from this data. To help us, could you:
- Complete the form here to give us some background information about why you are going; and
- Use the official hashtag ukoug_tech13 when tweeting anything about the event.
It’s looking like it will be a great few days, so look forward to seeing you up there.
I think that most sufficiently large enterprise SaaS vendors should offer an appliance option, as an alternative to the core multi-tenant service. In particular:
- SaaS appliances address customer fears about security, privacy, compliance, performance isolation, and lock-in.
- Some of these benefits occur even if the appliance runs in the same data centers that host the vendor’s standard multi-tenant SaaS. Most of the rest occur if the customer can choose a co-location facility in which to place the appliance.
- Whether many customers should or will use the SaaS appliance option is somewhat secondary; it’s a check-mark item. I.e., many customers and prospects will be pleased that the option at least exists.
How I reached them
Core reasons for selling or using SaaS (Software as a Service) as opposed to licensed software start:
- The SaaS vendor handles all software upgrades, and makes them promptly. In principle, this benefit could also be achieved on a dedicated system on customer premises (or at the customer’s choice of co-location facility).
- In addition, the SaaS vendor handles all the platform and operational stuff — hardware, operating system, computer room, etc. This benefit is antithetical to direct customer control.
- The SaaS vendor only has to develop for and operate on a tightly restricted platform stack that it knows very well. This benefit is also enjoyed in the case of customer-premises appliances.
Conceptually, then, customer-premises SaaS is not impossible, even though one of the standard Big Three SaaS benefits is lost. Indeed:
- Microsoft Windows and many other client software packages already offer to let their updates be automagically handled by the vendor.
- In that vein, consumer devices such as game consoles already are a kind of SaaS appliance.
- Complex devices of any kind, including computers, will see ever more in the way of “phone-home” features or optional services, often including routine maintenance and upgrades.
But from an enterprise standpoint, that’s all (relatively) simple stuff. So we’re left with a more challenging question — does customer-premises SaaS make sense in the case of enterprise applications or other server software?
Why would a customer actually want on-premises SaaS, as opposed to the standard remote version? The first ideas that come to mind are:
- Security and/or privacy considerations, real or imagined. This is in fact the motivation behind the single case of on-premises enterprise SaaS I have confirmed, namely one that Cloudant told me about.* (I don’t have similar levels of detail about Glassbeam’s one on-premises subscription customer.)
- Similarly, a less specific desire for isolation …
- … and/or control.
- Avoiding the expense of data movement to/from a remote location. For example, an enterprise might use SaaS OLTP (OnLine Transaction Processing) apps whose results it wants to stream to an on-premises data warehouse. Or the enterprise might have lower-volume but also lower-latency — and hence more costly — data integration needs, perhaps between different OLTP application suites, or with some MDM (Master Data Management) in the mix.
And, um — that’s about all I’ve got.
*Yes, I know Cloudant is DBaaS — but to me that’s a kind of SaaS, in which the S just happens to center around a DBMS.
Confusing matters further, there’s a middle option as well. salesforce.com and HP just announced that salesforce.com apps will, for the first time, run on dedicated customer-specific racks. But this will only be within the same data centers and operation groups that handle the rest of salesforce.com’s system. Notes on what’s being called a “pod” strategy start:
- This suggests a perceived demand for isolation.
- If these pods are offered to accounts large enough to saturate a bunch of servers each, they need not be much more expensive than the multi-tenant version of salesforce’s offering.
- Other than (fully-loaded) cost, it’s hard to see a downside to this vs. multi-tenant SaaS.
Notwithstanding ever-increasing levels of comfort with SaaS and cloud computing, I’d guess that a number of enterprises will find the cost of single-tenant SaaS more palatable than the queasiness they feel about multi-tenant alternatives. And so I think there’s a place for single-customer enterprise SaaS stacks somewhere; the main remaining question is where they will be located.
Ducking that question a bit longer, let me note that:
- In any scenario, we’re most likely talking about something like SaaS appliances. Customer-premises server SaaS that isn’t in some kind of appliance form is madness (unless the SaaS vendor is paid for on-site support as well), because no SaaS vendor wants to support hardware it can’t specify or control.
- Some enterprises in some countries will surely insist on keeping data within national borders, for reasons of geo-compliance. Hence there will be a need to deploy SaaS appliances either literally to their premises, or else to an in-country co-location facility, perhaps managed by a big telecom firm. Of course, that need can only arise if vendors first overcome issues of software nationalization — language, regulations, other business customs, whatever.
- The final point in my recent SaaS discussion post was about lock-in. If you use something that only runs in the supplier’s data centers, your lock-in is even worse than it is with most enterprise IT technology.
- I can’t currently think of many examples in which SaaS appliances need to be located directly in the customers’ main data centers. When you get to data flows and volumes big enough for that to matter, you’re likely talking about the kinds of internet applications that probably shouldn’t be on-premises in the first place.
And that finally brings us to the opinions I copied up top.
I think that most sufficiently large enterprise SaaS vendors should offer an appliance option, as an alternative to the core multi-tenant service. In particular:
- SaaS appliances address customer fears about security, privacy, compliance, performance isolation, and lock-in.
- Some of these benefits occur even if the appliance runs in the same data centers that host the vendor’s standard multi-tenant SaaS. Most of the rest occur if the customer can choose a co-location facility in which to place the appliance.
- Whether many customers should or will use the SaaS appliance option is somewhat secondary; it’s a check-mark item. I.e., many customers and prospects will be pleased that the option at least exists.
- Naomi Bloom is a SaaS purist, who would presumably deplore the whole concept of “customer-premises SaaS”.