Re: Database upgrade/move I/O waits performance
Date: Tue, 23 Jul 2019 22:28:58 -0400
Message-ID: <CAPptggUdLALbD9dOXM+og=85goKs-DVf42b2PgmcM75sWcqz4Q_at_mail.gmail.com>
Lyall,
The VMWare admins can dump reports from vCenter that report on resources consumed by all guest VMs and provided by data stores (10,000 ft view). Specific date/time ranges would help. Aggregation over longer periods will tend to hide spikes. Avoid that.
Are HugePages enabled in the VMware layer?
Is a Gigabit Ethernet network being used to support iSCSI storage? (Please say trunked 10 GigE on dedicated network not some VLAN). What VMware virtual NIC driver is being used?
What version of VMware do you use?
Have you read papers authored by *House* of *Brick* technologies? If time is your most critical issue I'd call them first thing in the morning. Dave Welch (if available) could wrangle that room of VMware admins, SAN admins, System Administrators and DBAs into getting to the root cause (buried below the database layer) faster that a mailing list of Oracle DBAs.
Are there queries that access objects whose segments are assigned to the keep buffer pool that are not waiting on I/O? If so how are those performing?
Are system stats gathered under workload?
What optimizer parameters are set? Jonathan Lewis advocated dumping 10053 traces simply for comparing optimizer parameters between the old and new environments.
Sounds like fun once you get some decent I/O.
Reminds me of the time that a client was supposed to have 2 trays of SAS drives in a Compellent unit. That tray of SATA drives were the little engine that couldn't. A stats pack report comparing average service times of SAS vs SATA LUNs showed that one of those things is not like the other.
Best of luck.
Paul
On Tue, Jul 23, 2019, 12:41 PM Lyall Barbour <lyallbarbour_at_sanfranmail.com> wrote:
> Hello gurus,
> We've upgraded and moved the company's CRM OLTP Oracle database.
>
> Old: physical hardware, 512g RAM, 32 CPU -- Oracle 11.2.0.4 Enterprise,
> 16g MEMORY_TARGET, SPM baselines used, one or two Profiles.
> New: VMWare VM, 224g RAM, 24 vCPU -- Oracle 12.1.0.2 Enterprise, 72g
> SGA_TARGET, HugePages, 12g PGA_TARGET, no SPM baselines, multiple Profiles.
>
> Having said all that, there's a consistency to the slowness that's very
> interesting. We are constantly having I/O waits on all (?) queries, if not
> all, the main top sqls that are running in v$session_longops.
>
> Comparing one query. Explain Plan, Cost, disk reads, buffer gets, all
> line up pretty well in 11g and 12c. The 2 things that are different is the
> actual time, slowest in 11g is 2.5 seconds, 12c is 14 seconds. and
> percentage of work, 11g disk work is 85% and cpu work is 15%. 12c disk
> work is 98%, cpu 2%.
>
> I'm not trying to get any solutions from everyone here, i'm trying to get
> direction on where to look next so i know who to talk to at my company with
> this issue. Do i talk to the SAN folks and see what their graphs are
> like? VM team? keep digging into AWR stats and comparisons on the
> database?
>
> Comparing AWR hour periods from today (oracle 12.1) to last week, tuesday,
> (oracle 11.2) the amount of buffer gets and disk reads all line up with the
> issue i described above. 420T of buffer cache reads last week, 10T today,
> in that hour period. ALL work is waiting on disk.
>
> Any help... helps.
>
> Thanks,
> Lyall Barbour
> -- http://www.freelists.org/webpage/oracle-l
-- http://www.freelists.org/webpage/oracle-lReceived on Wed Jul 24 2019 - 04:28:58 CEST