Re: Alerting early for minimum impact

From: Kellyn Pot'Vin-Gorman <dbakevlar_at_gmail.com>
Date: Tue, 17 Sep 2024 14:11:27 -0700
Message-ID: <CAN6wuX2Dne45R_8kEXDX2k3=7_JQmgyTvCAp_STA74OVZgQg6A_at_mail.gmail.com>



Agree with Jonathan on monitoring for these metrics that you can simply turn the final query into a metric extension in OEM to alert you when it surpasses it.

*Kellyn Gorman*
DBAKevlar Blog <
http://dbakevlar.com>
about.me/dbakevlar

On Tue, Sep 17, 2024 at 14:01 Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

> I haven't thought this one through, but could you do something with v$sql,
> checking the related columns, possibly:
>
> IO_CELL_OFFLOAD_ELIGIBLE_BYTES
> IO_INTERCONNECT_BYTES
> IO_CELL_UNCOMPRESSED_BYTES
> IO_CELL_OFFLOAD_RETURNED_BYTES
>
>
>
>
> Regards
> Jonathan Lewis
>
>
>
> On Tue, 17 Sept 2024 at 06:38, yudhi s <learnerdatabase99_at_gmail.com>
> wrote:
>
>> Hi ,
>>
>> We are using Oracle exadata servers 23.1 image and the database version
>> is 19c.
>>
>> We have just encountered a scenario in which we have one select query
>> which was doing a full table scan(cell smart scan) and suddenly as part of
>> a bulk migration effort that query started running Avg ~4000 times per hour
>> from initial(avg ~20 runs per hour usual pattern) and this has caused the
>> cell server cpu spike to ~85% from normal ~30% CPU and the disk IO response
>> (mainly the write response) increased from <.5ms to double i.e >1.5 ms and
>> it impacted one of the other database which was latency sensitive one
>> hosted in that same box.
>>
>> It took us a long time to get to the sql because it was not showing as
>> any of the TOP contributors in that database layer but it did impact the
>> storage cell servers IO and CPU significantly. The response time of that
>> sql just increased from ~5-10 seconds earlier to ~20 seconds during the
>> issue period , so we were unable to find this as one of the top sql in that
>> database. And this query was not stuck but finishing up.
>>
>> And also we were having 3 to 4 other databases hosted in that exadata
>> cluster. And when we got complaints from a different database which was a
>> latency sensitive one , we spent a good amount of time digging into the top
>> sqls from that database whereas it was a different database query which has
>> caused this issue.
>>
>> So I wanted to understand , in such a scenario in which one of the
>> queries impacting the storage cell IO/CPU may not be one showing up in the
>> database layer. So what alerting can be done to get to it as soon as
>> possible. and if anything is possible in the OEM metrics itself which we
>> can configure to catch these kinds of issues beforehand? Also I don't see
>> an easy way to find the top queries consuming storage cell CPU or IO across
>> all the databases hosted in that box. Is there any option possible from
>> awr/dba_hist* views?
>>
>> Regards
>> Yudhi
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 17 2024 - 23:11:27 CEST

Original text of this message