Re: Alerting early for minimum impact
Date: Wed, 18 Sep 2024 09:35:46 +0530
Message-ID: <CAEzWdqcmLf397be=6RrZyPgCHTrnV03p99GLbZKfC5pSd9njYA_at_mail.gmail.com>
> Agree with Jonathan on monitoring for these metrics that you can simply
> turn the final query into a metric extension in OEM to alert you when it
> surpasses it.
>
>
>
>
> *Kellyn Gorman*
> DBAKevlar Blog <http://dbakevlar.com>
> about.me/dbakevlar
>
>
>
> On Tue, Sep 17, 2024 at 14:01 Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> I haven't thought this one through, but could you do something with
>> v$sql, checking the related columns, possibly:
>>
>> IO_CELL_OFFLOAD_ELIGIBLE_BYTES
>> IO_INTERCONNECT_BYTES
>> IO_CELL_UNCOMPRESSED_BYTES
>> IO_CELL_OFFLOAD_RETURNED_BYTES
>>
>>
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> On Tue, 17 Sept 2024 at 06:38, yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> Hi ,
>>>
>>> We are using Oracle exadata servers 23.1 image and the database version
>>> is 19c.
>>>
>>> We have just encountered a scenario in which we have one select query
>>> which was doing a full table scan(cell smart scan) and suddenly as part of
>>> a bulk migration effort that query started running Avg ~4000 times per hour
>>> from initial(avg ~20 runs per hour usual pattern) and this has caused the
>>> cell server cpu spike to ~85% from normal ~30% CPU and the disk IO response
>>> (mainly the write response) increased from <.5ms to double i.e >1.5 ms and
>>> it impacted one of the other database which was latency sensitive one
>>> hosted in that same box.
>>>
>>> It took us a long time to get to the sql because it was not showing as
>>> any of the TOP contributors in that database layer but it did impact the
>>> storage cell servers IO and CPU significantly. The response time of that
>>> sql just increased from ~5-10 seconds earlier to ~20 seconds during the
>>> issue period , so we were unable to find this as one of the top sql in that
>>> database. And this query was not stuck but finishing up.
>>>
>>> And also we were having 3 to 4 other databases hosted in that exadata
>>> cluster. And when we got complaints from a different database which was a
>>> latency sensitive one , we spent a good amount of time digging into the top
>>> sqls from that database whereas it was a different database query which has
>>> caused this issue.
>>>
>>> So I wanted to understand , in such a scenario in which one of the
>>> queries impacting the storage cell IO/CPU may not be one showing up in the
>>> database layer. So what alerting can be done to get to it as soon as
>>> possible. and if anything is possible in the OEM metrics itself which we
>>> can configure to catch these kinds of issues beforehand? Also I don't see
>>> an easy way to find the top queries consuming storage cell CPU or IO across
>>> all the databases hosted in that box. Is there any option possible from
>>> awr/dba_hist* views?
>>>
>>> Regards
>>> Yudhi
>>>
>>
-- http://www.freelists.org/webpage/oracle-lReceived on Wed Sep 18 2024 - 06:05:46 CEST