RE: Alerting early for minimum impact

From: Pierre Labrousse <"Pierre>
Date: Wed, 18 Sep 2024 09:23:50 +0000
Message-ID: <PR1P264MB18245AEE8B677467B25FDA49F6622_at_PR1P264MB1824.FRAP264.PROD.OUTLOOK.COM>



Hello,
To complete Jonathan response, you could also use the SQL script provide by Oracle into MOS note: Script to Collect Exadata Cell Performance Information (cellperfdiag.sql) (Doc ID 2038493.1)

Pierre.



De : oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> de la part de Jonathan Lewis <jlewisoracle_at_gmail.com> Envoyé : mercredi 18 septembre 2024 11:17 À : yudhi s <learnerdatabase99_at_gmail.com> Cc : dbakevlar_at_gmail.com <dbakevlar_at_gmail.com>; Oracle L <oracle-l_at_freelists.org> Objet : Re: Alerting early for minimum impact

>>> So I wanted to understand , in such a scenario in which one of the queries impacting the storage cell IO/CPU may not be one showing up in the database layer. >>> So what alerting can be done to get to it as soon as possible

You want to know when something is putting a heavy load on the cell servers. In particular you want to know when some other database is loading the cell server database so much that the response times on your database are being affected. If you want to do this from your database I've suggested that you query your v$sql to report statements which have recently been doing a lot of Cell I/O. As with elapsed time, the meaning of "heavy I/O load" is subjective and database dependent. Obviously you will need all the databases accessing that set of cell servers to alert on their own cell loading so that when your database alerts on a time problem you can see if another database is showing a cell loading alert.

If you want to do something about a little more central you can examine v$cell_thread_history which is a bit like ASH for cell servers and reports across all databases running on the same cells as the database you're querying from. It holds about 10 minutes of recent data, sampled once per second, and pulls this data on demand from the cells. As with elapsed_time etc. on v$sql there are useful numbers available, but the decision on how to decide what constitutes a threat is up to your knowledge of the systems. Tanel Poder wrote something about this view a long time ago, and I don't think you'll find anything useful about it in any of the Oracle reference manuals. (See https://tanelpoder.com/2013/05/23/vcell_thread_history-ash-for-exadata-storage-cells/ )

.
Regards
Jonathan Lewis

On Wed, 18 Sept 2024 at 05:06, yudhi s <learnerdatabase99_at_gmail.com<mailto:learnerdatabase99_at_gmail.com>> wrote: Thank you Jonathan and Kelly.

We have alerting in place in regards to "elapsed_time" of the queries from v$sql and that is a measurement on which we can rely upon to understand the query run time. And in a system which is latency sensitive we can have alerting when the elapsed_time goes beyond a few minutes vs in other system we may have it when the query runs a longer time.

But here in this scenario there are multiple databases hosted in the same box and they may be mixed (few are latency sensitive vs others are throughput oriented etc), so how and what value of these metrics we should be alerted on? I am unable to understand how to make these metrics converted to useful alerting so as to understand , we really have issues and it's impacting the whole cluster now, as these seem to be a bit subjective? Can you please explain a bit more on this?

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Sep 18 2024 - 11:23:50 CEST

Original text of this message