Odd wait event freezing database
From: Lok P <loknath.73_at_gmail.com>
Date: Thu, 8 Apr 2021 23:59:35 +0530
Message-ID: <CAKna9VaY4KFu6RFfHYnRZ4t69Ct4J2itLSejGFG-rMrgNv=Z1A_at_mail.gmail.com>
Hi All, This is Oracle version 11.2.0.4 on Exadata, a 4-node RAC database. One of our queries normally finishes in a few seconds, but sometimes it runs for 3-4 minutes with the wait event reported as "reliable message". During that period the database seems to freeze, with almost all nodes getting stuck. I am not sure whether this query is the cause of the slowness or a victim of it, but whenever the issue occurred, this query was being executed from multiple sessions and running longer than expected. The plan for this query has not changed, and with the same plan it finishes in seconds at other times. Are we hitting a bug around this wait event? It looks a bit unusual. The wait seems to occur mostly during the full scan of table TSFS, so I would like to understand what is wrong with scanning that table.
Attached below is the SQL monitor report for the same query, which shows all of the time (200+ seconds) being spent on the "reliable message" event.
-- http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 08 2021 - 20:29:35 CEST
- text/plain attachment: Sql_monitor_reliable_message.txt