Re: How to trace the root cause of "SQL*Net more data to client" for only few executions of a query

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 14 Oct 2019 20:56:04 +0000
Message-ID: <CWXP265MB17505DABA735FAA7AC9A6598A5900_at_CWXP265MB1750.GBRP265.PROD.OUTLOOK.COM>


As Andy says, setting an array size of 90K for a single fetch is a bit extreme. Can you confirm that what you're reporting is the "r=" value from a FETCH# line in the trace file. Can you show us a couple of WAIT lines above the one with the ela = 2 billion.

The p1 value is a little odd - unlike any I've seen before. Normally if you convert the value from decimal to hex you can see the ASCII for things like "BEQ" or "TCP", in your case the values read "(DES" - which looks like the first few characters of a tnsnames entry. It makes me wonder whether you're seeing an occasionally memory overflow corrupting the communication.

Another thought is that with the very large fetch size your Java front end randomly goes into a catastrophic garbage collection before it finally gets back to the database with the "ready for the next few rows" message.

Regards
Jonathan Lewis



From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of kunwar singh <krishsingh.111_at_gmail.com> Sent: 14 October 2019 17:31
To: Andy Sayer
Cc: Dominic Brooks; ORACLE-L; mwf_at_rsiz.com Subject: Re: How to trace the root cause of "SQL*Net more data to client" for only few executions of a query

Hi Andy,All,
Brief update.

I got the 10046 trace and found there is one single long wait , which happens sometimes and it causes the issue.

14572 WAIT #140333333333333: nam='SQL*Net more data to client' ela= 2100000012 driver id=675562835 #bytes=8100 p3=0 obj#=3434343 tim=8747474744777

At other times the wait time is like 25 micro seconds, but during the issue a single wait goes upto 2100 seconds like shown above.

JDBC Thin Client is used at the client side and FETCH calls shows more 90k rows fetched in a single call , so i assume it is set to very high or is default?

I am still check on other points that you mentioned.

Rgds,
Kunwar

On Wed, Oct 9, 2019 at 2:50 PM Andy Sayer <andysayer_at_gmail.com<mailto:andysayer_at_gmail.com>> wrote: SQL live Monitor reports use ASH behind the scenes, they ignore sql*net message from client as it’s classified as an idle wait.

More data to client is classified as active as the client has said they definitely want this data but some buffer in the middle is not big enough to fit it all in.

I would suggest looking at how much data the client is requesting and consider what it’s trying to do with it. I would guess either they’re missing some pagination (ie do one fetch only) or it’s doing some aggregation (etc) logic before the end user sees it - move that to the DB if you can.

Hope that helps,
Andy

On Wed, 9 Oct 2019 at 19:40, kunwar singh <krishsingh.111_at_gmail.com<mailto:krishsingh.111_at_gmail.com>> wrote: Hi Andy,
Good points. I use the sql monitor active report and DBA_HIST_ACTIVE_SESS_HISTORY. I have seen cases where sql*net message from client are missing from sql monitor report and DBA_HIST_ACTIVE_SESS_HISTORY. but in this case i do see few samples for "SQL*Net message from client" as well.

I am checking on other points that you have mentioned along with other useful points mentioned by Dominic, Mark .

On Wed, Oct 9, 2019 at 1:50 PM Andy Sayer <andysayer_at_gmail.com<mailto:andysayer_at_gmail.com>> wrote: What are you using to show “ most of the time is being spent on "SQL*Net more data to client"”? Is it something that will ignore sql*net message from client?

A standard 10046 trace will show you everything you really need here - rows returned in a fetch and individual wait times for this event and how often they occurred for a fetch.

What’s client driver being used? How is the fetching configured?

Thanks,
Andy

On Wed, 9 Oct 2019 at 18:34, Mark W. Farnham <mwf_at_rsiz.com<mailto:mwf_at_rsiz.com>> wrote: and (not but) are different client programs or client locations the “catchers” of the data?

if different, seeing if the slow ones are consistently of a type could be useful in diagnosis and prevention.

a VLAN with no real QOS (even if configured and promised) can also correlate with this, especially if the physical LAN is used for video.

mwf

From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [mailto:oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>] On Behalf Of Dominic Brooks Sent: Wednesday, October 09, 2019 1:16 PM To: krishsingh.111_at_gmail.com<mailto:krishsingh.111_at_gmail.com> Cc: ORACLE-L
Subject: Re: How to trace the root cause of "SQL*Net more data to client" for only few executions of a query

What arraysize is being used ? What’s the average row size?

https://blog.tanelpoder.com/2008/02/10/sqlnet-message-to-client-vs-sqlnet-more-data-to-client/ Sent from my iPhone

On 9 Oct 2019, at 17:32, kunwar singh <krishsingh.111_at_gmail.com<mailto:krishsingh.111_at_gmail.com>> wrote: Hi Listers,
Our customer have one job which runs daily and one of the sqls which executes like 500 times had 5 odd executions where it runs slower and most of the time is being spent on "SQL*Net more data to client". Normal run times are like few seconds and the longer run times are like 1 hour or so.

For other executions we dont see that wait event being a problem.

What kind of tracing can we do to go in depth of the issue. I want to be sure before i suggest some tuning at SDU sizing level etc. Because we cannot predict when the slow run of the query will happen we cannot wait for it and then run strace etc exactly at that time ,Basically we will have to automate the tracing step.

--

Cheers,
Kunwar

--

Cheers,
Kunwar

--

Cheers,
Kunwar
--

http://www.freelists.org/webpage/oracle-l Received on Mon Oct 14 2019 - 22:56:04 CEST

Original text of this message