Re: Finding cause and fix for Idle wait event

From: Lok P <loknath.73_at_gmail.com>
Date: Sat, 25 Feb 2023 01:05:12 +0530
Message-ID: <CAKna9VbnWGpVksJUrJCS0EM5jo9xc8WjeFVwyOOnewzZJiKU9w_at_mail.gmail.com>



Thank you Jonathan.

Apology for not being clear. Actually our understanding was that "look up cache" means querying/joining the informatica cached result itself (in its own memory), which doesn't need to hit the database. So if that is happening along with the batched insert i.e. for each batch of insert informatica trying to do that additional lookup/query from its own cache, it may take some time and that is why we are seeing the idle wait events after each batched insert. However, after talking with the etl guys we got to know , there are some lookups happening at the initial stage of the workflow after pulling rows from other upstream systems and transforming those to get those final ~200million rows into informatica cache. But once the results set is ready no further lookup happens.

It means the batch of insert starts, when all of those 200million records are fully massaged and available for loading and no additional lookup/transformation happens during the insert operation into the database. So ideally we should not see much of those "SQL*Net message to client" in between each batch/112 rows of insert.We are, yet to pull an informatica person to see if some part of those ~200M is spilling to disk which is causing unexpected slowness here, but we checked overall memory not getting saturated at informatica level when this insert happens.

Also Jonathan, Can it be possible some unexpected network(oracle to informatica) issue can cause such slowness and we can perform some network level setting check or Does any of sdu_size or send_buff_size/recv_buff_size can play a role here?

Just to prove the point, we fetched the tkprof of that session which was doing the INSERT and below is the section of raw trace file. Also published in the gist below. Both of these point to the highest time being spent on " SQL*Net message to client".

https://gist.github.com/oraclelearner/57716ca2fe34215b3cb054886b48f872

https://gist.github.com/oraclelearner/66469ea560c24d826587ebf08814ad52

WAIT #139633662247992: nam='SQL*Net more data from client' ela= 132 driver id=1413697536 #bytes=3 p3=0 obj#=2450584 tim=10800973598191 WAIT #139633662247992: nam='cell single block physical read' ela= 494 cellhash#=3470537248 diskhash#=3304542381 bytes=8192 obj#=2450584 tim=10800973598927

WAIT #139633662247992: nam='SQL*Net more data from client' ela= 187 driver
id=1413697536 #bytes=27 p3=0 obj#=2450584 tim=10800973600635
WAIT #139633662247992: nam='cell single block physical read' ela= 440
cellhash#=3769560041 diskhash#=1003029162 bytes=8192 obj#=2450584 tim=10800973601925
WAIT #139633662247992: nam='SQL*Net more data from client' ela= 111 driver id=1413697536 #bytes=9 p3=0 obj#=2450584 tim=10800973603176 EXEC
#139633662247992:c=7156,e=8777,p=2,cr=0,cu=781,mis=0,r=112,dep=0,og=1,plh=0,tim=10800973604895 WAIT #139633662247992: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=2450584 tim=10800973604968 WAIT #139633662247992: nam='SQL*Net message from client' ela= 29153 driver id=1413697536 #bytes=1 p3=0 obj#=2450584 tim=10800973634145 BINDS #139633662247992: Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total
Waited
  • Waited ----------
    SQL*Net message from client 2305 0.62 108.36 SQL*Net more data from client 6911 0.07 1.08 cell single block physical read 5712 0.05 4.05 SQL*Net message to client 2304 0.00 0.00 gc current grant 2-way 9 0.00 0.00 PGA memory operation 1 0.00 0.00 log file switch completion 1 0.03 0.03 enq: FB - contention 359 0.00 0.07 row cache lock 2 0.00 0.00 DLM cross inst call completion 4 0.00 0.00 KJC: Wait for msg sends to complete 2 0.00 0.00 gc current multi block request 19 0.00 0.00 Disk file operations I/O 1 0.00 0.00 gc current block busy 1 0.00 0.00 gc current block 2-way 2 0.00 0.00 gc current grant busy 1 0.00 0.00 latch: redo allocation 2 0.00 0.00

On Fri, Feb 24, 2023 at 7:01 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
> "lookup cache is finished" means nothing to anyone who doesn't work with
> Informatica, and (based on previous experience) doesn't necessarily mean
> much to most people who do work with Informatica.
>
> I think you said this process is inserting 200M rows. Does that mean that
> somehow informatica is reading 200M rows from somewhere and then doing some
> sort of lookup to translate some incoming values to outgoing values. Does
> that "lookup" operate as something like a nested loop, or does it operate
> as a hash join, or as a merge join. Where do the 200M rows go as there
> "looked-up" values are found?
>
> Has Informatica spilled the rows to disc, and has some change in the
> lookup method meant that it now takes more time for informatica to find the
> next batch of rows. The time loss is entirely due to what Informatica is
> doing, and if you don't have the exact details of what it's doing and how
> it's doing it there's nothing you can do at the Oracle end of the system
> (which is just waiting for Informatica to supply data) to get rid of the
> waits. (You have to get the Informatica person to explain carefully how a
> "lookup" works, and maybe you can ask a few "what if" questions that might
> prompt him/her to come up with a scenario that the pair of you can agree is
> consistent with what Oracle is seeing.
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
> On Fri, 24 Feb 2023 at 12:48, Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank you Nenad. I just checked again with the ETL dev and saw there is
>> actually lookup cache finished completely, before these batch INSERT
>> started. But still why there is so much 'sql * net message from client'
>> wait coming within each batch of INSERT submitted to the database? I keep
>> on checking gv$session when the insert is running and seeing its active and
>> then goes inactive with 'sql * net message from client' for 2-3 seconds.
>>
>> So in above scenario where no look up cache is happening but it's just
>> the batch insert running from etl, will it be network related issue then?
>> Also would it be good idea here to trace the session and serial#(like
>> below) to get any packet transfer clue?
>>
>> begin
>> dbms_monitor.session_trace_enable(
>> session_id => &m_sid,
>> serial_num => &m_serial,
>> waits => true,
>> bind => true,
>> plan_stat => 'all_executions'
>> );
>> end;
>> /
>>
>> begin
>> dbms_monitor.session_trace_disable(
>> session_id => &m_sid,
>> serial_num => &m_serial
>> );
>> end;
>> /
>>
>> On Fri, 24 Feb, 2023, 2:10 pm Nenad Noveljic, <nenad.noveljic_at_gmail.com>
>> wrote:
>>
>>> I was under impression that look ups means just filters/read from the
>>>> informatica cache which would be fast, but I agree sorting rows like
>>>> 200million and joining can be resource intensive.
>>>>
>>> Informatica can also start spilling data to files.
>>>
>>> Is there any instrumentation available st informatica which shows how
>>>> much time/resources does those cached lookups take for a workflow?
>>>
>>> Not that I'm aware of. I don't, however, know the product. But you could
>>> measure CPU/memory/IO load to get an idea how intensive the processing is.
>>>
>>>
>>>> And do you say, modifying sdu_size/tdu in client side or database is
>>>> not going to make any difference in this case?
>>>
>>> Not if the primary cause is the Informatica process being busy with
>>> data crunching.
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 24 2023 - 20:35:12 CET

Original text of this message