RE: High consistent Gets and response time in prod

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sat, 1 May 2021 09:16:54 -0400
Message-ID: <35a901d73e8c$41a8e410$c4faac30$_at_rsiz.com>



Improper cleanout is not a necessary ingredient, but it can make “empty front” more expensive. In your case it seems you read enough blocks to explain the elapsed time.  

Oracle does not order either freelists or freespace bitmaps so that the lowest blocks are used first (even on conventional inserts), and append inserts go above the high water mark regardless of available free blocks. Filtering on rownum for 1 row and getting that huge number of reads is sufficient evidence for me that this is “empty front.” (Running snapstat or Tanel’s much more complete snapper is never a bad idea if you want to prove in detail what’s going on. I just don’t see a need for it this time.)  

This has been a problem since at least Oracle 4, but until the advent of “append” it was not commonly a noticeable one. (When combined with pending cleanout and reads that go only into the PGA it can be hilariously expensive.) Jerry told me they never saw it in 2 or 3, but that may have been because a full export and full reload was standard operating procedure in 2 and 3: every 4 hours for 2, daily for 3.  

Combined with unindexed checks for row existence to gate whether or not to run some transactional process, empty front can be a real drag.

Since Oracle does not have a way to lower the high water mark to the lowest block having any contents, fixing this probably means reloading. (Adding that to the code without creating a bunch of bugs is probably more difficult than it sounds, and the existence of the “move” operation militates against implementing this probably much cheaper operation.)  
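For reference, the “move” operation mentioned above amounts to rebuilding the segment, which resets the high water mark as a side effect. A hedged sketch (the table and index names below are illustrative, not from this thread):

```sql
-- A MOVE rebuilds the segment into fresh extents, so the high water mark
-- ends up just above the live data:
alter table c.pd move;

-- MOVE changes rowids, so every index on the table goes UNUSABLE and
-- must be rebuilt afterwards; find them with:
select index_name
from   dba_indexes
where  table_owner = 'C' and table_name = 'PD' and status = 'UNUSABLE';

alter index c.pd_pk rebuild;   -- repeat per index (pd_pk is hypothetical)
```

On Enterprise Edition, `alter table ... move online` avoids blocking DML during the rebuild, at the cost of extra overhead.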

If you have partitioning, avoiding this problem for rows whose born-on dates correlate with their discard dates is simple: partition by date and drop old partitions as they become empty.  
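A hedged sketch of that partition-by-date approach (table and column names are illustrative):

```sql
-- Interval partitioning by a "born on" date: Oracle creates monthly
-- partitions automatically as data arrives.
create table pd_part (
    created_date  date not null
    -- ... remaining columns ...
)
partition by range (created_date)
interval (numtoyminterval(1, 'MONTH'))
( partition p0 values less than (date '2021-01-01') );

-- Dropping an aged-out partition releases its space immediately,
-- leaving no empty blocks below any high water mark to scan past:
alter table pd_part drop partition p0;
```

The key point is that discard becomes a metadata operation (drop partition) instead of a delete, so the empty-front pattern never develops.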

IF this was the result of a long overdue cleanup (Oracle E-Biz Workflow tables come to mind), and you do have non-direct inserts routinely, a one-time move or reload is probably all you need. IF you routinely delete obsolete rows and only append new rows, then periodically (once daily, weekly, monthly?) tracking the number of consistent gets needed to find the first row can tell you when it is worthwhile to re-org.  
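One way to sketch that periodic check (a hedged example; SQL*Plus `SET AUTOTRACE TRACEONLY STATISTICS` does the same thing more conveniently):

```sql
-- Note the session's consistent gets before the probe ...
select value
from   v$mystat join v$statname using (statistic#)
where  name = 'consistent gets';

-- ... run the one-row probe against the suspect table ...
select * from c.pd where rownum <= 1;

-- ... and read the counter again; the delta is the cost of finding
-- the first row.
select value
from   v$mystat join v$statname using (statistic#)
where  name = 'consistent gets';
```

When the delta trends from dozens toward thousands or millions of gets, the table is accumulating empty front and a re-org is starting to pay for itself.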

The thinking on that is roughly: if the sum of time spent reading past empty-front blocks in your application exceeds the time to rebuild plus the hassle of operationally scheduling the rebuild, then do it. That hassle varies wildly, from the easy pole (shops that are strictly online 9 AM to 5 PM in a single time zone, where the biggest table can be rebuilt overnight) to extremely hard (globally available 24x7x365 operations with giant monoliths).  

The special case of empty front is one of the culprits in the propagation of the freespace fragmentation myth and the old-school “rebuild everything” notions. Unless an unmonitored rebuild-everything routine is definitively the cheapest thing for you to do, don’t go that way.  

Most often it is highly selective rebuilds, done for a reason, that are useful. This sure looks like one.  

I presume you loaded a fresh empty table from prod for test, right?  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis
Sent: Saturday, May 01, 2021 4:05 AM
To: ORACLE-L
Subject: Re: High consistent Gets and response time in prod    

Various scenarios could produce this.

As others have said, you need more information. Sayan suggested Tanel's "snapper", which is a tool that everyone should know about, but if you haven't used it before:  

Start a new session

run the query  

select
        name, value
from
        v$mystat
join    v$statname
using
        (statistic#)
where
        value != 0
/  

That will tell us what sort of work the session has done.

There are two obvious possibilities:

  • it's doing a tablescan of a very large table which has had almost all the data deleted but the blocks haven't been cleaned out properly.
  • it's doing a huge amount of work to get read-consistent versions of blocks.

In the first case we will see lots of multiblock reads, in the second we'll see lots of "xxxx - undo records applied".  

You mentioned the 30 hour parallel query - are these actually related to DML? Have they got uncommitted transactions sitting behind them?

Check v$lock for TM locks on the table.

Check v$transaction for transactions that have done a lot of work (used_urec) and started a long time in the past.  
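A hedged sketch of those two checks (the object name is illustrative):

```sql
-- TM (table-level) locks currently held on the table:
select l.sid, l.type, l.lmode, l.request
from   v$lock l
join   dba_objects o on o.object_id = l.id1
where  l.type = 'TM'
and    o.object_name = 'PD';

-- Transactions that have generated a lot of undo and started long ago:
select xidusn, start_time, used_urec, used_ublk
from   v$transaction
order by used_urec desc;
```

A large `used_urec` on an old transaction is exactly what would force the read-consistency work described above.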

You could also query v$session_event

select event, total_waits, time_waited
from v$session_event
where sid = sys_context('userenv','sid')
order by
        time_waited
/  

And, of course, you could enable extended tracing (event 10046 at level 8) before running the query to get a detailed sequence of activity for the session.    
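One common way to do that for the current session (a sketch; the trace file appears in the session's diagnostic trace directory):

```sql
-- Enable 10046 tracing with wait events (level 8) ...
alter session set events '10046 trace name context forever, level 8';

-- ... run the problem statement ...
select * from c.pd where rownum <= 1;

-- ... and switch tracing off again.
alter session set events '10046 trace name context off';
```

The resulting trace shows each wait (e.g. multiblock reads walking the empty front) in order, and can be summarized with tkprof.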

Regards

Jonathan Lewis    

P.S. The fact that the table is 11.2M blocks (when it looks like it should be less) is interesting - and the 11.5M physical reads suggests a very long (empty) tablescan to find the first row with a few hundred thousand reads of undo to check commit times ("transaction table consistent reads - undo records applied"). There's a hint of big processes doing "delete everything, insert it all again" and not cleaning up the space management bitmaps properly. (Oracle has a history of this, but I don't know if it can still happen in 19c.)        

On Fri, 30 Apr 2021 at 23:39, Ram Raman <veeeraman_at_gmail.com> wrote:

Hi,  

Fri evening and I have a problem. grrr..  

We have a simple SELECT statement like this:  

select * from c.pd where rownum <=1    


--------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   993 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY     |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| PD   |     1 |   993 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)    

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
   11290619  consistent gets
   11289575  physical reads
       8024  redo size
       8510  bytes sent via SQL*Net to client
        408  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed    

The first thing that jumps out is the number of consistent gets, which is in the same range in prod again and again. In test, the consistent gets are also in the same range for repeated executions - just a couple of dozen.    

TABLE_NAME                 NUM_ROWS  LAST_ANAL      BLOCKS  AVG_ROW_LEN  PCT_FREE  PCT_USED  EMPTY_BLOCKS  AVG_SPACE
------------------------  ---------  ---------  ----------  -----------  --------  --------  ------------  ---------
PD                        1,420,080  29-APR-21  11,537,288          993        10                       0          0

 CHAIN_CNT  PAR  MON  COMPRESS
----------  ---  ---  --------
         0  NO   YES  DISABLED
 

What is confusing is the number of blocks the table occupies in prod. I did some calculations and feel it should take around 172,000+ blocks, given our block size is 8K, but I am unable to explain 11.5M blocks.    
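As a back-of-envelope check of that expectation, using the dba_tables numbers above (1,420,080 rows at 993 bytes average row length, 8K blocks, pct_free 10):

```sql
-- 1,420,080 * 993 = 1,410,139,440 bytes of row data.
select ceil(1420080 * 993 / 8192)         as blocks_no_pctfree,   -- ~172,137
       ceil(1420080 * 993 / (8192 * 0.9)) as blocks_with_pctfree  -- ~191,263
from   dual;
```

Either way the table's data should fit in roughly 0.2M blocks, nowhere near the 11.5M reported, which is consistent with a huge empty front below the high water mark.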

--
http://www.freelists.org/webpage/oracle-l
Received on Sat May 01 2021 - 15:16:54 CEST
