Re: Hot index block split on a very busy table--what is the impact

From: Zhu,Chao <zhuchao_at_gmail.com>
Date: Wed, 4 Oct 2006 21:51:21 +0800
Message-ID: <962cf44b0610040651u1d683c48p4d9628d9fbbc36fe@mail.gmail.com>


Thanks very much to Tanel, Jonathan, and others who reviewed the issue.

To give an update on this issue: it turned out to be an Oracle bug in the implementation of _db_block_hash_latches, where internally Oracle sets the spin count to spin_count*cpu_count. There was also another problem with scanning the index structure, which I am still not clear about.
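
For reference, this is how the two values involved can be checked (just a sketch; it queries x$ tables, so it needs SYS, and the column layout varies by version):

  SELECT a.ksppinm AS parameter, c.ksppstvl AS val
    FROM x$ksppi a, x$ksppcv c
   WHERE a.indx = c.indx
     AND a.ksppinm IN ('_spin_count', 'cpu_count');

With the bug, a _spin_count of 2000 on a 16-CPU box means an effective spin of 2000*16 = 32000 iterations per latch miss, which would explain the CPU spikes we saw.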

We applied a patch and the issue is gone now.

On 9/2/06, Zhu,Chao <zhuchao_at_gmail.com> wrote:
>
>
>
> On 8/31/06, Tanel Poder <tanel.poder.003_at_mail.ee> wrote:
> >
> > You're on 9i or higher, right?
> >
> > One possibility is that this is caused by shared CBC latching
> > peculiarities:
> >
> > 1) during normal selects your index root block can be examined under a
> > shared cache buffers chains latch.
> > So as long as everybody is only reading the index root block, everybody
> > can do it concurrently (without pinning the block). The "current holder
> > count" in the CBC latch structure is just increased by one for every
> > read-only latch get and decreased by one on every release. A value of 0
> > means that nobody currently holds this latch.
> >
> > Nobody has to wait for others when reading the index root block in the
> > all-read-only case. That greatly helps to combat hot index root issues.
> >
> > 2) Now if a branch block split happens a level below the root block, the
> > root block has to be pinned in exclusive mode to reflect this change in
> > it. In order to pin a block you need to get the corresponding CBC latch in
> > exclusive mode.
> > If there are already a bunch of readers on the latch, then the exclusive
> > latch getter will just flip a bit in the CBC latch structure, registering
> > its interest in an exclusive get.
> >
> > Every read-only latch get will check for this bit; if it's set, the
> > getters will just spin instead, waiting for this bit to be cleared (they
> > may yield or sleep immediately as well, I haven't checked). Now the
> > exclusive getter has to spin/wait until all the shared getters have
> > released the latch and the "current holder count" drops to zero. Once it's
> > zero (and the getter manages to get onto the CPU) it can get the latch, do
> > its work and release the latch.
> >
> > During all that time, starting from when the "exclusive interest" bit was
> > set, nobody could access this index's root block except the processes
> > which already had the latch in shared mode. Depending on the latch
> > spin/sleep strategy for this particular case and the OSD implementation,
> > this could mean that all those "4000 readers per second" just start
> > spinning on that latch, causing a heavy spike in CPU usage, and they all
> > queue up.
> >
> >
> > How to diagnose that:
> >
> > You could sample v$latch_misses to see whether the "kcbgtcr:
> > kslbegin shared" nowait-fails/sleeps counters take an exceptional jump
> > once you observe this hiccup.
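> >
> > Something along these lines, run before and after the hiccup, should show
> > the jump (just a sketch; the location names differ somewhat between
> > versions):
> >
> >   SELECT parent_name, "WHERE" AS loc, nwfail_count, sleep_count
> >     FROM v$latch_misses
> >    WHERE parent_name = 'cache buffers chains'
> >      AND "WHERE" LIKE 'kcbgtcr%'
> >    ORDER BY sleep_count DESC;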
> >
> >
> > How to fix that once diagnosed:
> >
> > The usual stuff, like partitioning if possible or creating a single
> > table hash cluster instead.
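> >
> > As a sketch of the single table hash cluster route (table name and sizing
> > below are made up; tune HASHKEYS and SIZE to your own row counts and row
> > sizes):
> >
> >   CREATE CLUSTER orders_hc (order_id NUMBER)
> >     SIZE 120 SINGLE TABLE HASHKEYS 4000000;
> >
> >   CREATE TABLE orders (
> >     order_id   NUMBER PRIMARY KEY,
> >     status     VARCHAR2(10),
> >     created_dt DATE
> >   ) CLUSTER orders_hc (order_id);
> >
> > Equality lookups on order_id then hash straight to the data block, with
> > no index root block to traverse.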
> >
> > If you see that the problem comes from excessive spinning, think about
> > reducing the spinning overhead (by reducing _spin_count, for example).
> > This could affect your other database functions though...
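> >
> > Syntax-wise that is just the following (a sketch; change underscore
> > parameters only with Oracle Support's blessing, and SCOPE = BOTH assumes
> > you run an spfile):
> >
> >   ALTER SYSTEM SET "_spin_count" = 1000 SCOPE = BOTH;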
> >
> > If you can't do the above and you have an off-peak window, then analyse
> > the indexes (using treedump for a start) and, if you see a block split
> > coming in a branch below the root block, force the branch block to split
> > during off-peak time by inserting carefully picked values into the index
> > tree, values which fall exactly in the range that causes the proper block
> > to split. Then you can just roll back your transaction; the block splits
> > are not rolled back or coalesced, as they are done in a separate recursive
> > transaction.
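> >
> > A sketch of that approach (the object_id, table and key values below are
> > made up for illustration):
> >
> >   -- dump the index tree structure to a trace file
> >   ALTER SESSION SET EVENTS
> >     'immediate trace name treedump level 51234';  -- the index object_id
> >
> >   -- insert a carefully picked key to force the branch split, then undo
> >   -- the row; the split itself stays, as it runs in a recursive
> >   -- transaction
> >   INSERT INTO orders (order_id, status, created_dt)
> >   VALUES (1999999, 'PICKED', SYSDATE);
> >   ROLLBACK;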
> >
> > Tanel.
> >
> >
> > ------------------------------
> > From: oracle-l-bounce_at_freelists.org
> > [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Zhu,Chao
> > Sent: Wednesday, August 30, 2006 21:53
> > To: oracle-l_at_freelists.org
> > Subject: Hot index block split on a very busy table--what is the impact
> >
> > hi, all,
> > We have a few databases with very busy, highly concurrent tables,
> > where some indexes on those tables can be accessed 1.5K-4K times per
> > second (PK lookups).
> > I am wondering: when such an index goes through a block split (at the
> > root or branch level), what will the impact on the system be?
> > An index block split is said to be an expensive operation. During a
> > block split on a branch/root block, what will the other sessions that
> > are doing selects based on this index be waiting on? There is a wait
> > event named Index Block Split, with p1, p2, p3 pointing to the block
> > address and level.
> > The document "Description of Oracle 7 Wait Events and Enqueues" says
> > it will yield the CPU.
> > We have had a few production incidents where the load suddenly jumped
> > from 10 to several hundred, and during that time nothing could be done.
> > From our monitoring tools it is mainly "latch free" contention inside
> > Oracle, and the latch is cache buffers chains. The load spike matches
> > Oracle's description of a block split, but the wait event does not match.
> > And because the load spike is very short, we can't capture the actual
> > wait event/p1,p2,p3 during the exact time of the spike.
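> >
> > Something like the following, run in a tight loop from a script, is
> > what we would need to catch it (just a sketch):
> >
> >   SELECT sid, event, p1text, p1, p2text, p2, p3text, p3
> >     FROM v$session_wait
> >    WHERE event LIKE 'latch%'
> >      AND state = 'WAITING';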
> > Anyone have a similar problem or insight into this issue?
> > Is there some v$ view that tracks how many block splits each index
> > goes through (v$segstat does not log this)? Or any other effective
> > way?
> >
> >
> > --
> > Regards
> > Zhu Chao
> > www.cnoug.org
> >
>
>
>
> --
> Regards
> Zhu Chao
> www.cnoug.org
>

-- 
Regards
Zhu Chao
www.cnoug.org
