Re: row cache lock contention parallel insert

From: LS Cheng <exriscer_at_gmail.com>
Date: Wed, 23 Dec 2009 16:51:44 +0100
Message-ID: <6e9345580912230751o6e04cb5ck4e82cbaa93680a75_at_mail.gmail.com>



Hi!

The dynamic extent allocation (10 MB extent size) comes to around 400 extents per subpartition. I just noticed yesterday that the subpartition key is WRONG: there is only one value for the subpartition key, so all data goes to the same subpartition. It seems the developer didn't check the values and got this wrong. This will be fixed after the holidays. I think the waits you are referring to, where the parallel slaves pile up, show up as enq: HV - contention.
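
A quick way to confirm the skew is a dictionary check along these lines (owner and table name are placeholders, and num_rows is only as fresh as the last stats gathering):

  SELECT subpartition_name, num_rows
    FROM dba_tab_subpartitions
   WHERE table_owner = 'ETL_OWNER'   -- placeholder schema
     AND table_name  = 'FACT_TABLE'  -- placeholder table
   ORDER BY num_rows DESC NULLS LAST;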

In the 5-minute window there could be around 12,000 extent allocations (400 x 3 tables x 10 processes) spread across the 10 application processes (and hence their parallel slaves). Is that a lot? Maybe, no doubt, but should it cause hanging in a 6-node RAC? No, I don't think so; at most I would expect slow performance, not hanging.
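
To put numbers on your extent question, something like this run before and after a cycle would show how many extents each load added (schema and table names are placeholders):

  SELECT segment_name, COUNT(*) AS extent_cnt
    FROM dba_extents
   WHERE owner = 'ETL_OWNER'                 -- placeholder schema
     AND segment_name IN ('T1', 'T2', 'T3')  -- placeholder ETL tables
   GROUP BY segment_name;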

There are 64 slaves in total: parallel_max_servers is set to 64, and there are 32 cores.

The jobs are controlled by metadata status tables; if one cycle has not finished, the next does not kick in. For example, if process1 runs for more than 5 minutes, the following cycle does not start: it checks the status table, and if process1.status = 'RUNNING' it waits, along the lines of the sketch below.
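
Roughly this gating logic (status table and column names here are illustrative, not the real ones):

  DECLARE
    l_running NUMBER;
  BEGIN
    SELECT COUNT(*)
      INTO l_running
      FROM etl_job_status              -- illustrative status table
     WHERE job_name = 'PROCESS1'
       AND status   = 'RUNNING';

    IF l_running > 0 THEN
      NULL;  -- previous cycle still running: skip this cycle
    END IF;
  END;
  /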

The ETL process goes like this:

  1. Stage the text files
  2. Read those text files through an external table and load them into a
     staging subpartitioned table (a TEMP table)
  3. Read the staging table and load two more subpartitioned tables
     concurrently (sketched below)
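
The loads in steps 2 and 3 are insert append with parallel DML, roughly of this shape (the object names and the DOP of 8 are illustrative):

  ALTER SESSION ENABLE PARALLEL DML;

  INSERT /*+ APPEND PARALLEL(t, 8) */ INTO etl_target t  -- illustrative target
  SELECT /*+ PARALLEL(s, 8) */ *
    FROM etl_staging s;                                   -- illustrative source

  COMMIT;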

Thanks!

--
LSC



On Wed, Dec 23, 2009 at 1:10 PM, Mark W. Farnham <mwf_at_rsiz.com> wrote:


> So how much dynamic extension is going on? Is this the bug where when one
> thread needs to allocate an extent all the parallel inserters to the same
> object pile up and also add an extent?
>
>
>
> So, if you count the number of extents on all the objects getting inserted
> into at 00:04:59 of your 5 minute cycle (presuming the previous cycle
> finished without exhausting the time slice), and you do it again when the
> work for the 5 minute period is over, are we routinely talking about a lot
> of extents being added or not?
>
>
>
> When you say 10 application processes but with 64 slaves, is that 64 total
> or 640 total? What is your runqueue or the equivalent for whatever OS you’re
> on?
>
>
>
> End of diagnostic data questions.
>
>
>
> Random suggestion leading question: What is driving the five minute window?
> Is this controlled such that a set of processes does not start on the five
> minute boundary if the previous set has not finished? What sort of input is
> being consumed by the insert jobs? Are the three tables the input or the
> destination? If input, are they new sets of input or rows added to
> previously processed data? Is there a way to stagger the starts
> productively?
>
>
>
> mwf
>
>
> ------------------------------
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *LS Cheng
> *Sent:* Wednesday, December 23, 2009 3:46 AM
> *To:* Greg Rahn
>
> *Cc:* info_at_sqltools-plusplus.org; oracle-l_at_freelists.org
> *Subject:* Re: row cache lock contention parallel insert
>
>
>
> Hi
>
>
> Thanks for the info
>
> The workload is more or less this: every 5 minutes, 10 data load processes
> kick in; 3 of them need to load around 12 million rows each (going through
> an ETL of 3 tables per process), the rest from 300,000 to 3 million rows.
>
> All are insert append with parallel DML and PQ for the SELECT
> (insert...select). Basically there are only 10 application processes, but
> with 64 slaves running almost all the time. All ETL tables have more or
> less 4,000 to 5,000 subpartitions.
>
> I also noticed that compatible is set to 10.2.0.1 instead of 10.2.0.4;
> might this be hitting some old bug? Probably, because when I was testing
> on a test database (also 10.2.0.4, with compatible set to 10.2.0.3) I
> managed to reproduce massive row cache contention; when I changed
> compatible to 10.2.0.4, row cache requests dropped by a factor of 10. I
> will change compatible to 10.2.0.4 after the holidays, because no more
> system changes are allowed until then.
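>
> Raising it is a one-way change and, on RAC, needs a restart of all
> instances afterwards; roughly:
>
>   ALTER SYSTEM SET compatible = '10.2.0.4' SCOPE = SPFILE;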
>
> Thanks!
>
> --
> LSC
>
> On Tue, Dec 22, 2009 at 4:30 PM, Greg Rahn <greg_at_structureddata.org>
> wrote:
>
> That query is used for the row cache read callback for SEG$ when the
> segment entry is not available in the row cache. Once the row cache
> entry is available, all the reads should go through row cache rather
> than from disk. It seems that something may be wrong with the row
> cache layer.
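>
> The dc_segments entries in v$rowcache cover SEG$, so a quick way to watch
> this layer is to compare gets against getmisses there, e.g.:
>
>   SELECT parameter, gets, getmisses
>     FROM v$rowcache
>    WHERE parameter = 'dc_segments';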
>
> Can you describe the workload?
> What are the other top SQLs?
>
>
> On Mon, Dec 21, 2009 at 3:47 PM, LS Cheng <exriscer_at_gmail.com> wrote:
> > Yes, there are many executions of these two queries:
> >
> > 2ym6hhaq30r73 - around 11 million executions per hour
> > SELECT type#, blocks, extents, minexts, maxexts, extsize, extpct, user#,
> >        iniexts, NVL (lists, 65535), NVL (GROUPS, 65535), cachehint, hwmincr,
> >        NVL (spare1, 0), NVL (scanhint, 0)
> >   FROM seg$
> >  WHERE ts# = :1 AND file# = :2 AND block# = :3
>
> --
>
> Regards,
> Greg Rahn
> http://structureddata.org
>
>
>