Re: switching to 'direct path read' should help prevent thrashing buffer cache ?

Home -> Community -> Usenet -> c.d.o.server -> Re: switching to 'direct path read' should help prevent thrashing buffer cache ?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 21 Dec 2006 23:06:08 -0000
Message-ID: <UZudnVqEnaX-jhbYRVnyrQA@bt.com>

<p.santos000_at_gmail.com> wrote in message news:1166725969.230627.255110_at_a3g2000cwd.googlegroups.com...
>
> Jonathan Lewis wrote:
>> <p.santos000_at_gmail.com> wrote in message
>> news:1166653299.575962.3830_at_48g2000cwx.googlegroups.com...
>> > Folks,
>> > Our environment is neither an OLTP nor a DSS system. We have
>> > transactions that cross boundaries.
>> >
>> > Particularly we have these one of jobs that many of our customers run
>> > at any time during the
>> > day in which they chose to clear a table column.
>> > Most clients clear anywhere from 1 million rows to 11 million rows
>> > per job.
>> >
>> > The pseudo-code and SQL looks like this:
>> >
>> > while SQL%ROWCOUNT < 500,000
>> > do
>> > UPDATE DEMO_<CLIENT_ID>
>> > set col1= null,
>> > col2= null,
>> > col3= null
>> > WHERE ( col1 is not null OR
>> > col2 is not null OR
>> > col3 is not null ) AND ROWNUM <= 500,000;
>> > commit;
>> > done.
>> >
>> > We use a ROWNUM limit in order to prevent row blocking for other
>> > processes that
>> > might be processing single row dml against the table ..
>> >
>> > We have increased the efficiency of these processes .. making IO
>> > faster and now customers
>> > are just doing it more often. ... this obviously thrashes my buffer
>> > cache.
>> >
>> > Nearly all updates spend most of their time waiting on 'db file
>> > scattered read'.
>> > We have db_file_multiblock_read_count = 128.
>> >
>> > Should also mention that this process connects via a shared/mts
>> > connection... although
>> > we can change that if needed.
>> >
>> > I'm thinking about having just this process run in parallel in order
>> > to
>> > bypass the buffer cache because, I don't believe this process benefits
>> > from caching and it causes blocks to age out faster for other clients
>> > that are doing other things .. and do benefit from caching.
>> >
>> > My thought is that if I switch this to a dedicated connection and I
>> > add a PARALLEL hint
>> > ( even if it's just 2 parallel servers per job), the job will
>> > complete faster, it will prevent my cache from being thrashed only at
>> > the cost of more pga memory , and a little bit more io.
>> >
>> > I'm looking for the cons in doing something like this?
>> >
>>
>>
>> How big is the whole table that's being processed ?
>> How tightly co-located is the data set to be updated ?
>>
>> If you can run into locking issues doing this in a single
>> update, what does it mean for the business logic that
>> one process can (apparently) update rows that another
>> process is apparently trying to put out of bounds ?
>>
>> Parallel probably won't help - large tablescans (and
>> I assume this table is larger than 2% of the size of the
>> db_cache - but you could try setting it to NOCACHE)
>> use only the tail-end of the buffer, so tend not to do much
>> damage to the cache. Moreover, a parallel update will
>> lock the table - and you are committing regularly to try
>> and avoid locking issues.
>>
>>
>> If the few million rows represent a small fraction of the
>> data, with a good clustering effect, then creating a function
>> based index to identify them, and updating through the index
>> might help.
>>
>> Otherwise, take out the commit - you are repeatedly re-reading
>> the blocks that you have just changed, introducing extra costs
>> of cleanout, and excessive scattered reads (possibly assisted by
>> a file-system or disk cache, but still wasting resources.)
>>
>>
>>
>>
>> --
>> Regards
>>
>> Jonathan Lewis
>> http://jonathanlewis.wordpress.com
>>
>> Author: Cost Based Oracle: Fundamentals
>> http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>>
>> The Co-operative Oracle Users' FAQ
>> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
>
> Jonathan,
> I'm not sure how parallel updating the table will not be faster.
> If I have to update 500K records on a table, I can do it much faster
> if
> I do it in parallel (degree 2). We don't have any parallel operations
> in
> our system at this time, but when a customer wants a "rush" job, this
> is
> how I do it for them.
>
> The particular customer that I'm investigating has a 10GB table, and
> about 150 columns. 50 of those columns are varchar2(1) columns that
> the customer "resets" or clears every week... don't ask .. I've tried
> to get
> them to increment the flag by 1 each week, but they don't want to ..
> they
> would rather reset the column to NULL.
>
> I'm afraid that NOCACHE would impair other processes that do benefit
> from caching. Also removing the commit between updates causes
> rows to be blocked for too long .. and this causes real problems in
> our system.
> Currently this customer clears 9 million records in about 8 hrs... so
> having
> rows locked that long is not possible.
>
> Does a parallel update cause more locking that the same update with
> parallelization? I don't think so.
>
> This is the big picture
>
> Some customers will clear or set to NULL a number of "flag" columns.
> These
> are all VARCHAR2(1). Then they will upload their customer lists and
> have
> our "load" process set a 1 or more columns to "1" .. to flag the user.
> This load process
> means looking up a single record etc .. and updating it by ROWID. A
> load job
> can have 5-10 million users.
>
> Then they will run about 50-200 queries during the rest of the week
> where they select users based on any WHERE clause combination ..
> including the
> "flag" that they just set...
>
> Finally, they clear/reset the columns to NULL and repeat the
> update/select/clear
> process on a weekly basis.
>
> It's a touch workload to manage, but I have no choice but to make
> this system
> work until 2008 .. when we redesign our application to no do these
> kinds of
> transactions all on the same system.
>
> -peter
>

You could try reading your question and my answer.

You suggested that going parallel would protect your buffer cache from being cached

I explained that your buffer cache was probably not being thrashed because of the way tablescans of large tables work. I didn't make any comment about the update being faster or slower.

You are now saying:

"Does a parallel update cause more locking that the same update with parallelization? I don't think so"

Despite the fact that I said in my answer:
>> Moreover, a parallel update will
>> lock the table - and you are committing regularly to try
>> and avoid locking issues.
>>

Can you explain why you think that locking the rows you are changing constitutes less locking than locking the whole table ?

And in passing, your original statement was that

"Particularly we have these one of jobs that many of our customers run "at any time during the day in which they chose to clear a table column.

But now it's

    "The particular customer that I'm investigating has a 10GB table, and
    "about 150 columns.  50 of those columns are varchar2(1) columns that
    "the customer "resets" or clears every week... don't ask .. I've tried

So is it "any time during the day" or is it "every week" ? It makes a difference
to the strategies you mght choose.

Finally, here's a section about the CACHE/NOCACHE option, taken from the 10.2 SQL reference manual:

NOCACHE For data that is not accessed frequently, this clause indicates that the
blocks retrieved for this table are placed at the least recently used end of the LRU list
in the buffer cache when a __FULL TABLE SCAN__ is performed. NOCACHE is the default for LOB storage.

(My capitals on FULL TABLE SCAN - cache/nocache does not affect other data access).

-- 
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Received on Thu Dec 21 2006 - 17:06:08 CST