Re: cache buffer chains/where in code
Date: Sat, 28 Nov 2009 09:05:46 -0800
Message-ID: <a9c093440911280905y2738581erfdab9f2967d396e3_at_mail.gmail.com>
Given that config, I'd say that system has over 4X the number of db connections it probably should have (and needs in order to work well) - I'd back it down to 64 as a starting point and make sure the connection pool does not grow. Set the initial and max connections to the same number.

One might think that you need more sessions to keep the CPUs busy (and you may need more than 1 per CPU thread), but the reality is this: with a high number of sessions, the queue is longer for everything. The chance of a session getting scheduled when it needs to be goes down, and if the load is fairly steady and medium to high, any "blip" will cause a massive queue for a resource. Consider what happens when calls are taking milliseconds and, for a split second, some session holds a shared resource - it may take the system tens of minutes to recover from that backlog. This is why most high throughput OLTP systems only want to run at a max of 65% (or so) CPU utilization with very short run queues - so that if there is any slowdown, there is enough resource headroom to recover. Otherwise the system will likely be in an unrecoverable flat spin at Mach 5.
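To put rough numbers on the headroom argument, here is a minimal sketch using a simple fluid approximation. It is an illustration only, not a model of Oracle latching. The assumptions: work arrives at a steady rate, the system stalls completely for a short time, and afterwards the backlog drains at the spare capacity (service rate minus arrival rate). The function name `recovery_time` and the sample utilization values are hypothetical, chosen only for this example.

```python
def recovery_time(stall_seconds: float, utilization: float) -> float:
    """Estimate the time needed to drain the backlog left by a full stall.

    Fluid approximation (an assumption, not a measurement):
      backlog after the stall = arrival_rate * stall_seconds
      drain rate afterwards   = service_rate - arrival_rate
    With utilization = arrival_rate / service_rate this reduces to
      stall_seconds * utilization / (1 - utilization)
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return stall_seconds * utilization / (1.0 - utilization)


# Compare how long a 1-second stall takes to clear at different load levels.
for rho in (0.50, 0.65, 0.90, 0.99):
    print(f"utilization {rho:.0%}: recovery ~ {recovery_time(1.0, rho):.1f} s")
```

Under these assumptions the recovery time grows without bound as utilization approaches 100%, which is the reason for leaving headroom. Real systems tend to behave worse than this estimate, since sessions spinning or sleeping on a contended resource consume CPU themselves.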
On Sat, Nov 28, 2009 at 12:13 AM, Christo Kutrovsky
<kutrovsky.oracle_at_gmail.com> wrote:
> Greg,
>
> It's a single UltraSparc T2 CPU, which is 8 cores with 8 threads per core.
> Note that each core has 2 integer pipelines. So you could assume 16 CPUs and
> 64 threads.
>
> There are many things that are wrong with this setup, and reducing the
> number of connections is something I am considering. However it's not that
> simple. Imagine that instead of CPU those were doing IO. You want to have a
> relatively deep IO queue to allow the raid array to deliver.
>
> One thing that puzzles me: given that the suspicion is that a deep CPU run
> queue is the problem, why is only one very specific latch causing the problem?
> There are several different types of queries running at the same time, so why
> is only one specific query causing latch contention and not the other ones?
--
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l

Received on Sat Nov 28 2009 - 11:05:46 CST