Re: Sun T2000

From: Alexander Fatkulin <afatkulin_at_gmail.com>
Date: Tue, 27 Mar 2007 15:14:11 +1100
Message-ID: <49d668000703262114r7945d22k1c509b052ef77c5@mail.gmail.com>

I've asked Riyaj Shamsudeen permission in order to forward this message to a list. He had some observations regarding the Sun E25K platform.

Original conversation (I've added benchmarks for Opteron and Itanium2)

Riyaj,

thanks for supplying this info.

About remote memory references - yes it's supposed to be slower in NUMA (but 7-8 times looks like a rather huge penalty for doing this?). But anyway - see my latest post where I simply referenced dual over and over and over. It's a read-only memory location so hardware should be able to cache it to avoid remote memory references?

About CPU per LIO.

Here is another little benchmark that compares doing an ~ 1 000 000 LIO in a NL join.

SQL> create table t (n, v) pctfree 90 pctused 0 as 2 select level, rpad('a', 16000, 'a') from dual connect by level <= 1000;

Table created.

No we can join this table to itself in a NL join doing approximately 1 000 000 LIO. Prolian results:

select count(*) from (
select /*+ use_nl(a b) */ *
from t a, t b
where a.n=b.n
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse        1      0.00       0.00          0        116          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      3.53       3.44          0    1007006          0           1


------- ------  -------- ---------- ---------- ---------- ----------  ----------

total        4      3.53       3.45          0    1007122          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 54

Rows Row Source Operation

-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1007006 pr=0 pw=0 time=3448139 us)

1000 NESTED LOOPS (cr=1007006 pr=0 pw=0 time=4338772 us) 1000 TABLE ACCESS FULL T (cr=1006 pr=0 pw=0 time=10076 us) 1000 TABLE ACCESS FULL T (cr=1006000 pr=0 pw=0 time=3433597 us)

T2000:

select count(*) from (
select /*+ use_nl(a b) */ *
from t a, t b
where a.n=b.n
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse        1      0.03       0.02          0        140          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     17.57      17.15          0    1007006          0           1


------- ------  -------- ---------- ---------- ---------- ----------  ----------

total        4     17.60      17.18          0    1007146          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 5

Rows Row Source Operation

-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1007006 pr=0 pw=0 time=17158504 us)

1000 NESTED LOOPS (cr=1007006 pr=0 pw=0 time=17266923 us) 1000 TABLE ACCESS FULL T (cr=1006 pr=0 pw=0 time=25090 us) 1000 TABLE ACCESS FULL T (cr=1006000 pr=0 pw=0 time=17126285 us)

Sun V40z (Opteron 852 - 2.6Ghz):

select count(*) from (
select /*+ use_nl(a b) */ *
from t a, t b
where a.n=b.n
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.17       2.38          0    1007006          0           1


------- ------  -------- ---------- ---------- ---------- ----------  ----------

total        4      2.17       2.38          0    1007008          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 59

Rows Row Source Operation

-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1007006 pr=0 pw=0 time=2381925 us)

1000 NESTED LOOPS (cr=1007006 pr=0 pw=0 time=2561554 us) 1000 TABLE ACCESS FULL T (cr=1006 pr=0 pw=0 time=6077 us) 1000 TABLE ACCESS FULL T (cr=1006000 pr=0 pw=0 time=2368750 us)

HP Integrity SuperDome (1.1GHz Itanium MX2 CPU, SX1000 chipset):

select count(*) from (
select /*+ use_nl(a b) */ *
from t a, t b
where a.n=b.n
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse        1      0.01       0.00          0          3          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      5.06       4.95          0    1006005          0           1


------- ------  -------- ---------- ---------- ---------- ----------  ----------

total        4      5.07       4.96          0    1006008          0           1

Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: 5

Rows Row Source Operation

-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1006005 pr=0 pw=0 time=4955091 us)

1000 NESTED LOOPS (cr=1006005 pr=0 pw=0 time=5611484 us) 1000 TABLE ACCESS FULL T (cr=1005 pr=0 pw=0 time=7049 us) 1000 TABLE ACCESS FULL T (cr=1005000 pr=0 pw=0 time=4946374 us)

Forwarded message ---------- From: Shamsudeen, Riyaj <riyaj.shamsudeen_at_cingular.com> Date: Mar 27, 2007 12:49 AM Subject: Your posting about CPU speed To: afatkulin_at_gmail.com

We had very similar issue in 2005. We moved from one of the Fujitsu server to E25K server and performance was horrible. No change in Application, DB or Disk arrays.

One key indicator that you could use is: CPU seconds consumed per million logical reads. [Steve Adams pointed me in correct direction ]

If you have old statspack and new statspack data with good application workload, you can easily calculate this metric. Basically, 'CPU used by this session' and logical reads[ Consistent gets + current gets ] from statspack repository will be used to calculate this metric (Sorry, I seem to have lost actual SQL).

After fighting for over 47 days with Sun, we gave up and went back to Fujitsu. In the attached graph, you can see a clear jump (on 5/9/2005) after migrating to E25K server. Specifically, copying in to the log buffer was slower in E25K server.

Something fishy about Sun servers, seemingly remote memory access is slower.

Another test, use psradmin and attach your dedicated server process to a specific CPU and repeat the test for as many CPUs. This might throw more light in to this situation.

If you are in solaris 10, you could also use dtrace to print statistics for time spent for remote memory access. Sun engineers might be able to help you better for this.

Sorry, I don't have posting privileges yet and can't post to the list.

Thanks

Riyaj Shamsudeen

-- 
Alexander Fatkulin
--
http://www.freelists.org/webpage/oracle-l

Received on Mon Mar 26 2007 - 23:14:11 CDT