Home » RDBMS Server » Performance Tuning » Help proving a database is slow (10.2 / Solaris)
Help proving a database is slow [message #458786] |
Tue, 01 June 2010 19:13 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I'm experiencing some very strange behaviour in my database, and it's happened at the worst time possible: during deployment.
Prior to now, it has been working admirably, but something has changed and noone is admitting to having changed anything.
I have a very simple test case that demonstrates the problem, but I do not have a benchmark to compare against. If someone asks "why do you think it should run faster?" I don't have a good answer. I want some proof that lower-spec boxes around the world are doing this orders of magnitude faster than our box.
What I would like is for some of you guys to run a query on your very large tables (>10M rows) and tell me the performance.
Here's the query:
select *
from (
select /*+first_rows*/ *
from <really big table> sample (1) a
join <same really big table> b
on a.primary_key = b.primary_key
and rownum <= 10000
)
where rownum > 1;
Please time it the FIRST TIME, because I want to rule out caching benefits.
The objective here is to pick up 10000 from all over the table using the SAMPLE mechanism, and then lookup those rows in the same table by using the Primary Key index. This should be especially fast because the table rows that are accessed via the index are already in the buffer cache from the SAMPLE access.
The ROWNUM > 1 will ensure that rows are not returned across the network, thereby isolating the test to the DB server alone.
I am getting consistent results across my database: with tables having row widths of 200-300 bytes, this query takes around 130s. This equates to <100 rows/sec. Prior to this problem we were getting Nested Loops joins of 6-10 tables returning rows at around 500-600 rows/sec. Now here's something really weird, a job that was using a Nested Loops plan was running at 50 rows/sec for about 2 hours yesterday, when it suddenly accellerated to 500+ rows/sec for the remainder of the job (another 30 mins or so). That improvement was temporary, and now the DB is back to its old tricks.
Here's what we have already checked:
- Contention: This is the only query running on the database. I have checked this using OEM looking at active sessions, and by performing a "ps -ef" on Unix.
- Buffer Cache too small: Currently 928MB, but we are going to increase it this morning - just need to schedule an outage.
- Shared Pool too small: Currently 240MB, but will increase.
I don't think these values are small enough to warrant the bad performance, and the DBA has not changed them recently, and most importantly, they did not change in the middle of the job that suddenly accellerated.
- Other databases on the same box: There is 1 other database. It is idle.
- Non-database activity on the box: "ps -ef" and "top" shows no other activity. We have also bounced the box.
- Heavy disk usage: "sar -d 5 1" does show some activity on the disks where our data is stored that I cannot explain, but it is very modest (<1%) compared to when a query is running. The avwait statistic on sar is showing 0.0 both when the box is idle and when my query is running.
The disks that this database resides on are physical devices that may be shared across other computers. I cannot see activity generated by the other computers, although Full Table Scan performance is great. It is only indexed lookups that are bad.
If anyone can help by running the above query on their large table, I would greatly appreciate knowing:
- The time taken
- The AVG_ROW_LEN of the table (from USER_TABLES)
- # rows in the table
- Any specs of your environment (buffer cache, shared pool, number of CPUs, RAM.
Ross Leishman
|
|
|
|
|
Re: Help proving a database is slow [message #458854 is a reply to message #458798] |
Wed, 02 June 2010 03:38 |
John Watson
Messages: 8962 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Here you go, hope it is useful. btw, low spec demo system - nothing else happening.
1 select *
2 from (
3 select /*+first_rows*/ *
4 from sc_cp_local sample (1) a
5 join sc_cp_local b
6 on a.prk = b.prk and a.prl=b.prl
7 and rownum <= 10000
8 )
9* where rownum > 1
EVD> /
no rows selected
Elapsed: 00:01:04.21
Execution Plan
----------------------------------------------------------
Plan hash value: 882620215
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 38M| | 419K (2)| 01:23:54 |
| 1 | COUNT | | | | | | |
|* 2 | FILTER | | | | | | |
| 3 | VIEW | | 10000 | 38M| | 419K (2)| 01:23:54 |
|* 4 | COUNT STOPKEY | | | | | | |
|* 5 | HASH JOIN | | 596K| 55M| 35M| 419K (2)| 01:23:54 |
| 6 | TABLE ACCESS SAMPLE| SC_CP_LOCAL | 591K| 28M| | 125K (2)| 00:25:04 |
| 7 | TABLE ACCESS FULL | SC_CP_LOCAL | 59M| 2652M| | 125K (2)| 00:25:09 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM>1)
4 - filter(ROWNUM<=10000)
5 - access("A"."PRK"="B"."PRK" AND "A"."PRL"="B"."PRL")
Statistics
----------------------------------------------------------
404 recursive calls
0 db block gets
456903 consistent gets
464165 physical reads
0 redo size
550 bytes sent via SQL*Net to client
405 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed
EVD> select avg_row_len from user_tables where table_name='SC_CP_LOCAL';
AVG_ROW_LEN
-----------
51
EVD> select count(*) from SC_CP_LOCAL;
COUNT(*)
----------
58633079
Elapsed: 00:01:21.88
EVD> sho sga
Total System Global Area 1071333376 bytes
Fixed Size 1318172 bytes
Variable Size 444596964 bytes
Database Buffers 612368384 bytes
Redo Buffers 13049856 bytes
EVD> select stat_name,value from v$osstat;
STAT_NAME VALUE
---------------------------------------------------------------- ----------
NUM_CPUS 2
IDLE_TIME 17787711
BUSY_TIME 386805
USER_TIME 249457
SYS_TIME 136035
IOWAIT_TIME 462129
NICE_TIME 551
RSRC_MGR_CPU_WAIT_TIME 0
LOAD .559570313
PHYSICAL_MEMORY_BYTES 2121875456
TCP_SEND_SIZE_MIN 4096
STAT_NAME VALUE
---------------------------------------------------------------- ----------
TCP_SEND_SIZE_DEFAULT 16384
TCP_SEND_SIZE_MAX 131072
TCP_RECEIVE_SIZE_MIN 4096
TCP_RECEIVE_SIZE_DEFAULT 87380
TCP_RECEIVE_SIZE_MAX 174760
GLOBAL_SEND_SIZE_MAX 131071
GLOBAL_RECEIVE_SIZE_MAX 131071
18 rows selected.
Elapsed: 00:00:00.23
EVD>
|
|
|
Re: Help proving a database is slow [message #458867 is a reply to message #458854] |
Wed, 02 June 2010 04:57 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Huh! For some reason it ignored the FIRST_ROWS hint and did a HASH JOIN.
You definitely have a UNIQUE index on (PRK, PRL), right? Don't suppose you could run it again with /*+ ORDERED USE_NL(b) INDEX(b)*/? But man, full scan of 59M rows in a minute on your "low spec" system. The tool who put together our 8CPU SPARC T2000 with 32GB RAM should be ashamed, cos we've got some real problems. Our full scans don't perform ANYTHING like that, so I'm thinking the problem is not local to NESTED LOOPS joins.
Ross Leishman
|
|
|
|
Re: Help proving a database is slow [message #458976 is a reply to message #458786] |
Wed, 02 June 2010 15:13 |
moshea
Messages: 51 Registered: February 2008 Location: Dublin, Ireland
|
Member |
|
|
40+ seconds on a table with 9.3M fat rows
12 seconds on a table with 125M+ short rows.
Hmm, quite the difference.
Its our test box, but its actually the same class of HP hardware we use in production.
-Michael
20:31:21 SQL> set lin 130
20:31:34 SQL> select avg_row_len from all_tables
20:31:42 2 where table_name = 'TRANSMAIN';
AVG_ROW_LEN
-----------
590
20:31:44 SQL> set autotrace traceonly
20:32:21 SQL>
20:32:21 SQL> set timing on
20:32:29 SQL>
20:32:30 SQL> select *
20:32:30 2 from (
20:32:30 3 select /*+first_rows*/ *
20:32:30 4 from TRANSMAIN sample (1) a
20:32:30 5 join TRANSMAIN b
20:32:30 6 on a.transik = b.transik
20:32:30 7 and rownum <= 10000
20:32:30 8 )
20:32:30 9 where rownum > 1;
no rows selected
Elapsed: 00:00:41.61
Execution Plan
----------------------------------------------------------
Plan hash value: 4091699509
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 75M| 357K (1)| 01:11:30 |
| 1 | COUNT | | | | | |
|* 2 | FILTER | | | | | |
| 3 | VIEW | | 10000 | 75M| 357K (1)| 01:11:30 |
|* 4 | COUNT STOPKEY | | | | | |
| 5 | NESTED LOOPS | | 93094 | 104M| 357K (1)| 01:11:30 |
| 6 | TABLE ACCESS SAMPLE | TRANSMAIN | 93094 | 52M| 171K (1)| 00:34:14 |
| 7 | TABLE ACCESS BY INDEX ROWID| TRANSMAIN | 1 | 590 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | P_TRANSMAIN | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM>1)
4 - filter(ROWNUM<=10000)
8 - access("A"."TRANSIK"="B"."TRANSIK")
Statistics
----------------------------------------------------------
77 recursive calls
0 db block gets
43629 consistent gets
64260 physical reads
0 redo size
20774 bytes sent via SQL*Net to client
227 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
0 rows processed
20:33:13 SQL>
20:34:15 SQL>
20:38:16 SQL>
20:39:22 SQL> set autotrace off
20:39:35 SQL> select stat_name, value from v$osstat;
STAT_NAME VALUE
---------------------------------------------------------------- ----------
NUM_CPUS 10
IDLE_TIME 310772517
BUSY_TIME 15574453
USER_TIME 11029470
SYS_TIME 4544983
IOWAIT_TIME 13043895
AVG_IDLE_TIME 31067476
AVG_BUSY_TIME 1547722
AVG_USER_TIME 1093178
AVG_SYS_TIME 444717
AVG_IOWAIT_TIME 1294669
OS_CPU_WAIT_TIME 5.2162E+12
RSRC_MGR_CPU_WAIT_TIME 0
LOAD .0078125
NUM_CPU_SOCKETS 10
PHYSICAL_MEMORY_BYTES 5.0413E+10
VM_IN_BYTES 1393090560
VM_OUT_BYTES 1601556480
18 rows selected.
Elapsed: 00:00:00.01
20:39:36 SQL> sho sga
Total System Global Area 3992977408 bytes
Fixed Size 2159400 bytes
Variable Size 3929836760 bytes
Database Buffers 50331648 bytes
Redo Buffers 10649600 bytes
20:42:27 SQL>
20:42:30 SQL> select count(*) from transmain;
COUNT(*)
----------
9331892
Elapsed: 00:00:06.11
20:49:40 SQL>
20:53:54 SQL>
20:53:54 SQL> set autotrace traceonly
20:54:06 SQL>
20:54:07 SQL> set timing on
20:54:09 SQL>
20:54:10 SQL> select *
20:54:11 2 from (
20:54:11 3 select /*+first_rows*/ *
20:54:11 4 from calcfiguresub sample (1) a
20:54:11 5 join calcfiguresub b
20:54:11 6 on a.calcfigik = b.calcfigik and a.calcfigsubdatik = b.calcfigsubdatik
20:54:11 7 and rownum <= 10000
20:54:11 8 )
20:54:11 9 where rownum > 1;
no rows selected
Elapsed: 00:00:12.48
Execution Plan
----------------------------------------------------------
Plan hash value: 143895197
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 761K| 3820K (1)| 12:44:02 |
| 1 | COUNT | | | | | |
|* 2 | FILTER | | | | | |
| 3 | VIEW | | 10000 | 761K| 3820K (1)| 12:44:02 |
|* 4 | COUNT STOPKEY | | | | | |
| 5 | NESTED LOOPS | | 1239K| 28M| 3820K (1)| 12:44:02 |
| 6 | TABLE ACCESS SAMPLE | CALCFIGURESUB | 1239K| 14M| 98853 (2)| 00:19:47 |
| 7 | TABLE ACCESS BY INDEX ROWID| CALCFIGURESUB | 1 | 12 | 3 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | P_CALCFIGURESUB | 1 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM>1)
4 - filter(ROWNUM<=10000)
8 - access("A"."CALCFIGIK"="B"."CALCFIGIK" AND
"A"."CALCFIGSUBDATIK"="B"."CALCFIGSUBDATIK")
Statistics
----------------------------------------------------------
290 recursive calls
0 db block gets
43546 consistent gets
6703 physical reads
0 redo size
394 bytes sent via SQL*Net to client
227 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
0 rows processed
20:54:23 SQL> set autotrace off
20:56:13 SQL> select count(*) from calcfiguresub;
COUNT(*)
----------
125261366
Elapsed: 00:02:44.41
20:59:11 SQL> select avg_row_len from all_tables
20:59:19 2 where table_name = 'CALCFIGURESUB';
AVG_ROW_LEN
-----------
12
Elapsed: 00:00:00.06
20:59:27 SQL>
|
|
|
Re: Help proving a database is slow [message #458997 is a reply to message #458976] |
Wed, 02 June 2010 19:38 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Hey, thanks for the support you guys. I'm overwhelmed. This is teriffic stuff.
For those interested, one of our Unix guys (as opposed to "their" Unix guys, because the hardware and DBA support is outsourced) has been doing some raw tests with Unix dd to benchmark the disks, and they are writing and reading 10 times slower than our production environment.
Because these are NFS disks on an SVT environment, we suspect that we are sharing a network device with another application in SVT, which it flogging the network that connects the disks to our Oracle server. Just a theory.... very difficult to prove with our levels of access, but your benchmarks make me much more confident of our position.
Ross Leishman
|
|
|
Re: Help proving a database is slow [message #459005 is a reply to message #458786] |
Wed, 02 June 2010 22:40 |
trantuananh24hg
Messages: 744 Registered: January 2007 Location: Ha Noi, Viet Nam
|
Senior Member |
|
|
OKie, I'll post the test sql as you suggest.
logvnp@VNP> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for Solaris: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
5 rows selected.
logvnp@VNP> select count(1) from big_table;
COUNT(1)
----------
1000000
logvnp@VNP> select sid from v$session
2 where username='LOGVNP'
3 and program like 'sq%';
sid
----------
383
logvnp@VNP> set autotrace traceonly explain
logvnp@VNP> select *
2 from (
3 select /*+first_rows*/*
4 from big_table sample(1) a
5 join big_table b
6 on a.id=b.id
7 and a.object_name=b.object_name
8 and rownum<=10000)
9 where rownum<1
10 /
Execution Plan
----------------------------------------------------------
Plan hash value: 1902106588
--------------------------------------------------------------------------------
-----
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
--------------------------------------------------------------------------------
-----
| 0 | SELECT STATEMENT | | 1 | 282 | 3210 (2)| 00:00
:39 |
|* 1 | COUNT STOPKEY | | | | |
|
| 2 | VIEW | | 10000 | 2753K| 3210 (2)| 00:00
:39 |
|* 3 | COUNT STOPKEY | | | | |
|
|* 4 | HASH JOIN | | 10000 | 2841K| 3210 (2)| 00:00
:39 |
| 5 | TABLE ACCESS SAMPLE| BIG_TABLE | 10000 | 947K| 3207 (2)| 00:00
:39 |
| 6 | TABLE ACCESS FULL | BIG_TABLE | 200 | 19400 | 2 (0)| 00:00
:01 |
--------------------------------------------------------------------------------
-----
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<1)
3 - filter(ROWNUM<=10000)
4 - access("A"."ID"="B"."ID" AND "A"."OBJECT_NAME"="B"."OBJECT_NAME")
logvnp@VNP> set autotrace off
logvnp@VNP> select stat_name, value
2 from v$osstat;
STAT_NAME VALUE
---------------------------------------------------------------- ----------
NUM_CPUS 4
IDLE_TIME 50166369
BUSY_TIME 8755311
USER_TIME 7659225
SYS_TIME 1096086
IOWAIT_TIME 0
AVG_IDLE_TIME 12537823
AVG_BUSY_TIME 2185126
AVG_USER_TIME 1911057
AVG_SYS_TIME 270313
AVG_IOWAIT_TIME 0
OS_CPU_WAIT_TIME 541400
RSRC_MGR_CPU_WAIT_TIME 0
LOAD .87109375
PHYSICAL_MEMORY_BYTES 1.7093E+10
VM_IN_BYTES 3877429248
VM_OUT_BYTES 2345951232
17 rows selected.
However, why did you think that join a big table by itself will help you apologize the different between many system?
Let me take an another statement, and measure it.
logvnp@VNP> set feedback off
logvnp@VNP> select * from (
2 select * from
3 big_table a
4 join big_table b
5 on a.id=b.id
6 and a.object_name=b.object_name)
7 /
logvnp@VNP> @snap ash 5 1 383
Sampling with interval 5 seconds, 1 times...
-- Session Snapper
-----------------------------------------------------------------------
Active% | SQL_ID | EVENT | WAIT_CLASS
-----------------------------------------------------------------------
76% | 62jynh05qw7zw | direct path write temp | User I/O
24% | 62jynh05qw7zw | ON CPU | ON CPU
-- End of ASH snap 1, end=2010-06-03 10:42:00, seconds=5, samples_taken=41
PL/SQL procedure successfully completed.
logvnp@VNP> col sql_text format a30
logvnp@VNP> select sql_text, cpu_time/1000000 cpuon, parse_calls
2 from v$sql
3 where sql_id='62jynh05qw7zw';
SQL_TEXT CPUON PARSE_CALLS
------------------------------ ---------- -----------
select * from ( select * from 4.106833 1
big_table a join big_table b
on a.id=b.id and a.object_name
=b.object_name)
logvnp@VNP> set autotrace traceonly explain
logvnp@VNP> select * from (
2 select * from
3 big_table a
4 join big_table b
5 on a.id=b.id
6 and a.object_name=b.object_name)
7 /
Execution Plan
----------------------------------------------------------
Plan hash value: 1690342657
--------------------------------------------------------------------------------
--------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Ti
me |
--------------------------------------------------------------------------------
--------
| 0 | SELECT STATEMENT | | 1000K| 185M| | 16904 (2)| 00
:03:23 |
|* 1 | HASH JOIN | | 1000K| 185M| 103M| 16904 (2)| 00
:03:23 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 1000K| 92M| | 3258 (3)| 00
:00:40 |
| 3 | TABLE ACCESS FULL| BIG_TABLE | 1000K| 92M| | 3258 (3)| 00
:00:40 |
--------------------------------------------------------------------------------
--------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."ID"="B"."ID" AND "A"."OBJECT_NAME"="B"."OBJECT_NAME")
The only thing made differently was IO, I think, don't I?
|
|
|
Re: Help proving a database is slow [message #459007 is a reply to message #459005] |
Wed, 02 June 2010 23:00 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Thanks trantuananh24hg. 1M rows is not really a big enough table for my preferences in this test. I also think maybe you are missing a PK index, because your SQL performed a HASH JOIN, or maybe your database just ignored the FIRST_ROWS hint like John Watson's.
The reason I constructed the query the way I did is because I wanted to demonstrate the cost of accessing single blocks from disparate locations on disk, without overheads of returning data across the network. Your second example uses FULL TABLE SCANS, which will perform multi-block reads of contiguous blocks on disk. Also, even through you specified EXPLAIN ONLY, SQL*Plus still pulls the data back across the network, it just does not display.
Thanks for your input.
Ross Leishman
|
|
|
Re: Help proving a database is slow [message #459195 is a reply to message #459007] |
Thu, 03 June 2010 17:23 |
coleing
Messages: 213 Registered: February 2008
|
Senior Member |
|
|
I cant do any tests for you as im not at work at the moment, but it is sounding like sequential reads are fine, but scattered reads are not?
Are you sure your indexes have an ok clustering factor?
|
|
|
|
Re: Help proving a database is slow [message #459201 is a reply to message #458786] |
Thu, 03 June 2010 21:41 |
trantuananh24hg
Messages: 744 Registered: January 2007 Location: Ha Noi, Viet Nam
|
Senior Member |
|
|
@reishman: The Big_table was created by looping insert from all_objects, so that, it only has got check constraint, not is primary key. So, I can solve it by using Exceptions table, but if I use, the big table can not contained to 10m rows.
@coleing: I guess the block size is 8K.
|
|
|
Re: Help proving a database is slow [message #459418 is a reply to message #459201] |
Fri, 04 June 2010 18:42 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Coleing, we are a bit powerless. Pretty much just the application design and programming has been contracted to us.
Regarding clustering factor, I designed the test case above to thwart clustering factors by choosing rows from all over the place. I was looking to benchmark the "worst-case", where each row that you process is in a different block. Even though this is an undesirable application characteristic, I think it should still pump data out at MUCH better than <100 rows/sec.
We might have had a shot at infrastructure redesign if we had caught it earlier, but we are in the middle of deployment now, so it's too late.
The reason we haven't noticed it before now is because it hasn't been a problem. Historic performance hasn't been blindingly fast, but it has been quite viable.
We escalated the issue using the Unix "dd" test case, which showed the disk to be slow without bringing Oracle into the equation. They identified a setting in the NFS that was different to our PROD system - something to do with "Direct IO". We had an outage and they changed the setting. This fixed the Unix "dd" test case (10x faster than it was), but the Oracle test case I posted above is unchanged.
Someone mentioned that there is an Oracle setting related to "Direct IO", and that even if we change it on the NFS, Oracle won't use it. If there is any truth to this then we need to escalate it to the outsourced DBAs.
Ross Leishman
|
|
|
Re: Help proving a database is slow [message #459729 is a reply to message #459418] |
Tue, 08 June 2010 02:06 |
hkchital
Messages: 128 Registered: September 2008 Location: Singapore
|
Senior Member |
|
|
>Someone mentioned that there is an Oracle setting related to "Direct IO"
That would be "filesystemio_options" which can be "none", "directio", "asynch" or "setall"
If they have enabled Direct I/O at the server level, all your file I/O is not using FileSystem Buffers. Then again, with NFS, this should have been enabled.
With Direct I/O enabled, you generally need to increase your db_buffer_cache to overcome the loss of the OS (filesystem) buffers !
I am not very sure about your SQL test whether that is the right way to "prove" something. You are attempting Random, single block *reads*. How about multiblock reads ? How about writes ?
Hemant K Chitale
[Updated on: Tue, 08 June 2010 02:07] by Moderator Report message to a moderator
|
|
|
Re: Help proving a database is slow [message #460153 is a reply to message #459729] |
Wed, 09 June 2010 22:37 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
FILESYSTEMIO_OPTIONS is set to "asynch", and our NFS mounts on Salaris use the "forcedirectio" option. My understanding is that this will bypass the filesystem cache.
Our multi-block reads seem to be OK - no complaints. It is our single block reads that are killing us. For the purpose of testing, I want to choose blocks NOT in the buffer cache, so that I can measure the cost of single block reads from disk.
As it turns out, the test case I posted above is still quite subject to buffering and clustering factors; so its value is limited. I created another test case that randomises the order of blocks retrieved, but it is not easily transportable to other environments, so I won't bother posting it here.
Thanks for your help. I now have traction with our Database, Network and System administrators to track this problem down. There is a general recognition that something is wrong.
Ross Leishman
|
|
|
Re: Help proving a database is slow [message #462736 is a reply to message #460153] |
Sat, 26 June 2010 05:13 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
If anyone is interested in the outcome of this....
We got assistance down at the NAS level. One of the network cards in the NAS was configured incorrectly and the result was that it didn't seem to utilise the disk cache.
This didn't really solve our problem though, queries were still slow. We measured round trips to the disk consistently between 0.011 and 0.014 seconds. According to our NAS guys, that is an expected and valid outcome (they actually quoted 0.007 to 0.010 seconds, but our experience was in within reasonable variance.
Given that the disk hits were performing as expected, we refocussed on Oracle figuring if a round trip to disk is not going to get any better, we need to do fewer round trips. We increased the buffer cache and got a better hit ratio. It improved things a fair bit, but we still had a long build-up in an ETL job to get the cache built up with the tables being processed.
Finally made a breakthrough today: in each of our SQL queries in the ETL we included a call to a packaged function that returned a setting that affected the range of data to be processed.
The function was not defined as parallel_enable because it used a global package state variable (violated the wnps purity condition). I changed the package variable over to a SYS_CONTEXT call (which is global across the parallel query servers), set the function to PARALLEL_ENABLE, and WHOOSHKA! it went like a rocket.
|
|
|
Goto Forum:
Current Time: Sun Jan 12 16:25:18 CST 2025
|