Home » RDBMS Server » Performance Tuning » Managing Large Table (Oracle 10g)
Managing Large Table [message #521185] |
Fri, 26 August 2011 13:46 |
|
EbenT
Messages: 1 Registered: August 2011
|
Junior Member |
|
|
Hello,
I am working with an online application with the database in Oracle 10G. We have a table with 10 million rows and this table is subjected to grow in future also. Moreover we cannot archive some of these rows as these records are required for referencing.
We have all necessary indexes on the table but querying this table takes a lot of time especially when it is joined with other tables.
Request you to please suggest me some methods with which I can manage this table in a better way so that queries joining this table would execute faster..
Am attaching the code and Explain Plan.
Thanks..
Eben Thomas
SELECT
TAB1.C6,
TAB1.C8,
TAB1.C10,
TAB3.C4,
TAB3.C5,
TAB3.C1
FROM TAB1
INNER JOIN TAB2 ON
TAB2.C3 = TAB1.C2 AND TAB2.C4 = TAB1.C3
AND TAB2.C5=TAB1.C4 AND TAB2.C6 = TAB1.C5
INNER JOIN TAB3 ON
TAB3.C2 = TAB1.C1 ;
-- THE COUNT OF THE ROWS IN TAB1 IS 108207471
create table TAB1
(
C1 NUMBER(10) not null,
C2 NUMBER(10) not null,
C3 NUMBER(10) not null,
C4 NUMBER(10) not null,
C5 NUMBER(10) not null,
C6 DATE,
C7 DATE,
C8 DATE,
C9 DATE,
C10 DATE,
C11 DATE,
C12 CHAR(1),
C13 NUMBER(10),
C14 VARCHAR2(255),
C15 DATE,
C16 DATE not null,
C17 NUMBER(10) not null,
C18 NUMBER(1) not null,
);
-- Create/Recreate primary, unique and foreign key constraints
alter table TAB1 add constraint XPKTAB1 primary key (C1);
-- Create/Recreate indexes
create unique index XAK1TAB1 on TAB1 (C17, C3, C4, C5, C2);
create index XIE1TAB1 on TAB1 (C18, C2, C3, C4, C5);
create index XIE2TAB1 on TAB1 (C18, C5, C2, C3, C4);
create index XIE3TAB1 on TAB1 (C18, C5, C3, C4, C2);
create index XIE4TAB1 on TAB1 (C18, C8, C12, C2, C1);
create index XIE5TAB1 on TAB1 (C18, C12, C3, C4, C5, C2);
create index XIE6TAB1 on TAB1 (C18, C9, C8, C12, C2, C1);
create index XIE8TAB1 on TAB1 (C18, C17, C2, C3, C4, C5);
create index XIE9TAB1 on TAB1 (C18, C2, C12);
-- THE COUNT OF THE ROWS IN TAB2 IS 3000
create table TAB2
(
C1 NUMBER(10) not null,
C2 CHAR(1) default 'S' not null,
C3 NUMBER(10) not null,
C4 NUMBER(10) not null,
C5 NUMBER(10) not null,
C6 NUMBER(10) not null,
C7 DATE not null,
C8 DATE not null
)
-- Create/Recreate primary, unique and foreign key constraints
alter table TAB2 add constraint XPKTAB2 primary key (C1);
-- Create/Recreate indexes
create index XIE1TAB2 on TAB2 (C2, C3, C4, C5, C6);
create index XIE2TAB2 on TAB2 (C8);
-- THE COUNT OF THE ROWS IN TAB3 IS 76504475
create table TAB3
(
C1 CHAR(1) default 'N' not null,
C2 NUMBER(10) not null,
C3 DATE not null,
C4 DATE not null,
C5 DATE not null,
C6 VARCHAR2(1),
C7 NUMBER(10)
)
;
-- Create/Recreate primary, unique and foreign key constraints
alter table TAB3 add constraint F1 foreign key (C2) References TAB1 (C1);
-- Create/Recreate indexes
create index XIE1TAB3 on TAB3 (C1);
create index XIE2TAB3 on TAB3 (C1, C4);
create index XIE3TAB3 on TAB3 (C1, C2);
create index XIF1TAB3 on TAB3 (C2);
create unique index XPKTAB3 on TAB3 (C1, C2, C3, C4, C7);
|
|
|
|
|
Re: Managing Large Table [message #521202 is a reply to message #521185] |
Fri, 26 August 2011 14:35 |
John Watson
Messages: 8960 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
OK, I'll have a try.
First, there is a typo in your test case, an extra comma in the first CREATE TABLE. Shame to ruin such a well presented test case!
This is the execution plan I get:Execution Plan
----------------------------------------------------------
Plan hash value: 3048152444
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 178 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 178 | 3 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 126 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | TAB3 | 1 | 34 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TAB1 | 108M| 9493M| 0 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | XPKTAB1 | 1 | | 0 (0)| 00:00:01 |
|* 6 | INDEX FULL SCAN | XIE1TAB2 | 1 | 52 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TAB3"."C2"="TAB1"."C1")
6 - access("TAB2"."C3"="TAB1"."C2" AND "TAB2"."C4"="TAB1"."C3" AND
"TAB2"."C5"="TAB1"."C4" AND "TAB2"."C6"="TAB1"."C5")
filter("TAB2"."C3"="TAB1"."C2" AND "TAB2"."C4"="TAB1"."C3" AND
"TAB2"."C5"="TAB1"."C4" AND "TAB2"."C6"="TAB1"."C5") which is nothing like yours. Perhaps I need more statistics, you provided only the one for rows in tab1. But I do wonder if you need an index on c2,c3,c4,c5. Or perhaps you could change XAK1TAB1 to put c17 at the end. After creating the index called C2345, I get this:
Execution Plan
----------------------------------------------------------
Plan hash value: 2910477617
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 178 | 1 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 178 | 1 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 144 | 1 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | XIE1TAB2 | 1 | 52 | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TAB1 | 108M| 9493M| 0 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | C2345 | 108M| | 0 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | XIF1TAB3 | 1 | | 0 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | TAB3 | 1 | 34 | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("TAB2"."C3"="TAB1"."C2" AND "TAB2"."C4"="TAB1"."C3" AND
"TAB2"."C5"="TAB1"."C4" AND "TAB2"."C6"="TAB1"."C5")
7 - access("TAB3"."C2"="TAB1"."C1")
|
|
|
Re: Managing Large Table [message #521286 is a reply to message #521202] |
Sun, 28 August 2011 02:27 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
Quote:each of the following should have its own INDEX
TAB1.C1
TAB1.C2
TAB1.C3
TAB1.C4
TAB1.C5
TAB2.C3
TAB2.C4
TAB2.C5
TAB2.C6
TAB3.C2
Assuming Black Swan is suggesting you create 10 single column indexes, ignore him. This would be moronic. If that is not what he means, my bad...
Looking at your query, I see you are not filtering any data. Taking your query at face value, such that it is not filtering any data, you are joining all rows in all tables in the query. That means, excepting for FAST FULL INDEX SCAN, indexes are pretty much useless for this query. Your query should be doing three full table scans and using hash join to join them together. For a large table, if you see nested loop as a join strategy somewhere, think twice.
Knowing this, you need to decide at what point you want to consider join related storage strategies that would facilitate joins. This means of course PARTITIONING. Looking at this query I might suggest the following partitioning:
1) TAB1 partition (c1) subpartition (c2,c3,c4,c5)
2) TAB2 partition (c3,c4,c5,c6)
3) TAB3 partition (c2)
you pick the partitioning stragegy (range/list/hash) as appropriate. 11gR2 docs say you can do hash/hash partitioning but I can't figure out the right syntax. I suspect it is an unimplemented feature (too bad).
The idea is that you will do partition wise hash joins for this query. This means two things:
1) you make the sets of data being joined much smaller and thus much more likely to be cached in memory for the join.
2) you enable parallel wish joins making it possible to engage multiple CPUs in the join task.
You may need to manage the memory though so get help if you need it.
These are just thoughts based on your open ended question. Do the necessary research to learn the right stuff if you want to engage partitioning. Good luck, Kevin
[Updated on: Sun, 28 August 2011 02:55] by Moderator Report message to a moderator
|
|
|
Re: Managing Large Table [message #521288 is a reply to message #521286] |
Sun, 28 August 2011 02:45 |
John Watson
Messages: 8960 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Kevin, I disagree (which is very unusual when I read one of your posts!) Look at the two x plans I posted, in both cases they avoid a scan of the big table. Yes, they scan either TAB3 or TAB2 (TAB2 as an index full scan) but then they use a nested loop join to get to TAB1. Of course, I get those plans because Optimizer Dynamic Sampling tells the optimizer that there are no rows in TAB3 and TAB2. Depending on how many rows will be thrown out by the inner joins, there may be no need for the big hash joins where partitioning would help.
OP, can you give some figures for rows in the dimension tables, and how many of the 108M you expect to have after the joins?
(ps - Kevin, if you are in CT, you should be asleep. Or is Irene keeping you awake?)
|
|
|
Re: Managing Large Table [message #521289 is a reply to message #521288] |
Sun, 28 August 2011 03:10 |
John Watson
Messages: 8960 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
My bad! I misread the post, I hadn't seen the figures for rows in tab2 and tab3. Now that I set them all like this.
exec dbms_stats.set_table_stats('jon','tab1',numrows=>108207471);
exec dbms_stats.set_table_stats('jon','tab2',numrows=>3000);
exec dbms_stats.set_table_stats('jon','tab3',numrows=>76504475);
the query gives me
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3000 | 521K| 1724 (99)| 00:00:21 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 3000 | 521K| 1724 (99)| 00:00:21 |
|* 3 | HASH JOIN | | 3000 | 421K| 1724 (99)| 00:00:21 |
| 4 | INDEX FULL SCAN | XIE1TAB2 | 3000 | 152K| 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TAB1 | 108M| 9493M| 1329 (98)| 00:00:16 |
|* 6 | INDEX RANGE SCAN | XIF1TAB3 | 23 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| TAB3 | 1 | 34 | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TAB2"."C3"="TAB1"."C2" AND "TAB2"."C4"="TAB1"."C3" AND
"TAB2"."C5"="TAB1"."C4" AND "TAB2"."C6"="TAB1"."C5")
6 - access("TAB3"."C2"="TAB1"."C1")
But my index still helps
create index c2345 on tab1(c2,c3,c4,c5);
and I get this
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3000 | 521K| 0 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 3000 | 521K| 0 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 3000 | 421K| 0 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | XIE1TAB2 | 3000 | 152K| 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TAB1 | 1 | 92 | 0 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | C2345 | 108M| | 0 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | XIF1TAB3 | 23 | | 0 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | TAB3 | 1 | 34 | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("TAB2"."C3"="TAB1"."C2" AND "TAB2"."C4"="TAB1"."C3" AND
"TAB2"."C5"="TAB1"."C4" AND "TAB2"."C6"="TAB1"."C5")
7 - access("TAB3"."C2"="TAB1"."C1")
[edit - typo]
[Updated on: Sun, 28 August 2011 03:12] Report message to a moderator
|
|
|
Re: Managing Large Table [message #521327 is a reply to message #521185] |
Sun, 28 August 2011 20:56 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
John, I am glad you posted. It seems this is a situation where both of us can be correct at the same time. It is quite possible that an index lookup and nested loop could be faster. But for this to happen it would be necessary for lots of rows to be discarded because there are a lot of missing hits. Assuming TAB1 is the big table, one of its foreign keys would have to contain lots of nulls. In this situation, a scan of a refernence table would yield a small number of lookups against the large table. In this situation an index lookup into TAB1 via nested loop would be great. Is this what you are describing?
But I doubt that will be the case. I admit to making the following assumption and that this caused me to avoid the above possibility. The assumption on my part is that the TAB1 table will join back to TAB2 and TAB3 will litte or no loss of rows. As such nested loop across 10 million rows table is going to be costly no matter which direction you go.
To clarify, given the situation in paragraph #1 above, if TAB1 contained 10 million rows, but after joining to TAB1 from TAB2 as in your plans, the result was 100 thousand rows, then index lookup and nested loop might be better.
But given the situation in paragraph #2, then after a join there will still be 10 million rows (my assumption), in which case nested loop = wait long time. This is where FTS and HASH JOIN are better. The partitioning I noted (not something I would likely do unless I was deperate since I don't think 10 million rows is a lot), is my way to make the joins small and parallel all the way. Maybe the OP can find a test system and give it a shot to see what happens.
What do you think John? Kevin and yep am in CT and nope Irene was a bust. Just spending too much time online these days. Thanks for noticing.
|
|
|
Goto Forum:
Current Time: Fri Nov 22 01:59:39 CST 2024
|