SQL Query too slow [message #613605] |
Thu, 08 May 2014 08:17 |
|
aldy
Messages: 6 Registered: May 2014 Location: India
|
Junior Member |
|
|
Hi,
Database Version - 11.2.0.2.0
I've 2 tables - A and B each with 120 million records and I'm trying to pull additional records alone from Table A.
There are 4 columns getting joined on either tables and all those columns are been indexed and statistics have been gathered recently.
I've tried NOT EXISTS, OUTER JOIN but to avail, very slow response time.
Would be great if you could share some ideas on how to improve the performance.
|
|
|
|
Re: SQL Query too slow [message #613607 is a reply to message #613605] |
Thu, 08 May 2014 08:28 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Sounds like you want rows in table A not in table B.
- Make sure that all 4 columns in table A are defined as NOT NULL
- Make sure that all 4 columns in table B are defined as NOT NULL
- Create an index on the 4 join columns of table B
- Write your SQL as a NOT IN sub-query
Ross Leishman
|
|
|
|
|
|
Re: SQL Query too slow [message #613614 is a reply to message #613609] |
Thu, 08 May 2014 08:57 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
aldy wrote on Thu, 08 May 2014 19:16I tried using NOT IN as well and after 40 to 50 mins, i killed the session as it was taking too long.
Ross did not suggest only that, but 3 other points as well. When all those points are put together and applied then you might see some difference. But to comment about anything, we would require some additional information. Please follow the links provided to you and post the required details.
|
|
|
Re: SQL Query too slow [message #613618 is a reply to message #613614] |
Thu, 08 May 2014 09:07 |
|
aldy
Messages: 6 Registered: May 2014 Location: India
|
Junior Member |
|
|
SQL Used
SELECT FND.C1,
FND.C2,
FND.C3,
FND.C4
FROM A FND,
B stg
WHERE FND.ROW_FLAG = 'Y'
AND FND.C1 = STG.C1(+)
AND FND.C2 = STG.C2(+)
AND FND.C3 = STG.C3(+)
AND FND.C4 = STG.C4(+)
AND STG.C1 IS NULL;
Explain plan for the same
Plan hash value: 1870775268
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1441K| 49M| | 975K (3)| 03:15:12 | | |
|* 1 | FILTER | | | | | | | | |
|* 2 | HASH JOIN RIGHT OUTER | | 1441K| 49M| 3985M| 975K (3)| 03:15:12 | | |
| 3 | TABLE ACCESS FULL | B | 144M| 2336M| | 52592 (10)| 00:10:32 | | |
| 4 | PARTITION RANGE ALL | | 144M| 2611M| | 506K (3)| 01:41:13 | 1 |1048575|
| 5 | PARTITION LIST SINGLE| | 144M| 2611M| | 506K (3)| 01:41:13 | KEY | KEY |
| 6 | TABLE ACCESS FULL | A | 144M| 2611M| | 506K (3)| 01:41:13 | KEY | KEY |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STG"."C1" IS NULL)
2 - access("FND"."C1"="STG"."C1"(+) AND
"FND"."C2"="STG"."C2"(+) AND
"FND"."C3"="STG"."C3"(+) AND "FND"."C4"="STG"."C4"(+))
[RL: code tags]
[Updated on: Thu, 08 May 2014 16:37] by Moderator Report message to a moderator
|
|
|
|
|
|
Re: SQL Query too slow [message #613629 is a reply to message #613624] |
Thu, 08 May 2014 10:38 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
@OP,
1. Did you observe the FTS on both tables? Do you have an idea about why?
2. Do you really need an outer join? Because, some/most of the times it has been observed that an outer join is used unnecessarily adding much more overhead than required.
3. You are yet to explain about the indexes.
4. What about the integrity constraints?
5. If point 4 is answered, you need to tell us whether the foreign keys are supported by an index. Since it plays an important role in performance, strictly speaking about dunsel join removals.
And much more...
P.S. - Not sure completely if dunsel join removal would actually apply here.
[Updated on: Thu, 08 May 2014 10:40] Report message to a moderator
|
|
|
Re: SQL Query too slow [message #613644 is a reply to message #613629] |
Thu, 08 May 2014 16:51 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Could be that you're running out of memory. According to your first plan, Oracle estimates it will need about 4GB of the PGA for the hash join, otherwise it will spill to disk and take a lot longer.
Take a look at this article on Serial Partition-Wise Joins. They can be used to make memory bound joins more scalable.
The idea would be partiton (or sub-partition) BOTH tables on the 4 join keys. I would try it with 128 partitions. If you can't change the partitioning of the table - and you are ONLY selecting columns from the join key (not selecting any other columns) then you could try creating globally partitioned indexes on both tables - GLOBALLY PARTITIONED BY HASH (A,B,C,D) PARTITIONS 128
The idea here is to lower the memory footprint
Ross Leishman
|
|
|
Re: SQL Query too slow [message #613673 is a reply to message #613644] |
Fri, 09 May 2014 02:02 |
Roachcoach
Messages: 1576 Registered: May 2010 Location: UK
|
Senior Member |
|
|
If your stats really are up to date, I am thinking "A" is a really "fat" table? Oo there are LOBs in there or something?
It's estimating 10 minutes to scan "B" but 10 times that value to do the same to "A".
"B" is more like the expected performance - Is there's something funky with "A"?
[Updated on: Fri, 09 May 2014 02:03] Report message to a moderator
|
|
|
|
Re: SQL Query too slow [message #613683 is a reply to message #613681] |
Fri, 09 May 2014 02:45 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1441K| 49M| | 975K (3)| 03:15:12 | | |
|* 1 | FILTER | | | | | | | | |
|* 2 | HASH JOIN RIGHT OUTER | | 1441K| 49M| 3985M| 975K (3)| 03:15:12 | | |
| 3 | TABLE ACCESS FULL | B | 144M| 2336M| | 52592 (10)| 00:10:32 | | |
| 4 | PARTITION RANGE ALL | | 144M| 2611M| | 506K (3)| 01:41:13 | 1 |1048575|
| 5 | PARTITION LIST SINGLE| | 144M| 2611M| | 506K (3)| 01:41:13 | KEY | KEY |
| 6 | TABLE ACCESS FULL | A | 144M| 2611M| | 506K (3)| 01:41:13 | KEY | KEY |
-------------------------------------------------------------------------------------------------------------------------------
Ummm... this says your Table A has 1,048,575 partitions. Is that right? Wouldn't each partition have about 100 rows on average?
It doesn't really sound optimal. Just for kicks, try it on a table with no fewer than 1M rows per partition and see how that goes.
Another thought, have you ever deleted lots of data from Table A? Those empty but previously used blocks could still be costly in a full table scan.
Ross Leishman
|
|
|
|
|
Re: SQL Query too slow [message #613742 is a reply to message #613605] |
Fri, 09 May 2014 16:42 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
There is way too much missing information here. But we will try anyway.
I suspect first that the number of partitions, if it really is 1 million plays a significant role here. That seems like way way way too many partitions to me but what do I know.
Second, unless your partitioning is based on some subset of your join columns, I won't help you much. Oracle will simple have to repartition all the data anyway.
It would help if you provided the following:
1. the basis of the partitioning on both tables (both tables are partitioned the same way right?)
2. primary key definitions on the two tables
3. indications of if the join columns are defined as null or not null
4. the predicates portion of the query plan as just the plan_output is almost useless
5. existence of any foreign key constraints between the two tables and what these are
6. maybe the real query? as I doubt this is it
The use of outer-join as a away to simulate an anti-join does not work well in cases where there are lots of matches because these matches must be done first and then tossed out, hence the FILTER operation at line 1. Best I think to go back to the NOT EXISTS. That way the join attempts will stop for each key once a match is found. Unless of course you are joining across what is the primary key of both tables.
Third, please explain the partitioning pattern you are using.
Last, what kind of changes are you willing to make?
1. would you be willing to change the partitioning scheme of these two tables?
2. would you be willing to increase pga_aggregate_target?
3. would you be willing to use parallel query (IF IT IS APPROPRIATE, MIGHT NOT BE FOR YOUR SYSTEM)?
if the only purpose is to make this query go fast then the following seems to me to be the right approach.
hash partition (f1,f2,f3,f4) partitions 32 (do this on both tables (check the docs for the proper syntax))
select /*+ parallel (4) */ ...
from a
where not exists (select null from b where b.c1 = a.c1 ...)
/
What we would be looking for from this would be:
1. for the hash partitioning to make the joins small, about 4 million rows each
2. for the parallel query to allow you to join multiple partitions concurrently. I have shown 4 here but you can increase or decrease based on your systems available resources
3. for the anti-join to stop once a match is found and thus minimize join attempts (assuming we get an anti-join in the query plan)
I am guessing the corresponding plan would look like this, but with parallel and partition info too
| 0 | SELECT STATEMENT |
| 1 | HASH JOIN RIGHT ANTI |
| 2 | TABLE ACCESS FULL | B
| 3 | TABLE ACCESS FULL | A
As with most queries there are two basic costs:
1. the cost of initially getting data from the tables (precision query vs. warehouse query)
2. the cost of everything else in query execution (read set operations like joins/sorts/etc.)
You goal is to minimize the biggest pieces, possibly both types of costs.
For #1 (initial data acquisition) think row filtering (you have none), and column projection (think covering indexes or EXADATA, neither of which you have indicated you have).
For #2 (other stuff) for this query means make the join efficient which means memory management and partitioning to achieve optimal or one-pass join thus minimizing temp space usage.
Good luck, Kevin
[Updated on: Fri, 09 May 2014 17:06] Report message to a moderator
|
|
|
Re: SQL Query too slow [message #613754 is a reply to message #613742] |
Sat, 10 May 2014 10:18 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
Well described by Kevin. Very informative.
The only thing I contemplate is that, whether supporting indexes for the foriegn keys would help with respect to dunsel join removal. I would really want to know if it implies in OP's case if OP provides this information.
[Updated on: Sat, 10 May 2014 10:24] Report message to a moderator
|
|
|
Re: SQL Query too slow [message #613755 is a reply to message #613605] |
Sat, 10 May 2014 10:21 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
Nope. There will be no join elimination (oracle's officle term for what I call Dunsel Join Removal). The existential question cannot be answered only through knowledge of constraints. We actually have to visit the data for this one.
Kevin
|
|
|