Performance issue with long running query [message #500825] |
Wed, 23 March 2011 14:52 |
|
prasadpsk
Messages: 4 Registered: March 2011
|
Junior Member |
|
|
Hi,
I have a long running query like below which is taking around 1 hour to pull around 650k records for 1 day data.
SELECT DISTINCT customer_trx_line_id, customer_trx_id
FROM ra_cust_trx_line_gl_dist_all
WHERE last_update_date BETWEEN TO_DATE ('2011-01-28','yyyy-mm-dd HH24:MI:SS') AND TO_DATE ('2011-01-29','yyyy-mm-dd HH24:MI:SS')
On one particular day of 28th jan there are around 650K records created. There is an index exists on last_update_date column and the query is utilizing the index but still taking long time to execute. I need to pull the DISTINCT records. Without DISTINCT clause the query is executing fast but when trying to pull the DISTINCT records then the query is taking long time.
Any inputs/ideas to improve the performance of this query will be highly appreciated.
Thanks,
Prasad.
|
|
|
|
|
|
Re: Performance issue with long running query [message #500835 is a reply to message #500834] |
Wed, 23 March 2011 17:25 |
ThomasG
Messages: 3212 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
ma_appsdba wrote on Wed, 23 March 2011 22:31Try creating function based index on TO_DATE(last_update_date)
The to_date would make no sense if last_update_date is a date column. An index on last_update_date might help, though. Depending on the way the data is selected normally perhaps even an function based index on TRUNC(last_update_date)
|
|
|
Re: Performance issue with long running query [message #500922 is a reply to message #500827] |
Thu, 24 March 2011 09:00 |
|
LKBrwn_DBA
Messages: 487 Registered: July 2003 Location: WPB, FL
|
Senior Member |
|
|
Create a function index on TRUNC(last_update_date) as ThomasG suggest, and then you could try analytical query to see if it's faster than "DISTINCT":
SELECT * FROM (
SELECT customer_trx_line_id,
customer_trx_id,
DENSE_RANK ()
OVER (PARTITION BY customer_trx_line_id, customer_trx_id
ORDER BY ROWID) rk
FROM ra_cust_trx_line_gl_dist_all
WHERE TRUNC (last_update_date) = TO_DATE ('2011-01-28', 'yyyy-mm-dd'))
WHERE rk = 1
[Updated on: Thu, 24 March 2011 09:32] by Moderator Report message to a moderator
|
|
|
Re: Performance issue with long running query [message #501144 is a reply to message #500922] |
Sat, 26 March 2011 06:46 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Are you people all mad? This query is already using an index to pull the EXACT 650K rows it needs, and then de-duplicating them. It really couldn't be simpler. Creating a function based index will mean he now has TWO indexes that do essentially the same thing (find the exact same rows), except now there's a lot less free space in the database because it's been chewed up by a useless index.
An analytic function won't be faster. It will do a sort - slower than the hash-distinct it is currently doing.
I'd be interested to know whether the problem is in the row retrieval or the sort. How long does this take:
SELECT MAX(customer_trx_line_id)
FROM ra_cust_trx_line_gl_dist_all
WHERE last_update_date BETWEEN TO_DATE ('2011-01-28','yyyy-mm-dd HH24:MI:SS') AND TO_DATE ('2011-01-29','yyyy-mm-dd HH24:MI:SS')
This SQL will retrieve all of the same data, but will not do the hash-unique (DISTINCT).
If it is still comparably slow, then the problem is probably disk latency. All those indexed lookups are single-block reads - lots of back-and-forth traffic to the disk, and lots of waiting time. If you add customer_trx_line_id and customer_trx_id to the last_update_date index, that means you won't need to lookup all those table blocks. Much less IO, should be at least 2x faster - probably more. If you cannot muck about with the indexes, Parallel Query is a very good way of dealing with disk latency. If the disk is going to take so darned long to serve block requests, at least you can get Parallel Query to have it retrieve them 4 or 8 at a time. However, before you go parallelizing your query, you need to speak to your DBA about it. There is an excellent chance that you will destroy the performance of the database if you do it inappropriately.
If the above SQL is quick, then the HASH-distinct is taking up the time. That would probably mean your PGA_AGGREGATE_TARGET is set too small, and you would need to ask your DBA to increase it.
Ross Leishman
|
|
|
Re: Performance issue with long running query [message #501204 is a reply to message #500825] |
Sun, 27 March 2011 11:36 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
Ross is correct on many counts. The ideas of TRUNC and FUCTION BASED INDEX are way off. They either change the result, or are useless in terms of helping.
The fastest way to get this query done is to create the following index.
SELECT DISTINCT customer_trx_line_id, customer_trx_id
FROM ra_cust_trx_line_gl_dist_all
WHERE last_update_date BETWEEN TO_DATE ('2011-01-28','yyyy-mm-dd HH24:MI:SS') AND TO_DATE ('2011-01-29','yyyy-mm-dd HH24:MI:SS')
create index xi1 on ra_cust_trx_line_gl_dist_all
(last_update_date , customer_trx_line_id, customer_trx_id
);
This index will give you three benefits:
1) it will do a range scan on the index for only the rows that match the date expression.
2) it will skip table access altogether thus never visiting the table because all columns required to answer the query are in the index.
3) distinct might take less time since the rows are bunching in semi-sorted order already **though I would not count it too much.
However, if your explanation is right, that the distinct is the cause of your grief, then you need to check on how the HASH UNIQUE is performing. Not sure how to speed that up. There must be a way to figure out if it is hashing in memory or if the hash spills to disk. Maybe ROSS has a comment on how to figure this out.
For 650k rows, this query should finish in under a minute. On my systems, I would expect it to get done between a max of 20 and as little as 5 seconds.
Kevin
[Updated on: Sun, 27 March 2011 11:38] Report message to a moderator
|
|
|
|
|
|
|
|
|
Re: Performance issue with long running query [message #501431 is a reply to message #501398] |
Tue, 29 March 2011 03:16 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Looks like a disk latency issue. Is the database on network disk (SAN/NAS)or local?
You can combat disk latency issues in a few different ways:
- Use faster disk
- Use less disk
- Increase disk utilisation
Faster disk is probably outside of your control.
You can use less disk by decreasing the wastage in blocks where the data YOU WANT is kept.
- Create the index Kevin suggested (best bet) to obviate table row retrieval
- Use table compression
- Reorganise the table to ensure rows for a single day are clustered together in common blocks.
Parallelism will increase disk utilisation.
Ross Leishman
|
|
|