Re: Sorting Large Amounts of Data- Oracle to SAS, Part Deux
Date: Wed, 12 May 2010 15:15:40 -0700 (PDT)
Message-ID: <245837.1596.qm_at_web114506.mail.gq1.yahoo.com>
OK Greg, first- you're preaching to the choir... :)
The group by is part of the larger aggregate function, but when we are not using the sorted hash cluster, then it appears in the mix. You are right, it's not part of the sorted hash cluster though.
SAS can and in my opinion, should sort the the data on it's side, but per an agreement with earlier groups, this put too much "pressure" on the SAS server and someone decided that ALL DEMANDS should be on the database and performed by Oracle.
The users are demanding a sorted output of data for their SAS files. I won't lie- I have still not received an answer that explains this requirement other than if they do not pull the data sorted, SAS will sort it each time they run it on their side?? It doesn't make complete sense to me and being a newbie to SAS, none of my research has resulted in an explanation of this or an example of repeated sorting from SAS if the data isn't presented in a sorted format from Oracle.
I also agree on the lack of reasoning behind parallel and this process. I'm having a more difficult time convincing folks of the reasons to not use parallel here, so the question was posed to me and I did want to ensure I wasn't missing something, so please accept my thanks for answering the original question and how you answered it.
Let me know if I missed anything here- the snow and the cold medicine is getting to me today! :) Kellyn
- On Wed, 5/12/10, Greg Rahn <greg_at_structureddata.org> wrote:
From: Greg Rahn <greg_at_structureddata.org> Subject: Re: Sorting Large Amounts of Data- Oracle to SAS, Part Deux To: "Kellyn Pedersen" <kjped1313_at_yahoo.com> Cc: "oracle Freelists" <oracle-l_at_freelists.org> Date: Wednesday, May 12, 2010, 3:25 PM
I'm not sure I understand how using a sorted hash cluster eliminates the need for the group by that you mention. I still am also unclear on why there is a sorting requirement for the outbound data to SAS? What problem does a sorted "export" solve? SAS can sort and group and order data also, correct?
I'm not sure there is a need to incorporate parallel here. If the requirement is to have the rows be returned in a global sorted order, that by definition, requires (at some point) a serialization point and thus is not parallel. If you have forever to load rows (loading hash clusters is slower and is a serial process compared to heap tables) and limited temp space or memory, then that is likely the best solution. Also, you have a single process sending rows to SAS understand one lonely dude can only send so many rows per second. One doesn't need massive amounts of parallelism if all roads lead to a single lane -- there is no point in having 32 exit lanes if there is only 1 door, having 1 exit lane is likely enough to keep 1 door "busy".
So look at it this way: if you were to use Parallel Execution and we assume that the sort can not fit entirely into memory what would happen is 1) read all the data 2) sort the data back to disk/temp 3) read data on disk/temp back in sorted order. Compare that to just reading the data in sorted order from a sorted hash cluster. And remember in both scenarios there is just one processing sending rows to SAS. From a resource requirement it should be obvious which one takes less given they can likely send rows at the same speed.
On Wed, May 12, 2010 at 12:05 PM, Kellyn Pedersen <kjped1313_at_yahoo.com> wrote:
> This is a simple select with a pass thru query from SAS, over the wire. It is 1.4 million rows, using a sorted_hash_cluster to sort the data for them so they won't do a group by and an order by in their query.
>
> The statement with both, without the sorted hash cluster takes an unknown amount of time- they just couldn't get it to complete. The sorted hash cluster completes in 4 hrs.
>
> Of course, now the boss wants to know- can I incorporate parallel? So much for that idea... We are pretty far into this design at this point, but any other ideas are always welcome!
-- Regards, Greg Rahn http://structureddata.org -- http://www.freelists.org/webpage/oracle-l -- http://www.freelists.org/webpage/oracle-lReceived on Wed May 12 2010 - 17:15:40 CDT