Improving Performance of Large Result Set [message #409021] |
Thu, 18 June 2009 15:02 |
jkomorek
This is the classic "how to handle a lot of data" problem.
I have developed a Java process that pulls 1 million+ records from the database and writes the data to a file as a report. However, the performance is poor (6-8 hours to run the report). Here are my problems:
1) Pulling more than a million records at once requires too much memory, so I must "stream" the data to my process.
2) In order to "stream" the data I am paging it with a top-n query:
select *
  from ( select a.*, rownum rnum
           from ( SELECT *
                    FROM facility_book book
                    LEFT OUTER JOIN file_info fi ON fi.facility_book_id = book.id
                   ORDER BY book.id
                ) a
          where rownum <= #MAX# )
 where rnum >= #MIN#
3) The top-n query runs very quickly at first, but slows down dramatically as it continues.
4) I cannot write the data to disk directly from Oracle for security purposes.
First of all, I am curious if there is a more efficient way of streaming data than paging.
Second, if paging is the answer, is there a way for me to tell Oracle to pull the data and order it once, then simply pull records from that cached data set? I'm thinking that temp tables might be the answer, though this is outside of my expertise.
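Something like the following is what I'm imagining (untested, and only my rough understanding of Oracle global temporary tables; "report_stage" is a made-up name and the column list is trimmed for illustration):

CREATE GLOBAL TEMPORARY TABLE report_stage
ON COMMIT PRESERVE ROWS
AS
SELECT rownum AS rnum, x.*
  FROM ( SELECT book.id, fi.facility_book_id  -- plus the other report columns
           FROM facility_book book
           LEFT OUTER JOIN file_info fi ON fi.facility_book_id = book.id
          ORDER BY book.id ) x;

-- each page then becomes a simple range query, with no re-sorting:
SELECT * FROM report_stage WHERE rnum BETWEEN #MIN# AND #MAX#;

The million-row sort would then happen only once, although each page would still scan the staging table unless rnum is indexed.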
This seems like a very simple question of how to efficiently pull a large amount of data from Oracle in chunks, so I am hoping I am overlooking an easy solution.
Thanks in advance.
Re: Improving Performance of Large Result Set [message #409026 is a reply to message #409021] |
Thu, 18 June 2009 16:34 |
jkomorek
I will review the performance of the query as you requested; however, I was hoping to get a little more insight on the best approaches to retrieving large data sets.
Paging large data sets using a top-n query just seems like a poor solution.
1) If the data changes between pages, my whole data set is thrown off.
2) Each time I go back for a new page, Oracle has to order the entire result set, which is where I believe the time is being spent.
Isn't there a better approach to retrieving chunks of a large result set? For example, would it not be more efficient to write all of the data to a temp table and then retrieve the records from there?
Re: Improving Performance of Large Result Set [message #409027 is a reply to message #409026] |
Thu, 18 June 2009 16:43 |
ThomasG
The approach is pretty flawed.
When you "page by row number", then oracle has the fetch ALL THE ROWS again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again until it reaches the where rnum >= #MIN# and final does some writing into the report.
Normally, you execute the query in Java. Then you open a cursor. Then you fetch that cursor row by row, and process each row.
There shouldn't be any reason to hold more than the one row you are processing in memory, so I suspect the "requires too much memory" is actually a logical bug in the Java code.
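For example, a minimal sketch in plain JDBC (assuming an already-open Connection; the fetch size, output file, and pipe-delimited format are just placeholders for whatever your report actually needs):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class ReportStream {
    public static void streamReport(Connection conn, String outFile)
            throws SQLException, IOException {
        String sql = "SELECT * FROM facility_book book "
                   + "LEFT OUTER JOIN file_info fi ON fi.facility_book_id = book.id "
                   + "ORDER BY book.id";
        PreparedStatement stmt = conn.prepareStatement(sql,
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(500);  // rows per network round trip, NOT rows held in memory
        ResultSet rs = stmt.executeQuery();
        BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
        try {
            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();
            while (rs.next()) {                    // one row in memory at a time
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) line.append('|');
                    line.append(rs.getString(i));
                }
                out.write(line.toString());
                out.newLine();
            }
        } finally {
            out.close();
            rs.close();
            stmt.close();
        }
    }
}

The whole million rows go through that single query, so the sort is done exactly once and the driver just keeps fetching the next batch behind the scenes.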
And what does 4) mean? What's the "security difference" of writing continuously as opposed to writing in chunks?
[Updated on: Thu, 18 June 2009 16:45]
Re: Improving Performance of Large Result Set [message #409028 is a reply to message #409021] |
Thu, 18 June 2009 16:55 |
jkomorek
ThomasG,
Quote: | There shouldn't be any reason to hold more than the one row you are processing in memory, so I suspect the "requires too much memory" is actually a logical bug in the Java code. |
I will go back and review this. My assumption was that the ResultSet was growing in memory as I read more records from it, but hopefully I'm off the mark on that. If my code is just holding on to the objects, then I'll be very happy.
Quote: | And what does 4) mean? What's the "security difference" of writing continuously as opposed to writing in chunks? |
Our database server is strictly isolated from the rest of our network (including the application servers). As such, it is not an option for me to have Oracle write a file on the DB server and then retrieve that file from my application.
Thanks again for the help.
Re: Improving Performance of Large Result Set [message #409031 is a reply to message #409030] |
Thu, 18 June 2009 17:17 |
ThomasG
Actually, scratch the "scrollable resultset" thing, I think. That will most likely get you into memory troubles.
Who is actually READING those millions of records on the client side while they "page" through them anyway?
The best approach would possibly be to process the entire resultset and be done with it.
Quote: | Our database server is strictly isolated from the rest of our network (including application servers) |
Then how does the application get the data? ;-P
[Updated on: Thu, 18 June 2009 17:18]
Re: Improving Performance of Large Result Set [message #409033 is a reply to message #409021] |
Thu, 18 June 2009 17:33 |
jkomorek
ThomasG,
Thank you very much for the help. I believe you've helped me understand that my problem lies within JDBC and my understanding of memory management in result sets.
I've gone back and reviewed my ORM (iBatis) and have uncovered some documentation that may help. I was too quick to assume that my only option was loading all the data into memory, and I'm now investigating how I can process the result set one record at a time, as in the sketch below.
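A rough sketch of the direction I'm heading, assuming iBATIS 2's queryWithRowHandler callback (untested; "facilityBookReport" is a made-up mapped-statement id and ReportWriter is a stand-in for my file-writing code):

import com.ibatis.sqlmap.client.SqlMapClient;
import com.ibatis.sqlmap.client.event.RowHandler;
import java.sql.SQLException;

public class StreamingReport {
    interface ReportWriter {
        void write(Object row);  // stand-in: format one row and append it to the file
    }

    public void run(SqlMapClient sqlMap, final ReportWriter writer)
            throws SQLException {
        // iBATIS hands each row to the callback as it is fetched,
        // instead of building the whole million-row List in memory.
        sqlMap.queryWithRowHandler("facilityBookReport", new RowHandler() {
            public void handleRow(Object row) {
                writer.write(row);
            }
        });
    }
}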
For the record, the report is meant to be processed by an automated system, not humans. So, this is a case in which I can't haggle down the complexity through requirements.
- Jonathan