Improving Performance of Large Result Set [message #409021] Thu, 18 June 2009 15:02
jkomorek
Messages: 4
Registered: June 2009
Junior Member
This is the classic "how to handle a lot of data" problem.

I have developed a Java process that pulls 1 million+ records from the database and writes the data to a file as a report. However, the performance is poor (6-8 hours to run the report). Here are my problems:

1) Pulling more than a million records at once requires too much memory, so I must "stream" the data to my process.

2) In order to "stream" the data I am paging it with a top-n query:

select *
from ( select a.*, rownum rnum
       from ( SELECT *
              FROM facility_book book
              LEFT OUTER JOIN file_info fi ON fi.facility_book_id = book.id
              ORDER BY book.id ) a
       where rownum <= #MAX# )
where rnum >= #MIN#

3) The top-n query runs very quickly at first, but slows down dramatically as it continues.

4) I cannot write the data to disk directly from Oracle for security purposes.

First of all, I am curious if there is a more efficient way of streaming data than paging.

Second, if paging is the answer, is there a way for me to tell Oracle to pull the data and order it once, then simply pull records from that cached data set? I'm thinking that temp tables might be the answer, though this is outside of my expertise.

This seems to be a very simple question of how to efficiently pull a large amount of data from Oracle in chunks. I am hoping I am overlooking an easy solution here.

Thanks in advance.
Re: Improving Performance of Large Result Set [message #409022 is a reply to message #409021] Thu, 18 June 2009 15:07
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
>I have developed a java process that pulls 1 million+ records from the database and writes the data to file as a report.

1,000,000 is a tiny number of records.

http://www.orafaq.com/forum/t/84315/74940/
Which of these have you tried, and what were the results?

ALTER SESSION SET SQL_TRACE=TRUE
& then run the report.

Process the trace file with TKPROF EXPLAIN=username/password to see where the time is being spent.
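
For reference, a minimal sketch of how the Java process itself could switch tracing on for its own session before running the report (the class name, and the TRACEFILE_IDENTIFIER value, are illustrative assumptions):

import java.sql.Connection;
import java.sql.Statement;

public class TraceHelper {
    // Enable SQL trace for this session so the report's work lands in a
    // server trace file that TKPROF can process afterwards.
    public static void enableTrace(Connection conn) throws Exception {
        Statement s = conn.createStatement();
        s.execute("ALTER SESSION SET SQL_TRACE = TRUE");
        // Tag the trace file name so it is easy to find on the server.
        s.execute("ALTER SESSION SET TRACEFILE_IDENTIFIER = 'large_report'");
        s.close();
    }
}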
Re: Improving Performance of Large Result Set [message #409026 is a reply to message #409021] Thu, 18 June 2009 16:34
jkomorek
Messages: 4
Registered: June 2009
Junior Member
I will review the performance of the query as you requested; however, I was hoping to get a little more insight on the best approaches to retrieving large data sets.

Paging large data sets using a top-n query just seems like a poor solution.
1) If the data is changed between pages my whole data set is thrown off.
2) Each time I go back for a new page Oracle has to order the entire result set - which is where I believe the time is being spent.

Isn't there a better approach to retrieving chunks of a large result set? For example, would it not be more efficient to write all of the data to a temp table and retrieve the records from there?
Re: Improving Performance of Large Result Set [message #409027 is a reply to message #409026] Thu, 18 June 2009 16:43
ThomasG
Messages: 3212
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
The approach is pretty flawed.

When you "page by row number", then oracle has the fetch ALL THE ROWS again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again and again until it reaches the where rnum >= #MIN# and final does some writing into the report.

Normally, you execute the query in Java. Then you open a cursor. Then you fetch that cursor row by row, and process each row.

There shouldn't be any reason to hold more than the one row you are processing in memory, so I suspect the "requires too much memory" is actually a logical bug in the Java code.
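
A minimal sketch of that pattern over plain JDBC, using the tables from the query above (the connection details, the file_name column, and the fetch size of 500 are illustrative assumptions):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReportStreamer {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:orcl", "user", "pass"); // assumed connect string
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(500); // fetch 500 rows per round trip instead of the default 10
        ResultSet rs = stmt.executeQuery(
                "SELECT book.id, fi.file_name" +   // file_name is a hypothetical column
                " FROM facility_book book" +
                " LEFT OUTER JOIN file_info fi ON fi.facility_book_id = book.id" +
                " ORDER BY book.id");
        BufferedWriter out = new BufferedWriter(new FileWriter("report.txt"));
        while (rs.next()) {
            // One row at a time: write it out, never accumulate the result set.
            out.write(rs.getLong(1) + "\t" + rs.getString(2));
            out.newLine();
        }
        out.close();
        rs.close();
        stmt.close();
        conn.close();
    }
}

Only the current fetch batch is ever held in memory, so the footprint stays flat no matter how many rows the query returns.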

And what does 4) mean? What's the "security difference" of writing continuously as opposed to writing in chunks?

[Updated on: Thu, 18 June 2009 16:45]

Re: Improving Performance of Large Result Set [message #409028 is a reply to message #409021] Thu, 18 June 2009 16:55
jkomorek
Messages: 4
Registered: June 2009
Junior Member
ThomasG,

Quote:
There shouldn't be any reason to hold more than the one row you are processing in memory, so I suspect the "requires too much memory" is actually a logical bug in the Java code.


I will go back and review this. My assumption was that the ResultSet was growing in memory as I read more records from it, but hopefully I'm off the mark on that. If my code is just holding on to the objects then I'll be very happy.

Quote:
And what does 4) mean? What's the "security difference" of writing continuously as opposed to writing in chunks?


Our database server is strictly isolated from the rest of our network (including application servers). As such, it is not an option for me to allow Oracle to write a file to the local DB server then retrieve the file from my application.


Thanks again for the help.
Re: Improving Performance of Large Result Set [message #409029 is a reply to message #409028] Thu, 18 June 2009 17:05
ThomasG
Messages: 3212
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
One option that might be of interest:

Oracle supports scrollable result sets.

This means you can create a ResultSet in Java and then move around in that result set on the client side to whichever row you want.
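
A minimal sketch, assuming an open java.sql.Connection named conn (the jump target of 500,000 is just an illustration):

Statement stmt = conn.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE,  // scrollable; insensitive to later data changes
        ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery("SELECT id FROM facility_book ORDER BY id");
rs.absolute(500000);        // position directly on row 500,000 from the client side
long id = rs.getLong(1);

Note that the Oracle driver implements this kind of scrolling by caching fetched rows on the client, which is the memory concern raised further down in the thread.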

Re: Improving Performance of Large Result Set [message #409030 is a reply to message #409021] Thu, 18 June 2009 17:05
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
>I was hoping to get a little more insight on the best approaches to retrieving large data sets.

You do NOT have a large dataset.
Your perspective is clouded by the lousy performance.

>Paging large data sets using a top-n query just seems like a poor solution.
Yes it is so STOP doing that!

>1) If the data is changed between pages my whole data set is thrown off.

It appears you never read Oracle's Concepts Manual.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/toc.htm
Oracle will preserve a read-consistent view of the data from the beginning of the transaction until the COMMIT is issued!
In other words, the data will NOT change, as far as your transaction is concerned!

>2) Each time I go back for a new page Oracle has to order the entire result set
So stop doing that!

>which is where I believe the time is being spent.
Do not "believe". Run SQL_TRACE to KNOW for sure what is really happening

>Isn't there a better approach to retrieving chunks of a large result set?
Process it as a single result set.

>For example, would it not be more efficient to write all of the data to a temp table and retrieving the records from there?
Why do you need any temp table?

If the source tables are owned by a single schema, please do the following:
exec dbms_stats.gather_schema_stats ('<data_owner>');
Re: Improving Performance of Large Result Set [message #409031 is a reply to message #409030] Thu, 18 June 2009 17:17
ThomasG
Messages: 3212
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
Actually, scratch the "scrollable resultset" thing, I think. That will most likely get you into memory trouble.

Who is actually READING those millions of records on the client side while they "page" through them, anyway?

The best approach would possibly be to process the entire resultset and be done with it.

Quote:
Our database server is strictly isolated from the rest of our network (including application servers)


Then how does the application get the data? ;-P

[Updated on: Thu, 18 June 2009 17:18]

Re: Improving Performance of Large Result Set [message #409032 is a reply to message #409031] Thu, 18 June 2009 17:23
ThomasG
Messages: 3212
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
As to the memory problem you have, that might be solved by setting the ResultSet type to

ResultSet.TYPE_FORWARD_ONLY

as discussed here at the end.
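
A minimal sketch of requesting a forward-only, read-only statement explicitly (TYPE_FORWARD_ONLY is also the JDBC default; conn and the fetch size are assumptions):

// Assumes an open java.sql.Connection named conn.
Statement stmt = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY,   // the default: no client-side scroll cache
        ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(500);                // keep only one batch of rows in memory
ResultSet rs = stmt.executeQuery("SELECT * FROM facility_book ORDER BY id");
while (rs.next()) {
    // process the current row, then let it go
}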



Re: Improving Performance of Large Result Set [message #409033 is a reply to message #409021] Thu, 18 June 2009 17:33
jkomorek
Messages: 4
Registered: June 2009
Junior Member
ThomasG,

Thank you very much for the help. I believe you've helped me understand that my problem lies within JDBC and in better understanding the memory management of result sets.

I've gone back and reviewed my ORM (iBatis) and have uncovered some documentation that may help. I was too quick to accept that my only option was loading all the data into memory, and I'm now investigating how I can process the result set one record at a time.
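
For reference, a minimal sketch of row-at-a-time processing with iBatis 2's RowHandler callback (the statement id "getFacilityBooks", the FacilityBook class, and the writeToReport helper are assumptions; sqlMapClient is an already-configured com.ibatis.sqlmap.client.SqlMapClient):

import com.ibatis.sqlmap.client.event.RowHandler;

// queryWithRowHandler streams each row through the callback instead of
// building the entire List in memory.
sqlMapClient.queryWithRowHandler("getFacilityBooks", null, new RowHandler() {
    public void handleRow(Object row) {
        writeToReport((FacilityBook) row);  // hypothetical helper and mapped class
    }
});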

For the record, the report is meant to be processed by an automated system, not humans. So this is a case in which I can't negotiate the requirements down to reduce the complexity.

- Jonathan