Join with 30 tables [message #539486]
Mon, 16 January 2012 07:13
RMSoares
Messages: 46 Registered: September 2010 Location: Lisboa
Member
Hi,
I have to optimize a query with the following characteristics:
- It takes 3 hours to run
- It performs an inner join across 30 tables
- It produces an output of 280 million records with 450 fields
First of all, it is not feasible to run 30 updates (one per table) against 280 million records.
The best solution I have found so far is to create 3 temporary tables, each of which joins one third of the 30 tables, and at the end to join the main table with these three temporary tables.
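Roughly, the idea is this (all table and column names below are invented for illustration, and I am assuming every table shares the same client key):

-- Stage 1 of 3: pre-join the first ten tables into a staging table.
CREATE TABLE tmp_part1 AS
SELECT t01.client_id,
       t01.col_a, t02.col_b  -- ... the columns from tables 01-10 ...
FROM   table01 t01
JOIN   table02 t02 ON t02.client_id = t01.client_id;
-- ... plus joins to tables 03 through 10 ...

-- tmp_part2 and tmp_part3 are built the same way from tables 11-20 and 21-30.

-- Final pass: one 4-way join instead of a 31-way join.
CREATE TABLE final_output AS
SELECT m.client_id, p1.col_a, p2.col_k, p3.col_x  -- ... all 450 fields ...
FROM   main_table m
JOIN   tmp_part1 p1 ON p1.client_id = m.client_id
JOIN   tmp_part2 p2 ON p2.client_id = m.client_id
JOIN   tmp_part3 p3 ON p3.client_id = m.client_id;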
I know that you will ask (or maybe not) for the query and sample data, but it is impossible to post examples for all 30 tables.
I would like suggestions on how to optimize this kind of query, which joins many tables and produces a large output with (too) many columns.
Thanks for your support
Re: Join with 30 tables [message #539491 is a reply to message #539488]
Mon, 16 January 2012 08:03
cookiemonster
Messages: 13962 Registered: September 2008 Location: Rainy Manchester
Senior Member
nehagoyal1988 wrote on Mon, 16 January 2012 13:41
1. Use "EXISTS" in place of every "IN" operator.
That'd be an improvement because?
nehagoyal1988 wrote on Mon, 16 January 2012 13:41
2. Create all the possible indexes that you can on the joining fields of the tables to avoid Full Table Scans.
It could well be that full table scans are the most efficient approach. It depends on the query.
nehagoyal1988 wrote on Mon, 16 January 2012 13:41
3. Yes, it would be a good method to combine the output of some of the tables and store it in a temporary table, but in this case you have to create & drop these temporary tables every time new data arrives in any of your tables.
Why would you need to drop and re-create the temp tables all the time?
[Updated on: Mon, 16 January 2012 08:03]
Re: Join with 30 tables [message #539493 is a reply to message #539486]
Mon, 16 January 2012 08:57
John Watson
Messages: 8962 Registered: January 2010 Location: Global Village
Senior Member
You could consider:
1. Using a few materialized views
2. If you have Enterprise Edition licenses, using parallel query
3. If you have the partitioning option, confirming that you are getting appropriate partition pruning and partition-wise joins.
But no promises.
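To sketch the materialized view idea: it could pre-compute part of the join so the heavy work happens at refresh time, off-line, instead of at query time. A sketch only, with invented names:

CREATE MATERIALIZED VIEW mv_client_detail
  PARALLEL 8
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT t01.client_id, t01.col_a, t02.col_b  -- ... further columns ...
FROM   table01 t01
JOIN   table02 t02 ON t02.client_id = t01.client_id;
-- ... extend with joins to the remaining tables as needed ...

You would then refresh it with DBMS_MVIEW.REFRESH('MV_CLIENT_DETAIL') before the reporting window opens.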
Re: Join with 30 tables [message #539800 is a reply to message #539493]
Wed, 18 January 2012 10:47
RMSoares
Messages: 46 Registered: September 2010 Location: Lisboa
Member
Certainly, the output of a query that gathers information from 30 tables and generates 280 million records is not the best basis for a report, but these were the requirements set by the customer, and this is what was implemented. So I have the harder task: optimizing something whose best solution would be to throw it away and build something achievable within what the customer wants.
I reworked the process and partitioned the table by company and time (sketched below). Now we load one partition at a time, which has reduced the total time to load the table.
But I'd like to find a better solution, because the 30 tables all have exactly the same number of records (one per active client), and the output table has that same number of records, just with the detail fields from each table.
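By partitioning per company and time I mean something like this (names are illustrative only; range on the load date, list on the company):

CREATE TABLE output_table (
  client_id  NUMBER,
  company_id NUMBER,
  load_date  DATE
  -- ... the 450 detail columns ...
)
PARTITION BY RANGE (load_date)
SUBPARTITION BY LIST (company_id)
SUBPARTITION TEMPLATE (
  SUBPARTITION comp_01 VALUES (1),
  SUBPARTITION comp_02 VALUES (2),
  SUBPARTITION comp_xx VALUES (DEFAULT)
)
( PARTITION p2012_01 VALUES LESS THAN (DATE '2012-02-01'),
  PARTITION p2012_02 VALUES LESS THAN (DATE '2012-03-01')
);

Each load then touches a single partition instead of the whole table.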
Re: Join with 30 tables [message #539815 is a reply to message #539809]
Wed, 18 January 2012 11:14
cookiemonster
Messages: 13962 Registered: September 2008 Location: Rainy Manchester
Senior Member
RMSoares wrote on Wed, 18 January 2012 17:03
We can change the world, but not the mind of the customer.
I think you'll find that no matter how difficult it is to change the mind of the customer, it's still easier than changing the physical realities of what databases are capable of.
Re: Join with 30 tables [message #539820 is a reply to message #539815]
Wed, 18 January 2012 11:41
John Watson
Messages: 8962 Registered: January 2010 Location: Global Village
Senior Member
If there really is no option, this is what I would try.
As I understand it, you have 30 tables, each of which has 280m rows. The rows need to be combined into one logical table containing all the columns from each source table; presumably there is a primary key in each table that you use for the join.
You have to use partitioning. I would choose one table and hash partition it on the primary key (assuming an even key distribution) into, say, 32 partitions of about 9m rows each. Then use reference partitioning on the other 29 tables. That will give you partition-wise joins and the potential for a reasonable degree of parallelism; you could try parallel_degree_policy=auto or experiment with tuning it yourself.
I don't know what join method will be best; you might have to hint for a hash join (or make the indexes invisible), and you'll need a lot of PGA, so be sure to set _pga_max_size to many GB.
[update: I've just noticed that you are using release 10.x. That means you don't have reference partitioning, so you'll have to equipartition the tables yourself, and you may have to do some work to get the partition-wise joins, but it is still possible. And you don't have parallel_degree_policy either, so you'll have to tune that by hand, too.]
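Since reference partitioning is out on 10.x, the manual equipartitioning would look something like this (table and column names invented; a sketch, not a tested implementation):

-- Hash partition the driving table on the join key.
CREATE TABLE table01 (
  client_id NUMBER NOT NULL
  -- ... detail columns ...
)
PARTITION BY HASH (client_id) PARTITIONS 32;

-- Create the other 29 tables with the identical
-- PARTITION BY HASH (client_id) PARTITIONS 32 clause,
-- so all 30 are equipartitioned on the key.

-- A full partition-wise hash join, forced and parallelised by hint:
SELECT /*+ USE_HASH(t01 t02) FULL(t01) FULL(t02)
           PARALLEL(t01 8) PARALLEL(t02 8) */
       t01.client_id, t01.col_a, t02.col_b
FROM   table01 t01
JOIN   table02 t02 ON t02.client_id = t01.client_id;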
[Updated on: Wed, 18 January 2012 11:46]
Re: Join with 30 tables [message #539899 is a reply to message #539893]
Thu, 19 January 2012 02:57
RMSoares
Messages: 46 Registered: September 2010 Location: Lisboa
Member
They are expecting it to take 1 hour ...
The process starts at 07:00 because that is when all 30 tables are available, but the major tables are ready by 02:00. I will try to join the major tables before 07:00 and create a temporary table, which will then be joined to the rest of the tables.
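Something like this (names invented for illustration):

-- 02:00, as soon as the major tables have loaded:
CREATE TABLE stg_major AS
SELECT t01.client_id, t01.col_a, t02.col_b  -- ... columns from the early tables ...
FROM   big_table01 t01
JOIN   big_table02 t02 ON t02.client_id = t01.client_id;

-- 07:00, when the remaining tables arrive:
CREATE TABLE final_output AS
SELECT s.*, t03.col_c  -- ... columns from the late tables ...
FROM   stg_major s
JOIN   late_table03 t03 ON t03.client_id = s.client_id;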