Changing Query Approach [message #288463]
Mon, 17 December 2007 11:32
mrk_anand
Messages: 7 Registered: December 2007
Junior Member
I have a fact table with four dimensions and 10 measures. One of the dimensions is a time dimension with monthly granularity. A report built on it fetches 8.4 million records. I am planning to redesign the fact table with three dimensions and 120 measures, which would reduce the record count to around 0.7 million. I would like to know whether this is the right move: will the query with 4 dimensions, 10 measures and 8.4 million records be faster, or the query that fetches 3 dimensions, 120 measures and 0.7 million records?
Re: Changing Query Approach [message #288508 is a reply to message #288465]
Mon, 17 December 2007 20:48
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
So your report FETCHES 8.4 million rows, but does it DISPLAY 8.4 million rows? Or does it do some aggregation?
From the way you describe an alternative table with fewer dimensions but more measures, it sounds like you're talking about a PIVOT, not necessarily an aggregation.
But even this pivoted result set is 700,000 rows. Does your report REALLY display that many rows? Or does it do some filtering? Some aggregation?
Your use of the terms FACT and DIMENSION leads me to believe you are working on a Data Warehouse, possibly with much reporting but little updating. If so, these systems usually benefit from some wholesale aggregation.
If your report only uses two of the four dimensions, create a fast refresh MV that aggregates the four dimensions down to two. Then when you query the original table, QUERY REWRITE will know to fetch from the MV because it is faster.
If a dimension is multi-level and you are not reporting at the bottom level, create a fast refresh MV that aggregates up to the level at which you are reporting.
If you are filtering on one of the dimensions (not displaying EVERY row), use Bitmap indexes or partitioning on the fact table foreign keys to reduce your scans.
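To make the MV suggestion above concrete, something along these lines is what I mean (table, column, and MV names are invented for illustration; adapt them to your own fact and dimension keys):

CREATE MATERIALIZED VIEW LOG ON sales_fact
  WITH ROWID (month_key, product_key, sales_amt, sales_qty)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW sales_2dim_mv
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
AS
SELECT month_key,
       product_key,
       COUNT(*)         AS row_cnt,       -- COUNT columns keep the MV fast-refreshable
       COUNT(sales_amt) AS sales_amt_cnt,
       COUNT(sales_qty) AS sales_qty_cnt,
       SUM(sales_amt)   AS sales_amt,
       SUM(sales_qty)   AS sales_qty
  FROM sales_fact
 GROUP BY month_key, product_key;

-- A bitmap index on a low-cardinality foreign key helps when you filter on it.
CREATE BITMAP INDEX sales_fact_region_bix ON sales_fact (region_key);

With QUERY_REWRITE_ENABLED set to TRUE, a query against sales_fact that groups by month_key and product_key should be rewritten to hit the much smaller MV.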
Ross Leishman
Re: Changing Query Approach [message #288509 is a reply to message #288508]
Mon, 17 December 2007 20:57
mrk_anand
Messages: 7 Registered: December 2007
Junior Member
Thanks for your suggestion, Ross. You are right, I am trying to PIVOT the result set. The report fails because of the huge result set. In fact, in the pivoted result set of 700,000 rows around 300,000 are factless facts, but I need to show those in the report. I am not worried about performance right now; however, I would like to know whether a result set of this size can be fetched in a report.
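Roughly, the pivot I have in mind looks like this (table and column names are simplified; the real query has one column per measure per month, which is how I get to 120 measures):

SELECT product_key,
       region_key,
       SUM(CASE WHEN month_key = 200711 THEN sales_amt END) AS sales_amt_200711,
       SUM(CASE WHEN month_key = 200712 THEN sales_amt END) AS sales_amt_200712
       -- ...and so on, one column per month and measure
  FROM sales_fact
 GROUP BY product_key, region_key;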
Re: Changing Query Approach [message #288515 is a reply to message #288511]
Mon, 17 December 2007 22:02
mrk_anand
Messages: 7 Registered: December 2007
Junior Member
No filters. All 700,000 rows are leaf-level rows, and I need to aggregate. I can achieve the aggregation through the built-in hierarchies, but right now the report returns only a partial result set. Please advise whether the approach is correct; I am unsure whether increasing the table from 10 measures to 120 measures will slow down query performance.
Re: Changing Query Approach [message #288813 is a reply to message #288463]
Tue, 18 December 2007 15:22
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
Senior Member
At the risk of sticking my neck out, here are some REALLY BAD rules of thumb:
1) a reduction of rows by one order of magnitude generally results in much better performance, anywhere between two and ten times, and is thus often a good approach to performance tuning of warehouses.
2) a reduction of row width by one order of magnitude generally reduces query time by 50%.
3) conversely, an increase in row width by one order of magnitude will in general cause query response time to double.
OK, don't beat me up here, I am just offering my experience in tuning Oracle over the years. They are just rules of thumb, but often applicable to people's efforts. As you can see, mrk_anand is talking about reducing row counts by one order of magnitude and increasing row width by one order of magnitude.
But you should keep in mind that different designs mean different queries, and if you plan on using query tools like BUSINESS OBJECTS or similar, then you may find that some queries become less difficult or more difficult to coerce out of the tool, depending upon what data model you choose.
OK, let me have it..
Kevin
Re: Changing Query Approach [message #288830 is a reply to message #288813]
Tue, 18 December 2007 21:19
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
Rules 2 and 3 seem fairly logical; however, I don't see the logic in #1. I haven't benchmarked it (and to tell the truth, I'm not excited enough about the prospect to go through the pain), but I can't see how reducing rows produces a disproportionately better result.
If your query contained sorts or hash joins, then it might make sense. Sorting N rows instead of M (where N < M) is theoretically (M log M)/(N log N) times faster (although it's nowhere near that simple because of memory management). A hash join of two row sources too large to fit into memory would also be disproportionately faster if one or both row sources were reduced. Once again, this is a result of memory management.
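To put rough numbers on it with the OP's figures (and ignoring memory effects entirely): a sort of 8.4 million rows versus 0.7 million is roughly (8.4M x log 8.4M) / (0.7M x log 0.7M), which works out to about 14 times the work for a 12-fold reduction in rows. So even for a pure sort, the gain is only slightly better than linear.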
Other than that, I cannot understand why reducing the rows processed would produce a disproportionate reduction in runtime.
Am I missing something?
Ross Leishman
Re: Changing Query Approach [message #289075 is a reply to message #288463]
Wed, 19 December 2007 08:08
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
Senior Member
Well... I did say it was a RULE OF THUMB, not a hard fact. However, my experience with warehouses I have designed and built over the last three years tells me that, for just the reasons you have mentioned, an order-of-magnitude reduction in rows generally does produce these gains.
Like anything, YOUR MILEAGE MAY VARY, but I think it is still a good rule, at least for what I have seen.
Warehouses I have helped create have common traits:
1) they use a Kimball design (more or less)
2) they do lots of aggregation either in queries or Materialized Views or home grown pre-aggregated tables
3) they do lots of sorting
4) queries tend to either use bitmap indexes, or favor full scans (FTS, full index scans, or partition scans) and hashing
5) partitioning never solved a performance problem for me (this surprises many people but I tell them that the best uses of partitioning are not performance related (they chew on that for a while)).
6) they do plenty of data pivoting or base their designs on a sets of business specific pivots
It is usually the choice of data pivots and the pre-aggregation of rows that make the warehouse go faster. Bitmap indexes work as described too, but there is a maintenance aspect that goes along with them which I still don't think I have gotten right yet.
Anyway, that is what I know about it. Thanks for your comments, got any more?
Kevin
Re: Changing Query Approach [message #289152 is a reply to message #289075]
Wed, 19 December 2007 21:23
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
Kevin Meade wrote on Thu, 20 December 2007 01:08: "Thanks for your comments, got any more?"
Much to the consternation of those around me and on this site, I've got bags of them.
If partitioning doesn't give you any performance improvement, then (and I do mean this with respect) it's possible you're doing it wrong.
I've seen a number of DW systems that date-partition based on the interface date of incoming data, e.g. the file from December 20 goes into the monthly December partition. The "advantage" of this method is that it is easy to roll back a failed load by truncating partitions. Nine times out of ten this is wrong, because the interface date is not a key determinant in queries. There is usually some other "business date" or "effective date" of a transaction that makes a much better partition key, because it allows partition pruning of queries that use it.
The other misuse of partitions by Kimball-ites is using range partitions with surrogate keys. When you store dates in a dimension and reference them with an integer surrogate key in your fact table, the fact table must be LIST partitioned, not RANGE partitioned. Despite protestations from the Oracle manual to the contrary, there are a number of situations where Oracle will partition-prune a join query (filtered on a join column) with list partitions where it will not do so with range partitions.
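For example, with a monthly surrogate key (table and column names made up to illustrate):

CREATE TABLE sales_fact
( month_key    NUMBER NOT NULL   -- surrogate key into the month dimension
, product_key  NUMBER NOT NULL
, region_key   NUMBER NOT NULL
, sales_amt    NUMBER
)
PARTITION BY LIST (month_key)
( PARTITION p_2007_11 VALUES (200711)
, PARTITION p_2007_12 VALUES (200712)
, PARTITION p_other   VALUES (DEFAULT)
);

A query that joins to the month dimension and filters on, say, calendar year can then prune to just the relevant list partitions.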
The other awesome partitioning party-trick you need to try out is parallel partition-wise joins with your larger dimension tables. See here for more.
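Roughly, that means partitioning the fact and the large dimension the same way on the join key, e.g. (again, names invented for illustration):

CREATE TABLE customer_dim
( customer_key  NUMBER PRIMARY KEY
, customer_name VARCHAR2(100)
)
PARTITION BY HASH (customer_key) PARTITIONS 16;

CREATE TABLE sales_fact_by_cust
( month_key    NUMBER NOT NULL
, customer_key NUMBER NOT NULL
, sales_amt    NUMBER
)
PARTITION BY HASH (customer_key) PARTITIONS 16;

-- With identical hash partitioning on the join key, a parallel join can be
-- done partition-to-partition (a full partition-wise join).
SELECT /*+ PARALLEL(f 4) PARALLEL(d 4) */
       d.customer_name, SUM(f.sales_amt)
  FROM sales_fact_by_cust f, customer_dim d
 WHERE d.customer_key = f.customer_key
 GROUP BY d.customer_name;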
I think I've gone off-topic about as far as I can go now. How's the OP's problem going?
Ross Leishman
Re: Changing Query Approach [message #289300 is a reply to message #288463]
Thu, 20 December 2007 10:46
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
Senior Member
Me? Doing it wrong? You've got to be kidding, right?
You can't hurt my feelings, Ross; I have seen too many of your posts not to respect your opinions and experience. Also, I would never claim to be an expert in using partitioning. I will look over what you have said and see how much I can take advantage of, thanks.
As for the OP, mrk_anand has not posted in three days; maybe he is on vacation?
Hey mrk_anand, how is it going with this one, you got an update for us?
Kevin