Home » RDBMS Server » Performance Tuning » Options for optimising SQL running against Wide tables (11g)
Options for optimising SQL running against Wide tables [message #570642] |
Tue, 13 November 2012 07:33 |
|
jasontilbury
Messages: 4 Registered: November 2012
|
Junior Member |
|
|
Hello,
Apologies in advance if my Oracle speak is not what your used to I'm leading performance testing on the system here. The scale of the tests that generate the following scenario is not huge right now, only 50 users simulated (or you can think of them as independently running threads if you like). But here is the crunch, the queries generated (from generic transaction layer) are all running against a table that has 600 columns! We can't really control this right now, but this is causing masses amounts of IO (5GB per request) making requests queue for disk availability (which are setup RAID 0/1); its even noticable for as few as 3 threads. I have rendered the SQL on one occasion to execute in 13 seconds for a single user but this appears short lived as when stats were freshly gathered it went up to the normal 90-120 seconds. I've added the original query to the file, however the findings here along with our DBA (who I trust implicitly) suggest that no amount of editing the query will improve the response times, increasing the PGA/SGA (currently 4/6GB respsectively) will only delay the queuing for a bit and compression can help either. In short it looks as though we've hit hardware restrictions already for this particular scenario.
As I can't really explain how my rendered query no longer takes 13 seconds, it's niggling me that we might be missing a trick.So I was hoping for some guidance on possible ways of optimising these type of queries against such wide tables, in other words possibilities that we haven't considered...
Attached is the query and plan. let me know if anything else is needed.
-
Attachment: sqltrace.sql
(Size: 3.02KB, Downloaded 2092 times)
|
|
|
|
Re: Options for optimising SQL running against Wide tables [message #570661 is a reply to message #570642] |
Tue, 13 November 2012 20:31 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
Well if John is having a go at it, then I want to try too. In addition to what John has asked for, can you also provide some additional information:
1) what is PERCENT. This is not an Oracle provided function. Is this your own function? If so, please provide the code for it.
2) same for DIV.
3) you said the table has some 600 columns, but your query uses only 8. Did I miss something here? Is this the only query or is there a large class of queries like this just with different columns thus eventually providing a query that covers every one of the 600 columns. If so please provide some other examples, and counts of the number of variations. For example, something like "there are 50 different queries that look like this sampling of 3" (query samples ...).
4) what database are you running? Do you have access to an Oracle EXADATA instance? If so, have you tried HYBRID COLUMNAR COMPRESSION in order to group columns together and thus avoid all that I/O. Seems like EXADATA HCC was created just for this problem. Based on your query it would eliminate 99% of the I/O. Of course if you don't have EXADATA then is your boss willing to spend the vast sums to get it?
5) how many rows are in the table, and how many rows do you expect to be returned by a typical query?
6) what is the partitioning used?
Thanks, Kevin
|
|
|
|
|
|
Re: Options for optimising SQL running against Wide tables [message #570696 is a reply to message #570642] |
Wed, 14 November 2012 07:21 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
Thanks for that info.
Still waiting on what John asked for.
Given the rowcounts the function may not be your issue. Still, calling a function from SQL is expensive. You are calling three. You should inline the logic of the functions in the query to see what performance change occurs. Then you can know how to talk about that specific issue rather than guess about it.
The logic you have presented so far seems good to me. Your issue appears to be that you are reading entire rows when in fact you only want a small portion of each row, <5% of the row contributes to your result set of your typical queries most of the time by your description and I would wager <2% of the row most of the time. Thus as you have suggested, the most likey solution is to find a way not to read the entire row but instead only those columns you want. This of course requires some kind of re-architected storage solution. With that in mind there are a few alternative designs I can think of.
1) indexes. this won't work for you based on your description of the problem but it highlights the issue. Suppose for example you created an index on the table and this index contained all the columns your query needed. In this case you would never visit the table and thus never do the massive I/O. But since your queries are different, you would need an unknonwn number of indexes and that won't work.
2) vertical partitioning of the table. It is too bad that Oracle does not support native VERTICAL PARTITIONING with an appropriate storage clause, but there are ways to do this. Every one of them has some kind of implication and requires lots of changes somewhere (hence the value of native support if it was available).
A) The first way is via multiple tables. You create a table with the primary key and one or more columns you split off the main table. Then you create a view that joins these tables together. In 10g the feature JOIN ELIMINATION was added. It is enhanced in 11g. With this feature, and with the necessary meta-data, Oracle knows not to visit pieces of data and or tables that it does not need to answer a query even though the query itself looks like it requires them. Here is a link to one such example. Pay attention to the defninition of primary keys and the view that joins across the primary keys and the query plan that clearly does not visit both tables even though there is a join between them in the view used.
The success of this comes from being able to define the right set of sub-tables. If for example your queries came in two basic patterns, one set that used the first 300 columns and the other set of queries that used the second 300 columns then splitting your large table into two smaller tables each 300 columns wide would reduce the I/O in half since you always be visiting only half the table for each query. Now consider extension of the idea to 10 sub-table with 60 columns each or 100 sub-tables with 6 columns each. You figure out the optimum splits. Your goal should be an overall reduction in I/O of 90% or more.
B) a second way is via NESTED TABLES. You create a nested table type which again contains a subset of columns. Then your original table uses these NESTED TABLE definitions instead of the original columns. Each nested table is stored out of line. Thus if your query does not reference a column in some specific nested table, that nested table will not be visited. I think this is the 9i variation of join elimination before join elimination was available as an optimization feature. Otherwise, not sure why you would do this over A. You sill need to deal with the right set of column sub-sets.
C) CLUSTERING? I read a comment somewhere that suggested HASH CLUSTERS could do this and there was a link in the article, but the link was bad (figures) and I cannot concieve myself of how a hash cluster could solve this problem. I think the author was confusing horizontal partitioning with vertical partitioning. But I mention it in case someone else is smarter than I.
3) columns data storage. In this case we are talking EXADATA and its HYBRID COLUMNAR COMPRESSION. By storing data in a column orientation, EXADATA can visit only those "storage units" that that contain the columns your query wants and thus avoid all the I/O for columns you don't want. Add to this the massive compression potentials of HCC and you get a reduction of 99% of I/O for situations like yours.
So, until your file downloads, that is about it for me. Kevin
|
|
|
Re: Options for optimising SQL running against Wide tables [message #570725 is a reply to message #570696] |
Wed, 14 November 2012 19:48 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Your plan----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 167 | 5344 | 334K (1)| 01:17:58 | | |
| 1 | HASH GROUP BY | | 167 | 5344 | 334K (1)| 01:17:58 | | |
| 2 | PARTITION RANGE ITERATOR | | 235 | 7520 | 334K (1)| 01:17:58 | 3 | 6 |
| 3 | VIEW | | 235 | 7520 | 334K (1)| 01:17:58 | | |
| 4 | HASH GROUP BY | | 235 | 35250 | 334K (1)| 01:17:58 | | |
| 5 | TABLE ACCESS BY LOCAL INDEX ROWID| RNC_UCELL | 330K| 47M| 334K (1)| 01:17:57 | 3 | 6 |
|* 6 | INDEX RANGE SCAN | RNC_UCELL_PK | 165K| | 2875 (1)| 00:00:41 | 3 | 6 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("RNC"='ErU_RNC_2' AND "DATETIME">=TO_DATE(' 2012-07-30 07:39:00', 'syyyy-mm-dd hh24:mi:ss') AND
"DATETIME"<=TO_DATE(' 2012-08-02 17:44:59', 'syyyy-mm-dd hh24:mi:ss'))
filter("DATETIME"<=TO_DATE(' 2012-08-02 17:44:59', 'syyyy-mm-dd hh24:mi:ss') AND "DATETIME">=TO_DATE('
2012-07-30 07:39:00', 'syyyy-mm-dd hh24:mi:ss'))
Notice that the DATETIME filter appears in both the Access Predicates and the Filter Predicates.
This leads me to believe that the index RNC_UCELL_PK is on columns (RNC, ????, DATETIME)
In which case, the index is scanning ONLY on RNC, returning ALL matching rows, filtering on DATETIME, and then looking up the table. ie. It reads EVERY row with matching RNC regardless of DATETIME (notwithstanding the partition iterator). This insidious behaviour is detectable only by looking at the Filter Predicates. The row counts / estaimtes for the INDEX RANGE SCAN step are only those returned AFTER both scanning and filtering. There is no way of knowing how many rows were scanned before filtering occurred.
If the index included DATETIME as the second column (RNC, DATETIME [, ????] ) then you would see ONLY Access Predicates, no filter predicates.
You may argue that Oracle is showing the DATETIME in the Access Predicates, therefore it is scanning on that column. It's not. It is misleading behaviour in the optimizer that seems to have been there since Access/Filter Predicates were introduced to the plan table (v10 I think) and it is still present in v11.
With daily partitioning, you are probably not reading very many redundant rows (just those before the start time on the first day and those after the end-time on the last day), but it's worth a try.
Ross Leishman
|
|
|
|
Re: Options for optimising SQL running against Wide tables [message #570993 is a reply to message #570990] |
Mon, 19 November 2012 15:16 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
If you wanted to keep working on it, I would suggest parallelizing the query. Modern disk systems are designed to process many requests at once. As a result, individual queries appear to be IO bound but in fact there is plenty of headroom left on the disk.
If you tend NOT to have dozens of these queries running at once, then you might reasonably expect a 10x improvement from parallelizing a single query.
Since you are using an index range scan, effectively parallelizing the index scan is not immediately possible. Oracle can parallelize full table scans, fast full index scans, and index range scans of partitioned tables. Although your table is partitioned, it is not partitioned in a way that helps us because there are not enough partitions with data we want to read. For this reason, you would need to Hash Sub-Partition the table. Choose a column with an even spread of values - such as a unique primary key - and then sub-partition on that column. Keep the index locally partitioned. For parallelizing a range scan, I recommend 32 or 64 hash sub-partitions. 32 will probably be enough to max-out your disk.
Ross Leishman
|
|
|
Goto Forum:
Current Time: Mon Jan 20 08:36:47 CST 2025
|