How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502113]
Mon, 04 April 2011 10:08
vini_au
Messages: 14 Registered: April 2011 Location: Australia
Junior Member
Hi All,
I have a challenge on my hands and have not been able to work out an effective way of addressing it. I have done a fair bit of research and could not find a solution that suits my problem.
I am running a fairly busy Oracle 10gR2 DB; one of the tables has about 120 columns and receives on average 1500 insertions per second. The table is partitioned on the more important of its two timestamp columns (the two timestamps hold different times).
Out of these 120 columns, about 15 need to be indexed. Two of the 15 are the timestamps, and at least one of these two timestamp columns is always in the where clause of the queries.
Now the challenge is that the queries we run can have any combination of the 13 other columns plus one timestamp. In reality the queries never have more than 7 or 8 columns in the where clause, but even with only 4 columns in the where clause we would still have the same problem.
So if I create one concatenated index on all these columns it will not be very efficient, because after the 4th or 5th column the ordering is no longer very useful and I believe the optimiser would simply not use the rest of the index. Queries that use the leading columns of the index in sequence work well, but if I need to query the 10th column then I have performance issues.
Now, if I create multiple single-column indexes, Oracle will have to work a lot harder to maintain all of them and that creates performance issues (I have tried it). Besides, with multiple single-column indexes the optimiser will do nested loops two or three times and hit only the first few columns of the where clause, so I think it ends up much the same as the long concatenated index.
What I am trying to do is exactly what a bitmap index would do; it would be very good if I could use the ANDing of bitmaps that a bitmap index allows. That way I could have N single-column indexes which the optimiser could pick from and serve the query with exactly the ones it needs. But unfortunately using bitmap indexes here is not an option given the large number of inserts this table gets.
I have been looking for alternatives. I have considered creating multiple shorter concatenated indexes, but this still would not address the issue, since many queries would still not be served properly and would take a very long time to complete.
What I had in mind would be some sort of multidimensional index; I am not even sure if such a thing exists. But essentially it would be some sort of index that could serve a query efficiently regardless of the fact that the where clause has the 1st, 3rd and last columns of the index.
So considering how widely used Oracle is and how many super large databases there are out there, this problem must be common and there must be an elegant solution for it.
Are you able to share some ideas with me?
Any light shed on this issue would be much appreciated.
Thanks
Vini
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502168 is a reply to message #502113]
Tue, 05 April 2011 02:02
John Watson
Messages: 8960 Registered: January 2010 Location: Global Village
Senior Member
Hi - your question shows that you know what you are talking about, but anyway, could partitioning help further?
For example, if the queries typically use a range predicate on date and an equality predicate on one other column, then: can you create range partitions on date and hash or list subpartitions on the other column, and make the partitions small enough that they reflect the granularity of the query? You could then remove those columns from the index and create a set of smaller local indexes on the remaining columns. Could use of local non-prefixed partitioned indexes remove another column from each index?
It would of course require a huge investment to design and test and migrate to a new partitioning structure, and it might be a disaster.
And an alternative could be to use multiple single column b*tree indexes. This was demonstrated on this board recently, see this post:
http://www.orafaq.com/forum/mv/msg/167413/495895/148813/#msg_495895
Those tests were with 11.2; I don't know if 10g can do the same dynamic conversion of b*trees to bitmaps.
Please forgive me if you've already been through these options.
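To illustrate the composite partitioning idea, here is a rough sketch under stated assumptions: the table, column and index names (EVENTS, EVENT_TS, DEVICE_ID, SEVERITY) are invented, and it assumes DEVICE_ID is a column that commonly appears with an equality predicate:

```sql
-- Hypothetical sketch: range-partition by day on the timestamp,
-- hash-subpartition on one commonly used equality column.
create table events
(
  event_ts   timestamp not null
, device_id  number    not null
, severity   number    not null
  -- ... the remaining columns ...
)
partition by range (event_ts)
subpartition by hash (device_id) subpartitions 16
(
  partition p20110404 values less than (timestamp '2011-04-05 00:00:00')
, partition p20110405 values less than (timestamp '2011-04-06 00:00:00')
);

-- Local indexes on the remaining filter columns; EVENT_TS and DEVICE_ID
-- may no longer need to appear in them, since partition pruning covers both.
create index events_sev_ix on events (severity) local;
```

Queries that supply both the timestamp range and DEVICE_ID would then prune down to a handful of small subpartitions before any index is probed.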
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502389 is a reply to message #502113]
Wed, 06 April 2011 03:31
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
Senior Member
A tough one for sure. Yes on the surface it sounds almost exactly like what BITMAP indexing was made for. But then you throw in that 15000 inserts per second and poof no go.
I can't say I have the answer, but here is one idea. I am going to take your 15000 inserts/second to also mean that you don't update data already in the table.
To that end, the following might be an option. Try splitting the table into two parts. You said you already partition, so this would just be an extension of that idea. Here is a sample of what I mean.
We are going to exploit an old Oracle 7 technique. Yes, just because it's old don't mean it ain't useful: the manually partitioned view...
create table old_data
(
c01 integer not null
, c02 integer not null
, c03 integer not null
, c04 integer not null
, c05 integer not null
, c06 integer not null
, c07 integer not null
, c08 integer not null
, c09 integer not null
, c10 integer not null
, c11 integer not null
, c12 integer not null
, c13 integer not null
, c14 integer not null
, c15 integer not null
)
/
create table new_data
(
c01 integer not null
, c02 integer not null
, c03 integer not null
, c04 integer not null
, c05 integer not null
, c06 integer not null
, c07 integer not null
, c08 integer not null
, c09 integer not null
, c10 integer not null
, c11 integer not null
, c12 integer not null
, c13 integer not null
, c14 integer not null
, c15 integer not null
)
/
create or replace view all_data
as
select * from old_data union all
select * from new_data
/
create bitmap index old_data_bm01 on old_data (c01);
create bitmap index old_data_bm02 on old_data (c02);
create bitmap index old_data_bm03 on old_data (c03);
create bitmap index old_data_bm04 on old_data (c04);
create bitmap index old_data_bm05 on old_data (c05);
create bitmap index old_data_bm06 on old_data (c06);
create bitmap index old_data_bm07 on old_data (c07);
create bitmap index old_data_bm08 on old_data (c08);
create bitmap index old_data_bm09 on old_data (c09);
create bitmap index old_data_bm10 on old_data (c10);
create bitmap index old_data_bm11 on old_data (c11);
create bitmap index old_data_bm12 on old_data (c12);
create bitmap index old_data_bm13 on old_data (c13);
create bitmap index old_data_bm14 on old_data (c14);
create bitmap index old_data_bm15 on old_data (c15);
create index new_data_i01 on new_data(c01);
delete from plan_table;
explain plan for
select *
from all_data
where c01 > 1 and c02 = 2 and c03 = 3
/
select * from table(dbms_xplan.display);
So above we see two tables. These are actually a physical split of one table. OLD_DATA is, as it sounds, all the data that you start with (let's say at the beginning of a day), and NEW_DATA is all the data you are going to add to the table during some period of time, let us say today.
We are assuming that OLD_DATA is static. There are no UPDATEs or DELETEs against it, so this data might benefit from BITMAP indexes. We are assuming also that all inserts go to NEW_DATA, and therefore you cannot use BITMAP indexes there.
This split gives us two things: 1) it puts the majority of our data into OLD_DATA and lets us index it using BITMAP technology; 2) the portion of the data we cannot index with BITMAPs we index using traditional indexes, but this part contains only a small number of rows compared to OLD_DATA and thus requires correspondingly less work.
What is left is to see that we can get a query to go after both parts efficiently, using BITMAP indexes on OLD_DATA and whatever our best scenario is on NEW_DATA, which in this case may simply be the one date column. The view above achieves this.
Here is the query plan then.
SQL> delete from plan_table;
8 rows deleted.
SQL>
SQL> explain plan for
2 select *
3 from all_data
4 where c01 > 1 and c02 = 2 and c03 = 3
5 /
Explained.
SQL>
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 390 | 3 |
| 1 | VIEW | ALL_DATA | 2 | 390 | 3 |
| 2 | UNION-ALL | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | OLD_DATA | 1 | 195 | 2 |
| 4 | BITMAP CONVERSION TO ROWIDS| | | | |
|* 5 | BITMAP INDEX SINGLE VALUE | OLD_DATA_BM02 | | | |
|* 6 | TABLE ACCESS BY INDEX ROWID | NEW_DATA | 1 | 195 | 1 |
|* 7 | INDEX RANGE SCAN | NEW_DATA_I01 | 1 | | 2 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("OLD_DATA"."C01">1 AND "OLD_DATA"."C03"=3)
5 - access("OLD_DATA"."C02"=2)
6 - filter("NEW_DATA"."C02"=2 AND "NEW_DATA"."C03"=3)
7 - access("NEW_DATA"."C01">1)
Note: cpu costing is off
23 rows selected.
At least on the surface, this holds some promise. We can clearly get the two types of accesses we want in one plan, BITMAP against our static data and whatever we can make happen on our non-static data. Of course there are lots of assumptions made here and it is not perfect.
1) we assume that there are not a lot of updates/deletes happening against the data in OLD_DATA.
2) we assume that OLD_DATA contain 99% of all your data and that NEW_DATA is 1% or less of overall rows.
3) we assume some kind of partition management will allow you to migrate NEW_DATA to OLD_DATA at the end of the day such that there is not significant amount of work for this, and such that it does not destroy statistics and other items that make the whole thing usable. In 11gR2 this seems workable to me.
4) it is not clear how much you will get out of the BITMAP indexes, as I am not able to do any volume testing. You will have to load some realistic rowsets, gather stats on them, and make sure that for the equality predicates in the query the bitmap indexes get used in combination as they should. We did not see that actually happen here, and I don't know whether that is because of some limit related to the union-all view, or because there are no stats, so you will have to figure that out.
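On assumption 3, the end-of-day migration might be sketched with a partition exchange. This is only a hedged outline: it assumes OLD_DATA is range partitioned by day, that NEW_DATA matches its column layout exactly, and that the bitmap indexes are local:

```sql
-- A partition exchange is a data-dictionary operation; the rows
-- themselves are not rewritten, so this should be quick even for
-- a large NEW_DATA.
alter table old_data add partition p_today
  values less than (timestamp '2011-04-07 00:00:00');

alter table old_data exchange partition p_today with table new_data;

-- The exchange can leave index partitions unusable, so rebuild the
-- local bitmap index partitions for the newly swapped-in day only.
alter index old_data_bm01 rebuild partition p_today;
```

After the exchange NEW_DATA is empty again and ready to receive today's inserts, while yesterday's rows sit in OLD_DATA under the bitmap indexes.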
It is just an idea, and I wish you luck. Please let us know where you end up.
Kevin
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502462 is a reply to message #502389]
Wed, 06 April 2011 08:35
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
You can still use a BITMAP access path without bitmap indexes. I'm not sure how efficient it will be, but create a bunch of single-column indexes on the columns that are fairly selective (DON'T index the non-selective ones, like Male/Female, Yes/No). Then - in your query - use the INDEX_COMBINE hint. Oracle will perform range scans of the indexes it thinks are useful to the query, convert the ROWIDs to bitmaps, combine the bitmaps, convert back to ROWIDs and return the results. I don't know how effective this will be in practice, because if it accidentally scans a non-selective index you will pay dearly - unlike BITMAP indexes, which are tolerant of this because of their compression.
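A minimal sketch of that hint, with invented table and column names (BIG_TABLE, EVENT_TS, C02, C03), assuming single-column b*tree indexes already exist on C02 and C03:

```sql
-- INDEX_COMBINE asks the optimiser to combine b*tree indexes via
-- bitmap conversion: range-scan each useful index, convert the
-- ROWIDs to bitmaps, AND the bitmaps together, then convert back
-- to ROWIDs for the table lookups.
select /*+ index_combine(t) */ *
from   big_table t
where  t.event_ts >= timestamp '2011-04-06 00:00:00'
and    t.c02 = 2
and    t.c03 = 3;
```

The plan, if the hint takes effect, should show BITMAP CONVERSION FROM ROWIDS steps over the b*tree indexes followed by a BITMAP AND.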
You say every query uses one of the timestamps: consider how many rows it would return if it ONLY scanned on the timestamp - the columns you need to index are the ones that get you to one-tenth of that volume. i.e. If the timestamp scan returns 100K rows, what other filter predicates are required to get it to 10K? Index those columns. If all of the other filters combined don't get you to that one-tenth threshold, then there is probably nothing you can do; the time you save reducing the volume of table data to lookup is eaten up by index scans.
Also, are those INSERTs coming from multiple sessions? If they are coming from individual users with many sessions, then it is true that BITMAP indexes are inappropriate. However, it is a fallacy that you *cannot* use bitmap indexes on a volatile table. If all of the activity is coming from a single session, and that single session is applying the changes in set-based statements (INSERT INTO ... SELECT FROM, or MERGE), then the BITMAP index maintenance is quite efficient. If - for example - there was a process picking the transactions off a queue and applying them to the table, and there was flexibility to batch these inserts into (say) a temporary table and then insert en masse with parallel DML, then you could reasonably get away with BITMAP indexes.
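That batching idea might look something like the following sketch; the staging table and names are hypothetical, and the queue-draining process that fills EVENTS_STAGING is assumed to exist:

```sql
-- Apply a whole batch in one set-based statement, so bitmap index
-- maintenance happens once per batch rather than once per row.
alter session enable parallel dml;

insert /*+ append parallel(events, 4) */ into events
select * from events_staging;

commit;

-- Clear the staging table for the next batch.
truncate table events_staging;
```

The trade-off is latency: rows are only visible to queries once their batch commits, so the batch interval has to fit the "near real-time" monitoring requirement.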
Ross Leishman
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502630 is a reply to message #502462]
Thu, 07 April 2011 11:50
vini_au
Messages: 14 Registered: April 2011 Location: Australia
Junior Member
Quote:You can still use a BITMAP access path without bitmap indexes. I'm not sure how efficient it will be, but create a bunch of single-column indexes on the columns that are fairly selective (DONT index the non-selective ones, like Male/Female, Yes/No).
After I read your message I was very hopeful that this would solve my problem. I already had the single-column indexes; I just didn't know I could use this index_combine access path. Unfortunately the results did not meet my expectations: after a few tests with the hint I got the bitmap conversion from rowids, but the cost was much higher. One interesting thing was that it was not using all the single-column indexes, so I included about 5 in the hint and the query's cost sky-rocketed.
Another interesting thing was that the optimiser was favouring indexes with a very high number of unique values. Not very selective as opposed to the selective ones.
In fact often the optimiser picks only one btree index with relatively low selectivity over the timestamp column. This leads me to the point you made below where the data needs to be narrowed down.
Quote:You say every query uses one of the timestamps: consider how many rows it would return if it ONLY scanned on the timestamp - the columns you need to index are the ones that get you to one-tenth of that volume. i.e. If the timestamp scan returns 100K rows, what other filter predicates are required to get it to 10K? Index those columns.
I guess the problem is exactly that, the optimiser is not reducing the amount of data by using other indexes. My understanding is that the optimiser will find the partitions it needs to scan, then it should get the timestamp and at least two more indexes to reduce the data to one-tenth as you suggested. If you think about it, it would be a natural process of narrowing it down to a small chunk that can then be scanned if there are other columns that still need to be matched against the where clause.
Quote:Also, are those INSERTs coming from multiple sessions? If they coming from individual users with many sessions, then it is true that BITMAP indexes are inappropriate. However it is a fallacy that you *cannot* use bitmap indexes on a volatile table.
The insertions are coming from multiple sessions but they belong to the same user. There are only about 2 to 4 sessions inserting data into this table at any given time and they are done in batches.
I have created a couple of bitmap indexes to see what would happen and so far it has not made much difference, either in terms of performance or load. The execution plan using the bitmap indexes is still sub-optimal, though once I created the first single-column bitmap index the optimiser started doing bitmap conversion on the existing btree indexes without the hint, which was interesting. Unfortunately the performance was still not acceptable.
Quote:If all of the activity is coming from a single session, and that single session is applying the changes in set based statements (INSERT INTO ... SELECT FROM, or MERGE) then the BITMAP index maintenance is quite efficient. If - for example - there was a process picking the transactions off a queue and applying them to the table, and there was flexibility to batch these inserts into (say) a temporary table and then insert en-masse with parallel DML, then you could reasonably get away with BITMAP indexes.
I am still not quite sure about whether or not bitmap indexes could be used here and whether or not they will provide any benefit. I used to think that bitmap indexes were not viable anywhere where there is a very high number of inserts but it seems not to be the case. Inserts is all I have, once the data is there it is only read until the partition is deleted.
One interesting thing that I have to mention is that depending on which columns the where clauses are on, I can get lightning-fast results using my existing concatenated indexes.
As I said before, I have two timestamp columns. There are also two slightly different concatenated indexes which are about 17 or so columns long and equally inefficient. Each of these indexes has a timestamp column as its leading column.
Now if I query the table and my query has where clauses on the 1st, 5th and 13th columns of the index, I get an exceptionally fast response. This is something that I have not been able to understand.
Correct me if I am wrong but when you have a concatenated index, it will be used only up to a point where the columns in the where clause match its leading columns contiguously. So if the where clause has 1st, 2nd, 3rd and 6th - the index will be used up until the 3rd column.
If what I said above is correct then my query should only take advantage of the first column of the index, and therefore it should not have been as efficient as it was.
I hope my points make sense, and apologies for not posting the explain plans but my remote access decided not to work just when I needed it.
[Updated on: Thu, 07 April 2011 11:53]
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502648 is a reply to message #502642]
Thu, 07 April 2011 17:58
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
I think you might have your terminology mixed up. A "selective" index is one that allows you to identify just a few matching rows. For example, a Primary Key is highly selective when used in an equals clause - it returns 0 or 1 row.
So when I say to only scan on selective columns, I mean those that have the fewest possible matches to your WHERE predicates.
If you scan large ranges of non-selective indexes, you will incur a massive IO overhead because they are not compressed like Bitmap indexes.
If you were to persist with this method, you need to be using single-column indexes. Concatenated indexes with columns you don't use waste space and increase IO. Especially if there are 16 extra columns you are not scanning on.
Regarding the surprisingly good performance of the 17-column index, there are two factors at play in this type of index. What you were talking about is the columns of the index that are used in the INDEX SCAN. Yes, only the leading subset of columns can be used in an index scan (let's ignore SKIP SCAN here because it doesn't apply). However, if you are just scanning on the 1st column, you can still FILTER on the 5th and 13th columns. You will still need to read both matching and non-matching values of those index columns, but you will not be looking up the table for the non-matching rows. If you DIDN'T have those columns in the index, Oracle would have to retrieve every row from the table that matched column 1, just to discard heaps of them because they do not match column 5 or 13.
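The access-versus-filter distinction shows up directly in the plan's predicate section. A hypothetical illustration (table and column names invented, assuming a concatenated index whose 1st, 5th and 13th columns are EVENT_TS, C05 and C13):

```sql
-- The range scan is driven by the leading column (the access
-- predicate); the trailing columns are checked against the index
-- entries themselves (filter predicates), so non-matching rows are
-- discarded before any table block is read.
explain plan for
select *
from   events
where  event_ts >= timestamp '2011-04-06 00:00:00'
and    c05 = 5
and    c13 = 13;

select * from table(dbms_xplan.display);

-- The predicate section should show something like:
--   access("EVENT_TS">=...)         <- limits how much index is read
--   filter("C05"=5 AND "C13"=13)    <- applied to index entries, not table rows
```

That filtering inside the index is what can make a wide concatenated index surprisingly fast even when only the leading column is scannable.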
You mention that you have some expectations of performance. How many rows are in the table, how many in each partition, how many will a typical query return, and how long do you expect this to take?
Ross Leishman
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502665 is a reply to message #502642]
Thu, 07 April 2011 23:30
vini_au
Messages: 14 Registered: April 2011 Location: Australia
Junior Member
Quote:I mention this because it appears you must live under many restrictions with this table and thus cannot do much to it.
I do indeed have many restrictions, we use a commercial system and so we cannot deviate too much from its design.
Quote:You would achieve a copy using one of any number of ASYNCHRONOUS REPLICATION methods. These methods all read the redo logs so your database would need to be in archive log mode. The two products that come to mind to me are STREAMS (FREE with Enterprise addition of Oracle), and GOLDEN GATE (expensive but also from Oracle). These tools can generally keep your replicated data up-to-date in "NEAR REAL TIME" if that matters. The work by reading your log files and applying transactions based on whatever rules you set.
This would be the ideal scenario that I want to have to replicate my DB in near real-time, just so I have an on-line backup of it.
For this exercise having a replica of the DB where I could make changes to the table wouldn't work because we use the same system to insert and read the data from the DB and it is indeed a very complex system.
Quote:Also, I am curious as to what kind of system you have that does 15000 inserts per second.
You misread my initial post; it is actually 1,500 insertions per second on average, but it fluctuates during the day and sometimes goes up to 2,500 insertions per second. Likewise at night it normally falls below 1,000.
The system that I run is a SIEM (http://en.wikipedia.org/wiki/SIEM) - a pretty large and complex system which receives/collects logs from many network devices like firewalls, IDS/IPS, Windows, Linux and so on. The amount of data sent to it is huge; all the logs are parsed and normalised before they are put into the database. The system is used for doing all sorts of monitoring and correlation in near real time.
Quote:Though it is certainly possible, adding fifteen to twenty indexes to this table will make insert take between two and ten times faster that the table with no indexes so you need to keep an eye on your insert rate.
I think you meant to say that adding the indexes will make the inserts slower than with no indexes, since when you don't have to maintain indexes you have less work to do. Correct?
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502669 is a reply to message #502648]
Fri, 08 April 2011 00:04
vini_au
Messages: 14 Registered: April 2011 Location: Australia
Junior Member
Quote:I think you might have your terminology mixed up. A "selective" index is one that allows you to identify just a few matching rows. For example, a Primary Key is highly selective when used in an equals clause - it returns 0 or 1 row.
Thanks for pointing this out; I actually did not write it properly - probably because I was very tired when I was typing. Apologies if anything else didn't quite make sense.
What I meant to say was that the optimiser was not favouring selective indexes and instead it was using indexes with a very high number of unique values, non-selective indexes.
Quote:If you scan large ranges of non-selective indexes, you will incur a massive IO overhead because they are not compressed like Bitmap indexes.
From what I understand, compression in a btree index stores each distinct key prefix once per leaf block, with references to the entries where it occurs, so the same value does not have to be stored several times. This reduces the size of the index and reduces I/O.
Ok so, what if these non-selective btree indexes are compressed? Will it make a significant difference in the I/O overhead?
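For reference, key compression is enabled per index with a prefix length. A hedged sketch with invented names:

```sql
-- COMPRESS n stores each distinct n-column key prefix once per leaf
-- block. It is most worth trying on the non-selective columns, where
-- prefixes repeat often; the space (and hence I/O) saving depends
-- entirely on how often they repeat.
create index events_c05_ix on events (c05) compress 1;

-- An existing index can also be rebuilt compressed:
alter index events_c05_ix rebuild compress 1;
```

Whether this changes the optimiser's choices is a separate question; it shrinks the index but does not alter its selectivity.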
Quote:If you were to persist with this method, you need to be using single-column indexes. Concatenated indexes with columns you don't use waste space and increase IO. Especially if there are 16 extra columns you are not scanning on.
I agree, that is why the concatenated indexes haven't worked well. Our queries are ad-hoc queries with predicates based on combination of several columns and if we don't know in advance which combinations then we cannot create concatenated indexes to serve them well.
The other problem is that with multiple single column indexes I am still not able to get the results I thought I would since the cost is sky-rocketing when the optimiser uses multiple single column indexes.
Quote:Regarding the surprisingly good performance of the 17-column index, there are two factors at play in this type of index. What you were talking about is the columns of the index that are used in the INDEX SCAN. Yes, only the leading subset of columns can be used in an index scan (let's ignore SKIP SCAN here because it doesn't apply).
Thanks for clarifying this!
Quote:However, if you are just scanning on the 1st column, you can still FILTER on the 5th and 13th columns. You will still need to read both matching and non-matching values of those index columns, but you will not be looking up the table for the non-matching rows. If you DIDN'T have those columns in the index, Oracle would have to retrieve every row from the table that matched column 1, just to discard heaps of them because they do not match column 5 or 13.
I didn't know Oracle would do that; as you can tell by now, there is a lot that I do not know. So having those extra trailing columns in an index is not a total waste, since if Oracle had to go to the table it would have to do a lot more work and things would be far worse. Correct?
Quote:You mention that you have some expectiations of performance. How many rows are the table, how many in each partition, how many will a typical query return, and how long do you expect this to take?
Number of rows in the table right now is 787,628,140 - this is about 8 days of data or if you prefer 8 partitions since the data is partitioned by day. The partition rows range between 90M to 100M.
A typical query will return from 10 rows to 2,000 rows and could span from 5 minutes to 4 hours. Sometimes we run ad-hoc queries over a day or two of the data but even in these cases we wouldn't expect to get much more than 2,000 rows since it wouldn't make sense to show so many rows on the screen.
As for how fast I expect it to be, well in my test case if I formulate the query correctly I get the results in about 15-20 seconds, this test in particular spanned 4 days and returned about 100 rows.
Re: How to index 15 columns when you cannot predict what columns will be used in a where clause? [message #502670 is a reply to message #502669]
Fri, 08 April 2011 00:20
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
That's not an unreasonable expectation.
Let's work through a test case:
- Pick one of your queries that is under-performing
- Is the partition key used as a filter predicate? Show us the WHERE clause.
- How many rows would be selected by the date predicate alone?
- How many other predicates in the query would - on their own - have similar or better selectiveness to the date predicate. eg. If the date predicate matches 10% of the table, what other filter predicates match 10% or less of the table. NOTE THAT THIS IS NOT THE SAME AS PREDICATES THAT ARE ONLY SELECTIVE WHEN COMBINED WITH THE DATE PREDICATE.
- Excluding all of the other non-selective predicates, how many rows match just the date predicate and the selective predicates described above?
Ross Leishman