Best practice for querying weighted sparse data [message #168691]
Fri, 21 April 2006 11:22
kfritsche
Messages: 3 Registered: April 2006
Junior Member
I am trying to figure out the best way to efficiently query a subset of rows based on a sparsely populated column.
For instance, consider a table with 100 million rows and a Status column whose values and respective counts are as follows:
Pending: 100
Active: 10,000
Completed: 99,989,900
For reporting purposes I don't want to archive the "Completed" rows; I want them in the same table. However, I don't want to search through them when looking for a particular "Active" record.
I have tried putting the "Completed" records in their own partition, but this seems too heavyweight for OLTP and seems to require regular statistics gathering to work effectively.
In the past I have created shadow tables, populated via triggers, that store only "Pending" and "Active" records; when a record changes to "Completed", it is deleted from the shadow table.
I also tried compressed indexes, to no effect.
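For what it's worth, the standard trick for this shape of data is an index that simply omits the overwhelming "Completed" majority, so the index stays tiny regardless of table size. (In Oracle this is usually done with a function-based index that maps "Completed" to NULL, since all-NULL keys are not stored.) The sketch below uses SQLite's partial-index syntax purely to illustrate the idea; the table and column names are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

# Mostly-completed data: the rare rows are the ones we need to find fast.
rows = [(i, "Completed") for i in range(100000)]
rows += [(100000, "Active"), (100001, "Pending")]
con.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Partial index: 'Completed' rows never enter it, so it stays tiny
# no matter how large the table grows.
con.execute("""CREATE INDEX idx_open ON orders (status)
               WHERE status <> 'Completed'""")

active = con.execute(
    "SELECT id FROM orders WHERE status = 'Active'").fetchall()
print(active)  # → [(100000,)]
```

The index holds only the 2 open rows out of 100,002, which is exactly the shadow-table effect without the triggers.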
This seems to be a common problem, based on how often I have seen it come up.
So what does everyone else do?
Thanks
Re: Best practice for querying weighted sparse data [message #168694 is a reply to message #168691]
Fri, 21 April 2006 12:10
markmal
Messages: 113 Registered: April 2006 Location: Toronto, Canada
Senior Member
Honestly, running heavy reports on an OLTP database is not desirable.
You may consider moving your data from the OLTP DB into a separate reporting DB, i.e. a data warehouse. I say "data warehouse" because you mentioned the historical aspect of your reporting.
You can transform the data as you load it into the warehouse to make it easier to report on and analyze: pre-join tables, denormalize lookup tables (dimensions), cleanse dirty data, etc.
After the historical data has been loaded into the DW, you can purge the oldest portion from the operational OLTP DB (depending on your OLTP data retention period).
In the DW you can create bitmap indexes on low-cardinality columns. They help a lot for reporting, but are not as good a fit for OLTP, since bitmap index maintenance effectively locks many rows at once.
About partitioning: most reports are constrained to a date range by the reporting period.
I would not partition the table by Status alone. I would rather partition by row creation date, and *probably* sub-partition by Status. This gives the optimizer a chance to effectively prune uninteresting partitions based on your search conditions.
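A toy illustration of the pruning idea (hypothetical table names, with one SQLite table per month standing in for Oracle's range partitions): a date-constrained query only has to touch the segments that overlap the range, which is what partition pruning does for you automatically.

```python
import sqlite3
from datetime import date

con = sqlite3.connect(":memory:")

# Simulate range partitions with one physical table per month.
months = ["2006_02", "2006_03", "2006_04"]
for m in months:
    con.execute(f"CREATE TABLE orders_{m} (id INTEGER, created TEXT, status TEXT)")

con.execute("INSERT INTO orders_2006_03 VALUES (1, '2006-03-15', 'Active')")
con.execute("INSERT INTO orders_2006_04 VALUES (2, '2006-04-02', 'Pending')")

def query_range(start: date, end: date):
    """Touch only the partitions overlapping [start, end] -- manual
    partition pruning, done by hand for illustration."""
    hits = []
    for m in months:
        y, mo = map(int, m.split("_"))
        first = date(y, mo, 1)                       # partition lower bound
        last = date(y + mo // 12, mo % 12 + 1, 1)    # exclusive upper bound
        if first <= end and start < last:            # skip non-overlapping segments
            hits += con.execute(
                f"SELECT id FROM orders_{m} WHERE created BETWEEN ? AND ?",
                (start.isoformat(), end.isoformat())).fetchall()
    return hits

print(query_range(date(2006, 4, 1), date(2006, 4, 30)))  # → [(2,)]
```

With a date/Status composite scheme, a query for this month's "Active" rows would touch only one small sub-partition instead of the whole table.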