SQL question - most recent value for each group of data
Date: Fri, 29 Feb 2008 15:27:38 +1100
Message-ID: <B1C87DCFE2040D41B6F46ADF9F8E4D9C01098E36@CALBBEX01.cal.riotinto.org>
I have a case where I need to return the most recent record for each
group of data and I'm interested in how to do most efficiently.
Versions - Oracle 8.1.7.4 on Windows 2003R2 SP2
Example data:
id name value dts 1 Al 2.3 28-Feb-2008 8:33 1 Al 2.5 28-Feb-2008 7:21 1 Fe 3.2 27-Feb-2008 4:55 2 Al 3.1 27-Feb-2008 3:22 The required results are: id name val_at_max_dts res_count 1 Al 2.3 2 1 Fe 3.2 1 2 Al 3.1 1
That is, for id=1, the most recent date is the 8:33 so the value
returned is the 2.3.
The res_count is how many result records there were for that id and
name.
Below I've got 2 queries that give the correct result - I'm interested
in other options that may perform better.
To help, the vast majority will have res_count = 1 and the max value
I've come across is 9.
I've found the analytical query below often performs better, but in
testing with extreme numbers of duplicates, the query with the derived
table performed better.
The SQL I've got to reproduce this is below:
drop table str;
create table str ( id integer , name varchar2(40) , val real , dts date
) ;
create index str_idx1 ON str ( id , name ) ;
- add ~100,000 samples insert into str select rownum , 'Al' , o1.object_id , sysdate-2 from dba_objects o1 , dba_objects o2 where o1.object_id < 100 and o2.object_id < 1000;
- add 2nd + 3rd result to each insert into str select rownum , 'Fe' , o1.object_id , sysdate from dba_objects o1 , dba_objects o2 where o1.object_id < 100 and o2.object_id < 1000;
insert into str select rownum , 'pH' , 7 , sysdate
from dba_objects o1 , dba_objects o2
where o1.object_id < 100 and o2.object_id < 1000;
- every 100th to have a second Al insert into str select rownum * 100 , 'Al' , o1.object_id , sysdate -3 from dba_objects o1 where o1.object_id < 1000 ;
- every 1000th to have third Al insert into str select rownum * 1000 , 'Al' , o1.object_id , sysdate -2 from dba_objects o1 where o1.object_id < 100 ;
commit;
exec dbms_stats.gather_table_stats ( ownname => USER , tabname => 'STR' , cascade => TRUE) ;
set autotrace traceonly
SELECT str1.id , str1.name , str1.val AS val_at_max_dts , smmry.max_dts
, smmry.res_count AS
FROM str str1 ,
( SELECT str2.id , str2.name , MAX(str2.dts) AS max_dts , COUNT(*) as res_count
FROM str str2 GROUP BY str2.id , str2.name ) smmry WHERE str1.id = smmry.id
AND str1.name = smmry.name
AND str1.dts = smmry.max_dts
-- AND str1.id BETWEEN 1000 and 2100
ORDER BY
str1.id , str1.name
;
SELECT *
FROM
(SELECT str1.id , str1.name , str1.val AS val_at_max_dts , str1.dts ,
RANK() OVER (PARTITION BY str1.id , str1.name ORDER BY str1.dts DESC) AS rank_dts ,
ROW_NUMBER() OVER (PARTITION BY str1.id , str1.name ORDER BY
str1.dts ASC) AS res_count
FROM str str1
-- WHERE str1.id BETWEEN 1000 and 2100
ORDER BY
str1.id , str1.name
)
WHERE rank_dts = 1
;
In practise, there are 2 sets of tables Recent tables - table "s" with field id, table "t" as a child and the final child "r" with the value and date field. Repeated for another 3 tables with older data. The dts field is not indexed in the "r" tables.
Thanks,
Bruce Reardon
NOTICE
This e-mail and any attachments are private and confidential and may contain privileged information. If you are not an authorised recipient, the copying or distribution of this e-mail and any attachments is prohibited and you must not read, print or act in reliance on this e-mail or attachments.
This notice should not be removed.
-- http://www.freelists.org/webpage/oracle-lReceived on Thu Feb 28 2008 - 22:27:38 CST