Home » SQL & PL/SQL » SQL & PL/SQL » RANK: Selecting the sum of the top n highest values, only n values max
RANK: Selecting the sum of the top n highest values, only n values max [message #155597] |
Tue, 17 January 2006 10:22 |
arnematt
Messages: 24 Registered: November 2005 Location: Netherlands
|
Junior Member |
|
|
Assume the following data in table T
A| B
=====
A 20
B 21
C 22
D 19
E 19
F 19
I want to have a sum of the 4 highest values in the table. If there are more values that rank 1,2,3 or 4, just take the the sum of the 4 highest but no more than 4 in total.
So I would want to have a result like: 19+20+21+22 = 82
and not: 19+19+19+20+21+22 = ....
Would the data look like this:
A 20
B 21
C 21
D 22
E 19
F 19
Then I would want it to be 20+21+21+22 = 84.
both rank and dense rank do not rule out the possibility that more than 4 values end up with a rank <5 (e.g. 1,2,3,3,5)
How to do this... something with distinct?
Thanks...
PS I did a search on 'RANK' in this forum and havent found anything about this...
[Updated on: Tue, 17 January 2006 10:41] Report message to a moderator
|
|
|
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155600 is a reply to message #155597] |
Tue, 17 January 2006 10:50 |
arnematt
Messages: 24 Registered: November 2005 Location: Netherlands
|
Junior Member |
|
|
Hm..
I just found something, in a thread here:
select * from (select * from emp order by sal desc) where rownum <=5
however, is this efficient? It's going to sort the whole table...
plus, what is 'rownum'? Does that work?
There should be a more efficient way shouldnt there... a full table scan with an array of 5, something like that, should be done by the optimizer. Not a full sort of the table and then take only 5...
very inefficient isnt it.
thanks.
|
|
|
|
|
|
|
|
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155631 is a reply to message #155597] |
Tue, 17 January 2006 13:43 |
arnematt
Messages: 24 Registered: November 2005 Location: Netherlands
|
Junior Member |
|
|
but this select from, order by desc.... i mean, it is going to be browsing through about 500,000,000 rows... (although partitioned by date). And what is worse, I am going to have to calculate it for EACH individual article_number in that table, over a certain period. There are about 200000 articles... Locations * articles * weeks could be up to 1,000,000 calculations; for each article number I have to find the top 5 selling locations...
I hope you have some time to look at my alternative solution, involving an index-by table and a cursor:
Suppose the problem concerns a sales table with four fields:
SALES
----
ARTICLE (NUMBER) --> up to 200000 different articles
LOCATION (NUMBER) --> about 200 locations
SOLD_AMT (NUMBER)
WEEK_NO (NUMBER)
Instead I was thinking of the following approach:
run a cursor for loop over the table to give all location/articles for the relevant period(s). Order this by LOCATION. So Articles are passed in random order in the loop.
Fill an index-by table using the article number as an index.
On the index location I put a record. This record contains (amongst lots of other things) a VARRAY(5) with top-selling locations.
e.g.: sales(article).top5(1) will contain the highest amount for that article. Location that was involved is not relevant, just its total sales amt for this article.
Since the set is ordered by location, I keep adding to the total (for each article there is one index location) until the location changes. Then, if the location has changed, I do a check to see if the total_sales for that location is in the top 5 of that article. Remember this goes for each article.
Would that be faster? If you need more info, I'd be happy to share...
thanks again
[Updated on: Tue, 17 January 2006 13:48] Report message to a moderator
|
|
|
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155639 is a reply to message #155631] |
Tue, 17 January 2006 14:44 |
Art Metzer
Messages: 2480 Registered: December 2002
|
Senior Member |
|
|
Have you looked into the ROW_NUMBER() analytic function? In that call, you would partition by your article_number.
My general advice to you would be, don't reinvent the wheel. "If you can do it in SQL, do it." Only resort to PL/SQL if you absolutely have to (which I don't think you do).
Make sure your tables statistics have been recently analyzed.
Then benchmark the SQL solution, and the PL/SQL solution, if you so choose.
For more on the ROW_NUMBER analytic function, search AskTom.
|
|
|
|
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155643 is a reply to message #155640] |
Tue, 17 January 2006 15:40 |
Art Metzer
Messages: 2480 Registered: December 2002
|
Senior Member |
|
|
An example might be best.
Here's a query that shows the top five locations (one of nine possible two-letter codes) for each of twenty-six articles (single letter of the alphabet):SQL> CREATE TABLE t (
2 article_number VARCHAR2(1)
3 , location VARCHAR2(2)
4 , dt DATE
5 , sales NUMBER
6 )
7 /
Table created.
SQL> INSERT INTO t
2 SELECT CHR(TRUNC(DBMS_RANDOM.VALUE(0,26)) + 65)
3 , CHR(TRUNC(DBMS_RANDOM.VALUE(0,3)) + 65)
4 || CHR(TRUNC(DBMS_RANDOM.VALUE(0,3)) + 65)
5 , TO_DATE('197901','YYYYMM') + DBMS_RANDOM.VALUE(0,10000)
6 , TRUNC(DBMS_RANDOM.VALUE(0,10000)) * 1E-2
7 FROM sys.all_objects
8 WHERE ROWNUM <= 5000
9 /
5000 rows created.
SQL> SELECT b.article_number
2 , MAX(DECODE(b.r,1,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#1_LOCATION"
3 , MAX(DECODE(b.r,2,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#2_LOCATION"
4 , MAX(DECODE(b.r,3,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#3_LOCATION"
5 , MAX(DECODE(b.r,4,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#4_LOCATION"
6 , MAX(DECODE(b.r,5,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#5_LOCATION"
7 FROM (SELECT a.article_number
8 , a.location
9 , a.total_sales
10 , ROW_NUMBER()
11 OVER (PARTITION BY a.article_number
12 ORDER BY a.total_sales DESC) r
13 FROM (SELECT t.article_number
14 , t.location
15 , SUM(t.sales) total_sales
16 FROM t
17 GROUP BY t.article_number
18 , t.location) a) b
19 GROUP BY b.article_number
20 ORDER BY b.article_number
21 /
A #1_LOCATION #2_LOCATION #3_LOCATION #4_LOCATION #5_LOCATION
- ------------- ------------- ------------- ------------- -------------
A CB: 1243.01 AB: 1220.12 CC: 1136.65 BB: 1092.00 AC: 873.07
B BC: 1315.07 CB: 1210.55 AB: 1096.69 AC: 992.31 CC: 913.12
C BB: 1526.50 BA: 1463.79 BC: 1254.00 AA: 1221.40 CB: 1202.82
D CB: 1774.60 BB: 1713.99 AA: 1592.01 AB: 1323.78 CA: 977.92
E AC: 1240.18 BA: 1163.41 AA: 1019.80 CA: 939.67 BB: 856.59
F AB: 1337.53 AA: 1325.00 CC: 1199.50 BB: 1124.83 BC: 1064.63
G AA: 1838.99 BA: 1383.37 CC: 1214.45 BC: 1183.18 AB: 1161.77
H AB: 1790.31 BB: 1493.82 CB: 1387.30 BA: 1332.45 BC: 1320.80
I AA: 1844.60 AC: 1186.10 BA: 1122.40 AB: 976.72 CA: 925.43
J AA: 1149.16 CB: 1108.97 BB: 957.67 AB: 902.75 AC: 899.79
K BC: 1568.66 CC: 1358.95 BA: 1335.33 CB: 1215.88 AB: 1091.22
L BB: 1238.54 BC: 1217.45 BA: 1119.37 CA: 1088.63 CC: 1079.85
M BC: 1273.45 CA: 1163.99 AC: 1112.90 BA: 1102.88 CB: 1063.84
N BB: 1724.76 CB: 1360.67 CA: 1287.25 AB: 1267.32 BA: 1238.88
O CA: 1422.41 CB: 1386.23 AB: 1380.47 AA: 1372.09 BA: 1312.06
P AB: 1451.14 AA: 1333.88 BC: 1131.36 BA: 1062.11 CC: 1008.58
Q BA: 1355.69 CA: 1276.48 CB: 1266.73 BB: 1131.05 AC: 1021.95
R BA: 1400.46 AC: 1296.94 CB: 1113.09 AB: 1052.78 CC: 1011.71
S AA: 1446.47 BB: 1381.35 CA: 1265.83 CC: 1248.37 AB: 1172.45
T CA: 1681.45 CB: 1407.47 AC: 1280.56 AA: 1219.67 BC: 1164.08
U CB: 1346.14 AA: 1209.67 AC: 1209.38 BB: 1133.90 CC: 1037.00
V AC: 1251.18 CC: 1249.93 BC: 1148.45 BA: 1138.22 AA: 1085.44
W CA: 1268.80 BA: 1208.33 BB: 1191.23 CB: 1112.65 AB: 1026.37
X CB: 1404.09 AA: 1281.85 BC: 1278.58 AB: 1084.74 AC: 961.25
Y BA: 1345.74 BB: 1279.04 BC: 1265.53 AB: 1145.38 CC: 1134.65
Z AA: 1585.47 CA: 1308.87 BA: 1223.27 BB: 1128.17 BC: 1104.22
26 rows selected.
SQL>
|
|
|
|
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155760 is a reply to message #155597] |
Wed, 18 January 2006 08:09 |
arnematt
Messages: 24 Registered: November 2005 Location: Netherlands
|
Junior Member |
|
|
However, however...
I am facing the problem that I would need to place a major WHERE clause on two locations in that select statement...
Will that still perform???
I will show you the basic select statement (in my solution a cursor that is used to fill the index-by table).
Also at the end of this post I have included my solution, well part of it... not asking 'what do you think of my code', more like an explanation of the approach if the query becomes too big.
==============================
SELECT STATEMENT
This statement consists of the INNER JOIN of several tables. This is to deal with restrictions that apply to the data in several tabels; e.g. for a particular article from the sales table (in reality the E_KASSABON table), the restriction applies that its should have status VALID in another table (here: e_art_hist.art_status; where e_art_hist is linked to e_kassabon through a substitute key in the e_art_geldig table).
Without wanting to go into too much detail here, the problem I now face is that the partition by needs only to be done over ONLY the articles in this result set; I suppose this implies the whole inner join bunch should go into BOTH the GROUP BY sections of your example as far as I can see??
Lot of processing time, and lot of text???
Hope you have some thoughts on this...
-------------------------------
SELECT e_vst.vst_nr, cls_nr, e_cls_art.art_nr
FROM e_art_hist
INNER JOIN e_art_geldig USING (art_key)
INNER JOIN e_cls_art ON e_art_geldig.art_nr = e_cls_art.art_nr
INNER JOIN e_cls_vst USING (cls_nr)
INNER JOIN e_vst ON e_vst.vst_nr = e_cls_vst.vst_nr
INNER JOIN e_kassabon ON e_kassabon.vst_nr = e_vst.vst_nr
AND e_kassabon.art_nr = e_cls_art.art_nr
WHERE e_kassabon.tract_datum BETWEEN (r1.peildatum AND r1.peildatum +6)
AND IND_KLANT = 'J'
AND IND_BOEKING = 'I'
AND e_art_geldig.beg_datum <= r1.peildatum
AND e_art_geldig.eind_datum >= r1.peildatum
AND e_art_hist.art_status = 'A'
AND e_cls_art.beg_datum_bron <= r1.peildatum
AND e_cls_art.eind_datum_bron >= r1.peildatum
AND e_cls_art.beg_datum <= r1.peildatum
AND e_cls_art.eind_datum > r1.peildatum
AND e_cls_vst.beg_datum_bron <= r1.peildatum
AND e_cls_vst.eind_datum_bron >= r1.peildatum
AND e_cls_vst.beg_datum <= r1.peildatum
AND e_cls_vst.eind_datum > r1.peildatum
ORDER BY
VST_NR,
BON_ID,
TRACT_DATUM ASC
---------------------
My solution (hope you can look through fieldname details etc):
one cursor to determine time period (peildatum: a date in a week). Second cursor loops within this one over mainly the E_KASSABON table, but with restrictions from many other tables (E_KASSABON is the big one).
VARRAY is used to store and update the top 5 of locations, while a certain location is being processed ('VST_NR' = 'LOCATION_NR'), for each article number (ART_NR) such a VARRAY is inserted in the index-by table on the index of the article number and updated each time the article number is encountered (loop is through ordered VST_NRs, not ordered ART_NRs. This is because I need ordering on BON_ID per VST_NR. (BON_ID = 'sale-id', where one sale can be more articles.)).
CREATE OR REPLACE PROCEDURE OMZETDOORGIFTE (v_result IN OUT NUMBER) AS
CURSOR c1_peildata IS
SELECT DISTINCT(TO_CHAR(verw_datum, 'YYYYIW')
FROM MONITOR
WHERE proceskode IN ('LOAD', 'TLOD')
AND db_datum >
SELECT (MAX(db_datum) FROM MONITOR
WHERE proceskode = 'IKBAGG'
AND result_kode < 2;
CURSOR c2_vstdag IS
SELECT
e_vst.vst_nr,
cls_nr,
e_cls_art.art_nr,
e_kassabon.bon_id,
e_kassabon. nogwat,
e_kassabon.aantal!!!
FROM e_art_hist
INNER JOIN e_art_geldig USING (art_key)
INNER JOIN e_cls_art ON e_art_geldig.art_nr = e_cls_art.art_nr
INNER JOIN e_cls_vst USING (cls_nr)
INNER JOIN e_vst ON e_vst.vst_nr = e_cls_vst.vst_nr
INNER JOIN e_kassabon ON e_kassabon.vst_nr = e_vst.vst_nr
AND e_kassabon.art_nr = e_cls_art.art_nr
WHERE e_kassabon.tract_datum BETWEEN (r1.peildatum AND r1.peildatum +6)
AND IND_KLANT = 'J'
AND IND_BOEKING = 'I'
AND e_art_geldig.beg_datum <= r1.peildatum
AND e_art_geldig.eind_datum >= r1.peildatum
AND e_art_hist.art_status = 'A'
AND e_cls_art.beg_datum_bron <= r1.peildatum
AND e_cls_art.eind_datum_bron >= r1.peildatum
AND e_cls_art.beg_datum <= r1.peildatum
AND e_cls_art.eind_datum > r1.peildatum
AND e_cls_vst.beg_datum_bron <= r1.peildatum
AND e_cls_vst.eind_datum_bron >= r1.peildatum
AND e_cls_vst.beg_datum <= r1.peildatum
AND e_cls_vst.eind_datum > r1.peildatum
ORDER BY
VST_NR,
BON_ID,
TRACT_DATUM ASC;
TYPE TOPVSTOMZ IS VARRAY(5) OF NUMBER;
);
TYPE Kasbonrec IS RECORD (
vst_nr E_KASSABON.VST_NR%TYPE, -- huidige vestigingsnummer
cls_nr E_KASSABON.CLS_NR%TYPE, -- alleen het meest recente cls_nr blijft bewaard.
art_nr E_KASSABON.ART_NR%TYPE, -- nodig voor sortering.
bon_id E_KASSABON.BON_ID%TYPE, --om bon id vast te houden; als dit wijzigt dan aantal klanten + 1
aantal_stuks E_KASSABON.AANTAL%TYPE, -- totaal aantal verkopen. Opgehoogd met aantal verkopen in huidig record.
aantal_bonnen NUMBER, --aantal klanten voor dit artikel. Opgehoogd met 1 zodra bonid vorig en huidig ongelijk zijn.
aantal_vestigingen NUMBER ,-- totaal aantal vestigingen dat dit artikel voert
vst_totaal NUMBER, --totale verkoopbedrag voor dit artikel voor de huidige vestiging; als vst_nr wijzigt dan reset
vst_top_5_Arr TOPVSTOMZ%TYPE -- array om voor dit artikel de vestigingstop 5 bij te houden
TYPE ArrA_VST_ART
IS TABLE OF Kasbonrec%ROWTYPE
INDEX BY PLS_INTEGER;
PROCEDURE VERWERK IS
top5Counter1 NUMBER;
top5Counter2 NUMBER;
arrIndex NUMBER;
BEGIN
OR r1_peildatum IN c1_peildata LOOP
FOR r2_vstdag IN c2_vstdag LOOP
arrIndex : = r2_vstdag.ART_NR;
aantalVestigingen := 0;
aantalBonnen := 0;
aantalStuks := 0;
--etc...
IF ArrA_VST_ART(arrIndex).VST_NR <> r2_vstdag.VST_NR
THEN
FOR top5Counter1 IN REVERSE 1..5 -- Telt af van 5 naar 1.
LOOP
IF huidige waarde >= Arr(top5Counter1)
THEN
FOR top5Counter2 IN 1..top5Counter1-1
LOOP
topVstOmz(top5Counter2) := topVstOmz(top5Counter2+1);
END LOOP;
topVstOmz(top5Counter1):= ArrA_VST_ART(r2_vstdag.ART_NR).VST_TOTAAL;
EXIT;
END IF;
END LOOP;
ArrA(arrIndex).VST_NR := r2_vstdag.VST_NR;
ArrA(arrIndex).VST_TOTAAL := 0;
aantalVestigingen := aantalVestigingen + 1; <<--"vestiging" stands for "location"
END IF;
IF ArrA_VST_ART(arrIndex).BON_ID <> r2_vstdag.BON_ID
THEN
END IF;
ArrA(arrIndex).AANTAL_VERKOCHT := ArrA(arrIndex).AANTAL_VERKOCHT + r2_vstdag.AANTAL;
ArrA(arrIndex).NETTO_OMZET_WAARDE := ArrA(arrIndex).NETTO_OMZET_WAARDE + r2_vstdag.BONREG_BEDRAG;
ArrA(arrIndex).BGW := ArrA(arrIndex).BGW + r2_vstdag.BGW;
IF r2_vstdag.PROM_ID IN (lijst met waarden voor promotie)
THEN
ArrA(arrIndex).PROM_OMZET_WAARDE := ArrA(arrIndex).PROM_OMZET_WAARDE + r2_vstdag.BONREG_BEDRAG;
END IF;
IF
ArrA(r2_vstdag.ART_NR).BON_ID <> r2_vstdag.BON_ID
THEN
BEGIN
ArrA(arrIndex).AANTAL_KLANTEN = ArrA(arrIndex).AANTAL_KLANTEN +1;
ArrA(arrIndex).BON_ID = r2_vstdag.BON_ID;
END;
END IF;
END LOOP;
END LOOP;
:
:
...
[Updated on: Wed, 18 January 2006 10:11] Report message to a moderator
|
|
|
Goto Forum:
Current Time: Thu Dec 12 07:17:58 CST 2024
|