A SQL Challenge [message #370699] |
Thu, 20 January 2000 13:11 |
Ben
Messages: 48 Registered: January 2000
|
Member |
|
|
I have a table with a 'number' type column which has thousands of rows. I want to write a query to retrieve the 20 largest (top 20) values on this column. No temporary table/view or PL/SQL object is allowed. Does anyone know how to do that?
|
|
|
Re: A SQL Challenge [message #370701 is a reply to message #370699] |
Thu, 20 January 2000 19:18 |
Greg Skakun
Messages: 10 Registered: January 2000
|
Junior Member |
|
|
The trick to this one is how to order the column and still use the rownum function to get the top 20 items. If you create an index on the column and force your query to access the table using the index, you'll get your rows sorted ascending by the column value. You simply then exclude the first count(*) - 20 rows from your query. Also you must force your query to use the index with some wierd where clause like "where TRX_AMT > -100000"
I'll email you directly with the sql*syntax.
|
|
|
Re: A SQL Challenge [message #370703 is a reply to message #370699] |
Thu, 20 January 2000 19:46 |
Greg Skakun
Messages: 10 Registered: January 2000
|
Junior Member |
|
|
Sample script (after creating the index):
1 select out_trx_amt
2 from gs_tab1 a
3 where out_trx_amt > -100000
4 and a.rowid not in
5 (select b.rowid
6 from gs_tab1 b
7 where out_trx_amt > -100000
8 and rownum <=
9 (select count(*)-20
10 from gs_tab1
11* where out_trx_amt > -100000))
|
|
|
Re: A SQL Challenge [message #370705 is a reply to message #370699] |
Thu, 20 January 2000 21:22 |
Paul
Messages: 164 Registered: April 1999
|
Senior Member |
|
|
Ben,
Try this:
SELECT other_cols,
num_col
FROM your_table a
WHERE 20 FROM your_table b
WHERE b.num_col > a.num_col);
This will return the rows with the 20 highest values, not the first twenty rows with the highest values.
Hope this helps,
Paul
|
|
|
Re: A SQL Challenge [message #370712 is a reply to message #370699] |
Fri, 21 January 2000 15:27 |
Mark E Kane
Messages: 7 Registered: January 2000
|
Junior Member |
|
|
Hey guys,
Don't make it so complicated. Include the following in your select statement:
where rownum<=20
order by <YOUR NUMBER ROW> desc;
That will order them starting at the greatest number and ensure you only bring back 20.
Mark
|
|
|
Re: A SQL Challenge [message #370715 is a reply to message #370699] |
Fri, 21 January 2000 16:11 |
Mark E Kane
Messages: 7 Registered: January 2000
|
Junior Member |
|
|
Hey guys,
Don't make it so complicated. Include the following in your select statement:
where rownum<=20
order by desc;
That will order them starting at the greatest number and ensure you only bring back 20.
Mark
|
|
|
Re: A SQL Challenge [message #370716 is a reply to message #370705] |
Fri, 21 January 2000 23:53 |
Ben
Messages: 48 Registered: January 2000
|
Member |
|
|
Paul,
Thank you very much for your help. Your idea is great! I needed to change the '<' TO>' in your main WHERE clause and it gave me what I wanted for most of the time. Sometimes however it gave me one more record if the 20th and 21st largest values are the same. I modified your script as below to avoid that and also sort the column.
SELECT other_cols,
num_col
FROM your_table a
WHERE 20 > (SELECT COUNT(*)
FROM your_table b
WHERE b.num_col > a.num_col)
AND rownum <= 20
ORDER BY num_col DESC;
Ben
|
|
|
Re: A SQL Challenge [message #370719 is a reply to message #370715] |
Sat, 22 January 2000 05:09 |
arun
Messages: 67 Registered: January 2000
|
Member |
|
|
hi mark,
they asked for 20 largest (top 20) values on this column. not first 20.
please read the question properly and reply.
yours query will retreive the first 20 rows and not top 20 values of specific column.
sorry to point out this. don't be angry.
be colz.
arun.
|
|
|
Re: A SQL Challenge [message #370720 is a reply to message #370705] |
Sat, 22 January 2000 08:46 |
Paul
Messages: 164 Registered: April 1999
|
Senior Member |
|
|
Ben,
Sorry about the mixup with the signs. Just want you to be aware of a potential 'gotcha' when you attempt to limit a return set using ROWNUM
This example uses the SCOTT/TIGER EMP table, since EMP only has 14 record, I'm selecting
only the 'Top 5', but the principle is the same.
-- first, display empno, sal and rownum for all
-- records note that the records were not
-- inserted in salary order
SQL> select empno, sal, rownum from emp;
EMPNO SAL ROWNUM
---------- ---------- ----------
7369 800 1
7499 1600 2
7521 1250 3
7698 2850 4
7782 2450 5
7788 3000 6
7839 5000 7
7844 1500 8
7876 1100 9
7900 950 10
7902 3000 11
7934 1300 12
7566 2975 13
7654 1250 14
14 rows selected.
-- now create a 'tie' for 5th by upping one salary
SQL> update emp
2 set sal = 2850
3 where empno = 7499;
1 row updated.
-- now show the 5 highest paid employees
SQL> select empno, sal
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 order by sal desc;
EMPNO SAL
---------- ----------
7839 5000
7788 3000
7902 3000
7566 2975
7499 2850
7698 2850
6 rows selected.
-- only want 5, use rownum to limit output
SQL> select empno, sal
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 and rownum <= 5
6 order by sal desc;
EMPNO SAL
---------- ----------
7839 5000
7788 3000
7902 3000
7499 2850
7698 2850
-- WHOOPS! where did 7566's salary of 2975 go
-- let's see if we can find out
SQL> select empno, sal, rownum
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 order by sal desc;
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 4
7788 3000 3
7902 3000 5
7566 2975 6
7499 2850 1
7698 2850 2
6 rows selected.
Here's why:
When you select a subset of the records, the new
rownums are assigned by relative rownum in the table, since 7566 was number 13, and the others were all lower, it became number 6, not number 4 as you might have expected.
Hope this saves you some grief,
Paul
|
|
|
Re: A SQL Challenge [message #370721 is a reply to message #370715] |
Sat, 22 January 2000 09:00 |
Mark E Kane
Messages: 7 Registered: January 2000
|
Junior Member |
|
|
No worries Arun,
I just ran that script again on a table, and you are absolutely right. My table just happened to have the numbers in near-descending orders and I didn't catch a couple of missing values.
I misunderstood how the order by works. It has no bearing on the data selected, on on how it is displayed.
Mark
|
|
|
Re: A SQL Challenge [message #370732 is a reply to message #370705] |
Tue, 25 January 2000 17:56 |
Ben
Messages: 48 Registered: January 2000
|
Member |
|
|
Paul,
What about these:
SQL> select empno, sal, rownum
2> from emp a
3> where 5 > ( select count (*) from emp b
4> where b.sal > a.sal)
5> and rowid not in ( select rowid from emp a
6> where 5-1 = ( select count (*) from emp b
7> where b.sal > a.sal)
8> and rowid not in (
9> select max(rowid)
10> from emp a
11> where 5-1 = ( select count (*) from emp b
12> where b.sal > a.sal)
13> group by sal))
14> order by 2 desc
15>
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 4
7788 3000 3
7902 3000 5
7566 2975 1
7698 2850 2
5 rows selected.
OR
SQL> select empno, sal
2> from emp a
3> where 5-1 > ( select count (*) from emp b
4> where b.sal > a.sal)
5> union
6> select empno, sal
7> from emp
8> where rowid in (
9> select max(rowid)
10> from emp a
11> where 5-1 = ( select count (*) from emp b
12> where b.sal > a.sal)
13> group by sal )
14> order by 2 desc
15>
EMPNO SAL
---------- ----------
7839 5000
7788 3000
7902 3000
7566 2975
7698 2850
5 rows selected.
Ben
|
|
|
Re: A SQL Challenge [message #370733 is a reply to message #370699] |
Tue, 25 January 2000 18:41 |
Edward Jayaraj
Messages: 7 Registered: December 1999
|
Junior Member |
|
|
Folks,
I tried all the queries and I am kind of confused, totally. Can you pl., clear the following doubts:
On running this query,
SELECT SAL
FROM EMP A
WHERE 20 > (SELECT COUNT (*)
FROM EMP B
WHERE B.SAL > A.SAL)
ORDER BY 1 DESC
I am getting the first 20 highest values but if value 5000 is repeated in two row I am getting both the rows, that is not we are looking for?.
We are looking for the top 20 distinct values.
If the above given query is going to give the repetitive values then the query with a where CLAUSE on ROWNUM <= THE 20 SAME WOULD AN AND DO THING.
Pl., clear me on this, I might have misunderstood you explanation, any clarification is appreciated.
Thanks
|
|
|
Re: A SQL Challenge [message #370736 is a reply to message #370705] |
Wed, 26 January 2000 08:58 |
Paul
Messages: 164 Registered: April 1999
|
Senior Member |
|
|
Ben,
If these statements consistently give you the answer you require, then they are 'correct'. There are, however, cases where they will fail, as the SQL below illustrates. Granted, this scenario may never occur in your actual data - but it is not impossible in some situations.
SQL> select empno, sal, rownum from emp;
EMPNO SAL ROWNUM
---------- ---------- ----------
7369 800 1
7499 1600 2
7521 1250 3
7566 2975 4
7654 1250 5
7698 2850 6
7782 2450 7
7788 3000 8
7839 5000 9
7844 1500 10
7876 1100 11
7900 950 12
7902 3000 13
7934 1300 14
14 rows selected.
-- create a 7 way tie for second
SQL> update emp set sal = 4000 where empno > 7700 and sal 7 rows updated.
SQL> select empno, sal, rownum
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 and rowid not in ( select rowid from emp a
6 where 5-1 = ( select count (*) from emp b
7 where b.sal > a.sal)
8 and rowid not in (
9 select max(rowid)
10 from emp a
11 where 5-1 = ( select count (*) from emp b
12 where b.sal > a.sal)
13 group by sal))
14 order by 2 desc;
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 3
7782 4000 1
7844 4000 4
7900 4000 6
7934 4000 8
7902 4000 7
7876 4000 5
7788 4000 2
8 rows selected.
SQL> select empno, sal, rownum
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 and rowid not in ( select rowid from emp a
6 where 5-1 = ( select count (*) from emp b
7 where b.sal > a.sal)
8 and rowid not in (
9 select max(rowid)
10 from emp a
11 where 5-1 = ( select count (*) from emp b
12 where b.sal > a.sal)
13 group by sal))
14 order by 2 desc;
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 3
7782 4000 1
7844 4000 4
7900 4000 6
7934 4000 8
7902 4000 7
7876 4000 5
7788 4000 2
8 rows selected.
That said, there is another very important point to consider here. Put simply, there is no
single 'correct' way to write the logic for a 'Top n' report. So, when a user requests one, it
is very important to take the time to discover EXACTLY what THEY expect to see as output and to find out how THEY expect 'ties' to be handled. When you do this, the first thing you may discover is that they never considered the possibility of duplicate values, and thus have no clear idea of how they want them handled. You then, can point out various alternatives and allow them to select one of them.
My personal experience has been that the two logics most often desired are two of the easiest
to code - either 'show me the 5 highest DISTINCT salaries and a count of how many employees earn each of them' or 'show me a list of all employees who earn any of the 5 highest distinct salaries'. Third most requested has been 'show me the 5 highest paid employees, if there is a tie for 5th place, show me all the employees who earn this salary'. I'll admit that I've occasionally gotten the impression that they selected the second or third of these because they did not want to be the one to make the decision on how to handle ties. Other, less often requested logics have been 'show me the 5 titles with the highest average salaries, break a tie for 5th by selecting the title with the highest number of employees, if this fails, break the tie by selecting the title with the highest total overtime paid for the year, if this fails, show all still tied' and 'for each of the 5 highest disinct salaries, show me the employee or employees with the longest time at this salary'.
What all these logics have in common is that they clearly define the output (and therefore the user expectations). The other is that all except the first one, that returns no employee specific data, may return more than n rows. I cannot recall ever having anyone specify a report where ties were to be broken by random selection.
I also admit to having a personal issue with solutions that rely on factors that are or may be outside of my control. Both ROWNUM and ROWID can change when records are deleted, inserted and updated. Even solutions relying on indexes on fields other than PK and FK can suddenly stop
working and should be used, if at all, with extreme caution. Got burned on this last one once, when the DBA's decided to eliminate some 'superfluous' indexes. The key phrase in the above is 'rely on'; it's one thing to have a report that may become 'disorgranized' if an index is dropped, it is quite another to have one that will still 'look' right, but start returning an incorrect result if this happens.
Hope the above doesn't bore or offend you.
Regards,
Paul
|
|
|
|