Home » RDBMS Server » Server Administration » A SQL Challenge
A SQL Challenge [message #370699] |
Thu, 20 January 2000 13:11  |
Ben
Messages: 48 Registered: January 2000
|
Member |
|
|
I have a table with a 'number' type column which has thousands of rows. I want to write a query to retrieve the 20 largest (top 20) values on this column. No temporary table/view or PL/SQL object is allowed. Does anyone know how to do that?
|
|
|
|
|
|
|
|
|
|
Re: A SQL Challenge [message #370720 is a reply to message #370705] |
Sat, 22 January 2000 08:46   |
Paul
Messages: 164 Registered: April 1999
|
Senior Member |
|
|
Ben,
Sorry about the mixup with the signs. Just want you to be aware of a potential 'gotcha' when you attempt to limit a return set using ROWNUM
This example uses the SCOTT/TIGER EMP table, since EMP only has 14 record, I'm selecting
only the 'Top 5', but the principle is the same.
-- first, display empno, sal and rownum for all
-- records note that the records were not
-- inserted in salary order
SQL> select empno, sal, rownum from emp;
EMPNO SAL ROWNUM
---------- ---------- ----------
7369 800 1
7499 1600 2
7521 1250 3
7698 2850 4
7782 2450 5
7788 3000 6
7839 5000 7
7844 1500 8
7876 1100 9
7900 950 10
7902 3000 11
7934 1300 12
7566 2975 13
7654 1250 14
14 rows selected.
-- now create a 'tie' for 5th by upping one salary
SQL> update emp
2 set sal = 2850
3 where empno = 7499;
1 row updated.
-- now show the 5 highest paid employees
SQL> select empno, sal
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 order by sal desc;
EMPNO SAL
---------- ----------
7839 5000
7788 3000
7902 3000
7566 2975
7499 2850
7698 2850
6 rows selected.
-- only want 5, use rownum to limit output
SQL> select empno, sal
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 and rownum <= 5
6 order by sal desc;
EMPNO SAL
---------- ----------
7839 5000
7788 3000
7902 3000
7499 2850
7698 2850
-- WHOOPS! where did 7566's salary of 2975 go
-- let's see if we can find out
SQL> select empno, sal, rownum
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 order by sal desc;
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 4
7788 3000 3
7902 3000 5
7566 2975 6
7499 2850 1
7698 2850 2
6 rows selected.
Here's why:
When you select a subset of the records, the new
rownums are assigned by relative rownum in the table, since 7566 was number 13, and the others were all lower, it became number 6, not number 4 as you might have expected.
Hope this saves you some grief,
Paul
|
|
|
|
|
|
Re: A SQL Challenge [message #370736 is a reply to message #370705] |
Wed, 26 January 2000 08:58   |
Paul
Messages: 164 Registered: April 1999
|
Senior Member |
|
|
Ben,
If these statements consistently give you the answer you require, then they are 'correct'. There are, however, cases where they will fail, as the SQL below illustrates. Granted, this scenario may never occur in your actual data - but it is not impossible in some situations.
SQL> select empno, sal, rownum from emp;
EMPNO SAL ROWNUM
---------- ---------- ----------
7369 800 1
7499 1600 2
7521 1250 3
7566 2975 4
7654 1250 5
7698 2850 6
7782 2450 7
7788 3000 8
7839 5000 9
7844 1500 10
7876 1100 11
7900 950 12
7902 3000 13
7934 1300 14
14 rows selected.
-- create a 7 way tie for second
SQL> update emp set sal = 4000 where empno > 7700 and sal 7 rows updated.
SQL> select empno, sal, rownum
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 and rowid not in ( select rowid from emp a
6 where 5-1 = ( select count (*) from emp b
7 where b.sal > a.sal)
8 and rowid not in (
9 select max(rowid)
10 from emp a
11 where 5-1 = ( select count (*) from emp b
12 where b.sal > a.sal)
13 group by sal))
14 order by 2 desc;
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 3
7782 4000 1
7844 4000 4
7900 4000 6
7934 4000 8
7902 4000 7
7876 4000 5
7788 4000 2
8 rows selected.
SQL> select empno, sal, rownum
2 from emp a
3 where 5 > ( select count (*) from emp b
4 where b.sal > a.sal)
5 and rowid not in ( select rowid from emp a
6 where 5-1 = ( select count (*) from emp b
7 where b.sal > a.sal)
8 and rowid not in (
9 select max(rowid)
10 from emp a
11 where 5-1 = ( select count (*) from emp b
12 where b.sal > a.sal)
13 group by sal))
14 order by 2 desc;
EMPNO SAL ROWNUM
---------- ---------- ----------
7839 5000 3
7782 4000 1
7844 4000 4
7900 4000 6
7934 4000 8
7902 4000 7
7876 4000 5
7788 4000 2
8 rows selected.
That said, there is another very important point to consider here. Put simply, there is no
single 'correct' way to write the logic for a 'Top n' report. So, when a user requests one, it
is very important to take the time to discover EXACTLY what THEY expect to see as output and to find out how THEY expect 'ties' to be handled. When you do this, the first thing you may discover is that they never considered the possibility of duplicate values, and thus have no clear idea of how they want them handled. You then, can point out various alternatives and allow them to select one of them.
My personal experience has been that the two logics most often desired are two of the easiest
to code - either 'show me the 5 highest DISTINCT salaries and a count of how many employees earn each of them' or 'show me a list of all employees who earn any of the 5 highest distinct salaries'. Third most requested has been 'show me the 5 highest paid employees, if there is a tie for 5th place, show me all the employees who earn this salary'. I'll admit that I've occasionally gotten the impression that they selected the second or third of these because they did not want to be the one to make the decision on how to handle ties. Other, less often requested logics have been 'show me the 5 titles with the highest average salaries, break a tie for 5th by selecting the title with the highest number of employees, if this fails, break the tie by selecting the title with the highest total overtime paid for the year, if this fails, show all still tied' and 'for each of the 5 highest disinct salaries, show me the employee or employees with the longest time at this salary'.
What all these logics have in common is that they clearly define the output (and therefore the user expectations). The other is that all except the first one, that returns no employee specific data, may return more than n rows. I cannot recall ever having anyone specify a report where ties were to be broken by random selection.
I also admit to having a personal issue with solutions that rely on factors that are or may be outside of my control. Both ROWNUM and ROWID can change when records are deleted, inserted and updated. Even solutions relying on indexes on fields other than PK and FK can suddenly stop
working and should be used, if at all, with extreme caution. Got burned on this last one once, when the DBA's decided to eliminate some 'superfluous' indexes. The key phrase in the above is 'rely on'; it's one thing to have a report that may become 'disorgranized' if an index is dropped, it is quite another to have one that will still 'look' right, but start returning an incorrect result if this happens.
Hope the above doesn't bore or offend you.
Regards,
Paul
|
|
|
|
Goto Forum:
Current Time: Sat May 03 07:36:12 CDT 2025
|