Re: SQL problem
Date: Tue, 6 Jan 2015 11:32:56 +0100
Message-ID: <CA+S=qd03JtPww7opOdbf8dc0=iWOeewz78GYR5dW0DucJjgafA_at_mail.gmail.com>
Hi, Thomas
Why are you not happy with your solution? I've been doing similar things and my solutions very much looks like your solution - I think your solution looks good.
Is it the performance of the query when executed on 20-100 million rows
that is the problem?
If so, what's the access plan for the query on the big table?
I'm guessing a couple full table scans, hash join, and some window sort,
window buffer and window nosort.
If so, then the problem may be a lot of work on TEMP as it might not be
able to hold it all in memory?
If so, then perhaps an idea could be to split the job into smaller jobs on intervals of k1, so that each sub-job the analytics can be performed all in memory?
But I'm just guessing ;-)
Let us know your actual reasons for not being happy with the solution, then
it is easier to help...
Regards
Kim Berg Hansen
http://dspsd.blogspot.com
kibeha_at_gmail.com
_at_kibeha
On Tue, Jan 6, 2015 at 10:58 AM, Thomas Ranchon <thomas_at_ranchon.org> wrote:
> Hi list,
>
> I'm trying to solve a SQL problem on a 11.2.0.4 DB, I've found a solution
> but I'm not very happy with it.
>
> Here is a simplified example of my problem.
>
> I have a table with the following description :
>
> CREATE TABLE etrn_tst(
> k1 NUMBER(2)
> , ens NUMBER(5)
> , start_date DATE
> , end_date DATE
> , priority NUMBER(2)
> , PRIMARY KEY(k1, ens, start_date));
>
> (In production the table hold between 20 million rows and 100 million rows)
>
> With some test data to insert into the table :
>
> INSERT
> INTO etrn_tst(k1, ens, start_date, end_date, priority)
> SELECT 1, 123, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('12/31/9999',
> 'MM/DD/YYYY'), 9 FROM dual
> UNION ALL
> SELECT 1, 456, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('06/01/2014',
> 'MM/DD/YYYY'), 7 FROM dual
> UNION ALL
> SELECT 1, 456, TO_DATE('08/01/2014', 'MM/DD/YYYY'), TO_DATE('01/01/2015',
> 'MM/DD/YYYY'), 7 FROM dual
> UNION ALL
> SELECT 1, 987, TO_DATE('04/01/2014', 'MM/DD/YYYY'), TO_DATE('09/01/2014',
> 'MM/DD/YYYY'), 4 FROM dual
> UNION ALL
> SELECT 1, 212, TO_DATE('05/01/2014', 'MM/DD/YYYY'), TO_DATE('07/01/2014',
> 'MM/DD/YYYY'), 1 FROM dual
> UNION ALL
> SELECT 2, 987, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('09/01/2014',
> 'MM/DD/YYYY'), 4 FROM dual
> UNION ALL
> SELECT 2, 212, TO_DATE('05/01/2014', 'MM/DD/YYYY'), TO_DATE('12/31/9999',
> 'MM/DD/YYYY'), 1 FROM dual;
>
> I need to transform the first table to insert it into a new table with the
> following description :
> CREATE TABLE etrn_tst_result(
> k1 NUMBER(2)
> , start_date DATE
> , end_date DATE
> , ens NUMBER(5)
> , PRIMARY KEY(k1, start_date));
>
> I need to build date period for k1, for each period find 'ens' with the
> minimum priority and if consecutive period have the same 'ens' then I have
> to merge them.
> Period should not overlap for a given k1.
> The problem would be quite easy if for a given k1 each ens had the same
> period but that's not the case.
>
> For my test data the result should be :
>
> k1 | start_date | end_date | ens
> 1 | 01/01/2014 | 03/31/2014 | 456
> 1 | 04/01/2014 | 04/30/2014 | 987
> 1 | 05/01/2014 | 07/01/2014 | 212
> 1 | 07/02/2014 | 09/01/2014 | 987
> 1 | 09/02/2014 | 01/01/2015 | 456
> 1 | 01/02/2015 | 12/31/9999 | 123
> 2 | 01/01/2014 | 04/30/2014 | 987
> 2 | 05/01/2014 | 12/31/9999 | 212
>
> Here is the query I used to find this result :
> INSERT INTO etrn_tst_result
> WITH
> tmp_perim_dt AS (
> SELECT k1
> , start_date
> , LEAD(start_date - 1, 1, TO_DATE('31/12/9999', 'DD/MM/YYYY'))
> OVER(PARTITION BY k1 ORDER BY start_date) AS end_date
> FROM (SELECT k1, start_date
> FROM etrn_tst
> UNION
> SELECT k1, end_date + 1
> FROM etrn_tst
> WHERE end_date != TO_DATE('31/12/9999', 'DD/MM/YYYY')))
> SELECT k1
> , start_date
> , end_date
> , ens
> FROM (SELECT k1
> , MIN(start_date) AS start_date
> , MAX(end_date) AS end_date
> , ens
> , grp
> FROM (SELECT k1
> , start_date
> , end_date
> , ens
> , RANK()
> OVER(PARTITION BY k1 ORDER BY start_date ASC)
> - SUM(ind_same)
> OVER(PARTITION BY k1 ORDER BY start_date ASC
> ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT
> ROW) AS grp
> FROM (SELECT k1
> , start_date
> , end_date
> , ens
> , DECODE(DECODE(LAG(ens, 1) OVER(PARTITION BY
> k1 ORDER BY start_date), ens, 0, 1) +
> DECODE(LAG(end_date, 1)
> OVER(PARTITION BY k1 ORDER BY start_date), start_date - 1, 0, 1), 0, 1, 0)
> AS ind_same
> FROM (SELECT x.k1
> , x.start_date
> , x.end_date
> , MIN(y.ens)
> KEEP(DENSE_RANK FIRST ORDER BY
> y.priority ASC) AS ens
> FROM tmp_perim_dt x
> INNER JOIN etrn_tst y ON y.k1 = x.k1
> AND
> y.start_date <= x.start_date
> AND y.end_date
> >= x.start_date
> GROUP
> BY x.k1
> , x.start_date
> , x.end_date)))
> GROUP
> BY k1
> , ens
> , grp)
> ORDER
> BY k1
> , start_date;
>
> I'm trying to find a better solution to my problem.
>
> Thanks,
> Thomas
>
-- http://www.freelists.org/webpage/oracle-lReceived on Tue Jan 06 2015 - 11:32:56 CET