Home » SQL & PL/SQL » SQL & PL/SQL » Remove string duplicates (11.2)
Remove string duplicates [message #598210] |
Fri, 11 October 2013 08:53 |
|
jj_white
Messages: 6 Registered: July 2013
|
Junior Member |
|
|
Hello,
I need help in removing duplicates from a string that contains years and "-". Example: 1988-1997-2000-2013-1998-1965-1997-1899
I know this can be done in regular expressions but have no experience in this subject.
select REGEXP_REPLACE(.....) from dual;
Thank you
|
|
|
|
|
|
Re: Remove string duplicates [message #598226 is a reply to message #598219] |
Fri, 11 October 2013 10:20 |
|
Michel Cadot
Messages: 68664 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
SQL> with
2 data as (select '-'||'1988-1997-2000-2013-1998-1965-1997-1899'||'-' val from dual),
3 years as (
4 select distinct
5 substr(val,
6 instr(val, '-', 1, level),
7 instr(val, '-', 1, level+1)-instr(val, '-', 1, level)) yr
8 from data
9 connect by level <= length(val)-length(replace(val,'-'))-1
10 )
11 select substr(listagg(yr) within group (order by yr),2) res
12 from years
13 /
RES
---------------------------------------------------------------------------------------------
1899-1965-1988-1997-1998-2000-2013
|
|
|
Re: Remove string duplicates [message #598227 is a reply to message #598226] |
Fri, 11 October 2013 10:23 |
|
Michel Cadot
Messages: 68664 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Or if you really want to use regexp functions:
SQL> with
2 data as (select '-'||'1988-1997-2000-2013-1998-1965-1997-1899'||'-' val from dual),
3 years as (
4 select distinct '-'||regexp_substr(val, '\d+', 1, level) yr
5 from data
6 connect by level <= length(val)-length(replace(val,'-'))-1
7 )
8 select substr(listagg(yr) within group (order by yr),2) res
9 from years
10 /
RES
------------------------------------------------------------------------------------------
1899-1965-1988-1997-1998-2000-2013
[Updated on: Fri, 11 October 2013 10:24] Report message to a moderator
|
|
|
Re: Remove string duplicates [message #598241 is a reply to message #598211] |
Fri, 11 October 2013 12:22 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
Michel Cadot wrote on Fri, 11 October 2013 19:24
Quote:I know this can be done in regular expressions
No it can't be.
Why No? Assuming a no means not possible. If you have something else, please pardon me.
|
|
|
Re: Remove string duplicates [message #598244 is a reply to message #598241] |
Fri, 11 October 2013 12:46 |
pablolee
Messages: 2882 Registered: May 2007 Location: Scotland
|
Senior Member |
|
|
Lalit, your post doesn't really make sense. Michel has said that something cannot be done. You appear to be asking him to show that it cannot be done. You cannot prove a negative. If you disagree with his statement, the the onus would be on you to show that this is not the case. Can you show this?
|
|
|
Re: Remove string duplicates [message #598249 is a reply to message #598244] |
Fri, 11 October 2013 14:44 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
pablolee wrote on Fri, 11 October 2013 23:16Lalit, your post doesn't really make sense. Michel has said that something cannot be done. You appear to be asking him to show that it cannot be done. You cannot prove a negative. If you disagree with his statement, the the onus would be on you to show that this is not the case. Can you show this?
Sorry, you misunderstood my post. I was just awestruck when I saw NO from Michel, since I consider him just the BEST! At first instance I genuinely thought it could be done using REGEXP.
Of course, the onus is now on me, let me try level best with the example input given by OP-
SQL> SELECT '1988-1997-2000-2013-1998-1965-1997-1899' dup_str, REGEXP_REPLACE('1
988-1997-2000-2013-1998-1965-1997-1899','(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3') dup
_str_removed
2 FROM dual;
DUP_STR DUP_STR_REMOVED
--------------------------------------- ----------------------------------
1988-1997-2000-2013-1998-1965-1997-1899 1988-1997-2000-2013-1998-1965-1899
Above demo is without any iterations. For further requirements, using iteration techniques a lot of variations can be done.
Regards,
Lalit
|
|
|
Re: Remove string duplicates [message #598250 is a reply to message #598210] |
Fri, 11 October 2013 14:50 |
Solomon Yakobson
Messages: 3275 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
jj_white wrote on Fri, 11 October 2013 09:53I know this can be done in regular expressions
You will have to split, remove duplicates and aggregate (concatenate) again. Regular expressions will help you to identify how to split, but that's about it. You'll still need to use other techniques to do the split and aggregate. Michel showed you one way (keep in mind, it will need some adjustments if you want to apply it to multi-rows table). Another way is XML:
with t as (
select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual
)
select str,
newstr
from t,
xmltable(
'string-join(distinct-values(ora:tokenize($str,"-")),"-")'
passing '-' || t.str as "str"
columns newstr varchar2(2000) path '.'
) x
/
STR NEWSTR
--------------------------------------- -----------------------------------
1988-1997-2000-2013-1998-1965-1997-1899 1899-1965-1988-1997-1998-2000-2013
SCOTT@orcl >
SY.
|
|
|
Re: Remove string duplicates [message #598252 is a reply to message #598249] |
Fri, 11 October 2013 15:00 |
Solomon Yakobson
Messages: 3275 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Lalit Kumar B wrote on Fri, 11 October 2013 15:44Above demo is without any iterations.
Your solution "works" pretty-much just for that single case:
SCOTT@orcl > with t as (
2 select '1988-2000-2000-1998' str from dual
3 )
4 SELECT str,
5 REGEXP_REPLACE(str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3') newstr
6 FROM t
7 /
STR NEWSTR
------------------- -----------------------------------
1988-2000-2000-1998 1988-2000-1998
SCOTT@orcl > with t as (
2 select '2000-2000-2000' str from dual
3 )
4 SELECT str,
5 REGEXP_REPLACE(str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3') newstr
6 FROM t
7 /
STR NEWSTR
-------------- -----------------------------------
2000-2000-2000 2000-2000
SCOTT@orcl >
SY.
|
|
|
Re: Remove string duplicates [message #598254 is a reply to message #598252] |
Fri, 11 October 2013 15:25 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
Solomon Yakobson wrote on Sat, 12 October 2013 01:30
Your solution "works" pretty-much just for that single case:
Yes, as I said the logic used in the demo lacks iteration. It was a simple SQL just to suffice the example input given by OP.
Let's try to extend it a bit.
One easier way without a complex SQL would be a FOR LOOP(but it is of course not a better way)-
SQL> set serveroutput on;
SQL> DECLARE
2 dup_str VARCHAR2(100) := '2000-2000-2000-1000-1000-1000-9999-8888';
3 BEGIN
4 DBMS_Output.Put_Line('Input duplicate string is - '||dup_str);
5 FOR i IN 1..length(dup_str)
6 LOOP
7 dup_str := RegExp_Replace(dup_str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3');
8 END LOOP;
9 DBMS_Output.Put_Line('output without duplicate is - '||dup_str);
10 END;
11 /
Input duplicate string is - 2000-2000-2000-1000-1000-1000-9999-8888
output without duplicate is - 2000-1000-9999-8888
PL/SQL procedure successfully completed.
10g onwards, MODEL clause could be used. I really struggled with this, but this is what I managed -
SQL> WITH t AS
2 (SELECT '2000-2000-2000-1000-1000-1000-9999-8888' dup_str FROM dual
3 )
4 SELECT dup_str,
5 dup_str_removed
6 FROM t model dimension BY (0 dim)
7 measures(dup_str, dup_str dup_str_removed) rules iterate(50)
8 until (dup_str_removed[0] = previous(dup_str_removed[0]))
9 (dup_str_removed[0] = regexp_replace(dup_str_removed[0],'(^|\-)([^
-]+\-)(.*\-)?\2','\1\2\3'))
10 /
DUP_STR DUP_STR_REMOVED
--------------------------------------- ---------------------------------------
2000-2000-2000-1000-1000-1000-9999-8888 2000-1000-9999-8888
Regards,
Lalit
|
|
|
Re: Remove string duplicates [message #598263 is a reply to message #598250] |
Fri, 11 October 2013 21:50 |
Solomon Yakobson
Messages: 3275 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
[quote title=Lalit Kumar B wrote on Fri, 11 October 2013 16:00But, there seems to be a miss in the order of the elements after aggregation.[/quote]
XML distinct-values implicitly orders. If order is important:
with t as (
select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
select '1998-2000-2000-1988-2000-1988' from dual union all
select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
),
r as (
select str,
min(position) position,
element
from t,
xmltable(
'
for $i at $p in ora:tokenize($str,"-")
return element { "E" } { attribute id { $p },$i }'
passing str AS "str"
columns
position number path '@id',
element varchar2(20) path '.'
) x
group by str,
element
)
select str,
listagg(element,'-') within group(order by position) newstr
from r
group by str
/
STR NEWSTR
------------------------------------------------------------ -----------------------------------
1988-1997-2000-2013-1998-1965-1997-1899 1988-1997-2000-2013-1998-1965-1899
1998-2000-2000-1988-2000-1988 1998-2000-1988
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888
SQL>
SY.
|
|
|
Re: Remove string duplicates [message #598264 is a reply to message #598254] |
Fri, 11 October 2013 21:55 |
Solomon Yakobson
Messages: 3275 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Lalit Kumar B wrote on Fri, 11 October 2013 16:25Let's try to extend it a bit.
Exactly same wrong results since your REGEXP didn't change:
set serveroutput on;
DECLARE
dup_str VARCHAR2(100) := '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000';
BEGIN
DBMS_Output.Put_Line('Input duplicate string is - '||dup_str);
FOR i IN 1..length(dup_str)
LOOP
dup_str := RegExp_Replace(dup_str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3');
END LOOP;
DBMS_Output.Put_Line('output without duplicate is - '||dup_str);
END;
/
Input duplicate string is - 2000-2000-2000-1000-1000-1000-9999-8888-2000-2000
output without duplicate is - 2000-1000-9999-8888-2000
PL/SQL procedure successfully completed.
SQL>
WITH t AS
(SELECT '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' dup_str FROM dual
)
SELECT dup_str,
dup_str_removed
FROM t model dimension BY (0 dim)
measures(dup_str, dup_str dup_str_removed) rules iterate(50)
until (dup_str_removed[0] = previous(dup_str_removed[0]))
(dup_str_removed[0] = regexp_replace(dup_str_removed[0],'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3'))
/
DUP_STR DUP_STR_REMOVED
------------------------------------------------- -------------------------
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888-2000
SQL>
SY.
|
|
|
Re: Remove string duplicates [message #598298 is a reply to message #598264] |
Sat, 12 October 2013 07:59 |
Solomon Yakobson
Messages: 3275 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
If you want to use model and preserve the order:
with t as (
select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
select '1998-2000-2000-1988-2000-1988' from dual union all
select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
)
select str,
ltrim(newstr,'-') newstr
from t
model
partition by(str)
dimension by(0 d)
measures(
'-' || replace(str,'-','--') || '-' workarea,
cast(null as varchar2(4000)) element,
cast(null as varchar2(4000)) newstr
)
rules iterate(1e6) until(workarea[0] is null)
(
element[0] = substr(workarea[0],1,instr(workarea[0],'-',2)),
newstr[0] = newstr[0] || rtrim(element[0],'-'),
workarea[0] = replace(workarea[0],element[0])
)
/
STR NEWSTR
------------------------------------------------- ----------------------------------
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888
1988-1997-2000-2013-1998-1965-1997-1899 1988-1997-2000-2013-1998-1965-1899
1998-2000-2000-1988-2000-1988 1998-2000-1988
SQL>
SY.
P.S. Forgot to mention. I partitioned by str, but in general it is not right if duplicate str are allowed. We should partition by some unique column or by rowid, or just by rownum:
SQL> with t as (
2 select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
3 select '1998-2000-2000-1988-2000-1988' from dual union all
4 select '1998-2000-2000-1988-2000-1988' from dual union all
5 select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
6 )
7 select str,
8 ltrim(newstr,'-') newstr
9 from t
10 model
11 partition by(str)
12 dimension by(0 d)
13 measures(
14 '-' || replace(str,'-','--') || '-' workarea,
15 cast(null as varchar2(4000)) element,
16 cast(null as varchar2(4000)) newstr
17 )
18 rules iterate(10) until(workarea[0] is null)
19 (
20 element[0] = substr(workarea[0],1,instr(workarea[0],'-',2)),
21 newstr[0] = newstr[0] || rtrim(element[0],'-'),
22 workarea[0] = replace(workarea[0],element[0])
23 )
24 /
ERROR:
ORA-32638: Non unique addressing in MODEL dimensions
no rows selected
SQL> with t as (
2 select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
3 select '1998-2000-2000-1988-2000-1988' from dual union all
4 select '1998-2000-2000-1988-2000-1988' from dual union all
5 select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
6 )
7 select str,
8 ltrim(newstr,'-') newstr
9 from t
10 model
11 partition by(rownum r)
12 dimension by(0 d)
13 measures(
14 str,
15 '-' || replace(str,'-','--') || '-' workarea,
16 cast(null as varchar2(4000)) element,
17 cast(null as varchar2(4000)) newstr
18 )
19 rules iterate(1e6) until(workarea[0] is null)
20 (
21 element[0] = substr(workarea[0],1,instr(workarea[0],'-',2)),
22 newstr[0] = newstr[0] || rtrim(element[0],'-'),
23 workarea[0] = replace(workarea[0],element[0])
24 )
25 /
STR NEWSTR
------------------------------------------------- ---------------------------------------
1988-1997-2000-2013-1998-1965-1997-1899 1988-1997-2000-2013-1998-1965-1899
1998-2000-2000-1988-2000-1988 1998-2000-1988
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888
1998-2000-2000-1988-2000-1988 1998-2000-1988
SQL>
[Updated on: Sat, 12 October 2013 12:30] Report message to a moderator
|
|
|
Goto Forum:
Current Time: Tue Jun 25 00:27:47 CDT 2024
|