OraFAQ Forum: SQL & PL/SQL » Remove string duplicates

Home » SQL & PL/SQL » SQL & PL/SQL » Remove string duplicates (11.2)

Show: Today's Messages :: Polls :: Message Navigator
E-mail to friend

Remove string duplicates [message #598210]

Fri, 11 October 2013 08:53

jj_white
Messages: 6
Registered: July 2013

Junior Member

Hello,
I need help in removing duplicates from a string that contains years and "-". Example: 1988-1997-2000-2013-1998-1965-1997-1899
I know this can be done in regular expressions but have no experience in this subject.

select REGEXP_REPLACE(.....) from dual;

Thank you

Report message to a moderator

Re: Remove string duplicates [message #598211 is a reply to message #598210]

Fri, 11 October 2013 08:54

Michel Cadot
Messages: 68664
Registered: March 2007
Location: Nanterre, France, http://...

Senior Member
Account Moderator

Quote:

I know this can be done in regular expressions

No it can't be.

Report message to a moderator

Re: Remove string duplicates [message #598219 is a reply to message #598210]

Fri, 11 October 2013 09:14

jj_white
Messages: 6
Registered: July 2013

Junior Member

Can you please help me ?

Report message to a moderator

Re: Remove string duplicates [message #598223 is a reply to message #598219]

Fri, 11 October 2013 10:07

BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal

Senior Member

https://forums.oracle.com/thread/2591690

Report message to a moderator

Re: Remove string duplicates [message #598226 is a reply to message #598219]

Fri, 11 October 2013 10:20

Michel Cadot
Messages: 68664
Registered: March 2007
Location: Nanterre, France, http://...

Senior Member
Account Moderator

SQL> with
  2    data as (select '-'||'1988-1997-2000-2013-1998-1965-1997-1899'||'-' val from dual),
  3    years as (
  4      select distinct
  5             substr(val,
  6                    instr(val, '-', 1, level),
  7                    instr(val, '-', 1, level+1)-instr(val, '-', 1, level)) yr
  8      from data
  9      connect by level <= length(val)-length(replace(val,'-'))-1
 10    )
 11  select substr(listagg(yr) within group (order by yr),2) res
 12  from years
 13  /
RES
---------------------------------------------------------------------------------------------
1899-1965-1988-1997-1998-2000-2013

Report message to a moderator

Re: Remove string duplicates [message #598227 is a reply to message #598226]

Fri, 11 October 2013 10:23

Michel Cadot
Messages: 68664
Registered: March 2007
Location: Nanterre, France, http://...

Senior Member
Account Moderator

Or if you really want to use regexp functions:

SQL> with
  2    data as (select '-'||'1988-1997-2000-2013-1998-1965-1997-1899'||'-' val from dual),
  3    years as (
  4      select distinct '-'||regexp_substr(val, '\d+', 1, level) yr
  5      from data
  6      connect by level <= length(val)-length(replace(val,'-'))-1
  7    )
  8  select substr(listagg(yr) within group (order by yr),2) res
  9  from years
 10  /
RES
------------------------------------------------------------------------------------------
1899-1965-1988-1997-1998-2000-2013

[Updated on: Fri, 11 October 2013 10:24]

Report message to a moderator

Re: Remove string duplicates [message #598241 is a reply to message #598211]

Fri, 11 October 2013 12:22

Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web

Senior Member

Michel Cadot wrote on Fri, 11 October 2013 19:24

Quote:
I know this can be done in regular expressions

No it can't be.

Why No? Assuming a no means not possible. If you have something else, please pardon me.

Report message to a moderator

Re: Remove string duplicates [message #598244 is a reply to message #598241]

Fri, 11 October 2013 12:46

pablolee
Messages: 2882
Registered: May 2007
Location: Scotland

Senior Member

Lalit, your post doesn't really make sense. Michel has said that something cannot be done. You appear to be asking him to show that it cannot be done. You cannot prove a negative. If you disagree with his statement, the the onus would be on you to show that this is not the case. Can you show this?

Report message to a moderator

Re: Remove string duplicates [message #598249 is a reply to message #598244]

Fri, 11 October 2013 14:44

Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web

Senior Member

pablolee wrote on Fri, 11 October 2013 23:16

Lalit, your post doesn't really make sense. Michel has said that something cannot be done. You appear to be asking him to show that it cannot be done. You cannot prove a negative. If you disagree with his statement, the the onus would be on you to show that this is not the case. Can you show this?

Sorry, you misunderstood my post. I was just awestruck when I saw NO from Michel, since I consider him just the BEST! At first instance I genuinely thought it could be done using REGEXP.

Of course, the onus is now on me, let me try level best with the example input given by OP-

SQL> SELECT '1988-1997-2000-2013-1998-1965-1997-1899' dup_str, REGEXP_REPLACE('1
988-1997-2000-2013-1998-1965-1997-1899','(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3') dup
_str_removed
  2  FROM dual;

DUP_STR                                 DUP_STR_REMOVED
--------------------------------------- ----------------------------------
1988-1997-2000-2013-1998-1965-1997-1899 1988-1997-2000-2013-1998-1965-1899

Above demo is without any iterations. For further requirements, using iteration techniques a lot of variations can be done.

Regards,
Lalit

Report message to a moderator

Re: Remove string duplicates [message #598250 is a reply to message #598210]

Fri, 11 October 2013 14:50

Solomon Yakobson
Messages: 3275
Registered: January 2010
Location: Connecticut, USA

Senior Member

jj_white wrote on Fri, 11 October 2013 09:53

I know this can be done in regular expressions

You will have to split, remove duplicates and aggregate (concatenate) again. Regular expressions will help you to identify how to split, but that's about it. You'll still need to use other techniques to do the split and aggregate. Michel showed you one way (keep in mind, it will need some adjustments if you want to apply it to multi-rows table). Another way is XML:

with t as (
           select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual
          )
select  str,
        newstr
  from  t,
        xmltable(
                 'string-join(distinct-values(ora:tokenize($str,"-")),"-")'
                 passing '-' || t.str as "str"
                 columns newstr varchar2(2000) path '.'
                ) x
/

STR                                     NEWSTR
--------------------------------------- -----------------------------------
1988-1997-2000-2013-1998-1965-1997-1899 1899-1965-1988-1997-1998-2000-2013

SCOTT@orcl >

SY.

Report message to a moderator

Re: Remove string duplicates [message #598252 is a reply to message #598249]

Fri, 11 October 2013 15:00

Solomon Yakobson
Messages: 3275
Registered: January 2010
Location: Connecticut, USA

Senior Member

Lalit Kumar B wrote on Fri, 11 October 2013 15:44

Above demo is without any iterations.

Your solution "works" pretty-much just for that single case:

SCOTT@orcl > with t as (
  2             select '1988-2000-2000-1998' str from dual
  3            )
  4  SELECT  str,
  5          REGEXP_REPLACE(str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3') newstr
  6    FROM  t
  7  /

STR                 NEWSTR
------------------- -----------------------------------
1988-2000-2000-1998 1988-2000-1998

SCOTT@orcl > with t as (
  2             select '2000-2000-2000' str from dual
  3            )
  4  SELECT  str,
  5          REGEXP_REPLACE(str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3') newstr
  6    FROM  t
  7  /

STR            NEWSTR
-------------- -----------------------------------
2000-2000-2000 2000-2000

SCOTT@orcl >

SY.

Report message to a moderator

Re: Remove string duplicates [message #598254 is a reply to message #598252]

Fri, 11 October 2013 15:25

Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web

Senior Member

Solomon Yakobson wrote on Sat, 12 October 2013 01:30

Your solution "works" pretty-much just for that single case:

Yes, as I said the logic used in the demo lacks iteration. It was a simple SQL just to suffice the example input given by OP.

Let's try to extend it a bit.

One easier way without a complex SQL would be a FOR LOOP(but it is of course not a better way)-

SQL> set serveroutput on;
SQL> DECLARE
  2    dup_str VARCHAR2(100) := '2000-2000-2000-1000-1000-1000-9999-8888';
  3  BEGIN
  4    DBMS_Output.Put_Line('Input duplicate string is - '||dup_str);
  5    FOR i IN 1..length(dup_str)
  6    LOOP
  7      dup_str := RegExp_Replace(dup_str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3');

  8    END LOOP;
  9    DBMS_Output.Put_Line('output without duplicate is - '||dup_str);
 10  END;
 11  /
Input duplicate string is - 2000-2000-2000-1000-1000-1000-9999-8888
output without duplicate is - 2000-1000-9999-8888

PL/SQL procedure successfully completed.

10g onwards, MODEL clause could be used. I really struggled with this, but this is what I managed -

SQL> WITH t AS
  2    (SELECT '2000-2000-2000-1000-1000-1000-9999-8888' dup_str FROM dual
  3    )
  4  SELECT dup_str,
  5    dup_str_removed
  6  FROM t model dimension BY (0 dim)
  7     measures(dup_str, dup_str dup_str_removed) rules iterate(50)
  8     until (dup_str_removed[0] = previous(dup_str_removed[0]))
  9           (dup_str_removed[0] = regexp_replace(dup_str_removed[0],'(^|\-)([^
-]+\-)(.*\-)?\2','\1\2\3'))
 10  /

DUP_STR                                 DUP_STR_REMOVED
--------------------------------------- ---------------------------------------
2000-2000-2000-1000-1000-1000-9999-8888 2000-1000-9999-8888

Regards,
Lalit

Report message to a moderator

Re: Remove string duplicates [message #598263 is a reply to message #598250]

Fri, 11 October 2013 21:50

Solomon Yakobson
Messages: 3275
Registered: January 2010
Location: Connecticut, USA

Senior Member

[quote title=Lalit Kumar B wrote on Fri, 11 October 2013 16:00But, there seems to be a miss in the order of the elements after aggregation.[/quote]

XML distinct-values implicitly orders. If order is important:

with t as (
           select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
           select '1998-2000-2000-1988-2000-1988' from dual union all
           select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
          ),
     r as (
           select  str,
                   min(position) position,
                   element
             from  t,
                   xmltable(
                            '
                              for $i at $p in ora:tokenize($str,"-")
                              return element { "E" } { attribute id { $p },$i }'
                            passing str AS "str"
                            columns
                              position number path '@id',
                              element varchar2(20) path '.'
                           ) x
             group by str,
                      element
          )
select  str,
        listagg(element,'-') within group(order by position) newstr
  from  r
  group by str
/

STR                                                          NEWSTR
------------------------------------------------------------ -----------------------------------
1988-1997-2000-2013-1998-1965-1997-1899                      1988-1997-2000-2013-1998-1965-1899
1998-2000-2000-1988-2000-1988                                1998-2000-1988
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000            2000-1000-9999-8888

SQL>

SY.

Report message to a moderator

Re: Remove string duplicates [message #598264 is a reply to message #598254]

Fri, 11 October 2013 21:55

Solomon Yakobson
Messages: 3275
Registered: January 2010
Location: Connecticut, USA

Senior Member

Lalit Kumar B wrote on Fri, 11 October 2013 16:25

Let's try to extend it a bit.

Exactly same wrong results since your REGEXP didn't change:

set serveroutput on;
DECLARE
  dup_str VARCHAR2(100) := '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000';
BEGIN
  DBMS_Output.Put_Line('Input duplicate string is - '||dup_str);
  FOR i IN 1..length(dup_str)
  LOOP
    dup_str := RegExp_Replace(dup_str,'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3');
  END LOOP;
  DBMS_Output.Put_Line('output without duplicate is - '||dup_str);
END;
/
Input duplicate string is - 2000-2000-2000-1000-1000-1000-9999-8888-2000-2000
output without duplicate is - 2000-1000-9999-8888-2000

PL/SQL procedure successfully completed.

SQL>

WITH t AS
  (SELECT '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' dup_str FROM dual
  )
SELECT dup_str,
  dup_str_removed
FROM t model dimension BY (0 dim)
   measures(dup_str, dup_str dup_str_removed) rules iterate(50)
   until (dup_str_removed[0] = previous(dup_str_removed[0]))
         (dup_str_removed[0] = regexp_replace(dup_str_removed[0],'(^|\-)([^-]+\-)(.*\-)?\2','\1\2\3'))
/

DUP_STR                                           DUP_STR_REMOVED
------------------------------------------------- -------------------------
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888-2000

SQL>

SY.

Report message to a moderator

Re: Remove string duplicates [message #598298 is a reply to message #598264]

Sat, 12 October 2013 07:59

Solomon Yakobson
Messages: 3275
Registered: January 2010
Location: Connecticut, USA

Senior Member

If you want to use model and preserve the order:

with t as (
           select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
           select '1998-2000-2000-1988-2000-1988' from dual union all
           select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
          )
select  str,
        ltrim(newstr,'-') newstr
  from  t
  model
    partition by(str)
    dimension by(0 d)
    measures(
             '-' || replace(str,'-','--') || '-' workarea,
             cast(null as varchar2(4000)) element,
             cast(null as varchar2(4000)) newstr
            )
    rules iterate(1e6) until(workarea[0] is null)
      (
       element[0]  = substr(workarea[0],1,instr(workarea[0],'-',2)),
       newstr[0]   = newstr[0] || rtrim(element[0],'-'),
       workarea[0] = replace(workarea[0],element[0])
      )
/

STR                                               NEWSTR
------------------------------------------------- ----------------------------------
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888
1988-1997-2000-2013-1998-1965-1997-1899           1988-1997-2000-2013-1998-1965-1899
1998-2000-2000-1988-2000-1988                     1998-2000-1988

SQL>

SY.
P.S. Forgot to mention. I partitioned by str, but in general it is not right if duplicate str are allowed. We should partition by some unique column or by rowid, or just by rownum:

SQL> with t as (
  2             select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
  3             select '1998-2000-2000-1988-2000-1988' from dual union all
  4             select '1998-2000-2000-1988-2000-1988' from dual union all
  5             select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
  6            )
  7  select  str,
  8          ltrim(newstr,'-') newstr
  9    from  t
 10    model
 11      partition by(str)
 12      dimension by(0 d)
 13      measures(
 14               '-' || replace(str,'-','--') || '-' workarea,
 15               cast(null as varchar2(4000)) element,
 16               cast(null as varchar2(4000)) newstr
 17              )
 18      rules iterate(10) until(workarea[0] is null)
 19        (
 20         element[0]  = substr(workarea[0],1,instr(workarea[0],'-',2)),
 21         newstr[0]   = newstr[0] || rtrim(element[0],'-'),
 22         workarea[0] = replace(workarea[0],element[0])
 23        )
 24  /
ERROR:
ORA-32638: Non unique addressing in MODEL dimensions



no rows selected

SQL> with t as (
  2             select '1988-1997-2000-2013-1998-1965-1997-1899' str from dual union all
  3             select '1998-2000-2000-1988-2000-1988' from dual union all
  4             select '1998-2000-2000-1988-2000-1988' from dual union all
  5             select '2000-2000-2000-1000-1000-1000-9999-8888-2000-2000' from dual
  6            )
  7  select  str,
  8          ltrim(newstr,'-') newstr
  9    from  t
 10    model
 11      partition by(rownum r)
 12      dimension by(0 d)
 13      measures(
 14               str,
 15               '-' || replace(str,'-','--') || '-' workarea,
 16               cast(null as varchar2(4000)) element,
 17               cast(null as varchar2(4000)) newstr
 18              )
 19      rules iterate(1e6) until(workarea[0] is null)
 20        (
 21         element[0]  = substr(workarea[0],1,instr(workarea[0],'-',2)),
 22         newstr[0]   = newstr[0] || rtrim(element[0],'-'),
 23         workarea[0] = replace(workarea[0],element[0])
 24        )
 25  /

STR                                               NEWSTR
------------------------------------------------- ---------------------------------------
1988-1997-2000-2013-1998-1965-1997-1899           1988-1997-2000-2013-1998-1965-1899
1998-2000-2000-1988-2000-1988                     1998-2000-1988
2000-2000-2000-1000-1000-1000-9999-8888-2000-2000 2000-1000-9999-8888
1998-2000-2000-1988-2000-1988                     1998-2000-1988

SQL>

[Updated on: Sat, 12 October 2013 12:30]

Report message to a moderator

Previous Topic:	while running a Stored procedure getting error
Next Topic:	Why the second query is more efficient?

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Tue Jun 25 00:27:47 CDT 2024