Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Mailing Lists -> Oracle-L -> RE: Find an unprintable character inside a column....
perl -pe "s/[[:cntrl:]]//g" < file_with_ctrl_chars >
file_without_control_chars
"Robson, Peter" <pgro_at_bgs.ac.uk>
Sent by: ml-errors_at_fatcity.com
10/13/2003 03:49 AM
Please respond to ORACLE-L
To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com> cc: Subject: RE: Find an unprintable character inside a column....
Just a brief foot-note to this discussion.
The reason I selected the data, rather than attempted an automatic
correction, was that sometimes two words would be separated by ascii(10)
(forcing a line throw), and sometimes the corruption would appear as an
additional spurious character in a line. In the former case, one wants to
replace ascii(10) with a space (ascii 32), but in the latter case one
simply
wants to remove the corrupt data character.
OK, yes, I could do it with decode, and I'm sure Jared will have the appropriate Perl script out before I finish typing this.... But but but...
peter
edinburgh
> -----Original Message-----
> From: Stephane Faroult [mailto:sfaroult_at_oriole.com]
> Sent: Friday, October 10, 2003 10:54 PM
> To: Multiple recipients of list ORACLE-L
> Subject: Re: Find an unprintable character inside a column....
>
>
> Jared.Still_at_radisys.com wrote:
> >
> > I played with this a bit.
> >
> > First, I created some test data with one column corrupted with a
> > single random character
> > of 0-31 replacing a random char in that column 20% of the
> rows of the
> > table.
> >
> > Peter's function correctly found all of the rows in 7.5 seconds.
> >
> > Stephane's function ran in 3.5 seconds, but didn't find any of
> > the rows. I didn't attempt to correct the code.
> >
> > Then I tried a function based on owa_pattern.regex. My initial
> > attempts
> > didn't return the correct rows, as the regex pattern needed some
> > tuning.
> >
> > I didn't attempt to fix it, as it was woefully slow, about
> 30 seconds.
> >
> > Regex in the WHERE clause in 10g will be nice.
> >
> > Jared
> >
> > "Stephane Faroult"
> > <sfaroult_at_oriolecorp.com> To: Multiple
> > Sent by: ml-errors_at_fatcity.com recipients of list ORACLE-L
> > <ORACLE-L_at_fatcity.com>
> > 10/10/2003 07:09 AM cc:
> > Please respond to ORACLE-L Subject: RE: RE:
> > RE: Find an unprintable
> character
> > inside a column....
> >
> > >Some people have requested this code, so I thought
> > >you might as well all
> > >have the chance to pick it to bits... Its a
> > >function called BAD_ASCII, and
> > >it hunts out for any ascii characters with an ascii
> > >value of less than 32 in
> > >a specified field. (Acknowledgments to my colleague
> > >Keith Holmes for help
> > >with this code.)
> > >
> > >Use it as follows:
> > >
> > >Where a field called DATA in a table TABLE_1 may
> > >contain an ascci character
> > >with a value less than 32 (ie a non-printing
> > >character), the following SQL
> > >will find the row in question:
> > >
> > >select rowid,DATA,dump(DATA) from TABLE_1
> > >where BAD_ASCII(DATA) > 0;
> > >
> > >You could use the PK of the table instead of rowid,
> > >of course. You will also
> > >note that I select the DATA field in both normal
> > >and ascii 'dump' mode, the
> > >better to locate where the corruption is located.
> > >
> > >peter
> > >edinburgh
> > >...................................
> > >
> > >Source as follows:
> > >
> > >
> > >Function BAD_ASCII
> > > (V_Text in char)
> > > return number
> > >is
> > > V_Int number;
> > > V_Count number;
> > >begin
> > >--
> > >V_Int := 0;
> > >V_Count := 1;
> > >while V_Count<=length(rtrim(V_Text)) and V_Int=0
> > > loop
> > > if ascii(substr(V_Text, V_Count, 1))<32 then
> > > V_Int := V_Count;
> > > end if;
> > > V_Count := V_Count + 1;
> > >end loop;
> > >return V_Int;
> > >--
> > >exception
> > > when others then
> > > return -1;
> > >end BAD_ASCII;
> > >/
> > >
> >
> > Peter,
> >
> > I think that you can make this code 25% faster when the data is
> > clean (which hopefully is the general case) by using 'replace', more
> > efficient than a PL/SQL loop, to check whether you have some rubbish
> > (sort of). It will not tell you where the bad character is,
> however -
> > which means that then you can loop to look for it.
> >
> > Here is what I would suggest :
> >
> > create or replace Function BAD_ASCII (V_Text in char)
> > return number
> > is
> > V_Int number;
> > V_Count number;
> > begin
> > if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)||
> > chr(4)||chr(5)||chr(6)||chr(7)||
> > chr(8)||chr(9)||chr(10)||chr(11)||
> > chr(12)||chr(13)||chr(14)||chr(15)||
> > chr(16)||chr(17)||chr(18)||chr(19)||
> > chr(20)||chr(21)||chr(22)||chr(23)||
> > chr(24)||chr(25)||chr(26)||chr(27)||
> > chr(28)||chr(29)||chr(30)||chr(31),
> > '--------------------------------')
> > = V_text)
> > then
> > return 0;
> > else
> > V_Int := 0;
> > V_Count := 1;
> > while V_Count<=length(rtrim(V_Text)) and V_Int=0
> > loop
> > if ascii(substr(V_Text, V_Count, 1))<32 then
> > V_Int := V_Count;
> > end if;
> > V_Count := V_Count + 1;
> > end loop;
> > return V_Int;
> > end if;
> > --
> > exception
> > when others then
> > return -1;
> > end BAD_ASCII;
> > /
>
>
> Jared, you're the scourge of people who just write things out
> of the top
> of their head and don't test them thoroughly :-). I had made my usual
> mistake of using REPLACE instead of TRANSLATE. Just tried it with
> 'regular' data, since this is the only case where it can be
> faster that
> Peter's routine.
> Works like Peter's routine with TRANSLATE, only somewhat faster.
>
>
> Ooops again.
>
> --
> Regards,
>
> Stephane Faroult
> Oriole Software
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.net
> --
> Author: Stephane Faroult
> INET: sfaroult_at_oriole.com
>
> Fat City Network Services -- 858-538-5051 http://www.fatcity.com
> San Diego, California -- Mailing list and web hosting services
> ---------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from). You may
> also send the HELP command for other information (like subscribing).
>
BGS. . http://www.bgs.ac.uk *********************************************************************
-- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Robson, Peter INET: pgro_at_bgs.ac.uk Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California -- Mailing list and web hosting services --------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: INET: Jared.Still_at_radisys.com Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California -- Mailing list and web hosting services --------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).Received on Mon Oct 13 2003 - 13:14:33 CDT