Home » SQL & PL/SQL » SQL & PL/SQL » How to verify if string contains unwanted character (Oracle Db 11g and later)
How to verify if string contains unwanted character [message #665448] |
Wed, 06 September 2017 05:08  |
ator
Messages: 43 Registered: March 2009
|
Member |
|
|
Hi,
before writing here i've been looking all over but i couldn't find how to resolve my problem.
I have to verify if in a give string there are character outside a given list.
I tried to user regular expression but it doesn't seem to work as follows.
regexp_like('test letter ã','[^a-zA-Z0-9\.,;:''\+\-\(\)\?\*\[\]\{\}\\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ]')
In this case it should return false because the letter ã is not in the given list and i can't do the opposite match because i don't know which character they would insert.
I hope you can help me with this problem, otherwise i had to make a list of unwanted characters and match it with it.
Thank you in advance,
Stefano
|
|
|
|
|
|
|
Re: How to verify if string contains unwanted character [message #665587 is a reply to message #665456] |
Mon, 11 September 2017 10:29   |
ator
Messages: 43 Registered: March 2009
|
Member |
|
|
Hi,
sorry for the late reply, i wrote in this secton because is a pl/sql function of oracle 11g and if i got the wrong section i apoligize.
Quote:
Hi,
Just a few remarks:
The regular expression pattern does not include a space (" "), so it always matching.
Additionally, escaping inside a bracket expression is not taking any effect. On the other hand, the characters "-" and "]" have to be put into special place.
See details here: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended
So, in the end you may end with mask like this (I have not included the "ã" character , as it was not in the original one):
'[^] a-zA-Z0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]'
(hopefully it recognizes all included characters)
I tried with the string but i did not succeded, so i'll do a translate and keep updating the character needed to be replaced and the error will occur.
If I find a better solution i will let you know.
Thanks for your help.
Stefano
|
|
|
Re: How to verify if string contains unwanted character [message #665738 is a reply to message #665587] |
Tue, 19 September 2017 05:08  |
 |
quirks
Messages: 85 Registered: October 2014
|
Member |
|
|
Hello @ator:
I thought, that @flyboy s statement should be working, but it did not (as you figured out by yourself).
So I played a little bit around and figured that ...
SELECT CASE WHEN REGEXP_LIKE('ã', '[a-zA-Z]') THEN 'true' ELSE 'false' END STR_MATCHES
FROM DUAL;
... 'ã' is part of the characters defined by '[a-z]'.
As it shows, as long as you are using a ASCII character set in your database the '[a-z]' works as expected:
WITH
CHARACTERS
AS
(SELECT TO_NUMBER(TO_CHAR(LEVEL - 1)) AS DEC_NUM
FROM DUAL
CONNECT BY LEVEL <= POWER(2, 7))
SELECT DEC_NUM
,CASE WHEN DEC_NUM < 128 THEN CHR(DEC_NUM) ELSE NULL END ASCII_CHAR
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:graph:]]') THEN 'x' END IS_GRAPH
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:blank:]]') THEN 'x' END IS_BLANK
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:alnum:]]') THEN 'x' END IS_ALNUM
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:alpha:]]') THEN 'x' END IS_ALPHA
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[a-z-Z]') THEN 'x' END A_Z_TEST
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:digit:]]') THEN 'x' END IS_DIGIT
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:cntrl:]]') THEN 'x' END IS_CNTRL
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:lower:]]') THEN 'x' END IS_LOWER
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:upper:]]') THEN 'x' END IS_UPPER
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:print:]]') THEN 'x' END IS_PRINT
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:punct:]]') THEN 'x' END IS_PUNCT
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:space:]]') THEN 'x' END IS_SPACE
,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:xdigit:]]') THEN 'x' END IS_XDIGIT
FROM CHARACTERS
But when your character set is bigger (as it is in nearly all oracle installations) 'ã' is part of the [a_z] range.
WITH
CHARACTERS
AS
(SELECT TO_NUMBER(TO_CHAR(LEVEL - 1)) AS DEC_NUM
,REGEXP_REPLACE(REGEXP_REPLACE(TO_CHAR(LEVEL - 1, 'XXXX'), '^ ', '\'), ' ', '0') AS HEX_NUM
FROM DUAL
CONNECT BY LEVEL <= POWER(2, 16))
SELECT DEC_NUM
,CASE WHEN DEC_NUM < 128 THEN CHR(DEC_NUM) ELSE NULL END ASCII_CHAR
,HEX_NUM
,UNISTR(HEX_NUM) UTF8_CHAR
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:graph:]]') THEN 'x' END IS_GRAPH
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:blank:]]') THEN 'x' END IS_BLANK
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:alnum:]]') THEN 'x' END IS_ALNUM
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:alpha:]]') THEN 'x' END IS_ALPHA
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[a-z-Z]') THEN 'x' END A_Z_TEST
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:digit:]]') THEN 'x' END IS_DIGIT
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:cntrl:]]') THEN 'x' END IS_CNTRL
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:lower:]]') THEN 'x' END IS_LOWER
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:upper:]]') THEN 'x' END IS_UPPER
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:print:]]') THEN 'x' END IS_PRINT
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:punct:]]') THEN 'x' END IS_PUNCT
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:space:]]') THEN 'x' END IS_SPACE
,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:xdigit:]]') THEN 'x' END IS_XDIGIT
FROM CHARACTERS
WHERE
UNISTR(HEX_NUM) = 'ã' AND
(
DECODE(UNISTR(HEX_NUM), REGEXP_SUBSTR(UNISTR(HEX_NUM), '[a-z-Z]'), 1) = 1 OR
DECODE(UNISTR(HEX_NUM), REGEXP_SUBSTR(UNISTR(HEX_NUM), '[[:alpha:]]'), 1) = 1
);
You can find out which character set you are using with this statement:
SELECT *
FROM NLS_DATABASE_PARAMETERS
WHERE PARAMETER LIKE 'NLS_%CHARACTERSET';
'NLS_CHARACTERSET' is used in CHAR and VARCHAR columns.
'NLS_NCHAR_CHARACTERSET' in NCHAR and NVARCHAR columns (which can store the full Unicode character set).
In conclusion you can not use [a-z] but must enlist each character by itself:
SELECT CASE WHEN REGEXP_LIKE('ã', '[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]') THEN 'true' ELSE 'false' END STR_MATCHES
FROM DUAL;
To bring @flyboy s statement to life you just need to replace the range definitions by their individual values:
SELECT CASE
WHEN REGEXP_LIKE(
'test letter ã'
,'[^] abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]'
)
THEN
'true'
ELSE
'false'
END
STR_MATCHES
FROM DUAL;
|
|
|
Goto Forum:
Current Time: Sun Apr 27 22:06:44 CDT 2025
|