Oracle-L: Finding illegal UTF8 sequences
Does anyone have experience finding illegal UTF8 sequences and doing something about them?
We have a UTF8 database containing Japanese data. One customer appears to have randomly malformed data: when it is displayed, it shows up as garbage characters rather than Kanji.
Using the DUMP() function, I've found sequences where there appears to be, say, a valid trail byte with no preceding lead byte, a valid lead byte for a three-byte sequence with no trail bytes after it, and so on.
At least, I think that's what I've found.
I'm still in learning mode here, trying to figure out what I'm looking at and what to do about it.
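To make sense of DUMP() output like this, it helps to label each byte by its role in UTF-8. Here is a minimal Python sketch, assuming the value was dumped in hex with `DUMP(col, 16)`, which prints something like `Typ=1 Len=3: e6,97,a5`. The helper names `parse_dump` and `classify` are hypothetical, not part of any Oracle tool.

```python
def parse_dump(dump_text: str) -> bytes:
    """Turn DUMP(col, 16) text such as 'Typ=1 Len=3: e6,97,a5' into raw bytes.

    Assumes the hex byte list follows the first ': ' and is comma-separated.
    A NULL value (dumped as 'NULL') yields empty bytes.
    """
    _, _, hex_part = dump_text.partition(": ")
    return bytes(int(h, 16) for h in hex_part.split(",") if h.strip())


def classify(b: int) -> str:
    """Label a single byte by the role it can play in UTF-8."""
    if b < 0x80:
        return "ascii"    # 0xxxxxxx: a complete one-byte character
    if b < 0xC0:
        return "trail"    # 10xxxxxx: continuation byte, needs a lead byte before it
    if b < 0xC2:
        return "invalid"  # C0, C1 could only start overlong encodings
    if b < 0xE0:
        return "lead2"    # 110xxxxx: starts a two-byte sequence
    if b < 0xF0:
        return "lead3"    # 1110xxxx: starts a three-byte sequence (most Kanji)
    if b < 0xF5:
        return "lead4"    # 11110xxx: starts a four-byte sequence
    return "invalid"      # F5..FF never appear in UTF-8


if __name__ == "__main__":
    # A hypothetical dump: a stray trail byte 0x97, then a complete Kanji (E6 97 A5).
    for b in parse_dump("Typ=1 Len=4: 97,e6,97,a5"):
        print(f"{b:02x} {classify(b)}")
```

Reading the labels left to right makes the pattern visible: a well-formed character is one `lead3` followed by exactly two `trail` bytes, so a `trail` with no lead in front of it, or a `lead3` followed by something other than two trails, marks the damaged spot.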
This problem is isolated to one customer and may be the result of a data import that was done some time ago.
So, does anyone know of a utility that can find and print out illegal UTF8 sequences? Or will I have to hire someone to write one for me (I'm not smart enough to do that sort of thing myself)?
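A scanner that reports each malformed span and its byte offset is not much code. The sketch below (the function name `find_invalid_utf8` is hypothetical) leans on Python's strict UTF-8 decoder: each `UnicodeDecodeError` carries `start` and `end` offsets for the bad bytes, so the loop records them and resumes decoding just past them. It assumes you already have each value's raw bytes, for example from `UTL_RAW.CAST_TO_RAW(col)` or by parsing `DUMP(col, 16)` output. One caveat: Oracle's UTF8 character set (as opposed to AL32UTF8) stores characters above U+FFFF as surrogate pairs, which a strict decoder like Python's will flag; ordinary Kanji are in the BMP and are unaffected.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, offending_bytes) for every malformed UTF-8 span in data."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict mode raises on the first bad span
            break                       # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift by pos.
            start, end = pos + err.start, pos + err.end
            problems.append((start, data[start:end]))
            pos = end                   # resume just past the bad bytes


    return problems


if __name__ == "__main__":
    # Hypothetical value: stray trail byte, two good Kanji, then a truncated lead byte.
    sample = b"A\x97B" + "日本".encode("utf-8") + b"\xe6"
    for offset, bad in find_invalid_utf8(sample):
        print(f"offset {offset}: {bad.hex()}")
```

Re-decoding the tail after each error is quadratic in the worst case, but for individual column values that is negligible, and it keeps the byte-level rules in Python's decoder rather than in hand-written state-machine code.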
Thanks,
--Walt Weaver
Bozeman, Montana