Re: Characterset Question

From: Kim Berg Hansen <kibeha_at_gmail.com>
Date: Sat, 23 Mar 2024 15:19:31 +0100
Message-ID: <CA+S=qd2nmLSMRsfL8Kx2WesAw_VUK49G-dd-Qeeu6FYAks77MQ_at_mail.gmail.com>





DMU is the tool, yes. As an addendum to what Martin said, DMU can also be used to scan the database for any characters that are invalid according to the database characterset.

Invalid characters in strings can for example happen if the client NLS_LANG environment variable is set to the same characterset as the database characterset, but then the client sends strings that really are in another characterset. Those will not be converted, because the database believes that according to NLS_LANG the client is sending same characterset as the database wants to store it in, therefore no conversion is needed.

Be careful that your client NLS_LANG is set to the characterset that the strings has that the client is sending. If set correctly, conversion happens automatically on the way in and out of the database. If set wrong, so NLS_LANG characterset does not reflect the real characterset of the data sent by the client, you're asking for problems.

See an example of what can happen here:
https://www.kibeha.dk/2018/05/corrupting-characters-how-to-get.html - Even when the database is AL32UTF8 you can get invalid byte values in the strings.

Cheerio
/Kim

Regards

Kim Berg Hansen
Oracle ACE Director

Author of Practical Oracle SQL
<https://www.apress.com/gp/book/9781484256169> http://www.kibeha.dk
kibeha_at_kibeha.dk
_at_kibeha
<http://twitter.com/kibeha>

On Sat, Mar 23, 2024 at 10:13 AM Martin Berger <martin.a.berger_at_gmail.com> wrote:

>
> The tool to convert databases to Unicode characterset is called *Database
> Migration Assistant for Unicode* (DMU
> <https://www.oracle.com/de/database/technologies/appdev/oracle-database-migration-assistant.html>
> ).
> Previous there was CSSCAN and CSALTER, but they are long gone: *CSSCAN
> and CSALTER To Be Desupported after DB 11.2 (Doc ID 1418321.1
> <https://support.oracle.com/epmos/faces/DocContentDisplay?id=1418321.1>)*
>
> If you face any issues with characters wrongly inserted (e.g. you have a
> these hex values in the dump() result of a varchar2 column 0xC2 0xB0 -
> which is ° in UTF8 encoding of Unicode Character “°” (U+00B0
> <https://www.compart.com/en/unicode/U+00B0>) ) you can try to identify &
> fix this with the functions SYS_OP_CSCONVTEST()
> <https://berxblog.blogspot.com/2021/12/under-dmus-cover.html> and
> SYS_OP_CSCONV()
> <https://docs.oracle.com/en/database/oracle/dmu/19.1/dumag/ch5_advanced_topics.html#GUID-3E852E81-30FD-4413-83CC-D378DE01DAFA__GUID-B3D03649-C5FE-4873-8EF6-73D0C48242B1>
> .
>
> hth,
> Martin
>
> Am Fr., 22. März 2024 um 23:58 Uhr schrieb Mark W. Farnham <mwf_at_rsiz.com>:
>
>> If memory serves, the utility was csscan, and I **thought** it was
>> upgraded at some point to flag bytes not (supposed to be) in the source
>> database characterset.
>>
>>
>>
>> That’s probably moot for this case, since Scott identified that inserts
>> from informatica into the new server are the apparent problem, and as Clay
>> pointed out, checking the NLS_LANG (I think) on the client running
>> informatica should be checked.
>>
>>
>>
>> If the client and server nls language values versions are identical, I *
>> *think** no translation is done, which if Scott is correct they are both
>> AL32UTF8, then the translation is the no-op, which should both work
>> correctly and be infinitesimally faster.
>>
>>
>>
>> Scott, at your convenience please let us know how you make out.
>>
>>
>>
>> *From:* dimensional.dba_at_comcast.net [mailto:dimensional.dba_at_comcast.net]
>> *Sent:* Friday, March 22, 2024 6:03 PM
>> *To:* joncrisler_at_gmail.com; Clay.Jackson_at_quest.com
>> *Cc:* srcdco_at_rit.edu; 'Mark W. Farnham'; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> The tool works great all charactersets except for databases that are set
>> to us7ascii and have 8bit bytes too.
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
>> Behalf Of *Jon Crisler
>> *Sent:* Friday, March 22, 2024 2:22 PM
>> *To:* Clay.Jackson_at_quest.com
>> *Cc:* srcdco_at_rit.edu; Mark W. Farnham <mwf_at_rsiz.com>;
>> oracle-l_at_freelists.org
>> *Subject:* Re: Characterset Question
>>
>>
>>
>> There is a utility that is part of the Oracle characterset conversion
>> program, that can scan the existing DB and the proposed characterset. It
>> will report back on any problem tables / columns and if the result is
>> lossless, losey and/or info about manual cleanup. The exact name escapes
>> me, but check out Oracle DocIDs relating to converting from one
>> characterset to another and it should be mentioned. I will try to find it
>> as well.
>>
>>
>>
>> On Fri, Mar 22, 2024 at 4:38 PM Clay Jackson <dmarc-noreply_at_freelists.org>
>> wrote:
>>
>> Check the Oracle client NLS settings that the Informatica server is
>> using. 😊
>>
>>
>>
>>
>>
>> *Clay Jackson*
>>
>> Database Solutions Sales Engineer
>>
>> <https://www.quest.com/solutions/database-performance-monitoring/>
>>
>> clay.jackson_at_quest.com
>>
>> *office* 949-754-1203 *mobile* 425-802-9603
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
>> Behalf Of *Scott Canaan
>> *Sent:* Friday, March 22, 2024 11:54 AM
>> *To:* Mark W. Farnham <mwf_at_rsiz.com>; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> *CAUTION:* This email originated from outside of the organization. Do
>> not follow guidance, click links, or open attachments unless you recognize
>> the sender and know the content is safe.
>>
>>
>>
>> Interesting that the source is also AL32UTF8, but the data is transferred
>> via Informatica. My guess is that’s where the issue is.
>>
>>
>>
>> *Scott Canaan ‘88*
>>
>> *Sr Database Administrator *Information & Technology Services
>> Finance & Administration
>>
>>
>> *Rochester Institute of Technology*o: (585) 475-7886 | f: (585) 475-7520
>>
>> *srcdco_at_rit.edu <srcdco_at_rit.edu>* | c: (585) 339-8659
>>
>> *CONFIDENTIALITY NOTE*: The information transmitted, including
>> attachments, is intended only for the person(s) or entity to which it is
>> addressed and may contain confidential and/or privileged material. Any
>> review, retransmission, dissemination or other use of, or taking of any
>> action in reliance upon this information by persons or entities other than
>> the intended recipient is prohibited. If you received this in error, please
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>> *From:* Mark W. Farnham <mwf_at_rsiz.com>
>> *Sent:* Friday, March 22, 2024 2:50 PM
>> *To:* Scott Canaan <srcdco_at_rit.edu>; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> That should be a lossless conversion with the possible exception of
>> clob/blob.
>>
>>
>>
>> However, figuring out the appropriate NLS_PARAMETER values for the client
>> and the server have to be set right.
>>
>>
>>
>> How are you bringing data from Peoplesoft over to your new database?
>>
>>
>>
>> IF it is application to database, setting the client and server
>> parameters should take care of it.
>>
>>
>>
>> IF it is database (old) to database (new), then there is possibly some
>> data in the (old) database that is not actually US7ASCII and some cleanup
>> may be required.
>>
>>
>>
>>
>> https://docs.oracle.com/en/database/oracle/oracle-database/19/nlspg/supporting-multilingual-databases-with-unicode.html
>>
>>
>>
>> should give you a leg up, and there are more similar documents on Oracle.
>>
>>
>>
>> I’m a bit rusty on this, my friend, but I think that is about correct.
>>
>>
>>
>> mwf
>>
>>
>>
>>
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [
>> mailto:oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>] *On
>> Behalf Of *Scott Canaan ("srcdco")
>> *Sent:* Friday, March 22, 2024 2:23 PM
>> *To:* oracle-l_at_freelists.org
>> *Subject:* Characterset Question
>>
>>
>>
>> We are in the process of migrating all of our Oracle databases from Red
>> Hat 7 to Red Hat 8. In creating the new databases in Red Hat 8, we’ve been
>> going with the default characterset of AL32UTF8. Many of the old databases
>> are US7ASCII. We are having trouble with applications trying to insert
>> into text fields in the new database. In one case, bringing data from
>> Peoplesoft that has a degree symbol fails because that comes over as 2
>> bytes instead of one. I thought that AL32UTF8 should handle those
>> characters. What needs to be done to accommodate this?
>>
>>
>>
>> *Scott Canaan ‘88*
>>
>> *Sr Database Administrator *Information & Technology Services
>> Finance & Administration
>>
>>
>> *Rochester Institute of Technology*o: (585) 475-7886 | f: (585) 475-7520
>>
>> *srcdco_at_rit.edu <srcdco_at_rit.edu>* | c: (585) 339-8659
>>
>> *CONFIDENTIALITY NOTE*: The information transmitted, including
>> attachments, is intended only for the person(s) or entity to which it is
>> addressed and may contain confidential and/or privileged material. Any
>> review, retransmission, dissemination or other use of, or taking of any
>> action in reliance upon this information by persons or entities other than
>> the intended recipient is prohibited. If you received this in error, please
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>>
>
> --
> Martin Berger Oracle ♠ Alumnus
> martin.a.berger_at_gmail.com _at_martinberx <https://twitter.com/martinberx>
> ^∆x http://berxblog.blogspot.com
>



--
http://www.freelists.org/webpage/oracle-l


image001.jpg
(image/jpeg attachment: image001.jpg)

Received on Sat Mar 23 2024 - 15:19:31 CET

Original text of this message