Re: Characterset Question

From: Luis Claudio Dias dos Santos <lsantos_at_pobox.com>
Date: Tue, 26 Mar 2024 11:52:59 -0300
Message-ID: <CAPWdmV_huCA4=VJ+8tmg_rty6PbE7Wmq5=vniTfaQd+P=pj0HQ_at_mail.gmail.com>





After RDBMS 12.2 you can have your CDB with AL32UTF8 (the definitive default) but a PDB with any charset you want, even paleozoic US7ASCII.

https://oracle-base.com/articles/12c/multitenant-pdb-character-sets-12cr2

<https://about.me/lcdsantos?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=edit_panel&utm_content=thumb> Luis Santos
about.me/lcdsantos
<https://about.me/lcdsantos?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=edit_panel&utm_content=thumb>

Em seg., 25 de mar. de 2024 às 12:31, Kim Berg Hansen <kibeha_at_gmail.com> escreveu:

> I thought so, but wasn't sure.
>
> You should then change the columns to CHAR semantics in source database
> before doing expdb.
> Can be done with a script performing ALTER TABLE on the columns that need
> it, or you could use the DMU on the source database - it has capability to
> automatically change all VARCHAR2 to CHAR semantics.
> (Scanning the source databases with DMU before doing expdp also has the
> advantage it'll tell you if you have invalid characters in the source - ie.
> something that's not really US7ASCII.)
>
> Cheerio
> /Kim
>
>
> On Mon, Mar 25, 2024 at 4:20 PM Scott Canaan <srcdco_at_rit.edu> wrote:
>
>> Ok, looking at Oracle support documents, the only way to make this work
>> via impdp is to pre-create the tables with CHAR instead of BYTE. According
>> to the documentation, impdp always uses the source definition when creating
>> tables, no matter what the NLS parameters are on the receiving database.
>>
>>
>>
>> *Scott Canaan ‘88*
>>
>> *Sr Database Administrator *Information & Technology Services
>> Finance & Administration
>>
>>
>> *Rochester Institute of Technology *o: (585) 475-7886 | f: (585) 475-7520
>>
>> *srcdco_at_rit.edu <srcdco_at_rit.edu>* | c: (585) 339-8659
>>
>> *CONFIDENTIALITY NOTE*: The information transmitted, including
>> attachments, is intended only for the person(s) or entity to which it is
>> addressed and may contain confidential and/or privileged material. Any
>> review, retransmission, dissemination or other use of, or taking of any
>> action in reliance upon this information by persons or entities other than
>> the intended recipient is prohibited. If you received this in error, please
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>> *From:* Kim Berg Hansen <kibeha_at_gmail.com>
>> *Sent:* Monday, March 25, 2024 10:55 AM
>> *To:* Scott Canaan <srcdco_at_rit.edu>
>> *Cc:* Mark W. Farnham <mwf_at_rsiz.com>; dimensional.dba_at_comcast.net;
>> joncrisler_at_gmail.com; Clay.Jackson_at_quest.com; oracle-l_at_freelists.org
>> *Subject:* Re: Characterset Question
>>
>>
>>
>> When you use datapump, it's not an issue with NLS_LANG. Datapump is
>> taking care of that.
>>
>>
>>
>> Your problem with the migration of US7ASCII databases to AL32UTF8 is, as
>> you say, that some characters that are stored in 1 byte in US7ASCII, they
>> take up 2 bytes when stored in AL32UTF8.
>>
>> So when a VARCHAR2 column has a length of say 20 *bytes*, it can contain
>> anywhere from 5 to 20 *characters *of UTF8, depending on the byte-length
>> of each character (between 1 and 4.)
>>
>> In UTF8 database, instead of defining character column length in bytes,
>> they should be defined in characters.
>>
>>
>>
>> The fix for this is to change your columns from for example VARCHAR2(20)
>> to VARCHAR2(20 *CHAR*).
>>
>> When the column is defined as VARCHAR2(20 *CHAR*), it can contain 20
>> *characters*, no matter if they use 1 or 2 or 3 or 4 bytes of storage.
>>
>>
>>
>> Basically for every VARCHAR2 column in the database, do an ALTER TABLE
>> MODIFY (<colname> VARCHAR2(<length> *CHAR*));
>>
>> (The DMU can also help to do this - change all columns to use CHAR
>> semantics instead of BYTE semantics.)
>>
>>
>>
>> I think you'd need to do this change to CHAR semantics in the source
>> databases, so that then impdp creates the tables, they'll be created
>> correctly.
>>
>> I don't think it'll work to try and force the impdp to create the tables
>> with CHAR semantics by setting target database parameter
>> NLS_LENGTH_SEMANTICS to CHAR before doing the impdp, but maybe give it a
>> try on a small testcase?
>>
>>
>>
>>
>>
>> Cheerio
>>
>> /Kim
>>
>>
>>
>>
>>
>> On Mon, Mar 25, 2024 at 3:40 PM Scott Canaan <dmarc-noreply_at_freelists.org>
>> wrote:
>>
>> Catching up on these messages.
>>
>>
>>
>> The characterset of the source (PeopleSoft) database is AL32UTF8. The
>> characterset of the ultimate destination (Data Warehouse) database is
>> AL32UTF8. In between them is Informatica. That is running in Windows
>> using a SQL Server database. It used to be Oracle, but my previous boss
>> decided that he wanted it using SQL Server, so he had it converted. I
>> don’t have access to the Informatica app server. I suspect that’s where
>> the issue is.
>>
>>
>>
>> Another related issue is that we are in the process of migrating from Red
>> Hat 7 to Red Hat 8. Most of the databases in Red Hat 7 are US7ASCII, but
>> the new databases in Red Hat 8 are AL32UTF8, as it is the default now. If
>> there is a “special” character in the US7ASCII database, when it is
>> datapumped into the Red Hat 8 database, it takes 2 bytes instead of one,
>> which sometimes causes the import to error with an “ORA-12899: value too
>> large for column” error. I made sure the environment running the impdp
>> command (over the network) has the NLS_LANG set to
>> AMERICAN_AMERICA.AL32UTF8. I’m not sure how to fix that without going
>> backward to US7ASCII, which I don’t want to do.
>>
>>
>>
>>
>> *Scott Canaan ‘88 *
>> *Sr Database Administrator *Information & Technology Services
>> Finance & Administration
>>
>>
>> *Rochester Institute of Technology *o: (585) 475-7886 | f: (585) 475-7520
>>
>> *srcdco_at_rit.edu <srcdco_at_rit.edu>* | c: (585) 339-8659
>>
>> *CONFIDENTIALITY NOTE*: The information transmitted, including
>> attachments, is intended only for the person(s) or entity to which it is
>> addressed and may contain confidential and/or privileged material. Any
>> review, retransmission, dissemination or other use of, or taking of any
>> action in reliance upon this information by persons or entities other than
>> the intended recipient is prohibited. If you received this in error, please
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>> *From:* Mark W. Farnham <mwf_at_rsiz.com>
>> *Sent:* Friday, March 22, 2024 6:56 PM
>> *To:* dimensional.dba_at_comcast.net; joncrisler_at_gmail.com;
>> Clay.Jackson_at_quest.com
>> *Cc:* Scott Canaan <srcdco_at_rit.edu>; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> If memory serves, the utility was csscan, and I **thought** it was
>> upgraded at some point to flag bytes not (supposed to be) in the source
>> database characterset.
>>
>>
>>
>> That’s probably moot for this case, since Scott identified that inserts
>> from informatica into the new server are the apparent problem, and as Clay
>> pointed out, checking the NLS_LANG (I think) on the client running
>> informatica should be checked.
>>
>>
>>
>> If the client and server nls language values versions are identical, I *
>> *think** no translation is done, which if Scott is correct they are both
>> AL32UTF8, then the translation is the no-op, which should both work
>> correctly and be infinitesimally faster.
>>
>>
>>
>> Scott, at your convenience please let us know how you make out.
>>
>>
>>
>> *From:* dimensional.dba_at_comcast.net [mailto:dimensional.dba_at_comcast.net
>> <dimensional.dba_at_comcast.net>]
>> *Sent:* Friday, March 22, 2024 6:03 PM
>> *To:* joncrisler_at_gmail.com; Clay.Jackson_at_quest.com
>> *Cc:* srcdco_at_rit.edu; 'Mark W. Farnham'; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> The tool works great all charactersets except for databases that are set
>> to us7ascii and have 8bit bytes too.
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
>> Behalf Of *Jon Crisler
>> *Sent:* Friday, March 22, 2024 2:22 PM
>> *To:* Clay.Jackson_at_quest.com
>> *Cc:* srcdco_at_rit.edu; Mark W. Farnham <mwf_at_rsiz.com>;
>> oracle-l_at_freelists.org
>> *Subject:* Re: Characterset Question
>>
>>
>>
>> There is a utility that is part of the Oracle characterset conversion
>> program, that can scan the existing DB and the proposed characterset. It
>> will report back on any problem tables / columns and if the result is
>> lossless, losey and/or info about manual cleanup. The exact name escapes
>> me, but check out Oracle DocIDs relating to converting from one
>> characterset to another and it should be mentioned. I will try to find it
>> as well.
>>
>>
>>
>> On Fri, Mar 22, 2024 at 4:38 PM Clay Jackson <dmarc-noreply_at_freelists.org>
>> wrote:
>>
>> Check the Oracle client NLS settings that the Informatica server is
>> using. 😊
>>
>>
>>
>>
>>
>> *Clay Jackson*
>>
>> Database Solutions Sales Engineer
>>
>> <https://www.quest.com/solutions/database-performance-monitoring/>
>>
>> clay.jackson_at_quest.com
>>
>> *office* 949-754-1203 *mobile* 425-802-9603
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
>> Behalf Of *Scott Canaan
>> *Sent:* Friday, March 22, 2024 11:54 AM
>> *To:* Mark W. Farnham <mwf_at_rsiz.com>; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> *CAUTION:* This email originated from outside of the organization. Do
>> not follow guidance, click links, or open attachments unless you recognize
>> the sender and know the content is safe.
>>
>>
>>
>> Interesting that the source is also AL32UTF8, but the data is transferred
>> via Informatica. My guess is that’s where the issue is.
>>
>>
>>
>>
>> *Scott Canaan ‘88 *
>> *Sr Database Administrator *Information & Technology Services
>> Finance & Administration
>>
>>
>> *Rochester Institute of Technology *o: (585) 475-7886 | f: (585) 475-7520
>>
>> *srcdco_at_rit.edu <srcdco_at_rit.edu>* | c: (585) 339-8659
>>
>> *CONFIDENTIALITY NOTE*: The information transmitted, including
>> attachments, is intended only for the person(s) or entity to which it is
>> addressed and may contain confidential and/or privileged material. Any
>> review, retransmission, dissemination or other use of, or taking of any
>> action in reliance upon this information by persons or entities other than
>> the intended recipient is prohibited. If you received this in error, please
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>> *From:* Mark W. Farnham <mwf_at_rsiz.com>
>> *Sent:* Friday, March 22, 2024 2:50 PM
>> *To:* Scott Canaan <srcdco_at_rit.edu>; oracle-l_at_freelists.org
>> *Subject:* RE: Characterset Question
>>
>>
>>
>> That should be a lossless conversion with the possible exception of
>> clob/blob.
>>
>>
>>
>> However, figuring out the appropriate NLS_PARAMETER values for the client
>> and the server have to be set right.
>>
>>
>>
>> How are you bringing data from Peoplesoft over to your new database?
>>
>>
>>
>> IF it is application to database, setting the client and server
>> parameters should take care of it.
>>
>>
>>
>> IF it is database (old) to database (new), then there is possibly some
>> data in the (old) database that is not actually US7ASCII and some cleanup
>> may be required.
>>
>>
>>
>>
>> https://docs.oracle.com/en/database/oracle/oracle-database/19/nlspg/supporting-multilingual-databases-with-unicode.html
>>
>>
>>
>> should give you a leg up, and there are more similar documents on Oracle.
>>
>>
>>
>> I’m a bit rusty on this, my friend, but I think that is about correct.
>>
>>
>>
>> mwf
>>
>>
>>
>>
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [
>> mailto:oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>] *On
>> Behalf Of *Scott Canaan ("srcdco")
>> *Sent:* Friday, March 22, 2024 2:23 PM
>> *To:* oracle-l_at_freelists.org
>> *Subject:* Characterset Question
>>
>>
>>
>> We are in the process of migrating all of our Oracle databases from Red
>> Hat 7 to Red Hat 8. In creating the new databases in Red Hat 8, we’ve been
>> going with the default characterset of AL32UTF8. Many of the old databases
>> are US7ASCII. We are having trouble with applications trying to insert
>> into text fields in the new database. In one case, bringing data from
>> Peoplesoft that has a degree symbol fails because that comes over as 2
>> bytes instead of one. I thought that AL32UTF8 should handle those
>> characters. What needs to be done to accommodate this?
>>
>>
>>
>>
>> *Scott Canaan ‘88 *
>> *Sr Database Administrator *Information & Technology Services
>> Finance & Administration
>>
>>
>> *Rochester Institute of Technology *o: (585) 475-7886 | f: (585) 475-7520
>>
>> *srcdco_at_rit.edu <srcdco_at_rit.edu>* | c: (585) 339-8659
>>
>> *CONFIDENTIALITY NOTE*: The information transmitted, including
>> attachments, is intended only for the person(s) or entity to which it is
>> addressed and may contain confidential and/or privileged material. Any
>> review, retransmission, dissemination or other use of, or taking of any
>> action in reliance upon this information by persons or entities other than
>> the intended recipient is prohibited. If you received this in error, please
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>>



--
http://www.freelists.org/webpage/oracle-l


image001.jpg
(image/jpeg attachment: image001.jpg)

Received on Tue Mar 26 2024 - 15:52:59 CET

Original text of this message