Re: Character set changes

From: Tom Tyson <tomtysonjr_at_yahoo.com>
Date: Tue, 29 Aug 2000 11:54:21 -0700 (PDT)
Message-Id: <10603.115767@fatcity.com>

Dave

The UTF8 character set is variable-width, so how much space your VARCHAR2 uses depends on the data. US7ASCII data that is converted to UTF8 still takes a single byte per character, just as it did in the US7ASCII character set, so your VARCHAR2(2000) will still hold 2000 characters of that data. If you later put data into the column that is not from a single-byte character set, it will of course take up more space and the column will hold fewer characters. Here is a snippet from Oracle 8i's National Language Support Guide, which may give you some idea of the size in bytes that will be used.
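To put rough numbers on that, here is a minimal sketch in Python rather than anything run against Oracle. It assumes Python's standard UTF-8 encoder gives the same byte counts as Oracle's UTF8 for these particular characters, and the helper name `chars_that_fit` is made up for the example.

```python
def chars_that_fit(sample_char: str, byte_limit: int = 2000) -> int:
    """Return how many copies of sample_char fit in byte_limit bytes of UTF-8."""
    width = len(sample_char.encode("utf-8"))  # bytes used by one character
    return byte_limit // width


print(chars_that_fit("A"))       # ASCII letter: 1 byte each -> 2000
print(chars_that_fit("\u00e9"))  # e with acute accent: 2 bytes each -> 1000
print(chars_that_fit("\u20ac"))  # euro sign: 3 bytes each -> 666
```

Real data is usually a mix, so the actual capacity of a byte-sized column falls somewhere between those figures.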

Tom Tyson
Exodus Communications, Inc.

Oracle’s UTF8 character set currently supports the following characters.

Unicode 2.1 (UCS2 and UTF16) characters U+0000 through U+007F inclusive. These are 1-byte characters in UTF8, that have character codes 0x00 through 0x7f inclusive. These can represent only English ASCII characters. All English ASCII characters have exactly the same character codes (0x00 through 0x7f inclusive) in US7ASCII and UTF8 character sets.

Unicode 2.1 (UCS2 and UTF16) characters U+0080 through U+07FF inclusive. These are 2-byte characters in UTF8, that have character codes 0xc0WW through 0xdfWW inclusive where WW can be 0x80 through 0xbf inclusive. These can represent characters of most European (including Greek and Russian), Arabic, Hebrew and some other languages.

Unicode 2.1 (UCS2 and UTF16) characters U+0800 through U+D7FF inclusive and U+E000 through U+FFFF inclusive.

These are 3-byte characters in UTF8, that have character codes

0xe0WWTT through 0xecWWTT inclusive
0xed80TT through 0xed9fTT inclusive
0xeeWWTT through 0xefWWTT inclusive

where WW and TT are 0x80 through 0xbf inclusive.

These can represent characters of Chinese, Japanese, Korean, Thai, Indic, Dravidian and some other languages. Also, the "euro" currency sign is included in this group of characters.

Oracle’s UTF8 character set currently does not support the following characters. If you use these characters in Oracle’s current UTF8 character set, the result is not guaranteed, and the behavior may change in future releases of Oracle.
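The 1-, 2- and 3-byte ranges quoted above can be checked with a short Python sketch. This is only an illustration: `utf8_width` is a hypothetical helper, not an Oracle function, and the cross-check uses Python's own UTF-8 encoder, which I am assuming agrees with Oracle's UTF8 for the ranges listed.

```python
def utf8_width(code_point: int) -> int:
    """Byte width of a code point, following the ranges in the quoted guide."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:
        return 1  # U+0000 through U+007F
    if code_point <= 0x7FF:
        return 2  # U+0080 through U+07FF
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are outside the quoted ranges")
    if code_point <= 0xFFFF:
        return 3  # U+0800 through U+D7FF and U+E000 through U+FFFF
    raise ValueError("code points above U+FFFF are outside the quoted ranges")


# Latin A, Greek omega, Hebrew alef, a CJK ideograph, and the euro sign.
for ch in ("A", "\u03a9", "\u05d0", "\u4e2d", "\u20ac"):
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_width(ord(ch))
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), lead byte 0x{encoded[0]:02x}")
```

The lead byte printed for each character should fall in the 0xc0 to 0xdf band for the 2-byte group and the 0xe0 to 0xef band for the 3-byte group, matching the character codes listed in the guide.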

Dave Morgan <dmorgan_at_bartertrust.com> wrote:
> Hi all,
> We have a new requirement to support Oracle LDAP server which
> requires the character set UTF8. Our current character set is
> US7ASCII.
>
> As described in the manual UTF8 is a superset of US7ASCII so I
> can do an ALTER DATABASE COMMAND to change the character set.
> (After backups and such). My question is, since VARCHAR2s under
> UTF8 are sized by bytes, what happens to existing table columns?
>
> For example if I have a column
> DESCRIPTION VARCHAR2(2000);
> under US7ASCII, how many characters will the field hold
> under UTF8?
>
> TIA
> Dave
>
> --
> Dave Morgan
> Senior Database Administrator
> Internet Barter Inc.
> www.bartertrust.com
> 408-982-8774

Received on Tue Aug 29 2000 - 13:54:21 CDT