Re: Character set changes
Dave
UTF8 is a variable-width character set, so how much space is used in your VARCHAR2 depends on the data. US7ASCII data that is converted to UTF8 still takes a single byte per character, just as it did in the US7ASCII character set. If you later put data into the column that falls outside the 7-bit ASCII range, those characters will take more space (two or three bytes each). Here is a snippet from Oracle 8i's National Language Support Guide that shows how many bytes each range of characters uses.
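To make the sizing point concrete, here is a minimal Python sketch that compares character counts with byte counts for a few sample strings. It uses Python's built-in utf-8 codec as a stand-in for Oracle's UTF8 character set, which is an assumption that should hold for the characters shown (all are in the Basic Multilingual Plane). The helper name utf8_byte_length and the sample strings are made up for this example.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical sample values, written with escapes so the code points are explicit.
samples = [
    "Oracle",                                 # plain ASCII
    "Z\u00fcrich",                            # "Zürich": one 2-byte character
    "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1",   # Greek "Ελλάδα": all 2-byte characters
    "\u65e5\u672c\u8a9e",                     # Japanese "日本語": all 3-byte characters
]

for s in samples:
    print(f"{s}: {len(s)} characters, {utf8_byte_length(s)} bytes")
```

If the VARCHAR2 length is defined in bytes, a column sized for six ASCII characters would be too small for the six-character Greek sample, which needs 12 bytes.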
Tom Tyson
Exodus Communications, Inc.
Unicode 2.1 (UCS2 and UTF16) characters U+0000 through U+007F inclusive. These are 1-byte characters in UTF8, that have character codes 0x00 through 0x7f inclusive. These can represent only English ASCII characters. All English ASCII characters have exactly the same character codes (0x00 through 0x7f inclusive) in US7ASCII and UTF8 character sets.
Unicode 2.1 (UCS2 and UTF16) characters U+0080 through U+07FF inclusive. These are 2-byte characters in UTF8, that have character codes 0xc0WW through 0xdfWW inclusive where WW can be 0x80 through 0xbf inclusive. These can represent characters of most European (including Greek and Russian), Arabic, Hebrew and some other languages.
Unicode 2.1 (UCS2 and UTF16) characters U+0800 through U+D7FF inclusive and U+E000 through U+FFFF inclusive
These are 3-byte characters in UTF8, that have character codes
0xe0WWTT through 0xecWWTT inclusive
0xed80TT through 0xed9fTT inclusive
0xeeWWTT through 0xefWWTT inclusive
where WW and TT are 0x80 through 0xbf inclusive.
These can represent characters of Chinese, Japanese, Korean, Thai, Indic, Dravidian and some other languages. Also, the "euro" currency sign is included in this group of characters.
Oracle’s UTF8 character set currently does not support the following characters. If you use these characters in Oracle’s current UTF8 character set, the result is not guaranteed, and the behavior changes in the future releases of Oracle.
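The three supported ranges quoted above can be sketched as a small lookup. This is only an illustration in Python, not anything from the Oracle guide: the function name utf8_width is hypothetical, and the cross-check assumes Python's utf-8 codec agrees with Oracle's UTF8 for these Basic Multilingual Plane characters. Code points outside the three quoted ranges are rejected rather than guessed at.

```python
def utf8_width(code_point: int) -> int:
    """Bytes needed for one code point, following the three quoted ranges."""
    if 0x0000 <= code_point <= 0x007F:
        return 1  # ASCII, same codes as US7ASCII
    if 0x0080 <= code_point <= 0x07FF:
        return 2  # most European, Greek, Russian, Arabic, Hebrew
    if 0x0800 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0xFFFF:
        return 3  # Chinese, Japanese, Korean, Thai, the euro sign, ...
    # Anything else is outside the ranges quoted in the snippet.
    raise ValueError(f"U+{code_point:04X} is outside the quoted ranges")


# Cross-check a few code points against Python's own encoder.
for cp in (0x0041, 0x00E9, 0x03A9, 0x20AC, 0x65E5):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_width(cp)
    print(f"U+{cp:04X}: {utf8_width(cp)} byte(s), 0x{encoded.hex()}")
```

For example, the euro sign U+20AC encodes as 0xe282ac, which falls in the 0xe0WWTT through 0xecWWTT group.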