Re: Unicode Character Allocation

From: Maxim Demenko <mdemenko_at_gmail.com>
Date: Thu, 13 Apr 2006 21:20:41 +0200
Message-ID: <e1m8a3$1u8$01$1@news.t-online.com>

my_grillz_gleam wrote:
> Hello all,
>
> I have a quick question regarding how Oracle allocates storage space
> for its data types. In particular, I have been tasked to develop processes
> to move data between Oracle and DB2 databases, both of which are set to
> use UTF-8. I have no problems moving data from the DB2 tables to
> the Oracle tables; however, moving from Oracle to DB2 has been causing
> records to be rejected. Note that both tables have the exact same DDL and
> the Oracle side is using BYTE semantics (DB2 only has BYTE semantics). My
> question is:
>
> Does Oracle, in UTF-8 mode, actually allocate 4 bytes for every byte
> specified in the DDL for a character field?
>
> That is, does VARCHAR2(100 BYTE) allocate 400 bytes or 100 bytes of disk
> space? From my testing, it seems to me that the former is the case.
> Unfortunately, my Oracle DBA was not able to confirm this.
>

You can look it up in the Oracle online documentation:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14225/ch6unicode.htm#g1014017
<quote>
UTF-8 is the 8-bit encoding of Unicode. It is a variable-width encoding and a strict superset of ASCII. This means that each and every character in the ASCII character set is available in UTF-8 with the same code point values. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes.
</quote>
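To see these widths concretely, here is a small Python sketch. Python is just my choice for illustration, and the sample characters are arbitrary picks, one per category named in the quote. It uses Python's standard UTF-8 codec, so it shows what the Unicode encoding itself does, not how any particular Oracle character set stores data.

```python
# Bytes per character in UTF-8 for one sample character from each
# category mentioned in the quoted documentation.
SAMPLES = [
    ("A", "ASCII"),
    ("\u00e9", "European script (Latin e with acute)"),
    ("\u0416", "European script (Cyrillic Zhe)"),
    ("\u4e2d", "Asian script (CJK ideograph)"),
    ("\U0001F600", "supplementary character (emoji)"),
]


def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for ch, category in SAMPLES:
        print(f"U+{ord(ch):04X}  {category}: {utf8_len(ch)} byte(s)")
```

Run as a script, it reports 1, 2, 2, 3 and 4 bytes for the five samples, matching the 1-to-4-byte range in the quote.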

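In other words, the number of bytes depends on which characters are stored: in UTF-8, 100 characters may occupy anywhere from 100 bytes up to 400 bytes. Conversely, a VARCHAR2(100 BYTE) column holds at most 100 bytes, which by the widths above is between 25 and 100 characters.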
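The same arithmetic can be checked for a whole column value. This Python sketch assumes, as the original poster states, that the target column enforces its declared length in bytes; `fits_byte_limit` is a hypothetical helper for illustration, not an Oracle or DB2 API.

```python
def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_byte_limit(value: str, max_bytes: int) -> bool:
    """Hypothetical check: does `value` fit a column whose length is
    declared in bytes, such as VARCHAR2(100 BYTE)?"""
    return utf8_len(value) <= max_bytes


if __name__ == "__main__":
    # 100 characters each, built from 1-, 2-, 3- and 4-byte characters.
    for ch in ("a", "\u00e9", "\u4e2d", "\U0001F600"):
        value = ch * 100
        print(f"U+{ord(ch):04X} x 100 -> {utf8_len(value)} bytes, "
              f"fits in 100 bytes: {fits_byte_limit(value, 100)}")

    # Only 25 four-byte characters fit into a 100-byte limit.
    print(fits_byte_limit("\U0001F600" * 25, 100))  # True
    print(fits_byte_limit("\U0001F600" * 26, 100))  # False
```

The four 100-character strings come out at 100, 200, 300 and 400 bytes, and only the ASCII one fits a 100-byte limit. A check along these lines could flag rows that exceed a byte-limited target column before a load rejects them.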

Best regards

Maxim

Received on Thu Apr 13 2006 - 14:20:41 CDT