Re: Unicode Character Allocation
my_grillz_gleam wrote:
> Hello all,
>
> I have a quick question regarding how Oracle allocates storage space
> for its data types. In particular, I have been tasked to develop processes
> to move data between Oracle and DB2 databases which both are set to
> use UTF-8. Now, I have no problems moving data from the DB2 tables to
> the Oracle tables, however moving from Oracle to DB2 has been causing
> records to reject. And to note, both tables have the exact same DDL and
> the Oracle is using BYTE semantics (DB2 only has BYTE semantics). Now
> my question is:
>
> Does Oracle, in UTF-8 mode, actually allocate 4 bytes per every byte
> specified in the DDL for a character field?
>
> i.e. does VARCHAR2(100 BYTE) equal 400 bytes or 100 bytes of disk
> space allocated? It seems to me that this is the case, from my testing.
> And unfortunately my Oracle DBA was not able to confirm this.
>
You can look it up in the Oracle online documentation:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14225/ch6unicode.htm#g1014017
<quote>
UTF-8 is the 8-bit encoding of Unicode. It is a variable-width encoding
and a strict superset of ASCII. This means that each and every character
in the ASCII character set is available in UTF-8 with the same code
point values. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or
4 bytes in UTF-8 encoding. Characters from the European scripts are
represented in either 1 or 2 bytes. Characters from most Asian scripts
are represented in 3 bytes. Supplementary characters are represented in
4 bytes.
</quote>
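In other words, the number of bytes a character occupies in UTF-8 depends on the character itself, so 100 characters can take anywhere from 100 to 400 bytes. With BYTE semantics, VARCHAR2(100 BYTE) is limited to 100 bytes, so it holds 100 characters only if every one of them is a single-byte character.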
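If you want to check this outside the database, here is a minimal Python sketch of the byte counts described in the quote. The helper names utf8_len and fits_in_byte_column are made up for illustration; they are not Oracle or DB2 functions, and the sketch only measures standard UTF-8 encoded length, not what either database actually stores on disk.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_byte_column(text: str, max_bytes: int) -> bool:
    """Hypothetical check: would `text` fit in a column limited to `max_bytes` bytes?"""
    return utf8_len(text) <= max_bytes


# One sample character per width class mentioned in the Oracle documentation.
samples = {
    "ASCII letter a": "a",                    # 1 byte
    "Latin u with diaeresis": "\u00fc",       # 2 bytes
    "CJK ideograph U+65E5": "\u65e5",         # 3 bytes
    "supplementary U+1F600": "\U0001F600",    # 4 bytes
}

if __name__ == "__main__":
    for label, ch in samples.items():
        print(f"{label}: {utf8_len(ch)} byte(s) per character, "
              f"{utf8_len(ch * 100)} bytes for 100 characters, "
              f"fits in 100 bytes: {fits_in_byte_column(ch * 100, 100)}")
```

Only the all-ASCII string of 100 characters fits in 100 bytes; the other three need 200, 300 and 400 bytes. Running a check like this over the rejected rows may show whether the rejects are a byte-length problem at all.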
Best regards
Maxim
Received on Thu Apr 13 2006 - 14:20:41 CDT