Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Re: charset for kangi
MTNorman wrote:
> On Jan 30, 2:37 pm, Frank van Bortel <frank.van.bor..._at_gmail.com>
> wrote:
>> I would not go for different fields at all when designing
>> such an application, but rather have one characterset.
>> I would always opt for the AL16UTF16, because:
>> - it is closest to the Windows code set (like it or not,
>> most clients use that on the desktop to enter characters)
>> - it is fixed double byte.
>>
>> There may be other considerations, which would make the
>> first option a viable choice.
>> Simple UTF8, as you call it, is not 10G - AL32UTF8 is.
>> And it is a valid choice.
>>
Oops - you are very correct; major slip on my part!
> AL16UTF16 expands two bytes at a time to handle multi-byte UTF
> characters. AL16UTF8 has a single byte base and expands one byte at a
> time to handle multi-byte UTF.
AL16UTF16 handles all European and most Asian characters in 2 bytes;
it is a strict superset of UCS-2 (which I confused it with).
Supplementary characters need 4 bytes, but in general it is more
compact than UTF-8 for Asian characters.
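The byte counts above are easy to verify. A quick sketch (Python here, purely to count bytes; the built-in `utf-16-be` and `utf-8` codecs stand in for Oracle's AL16UTF16 and AL32UTF8 encodings):

```python
# Compare storage size of sample characters in UTF-16 vs UTF-8,
# the encodings behind Oracle's AL16UTF16 and AL32UTF8.
samples = {
    "ASCII letter":  "\u0041",      # A
    "European":      "\u00e9",      # e-acute
    "CJK ideograph": "\u65e5",      # 'day/sun'
    "supplementary": "\U0001d11e",  # musical G clef
}
for name, ch in samples.items():
    utf16 = len(ch.encode("utf-16-be"))
    utf8 = len(ch.encode("utf-8"))
    print(f"{name}: {utf16} bytes in UTF-16, {utf8} in UTF-8")
```

ASCII costs 2 bytes in UTF-16 versus 1 in UTF-8, but the CJK ideograph costs 2 versus 3, and the supplementary character 4 in both, which is the trade-off described above.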
Don't know where AL16UTF8 comes in - it's not supported by
Oracle, afaik.
>
> Why would you have the same data in two different table columns? All
> fields/column that would contain Kanji data would use a national data
> type. You can store USASCII7 in the national character set as well as
> Kanji.
You are right - again.
>
> As to whether different data types cause problems in applications...
> which single data type do you use for all your fields/columns now -
> char, varchar2, number, clob, blob? I find no more problems with
> using nvarchar2 and with using char and varchar2. Yes, the developer
> needs to be aware of the characteristics of the different data types,
> particularly when assigning character data to declared variables...
> but then the developer should always be aware of the source and
> destination data types anyway. Just because oracle can usually
> implicitly convert a string to a number and back to a string without
> problems doesn't mean the developer has not just introduced a "bug"
> into the code that's going to show up as soon as zero leading numeric
> strings are used.
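The leading-zero pitfall described above can be sketched in a few lines (Python here, simply to show the round trip; the same thing happens with Oracle's implicit VARCHAR2/NUMBER conversion):

```python
# An implicit string -> number -> string round trip silently drops
# leading zeros -- the "bug" described above.
zip_code = "00742"
as_number = int(zip_code)    # implicit conversion to a number
back = str(as_number)        # ...and back to a string
print(zip_code, "->", back)  # 00742 -> 742
assert back != zip_code      # the data was silently changed
```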
You mean "byte" versus "char" semantics. I'd change the database
default to char in such an environment.
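Changing that default is a one-liner, assuming the instance-level approach; this is a sketch, and the parameter only affects objects created after the change:

```sql
-- Switch the default length semantics from BYTE to CHAR, so that
-- VARCHAR2(10) means 10 characters rather than 10 bytes.
-- Affects newly created columns only; existing ones keep their semantics.
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS = CHAR SCOPE = BOTH;

-- Or per session, for testing:
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
```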
>
> WE8MSWIN1252 is the most supported windows character set, but XP
> supports multiple character sets that can be changed on the fly.
> That's why it's so neat for a dumb single language person like me to
> see the XP character set changed from american english to canadian
> french to some indian set and watch the desktop change from something
> I can read to something I sorta recognize (that year of high school
> french was a long time ago), to something only my colleague can
> read.
>
> BTW - having both the database character set and the national
> character set in the UTF space reduces the opportunities of "losing"
> some bytes when accidently crossing data types.
>
> Regards,
> Margaret
>
So - AL32UTF8 it is - all the way.
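Margaret's point about "losing" bytes is worth seeing once. A minimal sketch, using Python's `cp1252` codec as a stand-in for a WE8MSWIN1252 database set:

```python
# When the target character set can't represent a character, the
# conversion substitutes a replacement ('?') and the data is silently
# lost. Python's cp1252 codec stands in for Oracle's WE8MSWIN1252.
text = "Kanji: \u65e5\u672c\u8a9e"
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)                # Kanji: ???
utf = text.encode("utf-8").decode("utf-8")
assert utf == text          # a UTF encoding round-trips intact
```

Keeping both the database and national character sets in the UTF space means the round trip on the last line is the one you get, not the lossy one.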
--
Regards,
Frank van Bortel
Top-posting is one way to shut me up...

Received on Wed Jan 31 2007 - 13:16:07 CST