UTF8 russian compatibility [message #255067]
Mon, 30 July 2007 09:57
zahar
Messages: 4 Registered: July 2007 Location: Belarus
Hello all!
I have created a database with UTF-8 encoding on server 10.1.0.4.0:
<initParam name="nls_language" value="RUSSIAN"/>
<initParam name="nls_territory" value="RUSSIA"/>
<characterSet>AL32UTF8</characterSet>
<nationalCharacterSet>UTF8</nationalCharacterSet>
----
select userenv('language') from dual gives RUSSIAN_CIS.AL32UTF8
-----
The goal is that any user can connect with any supported Russian client character set (e.g. NLS_LANG=russian_cis.cl8iso8859p5 or russian_cis.cl8mswin1251), work normally, and see Cyrillic text correctly regardless of who inserted the data and with which code page.
Cyrillic text can be inserted from each code page, and clients using that same code page see it properly, but clients using a different code page see something unreadable.
Even when the data is inserted with NLS_LANG=russian_cis.al32utf8, only clients with that same setting see the Russian characters correctly.
Please tell me what is wrong.
Regards
Sergey
Re: UTF8 russian compatibility [message #255111 is a reply to message #255067]
Mon, 30 July 2007 12:15
andrew again
Messages: 2577 Registered: March 2000
1.) If the client NLS setting is correct (i.e. NLS_LANG names the character set the client really uses), then the input characters will be converted and stored correctly as UTF-8 in the database.
2.) If you set your client NLS setting to match the database (RUSSIAN_RUSSIA.AL32UTF8), then no code page conversion will be done. Whatever bytes the client sends will be stored unchanged in the database.
3.) The client font (and the DOS or Windows code page) also matters for displaying the characters you are interested in.
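Points 1 and 2 can be illustrated outside Oracle. The following is only a sketch in Python, not Oracle code: it assumes the Python codecs "cp1251" and "utf-8" behave like CL8MSWIN1251 and AL32UTF8, and the sample word and the helper is_valid_utf8 are made up for illustration.

```python
# Assumption: Python codecs stand in for Oracle character sets
#   "cp1251" ~ CL8MSWIN1251, "utf-8" ~ AL32UTF8
text = "Привет"

# Bytes that a Windows client working in code page 1251 really sends.
client_bytes = text.encode("cp1251")

# Case 1: NLS_LANG names the real client character set,
# so the bytes are converted to UTF-8 before they are stored.
stored_converted = client_bytes.decode("cp1251").encode("utf-8")

# Case 2: NLS_LANG claims AL32UTF8 (same as the database),
# so no conversion happens and the CP1251 bytes are stored unchanged.
stored_passthrough = client_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(stored_converted))
print(is_valid_utf8(stored_passthrough))
```

In case 2 the column holds CP1251 bytes labelled as UTF-8, which is why only a client that repeats the same mismatch reads them back "correctly".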
select * from v$nls_parameters where parameter in ('NLS_LANGUAGE', 'NLS_TERRITORY', 'NLS_CHARACTERSET');
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CHARACTERSET AL32UTF8
NLS_LANG=<NLS_LANGUAGE>_<NLS_TERRITORY>.<NLS_CHARACTERSET>
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
--==========================================================================
Euro is CHR(14844588) (U+20AC)
--==========================================================================
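The number passed to CHR is simply the three UTF-8 bytes of the Euro sign read as one integer. A small Python check (an illustration only, outside the database; the variable names are arbitrary):

```python
code = 14844588

# Split the integer into its three bytes, most significant first.
raw = code.to_bytes(3, "big")

print(hex(code))            # the integer in hexadecimal
print(list(raw))            # the individual byte values
print(raw.decode("utf-8"))  # the character those bytes encode in UTF-8
```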
1.) Set NLS_LANG in registry for current Oracle Home to something having a Euro
e.g. AMERICAN_AMERICA.WE8MSWIN1252
-- this trick shows what NLS setting is used by sqlplus
SQL> @%NLS_LANG%
SP2-0310: unable to open file "AMERICAN_AMERICA.WE8MSWIN1252"
--==========================================================================
-- Default DOS codepage 437 (Euro sign display test fails)
--==========================================================================
C:\>chcp
Active code page: 437
C:\>sqlplus test/test1@dev
SQL*Plus: Release 9.2.0.4.0 - Production on Thu Oct 7 14:27:37 2004
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle8i Enterprise Edition Release 8.1.7.2.0 - Production
With the Partitioning option
JServer Release 8.1.7.2.0 - Production
SQL> select CHR(14844588), dump(CHR(14844588)) from dual;
CHR DUMP(CHR(14844588))
--- ------------------------
� Typ=1 Len=3: 226,130,172 <<=== Incorrect display (default codepage 437)
--==========================================================================
-- Windows codepage 1252 (Euro sign display test works)
--==========================================================================
2.) C:\>chcp 1252
Active code page: 1252
3.) Set font in DOS window to Lucida Console (it contains Euro)
4.) C:\>sqlplus test/test1@dev
SQL*Plus: Release 9.2.0.4.0 - Production on Thu Oct 7 14:28:21 2004
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle8i Enterprise Edition Release 8.1.7.2.0 - Production
With the Partitioning option
JServer Release 8.1.7.2.0 - Production
5.) SQL> select CHR(14844588), dump(CHR(14844588)) from dual;
CHR DUMP(CHR(14844588))
--- ------------------------
€ Typ=1 Len=3: 226,130,172 <<=== Correct display (codepage 1252)
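Why the same stored bytes display differently in the two sessions can be sketched in Python. This assumes that with NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252 the client receives the Euro as the single Windows-1252 byte, and that the console then draws that byte using its active code page; the Python codec names are stand-ins, not Oracle names.

```python
# The Euro sign as stored in the AL32UTF8 database.
euro_from_db = b"\xe2\x82\xac".decode("utf-8")

# Conversion to the client character set declared in NLS_LANG (Windows-1252).
client_byte = euro_from_db.encode("cp1252")

# The console interprets that byte with its active code page.
shown_437 = client_byte.decode("cp437")    # console left at chcp 437
shown_1252 = client_byte.decode("cp1252")  # console after chcp 1252

print(client_byte, shown_437, shown_1252)
```

The dump output is identical in both sessions because the stored data never changed; only the rendering on the client differs.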
Use dump to check that your character is correctly stored as UTF-8. Otherwise you'll have junk in your DB, and only the client that inserted the data will be able to see it correctly.
-- Oracle 9.2.x database
SELECT * FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
NLS_CHARACTERSET AL32UTF8
create table utf8_tst(col1 varchar2(1 char));
-- Euro is U+20AC
insert into utf8_tst values (unistr('\20AC'));
-- Small Greek Gamma U+03B3
insert into utf8_tst values (unistr('\03B3'));
insert into utf8_tst values (unistr('A'));
select col1, vsize(col1), dump(col1, 1010) Decimal_bytes, dump(col1, 1016) Hex_Bytes from utf8_tst;
COL1 VSIZE DECIMAL_BYTES                                    HEX_BYTES
€    3     Typ=1 Len=3 CharacterSet=AL32UTF8: 226,130,172   Typ=1 Len=3 CharacterSet=AL32UTF8: e2,82,ac
γ    2     Typ=1 Len=2 CharacterSet=AL32UTF8: 206,179       Typ=1 Len=2 CharacterSet=AL32UTF8: ce,b3
A    1     Typ=1 Len=1 CharacterSet=AL32UTF8: 65            Typ=1 Len=1 CharacterSet=AL32UTF8: 41
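To know which bytes to expect in the dump output for a given character, the UTF-8 encoding can be computed on any machine. A minimal Python sketch (dump_like is a hypothetical helper that only mimics the byte lists, not the full Oracle dump format):

```python
def dump_like(ch: str):
    """Return (length, decimal bytes, hex bytes) of the UTF-8 encoding of ch."""
    raw = ch.encode("utf-8")
    decimal = ",".join(str(b) for b in raw)
    hexa = ",".join(f"{b:02x}" for b in raw)
    return len(raw), decimal, hexa


# The same three characters as in the utf8_tst table.
for ch in ("\u20ac", "\u03b3", "A"):
    length, decimal, hexa = dump_like(ch)
    print(f"Len={length}: {decimal} / {hexa}")
```

If the bytes reported by dump for a stored Cyrillic letter differ from what this prints for the same letter, the data was not stored as UTF-8.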
http://www.macchiato.com/unicode/convert.html
http://www.i18nguy.com/unicode/codepages.html
[Updated on: Mon, 30 July 2007 12:17]
Re: UTF8 russian compatibility [message #255242 is a reply to message #255111]
Tue, 31 July 2007 02:40
zahar
Messages: 4 Registered: July 2007 Location: Belarus
Good day.
Sorry for bad English - it is not my native language.
andrew again wrote on Mon, 30 July 2007 20:15:
1.) if the client nls setting is correct, then the input characters will be correctly stored in UTF8 in the database.
2.) if you set your client nls setting to match the database (RUSSIAN_RUSSIA.AL32UTF8) then no codepage conversion will be done. Whatever input bytes are input - they will be stored like that in the database.
Thanks, I thought so too, but...
In practice, if Russian text is inserted with a client code page, e.g. nls_lang=russian_cis.cl8iso8859p5, it is displayed properly ONLY by clients using that same code page!
I assumed that if the text is really stored in the database as UTF-8, it should be displayed properly by a client with ANY Cyrillic code page, e.g. a client with nls_lang=russian_cis.cl8mswin1251.
But it is NOT so!
This is the problem.
Even if the text is inserted by a client with nls_lang=russian_cis.al32utf8 (direct insert without re-encoding), it is displayed properly only if the reading client has EXACTLY THE SAME nls_lang setting.
All other clients, with code-page character sets, see something unreadable.
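For comparison, the behaviour expected here can be written down as a small Python sketch. It is not Oracle code: the codecs "iso8859_5" and "cp1251" are assumed to correspond to CL8ISO8859P5 and CL8MSWIN1251, and the sample word is arbitrary. When every client converts from its own declared code page, both routes end in the same UTF-8 bytes, and any client can read them back.

```python
text = "Минск"

# The same word as raw bytes on two differently configured clients.
raw_8859_5 = text.encode("iso8859_5")
raw_1251 = text.encode("cp1251")

# Insert path: client code page -> UTF-8 (what a correct NLS_LANG should give).
stored_from_8859_5 = raw_8859_5.decode("iso8859_5").encode("utf-8")
stored_from_1251 = raw_1251.decode("cp1251").encode("utf-8")

# Read path: UTF-8 -> code page of a different client.
read_by_1251_client = stored_from_8859_5.decode("utf-8").encode("cp1251")

print(raw_8859_5 != raw_1251)                  # client bytes differ
print(stored_from_8859_5 == stored_from_1251)  # stored bytes agree
print(read_by_1251_client.decode("cp1251"))    # text seen by the other client
```

If the stored bytes for the same word differ depending on which client inserted it, the conversion on insert did not happen as in this sketch.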
-----
SELECT *
FROM V_$NLS_PARAMETERS
gives
"NLS_LANGUAGE","RUSSIAN"
"NLS_TERRITORY","CIS"
"NLS_CURRENCY","р."
"NLS_ISO_CURRENCY","CIS"
"NLS_NUMERIC_CHARACTERS",", "
"NLS_CALENDAR","GREGORIAN"
"NLS_DATE_FORMAT","DD-Mon-RRRR"
"NLS_DATE_LANGUAGE","RUSSIAN"
"NLS_CHARACTERSET","AL32UTF8"
"NLS_SORT","RUSSIAN"
"NLS_TIME_FORMAT","HH24:MI:SSXFF"
"NLS_TIMESTAMP_FORMAT","DD.MM.RR HH24:MI:SSXFF"
"NLS_TIME_TZ_FORMAT","HH24:MI:SSXFF TZR"
"NLS_TIMESTAMP_TZ_FORMAT","DD.MM.RR HH24:MI:SSXFF TZR"
"NLS_DUAL_CURRENCY","р."
"NLS_NCHAR_CHARACTERSET","UTF8"
"NLS_COMP","BINARY"
"NLS_LENGTH_SEMANTICS","CHAR"
"NLS_NCHAR_CONV_EXCP","FALSE"
so it seems the database was indeed created with UTF-8.
Are there any additional parameters or operations on the database that could eliminate this problem?