OraFAQ Forum: Russian » UTF8 russian compatibility


UTF8 russian compatibility [message #255067]

Mon, 30 July 2007 09:57

zahar
Messages: 4
Registered: July 2007
Location: Belarus

Junior Member

Hello All!

I have created a database with UTF-8 encoding on server version 10.1.0.4.0:

<initParam name="nls_language" value="RUSSIAN"/>
<initParam name="nls_territory" value="RUSSIA"/>
<characterSet>AL32UTF8</characterSet>
<nationalCharacterSet>UTF8</nationalCharacterSet>

----

select userenv('language') from dual gives RUSSIAN_CIS.AL32UTF8

-----

The goal is that any user can connect with any supported Russian code page (e.g. russian_cis.cl8iso8859p5 or russian_cis.cl8mswin1251), work normally, and see Cyrillic text correctly regardless of who inserted the data and with which code page.

Cyrillic text can be inserted with each code page, and clients using that same code page see the text properly, but clients with a different code page see something unreadable.
Even when connecting with NLS_LANG=russian_cis.al32utf8, only clients with that same setting see their Russian characters correctly.

Please tell me what is wrong.

Regards
Sergey


Re: UTF8 russian compatibility [message #255111 is a reply to message #255067]

Mon, 30 July 2007 12:15

andrew again
Messages: 2577
Registered: March 2000

Senior Member

1.) If the client NLS setting is correct (that is, NLS_LANG names the character set the client actually uses), the input characters will be converted and stored correctly as UTF-8 in the database.
2.) If you set your client NLS setting to match the database (RUSSIAN_RUSSIA.AL32UTF8), no code page conversion is done. Whatever bytes the client sends are stored unchanged in the database.
3.) The client font (and the DOS or Windows code page) matters for being able to display the characters you are interested in.
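
Points 1 and 2 can be illustrated outside Oracle. The following Python sketch is only a toy model, not Oracle's implementation: the helpers `store` and `fetch` are hypothetical, and the Python codecs `cp1251`, `iso8859_5` and `utf-8` are assumed to stand in for CL8MSWIN1251, CL8ISO8859P5 and AL32UTF8. It assumes conversion happens only when the declared client character set differs from the database character set.

```python
# Toy model of client/server character set conversion (not Oracle code).
DB_CHARSET = "utf-8"  # stands in for AL32UTF8


def store(client_bytes: bytes, declared_charset: str) -> bytes:
    """Return the bytes that end up in the database."""
    if declared_charset == DB_CHARSET:
        # Declared charset equals the database charset: bytes pass through unchanged.
        return client_bytes
    return client_bytes.decode(declared_charset).encode(DB_CHARSET)


def fetch(db_bytes: bytes, declared_charset: str) -> bytes:
    """Return the bytes sent back to a client."""
    if declared_charset == DB_CHARSET:
        return db_bytes
    # Anything that cannot be converted ends up as '?' in this model.
    text = db_bytes.decode(DB_CHARSET, errors="replace")
    return text.encode(declared_charset, errors="replace")


word = "Привет"

# Case 1: a Windows-1251 client declares its character set correctly.
good = store(word.encode("cp1251"), "cp1251")
seen_by_iso = fetch(good, "iso8859_5").decode("iso8859_5")

# Case 2: the same client declares utf-8 but still sends Windows-1251 bytes.
bad = store(word.encode("cp1251"), "utf-8")
seen_by_iso_bad = fetch(bad, "iso8859_5").decode("iso8859_5")

print(good == word.encode("utf-8"))   # stored as real UTF-8
print(seen_by_iso == word)            # another code page reads it fine
print(bad == word.encode("cp1251"))   # raw client bytes stored unchanged
print(seen_by_iso_bad == word)        # another code page cannot read it
```

In this model the second case reproduces the symptom from the question: only a client that sends and expects the same raw bytes sees the original text.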

select * from v$nls_parameters where parameter in ('NLS_LANGUAGE', 'NLS_TERRITORY', 'NLS_CHARACTERSET');

NLS_LANGUAGE         AMERICAN
NLS_TERRITORY        AMERICA
NLS_CHARACTERSET     AL32UTF8

NLS_LANG=<NLS_LANGUAGE>_<NLS_TERRITORY>.<NLS_CHARACTERSET>

NLS_LANG=AMERICAN_AMERICA.AL32UTF8

--==========================================================================
Euro is CHR(14844588) (U+20AC)
--==========================================================================
1.) Set NLS_LANG in the registry for the current Oracle Home to a character set that contains the Euro sign,
 e.g. AMERICAN_AMERICA.WE8MSWIN1252

-- this trick shows what NLS setting is used by sqlplus
SQL> @%NLS_LANG%
SP2-0310: unable to open file "AMERICAN_AMERICA.WE8MSWIN1252"

--==========================================================================
-- Default DOS codepage 437 (Euro sign display test fails)
--==========================================================================
C:\>chcp
Active code page: 437

C:\>sqlplus test/test1@dev

SQL*Plus: Release 9.2.0.4.0 - Production on Thu Oct 7 14:27:37 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle8i Enterprise Edition Release 8.1.7.2.0 - Production
With the Partitioning option
JServer Release 8.1.7.2.0 - Production

SQL> select CHR(14844588), dump(CHR(14844588)) from dual;

CHR DUMP(CHR(14844588))
--- ------------------------
�   Typ=1 Len=3: 226,130,172   <<=== Incorrect display (default codepage 437)

--==========================================================================
-- Windows codepage 1252 (Euro sign display test works)
--==========================================================================
2.) C:\>chcp 1252
Active code page: 1252

3.) Set font in DOS window to Lucida Console (it contains Euro)

4.) C:\>sqlplus test/test1@dev

SQL*Plus: Release 9.2.0.4.0 - Production on Thu Oct 7 14:28:21 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle8i Enterprise Edition Release 8.1.7.2.0 - Production
With the Partitioning option
JServer Release 8.1.7.2.0 - Production

5.) SQL> select CHR(14844588), dump(CHR(14844588)) from dual;

CHR DUMP(CHR(14844588))
--- ------------------------
€   Typ=1 Len=3: 226,130,172   <<=== Correct display (codepage 1252)
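
The two runs above return the same three bytes and differ only in the console code page. As a rough cross-check outside Oracle, this Python sketch shows where the number 14844588 comes from and what each console makes of the byte it receives. It assumes that the Python codecs `cp437` and `cp1252` match the DOS and Windows code pages of the same numbers, and that with NLS_LANG set to WE8MSWIN1252 the client receives the Euro as its Windows-1252 byte.

```python
euro = "\u20ac"  # Euro sign, U+20AC

# The CHR() argument is the UTF-8 byte sequence read as one big-endian integer.
utf8_bytes = euro.encode("utf-8")
chr_argument = int.from_bytes(utf8_bytes, "big")

# Byte delivered to a client whose declared character set is Windows-1252.
client_byte = euro.encode("cp1252")

# What a console shows depends on its active code page.
shown_in_437 = client_byte.decode("cp437")
shown_in_1252 = client_byte.decode("cp1252")

print(list(utf8_bytes), chr_argument)
print(client_byte.hex())
print(f"cp437 shows U+{ord(shown_in_437):04X}, cp1252 shows U+{ord(shown_in_1252):04X}")
```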

Use dump to check that your character is correctly stored in UTF8. Otherwise you'll have junk in your DB, and only the client that inserted the data will be able to see it correctly.
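
The same check can be done mechanically on a line of dump() output. This is a sketch only: the function name `check_dump` is hypothetical, it handles the decimal form of the dump, and the second sample line is made up for illustration rather than taken from a real database.

```python
def check_dump(dump_line: str):
    """Decode the decimal bytes of a DUMP() line as UTF-8.

    Returns the decoded text, or None if the bytes are not valid UTF-8.
    """
    # The byte list follows the last colon, e.g. "Typ=1 Len=3: 226,130,172".
    byte_text = dump_line.rsplit(":", 1)[1]
    raw = bytes(int(b) for b in byte_text.split(","))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Bytes from the Euro test above: valid UTF-8.
print(check_dump("Typ=1 Len=3: 226,130,172") == "\u20ac")

# Made-up line holding raw Windows-1251 bytes: not valid UTF-8.
print(check_dump("Typ=1 Len=2: 207,240") is None)
```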

-- Oracle 9.2.x database
SELECT * FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
NLS_CHARACTERSET	AL32UTF8

create table utf8_tst(col1 varchar2(1 char));

-- Euro is U+20AC
insert into utf8_tst values (unistr('\20AC'));
-- Small Greek Gamma U+03B3
insert into utf8_tst values (unistr('\03B3'));
insert into utf8_tst values (unistr('A'));

select col1, vsize(col1), dump(col1, 1010) Decimal_bytes, dump(col1, 1016) Hex_Bytes from utf8_tst;
€	3	Typ=1 Len=3 CharacterSet=AL32UTF8: 226,130,172	Typ=1 Len=3 CharacterSet=AL32UTF8: e2,82,ac
γ	2	Typ=1 Len=2 CharacterSet=AL32UTF8: 206,179	Typ=1 Len=2 CharacterSet=AL32UTF8: ce,b3
A	1	Typ=1 Len=1 CharacterSet=AL32UTF8: 65	Typ=1 Len=1 CharacterSet=AL32UTF8: 41
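
The byte counts in this output follow directly from the UTF-8 encoding. A small Python sketch that mimics the vsize and dump columns is given here; the helper `dump_like` is hypothetical, its output only approximates Oracle's format, and the Cyrillic letter is an extra example that is not part of the original test.

```python
def dump_like(ch: str):
    """Return (byte length, decimal bytes, hex bytes) of ch encoded as UTF-8."""
    raw = ch.encode("utf-8")
    decimal = ",".join(str(b) for b in raw)
    hexa = ",".join(format(b, "x") for b in raw)
    return len(raw), decimal, hexa


# Euro sign, small Greek gamma, Latin A, Cyrillic capital Zhe
for ch in ["\u20ac", "\u03b3", "A", "\u0416"]:
    size, decimal, hexa = dump_like(ch)
    print(f"U+{ord(ch):04X}", size, decimal, hexa)
```

A Cyrillic letter takes two bytes in UTF-8, so a correctly stored Russian word should dump as roughly twice as many bytes as it has letters.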

http://www.macchiato.com/unicode/convert.html
http://www.i18nguy.com/unicode/codepages.html

[Updated on: Mon, 30 July 2007 12:17]


Re: UTF8 russian compatibility [message #255118 is a reply to message #255111]

Mon, 30 July 2007 12:31

andrew again
Messages: 2577
Registered: March 2000

Senior Member

http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#CEGCGEAF


Re: UTF8 russian compatibility [message #255242 is a reply to message #255111]

Tue, 31 July 2007 02:40

zahar
Messages: 4
Registered: July 2007
Location: Belarus

Junior Member

Good day.

Sorry for bad English - it is not my native language.

andrew again wrote on Mon, 30 July 2007 20:15

Thanks, I also think so, but...

In practice, if Russian text is inserted by a client with, for example, nls_lang=russian_cis.cl8iso8859p5, it is displayed properly ONLY on clients using that same code page!

I assumed that if the text is really stored in the database as UTF-8, it should be displayed properly by a client with ANY Cyrillic code page, for example a client with nls_lang=russian_cis.cl8mswin1251.
But it is NOT so!
This is the problem.

Even if the text is inserted by a client with nls_lang=russian_cis.al32utf8 (data inserted directly, without re-encoding), it is displayed properly only if the reading client has EXACTLY the same nls_lang setting.
All other clients, with their own code pages, see something unreadable.
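
One way to narrow this down is to dump() a row whose intended text is known and check which encoding the stored bytes really are. The Python sketch below works under stated assumptions: `guess_encoding` is a hypothetical helper, the Python codecs are assumed to correspond to the Oracle character sets named next to them, and the sample bytes are constructed for illustration, not taken from this database.

```python
# Python codec -> Oracle character set name (assumed correspondence)
CANDIDATES = {
    "utf-8": "AL32UTF8",
    "cp1251": "CL8MSWIN1251",
    "iso8859_5": "CL8ISO8859P5",
    "cp866": "RU8PC866",
}


def guess_encoding(stored: bytes, expected_text: str):
    """List the character sets in which the stored bytes spell expected_text."""
    matches = []
    for codec, oracle_name in CANDIDATES.items():
        try:
            if stored.decode(codec) == expected_text:
                matches.append(oracle_name)
        except UnicodeDecodeError:
            # The bytes are not even valid in this encoding.
            pass
    return matches


word = "Привет"

# Bytes as they should look in an AL32UTF8 database.
print(guess_encoding(word.encode("utf-8"), word))

# Bytes as they would look if raw ISO 8859-5 client bytes had been stored.
print(guess_encoding(word.encode("iso8859_5"), word))
```

If the stored bytes match anything other than AL32UTF8, the data went in without conversion, which would fit the behaviour described in point 2 of the earlier reply.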

-----

SELECT *
FROM V_$NLS_PARAMETERS

gives

"NLS_LANGUAGE","RUSSIAN"
"NLS_TERRITORY","CIS"
"NLS_CURRENCY","�."
"NLS_ISO_CURRENCY","CIS"
"NLS_NUMERIC_CHARACTERS",", "
"NLS_CALENDAR","GREGORIAN"
"NLS_DATE_FORMAT","DD-Mon-RRRR"
"NLS_DATE_LANGUAGE","RUSSIAN"
"NLS_CHARACTERSET","AL32UTF8"
"NLS_SORT","RUSSIAN"
"NLS_TIME_FORMAT","HH24:MI:SSXFF"
"NLS_TIMESTAMP_FORMAT","DD.MM.RR HH24:MI:SSXFF"
"NLS_TIME_TZ_FORMAT","HH24:MI:SSXFF TZR"
"NLS_TIMESTAMP_TZ_FORMAT","DD.MM.RR HH24:MI:SSXFF TZR"
"NLS_DUAL_CURRENCY","�."
"NLS_NCHAR_CHARACTERSET","UTF8"
"NLS_COMP","BINARY"
"NLS_LENGTH_SEMANTICS","CHAR"
"NLS_NCHAR_CONV_EXCP","FALSE"

so it seems the database was indeed created with UTF-8.

Are there any additional parameters or operations on the database that could eliminate this problem?
