Home » Server Options » Text & interMedia » Multi-lingual searches (Oracle 11gR2)
Multi-lingual searches [message #563330] |
Mon, 13 August 2012 00:48 |
|
srimansamjo
Messages: 2 Registered: August 2012 Location: Sydney, Australia
|
Junior Member |
|
|
Hey Guys, We are building a multi-lingual application that need to cater to English, German and Arabic character sets. While input and rendering of the data (number/currency and date formats ) is not an issue, query requirement has me stumped!
I am sure Oracle Text and the use of lexers might have a solution to the problem I have.
Requirement - Say I have a column that stores the last name of an individual. This last name, depending on the geography where it is entered from, could contain Latin, German or Arabic character sets. This is fine. So, for example, Smith could be entered in all three character sets. Now the query function on this column should return data entered in all languages regardless of the current language and keyboard mapping chosen by the user.
I have seen the use of thesaurus package ctx_thes and the world lexer etc. Is it possible to do this?
I was thinking about having another column, for every such column that needs to have language-independent query requirement, shadowing the user-enterable column. Upon user entry somehow translate the entry into a generic language that would be used then for the query function. Not sure there is any smart language translators available - especially proper nouns.
Thanks in anticipation
Regards
Sridhar
|
|
|
|
Re: Multi-lingual searches [message #563453 is a reply to message #563351] |
Mon, 13 August 2012 21:09 |
|
srimansamjo
Messages: 2 Registered: August 2012 Location: Sydney, Australia
|
Junior Member |
|
|
Hey Barbara, Here is an example
insert into test( id, lastname )
values ( 1, 'Muller' );
insert into test( id, lastname )
values ( 2, 'Müller' );
insert into test( id, lastname )
values ( 3, '<Muller in arabic however its written>' );
Sure the table has many other entries. Now, the requirement is that when a user, with an English keyboard mapping, queries the table test for Muller, all 3 records should be returned. If mapped to the German keyboard, a query on Müller should return all 3 records and likewise for Arabic keyboard mapping.
With the thesaurus it may be possible but we will need to have the lookups done for all possible proper nouns and key words which may not be possible.
The other option I was wondering was if I could have a 'hidden' column alongside the 'lastname' and populate this automagically using some translation tools into one unified characterset, if available. If this were possible, I could route the query operation to the 'hidden' column regardless of what key board mapping and NLS is used by the user session.
Regards
Sridhar
|
|
|
Re: Multi-lingual searches [message #563457 is a reply to message #563453] |
Mon, 13 August 2012 22:11 |
|
Barbara Boehmer
Messages: 9101 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
You can use a lexer with base_letter conversion to automatically convert any characters with diacritical marks, such as ü to their base letter, such as u, for indexing and searching, as shown below. The rest of your requirements are unclear; I would require additional sample data.
SCOTT@orcl_11gR2> create table test
2 (id number,
3 lastname varchar2(30))
4 /
Table created.
SCOTT@orcl_11gR2> insert into test( id, lastname )
2 values ( 1, 'Muller' )
3 /
1 row created.
SCOTT@orcl_11gR2> insert into test( id, lastname )
2 values ( 2, 'Müller' )
3 /
1 row created.
SCOTT@orcl_11gR2> insert into test( id, lastname )
2 values ( 3, 'other' )
3 /
1 row created.
SCOTT@orcl_11gR2> commit
2 /
Commit complete.
SCOTT@orcl_11gR2> select * from test
2 /
ID LASTNAME
---------- ------------------------------
1 Muller
2 Müller
3 other
3 rows selected.
SCOTT@orcl_11gR2> begin
2 ctx_ddl.create_preference ('test_lex', 'basic_lexer');
3 ctx_ddl.set_attribute ('test_lex', 'base_letter', 'yes');
4 ctx_ddl.set_attribute ('test_lex', 'base_letter_type', 'generic');
5 end;
6 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> create index test_idx
2 on test (lastname)
3 indextype is ctxsys.context
4 parameters ('lexer test_lex')
5 /
Index created.
SCOTT@orcl_11gR2> select token_text from dr$test_idx$i
2 /
TOKEN_TEXT
----------------------------------------------------------------
MULLER
OTHER
2 rows selected.
SCOTT@orcl_11gR2> select * from test where contains (lastname, 'Muller') > 0
2 /
ID LASTNAME
---------- ------------------------------
1 Muller
2 Müller
2 rows selected.
SCOTT@orcl_11gR2> select * from test where contains (lastname, 'Müller') > 0
2 /
ID LASTNAME
---------- ------------------------------
1 Muller
2 Müller
2 rows selected.
|
|
|
Goto Forum:
Current Time: Fri Dec 27 01:04:25 CST 2024
|