Restricting characters in a UTF8 database [message #603981] Sat, 21 December 2013 20:42
orauser001
Messages: 13
Registered: April 2013
Location: us
Junior Member
Version Information
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production



We are building an application that will store data about companies. It will receive feeds from 100+ sources. The database character set is AL32UTF8.

The user requirement is that the database should allow storing any 'Latin' and 'Arabic' characters. According to the Unicode code charts (http://www.unicode.org/charts/), the Latin and Arabic characters fall in the following blocks:

- Basic Latin (ASCII) [0000-007F]
- Latin-1 Supplement [0080-00FF]
- Latin Extended-A [0100-017F]
- Latin Extended-B [0180-024F]
- Latin Extended-C [2C60-2C7F]
- Latin Extended-D [A720-A7FF]

- Arabic [0600-06FF]
- Arabic Supplement [0750-077F]
- Arabic Extended-A [08A0-08FF]
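As a minimal sketch (in Python rather than Oracle SQL, with hypothetical names `ALLOWED_RANGES`, `is_allowed` and `disallowed_chars`), the range list above can be turned into a lookup table: sort the blocks by their first code point, then use a binary search to find the only block a character could belong to. Anything outside the listed blocks is rejected, which would also include, for example, Latin Extended Additional (1E00-1EFF, used for Vietnamese) and Arabic Presentation Forms-A/-B (FB50-FDFF, FE70-FEFF).

```python
from bisect import bisect_right

# Allowed Unicode blocks from the list above, as inclusive (first, last) code points.
# Sorted by first code point so bisect can locate the candidate block in O(log n).
ALLOWED_RANGES = [
    (0x0000, 0x007F),  # Basic Latin
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0x2C60, 0x2C7F),  # Latin Extended-C
    (0xA720, 0xA7FF),  # Latin Extended-D
]
_STARTS = [first for first, _ in ALLOWED_RANGES]


def is_allowed(cp: int) -> bool:
    """Return True if code point cp lies inside one of ALLOWED_RANGES."""
    i = bisect_right(_STARTS, cp) - 1  # index of the last block starting at or before cp
    return i >= 0 and cp <= ALLOWED_RANGES[i][1]


def disallowed_chars(text: str) -> list[tuple[int, str]]:
    """List (position, 'U+XXXX') for every character outside the allowed blocks."""
    return [
        (pos, f"U+{ord(ch):04X}")
        for pos, ch in enumerate(text)
        if not is_allowed(ord(ch))
    ]
```

For example, `disallowed_chars("Acme ™")` returns `[(5, 'U+2122')]`, because the trade mark sign is not in any of the listed blocks, while plain Latin or Arabic names return an empty list.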


Questions I have


1. Once we get data from a source, it is first loaded into a temporary staging table. Is there an easy way to query the staging table to find out whether a specific
column (e.g. Company Name) has any data that falls outside the acceptable character ranges above, so that it can be rejected and not loaded into the
master tables?

2. Since we will be receiving large volumes of such data, the check should ideally run in a reasonable amount of time.

3. We need to write a specification document for our data providers, and I am wondering what it should specify. Will it suffice to say
that the files must be encoded in UTF-8 and that all characters must fall within the code ranges our application accepts (listed above)?
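One possible way to make those two requirements testable is to check each provider file mechanically before it reaches the staging table: strict UTF-8 decoding first, then the code-point check, reporting line and column so the provider can find the problem. This is a sketch under assumptions: the whole file fits in memory, lines are separated by `\n`, and `validate_feed` is a hypothetical helper, not part of any Oracle tool.

```python
# Allowed blocks from the list in the question, as inclusive (first, last) code points.
ALLOWED_RANGES = (
    (0x0000, 0x024F),  # Basic Latin through Latin Extended-B (contiguous blocks)
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0x2C60, 0x2C7F),  # Latin Extended-C
    (0xA720, 0xA7FF),  # Latin Extended-D
)


def validate_feed(data: bytes) -> list[str]:
    """Check a raw provider file: strict UTF-8 first, then allowed code points only.

    Returns human-readable problems; an empty list means the file passes.
    """
    try:
        text = data.decode("utf-8")  # strict mode: any invalid byte sequence raises
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte offset {exc.start}"]

    problems = []
    # split("\n") rather than splitlines(), so characters such as U+2028
    # are checked instead of being silently consumed as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if not any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES):
                problems.append(f"line {line_no}, column {col}: U+{cp:04X} not allowed")
    return problems
```

A file that starts with a UTF-8 byte order mark is reported as `U+FEFF` on line 1, column 1; if BOMs should be tolerated, the specification would need to say so, and the decode could use `"utf-8-sig"` instead.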


Thanks in advance for your help.

Re: Restricting characters in a UTF8 database [message #603986 is a reply to message #603981] Sun, 22 December 2013 01:11
John Watson
Messages: 8960
Registered: January 2010
Location: Global Village
Senior Member
The character set scanner?
http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch12scanner.htm#NLSPG487
Re: Restricting characters in a UTF8 database [message #603987 is a reply to message #603981] Sun, 22 December 2013 01:12
Michel Cadot
Messages: 68716
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator

Have a look at REGEXP_LIKE
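REGEXP_LIKE fits questions 1 and 2 because the whole rule collapses into one negated character class: a value is rejected if it contains at least one character that is not in any allowed block, so each value needs a single regex test. The same logic is sketched here in Python; the row layout and the `company_name` key are hypothetical stand-ins for the staging table. How to express these code-point ranges inside an Oracle REGEXP_LIKE pattern is not shown, and would need checking against Oracle's regular expression documentation before porting.

```python
import re

# Matches any single character OUTSIDE the allowed blocks listed in the question.
# Compiled once; Python's re understands the \uXXXX escapes in these raw strings.
DISALLOWED = re.compile(
    r"[^"
    r"\u0000-\u024F"  # Basic Latin .. Latin Extended-B
    r"\u0600-\u06FF"  # Arabic
    r"\u0750-\u077F"  # Arabic Supplement
    r"\u08A0-\u08FF"  # Arabic Extended-A
    r"\u2C60-\u2C7F"  # Latin Extended-C
    r"\uA720-\uA7FF"  # Latin Extended-D
    r"]"
)


def split_rows(rows: list[dict], column: str) -> tuple[list[dict], list[dict]]:
    """Partition staging rows into (accepted, rejected) with one regex search per value."""
    accepted, rejected = [], []
    for row in rows:
        value = row.get(column) or ""  # a missing/None value is treated as acceptable
        (rejected if DISALLOWED.search(value) else accepted).append(row)
    return accepted, rejected
```

Rows in `rejected` would then be reported back to the provider instead of being loaded into the master tables.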
