Skip navigation.

Dominic Brooks

Syndicate content OraStory
Dominic Brooks on Oracle - Data Quality, Sensible Design, SQL Performance
Updated: 2 hours 48 min ago

Loader

Fri, 2009-11-06 11:11

It’s been years and years since I worked regularly with SQL*Loader-based feeds.

There are loads of tips and tricks regarding formats, encodings, character sets, etc.
I had forgotten nearly everything that I could ever have claimed to know in this area, all aged out of my personal buffer cache, at best distant memories, very distant.

However, I’ve recently had to resurrect some of these distant memories to feed some data via External Tables.

I thought it would be worthwhile to do an incoherent brain dump on some of the particular issues for future reference.

This article is my no means comprehensive and will just touch on some of the considerations specific to what I was doing.

The starting place for documentation on SQL*Loader is the Oracle Database Utilies Manual and similarly for External Tables.

The thing about external tables is that they can use the ORACLE_LOADER or the ORACLE_DATAPUMP access drivers.

My circumstances involved the ORACLE_LOADER driver.

As the names suggest, ORACLE_LOADER is related to SQL*LOADER. In fact, you can use the EXTERNAL_TABLE=GENERATE_ONLY parameter with SQL*LOADER to generate the syntax for your EXTERNAL TABLE.

I think things like this are brilliant but I never use them. I suppose I’m old skool. Neat features that auto generate code mean that a) I don’t necessarily need to understand what’s going on and b) skip the line-by-line sanity checks that I still believe in.

I work in an environment where it’s not so easy to get access to servers to ftp and view files, etc.

As a result, my approach would be to work with SQL*Loader early on and then graduate to external files nearer the time. However, I’m going to skip all that, fast forward past the SQL*Loader bit and move on to the issues.

So the main considerations for loading up data?
- Encoding / Character sets
- Delimiters – field and line
- File Transfer

(I’m going to ignore Endianness for the sake of “clarity”. But it can be a factor. See Byte Ordering for futher information.)

In my experience, when you get errors whilst loading, it’s due to one or more of the reasons above.
More than one?
Sure.

For example, the combination of file transfer mechanism and delimiters.
Ever wondered why there’s confusion over whether you should transfer a file as text (or ascii), as binary? What’s the difference?

As ever, it depends.
- SFTP doesn’t have a text mode, all transfers are binary.
- FTP lets you choose between ascii (default) and binary.
- What about WinSCP and similar tools with their automatic mode?

Automatic modes usually work off the file extension and would pick a text transfer for HTML, TXT, PHP, XML, etc and binary otherwise.

And the differences in mode?

For Text mode, there are two basic methods – either the tool is responsible for doing some conversion to the format supported by the destination or, more commonly, the client converts to a canonical format and the service then does a further conversion to its own format if necessary.

With Binary mode, the raw bytes are transferred as is, i.e. the file is transferred in its original form.

So, the main significance of this related to field and line delimiters – things like tabs and line feed characters are different between platforms, different between Unix and Windows for example.

This was particularly relevant for my file export from SQL Server to Oracle on Linux.

In Windows, a new line is often represented by two characters – one carriage return and one line feed.
In Unix, a new line is normally just a line feed.

Sometimes you see “^M” characters on *nix. What’s this about?
This represents the carriage return part of the newline for Windows as described above.
If you need to, you can get rid of that using dos2unix.

So, in the ACCESS PARAMETERS subsection, if you use the “RECORDS DELIMITED BY NEWLINE” syntax in your external table definiton, what does that mean?

The NEWLINE keyword uses the newline format for your platform – so just a line feed in the case of *nix.

For my process, I decided that I would just go with the format as extracted from SQL Server – a 50:50 decision that there’s no point on expanding on.

So, in my situation, the syntax for an external table should not use the NEWLINE keyword because the format was Windows new lines, the syntax for which can be:

RECORDS DELIMITED BY '\r\n'

Here’s a situation. Things have been going fine in DEV,etc and eventually go to PREPRODUCTION and there’s a problem.

The feeds don’t work. The external table loader isn’t finding the right delimiters.

The question really is how can you tell what characters are in your file?
So, there’s no point looking at the file prior to transfer in case the technological clue (or the person in a manual process) changes the file on transfer. We need to look at it in the destination directory on the destination server.

And what’s the best way to do that?

On a Unix/Linux, a useful command is the od command which dumps files in octal and other formats. For example,

od -c <filename>

which gives ascii characters or backslash escapes.
There’s a lot of output from this command but you don’t need to do the whole file, e.g.

head -2 <filename> | od -x | more

If the file suddenly doesn’t match the expected format, what’s the sort of errors you might get?
Well, if it should have transfered in binary but it was done in text by error, you might get:

KUP-04020: found record longer than buffer size supported

i.e. the lines are running into each other
or under slightly different circumstances

KUP-04023: field start is after end of record

etc.

That’s about it on transfers and delimiters.

What’s there to say about encoding and character sets.
Using the od command above we can see the hex codes to double check the encoding is as expected.
Other that that, in our external table definition we can specify the character set in the ACCESS PARAMETERS section using the CHARACTERSET keyword, e.g.

CREATE TABLE <my_tablename>
(<my_columns>)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY <my_directory>
     ACCESS PARAMETERS
       ( RECORDS DELIMITED BY '\r\n'
       CHARACTERSET AL16UTF16
       FIELDS TERMINATED BY '\t'
       MISSING FIELD VALUES ARE NULL  )
     LOCATION (<my_directory>:'<my_filename>')
  )
REJECT LIMIT 0;

and then the expected characterset conversion will take place.
Note that SQLServer tends to use USC2 as its unicode characterset and UCS2 is a subset of AL16UTF16.

It’s Friday, it’s 5 o’clock, I’ve lost focus. I’m done.

Learning

Wed, 2009-10-28 06:11

Earlier this month I attended Tanel Poder’s course “Advanced Oracle Troubleshooting for DBAs and Performance Engineers”.

It was good – definitely worthwhile, lots of useful information and really puts into perspective those scripts of his, like LatchProf, LatchProfX, WaitProf and sample.sql. Plus a whole bunch of other stuff. So definitely worth thinking about if you’ve any training budget (that’s mine blown for forseeable future).

I’ve already mentioned the Metalink Headlines tip and I’ve had a little jump in traffic over the past couple of days from Tanel saying the same thing.

The other top tip was to read James Morle’s book Scaling Oracle 8i. James was there on the second day but that wasn’t why it was recommended. Ignore the 8i in the title, most of it remains as relevant as it ever was. I don’t know how I never read this at the time but I’m ploughing through it on the train and it’s a real, real benefit. It’s not just about Oracle. It’s as much about the stuff it interacts with – OS, IO subsystems, memory, etc.
It’s downloadable. Read it.

Metalink headlines

Wed, 2009-09-16 03:51

When I was thinking about whether to go on Tanel Poder’s course, I read the testimonials and there’s a great piece of advice from Tanel via Coskan Gundogar:

I asked him how do you learn all these interesting stuff. He gave me a perfect clue. Sign up for metalink headlines mailing list. Bugs docs they are all there after they are created. Perfect resource.

I just want to emphasise what a great resource these Metalink updates are.

I’ve only been signed up for a couple of weeks but some of the nuggets that come into your mailbox are pure gold.

I’ve learnt some stuff that I would never come across, plus I’m building up links for some great articles and explanations, some of which are invaluable as a reference for yourself or to give others (I’m rubbish at explaining stuff in my own words).

For example, just from the last couple of days, I’ve been notified of these new or updated articles:
(Note: Login to Metalink first before clicking the links)

Character set and character set conversion (comes up all the time on the forums and from other teams at work):

Metalink Scripts:

General:

Interesting behaviour:

To name but a few…

It’s just invaluable stuff.

Sign up if you don’t already.