
Re: Index compression vs. table compression

From: Rick Denoire <100.17706_at_germanynet.de>
Date: Thu, 30 Dec 2004 00:38:16 +0100
Message-ID: <bs86t05pelmlvbg73jpbfodtlgpm6cprn6@4ax.com>


"Howard J. Rogers" <hjr_at_dizwell.com> wrote:

>>>Unless you twiddled things when you were creating your file systems,
>>>then that's wrong: it should be 4K (assuming you're using bog-standard
>>>ext2, for example). You'll have I/O problems as a result of having an
>>>Oracle block size that's bigger than your file system buffer size.

Where is the evidence?  

>> This is a question I still have to investigate. Do you have numbers?
>> Benchmarks? Records of I/O per sec? One has to MEASURE in order to
>> know - too many myths out there, and this is a complicated matter.
>
>Of course. That's why Steve Adams has articles on it, and why I do.

It seems strange to me that you rely that much on other people's opinions. I am also missing the facts at www.ixora.com.au. Don't get me wrong, Ixora is a respectable source of information. But it is not a church.

>> I
>> would expect that several read requests of contiguous blocks be
>> gathered into one request by the OS before being actually issued.
>
>Let's not expect things, eh?!

In the absence of better information, I would expect that, but I am not insisting that Oracle performance will be positively affected by this write-coalescing feature. I just don't know.

>> Isn't this a classical feature of Unix-like filesystem buffers? (I
>> won't go into read-ahead details done by the storage).
>
>Go read what Steve Adams has to say on the matter. http://www.ixora.com.au.

Go measure results. Sorry.

>> This is similar to using the DB_FILE_MULTIBLOCK_READ_COUNT parameter
>> to coalesce adjacent I/Os, I think (but for FTS).
>> The file systems can be mounted with the sync option, avoiding file
>> system buffering.
>
>Good. If you can do direct I/O, then that's fine.

At my next opportunity, I will experiment with the parameters disk_asynch_io and filesystemio_options.
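
For concreteness, here is the kind of thing I mean (just a sketch, assuming the instance uses an spfile; which values are valid depends on platform and release):

   -- Ask Oracle to bypass the file system buffer where the platform allows it
   ALTER SYSTEM SET filesystemio_options = 'DIRECTIO' SCOPE=SPFILE;
   -- Asynchronous I/O (TRUE is already the default on most platforms)
   ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;
   -- Both parameters are static, so the changes take effect only
   -- after an instance restart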

>> Would that then alleviate the mismatch between Oracle
>> blocksize and filesystem blocksize?
>
>This is not about "filesystem blocksize" but about "file system buffer
>size". There's a difference. And it should hopefully then be
>self-evident that if you by-pass the file system buffer then of course
>there is no mismatch between the Oracle block size and the file system
>buffer size -because you don't actually have one of the latter if doing
>direct I/O!

But I still have 4K on an ext3 Linux filesystem, right? Please explain something to me: if I had several filesystems attached to a Linux box, each of them formatted with a different blocksize (mke2fs, option -b), and they were all mounted without any fancy direct I/O option, how would the Linux OS cope with them? Would I have several filesystem buffers (similar to the Oracle case), each with a different "file system buffer size"? That is what your theory implies. Puzzle that one out.

>> I don't understand the sense of a recycle cache.
>
>Ouch. You should do, let's put it that way.
>
>> Since buffers for
>> blocks read via FTS are put at the end of the MRU list anyway, the
>> main cache already "recycles" block buffers with low "life
>> expectancy", and reserving a recycle cache at the expense of the main
>> cache for just this kind of operation could be a waste of memory in
>> situations where the memory could still be used to hold "hot" blocks
>> instead of being otherwise exclusively reserved for "wasted" blocks.
>
>If it's a FTS, you won't want the blocks cached AT ALL, probably. And if
>that's the case, holding them in cache, even for the time it takes them
>to age out, is *definitely* a waste of memory.

According to the original Oracle documentation I read just today, one has to be careful to choose a recycle cache size large enough to ensure that buffers are still cached if they are needed a second time during the course of the full table scan (to be more precise about your "AT ALL" qualifier).
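
To make that concrete, the setup under discussion looks roughly like this (only a sketch; db_recycle_cache_size is the 9i-style parameter, and the table name is invented):

   -- Reserve a recycle cache; this memory is taken away from the default cache
   ALTER SYSTEM SET db_recycle_cache_size = 32M;
   -- Direct a segment's blocks into the recycle cache instead of the default
   ALTER TABLE big_scanned_table STORAGE (BUFFER_POOL RECYCLE);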

That reminds me of the CACHE option for tables; I never understood how that could go well either. Nowadays, Oracle discourages its use. "Of course!", I would say. It is likely to be misused: it can practically flush your default cache. I do not blindly believe what goes against my understanding, and I recommend everyone do the same.
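
For the record, the option in question (again only a sketch; the table name is invented):

   -- CACHE asks Oracle to put blocks read by a full table scan at the MRU
   -- end of the LRU list, where they can crowd out genuinely hot blocks
   ALTER TABLE small_lookup_table CACHE;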

>It's also the case that blocks can slip into the cold half of the LRU
>list which you'd rather not have flushed out. By directing FTS to their
>own portion of the cache, you can give such blocks a bit more time to
>get 're-warmed' up to the MRU end of the LRU list.

You don't seem to grasp my point. If you detach a portion of memory from the default cache for a recycle cache, that memory will be missing from the default cache. There won't ever be any blocks trying to get re-warmed there. No cache, no blocks, no re-warming. Do not buy *every feature* Oracle sells indiscriminately.

>A recycle cache can be, in short, extremely useful.

Did you actually check that? For something that is "extremely useful", that shouldn't be difficult. But granted, it is difficult to test in a non-artificial manner.

>Then my advice is not to do it at all. If you can't evaluate something
>like this properly, beforehand, you're simply asking for trouble.

I will do compression with the standard block size. There is no time to do tests; I have several databases of different types to tune, not just this one.

>> If you have tangible information about the blocksize discrepancy and
>> its impact to performance, I would REALLY appreciate a pointer to it.
>
>I do have tangible pointers to Steve Adams' website where it's
>discussed. It's silly putting out benchmarks on the matter, and neither
>Steve nor I do.

I respect your opinion, but I dislike it. The proof of your hypothesis is that it would be silly to try to refute the opposite?!

>For me, at least, that's because the benchmarks would be
>true for my machine, my configuration, my hard disks, and would be out
>of date the minute they were produced. I can tell you I did some tests
>about 18 months ago, and a full table scan of a million row table took
>approximately 25% longer with the "wrong" block size than with the
>"right" one (RHES3, ext3, 4K file system buffer size), but make of that
>what you will.

Frankly, 25% is completely meaningless if you consider the enormous impact that your theory about blocksize mismatch should have on DB operations. Didn't we all learn that the really expensive part of DB work consists of its "mechanical" components (e.g. moving heads on disks)? And that accessing disks can be orders of magnitude slower than purely electronic access?

>The point is that the theory is sound, it's well argued
>by Steve amongst others, and mucking about with block sizes is therefore
>something one would have to benchmark and test on your *own* hardware
>before indulging in it.

OK, I see you are a believer.

I actually already began doing preliminary tests (whose results varied by *more than 25% under consistently identical conditions*). I installed a little tool called orabm/orastress and gave it a trial run. In the next few weeks we will see. I intend to run the benchmark on a Linux box with and without direct I/O, using tablespaces of different Oracle block sizes (4K and 8K). Then we will know for sure how sound this theory is.
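
The tablespace side of that test would look roughly like this (a sketch, assuming a 9i database whose default block size is 8K; file names and sizes are placeholders):

   -- A buffer cache must exist for each non-default block size
   ALTER SYSTEM SET db_4k_cache_size = 64M;
   -- Tablespaces with different Oracle block sizes can then coexist
   CREATE TABLESPACE test_4k DATAFILE '/u01/oradata/test_4k.dbf' SIZE 500M
      BLOCKSIZE 4K;
   CREATE TABLESPACE test_8k DATAFILE '/u01/oradata/test_8k.dbf' SIZE 500M
      BLOCKSIZE 8K;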

Regards
Rick Denoire
