Re: What is a good blocksize to use.

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 30 Sep 2002 08:33:18 +0100
Message-ID: <an8v4v$9cs$1$8300dec7@news.demon.co.uk>
Comments at end
--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA__________November 7/9 (MI), 19/21 (TX)
____England______November 12/14

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html




ben brugman wrote in message ...


>

>"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message

>news:an3l6q$iob$1$8300dec7_at_news.demon.co.uk...

>>

>> Given the details that Ben Brugman has supplied

>> so far for this particular case I __suspect__ that an index

>> structure (any index structure, not just an IOT) on the

>> client_id is going to degenerate quite badly because

>> of the data split across clients.  It is likely (but something

>> to check) that a significant number of leaf blocks will take

>> a 50/50 split and the lower block will remain half empty.

>> But it's the nature of the data - combined with the design

>> of the feature - that introduces the side-effect.

>

>Maintenance on B-tree's is relatively low. Only a few percent

>of all diskaccesses are needed for maintenance. On growing

>datastructures, where the data is inserted 'fairly' random, on

>average each leave and each node (not the root) will be filled

>for 75 %. On a tree which has grown to 5000 blocks there have

>been 4999 splits. This is not to bad. Even for a high changing

>index the maintenance is only a few percent.

>



In this case, the maintenance aspect is pretty irrelevant.
Your comment about 'fairly random' inserts leading to 75%
efficiency is correct. However, your data does not have a
random distribution, which is why I made the comments
I did.

You have a very large number  of client ids (type A, say)
with 3 or 4  rows, 10% have several hundred rows (type B), and
a few have row counts in the hundreds of thousands (type C).

In your case, therefore, there is a significant risk that the arrival
of a type B client will cause a leaf block split at the usual 50/50
location.  It is then less likely than average that future data
(i.e for the few type A client sharing the lower block) will fill
the lower data block, and the only data that could go into the
higher block will be data for that one specific type B client.
Received on Mon Sep 30 2002 - 02:33:18 CDT