Re: 10g - Need advise on large lookup table and optimizing io

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 26 Jan 2006 19:40:59 +0000 (UTC)
Message-ID: <drb8kb$3t1$1@nwrdmz02.dmz.ncs.ea.ibs-infra.bt.com>

"peter" <p_msantos_at_yahoo.com> wrote in message news:1136566558.089817.287670_at_g49g2000cwa.googlegroups.com...
> Folks,
> I inherited an application that has 1 very large lookup table. This
> lookup table has
> a 90/10 read/write ratio. Most of the time it's read heavily, but
> during uploads
> records that don't currently exist in that table get inserted.
> This table looks like this.
>
> USER_ID| EMAIL | CLIENT_ID | DUPLICATE_ID
>
> ------------------------------------------------------------------------
> 1000000 'email1' 12312 NULL
> 1000001 'email2' 13434 1
>
> Anyway this table has 400 million records an is not partitioned. He
> way 2 main types of
> queries against this table.
>
> SELECT email from THISTABLE where DUPLICATE_ID IS NULL and user_id =
> ?
>
> The other type of query executed is a SELECT where this table is
> simply joined on
> USER_PROF_ID to pull the email address.
>
> SELECT A.EMAIL, B.COL1,B.COL2
> FROM DEMOGRAPHICS B, THISTABLE A
> WHERE B.USER_PROF_ID = A.USER_PROF_ID
>
> We now have indexes on the "THISTABLE" so that the actual data is never
> read. Only the indexes in place are used to satisfy all queries. The
> above SELECT query like many typical queries on our system pulls lots
> of data (500K records to 1 million records).
>
> We have one specific index on the columns
> USER_ID|EMAIL|DUPLICATE_ID|CLIENT_ID
> that we use to satisfy all the large select queries...these queries do
> an INDEX RANGE SCAN on the index columns I just described ..and so
> never need to access the table rows.
>
> Because the table/indexes have nearly 400 million records in them, and
> this index is read my all our "larger" select queries, this index is
> contented for and the IO waits are high.
>
> I know that this data model is not great, but i inherited it and can't
> really change too mch now. But I'm wondering what I can do to speed up
> the lookups during the SELECT queries against that very large index (
> Index now has 3 levels).
>
> I've been doing some testing with a single table hash cluster, but I'm
> not sure if the benefits of a hash cluster outperform the benefits (in
> terms of IO) of looking up the rows in the large index and not having
> to read the table data.
>
> If anyone out there has any input, I'd greatly appreciate it.
>
> --peter
>

You have shown a query on user_prof_id, which is a column that doesn't appear to exist, so I assume it is really user_id.

Assuming you have a primary key (user_id, email_id seems to be the probably key) this looks like a good candidate for a hash partitioned IOT. Gets rid of the table, automatically satisfies the indexed access requirement, and takes the heat off the root block.

I am, of course, guessing about the intent and statistics.

-- 
Regards

Jonathan Lewis

http://www.jlcomp.demon.co.uk/faq/ind_faq.html
The Co-operative Oracle Users' FAQ

http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
Cost Based Oracle: Fundamentals

http://www.jlcomp.demon.co.uk/appearances.html
Public Appearances - schedule updated 10th Jan 2006

Received on Thu Jan 26 2006 - 13:40:59 CST