Re: Unique index access path seems very slow
Date: Tue, 24 Jan 2023 14:52:54 +0000
Message-ID: <CAGtsp8kk0rWjDFRyDOVb2vAGTvThd4bmHjaKVK8_ZC-C3UughA_at_mail.gmail.com>
It doesn't matter what "the other systems" do - the "RAW" and "HEX" are
representations of the same value. You ought to halve the size of the index
and reduce the size of the table by a significant percentage. If the
downstream systems want to see a hex representation and don't want to query
the conversion for themselves then create a virtual column that represents
the raw as a hex string.
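For example, something like this (just a sketch - I'm guessing at the column
names, e.g. a raw(64) column called sha512_raw, since you haven't shown us the
table definition):

    alter table encrypt_tab add (
        sha512_hex   varchar2(128)
            generated always as (rawtohex(sha512_raw)) virtual
    );

The downstream systems can then select sha512_hex while the unique index and
the join stay on the 64-byte raw column.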
The I/O rate doesn't appear to be a problem, but it's a little hard to
determine without a wait event histogram. The nested loop plan I'm looking
at says you did 815K (463K + 351K) requests for a single block inside an
interval of 1,589 (1591 - 2) seconds, which is 513 I/O requests per second
(and there was other activity going on at the same time). (I'm assuming, by
the way, that the database is using the standard 8KB block size - and I'm
also assuming that neither table uses HCC.) If you want a better picture of
the actual I/O waits you'll probably have to enable SQL tracing for a while
then examine the trace file.
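For example (a sketch only - the sid and serial# below are placeholders for
the session running the query):

    execute dbms_monitor.session_trace_enable(session_id => 1234, serial_num => 56789, waits => true, binds => false)

    -- run the test query, then:

    execute dbms_monitor.session_trace_disable(session_id => 1234, serial_num => 56789)

Then run tkprof against the resulting trace file (or just scan the raw trace)
to see the actual single-block read times.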
Your "similar query" was probably not similar enough to tell you anything
useful, but we can't tell because all yo u showed us a an execution plan
Your big index is running a little less efficiently than I might expect
(roughly 5,000 bytes used per block rather than a typical 5,600 / 70%) but
that may be an indication that you rebuilt it some time in the not too
distant past and have a large number of blocks that have gone through 50/50
splits and are still moving towards a stable state. That's not your big
issue, though.
Basically, as Andy pointed out, your index is big and the standard hash
means you are inserting and querying blocks in a completely random pattern
across the entire range of the index. Do some arithmetic - you're after 3M
rows (from tran_tab), which means 1 row in 1,000 from encrypt_tab. Your
index entries are about 140 bytes (128 plus a little overhead), and you're
getting about 36 entries per leaf block. That means "on average" you're
looking for one row out of every 28 index leaf blocks (and it's a little
worse for the table) - the probability of being able to take advantage of
any previously cached blocks is pretty low - so if you want to estimate the
workload if your query uses a nested loop, it's likely to be in the ballpark
of the driving tablescan time plus 2 disk reads (one leaf block, one table
block) per row supplied by tran_tab. (There's a note about thinking at
scale here: https://jonathanlewis.wordpress.com/2007/03/18/thinking-big/ )
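Spelling out the arithmetic (rounded, using the figures above):

    5,000 bytes used per leaf block / 140 bytes per entry   ~= 36 entries per leaf block
    1 row wanted per 1,000 index entries                     => 1 wanted entry per ~28 leaf blocks (1,000 / 36)
    3M driving rows x 2 single block reads (leaf + table)    ~= 6M random reads on top of the tablescan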
An important question then is: what's the significance of the
tran_tab.pd_cd (which has a redundant outer join symbol in your example) -
does it mean that of the final 400M rows you're actually only going to
move a small percentage, or is it a flag with a small number of values and
each value directs data to a different system, or is it just a way of
making a high volume query a lower volume query, or what .....
Also: why are you testing with a query for an hour range? If you want to
run every 5 minutes should this be a 5 minute range (which would give an
average of 1.4M rows)? Is this a "worst case" test or is there some other
significance?
Fundamental and obvious strategy: store raw, not hex
Suggestion to be tested: if you hash partition encrypt_tab on the sha512
column, and rebuild tran_tab as a composite partitioned table where the
second dimension is also hash partitioned on the (virtual) sha512 column
with the same number of (sub)partitions then Oracle should be able to do a
partial partition-wise join between the two and a series of small(-ish) hash
joins may be much more efficient than one huge hash join. (If the path is
a nested loop then a local index MIGHT see better caching on a partial
partition-wise join.)
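In outline it would be something like this (very much a sketch - the column
names, data types and the choice of 64 (sub)partitions are my guesses, and
you'd have to add the rest of your columns and decide on the details of the
date partitioning for tran_tab):

    create table encrypt_tab (
        sha512_raw   raw(64)       not null,
        sha512_hex   varchar2(128) generated always as (rawtohex(sha512_raw)) virtual
    )
    partition by hash (sha512_raw) partitions 64
    ;

    create unique index encrypt_tab_uk on encrypt_tab(sha512_raw) local;

    create table tran_tab (
        pt_date      date          not null,
        pd_cd        varchar2(10),
        sha512_hex   varchar2(128) not null,
        sha512_raw   raw(64)       generated always as (hextoraw(sha512_hex)) virtual
    )
    partition by range (pt_date) interval (numtodsinterval(1,'day'))
    subpartition by hash (sha512_raw) subpartitions 64
    (
        partition p_start values less than (date '2023-01-01')
    )
    ;

With the same number of hash (sub)partitions on the same (raw) value Oracle
can match subpartition to partition and run the join as 64 separate, much
smaller, joins.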
Secondary suggestion (probably worth only a small amount of CPU time): it
looks like you have a date-only column for the partition key column, so I'm
guessing the table is range partitioned (perhaps with an interval clause of
one day). With this setup the predicate pt_date = {constant} has to be
applied to every row you visit because there may be rows (from Oracle's
perspective) which have a non-midnight time component. If you change the
table to list-partitioned with exactly one date(-only) value defining each
partition then Oracle could recognise that this predicate identified exactly
ALL the rows in the partition and not need to test each row. I think your
version of Oracle even allows automatic list partitioning so that a new
list partition would be created automatically when a new date appeared.
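For example (again just a sketch - it assumes the automatic list option is
available in your version, i.e. 12.2 or later, and it could be combined with
the hash subpartitioning sketched above):

    create table tran_tab (
        pt_date      date          not null,
        pd_cd        varchar2(10),
        sha512_hex   varchar2(128) not null
    )
    partition by list (pt_date) automatic
    (
        partition p20230101 values (date '2023-01-01')
    )
    ;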
Curiosity suggestion: You have about 3 billion rows in encrypt_tab, and 2
billion of them survive the bloom filter. Of course that might mean
anything between 1.5 billion and 2.5 billion. However, the tablescan
reports 135 seconds (samples) of CPU, and that MIGHT be the cost of
applying the Bloom filter to every single row. It's possible (though I
think a little unlikely) that if you disable the Bloom filtering the CPU
utilisation will drop (the outline shows you the Bloom filter hint -
px_join_filter - add this to the query but change it to
NO_px_join_filter). You may find, anyway, that if you add a cardinality (or
OPT_ESTIMATE) hint to the query to tell it that tran_tab will supply 3M
rows the Bloom filter may simply not appear. I'd have to mess about
a bit to get it right (and the hint_report option to
dbms_xplan.display_cursor() might help) but it would probably be something
like: cardinality(@sel$f5bb74e1 tran_tab@sel$2 3000000)
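In SQL*Plus terms that would be something like the following (a sketch only -
the query block and alias names are my guesses, so check them against the
alias/outline information of the current plan; the hint_report format option
appeared in 19c, I think):

    select  /*+
                cardinality(@sel$f5bb74e1 tran_tab@sel$2 3000000)
            */
            ...             -- your original select list, joins and predicates unchanged
    from    ...
    ;

    select * from table(dbms_xplan.display_cursor(null, null, 'typical alias outline hint_report'));

To test the Bloom filter separately, copy the px_join_filter() hint from the
Outline Data of the current plan and put it back into the query as
no_px_join_filter() with the same arguments.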
Regards
Jonathan Lewis
-- http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 24 2023 - 15:52:54 CET