Re: Unique index access path seems very slow
Date: Tue, 24 Jan 2023 09:23:11 +0000
Message-ID: <CAGtsp8mqnZDNmq4AoMRG51bcC1CtW=cerbkd0y2vCWJSdYbw-A_at_mail.gmail.com>
One quick question before I take a proper look at this.
Is there any good reason why you've stored an SHA value in its
128-character HEX form rather than as a 64-byte RAW? That would save you
64 * 3 billion * 2 bytes - roughly 384 GB. It would also cut out one step
of the conversion code and roughly halve the size of any hash table used
for a join.
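For scale, a quick illustration of the raw-vs-hex size difference (a Python sketch using hashlib, which implements the same SHA-512; the arithmetic mirrors the figures above):

```python
import hashlib

# SHA-512 always produces a 64-byte digest in raw (binary) form.
digest = hashlib.sha512(b"some confidential value").digest()
print(len(digest))        # 64 -> what a RAW(64) column would store

# The HEX representation doubles that to 128 characters.
print(len(digest.hex()))  # 128 -> what the current CHAR/VARCHAR2 column stores

# Rough saving across ~3 billion rows, counting table plus index:
saved = (128 - 64) * 3_000_000_000 * 2
print(saved / 10**9)      # 384.0 -> roughly 384 GB
```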
Regards
Jonathan Lewis

On Tue, 24 Jan 2023 at 05:19, Pap <oracle.developer35_at_gmail.com> wrote:

> Thank You So much Andy and Mladen.
>
> Just to see whether CPU spent on HASH function evaluation is really the
> bottleneck here, I tried executing just the inline-view part of the
> query to measure how much overhead the HASH function evaluation adds. As
> I see, it finishes in ~40 seconds for ~12 million rows, which is a lot
> better. So basically it's the outer join that is causing the bottleneck
> here. And the outer join should not have to apply the HASH function
> again, because those hash values are now readily available from the
> inline view on TRAN_TAB, and the other row source, TAB_ENCRYPT, already
> holds the hashed value in a column. So why is the final read of table
> TAB_ENCRYPT and the outer join with the inline-view result so slow?
>
> https://gist.github.com/oracle9999/d4ffa07a4f499e1114feba55a4417aa9
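> As a rough sanity check of the pure hashing cost outside the database
> (plain Python hashlib rather than STANDARD_HASH, so only the order of
> magnitude is meaningful), one can time a small batch and extrapolate:

```python
import hashlib
import time

# Hash a small batch of synthetic values and extrapolate the per-row
# cost to ~12 million rows. Plain Python, not Oracle's STANDARD_HASH,
# so this only indicates the order of magnitude of the hashing cost.
n = 100_000
values = [str(i).encode() for i in range(n)]

start = time.perf_counter()
digests = [hashlib.sha512(v).hexdigest() for v in values]
elapsed = time.perf_counter() - start

print(f"extrapolated for 12M rows: ~{elapsed / n * 12_000_000:.0f} s")
```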
>
> Also, I realized that in the indexed execution path I posted, I had
> missed the dt_cr filter in the inline-view query. So I reran the indexed
> path and updated the details, but again I saw no difference, even though
> I ran the query at a different time: ~378K rows in ~30 minutes. That is
> far slower than even the FTS path. So there appears to be something
> wrong with the way the index data is stored or distributed for column
> COL1_SHA_512 of table TAB_ENCRYPT. I don't believe it can be underlying
> IO slowness, as we have other applications and queries running without
> any issue on this database. Anyway, I will also try fetching data using
> another index on a normal column to see whether ~100 IOPS is really the
> speed (which I believe is not the normal IO speed on this box).
>
> https://gist.github.com/oracle9999/9792a70f748b71c365397c5458d3e9e8
>
> Regarding the question "*is the related data inserted at the same time?
> That way you can do a hash join of the two recently inserted data
> sets*": the transaction table TRAN_TAB receives data from another
> process 24/7 throughout the day, but this process, which matches the
> hashed column value and passes it to the other system, is different
> altogether and has dependencies on other processes/workflows.
>
> Regards
> Pap
>
>
> On Tue, Jan 24, 2023 at 3:22 AM Mladen Gogala <gogala.mladen_at_gmail.com>
> wrote:
>
>> On 1/23/23 13:46, Pap wrote:
>>
>> This database is Oracle 19c on Exadata. Below are the query and the SQL
>> monitor reports for both the full-scan path and the index access path.
>> I have also captured the non-zero statistics from gv$sesstat for a
>> ~10-minute execution of each run. The index access path had processed
>> only ~82K rows after 30 minutes and had still not finished. We want this
>> query to finish in ~2-3 minutes.
>>
>> SELECT TRAN_TAB.COL1_SHA_512, TAB_ENCRYPT.TAB_ENCRYPT_COL2.....
>>   FROM (SELECT RAWTOHEX (STANDARD_HASH (TO_CHAR (TRIM (TRAN_TAB.LN4)), 'SHA512')) AS COL1_SHA_512
>>           FROM TRAN_TAB
>>          WHERE TRAN_TAB.PD_CD(+) = 'XXX'
>>            AND TRAN_TAB.PART_DT = TO_DATE (:B1, 'DD-MON-YYYY HH24:MI:SS')
>>            AND TRAN_TAB.dt_cr BETWEEN TO_DATE (:B2, 'DD-MON-YYYY HH24:MI:SS')
>>                                   AND TO_DATE (:B3, 'DD-MON-YYYY HH24:MI:SS')
>>        ) TRAN_TAB,
>>        TAB_ENCRYPT
>>  WHERE TRAN_TAB.COL1_SHA_512 = TAB_ENCRYPT.COL1_SHA_512(+)
>>
>> ****** "Full scan" path SQL monitor + non-zero results from gv$sesstat
>> from the initial ~10-minute run ******
>>
>> https://gist.github.com/oracle9999/b37160097fa8e3929fb66af61ebbf9ed
>>
>> ****** "Index access" path SQL monitor + non-zero results from gv$sesstat
>> from the initial ~10-minute run ******
>>
>> https://gist.github.com/oracle9999/9792a70f748b71c365397c5458d3e9e8
>>
>> The requirement is as follows.
>>
>> We have two tables, TRAN_TAB and TAB_ENCRYPT. TAB_ENCRYPT is an
>> insert-only table and holds the hashed value (COL1_SHA_512) of a
>> confidential column/attribute. Each row of the transaction table
>> (TRAN_TAB) has to be checked/joined against all the hashed column values
>> in TAB_ENCRYPT using the above query, and the results, along with
>> additional columns from TAB_ENCRYPT, have to be sent to a downstream
>> system. The query is supposed to run multiple times a day so that
>> results reach the downstream system in as near real time as possible,
>> based on the records in TRAN_TAB. Currently the plan is to run it once
>> every ~5 minutes. Table TAB_ENCRYPT holds ~3 billion rows, and
>> transaction table TRAN_TAB will have approx. ~400 million rows per
>> day/part_dt to be joined on the hashed column value COL1_SHA_512.
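>> At ~400 million rows per day and one run every ~5 minutes, each run has
>> to pick up on the order of 1.4 million new transaction rows; a quick
>> back-of-envelope:

```python
# Back-of-envelope: rows each ~5-minute run must process, using the
# figures quoted above (~400 million TRAN_TAB rows per day).
rows_per_day = 400_000_000
runs_per_day = 24 * 60 // 5           # 288 runs per day
print(rows_per_day // runs_per_day)   # 1388888 -> ~1.4 million rows per run
```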
>>
>> As the SQL monitor shows, the original query was taking ~30 minutes to
>> join against ~3 million rows from the transaction table, and the
>> majority of the resources appear to be spent full-scanning table
>> TAB_ENCRYPT. So we tried creating a unique index on column
>> "COL1_SHA_512", thinking it would be fast, but it appears to degrade the
>> query's performance further. We want to understand why this is
>> happening. Is it because the column holds ~128-byte values, so the index
>> becomes too big to fit in cache? And how can we fix this query?
>>
>> The column "COL1_SHA_512" holds the SHA-512 hash of another column; a
>> sample value looks like the one below. The index on this column is
>> ~700 GB; its statistics are shown below. All its values are unique, but
>> I think it is the ~128-byte length that makes the index this huge. The
>> size of the table tran_tab is ~880GB.
>>
>>
>> "99983FB4F3BEAA516BE40B85706338B0D7E5D14D1B2A3C945D5F74947A3E8DA8E714D1761D008715CD46B6C6CDE8B99690F2AE04D97D9KHYU67GBF4589HNFRSDR"
>>
>>
>> Column 'COL1_SHA_512' statistics:
>>
>> NUM_DISTINCT  NULLABLE  DENSITY                     NUM_NULLS  AVG_COL_LEN
>> 2987284438    N         0.000000000334752187397831  0          129
>>
>>
>> Unique index (on column COL1_SHA_512) statistics:
>>
>> INDEX_NAME      BLEVEL  PCT_FREE  LEAF_BLOCKS  CLUSTERING_FACTOR  NUM_ROWS
>> TAB_ENCRYPT_UK  3       10        83389146     3008764650         3008764650
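>> The reported leaf-block count is consistent with the ~700 GB figure if
>> the default 8 KB block size is assumed (an assumption; the block size
>> was not stated), and a 64-byte RAW key would roughly halve it:

```python
# Sanity-check the index size from the reported LEAF_BLOCKS, assuming
# the default 8 KB Oracle block size (an assumption; not stated above).
leaf_blocks = 83_389_146
block_size = 8192

index_gb = leaf_blocks * block_size / 10**9
print(round(index_gb))   # 683 -> consistent with the quoted ~700 GB

# The key is ~128 of the ~129 bytes per entry, so storing the hash as a
# 64-byte RAW instead of 128-character HEX would roughly halve the index.
```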
>>
>> Regards
>> Pap
>>
>>
>>
>> Pap, your index plan is attempting to read 20 million rows from the
>> outer table and join the result to the inner table using nested loops.
>> Yes, the Bloom filter will eliminate some rows from the outer table, but
>> there will still be a lot to read: 80,295 read requests and 627 MB from
>> the table itself, plus 152K reads totalling 1 GB from the index.
>> Depending on the model of your Exadata, cell single block reads can be
>> relatively slow. Take a look at your AWR report: if your average cell
>> single block read time is in milliseconds, that would explain your
>> problem. Exadata was made to speed up full table scans; it doesn't do
>> much for single block reads.
>>
>> However, 80,295 rows out of 20M is a relatively small portion of the
>> data. Is there a possibility of using an index range scan on the outer
>> table?
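>> A rough model of what those read counts cost at different single-block
>> latencies (purely illustrative; the real figure comes from the AWR
>> average for cell single block physical reads):

```python
# Model the elapsed time implied by the read counts quoted above
# (~80,295 table read requests + ~152,000 index reads) at a few
# assumed single-block read latencies. Illustrative only.
reads = 80_295 + 152_000

for latency_ms in (0.2, 1.0, 10.0):
    minutes = reads * latency_ms / 1000 / 60
    print(f"{latency_ms:>4} ms/read -> {minutes:.1f} min")
# At ~10 ms per read the I/O time alone approaches the ~30+ minutes observed.
```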
>>
>> --
>> Mladen Gogala
>> Database Consultant
>> Tel: (347) 321-1217
>> https://dbwhisperer.wordpress.com
>>
>>
--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jan 24 2023 - 10:23:11 CET