Re: Unique index access path seems very slow
Date: Tue, 24 Jan 2023 15:41:49 +0530
Message-ID: <CAEjw_fgVNvEgoXWdSNqxtuVar-8aRVbhKLmg5cqdz8u7kzvHWQ_at_mail.gmail.com>
Thank you Jonathan. I need to check with the dev team to confirm whether the value can be stored and compared as RAW in the query logic rather than as hex. My understanding, though, is that upstream/downstream systems already exist which store and distinguish this column in hex format only, and that may be why we are keeping it as hex here.
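
For reference, a minimal sketch of what a RAW-based comparison could look
like, assuming a RAW(64) column could be added (the DDL and backfill here
are hypothetical; table and column names are taken from this thread):

   ALTER TABLE TAB_ENCRYPT ADD (COL1_SHA_512_RAW RAW(64));

   -- One-off backfill from the existing hex column (in practice this
   -- would have to be batched for ~3 billion rows):
   UPDATE TAB_ENCRYPT
      SET COL1_SHA_512_RAW = HEXTORAW(COL1_SHA_512);

   -- STANDARD_HASH already returns RAW, so the probe side can drop the
   -- RAWTOHEX() wrapper and compare RAW to RAW directly:
   SELECT COUNT(*)
     FROM TAB_ENCRYPT e
    WHERE e.COL1_SHA_512_RAW =
          STANDARD_HASH(TO_CHAR(TRIM('sample value')), 'SHA512');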
Also, as Andy pointed out regarding slow IO, I tried running a similar query on the same database with a different table/column in the outer join, and the IO response was a lot better. So why is the encrypted table/column (TAB_ENCRYPT/COL1_SHA_512) behaving this way?
https://gist.github.com/oracle9999/c2b8f2daa3c43b42b51ba8e71ef4e28f
I also wondered whether row chaining or row migration was playing a role, but from the table stats for TAB_ENCRYPT below it doesn't look like either is the case, and I am not seeing those statistics in the v$sesstat results either.
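
One quick way to cross-check that from the session counters, as a sketch
(the statistic name is standard; the SID lookup is only illustrative):

   -- 'table fetch continued row' climbs whenever a chained or migrated
   -- row has to be followed into a second block:
   SELECT sn.name, ss.value
     FROM v$statname sn
     JOIN v$sesstat  ss ON ss.statistic# = sn.statistic#
    WHERE sn.name = 'table fetch continued row'
      AND ss.sid  = SYS_CONTEXT('USERENV', 'SID');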
TABLE_NAME   NUM_ROWS    BLOCKS     AVG_ROW_LEN  PCT_FREE  INI_TRANS  MAX_TRANS
TAB_ENCRYPT  2987284438  113548212  268          10        1          255

On Tue, 24 Jan, 2023, 2:53 pm Jonathan Lewis, <jlewisoracle_at_gmail.com> wrote:
> One quick question before I take a proper look at this.
>
> Is there any good reason why you've stored an SHA value as a 128 character
> HEX form rather than a 64 byte RAW? (Which would save you 64 * 3 billion *
> 2 bytes - roughly 384 GB. It would also cut out one step of the conversion
> code, and roughly halve the size of any hash table used for a join.)
>
> Regards
> Jonathan Lewis
>
>
>
>
> On Tue, 24 Jan 2023 at 05:19, Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank you so much, Andy and Mladen.
>>
>> To see whether CPU spent on HASH function evaluation is really the
>> bottleneck, I executed just the inline view part of the query to measure
>> how much overhead the HASH function evaluation adds. It finishes in ~40
>> seconds for ~12 million rows, which is a lot better. So it is the outer
>> join that is the bottleneck here. And the outer join should not have to
>> apply the HASH function again, because those hash values are readily
>> available from the inline view over TRAN_TAB, and the other row source,
>> TAB_ENCRYPT, already holds the hashed value in a column. So why is the
>> final read of table TAB_ENCRYPT and its outer join to the inline view
>> result so slow?
>>
>> https://gist.github.com/oracle9999/d4ffa07a4f499e1114feba55a4417aa9
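>>
>> (A sketch of that kind of probe; the COUNT(DISTINCT) is only there to
>> force every hash value to be computed and fetched, and the predicates
>> are copied from the original query:)
>>
>>    SELECT /*+ monitor */ COUNT(DISTINCT COL1_SHA_512)
>>      FROM (SELECT RAWTOHEX(STANDARD_HASH(TO_CHAR(TRIM(T.LN4)),
>>                   'SHA512')) AS COL1_SHA_512
>>              FROM TRAN_TAB T
>>             WHERE T.PD_CD(+) = 'XXX'
>>               AND T.PART_DT = TO_DATE(:B1, 'DD-MON-YYYY HH24:MI:SS')
>>               AND T.DT_CR BETWEEN TO_DATE(:B2, 'DD-MON-YYYY HH24:MI:SS')
>>                               AND TO_DATE(:B3, 'DD-MON-YYYY HH24:MI:SS'));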
>>
>> Also, I realized that in the indexed execution path I posted, I had
>> missed the dt_cr filter in the inline view query. So I reran the indexed
>> path and updated the details, but I saw no difference even though it was
>> run at a different time: ~378K rows in ~30 minutes, far slower than even
>> the FTS path. So there appears to be something wrong with the way the
>> index data is stored or distributed for column COL1_SHA_512 of table
>> TAB_ENCRYPT. I don't believe it can be underlying IO slowness, as other
>> applications and queries run on this database without any issue. In any
>> case, I will also try fetching data through another index on a normal
>> column, to see whether ~100 IOPS is really the speed (which I don't
>> believe is the normal IO speed on this box anyway), as sketched below.
>>
>> https://gist.github.com/oracle9999/9792a70f748b71c365397c5458d3e9e8
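>>
>> (A sketch of that second probe; the index name TAB_ENCRYPT_COL2_IX and
>> the bind range are hypothetical placeholders for an index on a normal
>> column:)
>>
>>    SELECT /*+ index(t TAB_ENCRYPT_COL2_IX) monitor */
>>           COUNT(t.TAB_ENCRYPT_COL2)
>>      FROM TAB_ENCRYPT t
>>     WHERE t.TAB_ENCRYPT_COL2 BETWEEN :lo AND :hi;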
>>
>> Regarding the question "*is the related data inserted at the same time?
>> That way you can do a hash join of the two recently inserted data
>> sets*": the transaction table TRAN_TAB receives data from another
>> process 24/7 throughout the day, but the process that matches the
>> encrypted column value and passes it to the other system is different
>> altogether and depends on other processes/workflows.
>>
>> Regards
>> Pap
>>
>>
>> On Tue, Jan 24, 2023 at 3:22 AM Mladen Gogala <gogala.mladen_at_gmail.com>
>> wrote:
>>
>>> On 1/23/23 13:46, Pap wrote:
>>>
>>> This database is Oracle 19c on Exadata. Below are the query and the SQL
>>> monitor output for both the full scan path and the index access path. I
>>> also captured the non-zero stats from gv$sesstat for a ~10 minute
>>> execution of each run. The index access path has processed only ~82K
>>> rows in 30 minutes and has still not finished. We want this query to
>>> finish in ~2-3 minutes.
>>>
>>> SELECT TRAN_TAB.COL1_SHA_512, TAB_ENCRYPT.TAB_ENCRYPT_COL2.....
>>>   FROM (SELECT RAWTOHEX(STANDARD_HASH(TO_CHAR(TRIM(TRAN_TAB.LN4)),
>>>                'SHA512')) AS COL1_SHA_512
>>>           FROM TRAN_TAB
>>>          WHERE TRAN_TAB.PD_CD(+) = 'XXX'
>>>            AND TRAN_TAB.PART_DT = TO_DATE(:B1, 'DD-MON-YYYY HH24:MI:SS')
>>>            AND TRAN_TAB.DT_CR BETWEEN TO_DATE(:B2, 'DD-MON-YYYY HH24:MI:SS')
>>>                                   AND TO_DATE(:B3, 'DD-MON-YYYY HH24:MI:SS')
>>>        ) TRAN_TAB, TAB_ENCRYPT
>>>  WHERE TRAN_TAB.COL1_SHA_512 = TAB_ENCRYPT.COL1_SHA_512(+)
>>>
>>> ****** "Full scan" path sql monitor + nonzero results from gv$sesstat
>>> from initial ~10minutes run*******
>>>
>>> https://gist.github.com/oracle9999/b37160097fa8e3929fb66af61ebbf9ed
>>>
>>> ***** "Index access" path sql monitor + nonzero results from gv$sesstat
>>> from initial ~10minutes run*******
>>>
>>> https://gist.github.com/oracle9999/9792a70f748b71c365397c5458d3e9e8
>>>
>>> The requirement is as follows.
>>>
>>> We have two tables, TRAN_TAB and TAB_ENCRYPT. TAB_ENCRYPT is an
>>> insert-only table and holds the encrypted value (COL1_SHA_512) of a
>>> confidential column/attribute. Each row of the transaction table
>>> (TRAN_TAB) has to be checked/joined against all the encrypted column
>>> values of table TAB_ENCRYPT using the above query, and the results,
>>> along with additional columns from TAB_ENCRYPT, have to be sent to
>>> another downstream system. This query is supposed to run multiple times
>>> a day so that results reach the downstream system in as near real time
>>> as possible, based on the records in the transaction table TRAN_TAB.
>>> Currently the plan is to run it once every ~5 minutes. TAB_ENCRYPT
>>> holds ~3 billion rows, and TRAN_TAB will have approximately 400 million
>>> rows per day/part_dt to be joined on the encrypted column value,
>>> COL1_SHA_512.
>>>
>>> As the SQL monitor shows, the original query was taking ~30 minutes to
>>> join ~3 million rows from the transaction table, and the majority of
>>> the resources seem to be spent full scanning the table TAB_ENCRYPT. We
>>> therefore tried creating a unique index on column "COL1_SHA_512",
>>> thinking it would be fast, but it appears to degrade the performance of
>>> this query even further. We want to understand why this is happening.
>>> Is it because the column holds ~128 byte values, so the index becomes
>>> too big to fit in cache? And how can we fix this query?
>>>
>>> The column "COL1_SHA_512" holds the SHA-512 hash value of another
>>> column. A sample value looks like the one below. The size of the index
>>> on this column is ~700GB, and its statistics are shown below. All of
>>> its values are unique, but I think it is the 128 byte length that makes
>>> the index this huge. The size of the table TAB_ENCRYPT is ~880GB.
>>>
>>>
>>> "99983FB4F3BEAA516BE40B85706338B0D7E5D14D1B2A3C945D5F74947A3E8DA8E714D1761D008715CD46B6C6CDE8B99690F2AE04D97D9KHYU67GBF4589HNFRSDR"
>>>
>>>
>>> Column 'COL1_SHA_512' statistics:
>>>
>>> NUM_DISTINCT  NULLABLE  DENSITY                     NUM_NULLS  AVG_COL_LEN
>>> 2987284438    N         0.000000000334752187397831  0          129
>>>
>>>
>>> Unique index (on column COL1_SHA_512) statistics:
>>>
>>> INDEX_NAME      BLEVEL  PCT_FREE  LEAF_BLOCKS  CLUSTERING_FACTOR  NUM_ROWS
>>> TAB_ENCRYPT_UK  3       10        83389146     3008764650         3008764650
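>>>
>>> For scale, the leaf block count above is consistent with the ~700GB
>>> segment size, assuming the default 8KB block size:
>>>
>>>    83,389,146 leaf blocks x 8,192 bytes = ~683 GB of leaf blocks alone,
>>>    before branch blocks and segment overhead.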
>>>
>>> Regards
>>> Pap
>>>
>>>
>>>
>>> Pap, your index plan is attempting to read 20 million rows from the
>>> outer table and join the result to the inner table using nested loops.
>>> Yes, the Bloom filter will eliminate some rows from the outer table,
>>> but there will still be a lot to read: 80295 read requests and 627MB to
>>> read from the table itself, plus 152K reads totalling 1GB from the
>>> index. Depending on the model of your Exadata, cell single block reads
>>> can be relatively slow. You should take a look at your AWR report. If
>>> your average cell single block read time is in milliseconds, that would
>>> explain your problem. Exadata was made to speed up full table scans; it
>>> doesn't do much for single block reads.
>>>
>>> However, 80295 rows out of 20M is a relatively small portion of the
>>> data. Is there a possibility of using an index range scan on the outer
>>> table?
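>>>
>>> (A quick way to check that average, as a sketch; the AWR report gives
>>> the same figure per snapshot interval:)
>>>
>>>    -- Instance-wide average latency of Exadata single block reads:
>>>    SELECT event,
>>>           total_waits,
>>>           ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_us
>>>      FROM v$system_event
>>>     WHERE event LIKE 'cell single block%';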
>>>
>>> --
>>> Mladen Gogala
>>> Database Consultant
>>> Tel: (347) 321-1217
>>> https://dbwhisperer.wordpress.com
>>>
>>>
--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jan 24 2023 - 11:11:49 CET