Re: Unique index access path seems very slow
Date: Fri, 10 Feb 2023 18:38:52 +0000
Message-ID: <CAGtsp8nNa+9ihT=z24Kur53KLp5g74AFmQAD5bwTWSLqnY3+eA_at_mail.gmail.com>
On Fri, 10 Feb 2023 at 18:16, Pap <oracle.developer35_at_gmail.com> wrote:
> Thank You Jonathan and Yudhi.
>
> I think one key learning I have got here, which is not mentioned in any
> blog or book, is that there is a way to control the size of Bloom filters:
> if we want to influence Oracle to do more work, i.e. build a bigger Bloom
> filter so that it filters out the maximum number of rows, we can do it
> with simple cardinality hints. This is beneficial if we want to pass the
> minimum number of rows to the subsequent steps of the execution plan, and
> it can also help minimize temp spill if that is happening in the hash
> join the rows are passed to. This is awesome. Thanks to Jonathan.
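>
> A minimal sketch of what I mean (the table aliases, join column and the
> 10M figure are just placeholders for illustration, not our real query):
>
> SELECT /*+ leading(t) use_hash(e) cardinality(t 10000000) */ ...
> FROM   tran_tab t, tab_encrypt e
> WHERE  e.encrypt_key = t.encrypt_key;
>
> Inflating the build-side estimate makes the optimizer size the Bloom
> filter for many more rows, so more non-matching rows can be rejected
> during the scan instead of being passed up to the hash join.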
>
> With regard to the number of hash partitions, I am not sure if there is a
> best practice to follow for deciding the number of hash partitions here.
> From your response it seems <=10GB is the optimal size per partition.
> Correct me if I am wrong.
>
>
> On Thu, Feb 9, 2023 at 1:43 PM yudhi s <learnerdatabase99_at_gmail.com>
> wrote:
>
>> Regarding your question on the number of HASH partitions for this ~900GB
>> table: it depends on multiple factors. You should not create a very large
>> number of partitions, because that adds parsing overhead for quick
>> queries and it will also flood your data dictionary (e.g. histograms for
>> each column across all partitions). Also, if you create a local index,
>> the same number of index partitions/segments will have to be scanned by
>> any query using that index, which would not be a good thing. But looking
>> at your current use case, you want to perform a partition-wise join with
>> the other table as efficiently as possible, so it seems your 256 hash
>> partitions will keep each partition below ~4GB (given the ~900GB table
>> size), and even if your data doubles in future each partition will stay
>> below 10GB.
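>>
>> To sanity-check the result you could look at dba_segments once the table
>> is rebuilt; a rough sketch (the owner and table names are assumptions):
>>
>> SELECT partition_name, ROUND(bytes/1024/1024/1024, 1) AS gb
>> FROM   dba_segments
>> WHERE  owner = 'APP_OWNER'
>> AND    segment_name = 'TAB_ENCRYPT'
>> AND    segment_type = 'TABLE PARTITION'
>> ORDER  BY bytes DESC;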
>>
>> Regarding your HASH JOIN trace file, I can't comment much. Jonathan may
>> share his thoughts if there is any clue in it that could improve things.
>> BTW, I was wondering: if you set a higher hash_area_size (~2GB) for that
>> session, with workarea_size_policy set to manual, would that help at all?
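>>
>> Just as an untested sketch of that experiment in a single session (the
>> ~2GB value is roughly the upper limit for hash_area_size):
>>
>> ALTER SESSION SET workarea_size_policy = MANUAL;
>> ALTER SESSION SET hash_area_size = 2147483647;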
>>
>>
>>
>> On Thu, 9 Feb, 2023, 12:42 am Pap, <oracle.developer35_at_gmail.com> wrote:
>>
>>> Thank you so much Jonathan and Yudhi.
>>>
>>> Jonathan, To your point "*Side note: Unless things have changed in
>>> recent versions a 10104 trace will tell you about what's going on with the
>>> hash join, including the number of buckets in the hash table and the
>>> distribution of rows per bucket - that might be quite interesting and
>>> corroborate some of my comments about collisions/false positives etc"*
>>>
>>> I was struggling to generate the 10104 trace initially, as I was trying
>>> to do it at level 10. But once I removed the level clause I saw the hash
>>> join information populated in the trace file; the gist links are below.
>>> I am not able to interpret much of it, but I do see a few things: for
>>> example, in one case 'number of partitions fit in memory' is 32 vs 8 in
>>> the other, while the total number of rows in in-memory partitions is
>>> 2548554 in both cases.
>>>
>>> I am not sure if it gives any clue that would help make the existing
>>> query better, but in the coming days we are planning to convert the
>>> tab_encrypt table (which is ~900GB in size) to ~256 HASH partitions so
>>> that each partition is small enough, i.e. ~3.5GB. We will use the same
>>> number of HASH subpartitions on the transaction table, and I hope that
>>> will work in this scenario.
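>>>
>>> A rough, untested sketch of the kind of DDL we have in mind (all table,
>>> column and partition names below are placeholders, not the real ones):
>>>
>>> CREATE TABLE tab_encrypt_new
>>> PARTITION BY HASH (encrypt_key) PARTITIONS 256
>>> AS SELECT * FROM tab_encrypt;
>>>
>>> -- matching hash subpartitioning on the join key of the transaction
>>> -- table, so the optimizer can use a full partition-wise join
>>> CREATE TABLE tran_tab_new
>>> PARTITION BY RANGE (tran_date)
>>> SUBPARTITION BY HASH (encrypt_key) SUBPARTITIONS 256
>>> (PARTITION p_2023 VALUES LESS THAN (DATE '2024-01-01'))
>>> AS SELECT * FROM tran_tab;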
>>>
>>> Below is the trace with the cardinality hint of 10M (where the Bloom
>>> filter was more effective) vs the cardinality hint of 50K (where the
>>> Bloom filter was small).
>>>
>>> ALTER SESSION SET EVENTS '10104 trace name context forever';
>>>
>>> **** with cardinality hint of 10 Million ********
>>> https://gist.github.com/oracle9999/fb0598d04b0c0938bb6de695c20131fb
>>>
>>> **** with cardinality hint of 50K ********
>>> https://gist.github.com/oracle9999/a9145310a6158f151ac9e04f2bf3ba31
>>>
>>>
>>>
>>> On Thu, Feb 2, 2023 at 8:51 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>>
>>>> Yudhi,
>>>>
>>>> As I pointed out, I don't think that there's likely to be *much*
>>>> change in performance by hacking in a different cardinality estimate; but
>>>> we have seen that the two different figures produce significant changes in
>>>> WHERE the time is spent and some change in the effectiveness of
>>>> off-loading. Given that clue (and assuming that there isn't a more
>>>> important task to address) I would have spent an hour or two re-running the
>>>> query with a few different cardinality hints between 49K and 2M to see if
>>>> there was a sweet spot that reduced the CPU required to apply the filter,
>>>> maximised the effectiveness of offloading, and minimised the number of rows
>>>> passed up the plan.
>>>>
>>>> IIRC none of the plans showed any writes on the hash join, so I wasn't
>>>> thinking about overheads of hash joins spilling to disk.
>>>>
>>>> The suggestion for re-engineering the data so that Oracle could iterate
>>>> through a partition-wise join was also about offload and CPU efficiency. On
>>>> smaller data volumes a hash table could have both a smaller number of
>>>> buckets and be more accurate in its distribution, so a Bloom filter could
>>>> be more effective and cheaper to use on the offload.
>>>>
>>>> The switch to RAW, of course, is mostly about reducing I/O: the very
>>>> slow runs are probably about resource use by other users on the Cell
>>>> Servers, so a smaller data size means less I/O, which means less impact when
>>>> the hardware gets busy; it did occur to me to wonder if the CPU cost of
>>>> hashing a 64 byte raw would be less than the cost of hashing a 128 byte
>>>> varchar (answer: probably) which would also reduce run time and the load on
>>>> the cell server (and that last one would reduce the risk of large volumes
>>>> of data being sent unprocessed to the database server).
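>>>>
>>>> As a minimal sketch of that switch (column and table names invented for
>>>> illustration): a 128-character hex string held in a VARCHAR2(128) column
>>>> carries the same information as a RAW(64), e.g.
>>>>
>>>> ALTER TABLE tab_encrypt ADD (hash_raw RAW(64));
>>>> UPDATE tab_encrypt SET hash_raw = HEXTORAW(hash_vc);
>>>> -- and RAWTOHEX(hash_raw) converts back for display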
>>>>
>>>>
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, 31 Jan 2023 at 19:54, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> So Jonathan, the OP has supplied plans which show both cases, i.e. with
>>>>> the larger estimate the bigger Bloom filter consumes additional CPU
>>>>> cycles, and with the smaller estimate the hash join consumes more CPU
>>>>> cycles. But in both cases the total query execution time is roughly
>>>>> equal, and the OP also mentioned that both plans run for ~30 minutes+
>>>>> many times a day, so...
>>>>>
>>>>> When you said the below, i.e. favouring the large Bloom filter option,
>>>>> I am wondering if it's because it might help reduce temp spill? Or do
>>>>> you mean it's the better option to hint the inline view or tran_tab
>>>>> estimate very high, so that a bigger Bloom filter is applied and fewer
>>>>> rows are passed to the hash join, which may also help in the case of
>>>>> large data volumes because temp spill will be minimal?
>>>>>
>>>>> *Note that the Offload Returned Bytes was 300GB for the 49K estimate
>>>>> with the small Bloom filter, and 500GB for the 2M estimate with the large
>>>>> Bloom filter.*
>>>>> *It looks like we need to "fake" the system so that the Bloom filter
>>>>> (estimate) is large enough to eliminate a lot of data while being small
>>>>> enough to be sent to the cell server so that the 14 concurrently active
>>>>> cells can do the row elimination. Beyond that I don't think there's a way
>>>>> to make the query go faster than the (roughly) 650 seconds you've seen so
>>>>> far*.
>>>>>
>>>>>
>>>>> On Tue, 31 Jan, 2023, 3:51 am Jonathan Lewis, <jlewisoracle_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Comparing the 5 hash join plans you've posted:
>>>>>>
>>>>>>
--
http://www.freelists.org/webpage/oracle-l