Re: Unique index access path seems very slow
Date: Sat, 11 Feb 2023 09:52:04 +0530
Message-ID: <CAEzWdqe21KU3WkZTdzpNE1hcNRs+E=e3o5uX9h9A972tEKi1xw_at_mail.gmail.com>
> The reason you're not likely to have seen anything about tuning the Bloom
> filter is because you're one of a small set of people doing a certain type
> of processing with very large volumes of data on Exadata. The tuning
> we're trying to handle has come up because we're trying to get the largest
> possible Bloom filter that will be sent down to the cell and applied
> there. It's a little unlikely that anyone would notice the possible effect
> of the combination. On top of that the initial allocation of memory
> depends on what Oracle thinks it will need - and the effectiveness of the
> hash table will be affected by the size of the initial estimates.
>
> I thought I'd written a reply explaining the effects you were seeing from
> the different cardinality hints you were trying, but if I have it's not
> going into the mail. I'll try to find time to rewrite over the weekend.
>
> You might like to try a series of test to find a sweet spot. After see
> the effect of 50K I think I'd try 100K, 150K, 200K, 250K. as the
> cardinality goes up I think the offload will increase until it suddenly
> drops off because the Bloom filter has got too big to send to the cell. (To
> be honest I'd like to get my hands on your database for a few hours to
> experiment with this detail.)
>
>
> Regards
> Jonathan Lewis
>
> On Fri, 10 Feb 2023 at 18:16, Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank You Jonathan and Yudhi.
>>
>> I think one key learning I have got here which is not mentioned in any
>> blog or book, I. E, I have never thought of there exists a way to control
>> the size of bloom filters. i.e. the technique by which if we want to
>> influence oracle to do more work in terms of how big bloom filters it
>> applies there to filter out maximum rows , it can be done using simple
>> cardinality hints. And this will be beneficial if we want to send a minimum
>> amount of rows to the subsequent steps of the execution, also this can be
>> helpful to minimize tempspill in case it's happening in subsequent rows
>> passed to the hash join. This is awesome. Thanks to Jonathan.
>>
>> With regards to the number of hash partitions I am not sure if there is a
>> best practice to follow for deciding the number of hash partitions here?
>> From your response it seems <=10GB is the optimal size per partition.
>> Correct me if wrong.
>>
>>
>> On Thu, Feb 9, 2023 at 1:43 PM yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> Regarding your question on number of HASH partitions for this ~900GB
>>> table. So it depends up on multiple factors like , say you should not make
>>> it very large number of partition so that can be overhead on parsing time
>>> for any quick queries and it will also flood your data dictionary (say
>>> histogram for each column across all partition etc). Also if you create a
>>> local index there will be those number of index partitions/segments to be
>>> scanned by the query which will use that index and that would not be a good
>>> thing. But looking your current use case, you want to perform partition
>>> wise join with the other table as efficiently as possible, so it seems your
>>> 256 hash partition will keep each partition size <4GB(Considering ~900GB
>>> table size) and also if in future your data becomes double then too each of
>>> your partition size will stays <10GB.
>>>
>>> Regarding your HASH JOIN trace file i cant comment much. Jonathan may
>>> put his thought if any clue there which may improve things. BTW I was
>>> thinking if you pass the hash_area_size higher(~2GB) for that session using
>>> workarea_size_policy as manual will that help anyway?
>>>
>>>
>>>
>>> On Thu, 9 Feb, 2023, 12:42 am Pap, <oracle.developer35_at_gmail.com> wrote:
>>>
>>>> Thank you so much Jonathan and Yudhi.
>>>>
>>>> Jonathan, To your point "*Side note: Unless things have changed in
>>>> recent versions a 10104 trace will tell you about what's going on with the
>>>> hash join, including the number of buckets in the hash table and the
>>>> distribution of rows per bucket - that might be quite interesting and
>>>> corroborate some of my comments about collisions/false positives etc"*
>>>>
>>>> I was struggling to generate the '10104' trace initially as I was
>>>> trying to do it as level 10. But then I just removed the level from there
>>>> and saw the HASH join information populated in the trace file, below is the
>>>> git link. Not able to interpret much of it , however I do see a few
>>>> things...like say, In one case 'number of partitions fit in memory' is 32
>>>> VS 8 in another. The Total number of rows in in-memory partitions is
>>>> 2548554 in both cases.
>>>>
>>>> Not sure if it gives any clue which would help here making existing
>>>> query better, but in coming days, yes we are planning to make the
>>>> tab_encrypt table(which is ~900GB in size) as ~256 HASH partitions so that
>>>> each partition would be small enough i.e. ~3.5GB in size. And the same
>>>> number of HASH subpartitions for the transaction table, I hope that will
>>>> work in this scenario.
>>>>
>>>> Below is the trace with cardinality hint 10M(where the bloom filter was
>>>> more effective) vs cardinality hint 50K(where the bloom filter was small).
>>>>
>>>> ALTER SESSION SET EVENTS '10104 trace name context forever';
>>>>
>>>> **** with cardiality hint of 10Million ********
>>>> https://gist.github.com/oracle9999/fb0598d04b0c0938bb6de695c20131fb
>>>>
>>>> **** with cardinality hint of 50K ********
>>>> https://gist.github.com/oracle9999/a9145310a6158f151ac9e04f2bf3ba31
>>>>
>>>>
>>>>
>>>> On Thu, Feb 2, 2023 at 8:51 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Yuhdi,
>>>>>
>>>>> As I pointed out, I don't think that there's likely to be *much*
>>>>> change in performance by hacking in a different cardinality estimates; but
>>>>> we have seen that the two different figures produce significant changes in
>>>>> WHERE the time is spent and some change in the effectiveness of
>>>>> off-loading. Given that clue (and assuming that there isn't a more
>>>>> important task to address) I would have spent an hour or two re-running the
>>>>> query with a few different cardinality hints between 49K and 2M to see if
>>>>> there was a sweet spot that reduced the CPU required to apply the filter,
>>>>> maximised the effectiveness of offloading, and minimised the number of rows
>>>>> passed up the plan.
>>>>>
>>>>> IIRC none of the plans showed any writes on the hash join, so I wasn't
>>>>> thinking about overheads of hash joins spilling to disk.
>>>>>
>>>>> The suggestion for re-engineering the data so that Oracle could
>>>>> iterate through a partition-wise join was also about offload and CPU
>>>>> efficiency. On smaller data volumes a hash table could have both a smaller
>>>>> number of buckets and be more accurate in its distribution, so a Bloom
>>>>> filter could be more effective and cheaper to use on the offload.
>>>>>
>>>>> The switch to RAW, of course, is mostly about reducing I/O: the very
>>>>> slow runs are probably about resource use by other users on the Cell
>>>>> Servers so a smaler data size means less I/O which means less impact when
>>>>> the hardware gets busy; it did occur to me to wonder if the CPU cost of
>>>>> hashing a 64 byte raw would be less than the cost of hashing a 128 byte
>>>>> varchar (answer: probably) which would also reduce run time and the load on
>>>>> the cell server (and that last one woudl reduce the risk of large volumes
>>>>> of data being sent unprocessed to the database server).
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>> Jonathan Lewis
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 31 Jan 2023 at 19:54, yudhi s <learnerdatabase99_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> So Jonathan, OP has supplied plans which shows both cases, I. E, with
>>>>>> large estimation the bigger bloom filter is consuming additional CPU cycle
>>>>>> and with smaller estimation the hash join is consuming higher CPU cycle.
>>>>>> But both the cases the total query execution time is closely equal, also op
>>>>>> mentioned both the plans running for ~30minutes+ many times of the day
>>>>>> so...
>>>>>>
>>>>>> when you said below I. E favoring large bloom filter option, so I am
>>>>>> wondering if it's because it might help in less temp spill? Or say, do you
>>>>>> mean its better option of hinting the inline view or tran_tab estimation
>>>>>> very high so that a bigger bloom filter will be applied and the lesser
>>>>>> amounts of rows will be passed to the hash join which may also benefit in
>>>>>> case of large data volume as temp spill will be minimal?
>>>>>>
>>>>>> *Note that the Offload Returned Bytes was 300GB for the 49K estimate
>>>>>> with the small Bloom filter, and 500GB for the 2M estimate with the large
>>>>>> Bloom filter.*
>>>>>> *It looks like we need to "fake" the system so that the Bloom filter
>>>>>> (estimate) is large enough to eliminate a lot of data while being small
>>>>>> enough to be sent to the cell server so that the 14 concurrently active
>>>>>> cells can do the row elimination. Beyond that I don't think there's a way
>>>>>> to make the query go faster than the (roughly) 650 seconds you've seen so
>>>>>> far*.
>>>>>>
>>>>>>
>>>>>> On Tue, 31 Jan, 2023, 3:51 am Jonathan Lewis, <jlewisoracle_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Comparing the 5 hash join plans you've posted:
>>>>>>>
>>>>>>>
-- http://www.freelists.org/webpage/oracle-lReceived on Sat Feb 11 2023 - 05:22:04 CET