Re: Looking for ideas on blocked sessions updating partitioned table with CLOB (solved)

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Tue, 18 Sep 2018 10:27:30 -0600
Message-ID: <b59898df-3520-eaac-2588-72173bd3bd34_at_gmail.com>



Think of it this way:  *this* is the audience who will pull you or your story to pieces, so you've already cleared that hurdle.

I didn't intend to cause stress, just letting you know that stories like this are *valuable*.  Anyone can find problems, fewer can present a solution or two, fewer still can honestly describe what it takes to find a solution.

I enjoyed your summary email, so I know I'd enjoy a longer treatise, that's all.

On 9/18/18 10:02, Chris Taylor wrote:
> Well, that's not intimating.  Not at all :)
>
> Talking to Oracle-L is one thing, putting it out for a wider audience?
> I'm not so sure about all that .
>
> Chris
>
> On Tue, Sep 18, 2018 at 9:41 AM Tim Gorman <tim.evdbt_at_gmail.com
> <mailto:tim.evdbt_at_gmail.com>> wrote:
>
> Chris,
>
> You have a gift for telling the story.  This could make an amazing
> newsletter article or a fantastic presentation that you'll enjoy
> creating and presenting.  The fun part of such an article or
> presentation would be explaining the underlying technology, so
> that the reader/audience enjoys the punchline as much as we have.
>
> For an article in a newsletter, please consider RMOUG SQL>Update
> <http://www.rmoug.org/newsletter/>, the NoCOUG Journal
> <http://nocoug.org/newsletter.html>, IOUG Select
> <http://select.ioug.org/>, or UKOUG Oracle Scene
> <http://www.ukoug.org/membership-new/member-activities/oracle-scene/>?
> There might be others I have overlooked, for which I apologize.
>
> Hope this helps!
>
> -Tim
>
>
> On 9/18/18 08:22, Chris Taylor wrote:
>> So, I need to own up to something.
>>
>> My whole approach to this problem was wrong.  I had been told
>> that the problem with this table was the CLOB and the number of
>> sessions updating the table at the same time.  So, my whole
>> approach was based on that information being TRUE.
>>
>> I was blinded to the 'real' problem and wasn't actively looking
>> for an alternative answer until I posted this thread.  Due to the
>> discussion here, I took a few steps back and actually asked
>> myself "why would this be row locks/blocks?" and WHY don't I see
>> internal locking on the clob if the clob is the problem?
>>
>> At that point I was in 'question everything' mode and started
>> really looking at the Event that the blocking session was sitting
>> in (instead of just assuming it was a problem with design).
>>
>> The blocking sessions were ALWAYS sitting in EVENT: SQL*Net MORE
>> DATA FROM CLIENT and WAIT_CLASS: NETWORK and the wait times were
>> reaching upward of 2 minutes before the wait_time_micro restarted
>> counting.
>>
>> When I saw that (and really looked at it), I was like that makes
>> no sense.  Why would a session be getting latency from the PHP
>> server?  (Why is it taking so long to send an object from the PHP
>> server to the database?)
>>
>> Running strace on the sessions on both the db server and the php
>> server showed the sessions just sitting in either read() or
>> write() to the network socket and never completing.
>>
>> Large file transfers INTO the main db server would stall.
>> (Transfers OUT of the db server were fine) - this affected
>> SQLNet, SCP, ncat etc.  Outgoing was fine, Inbound was terrible.
>>
>> Ultimately we discovered these 2 settings had been DISABLED on
>> this box at some point (as its been around a while):
>>
>> /proc/sys/net/ipv4/tcp_sack
>>
>> /proc/sys/net/ipv4/tcp_timestamps
>>
>>
>> Both of those were "0" whereas all our other db servers those
>> were "1" and weren't experiencing any problem.
>>
>> After setting both of those to "1" , our sessions now complete
>> and no longer block.
>>
>> I know some of you may be thinking "How could he miss all the
>> network waits?"  - And that's a good question.  All I can say is
>> sometimes you get "blinded" by assumptions and aren't really
>> seeing the problem for what it is.
>>
>> I've known this a long, long time.  So, remember kiddos you may
>> have to question 'what you know versus what you think you know' ;)
>>
>> Also, the very fact of sharing this problem with the you on the
>> list gave me the ability to step back and look at it with a new
>> mindset.
>>
>> Anyway, I wanted to share that as a testament (and a huge thank
>> you) to the community we have here.
>>
>> Thanks,
>> Chris
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Aug 31, 2018 at 11:48 AM Jonathan Lewis
>> <jonathan_at_jlcomp.demon.co.uk
>> <mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:
>>
>>
>> If the event is "enq TX: row lock wait" the p1/p2/p3 values
>> will be about the TX lock in v$lock.  p2 = id1, p3 = id2, and
>> p1 will encode "TX" and the lock mode.
>> Select p1raw (if its available) or
>> to_char(p1,'XXXXXXXXXXXXXXXX') and you'll get
>> 00000000054580006 - or possibly a 4 on the end. If it's a
>> mode 4 lock then that's a clue that it's not really about a
>> table row and more likely to be about an index/IOT or some
>> internal anomaly.
>>
>> If you've got the ASH data then current_obj# may give you the
>> object_id of the object being accessed, but it's not entirely
>> reliable.
>>
>> Regards
>> Jonathan Lewis
>>
>> ________________________________________
>> From: Stefan Knecht <knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com>>
>> Sent: 31 August 2018 16:04:20
>> To: Jonathan Lewis
>> Cc: christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com>; ORACLE-L
>> Subject: Re: Looking for ideas on blocked sessions updating
>> partitioned table with CLOB
>>
>> You can also look at the p1/p2/p3 values of the sessions that
>> are blocked by the TX contention - and see which object the
>> contention is on. If it's the lobindex, that may lead to
>> further clues about some of the possibilities raised by Jonathan
>>
>> Stefan
>>
>>
>>
>> On Fri, Aug 31, 2018 at 8:55 PM, Jonathan Lewis
>> <jonathan_at_jlcomp.demon.co.uk
>> <mailto:jonathan_at_jlcomp.demon.co.uk><mailto:jonathan_at_jlcomp.demon.co.uk
>> <mailto:jonathan_at_jlcomp.demon.co.uk>>> wrote:
>>
>> There is a major problem with basicfile LOBs (with similar,
>> but not so drastic symptoms appearing for securefile LOBs),
>> so as a first strategic step you should probably be planning
>> to find a painless way to move to securefile LOBs.
>>
>> I've written an entire system about a problem with basicfile
>> LOBs when you do inserts and deletes - but the problem is the
>> same if you do lots of updates because for a LOB an update is
>> a delete followed by an insert (with the old LOB left in the
>> LOB segment rather than being copied to the UNDO segment). 
>> The way LOBs handle deletes is that the LOB index is a
>> two-part index of which the first part indexes in time order
>> the chunks of LOBs that have been deleted and the second part
>> indexes by lobid the current LOB chunks.
>>
>> A problem appears when you try to insert a LOB and there
>> isn't any free space in the segment, but there is plenty of
>> freeable space. Your session will free up all the freeable
>> space (deleting LOB index entries) as it goes.  This can take
>> a very long time and while it's happening any other session
>> that want to insert a LOB will wait for your session to
>> finish freeing up the space.
>>
>> Unfortunately I've only see HW enqueues appearing as a result
>> of this action, I don't think I've seen TX enqueues (and I
>> wouldn't really expect to see the "enq TX - row lock
>> contention" enqueue).  THe series start at this URL:
>> http://jonathanlewis.wordpress.com/2017/01/26/basicfile-lobs/
>>
>> Different possibility - is the LOB declared with multiple
>> freepools ? This is the default and it means the LOBindex has
>> only the two parts I describe above.  If you have N freepools
>> then the index consistents of N pairs of parts. This doesn't
>> avoid the problem I've described above (your session will try
>> to free ALL the free space from ALL the pools if it needs
>> space), but having a single freepool may explain your
>> observations.  If I insert a very large LOB PERHAPS you can't
>> insert one until I finish insert LOBIndex entries into my bit
>> of index, in which case maybe you'd show a row wait of some
>> sort - though, again I'd expect a different wait to appear
>> (maybe buffer busy).
>>
>> Another thought - your LOBs enable in-row storage: how many
>> of them are short enough to fit in the row (a few, lots,
>> most) and do they get updated many times in situ, and are
>> many of them likely to start short and then grow  a few times
>> before getting too long ? How long are the rest of the
>> columns in the row ? I'm thinking about the possibility that
>> your smaller LOBs spend some time growing in-row and causing
>> row migration before they get long enough to move into the
>> LOB segment - and row migraion does funny things with ITLs.
>> Maybe something odd happens as a row that's migrated moves
>> its LOB to the LOB segment and the row migrates back to it's
>> original block.
>>
>> ---
>>
>> In the absence of answers - a plan for the future might be to
>> range partition by sess_id and hash subpartiiton by sess_id: 
>> I'm assuming that sess_id is a value that's going to increase
>> with time when I say that.  If that is the case then the
>> benefit of doing this composite partitioning is that on a
>> regular basis you get a new partition and all the action
>> moves into it and any garbage that's happened in the earlier
>> partition can be cleared up (or dropped).
>>
>> If you stick with basicfiles then look at freepools just in
>> case it's relevant - securefile lobs automatically have a
>> better strategy for concurrency but there is a parameter you
>> can set of increase concurrency. (Can't remember which it is
>> at present).
>>
>> Since I've mentioned the series - one feature of the
>> mechanism is that the LOB segment can become much larger than
>> it needs to be, so if you can sum() the sizes of the lobs
>> that exceed the in-linesize and it's much smaller than
>> segment size you'll know that your update mechanism is
>> introducing some sort of problem.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>> ________________________________________
>> From: oracle-l-bounce_at_freelists.org
>> <mailto:oracle-l-bounce_at_freelists.org><mailto:oracle-l-bounce_at_freelists.org
>> <mailto:oracle-l-bounce_at_freelists.org>>
>> <oracle-l-bounce_at_freelists.org
>> <mailto:oracle-l-bounce_at_freelists.org><mailto:oracle-l-bounce_at_freelists.org
>> <mailto:oracle-l-bounce_at_freelists.org>>> on behalf of Chris
>> Taylor <christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com><mailto:christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com>>>
>> Sent: 31 August 2018 14:23:46
>> To: knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com><mailto:knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com>>
>> Cc: ORACLE-L
>> Subject: Re: Looking for ideas on blocked sessions updating
>> partitioned table with CLOB
>>
>> Ah, very good questions.
>>
>> For the blocking issue:
>> 1. Blocking Type = enq: TX - row lock contention, yet the
>> SESS_IDs (PK) is different.
>> 2. Basicfile LOBs (I assume these were carried over from an
>> upgrade from a prev version)
>>
>> For the PHP object transfer being slow issue:
>> 1. TNS settings:
>>   db server:
>>      sqlnet.ora
>>      ---------------------------------
>>      send_buf_size=2097152
>>      recv_buf_size=2097152
>>      default_sdu_size=32768
>>      default_tdu_size=32768
>>
>>      tnsnames.ora
>>      -----------------------------
>>      Nothing on the tnsnames.ora
>>
>>    web/app server:
>>       tnsnames.ora
>>       -------------------------------
>>       Nothing specified here either for SDU etc
>>
>> 2. OS = Red hat Linux 6.8 64-bit , kernel 2.6.32-642.el6.x86_64
>>
>> 3. Block Size = 8192
>>
>> Thanks,
>> Chris
>>
>>
>> On Thu, Aug 30, 2018 at 10:02 PM Stefan Knecht
>> <knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com><mailto:knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com>><mailto:knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com><mailto:knecht.stefan_at_gmail.com
>> <mailto:knecht.stefan_at_gmail.com>>>> wrote:
>> When you say "blocking" - what event are the blocked sessions
>> waiting on?
>>
>> Also, what's your TNS config - particularly SDU sizes between
>> the mid-tier and the database (TNS connection string and the
>> receiving listener)?
>>
>> Are you using securefile or basicfile LOBs?
>>
>> What OS is the database on and what block size are you using
>> in the tablespace where the lobs are stored?
>>
>> On Fri, Aug 31, 2018 at 6:21 AM, Chris Taylor
>> <christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com><mailto:christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com>><mailto:christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com><mailto:christopherdtaylor1994_at_gmail.com
>> <mailto:christopherdtaylor1994_at_gmail.com>>>> wrote:
>> Env: 12.1.0.2
>>
>> We have a table that stores session data (base64 encoded)
>> from web sessions.
>> The table has 32 HASH partitions and contains a CLOB.
>>
>> The table is hash partitioned by SESSION_ID which is a
>> 64-char string.
>>
>> We have this code that executed in PHP into the Oracle database:
>> UPDATE sessions_table
>> SET session_data = :session_data  // (CLOB)
>> WHERE session_id = :session_id;
>>
>> For 95% of the sessions this runs very,very fast.
>>
>> For 5% of the sessions that LOB being passed from PHP is 15MB
>> and for some reason PHP is really slow about transmitting
>> those session OBJECTS over to the database.  (Verified
>> through strace etc).
>>
>> While those sessions are waiting to complete the update to
>> the CLOB, they end up BLOCKING other sessions that aren't
>> trying to update the same session id but are in the same
>> partition.
>>
>> I believe they're blocked on the CLOB on the same partition
>> that the long running session has open as I've confirmed that
>> the SESS_ID being updated is different in most cases.
>>
>> So, here's my question, what performance strategy should I be
>> investigating for CLOBS.  Should I just add more partitions
>> and spread out the likelihood that a session will end up in
>> the same partition? Seems logical but I'm not sure how CLOBS
>> play into this.
>>
>> The CLOBS are stored like this:
>> Segment Name = SYS_LOB0051640773C00002$$
>> Index Name = SYS_LOB0051640773C00002$$
>> Chunk = 8192
>> PCTVERSION = 10
>> CACHE = YES
>> LOGGING = None
>> Encrypt = None
>> Compression = None
>> IN_ROW = Yes
>> Partition = Yes
>> Retention = Yes
>>
>> Looking for any ideas on how to prevent a session from
>> blocking other non-related sessions that are updating the CLOB.
>>
>> I'm trying to mitigate the blocking at the db layer while we
>> stand up a product such as Redis to handle the session
>> caching at the server level (and remove it from the db).
>>
>> Chris
>>
>>
>>
>>
>>
>>
>>
>> --
>> //
>> zztat - The Next-Gen Oracle Performance Monitoring and
>> Reaction Framework!
>> Visit us at zztat.net
>> <http://zztat.net><http://zztat.net><http://zztat.net/> |
>> _at_zztat_oracle | fb.me/zztat
>> <http://fb.me/zztat><http://fb.me/zztat><http://fb.me/zztat>
>> | zztat.net/blog/
>> <http://zztat.net/blog/><http://zztat.net/blog/><http://zztat.net/blog/>
>>
>>
>>
>> --
>> //
>> zztat - The Next-Gen Oracle Performance Monitoring and
>> Reaction Framework!
>> Visit us at zztat.net <http://zztat.net><http://zztat.net/> |
>> _at_zztat_oracle | fb.me/zztat
>> <http://fb.me/zztat><http://fb.me/zztat> | zztat.net/blog/
>> <http://zztat.net/blog/><http://zztat.net/blog/>
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 18 2018 - 18:27:30 CEST

Original text of this message