Effective deadlock handling

From: Saeed <sr_ng_at_goawaynms-sys-lts.demon.co.uk>
Date: Fri, 2 Jan 2004 22:39:18 +0000
Message-ID: <1Yslu0CWMf9$Ewan@nms-sys-ltd.demon.co.uk>

Hello,

A somewhat perplexing problem that I would like to share, and perhaps in doing so someone wiser will spot what I'm doing wrong.

he following description is a made up application to help explain the problem, so if it sounds a bit naff, sorry.

An application has a JOB_STATS table as follows:

Column Type Null?

------------------- ----------- --------
ID                  NUMBER(10)  NOT NULL
PER_ID              NUMBER(10)  NOT NULL
JOB_DATE            DATE        NOT NULL
NUM_BOOKED          NUMBER(4)   NOT NULL
NUM_CANCELLED       NUMBER(4)   NOT NULL
NUM_COMPLETED       NUMBER(4)   NOT NULL

This table is for upto the minute stats that managers can view to see how jobs are stacking up for future days, and how the current load and backlog is progressing. Actual core data that these stats are based upon are maintained elsewhere. Whenever a change in the core data occurs that pertains to these stats, this table is updated with appropriate triggers. During batch loads in the morning and late evening, this can result in hundreds of updates per second from the batch concurrent jobs.

Now, there are occasions when in a single transaction 2 rows may be updated, e.g. if PER_ID 10 has cancelled an appointment on 10/12/2003 and moved it to 11/12/2003, but the auto scheduler has also decided to swap jobs for this person, by cancelling the 11/12/2003 and creating one on 10/12/2003. So the procedure that would handle this would be called by session 1 as:

pa_stats.swap_appt ( p_per_id => 10,
                     p_old_date => '10/12/2003',
                     p_new_date => '11/12/2003' );

and session 2 would have:

pa_stats.swap_appt ( p_per_id => 10,
                     p_old_date => '11/12/2003',
                     p_new_date => '10/12/2003' );

The procedure should carry out the following UPDATES:

UPDATE job_stats
SET    NUM_CANCELLED = NUM_CANCELLED - 1
WHERE  per_id = p_per_id
AND    job_date = p_old_date;

UPDATE job_stats
SET    NUM_BOOKED = NUM_BOOKED + 1

WHERE per_id = p_per_id
AND job_date = p_new_date;

Now, if the these 2 sessions invoke the procedures as above at the same time, we get a deadlock.

To solve this, I have tried a number of methods:

SELECT FOR UPDATE

CURSOR get_lock ( cp_per_id NUMBER,
                  cp_old_date DATE,
                  cp_new_date DATE ) IS
SELECT id,
       num_cancelled,
       num_booked

FROM job_stats
WHERE per_id = cp_per_id
AND ( job_date = cp_old_date OR job_date = cp_new_date ) FOR UPDATE OF num_cancelled, num_booked;

This helps but does not eliminate the deadlocks, especially during batch updates.

2. SELECT FOR UPDATE NOWAIT As above but with a NOWAIT.
With this we obviously avoid the deadlock, but the problem then is how to deal the "failed" transaction. Retries seem to have limited success. The only reliable option appears to be to write the data to an intermediate table and then another job would complete the UPDATEs from the intermediate data. This is messy, and renders the stats near-time rather than real-time.

3. Exert Some Order in the Locking

We replace the cursor above with:

CURSOR get_lock ( cp_per_id NUMBER,

                  cp_date   DATE ) IS
SELECT id,
       num_cancelled,
       num_booked

FROM job_stats
WHERE per_id = cp_per_id
AND job_date = cp_date
FOR UPDATE OF num_cancelled, num_booked;

Then in the code we try to ensure that the order of the locks is consistent:

IF p_old_date < p_new_date THEN

   OPEN get_lock ( p_per_id, p_old_date );    CLOSE get_lock;
   OPEN get_lock ( p_per_id, p_new_date );    CLOSE get_lock;
ELSE
   OPEN get_lock ( p_per_id, p_new_date );    CLOSE get_lock;
   OPEN get_lock ( p_per_id, p_old_date );    CLOSE get_lock;
END IF; Now this should, to my mind, prevent deadlocks since we know that the locks will be obtained in a consistent ordered manner. To my surprise, this still did not prevent the deadlocks. I'm surprised because I cannot see how a deadlock can occur with this code, even if we have dozens of sessions with varying values (maybe three-way or 4-way contention) I cant see how with this scheme the deadlock can arise.

Can anyone explain this to me?

Kind regards,

Saeed

sr_ng 786 Received on Fri Jan 02 2004 - 16:39:18 CST