Using reverse key indexes to solve buffer busy wait problems
Buffer busy wait and related events can cripple performance of concurrent inserts. Bad in a single instance database, far worse in a RAC (think "gc buffer busy"). Often the problem is because of a primary key populated from a sequence. Reversing the index can fix this problem.
Contention for index blocks when inserting grows can cause an application to hang up completely. This is because with a b-tree index on a monotonically increasing key, even though there will never be row lock all the inserted keys are going onto the same block at the edge of the index. A reverse key index will fix this. If you index, for example, 19, 20 and 21 as themselves, all three keys will probably be in the same block of the index. Instead, you would index them as 91, 02, and 12. So consecutive values will not be adjacent in the index: they will be distributed across the whole width of the index. You could do this programmatically, but Oracle provides the reverse key index for exactly this purpose. Here's an example:
connect / as sysdba drop user jw cascade; grant dba to jw identified by jw; conn jw/jw create table t1 (c1 number); create index normal_i on t1(c1); create table t2 (c1 number); create index reverse_i on t2(c1) reverse; create sequence s1 cache 100000 noorder; create or replace procedure ins_normal(n number) as begin for i in 1..n loop insert into t1 values(s1.nextval); end loop; commit; end; / create or replace procedure ins_reverse(n number) as begin for i in 1..n loop insert into t2 values(s1.nextval); end loop; commit; end; /
The schema JW now has two tables, one indexed with a normal b-tree and the other with a reverse key b-tree, and a sequence to generate the keys. And two procedures to insert some rows.
Now set up a concurrency test, using Windows shell scripts:
copy con i.sql exec &1 exit ^Z copy con concurrent_inserts.bat sqlplus jw/jw @i.sql %1 ^Z for /l %i in (1,1,100) do start /b concurrent_inserts.bat ins_normal(10000) for /l %i in (1,1,100) do start /b concurrent_inserts.bat ins_reverse(10000)
The SQL*Plus script I.SQL does nothing more than execute a procedure. The batch file CONCURRENT_INSERTS.BAT will launch SQL*Plus, calling the script I.SQL with a command line argument that will pass the name of the procedure.
To run the test, the FOR loops call the batch file concurrently in a hundred background sessions, performing ten thousand inserts each.
What is the result for buffer busy wait? After running that test? Here it is:
orclz> orclz> select object_name,value from v$segment_statistics 2 where owner='JW' and object_type='INDEX' and statistic_name='buffer busy waits'; OBJECT_NAME VALUE ------------------------------ ---------- NORMAL_I 161347 REVERSE_I 8983 orclz>The reverse key index has reduced the buffer busy waits by around 95%. Impressed? I hope you are. This is even more significant in a RAC environment, where buffer busy wait is globalized.
I am not saying that all indexes should be reversed. You do need to understand your data and how it is being accessed. For example, a non-equality predicate on the key cannot use the index. But when would you use a non-equality predicate on a primary key? Probably, never. It is hard to find a reason for not reversing all your monotonically increasing keys.
----
Addition - by popular demand: some timings
I enabled trace for the test, these are the results (insert into the normal index first, then the reversed index):
******************************************************************************** SQL ID: 1bmscqsn9h63b Plan Hash: 3884345238 INSERT INTO T1 VALUES (S1.NEXTVAL) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 100 0.00 0.00 0 0 0 0 Execute 1000000 134.68 5421.65 1208 16471886 5103248 1000000 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 1000100 134.68 5421.66 1208 16471886 5103248 1000000 Misses in library cache during parse: 0 Optimizer mode: FIRST_ROWS Parsing user id: 128 (recursive depth: 1) Number of plan statistics captured: 100 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 0 0 0 LOAD TABLE CONVENTIONAL (cr=62 pr=0 pw=0 time=14931 us) 1 1 1 SEQUENCE S1 (cr=0 pr=0 pw=0 time=38 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ Disk file operations I/O 316 0.31 4.67 db file sequential read 1264 0.37 3.22 buffer busy waits 513840 3.20 2198.53 latch free 2582 0.03 1.63 library cache: mutex X 2035 0.09 25.87 enq: TX - index contention 182675 0.13 900.54 enq: SQ - contention 128 0.00 0.43 buffer deadlock 105187 0.11 1163.93 latch: cache buffers chains 3039 0.01 0.79 enq: TX - allocate ITL entry 101 0.04 0.42 latch: enqueue hash chains 133585 0.09 922.55 resmgr:cpu quantum 57296 0.00 7.89 cursor: pin S 136 0.06 1.20 read by other session 270 0.37 4.45 log file switch completion 65 0.11 2.96 latch: undo global data 53 0.00 0.00 enq: TX - contention 11 0.00 0.02 log file switch (checkpoint incomplete) 49 3.18 35.50 reliable message 100 0.01 0.09 control file sequential read 560 0.01 0.27 Data file init write 28 0.00 0.02 db file single write 28 0.00 0.00 control file parallel write 84 0.00 0.01 rdbms ipc reply 28 0.08 0.29 undo segment extension 1 0.02 0.02 enq: US - contention 14 0.00 0.02 enq: HW - contention 11 0.00 0.00 enq: FB - contention 3 0.00 0.00 latch: redo allocation 1 0.00 0.00 ******************************************************************************** ******************************************************************************** SQL ID: b00stmqya8601 Plan Hash: 3884345238 INSERT INTO T2 VALUES (S1.NEXTVAL) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 100 0.00 0.00 0 0 0 0 Execute 1000000 61.89 948.89 2572 302421 4145886 1000000 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 1000100 61.89 948.89 2572 302421 4145886 1000000 Misses in library cache during parse: 0 Optimizer mode: FIRST_ROWS Parsing user id: 128 (recursive depth: 1) Number of plan statistics captured: 100 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 0 0 0 LOAD TABLE CONVENTIONAL (cr=1 pr=1 pw=0 time=312356 us) 1 1 1 SEQUENCE S1 (cr=0 pr=0 pw=0 time=1120 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ Disk file operations I/O 304 0.25 15.28 db file sequential read 2638 0.43 12.93 read by other session 318 0.46 8.26 latch free 504 0.12 7.84 resmgr:cpu quantum 16738 0.21 196.22 enq: SQ - contention 112 0.07 3.03 library cache: mutex X 2892 0.16 73.54 buffer busy waits 13740 4.97 278.00 latch: redo allocation 48 0.07 0.48 reliable message 206 0.06 0.40 cursor: pin S 289 0.07 2.90 log file switch completion 154 0.25 12.49 latch: cache buffers chains 21 0.07 0.24 enq: US - contention 556 4.51 95.12 control file sequential read 660 0.06 1.63 Data file init write 36 0.12 0.22 db file single write 33 0.00 0.01 control file parallel write 99 0.00 0.03 rdbms ipc reply 33 0.71 1.56 enq: HW - contention 324 4.44 40.96 enq: CF - contention 6 0.08 0.21 enq: TX - index contention 974 0.66 46.81 buffer deadlock 538 0.11 13.31 latch: object queue header operation 4 0.02 0.08 log file switch (checkpoint incomplete) 50 1.43 44.59 enq: TX - contention 10 0.01 0.05 latch: enqueue hash chains 451 0.07 1.18 enq: TX - allocate ITL entry 60 0.20 2.29 undo segment extension 210 1.00 14.99 log buffer space 115 0.24 6.51 latch: redo copy 7 0.06 0.29 enq: FB - contention 3 0.01 0.01 latch: row cache objects 4 0.00 0.00 latch: undo global data 2 0.00 0.00 latch: checkpoint queue latch 1 0.00 0.00 ********************************************************************************You can't argue with that, can you? I make it a 571% improved elapsed time. Mostly on buffer cache wait events.
(All tests done on DB release 12.1.0.1)
--
John Watson
Oracle Certified Master DBA
http://skillbuiders.com
- John Watson's blog
- Log in to post comments
Comments
Another fine demonstartion
Well done John. I would love to see some timings attached.
Is it possible to also mitigate the problem by manipulating INITRANS?
Reverse keys - timings added
Thanks for the feedback, Ross, and thank you for kicking me to finish the demo properly. I've added trace information for the tests, pretty spectacular.
I haven't added the initrans test yet (anyone care to volunteer? No, I didn't think so).
Hope someone finds it helpful.
I don't know what's more
I don't know what's more alarming, the results or your prowess with DOS scripts!
Dear JW!Is there any
Dear JW!
Is there any different progression between Normal B-Tree Index and Reverse Key Index?
In my mind, I think, the main one is:
- NX (Normal Index) using concentrated physical block instead continously during DML.
- RX (Reverse key Index) using same block instead of concentrated.
Correct me if I'm wrong!
Thank you!
"progression" and "concentrated"
Sorry, I do not understand what you mean by these terms. But reverse key indexes is a pretty straightforward topic, I'm sure if you run a few experiments you can work out anything that isn't clear. That is what I always do.