Understanding Indexes
Of iPods and Indexes
I'm not really an "early-adopter" of technology. Don't get me wrong; I love it, I just don't want to feed the addiction. When I do get a new piece of technology though, it's like a fever; I can't concentrate on anything until I've read the manual from cover to cover and found out everything it can do, every built-in gizmo, and every trashy piece of after-market merchandise that can be plugged into it.
And I don't think I'm alone here. Working in I.T., there's no shortage of people who can peel off a half-hour litany on their new Blackberry/IPod/Notepad/Digital Watch within a day of purchase.
So why are databases different? I worked with Oracle databases for 5 years before I understood indexes - and it's right there in the manual (Concepts manual, for those interested). I don't mean a deep, spiritual, one-ness with indexes; I mean just a basic understanding of the mechanics of the things. I distinctly remember my first tuning methodology: "It's running slow. I think I'll index some of the columns and see if it improves." I should have copyrighted it because I've seen it used so many times in the last 10 years, I could've made a fortune in commissions.
If you understand how indexes work, 99 times out of a 100 you don't need the suck-it-and-see methodology because you know beforehand whether an index will help?
What is an Index?
This is covered in the Oracle Concepts manual, of course, but here's the Cliff Notes version.
Blocks
First you need to understand a block. A block - or page for Microsoft boffins - is the smallest unit of disk that Oracle will read or write. All data in Oracle - tables, indexes, clusters - is stored in blocks. The block size is configurable for any given database but is usually one of 4Kb, 8Kb, 16Kb, or 32Kb. Rows in a table are usually much smaller than this, so many rows will generally fit into a single block. So you never read "just one row"; you will always read the entire block and ignore the rows you don't need. Minimising this wastage is one of the fundamentals of Oracle Performance Tuning.
Oracle uses two different index architectures: b-Tree indexes and bitmap indexes. Cluster indexes, bitmap join indexes, function-based indexes, reverse key indexes and text indexes are all just variations on the two main types. b-Tree is the "normal" index, so we will come back to Bitmap indexes another time.
The "-Tree" in b-Tree
A b-Tree index is a data structure in the form of a tree - no surprises there - but it is a tree of database blocks, not rows. Imagine the leaf blocks of the index as the pages of a phone book.
- Each page in the book (leaf block in the index) contains many entries, which consist of a name (indexed column value) and an address (ROWID) that tells you the physical location of the telephone (row in the table).
- The names on each page are sorted, and the pages - when sorted correctly - contain a complete sorted list of every name and address
A sorted list in a phone book is fine for humans, beacuse we have mastered "the flick" - the ability to fan through the book looking for the page that will contain our target without reading the entire page. When we flick through the phone book, we are just reading the first name on each page, which is usually in a larger font in the page header. Oracle cannot read a single name (row) and ignore the reset of the page (block); it needs to read the entire block.
If we had no thumbs, we may find it convenient to create a separate ordered list containing the first name on each page of the phone book along with the page number. This is how the branch-blocks of an index work; a reduced list that contains the first row of each block plus the address of that block. In a large phone book, this reduced list containing one entry per page will still cover many pages, so the process is repeated, creating the next level up in the index, and so on until we are left with a single page: the root of the tree.
To find the name Gallileo in this b-Tree phone book, we:
- Read page 1. This tells us that page 6 starts with Fermat and that page 7 starts with Hawking.
- Read page 6. This tells us that page 350 starts with Fyshe and that page 351 starts with Garibaldi.
- Read page 350, which is a leaf block; we find Gallileo's address and phone number.
That's it; 3 blocks to find a specific row in a million row table. In reality, index blocks often fit 100 or more rows, so b-Trees are typically quite shallow. I have never seen an index with more than 5 levels. Curious? Try this:
SELECT index_name, blevel+1 FROM user_indexes ORDER BY 2;
user_indexes.blevel
is the number of branch levels. Always add 1 to include the leaf level; this tells you the number of blocks a unique index scan must read to reach the leaf-block. If you're really, really, insatiably curious; try this in SQL*Plus:
ACCEPT index_name PROMPT "Index Name: " ALTER SESSION SET TRACEFILE_IDENTIFIER = '&index_name'; COLUMN object_id NEW_VALUE object_id SELECT object_id FROM user_objects WHERE object_type = 'INDEX' AND object_name = upper('&index_name'); ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME TREEDUMP LEVEL &object_id'; ALTER SESSION SET TRACEFILE_IDENTIFIER = ""; SHOW PARAMETER user_dump_dest
Give the name of an index on a smallish table (because this will create a BIG file). Now, on the Oracle server, go to the directory shown by the final SHOW PARAMETER user_dump_dest
command and find your trace file - the file name will contain your index name. Here is a sample:
*** 2007-01-31 11:51:26.822 ----- begin tree dump branch: 0x68066c8 109078216 (0: nrow: 325, level: 1) leaf: 0x68066c9 109078217 (-1: nrow: 694 rrow: 694) leaf: 0x68066ca 109078218 (0: nrow: 693 rrow: 693) leaf: 0x68066cb 109078219 (1: nrow: 693 rrow: 693) leaf: 0x68066cc 109078220 (2: nrow: 693 rrow: 693) leaf: 0x68066cd 109078221 (3: nrow: 693 rrow: 693) ... ... leaf: 0x68069cf 109078991 (320: nrow: 763 rrow: 763) leaf: 0x68069d0 109078992 (321: nrow: 761 rrow: 761) leaf: 0x68069d1 109078993 (322: nrow: 798 rrow: 798) leaf: 0x68069d2 109078994 (323: nrow: 807 rrow: 807) ----- end tree dump
This index has only a root branch with 323 leaf nodes. Each leaf node contains a variable number of index entries up to 807! A deeper index would be more interesting, but it would take a while to dump.
"B" is for...
Contrary to popular belief, b is not for binary; it's balanced.
As you insert new rows into the table, new rows are inserted into index leaf blocks. When a leaf block is full, another insert will cause the block to be split into two blocks, which means an entry for the new block must be added to the parent branch-block. If the branch-block is also full, it too is split. The process propagates back up the tree until the parent of split has space for one more entry, or the root is reached. A new root is created if the root node splits. Staggeringly, this process ensures that every branch will be the same length. Try it on paper for yourself!
How are Indexes used?
Indexes have three main uses:
- To quickly find specific rows by avoiding a Full Table Scan
We've already seen above how a Unique Scan works. Using the phone book metaphor, it's not hard to understand how a Range Scan works in much the same way to find all people named "Gallileo", or all of the names alphabetically between "Smith" and "Smythe". Range Scans can occur when we use >, <,
LIKE
, orBETWEEN
in aWHERE
clause. A range scan will find the first row in the range using the same technique as the Unique Scan, but will then keep reading the index up to the end of the range. It is OK if the range covers many blocks. - To avoid a table access altogether
If all we wanted to do when looking up Gallileo in the phone book was to find his address or phone number, the job would be done. However if we wanted to know his date of birth, we'd have to phone and ask. This takes time. If it was something that we needed all the time, like an email address, we could save time by adding it to the phone book.
Oracle does the same thing. If the information is in the index, then it doesn't bother to read the table. It is a reasonably common technique to add columns to an index, not because they will be used as part of the index scan, but because they save a table access. In fact, Oracle may even perform a Fast Full Scan of an index that it cannot use in a Range or Unique scan just to avoid a table access.
- To avoid a sort
This one is not so well known, largely because it is so poorly documented (and in many cases, unpredicatably implemented by the Optimizer as well). Oracle performs a sort for many reasons:
ORDER BY
,GROUP BY
,DISTINCT
, Set operations (eg.UNION
), Sort-Merge Joins, uncorrelated IN-subqueries, Analytic Functions). If a sort operation requires rows in the same order as the index, then Oracle may read the table rows via the index. A sort operation is not necessary since the rows are returned in sorted order.Despite all of the instances listed above where a sort is performed, I have only seen three cases where a sort is actually avoided.
GROUP BY
1 select src_sys, sum(actl_expns_amt), count(*) 2 from ef_actl_expns 3 where src_sys = 'CDW' 4 and actl_expns_amt > 0 5* group by src_sys ------------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT GROUP BY NOSORT | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| EF_ACTL_EXPNS | |* 3 | INDEX RANGE SCAN | EF_AEXP_PK | ------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ACTL_EXPNS_AMT">0) 3 - access("SRC_SYS"='CDW')
Note the
NOSORT
qualifier in Step 1.ORDER BY
1 select * 2 from ef_actl_expns 3 where src_sys = 'CDW' 4 and actl_expns_amt > 0 5* order by src_sys ------------------------------------------------------------ | Id | Operation | Name | ------------------------------------------------------------ | 0 | SELECT STATEMENT | | |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| EF_ACTL_EXPNS | |* 2 | INDEX RANGE SCAN | EF_AEXP_PK | ------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ACTL_EXPNS_AMT">0) 2 - access("SRC_SYS"='CDW')
Note that there is no SORT operation, despite the
ORDER BY
clause. Compare this to the following:1 select * 2 from ef_actl_expns 3 where src_sys = 'CDW' 4 and actl_expns_amt > 0 5* order by actl_expns_amt ------------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT ORDER BY | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| EF_ACTL_EXPNS | |* 3 | INDEX RANGE SCAN | EF_AEXP_PK | ------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ACTL_EXPNS_AMT">0) 3 - access("SRC_SYS"='CDW')
DISTINCT
1 select distinct src_sys 2 from ef_actl_expns 3 where src_sys = 'CDW' 4* and actl_expns_amt > 0 ------------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT UNIQUE NOSORT | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| EF_ACTL_EXPNS | |* 3 | INDEX RANGE SCAN | EF_AEXP_PK | ------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ACTL_EXPNS_AMT">0) 3 - access("SRC_SYS"='CDW')
Again, note the
NOSORT
qualifier.
This is an extraordinary tuning technique in OLTP systems like SQL*Forms that return one page of detail at a time to the screen. A SQL with a DISTINCT, GROUP BY, or ORDER BY that uses an index to sort can return just the first page of matching rows without having to fetch the entire result set for a sort. This can be the difference between sub-second response time and several minutes or hours.
Everybody repeat after me: "Full table Scans are not bad"
Up to now, we've seen how indexes can be good. It's not always the case; sometimes indexes are no help at all, or worse: they make a query slower.
A b-Tree index will be no help at all in a reduced scan unless the
WHERE
clause compares indexed columns using >, <,LIKE
,IN
, orBETWEEN
operators. A b-Tree index cannot be used to scan for anyNOT
style operators: eg.!=
,NOT IN
,NOT LIKE
. There are lots of conditions, caveats, and complexities regarding joins, sub-queries,OR
predicates, functions (inc. arithmetic and concatenation), and casting that are outside the scope of this article. Consult a good SQL tuning manual.Much more interesting - and important - are the cases where an index makes a SQL slower. These are particularly common in batch systems that process large quantities of data.
To explain the problem, we need a new metaphor. Imagine a large deciduous tree in your front yard. It's Autumn, and it's your job to pick up all of the leaves on the lawn. Clearly, the fastest way to do this (without a rake, or a leaf-vac...) would be get down on hands and knees with a bag and work your way back and forth over the lawn, stuffing leaves in the bag as you go. This is a Full Table Scan, selecting rows in no particular order, except that they are nearest to hand. This metaphor works on a couple of levels: you would grab leaves in handfuls, not one by one. A Full Table Scan does the same thing: when a bock is read from disk, Oracle caches the next few blocks with the expectation that it will be asked for them very soon. Type this in SQL*Plus:
SHOW PARAMETER db_file_multiblock_read_count
Just to shake things up a bit (and to feed an undiagnosed obsessive compulsive disorder), you decide to pick up the leaves in order of size. In support of this endeavour, you take a digital photograph of the lawn, write an image analysis program to identify and measure every leaf, then load the results into a Virtual Reality headset that will highlight the smallest leaf left on the lawn. Ingenious, yes; but this is clearly going to take a lot longer than a full table scan because you cover much more distance walking from leaf to leaf.
So obviously Full Table Scan is the faster way to pick up every leaf. But just as obvious is that the index (virtual reality headset) is the faster way to pick up just the smallest leaf, or even the 100 smallest leaves. As the number rises, we approach a break-even point; a number beyond which it is faster to just full table scan. This number varies depending on the table, the index, the database settings, the hardware, and the load on the server; generally it is somewhere between 1% and 10% of the table.
The main reasons for this are:
- As implied above, reading a table in indexed order means more movement for the disk head.
- Oracle cannot read single rows. To read a row via an index, the entire block must be read with all but one row discarded. So an index scan of 100 rows would read 100 blocks, but a FTS might read 100 rows in a single block.
- The
db_file_multiblock_read_count
setting described earlier means FTS requires fewer visits to the physical disk. - Even if none of these things was true, accessing the entire index and the entire table is still more IO than just accessing the table.
So what's the lesson here? Know your data! If your query needs 50% of the rows in the table to resolve your query, an index scan just won't help. Not only should you not bother creating or investigating the existence of an index, you should check to make sure Oracle is not already using an index. There are a number of ways to influence index usage; once again, consult a tuning manual. The exception to this rule - there's always one - is when all of the columns referenced in the SQL are contained in the index. If Oracle does not have to access the table then there is no break-even point; it is generally quicker to scan the index even for 100% of the rows.
Summary
Indexes are not a dark-art; they work in an entirely predictable and even intuitive way. Understanding how they work moves Performance Tuning from the realm of guesswork to that of science; so embrace the technology and read the manual.
Edit: Ths article is now available as a video presentation to my clients and colleagues in May 2014. http://youtu.be/Z4hKomnGHFA
- rleishman's blog
- Log in to post comments
Comments
Fantastic article
Hi Ross,
This is absolutely fantastic demonstration of the working of indexes in oracle. It is very simple to understand.. full credit to your way of detailing.
Regards.
Sethu
You're awesome
Thanks for the article, it was very helpful. After reading this I removed unnecessary indexes on some columns and my query runs much faster!
good work
One of the best article I've read on indexes. Appreciate your work. Please keep posting such good articles.
Superb!
Wow! I don't think I have ever read a better explanation of indexes before. So simple to understand. It took you 5 years to understand indexes....well it took me 6 years...after reading this article I now understand them a lot better. I wish I had read this 6 years ago :)
We really need more simple to understand articles like this so that there are more proficient DBAs in the world and we're not just guessing when it comes to things like creating indexes, tuning, etc. Common, real-life examples help a ton and stick in the brain better. Keep up the good work!
Thanks a ton!
Thanks
Thanks for the feedback shaseeb.
Indexes segment information
Hi rleishman,
I have gone through your note on Index. I am having few question understanding the index creation pattern at the block level.
1) Say I have table 'tab1' and column naming 'last_name' in it and I have various combinations of alphabets in the last name, so all these rows might be placed in or or more blocks under the segment 'tab1', now if I am creating an index 'ind1' on 'tab1', how exactly and what exactly the blocks under the ind1 segment consists of.
How exactly and what exactly? - I know we have blocks from tab1 segment containing the information of rows, so are these all last_name value and row id of it are first placed with in the blocks under ind1 segment? And then branch blocks are categorized and created under the ind1 segment and then root block? Before that how exactly these branch blocks categorized (what I mean by categorization is grouping of all blocks from alphabets A to E under one branch block and then from E to H under another branch block - how exactly is this grouping done is it following any specific algorithm? Say some how its grouping now my question again come here is does this branch blocks contain information of the first last_name value with in that block + its rowid from all the blocks? And then how exactly does this root block created with in this IND1 segments? Now again when looking for a row, how exactly optimizer find the root block and traverse to the leaf block?
Your information or advice is very helpful. Might be I'm missing some thing so that I am unable to get to know answers for these questions. Kindly pull some of your valuable time to let me understand the indexes in detail.
Thanks,
Rama
Not sure I understand your question(s)
There are a lot of questions in your post, and I'm not sure I understand all of them, but I'll try:
Like this?:
Like this?
When you create an index on a pre-populated table, Oracle performs a full scan of the table to retrieve indexed column values and ROWIDs from every row. It then performs a sort (just like an ORDER BY) to get the entire list in the order of the indexed column values - this can use a lot of TEMP space. So far, we have not created any index blocks.
The sorted list of index values and ROWIDs is then broken into block-size chunks and placed in index leaf blocks. I don't know how Oracle allocates these leaf blocks to branch blocks. But considering the entire list might not fit into memory, I could take an educated guess... As each leaf block is created, its address and high-value are placed in a branch block. When the branch block fills up, it places the address and high-value of the branch block in a grand-parent level branch block and allocates a new branch block. This process repeats until the grand-parent level branch block fills and then it creates a great-grand-parent level branch block, etc.
When you are done, all of the leaf blocks except (probably) the last one will be full, and all of the branch blocks at a given level will be full except (probably) the last one. As a result, INSERTs after an index rebuild can cause a great deal of index block splitting to make room (see main article).
So Oracle doesn't decide where the start and end-points of an index block will be, it just keeps adding rows until the block is full... that's the end-point.
Leaf blocks contain the indexed column value and ROWID of the table rows that they address. A branch block contains addresses (like ROWIDs) of a bunch of contiguous leaf blocks, plus the high-value of the LAST leaf block it points to. The parent of that branch block contains addresses of a bunch of contiguous branch blocks, plus the high-value of the LAST branch block it points to. And so on up to the root.
The root block is created when the index is first created and (I'm going from memory here) never moves. With normal branch blocks, when they fill up they split in two and the address of the new block is added to the parent. If there is no room in that parent, it splits in two and the address of the new block is added to its parent. If this process continues up to the root - and the root is full - then the CONTENTS of the root are divided in two and TWO new blocks are created. The addresses of these two new blocks are then placed in the empty root. In this way, the root never moves.
How does the optimizer find the root block? Its address is in the data dictionary, and since it never moves, the data dictionary never needs to update it.
Hope that helps.
Ross.
Very lucid and Informative.
Hi Rleishman,
Indexes - simplified, dimystified!
Never seen better expalination than this. Good work. Thanks a lot. Keep it up.
Thanks and Regards,
Gururaj.
My pleasure
Glad I could help
A question about your article
Thank you very much for this article.
I'm trying to run your code samples in order to better understand your article. However, it seems to me that none of the tables in your code exhibits exist in my database. Are these sample predefined Oracle tables? Is there any script allowing to create them?
OS: Fedora Core 16
Oracle version : 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
Thanks in advance!
You'll have to roll your own
No, they were just tables to which I happened to have access on the database I was using at the time. You could recreate the scenarios yourself using tables in the sample schemas with similar index structure.
Two thumbs way up...
It's amazing how something written in 2007 is still so valuble to novices like myself. We keep coming back here :)
Great compliments on a job well done. Fantastic explanation. Thank you.
I´m stunned!
Wow, you know not only one heck of a lot about Oracle (I have 6 years of Oracle experience, and so far I thought there is quite a lot I know, but now that I read your article, I know that I know so little), you also write and explain just wonderfully!
Thanks for this amazing article!!!!
what really happens when an intermediate value is inserted into
Hi,
If an Index is created on column that contain numbers from 1 to 1001 with number 5 missing.
Let us assume each block contains 10 rows so there will be 1 root block (level 0) in the next level 10 blocks(level 1) and 100 leaf nodes (level 2). Since indexes are stored in a sorted fashion we can say that the first leaf block at level 2(say block 'A') contains 1 to 11 numbers since 5 is missing. If we insert a new record with number 5 then it should be inserted into block 'A'. How does the B-tree split and restructure now??
1)If block A to split say A1 and A2. Now A1 contains numbers 1 to 10 and A2 contains only number 11 this will lead to wastage of the block as the next numbers are in the second block at level 2.
2) If block is not split instead the number 11 moved to block B(second block at level 2) and the last value in that block is moved to the next following block and so on.
So an simple new insert causes 10 inserts and 10 deletes in the level 2 blocks?? or wastage of the blocks as in the point 1?
what really happens when an intermediate value is inserted into indexed column.
Intermediate Inserts
Using your example, an index leaf block (A) is full.
Oracle will navigate the b-tree and find the index leaf block that is supposed to contain the new value (5), but then it sees that the block is full. Oracle then:
It is incorrect to think of this as deleting 5 rows from block A and inserting 5 rows into block B. That implies that each index-entry is migrated separately, which is not the case.
These blocks - once full - are now half empty. You can look at this as wasted space, but it is really just an investment in the future speed of intermediate inserts. Next time we insert an intermediate value (say, 7.5) there will be room in the block so it will only cost 1 block write rather than 3.
How does Oracle reach to the Leaf node ?
Thanks for the elobrate explanation. One doubt though..
You have explained that Oracle goes through the Branch blocks inorder to reach the Leaf node (where the index entry is found).
But how does Oracle know which Leaf node to reach, Lets take your example:
"To find the name Gallileo in this b-Tree phone book, we:
◦Read page 1. This tells us that page 6 starts with Fermat and that page 7 starts with Hawking.
◦Read page 6. This tells us that page 350 starts with Fyshe and that page 351 starts with Garibaldi.
◦Read page 350, which is a leaf block; we find Gallileo's address and phone number. "
In this case we humans know "Galileo" is between "Fermat" and "Hawking" and so we next read Page 6. Similarly how does Oracle know which Block to look into next ?
In my b-Tree phone book
In my b-Tree phone book example, the branch blocks contain a series of pairs of KEYS and PAGES, like so:
( ... (Fermat, 6), (Hawking, 7) ...)
In an Oracle index, the keys work exactly the same way, we just replace the page numbers with addresses
( ... (Fermat, File 4 / Block 763), (Hawking, File 4 / Block 887), ... )
These are logical addresses that tell us where to find the next level block (either branch or leaf). Combined with the data dictionary, these addresses can be turned into physical disk locations, which can be sent to the IO subsystem to be retrieved.
...Oracle does not have to access the table...
Ross,
RE: " The exception to this rule - there's always one - is when all of the columns referenced in the SQL are contained in the index. If Oracle does not have to access the table then there is no break-even point; it is generally quicker to scan the index even for 100% of the rows."
So, does this mean that if I am willing to pay the cost of index creation/maintenance (time and resources), better performance could be achieved by indexing all columns? For example if I have a large table with mostly read access and all columns are usually retrieved.
Thanks
-Napi Shy
napishy wrote:So, does this
Okay, no, it doesn't. I was trading absolute precision in that sentence for brevity. That's why I used the word "generally"
Putting EVERY column in the index would make for quite a long concatenated key and the index would end up being bigger than the table for two reasons:
If you did need 100% of rows from the index (without the need to do a table lookup) then Oracle would use a Fast Full Scan, processing the index in multi-block reads. If that structure is bigger than the table itself, it's going to take longer than the equivalent Full Table Scan.
So, my more correct quote in the article should be:
I amazed to see an
I amazed to see an informative article on index.. As we know Index play a vital role in Performance of the Oracle Database...But My all perpelexity regarding Index is very clear....