Part 1 - The Bitmap Identity

articles: 

What is a Bitmap Index?

This is first post of the four-part epic - The Bitmap Conspiracy - detailing the structure and behaviour of Bitmap Indexes. Later in the series we will cover the internal structure of Bitmap Indexes, how Oracle uses them, and finally we will expose some of the myths surrounding them. But before we get there let’s just get a clear understanding of what a Bitmap Index actually is.

  1. What is a Bitmap Index?
  2. What is a Bitmap?
  3. Combining Bitmaps
  4. What is a Bitmap Index
  5. Any Questions?
  6. Appendix A – AND and OR Binary Logic

What is a Bitmap?

We all understand that a bit is the smallest unit of information and is typically expressed as 1 or 0. We often attach meaning to these values such as TRUE and FALSE, or YES and NO because this is the only way we can make sense of a single bit without context. Once we add context to a bit: we can make it mean just about anything: Right or Wrong, Positive or Negative, Alive or Dead, Play-this-song-repeatedly or Do-not-play-this-song-repeatedly.

A bitmap consist of two things:

  • A series of one or more bits
  • A context that defines what those bits are telling us

For example, consider the following bitmap and context:

  • Bitmap: 111001001
  • Context: People in the Brady Bunch aged 13 or greater

There are 9 bits corresponding to the 9 Bradys (Mike, Carol, Marcia, Jan, Cindy, Greg, Peter, Bobby, Alice). We can see that bits 1, 2, 3, 6 and 9 are “set”; this tells us that Mike, Carol, Marcia, Greg and Alice are all aged 13 or greater. Importantly, it also tells us that Jan, Cindy, Peter and Bobby are not aged 13 or greater.

See what we did there? Because it was pretty cool. We took two lists of names totalling 376 bits (1 letter = 1 byte = 8 bits) and stored the two lists it in a single bitmap totalling just 9 bits.

But hang on there (I hear you say), those 9 bits just store the Bitmap, by the time you store the Context (those 9 names), you haven’t saved any space at all! Well (I say back to you), that’s very observant of you. You are dead right: one bitmap on its own like this is actually an inefficient way to store this sort of information. In fact, we are quite close to dispelling the first of those Bitmap Index Myths: that a (single) Bitmap Index is good for low-cardinality columns … but more on that later.

Although our single 9-bit bitmap is made inefficient by the 376 bits of name data we store to make sense of it, we can make it more efficient by reusing the list of names when we capture more bitmaps that describe the Bradys.

  • 100001110 – Males
  • 110000001 – Adults
  • 100000000 – Architects
  • 011110000 – Blondes
  • 000100000 – Tattletales

We now start to see that if each of these bitmaps is stored as a list of names (names of males and non-males, names of adults and non-adults, architects and non-architects, etc) then it is going to take up a lot more space than the bitmaps.

So is this why we use Bitmap Indexes: to take up less space? Hardly! We haven’t got to the good bit yet.

Combining Bitmaps

The really cool thing about bitmaps is that they can be combined easily using Binary Logic. Binary Logic uses the operators “AND” and “OR”. We sometimes use the symbols & (and) and | (or), although these are not supported by SQL.

See Appendix A for a detailed explanation of AND and OR binary logic operations.

The best thing about Binary Logic is that computers can do it really easily. In fact - although it may be hard to believe – every single operation performed by a computer is performed using binary logic. Addition, multiplication, playing Minesweeper, watching cat videos: they are all done entirely with binary logic.

Computers perform binary logic really fast, so that means bitmaps can be combined to answer complex questions really fast.

What is a Bitmap Index

Before we describe a Bitmap Index, let’s set up an example (in Oracle) based on the bitmaps we used above.

# NAME   GENDER HAIR   AGEGRP DESCRIPTION BIRTHORDER
- ------ ------ ------ ------ ----------- ----------
1 MIKE   MALE   BROWN  ADULT  ARCHITECT
2 CAROL  FEMALE BLONDE ADULT  MOM
3 MARCIA FEMALE BLONDE TEEN   POPULAR     ELDEST
4 JAN    FEMALE BLONDE CHILD  WHINER      MIDDLE
5 CINDY  FEMALE BLONDE CHILD  TATTLETALE  YOUNGEST
6 GREG   MALE   BROWN  TEEN   JOCK        ELDEST
7 PETER  MALE   BROWN  CHILD  WANNABE     MIDDLE
8 BOBBY  MALE   BROWN  CHILD  DREAMER     YOUNGEST
9 ALICE  FEMALE BROWN  ADULT  HOUSEKEEPER

If we create a Bitmap Index on a column (say AGEGRP), then Oracle generates a bitmap for each distinct value of that column, where the position of bits in the bitmap represent rows in the table.
Eg.

           M
         C A   C   P B A
       M A R   I G E O L
       I R C J N R T B I
       K O I A D E E B C
       E L A N Y G R Y E
       - - - - - - - - -
ADULT  1 1 0 0 0 0 0 0 1
TEEN   0 0 1 0 0 1 0 0 0
CHILD  0 0 0 1 1 0 1 1 0

We can see that the bitmap for ADULT is 110000001, meaning that the first, second and ninth rows in the table contain AGEGRP = ‘ADULT’. There are 3 bitmaps for the AGEGRP column because there are 3 different possible values. If we created an index on DESCRIPTION then we would get 9 bitmaps because there are 9 different values.

When we run a SQL that includes predicates on the bitmap indexed columns, Oracle retrieves the bitmaps, performs any Binary Logic required to satisfy AND / OR conditions in the SQL and then translates the final bitmap into Oracle ROWIDs so that the rows can be looked up in the table.
Eg.

SELECT NAME
FROM BRADYS
WHERE AGEGRP = ‘CHILD’
AND   HAIR = ‘BLONDE’

Step 1: Retrieve the ‘CHILD’ and ‘BLONDE’ bitmaps

CHILD  000110110
BLONDE 011110000

Step 2: Combine these bitmaps using the AND operator

CHILD  000110110  &
BLONDE 011110000
------ ---------
STEP2  000110000

Step 3: Convert the result of Step 2 into ROWIDs and lookup the rows in the table.

The bitmap in Step 2 has bits 4 and 5 set. In the BRADYS table, these rows correspond to JAN and CINDY.

SELECT NAME
FROM BRADYS
WHERE AGEGRP = ‘CHILD’
AND   HAIR = ‘BLONDE’

NAME
-----
JAN
CINDY

2 rows selected.

Any Questions?

If you are anything like me, you now have heaps of questions. When I first saw this type of description of Bitmap Indexes, I was worried about heaps of stuff. Let’s compare notes; here’s my list:

  • Is there a bitmap for the NULL value? Ie. Can you scan for IS NULL predicates?
  • Two or three distinct values mean two or three bitmaps? That sounds efficient. What is the upper limit?
  • If you insert a new distinct value into a big table, won’t it take ages to build the new bitmap?
  • How does Oracle convert the set bits of a bitmap into ROWIDs?
  • I understand how regular indexes are physically stored in a B-Tree. How are Bitmap Indexes stored?

The Oracle manuals contain a very detailed description of the structure of B-Tree indexes, but prior to version 11g they have not been as forthcoming on the structure of Bitmap Indexes. The Oracle 11g Database Concepts manual contains a short but revealing section on the structure on Bitmap Indexes. Over the years I have pieced together a lot of information, much of which is confirmed by the 11g Database Concepts manual. Some of it is still just my unproven opinion, so in the next edition I will try to answer these questions with as much supporting evidence as I can find.

Return to main article
Read Part 2 - The Bitmap Supremacy




Appendix A – AND and OR Binary Logic


The binary AND (&) operator combines two bitmaps (say, A & B) and returns a third bitmap (C) that contains only those bits set in both A and B. eg. Say we wanted to find the Bradys that are both Males AND Adults.

    M
  C A   C   P B A
M A R   I G E O L
I R C J N R T B I
K O I A D E E B C
E L A N Y G R Y E
- - - - - - - - -
1 0 0 0 0 1 1 1 0 – A (MALE)
1 1 0 0 0 0 0 0 1 – B (ADULT)
-----------------
1 0 0 0 0 0 0 0 0 – C = A & B

The first bit of C is set (1) because it is set in both A and B. The second bit of C is not set (0) because it is set in only one of A and B. The third bit of C is not set (0) because it is not set in either A or B, and so on.

Since only the first bit of C is set, this tells us that Mike is the only Adult Male.

The binary OR (|) operator combines two bitmaps (say, A | B) and returns a third bitmap (C) that contains bits set in either A or B. eg. Say we wanted to find the Bradys that are either Architects or Tattletales

    M
  C A   C   P B A
M A R   I G E O L
I R C J N R T B I
K O I A D E E B C
E L A N Y G R Y E
- - - - - - - - -
1 0 0 0 0 0 0 0 0 – A (ARCHITECTS)
0 0 0 0 1 0 0 0 0 – B (TATTLETALES)
-----------------
1 0 0 0 1 0 0 0 0 – C = A | B

The first bit of C is set (1) because it is set in A, even though it is not set in B. The second bit of C is not set (0) because it is not set in either A or B, and so on.

Since bits 1 and 4 of C are set, this tells us that Mike and Cindy are in the group of Architects and Tattletales.


Return to main article
Read Part 2 - The Bitmap Supremacy

Comments

bijunator's picture

i am a bit late but thank you...a lot...

Good Post.
Too late response ;). Just found this topic to read...