Temporary table
A Temporary table is a special type of table in the Oracle database used for storing intermediate temporary data.
Introduction
True temporary tables are a relatively late addition to the Oracle database. As a general rule, they are better avoided. As a particular rule, they are lesser evil than regular tables with TMP somewhere in their names, but just barely.
Note that temporary tables are always Heap-organized tables; you cannot create an index organized temporary table, but they can be indexed as any regular table.
Some products allow sessions to create on the fly temporary tables that are automatically dropped when the session ends. In the case of Oracle, the definition of temporary tables is permanent; that's what the GLOBAL stands for in CREATE GLOBAL TEMPORARY TABLE .... Only their content is temporary, and disappears when rows are explicitly deleted, or when the current transaction is committed or rolled back (default behaviour) or, if the table was created with the on commit preserve rows clause, when the session ends. Multiple sessions can refer to the same temporary table, but each session only sees the data it has inserted, even when temporary rows are preserved after commit.
The trouble with temporary tables
The way Oracle executes queries is increasingly driven by the environment: number of rows in tables, distribution of values in columns and parameters that are passed, system load, memory availability. The goal of the optimizer isn't to cling to one particular execution plan whatever happens; its goal is to return data as fast as possible under circumstances that may vary. It can be compared to a GPS system that adapts the itinerary it recommends to real-time information about the traffic.
Temporary tables introduce an additional degree of uncertainty. Since they are volatile, no statistics are gathered about them than what can be collected on the fly. They make estimating the cardinality at each step in an execution plan more difficult - and cardinality is a key ingredient in the determination of the most efficient execution plan.
On-the-fly statistics collection is controlled by the database parameter optimizer_dynamic_sampling. This parameter is a number that can take values ranging from 0 (don't do anything) to 10; it applies to all unanalyzed tables, whether they are global temporary or regular tables. At level 1, only unindexed, unanalyzed tables that use more than 32 blocks and appear in a join or subquery are analyzed. At level 2 (the default) 64 blocks from any unanalized table are sampled. At higher levels, sampling spreads to table for which some estimates depend on unanalized tables, first with 64 blocks, then with a number of blocks that doubles each time (128 times 32 for level 9) until all blocks are analyzed at level 10.
Somewhat confusingly, there is also a hint that is written as /*+ dynamic_sampling( <table alias> <level> ) */, with level also taking a value ranging from 0 to 10 but with a slightly different meaning. For one thing, it refers to a single table; and there is a kind of shift in the scale, since if 0 still means do nothing and 10 sample all blocks, level 1 is an unconditional analysis of 32 blocks and level 3 samples double the number of blocks sampled at level 2 (instead of the same number) and level 9 samples 256 times the number of blocks analyzed at level 1.
For once, this hint may be justified when using a global temporary table; it will ensure some stability in case some overal change in statistics collection policies results is accompanied by database parameter changes.
In other words, when specified in a hint, dynamic sampling is a stronger medicine above level 2 than when set at the database level.
A very important point to consider is that dynamic sampling takes place only at hard-parse time, since it requires more work the benefit of which is merely to improve the execution plan. What does it means? If you start your process filling up a temporary table, then run a query that uses this table and hint that it should be dynamically sampled, barring the odd bug Oracle should find the suitable execution plan. But in all the subsequent runs of the query, assuming that variables are correctly bound as they should be in an OLTP system, the same plan will be applied, based on sampling collected on the initial content of the temporary table. It may suit what your temporary table now stores if it has changed, or it may not. And of course, hinting queries is no help.
Additionally, even if these operations generate much less over head (redo log generation and the like), inserting data inside a temporary table puts some additional strain over the database.
Alternatives to temporary tables
Developers often use temporary tables to store a dataset that will be used in several queries, under the apparently reasonable assumption that performing the hard work only once will benefit performance. There may be alternate, and often better, solutions to improve performance.
- First of all, the assumption that identifying a work set first and then using it in subsequent query is beneficial to performance can be in many cases questioned. It may (or may not) happen that when the query that retrieves the "temporary" result set is combined with a query where it is used, the optimizer, having a fuller vision of what you are trying to do, uses a completely and more efficient execution path than when you split the process into several steps.
To be fair, you need to compare time to fill the temporary table + time to execute a simpler query that uses the temporary table + ... + time to execute the umpteenth simpler query that uses the temporary table to the execution of umpteen more complex queries - the link between apparent complexity (within reasonable bounds) and execution time isn't as simple as it sounds. A join between TA and TB is much faster than running a cursor to get the join column values from TA then fetching rows from TB using these values. It may be a similar story with temporary tables.
- There has been since Oracle 9.0 a good alternative to temporary tables - WITH. WITH allows you to "factorize" a query, assign a name to it, and then refer to the result set by this name at various places in the query; it is particular useful with statements that make an intensive use of set operators such as UNION or INTERSECT with patterns that are repeated in the different queries. WITH is to temporary tables what inline views are to regular views - functionally equivalent but with a life-scope limited to the duration of the query. When you use WITH the optimizer may decide to "materialize" the factorized query, that is temporarily store the result set to reuse it several times.
- Think of external tables when using data from files that don't need to be permanently stored in the database.
- A join or an in () subquery that references a temporary table can advantageously be replaced by a single variable that contains a list if this string is less than 4,000 character long. The technique details for using bind variables in in () lists are fully explained on http://tkyte.blogspot.com/2006/06/varying-in-lists.html.
When temporary tables can be useful
Better avoided isn't exactly the same as completely banished. There are some cases when using a temporary table is justifiable, but those cases are probably less frequent than commonly thought. As shown above temporary tables don't make life easy for the optimizer, which can lead to wrong choices on its part - and when the optimizer is wrong, it can be very wrong.
- Temporary tables are better used when
- They are filled only once in the session, or with data sets of comparable size and properties (distribution of data, etc.), in which case dynamic sampling can benefit statements.
- When the result set they contain is required by several queries that cannot be combined (updates of distinct tables)
- When the additional cost of populating them is dwarfed by the performance improvement of further operations compared to alternative solutions.
- The most significant advantage of temporary tables over other solutions is that they can be indexed. Although indexes add overhead to the process of filling a temporary table, it can be a strong argument in favour of using a temporary table when processes would benefit from an index that cannot be added to the original table(s) (either because of schema mismatches, for instance a join that must use a complex, non-deterministic result, or because data comes from a schema you only have select access to, or because data comes from an external table).