Parsing

From Oracle FAQ
⧼orafaq-jumptonavigation⧽⧼orafaq-jumptosearch⧽

When Oracle get a SQL query it needs to execute some tasks before being able to really execute the query. These tasks make up what is called parsing. To execute this job and limit the number of data that it must rebuilt each time, Oracle uses a shared memory area named the shared pool.

Shared pool

Oracle keeps SQL statements, packages, information on the objects and many other things in a memory area named shared pool. This memory area is managed in a complex way by Oracle but we have to keep in mind that the used memory is not freed when a user completed an action using it. The purpose of this shared area is to maximize the sharing and the reuse of the information. The information created in memory for a session can be useful to another one but Oracle cannot know a priori if one information will be or not useful to another session so it keeps everything that comes even if it means removing older information to make space.

Three areas are more interesting for the parsing:

  • dictionary cache. This area contains records from Oracle dictionary that have served to build the previous SQL statements and other objects of the library cache.
  • library cache. This area contains metadata about the tables, indexes, views, synonyms and other objects of the database. It also contains dependency graphs and other relations between objects including SQL and PL/SQL
  • SQL area. This area contains the binary form, executable by Oracle, of the SQL and PL/SQL cursors.

Identical statements

Two statements are identical if their text is character by character and if they do the same thing. So the 2 followings statements are regarded as different:

SELECT ENAME FROM EMP ; 
SELECT ename FROM emp ; 

Shared statements

If two sessions send two identicals SQL statements this does necessarily not mean the statement is shared. Assuming SCOTT and FREE have each one a table named EMP and send the query:

SELECT ename FROM emp ; 

Even if the query texts are identical, the EMP tables are different, the statements are then two different versions of the same base statement. There are many points that ascertain if two statements are really identical:

  • All referenced objects must point to the same physical objects
  • All session parameters related to the optimizer must be the same ones
  • All bind variables must have similar datatypes and sizes
  • All NLS parameters related to the statement must be the same ones

Parsing

The following operations are made during the parsing.

  1. Validate the syntax of the statement: is the query a valid SQL statement?

    SQL> select nothing where 1=2;
    select nothing where 1=2
                   *
    ERROR at line 1:
    ORA-00923: FROM keyword not found where expected

  2. Validate the semantic of the statement: are the objects valid? is there any ambiguity? have you the privilege to access them? does the constant fit into the column?...

    SQL> select col from not_existent_table;
    select col from not_existent_table
                    *
    ERROR at line 1:
    ORA-00942: table or view does not exist

  3. Search in the shared pool:
    • Is the query text already known (search among all the query texts)?
      if not, go to step 4
    • Does the query referenced the same objects (search among all versions of the query)?
      if not, go to step 4
    • Is the execution environment identical (same search)? If yes, execute the query.
  4. Allocate memory in the shared pool to store the data about the query
  5. Get the values of the bind variables and check if all values fit in the columns

    SQL> var v varchar2(20);
    SQL> exec :v := '12345678901'
    PL/SQL procedure successfully completed.
    SQL> insert into michel.t values (:v);
    insert into michel.t values (:v)
                                  *
    ERROR at line 1:
    ORA-12899: value too large for column "MICHEL"."T"."COL" (actual: 11, maximum: 10)

  6. Optimize the query execution
  7. Build the parse tree and the execution plan in a format that the SQL engine can use, this is named row source generation
  8. Store the parse tree and the execution plan in the shared pool.

The execution of all these steps is named a hard parse. If Oracle executes only the steps 1 to 3, it is named a soft parse. Then a cursor (a memory area containing the SQL statement and all related information) can be localy stored in the session context. If a statement is found in this local cache we name this a softer soft parse because Oracle just has to directly go to the address of the query in the shared pool to pin the related data and prevent them from being aged out.

We have to keep in mind that any access to the shared memory is protected by an oracle internal lock named a latch. Like any other kind of lock, this implies a serializarion point and therefore a potential contention. Thus during the semantic validation or security validation, Oracle must take some shared pool and library cache latches to search for the information in the library cache and, possibly, in the dictionary cache if the information are not already present in the library cache; it might even have to generate some internal SQL queries on the Oracle dictionary (named recursive SQL) if the dictionary cache does not contain the necessary information to build the structures in the library cache.

Likewise, step 4 concerning memory allocation can be very long: if Oracle does not immediately find the necessary space, it has to scan the shared pool to age out some components to make space. And this is under the protection of the shared pool latch. Even a very simple query might lead to 20 or 30 holds of the library cache latch (that do hopefully not all lead to waits).

In the end, step 7 concerning query optimization consumes a lot of CPU time.