Skip navigation.

Jonathan Lewis

Syndicate content Oracle Scratchpad
Just another Oracle weblog
Updated: 7 hours 4 min ago

Table Duplication

Sat, 2015-01-03 05:54

I’ve probably seen a transformation like the following before and I may even have written about it (though if I have I can’t the article), but since it surprised me when I was experimenting with a little problem a few days ago I thought I’d pass it on as an example of how sophisticated the optimizer can be with query transformation.  I’ll be talking about the actual problem that I was working on in a later post so I won’t give you the table and data definitions in this post, I’ll just show some SQL and its plan:


select
        tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from
        xo_stock_trans tr,
        xo_prices pr
where
        tr.trans_date between '01-AUG-2014' and '3-AUG-2014'
and     pr.item = tr.item
and     pr.price_date = (
                select
                        max(pr2.price_date)
                from
                        xo_prices pr2
                where   pr2.item = tr.item
                and     pr2.price_date <= tr.trans_date
        )
;

The code is a fairly standard expression of “find me the most recent price available for each stock item as at the stock date of that item”. As you can see I’ve referenced the stock table once and the pricing table twice – the second appearance being in a “max()” correlated subquery. Oracle has decided to unnest the subquery – but spot the interesting detail in the plan:


-----------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |    14M|   748M|       | 77955  (84)| 00:06:30 |
|*  1 |  HASH JOIN                 |                |    14M|   748M|    37M| 77955  (84)| 00:06:30 |
|*  2 |   HASH JOIN                |                |   829K|    28M|       | 70867  (92)| 00:05:55 |
|   3 |    JOIN FILTER CREATE      | :BF0000        | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|*  4 |     TABLE ACCESS FULL      | XO_STOCK_TRANS | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|   5 |    VIEW                    | VW_SQ_1        |   210M|  4206M|       | 64135  (94)| 00:05:21 |
|   6 |     HASH GROUP BY          |                |   210M|  5408M|       | 64135  (94)| 00:05:21 |
|   7 |      JOIN FILTER USE       | :BF0000        |   210M|  5408M|       | 11807  (67)| 00:01:00 |
|*  8 |       HASH JOIN            |                |   210M|  5408M|       | 11807  (67)| 00:01:00 |
|*  9 |        TABLE ACCESS FULL   | XO_STOCK_TRANS | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|* 10 |        INDEX FAST FULL SCAN| XO_PRICES_IX1  |  3918K|    44M|       |  1936  (10)| 00:00:10 |
|  11 |   TABLE ACCESS FULL        | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
   2 - access("ITEM_1"=ROWID)
   4 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - access("PR2"."ITEM"="TR"."ITEM")
       filter("PR2"."PRICE_DATE"<="TR"."TRANS_DATE")
   9 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

I was running 11.2.0.4 at the time, which is why we can get a serial Bloom filter on the hash join – and it’s interesting to see that the filter has been pushed inside the view operator; but the really interesting part of the plan is the second appearance of the XO_STOCK_TRANS table.

My correlated subquery returns a value that is used in a comparison with a column in the XO_PRICES table, but the correlation predicates referred back to the XO_STOCK_TRANS table, so the optimizer has added the XO_STOCK_TRANS to the subquery as it unnested it.

I’ve written several examples of how we can optimise SQL manually by rewriting it to introduce extra copies of some of the tables (typically in a fashion analogous to the optimizer’s mechanism for star transformations), so it’s nice to see another variation on the theme of the optimizer using table duplication to optimise a statement.

Footnote:

The execution plan in 10.2.0.5 is slightly different, but it still unnests the subquery, introducing a second occurrence of XO_STOCK_TRANS as it does so.


Just in case

Tue, 2014-12-23 04:37

For those who don’t read Oracle-l and haven’t found Nikolay Savvinov’s blog, here’s a little note pulling together a recent question on Oracle-L and a relevant (and probably unexpected) observation from the blog. The question (paraphrased) was:

The developers/data modelers are creating all the tables with varchar2(4000) as standard by default “Just in case we need it”. What do you think of this idea?

The general answer was that it’s a bad idea (and unnecessary, anyway) and one specific threat that got mentioned was the problem of creating indexes and Oracle error ORA-01450; but coincidentally Nikolay Savvinov had written about a performance-related “bug” in August this year, which turned out, only last week, to be expected behaviour. You can read his articles for the details, but since he used a 4KB block size to demonstrate it I thought I’d repeat the exercise using an 8KB block size.

 



drop table t1 purge;
create table t1 (id number(6), v1 varchar(40), v2 varchar2(40), v3 varchar2(40));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap


drop table t1 purge;
create table t1 (id number(6), v1 varchar(4000), v2 varchar2(4000), v3 varchar2(4000));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap

I’ve dropped and created the same table twice, once with varchar2(40) columns and once with varchar2(4000) columns.

I’ve created an index on a (non-character) column – the specific results vary depending on whether the index is unique or non-unique, and whether or not you have the index, and whether or not the table already holds data, and the effective clustering on the index columns etc. etc. but the key difference between the two sets of results doesn’t go away.

I’ve inserted object_name values (maximum usage 32 bytes) into the varchar2() columns, inserting 10,000 rows.

The snap_redo package is one of my simple pre/post packages that calculates changes in values in some dynamic performance view – in this case it’s looking at v$sysstat (system statistics) for statistics relating to redo generation, which means you need to run this test on an otherwise idle instance. Here are the two sets of results from an instance of 11.2.0.4:


Name                                                                     Value
----                                                                     -----
messages sent                                                               11
messages received                                                           11
calls to kcmgcs                                                            313
calls to kcmgas                                                             37
calls to get snapshot scn: kcmgss                                           74
redo entries                                                               769
redo size                                                            1,317,008
redo wastage                                                             3,888
redo writes                                                                 11
redo blocks written                                                      2,664
redo write time                                                             10
redo blocks checksummed by FG (exclusive)                                2,242
redo ordering marks                                                          1
redo subscn max counts                                                       1
redo synch time                                                              7
redo synch time (usec)                                                  88,875
redo synch time overhead (usec)                                          1,810
redo synch time overhead count (<2 msec)                                    11
redo synch writes                                                           11
redo write info find                                                        11
undo change vector size                                                261,136
rollback changes - undo records applied                                      2
IMU undo allocation size                                                17,184


Name                                                                     Value
----                                                                     -----
messages sent                                                                8
messages received                                                            8
calls to kcmgcs                                                            222
calls to kcmgas                                                             56
calls to get snapshot scn: kcmgss                                           52
redo entries                                                            20,409
redo size                                                            5,606,872
redo buffer allocation retries                                               1
redo wastage                                                             1,248
redo writes                                                                  6
redo blocks written                                                     11,324
redo write time                                                             26
redo blocks checksummed by FG (exclusive)                                  571
redo ordering marks                                                         32
redo subscn max counts                                                       1
redo synch time                                                              6
redo synch time (usec)                                                  60,230
redo synch time overhead (usec)                                            159
redo synch time overhead count (<2 msec)                                     1
redo synch writes                                                            1
redo write info find                                                         1
undo change vector size                                              1,590,520
IMU undo allocation size                                                   144

Notice, particularly, the great change in the number of redo entries and the total redo size when the character columns are defined at varchar2(4000). Note particularly that the number of redo entries is roughly “2 * number of rows inserted” – for each row that’s one for the row and one for the index entry. You can check the redo log content by dump the log file, of course (and Nikolay did), or you can take my word for it that Oracle is doing the equivalent of single row processing in the varchar2(4000) case and array processing in the varchar2(40) case.

When Oracle calculates that the row length is larger than the block size it falls back to single row processing; this can increase your redo generation significantly, and since the rate at which you can pump out redo is the ultimate rate at which you can load data this could have a significant impact on your data loading times. Declaring character columns as varchar2(4000) “just in case” is a bad idea.

 


Learning

Mon, 2014-12-22 11:28

I received an email a few weeks ago asking me if I would look at a series of three posts on adaptive dynamic sampling in 12c – (part 1, part 2, part 3). I took a note of the topic and URLs, and read through them fairly rapidly, and they seemed to be perfectly reasonable articles describing the authors thoughts, tests, and observations. Inevitably, though, several questions ran through my mind as I read – typically along the lines of “what would happen if …”, “did you restart the instance before …”, “did you flush the shared pool between …”. It’s very hard to create, run, and report a set of tests that allow you to make solid inferences about how Oracle and the optimizer behave – and whatever you do someone else will find a way of asking some questions that push the envelope a little further.

I got a follow-up email a week or so later, asking if I’d had time to look at the article because the author wanted to present the topic at a local (Azerbaijan) user group event – and the follow-up email prompted me to write this blog note.

Given that it’s taken about three weeks for me to get around to writing this note you might appreciate that I don’t have a lot of time to spare on a topic that I’m not actively pursuing for a good reason. I accumulate a lot of information from around the internet, from books, and from presentations, and I’ll invariably attach lots of questions and conjectures to that information but I won’t necessarily be able to have any confidence in its utility or correctness. When I have to know an answer I may go back to a source I’ve noted and use it as a basis for doing some further goal-oriented investigations – but until I need to (or unless I’m particularly curious) I don’t have the time to do arbitrary research. This means, of course, that I don’t have time to get into a dialogue with people about the work they have done and the information they have presented.

On the other hand, of course, I thoroughly approve of anyone who takes the time to do the experiment and write up the results. And I heartily approve of anyone who is prepared to stand up in front of a user group and share their observations and want to encourage people to do this; so this is the reply I sent which, I hope, is suitable positive:

There are a number of questions that I would probably want answered if I were to examine the posts in detail, and I would want to repeat the tests that you have done to check that I got the same results and to see if there were other observations I could make that might lead me to different conclusions. This is something that takes far too much time to do properly, and I have no inside information that would allow me to say very quickly whether your comments are right, wrong, or simply incomplete.

The important point, though, is that you have set up some tests, documented the results, and offered some conclusions. That is sufficient for your presentation to the user group.
  • This is what I did
  • This is what I observed
  • This is what I concluded
 At worst someone will ask some questions like: “Did you check …”, or “What happens if …” You may need to answer “I don’t know”, or “I didn’t think of doing that”, or “That’s a good idea, I’ll try it”; but whatever happens everyone learns something, and everyone has the opportunity to learn more based on what you have given them. In passing, at the very first presentation I did for the UK Oracle User Group about 25 years ago I got about half way through some comments on an odd performance pattern I had observed when modifying the SQL*Plus arraysize when someone stuck up their hand and asked: “Did you check the tcp packet size?” The only response I could make, after a few seconds pause to think, was: “I should have thought of that. No.”

push_pred – evolution

Fri, 2014-12-12 08:22

Here’s a query (with a few hints to control how I want Oracle to run it) that demonstrates the difficulty of trying to solve problems by hinting (and the need to make sure you know where all your hinted code is):


select
	/*+
		qb_name(main)
		leading (@main t1@main v1@main t4@main)
		push_pred(v1@main)
	*/
	t1.*,v1.*,t4.*
from
	t1,
	(
	select	/*+ qb_name(inline) no_merge */
		t2.n1, t3.n2, count(*)
	from	t2, t3
	where exists (
		select	/*+ qb_name(subq) no_unnest push_subq */
			null
		from	t5
		where	t5.object_id = t2.n1
		)
	and	t3.n1 = t2.n2
	group by t2.n1, t3.n2
	)	v1,
	t4
where
	v1.n1 = t1.n1
and	t4.n1(+) = v1.n1
;

Nominally it’s a three-table join, except the second table is an in-line view which joins two tables and includes an existence subquery. Temporarily I have made the join to t4 an outer join – but that’s just to allow me to make a point, I don’t want an outer join in the final query. I’ve had to include the no_merge() hint in the inline view to stop Oracle using complex view merging to “join then aggregate” when I want it to “aggregate then join”; I’ve included the no_unnest and push_subq hints to make sure that the subquery is operated as a subquery, but operates at the earliest possible moment in the inline view. Ignoring the outer join (which would make operation 1 a nested loop outer), this is the execution plan I want to see:


-------------------------------------------------------------------------------------------
| Id  | Operation                         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |       |    50 | 12850 |  4060   (1)| 00:00:21 |
|   1 |  NESTED LOOPS                     |       |    50 | 12850 |  4060   (1)| 00:00:21 |
|   2 |   NESTED LOOPS                    |       |    50 | 12850 |  4060   (1)| 00:00:21 |
|   3 |    NESTED LOOPS                   |       |    50 |  7400 |  4010   (1)| 00:00:21 |
|   4 |     TABLE ACCESS FULL             | T1    |  1000 |   106K|     3   (0)| 00:00:01 |
|   5 |     VIEW PUSHED PREDICATE         |       |     1 |    39 |     4   (0)| 00:00:01 |
|   6 |      SORT GROUP BY                |       |     1 |    16 |     4   (0)| 00:00:01 |
|   7 |       NESTED LOOPS                |       |     1 |    16 |     3   (0)| 00:00:01 |
|   8 |        TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
|*  9 |         INDEX UNIQUE SCAN         | T2_PK |     1 |       |     1   (0)| 00:00:01 |
|* 10 |          INDEX RANGE SCAN         | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|  11 |        TABLE ACCESS BY INDEX ROWID| T3    |     1 |     8 |     1   (0)| 00:00:01 |
|* 12 |         INDEX UNIQUE SCAN         | T3_PK |     1 |       |     0   (0)| 00:00:01 |
|* 13 |    INDEX UNIQUE SCAN              | T4_PK |     1 |       |     0   (0)| 00:00:01 |
|  14 |   TABLE ACCESS BY INDEX ROWID     | T4    |     1 |   109 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - access("T2"."N1"="T1"."N1")
       filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0 FROM
              "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
  10 - access("T5"."OBJECT_ID"=:B1)
  12 - access("T3"."N1"="T2"."N2")
  13 - access("T4"."N1"="V1"."N1")

Note, particularly, operation 5: VIEW PUSHED PREDICATE, and the associated access predicate at line 9 “t2.n1 = t1.n1″ where the predicate based on t1 has been pushed inside the inline view: so Oracle will evaluate a subset view for each selected row of t1, which is what I wanted. Then you can see operation 10 is an index range scan of t5_i1, acting as a child to the index unique scan of t2_pk of operation 9 – that’s Oracle keeping the subquery as a subquery and executing it as early as possible.

So what happens when I try to get this execution plan using the SQL and hints I’ve got so far ?

Here’s the plan I got from 10.2.0.5:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    50 | 12750 |    62   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |    50 | 12750 |    62   (4)| 00:00:01 |
|*  2 |   HASH JOIN                  |       |    50 |  7350 |    12  (17)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T1    |  1000 |   105K|     3   (0)| 00:00:01 |
|   4 |    VIEW                      |       |    50 |  1950 |     9  (23)| 00:00:01 |
|   5 |     HASH GROUP BY            |       |    50 |   800 |     9  (23)| 00:00:01 |
|*  6 |      HASH JOIN               |       |    50 |   800 |     7  (15)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL      | T2    |    50 |   400 |     3   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN      | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|   9 |       TABLE ACCESS FULL      | T3    |  1000 |  8000 |     3   (0)| 00:00:01 |
|  10 |   TABLE ACCESS BY INDEX ROWID| T4    |     1 |   108 |     1   (0)| 00:00:01 |
|* 11 |    INDEX UNIQUE SCAN         | T4_PK |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."N1"="T1"."N1")
   6 - access("T3"."N1"="T2"."N2")
   7 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0
              FROM "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
   8 - access("T5"."OBJECT_ID"=:B1)
  11 - access("T4"."N1"="V1"."N1")

In 10g the optimizer has not pushed the join predicate down into the view (the t1 join predicate appears in the hash join at line 2); I think this is because the view has been declared non-mergeable through a hint. So let’s upgrade to 11.1.0.7:

------------------------------------------------------------------------------------------
| Id  | Operation                        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |       |    50 | 12950 |  4008K  (1)| 05:34:04 |
|   1 |  NESTED LOOPS                    |       |    50 | 12950 |  4008K  (1)| 05:34:04 |
|   2 |   MERGE JOIN CARTESIAN           |       |  1000K|   205M|  2065   (3)| 00:00:11 |
|   3 |    TABLE ACCESS FULL             | T1    |  1000 |   105K|     3   (0)| 00:00:01 |
|   4 |    BUFFER SORT                   |       |  1000 |   105K|  2062   (3)| 00:00:11 |
|   5 |     TABLE ACCESS FULL            | T4    |  1000 |   105K|     2   (0)| 00:00:01 |
|   6 |   VIEW PUSHED PREDICATE          |       |     1 |    43 |     4   (0)| 00:00:01 |
|   7 |    SORT GROUP BY                 |       |     1 |    16 |     4   (0)| 00:00:01 |
|*  8 |     FILTER                       |       |       |       |            |          |
|   9 |      NESTED LOOPS                |       |     1 |    16 |     3   (0)| 00:00:01 |
|  10 |       TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
|* 11 |        INDEX UNIQUE SCAN         | T2_PK |     1 |       |     1   (0)| 00:00:01 |
|* 12 |         INDEX RANGE SCAN         | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID| T3    |  1000 |  8000 |     1   (0)| 00:00:01 |
|* 14 |        INDEX UNIQUE SCAN         | T3_PK |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - filter("T4"."N1"="T1"."N1")
  11 - access("T2"."N1"="T4"."N1")
       filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0 FROM
              "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
  12 - access("T5"."OBJECT_ID"=:B1)
  14 - access("T3"."N1"="T2"."N2")

Excellent – at operation 6 we see VIEW PUSHED PREDICATE, and at operation 11 we can see that the join predicate “t2.n1 = t1.n1″.

Less excellent – we have a Cartesian Merge Join between t1 and t4 before pushing predicates. Of course, we told the optimizer to push join predicates into the view, and there are two join predicates, one from t1 and one from t4 – and we didn’t tell the optimizer that we only wanted to push the t1 join predicate into the view. Clearly we need a way of specifying where predicates should be pushed FROM as well as a way of specifying where they should be pushed TO.

If we take a look at the outline information from the execution plan there’s a clue in one of the outline hints: PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 2) – the hint has a couple of extra parameters to it – perhaps the 2 and 3 refer in some way to the 2nd and 3rd tables in the query. If I test with an outer join to t4 (which means the optimizer won’t be able to use my t4 predicate as a join INTO the view) I get the plan I want (except it’s an outer join, of course), and the hint changes to: PUSH_PRED(@”MAIN” “V1″@”MAIN” 2) – so maybe the 2 refers to t1 and the 3 referred to t4, so let’s try the following hints:


push_pred(v1@main 2)
no_push_pred(v1@main 3)

Unfortunately this gives us the following plan:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    50 | 12300 |    62   (4)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER          |       |    50 | 12300 |    62   (4)| 00:00:01 |
|*  2 |   HASH JOIN                  |       |    50 |  6900 |    12  (17)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T1    |  1000 |   105K|     3   (0)| 00:00:01 |
|   4 |    VIEW                      |       |    50 |  1500 |     9  (23)| 00:00:01 |
|   5 |     HASH GROUP BY            |       |    50 |   800 |     9  (23)| 00:00:01 |
|*  6 |      HASH JOIN               |       |    50 |   800 |     7  (15)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL      | T2    |    50 |   400 |     3   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN      | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|   9 |       TABLE ACCESS FULL      | T3    |  1000 |  8000 |     3   (0)| 00:00:01 |
|  10 |   TABLE ACCESS BY INDEX ROWID| T4    |     1 |   108 |     1   (0)| 00:00:01 |
|* 11 |    INDEX UNIQUE SCAN         | T4_PK |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."N1"="T1"."N1")
   6 - access("T3"."N1"="T2"."N2")
   7 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0
              FROM "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
   8 - access("T5"."OBJECT_ID"=:B1)
  11 - access("T4"."N1"(+)="V1"."N1")

We don’t have join predicate pushdown; on the other hand we’ve got the join order we specified with our leading() hint – and that didn’t appear previously when we got the Cartesian Merge Join with predicate pushdown (our hints were incompatible, so something had to fail). So maybe the numbering has changed because the join order has changed and I should push_pred(v1 1) and no_push_pred(v1 3). Alas, trying all combinations of 2 values from 1,2, and 3 I can’t get the plan I want.

So let’s upgrade to 11.2.0.4. As hinted we get the pushed predicate with Cartesian merge join, but this time the push_pred() hint that appears in the outline looks like this: PUSH_PRED(@”MAIN” “V1″@”MAIN” 2 1) – note how the numbers have changed between 11.1.0.7 and 11.2.0.4. So let’s see what happens when I try two separate hints again, fiddling with the third parameter, e.g.:


push_pred(v1@main 1)
no_push_pred(v1@main 2)

With the values set as above I got the plan I want – it’s just a pity that I’m not 100% certain how the numbering in the push_pred() and no_push_pred() hints is supposed to work. In this case, though, it no longer matters as all I have to do now is create an SQL Baseline for my query, transferring the hinted plan into the the SMB with the unhinted SQL.

In passing, I did manage to get the plan I wanted in 11.1.0.7 by adding the hint /*+ outline_leaf(@main) */ to the original SQL. I’m even less keen on doing that than I am on adding undocumented parameters to the push_pred() and no_push_pred() hints, of course; but having done it I did wonder if there are any SQL Plan Baslines in 11.1.0.7 production systems that include the push_pred() hint that are going to change plan on the upgrade to 11.2.0.4 because the numbering inside the hint is supposed to change with version.

Footnote:

Loosely speaking, this blog note is the answer to a question posted about five years ago.


Parse Time

Tue, 2014-12-09 17:53

Here’s a little query I wrote some time ago to see where my time went while running a query. It’s nothing sophisticated, just one of those simple things you can do with v$active_session_history (or dba_hist_active_sess_history, if you don’t get to the crime scene in time).


set null CPU

select
        sql_exec_id, in_parse, in_hard_parse, event, count(*)
from
        v$active_session_history
where
        sql_id = '{your choice here}'
group by
        sql_exec_id, in_parse, in_hard_parse, event
order by
        sql_exec_id, in_parse, in_hard_parse, count(*)
;

SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
   40341649 N N db file sequential read                                68
   40341649 N N CPU                                                    21
   40341649 N N db file scattered read                                  9
            Y Y CPU                                                     7

I had run the query that I was tracking exactly once, but my ASH query allows for, and separates, multiple executions of the same query by summing on sql_exec_id (the thing that the SQL Monitor also uses). The last row looks a little odd, though: it does’t have a value for sql_exec_id; that’s because those are ASH samples when the query is being optimized, not being executed – note that I’ve reported the columns in_parse and in_hard_parse – and both are set to “Y” for that row.

So  (statistically speaking) it’s probably taken about 7 CPU seconds for Oracle to optimise the statement, and from the rest of the results you can see that it’s taken about 21 CPU seconds to run, with 68 seconds spent on random I/Os and 9 seconds spent on multiblock reads for a total of 103 seconds elapsed.

Seven seconds sounds like quite a lot of time for parsing – but it was a fairly complex statement. However, the reason I’d been running the statement on a test system (a fairly good clone of production) was that I’d been seeing something stranger on production and I needed to get a baseline on the test system before I starting trying to fix the problem. Here’s the equivalent ASH reports for the same statement when it had run on production at a time that allowed me to capture its ASH samples.


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  514257929 N N CPU                                                    21
  514257929 Y Y latch: row cache objects                                1
            Y Y CPU                                                   119

Note the 119 CPU seconds spent parsing to run a 22 second query ! But that wasn’t the worst of it – sometimes the results looked more like this:


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  523748347 N N db file sequential read                                 1
  523748347 N N CPU                                                    32
            Y Y resmgr:cpu quantum                                     11
            Y Y latch: row cache objects                               38
            Y Y CPU                                                   415

That’s a pretty catastrophic optimsation time – especially since the statement can be optimised in seven seconds in another environment. You might note the resource manager kicking in there, the session is exceeding the CPU limit set for its resource group – though not very often given how infrequently it seems to be waiting on “resmgr:cpu quantum”. But there’s another important wrinkle to this report – which you can see when compare v$active_session_history with v$sql.


  1  select
  2     sql_id,
  3     round(cpu_time/1000000,2) cpu,
  4     round(elapsed_time/1000000,2) ela from v$sql
  5  where
  6     sql_text like '{some identifying text}'
  7* and        sql_text not like '%v$sql%'
SQL> /

SQL_ID               CPU        ELA
------------- ---------- ----------
2atyuc3vtpswy     285.81     516.13

The figures from v$sql don’t match very well with the summed results from ASH which has a total sample of 497 seconds and a CPU sample of 447 seconds. I think I can live with a statistical error of 4% ((516-497)/516) in a random sample for total time, but how do you explain the 36% error in the CPU time ?

The samples reporting “resmgr:cpu quantum” are a clue: the machine is overloaded; it’s trying to use far more CPU time than is available. As a result a process that gets pushed off the CPU by the operating system scheduler while it’s running can spend a long time in the run queue waiting to start running again. And if it’s an Oracle process that got pre-empted it doesn’t “know” that it’s not running, it didn’t put itself into a wait state so all it “knows” is that it’s not in a wait state.

So how do ASH and v$sql differ ? The code that derives the cpu_time for v$sql issues a call to the O/S asking “how much CPU have I used”. The code that takes an ASH sample says: “is this session active, if so is it in a wait state and if it’s not in a wait state then it’s either on the CPU or in the run queue waiting for the CPU”. So when we compare v$sql with ASH the difference in CPU is (statistically speaking) time spent in the run queue. So of our 447 seconds of CPU recorded by ASH, we spent 161 seconds in the CPU run queue waiting for CPU.

We still have to account for the difference between the 7 CPU seconds on a test system and the variation between 119 CPU seconds and 415 CPU seconds in optimisation on production. In a word – concurrency. Apart from everything else going on at the time there were, in the worst case, 16 slightly different versions of the same statement being called at the same time (on a machine with 24 cores) – all 16 statement were competing violently for the same resources at the same time, and the escalating conflict as more session joined in produced an exponential growth in time spent competing for resources rather than doing the job. (I managed to demonstrate the effect quite nicely during the evening by limiting the batch to 4 concurrent executions – and got a typical parse time of 40 CPU seconds).

I’ve often warned people about the problems of concurrency and encouraged them to think about how much time is being spent in competition rather then doing the job; I think this is the most extreme case I’ve seen in a production system. Given how dramatic the variation is, I can’t help wondering if the problem has been exaggerated by some corner case of sessions spinning for mutexes or latches; perhaps even an error in the code that allows resource management to put a session into “resmgr:cpu quantum” while it’s holding a latch or mutex. (I wasn’t able to emulate such an extreme display of the problem on a slightly newer version of Oracle, but I was able to construct a test that demonstrated the effect with a much smaller wastage of CPU.)

The solution (almost certainly): the statements are extremely similar, varying in just one predicate that is using a literal constant. It ought to be a relatively safe and simple change to make the query use a bind variable in that predicate. If the solution is adopted I’d expect to see the (once only) parse time on production drop back to about 7 seconds. Of course, if any other session tries to call the same statement at the same time it ought to end up reporting 7 seconds waiting on “cursor: pin S wait on X” before it starts executing – but that 7 seconds wait is a lot better than an extra 493 CPU seconds trying to optimise the “same” statement at the same time.

Footnote:

Running a minor variation on my ASH query to report the sql_plan_hash_value along with the sql_exec_id, I found that the “fix-up” code that updates older ASH rows with information that only becomes available later (e.g. plan_hash_value when optimising, or long time_waited values for a single wait event) only goes back 255 rows – so when I queried ASH for the statements that took 500 seconds to optimizer only 255 of the in_parse rows showed the final sql_plan_hash_value.

 


Cardinality Change

Mon, 2014-12-08 15:35

Here’s an entertaining little change across versions of Oracle, brought to my attention by Tony Hasler during UKOUG Tech 14. It’s a join cardinality estimate, so here are a couple of tables to demonstrate the issue – the only columns needed are the alpha_06 columns, but I reused some code from other demonstrations to create my test case, so there are lots of irrelevant columns in the create table script:


create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        rownum                                          id,
        mod(rownum-1,200)                               mod_200,
        trunc(dbms_random.value(0,300))                 rand_300,
        mod(rownum-1,10000)                             mod_10000,
        trunc(sysdate) +
                trunc(dbms_random.value(0,1000))        date_1000,
        dbms_random.string('l',6)                       alpha_06,
        dbms_random.string('l',20)                      alpha_20
from
        generator,
        generator
where
        rownum <= 1e6
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

create table t2 nologging as select * from t1;
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1')

I’m going to join t1 to t2 with a predicate based on the alpha_06 columns – using a LIKE predicate. Before I do so I’ll point out that there are are 1,000,000 rows in the table, and (checking the column stats) 985,920 distinct values for alpha_06. Here’s my query, with the execution plan I got from 11.1.0.7:


select
        count(*)
from
        t1, t2
where
        t2.alpha_06 like t1.alpha_06
;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1122M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |    50G|   651G|  1122M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1123   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   | 50000 |   341K|  1122   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The 50,000 cardinality estimate for t2 looks like the standard 5% guess for “column >= {unknown value}”, following which the join cardinality of 50G is the same 5% guess applied to the Cartesian join between t1 and t2 (1M * 1M * 0.05). It’s not a good estimate in my case because the right answer happens to be close to 1M rows, specifically 1,003,176. So let’s upgrade to 11.2.0.4 and see what we get instead:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1050M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |  2014K|    26M|  1050M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1051   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   |     2 |    14 |  1050   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The estimate has dropped from 50 Billion rows down to 2 Million – a factor of about 25,000: possibly an indicator that the algorithm has changed, and that a few people might find execution plans changing as they upgrade to a newer version of Oracle. The change occurred at 11.2.0.2 as revealed by fix control 9303766 which has the description: “use 1/NDV+1/NROWS for col1 LIKE col2 selectivities”.

Just as a quick check on the arithmetic: there are 1 million rows in table t2, with (as noted above) 985,920 distinct values in the column, so the selectivity should be: 1/1000000 + 1/985920 = 2.014281 * e-6. Multiply the selectivity by 1e6 and you get 2, the cardinality estimate for t2; multiply the selectivity by 1M*1M (the Cartesian join) and you get 2,014,281, the cardinality estimate of the join. QED.

There are workarounds, of course. One would be to reverse out the fix control, either as an initialisation parameter or in a session logon trigger, another might be to modify the SQL – I think the following would be equivalent:


select
        *
from    t1, t2
where
        t2.alpha_06 like substr(t1.alpha_06,1,length(t1.alpha_06))||'%'
and     t1.alpha_06 is not null
and     t2.alpha_06 is not null

This changes the critical predicate from the form “col1 like col2″ to “col1 like {unknown value from function}” i.e. back to a case where the optimizer uses the 5% guess, and the cardinality estimates go back the original values.