Skip navigation.

Feed aggregator

Oracle GoldenGate Processes – Part 4 – Replicat

DBASolved - Thu, 2015-01-08 15:00

The replicat process is the apply process within the Oracle GoldenGate environment.  The replicat is responsible for reading the remote trail files and applying the data found in them in chronological order.  This ensures that the data is applied in the same order it was captured.

Until recently there was only one type of replicat: the classic replicat.  As of Oracle GoldenGate 12.1.2.0, there are three distinct replicat modes:

  • Classic Mode
  • Coordinated Mode
  • Integrated Mode

Each of these modes provides benefits depending on the database being applied to.  Oracle is pushing everyone toward a more integrated approach; however, integrated mode requires database version 11.2.0.4 at a minimum.
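
For reference, an integrated replicat is added with the same ADD REPLICAT command plus the INTEGRATED keyword.  The sketch below is only an illustration; IREP is a hypothetical group name and the trail matches the sample used later in this post.  The classic form is shown in the next section.

$ cd $OGG_HOME
$ ./ggsci
GGSCI> add replicat IREP, integrated, exttrail ./dirdat/rt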

Configuring the replicat process is similar to configuring the extract and data pump processes.

Adding a Replicat:

From GGSCI (classic):

$ cd $OGG_HOME
$ ./ggsci
GGSCI> add replicat REP, exttrail ./dirdat/rt

Note:  The add command assumes that you already have a checkpoint table configured in the target environment.
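
If a checkpoint table does not exist yet, one can be created from GGSCI before adding the replicat.  A minimal sketch, assuming a ggate schema owns the GoldenGate objects (adjust the credentials and table name to your environment):

$ cd $OGG_HOME
$ ./ggsci
GGSCI> dblogin userid ggate, password ggate
GGSCI> add checkpointtable ggate.checkpoint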

Edit Replicat parameter file:

From GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> edit params REP

From Command Line:

$ cd $OGG_HOME
$ cd ./dirprm
$ vi REP.prm

Example of Replicat Parameter file:

REPLICAT REP
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.2.0/db_3")
SETENV (ORACLE_SID="orcl")
USERID ggate, PASSWORD ggate
ASSUMETARGETDEFS
DISCARDFILE ./dirrpt/REP.dsc, append, megabytes 500
WILDCARDRESOLVE IMMEDIATE
MAP SCOTT.*, TARGET SCOTT.*;

Start the Replicat process:

How you start the replicat depends on whether you are starting it for the first time after an initial load or starting it normally; the commands are similar yet different.

To start the Replicat after an initial load:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> start replicat REP, atcsn [ current_scn ]

Note: The current_scn needs to be obtained from the source database prior to doing the initial load of data to the target.  This ensures the consistency of the data and provides the point from which the replicat starts applying data.
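
For an Oracle source, one simple way to capture the SCN is to query it on the source database just before the initial load starts (a quick sketch):

SQL> select current_scn from v$database;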

To start Replicat normally:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> start replicat REP
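
Once the replicat is running, its status and lag can be checked from GGSCI (output details vary by release):

GGSCI> info replicat REP, detail
GGSCI> lag replicat REP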

Stop the Replicat process:

Stop replicat normally:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> stop replicat REP

Enjoy!

about.me: http://about.me/dbasolved


Filed under: Golden Gate
Categories: DBA Blogs

Oracle Priority Support Infogram for 08-JAN-2015

Oracle Infogram - Thu, 2015-01-08 14:08

RDBMS
Adaptive Query Optimization in Oracle Database 12c, from The ORACLE-BASE Blog.
Performance Problems with Dynamic Statistics in Oracle 12c, from Pythian.
From Robert G. Freeman on Oracle: Oracle Multitenant - Should you move to a single CDB/PDB architecture?
SQL Developer
From that JEFF SMITH: SQL Developer 4.1: Search & Replace Enhancements
Exalogic
From Oracle Exalogic: E-Business Suite 12.2.4 VM Templates for Exalogic
MySQL
From Oracle's MySQL Blog: Learn MySQL Performance Tuning From the Experts.
APEX
From Dimitri Gielis Blog (Oracle Application Express - APEX): Generating sample data for your APEX application.
ADF and SOAP
Oracle Mobile Suite Service Bus REST and ADF BC SOAP, from Andrejus Baranovskis Blog.
Java
Validating Oracle Java Cloud Service HA, from Java.net.
NetBeans
From Geertjan's Blog: Classpath for Python Projects in NetBeans IDE.
Business Analytics
From Business Analytics - Proactive Support:
Patch Set Update: 11.1.2.3.507 for Oracle Hyperion Reporting and Analysis for Interactive Reporting Release 11.1.2.3.000
Business Analytics Monthly Index - December 2014
WebCenter
Oracle Documents Cloud Service R2 – Now Available!, from the Oracle WebCenter Blog.
Security
From Integrigy: Oracle Audit Vault Reports
EBS
From Oracle E-Business Suite Technology:
Of Particular Importance:
Java Auto-Update Upgrades JRE 7 to 8 in January 2015

Database 12.1.0.2 Certified with E-Business Suite 12.2
Oracle VM Templates For EBS 12.2.4 For Exalogic Now Available
Enhance E-Business Suite Security By Switching to TLS
2014: A Year in Review for Oracle E-Business Suite
From Oracle E-Business Suite Support Blog:
Advanced Global Intercompany System Setup Diagnostic Available!
Are you occasionally getting complaints that the Purchase Order pdf file is not making it to the Supplier or Approver?
Get Notified When A New or Updated Product Analyzer or Diagnostic is Released
Webcast: DEM: Troubleshooting Engine Issues
Webcast: In-Memory Cost Management for Discrete Manufacturing
Logistics and Inventory Consolidated RUP11 Patches Released
Want to take a look (and replay) our 2014 Procurement Webcasts?
Did Your Consolidation Journals Post Twice Recently?
Budgeting For The New Year? Allow Us To Help!
Did You Check The Latest Rollup Patch for Report Manager?
Financial Reporting Using Excel? Get Some Hints!
…And Finally
Best of 2014: Communication Charts Help You Negotiate In Different Cultures, from PSFK. A lot of the charts look like artillery shells and hand grenades. I wonder if that indicates something about the nature of human negotiation overall? 
From Quantamagazine: A New Physics Theory of Life. Could be the biggest thing in biology since sliced bread (biologists love sandwiches), or…could be a big yawner if it doesn't pan out.

Oracle GoldenGate Processes – Part 3 – Data Pump

DBASolved - Thu, 2015-01-08 14:00

The Data Pump group is a secondary extract group that is used to help send data over a network.  Although a data pump is another extract group, don't confuse it with the primary extract group.  The main purpose of the data pump extract is to write the captured data over the network to the remote trail files on the target system.

Note: If the data pump is not configured, then the primary extract group will write directly to the remote trail file.

Why would you want to use a data pump process?  Two advantages of using a data pump are:

  • Protects against network failures
  • Helps consolidate data from multiple sources into a single remote trail file

The data pump process, just like the extract process, uses a parameter file.  The parameter file can be edited either before or after adding the process to the Oracle GoldenGate environment.

Adding a Data Pump:

From GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> add extract PMP, exttrailsource ./dirdat/lt
GGSCI> add rmttrail ./dirdat/rt, extract PMP, megabytes 200

Note: In the example above notice that the data pump is reading from ./dirdat/lt and then writing to ./dirdat/rt on the remote server.
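
A quick way to confirm the trail mapping after adding the pump is to ask GGSCI for the extract details; this is only a sanity check and the output layout varies by version:

GGSCI> info extract PMP, detail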

Edit Data Pump parameter file:

From GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> edit params PMP

From Command Line:

$ cd $OGG_HOME
$ cd ./dirprm
$ vi ./PMP.prm

Example of Data Pump parameter file:

EXTRACT PMP

PASSTHRU

RMTHOST 172.15.10.10, MGRPORT 15000, COMPRESS

RMTTRAIL ./dirdat/rt

TABLE SCOTT.*;

Start/Stop the Data Pump:

Start the Data Pump:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> start extract PMP

Stop the Data Pump:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> stop extract PMP

Enjoy!

about.me: http://about.me/dbasolved


Filed under: Golden Gate
Categories: DBA Blogs

Oracle GoldenGate Processes – Part 2 – Extract

DBASolved - Thu, 2015-01-08 13:00

The extract process of Oracle GoldenGate is used to perform change data capture from the source database.  The extract can read the online transaction log (in Oracle, the online redo logs) or the associated archive logs.  The data that is extracted from the source database is then placed into a trail file (another topic for a post) for shipping to the apply side.

To configure an extract process there needs to be a parameter file associated with it.  The parameter file can be edited after adding the extract to the Oracle GoldenGate environment if needed.  In order to configure an extract it needs to be added to the environment and assigned a local trail file location.

Adding an Extract and Local Trail File:

Using GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> add extract EXT, tranlog, begin now
GGSCI> add exttrail ./dirdat/lt, extract EXT, megabytes 200

Note: The trail file name is required to have a two-character prefix (lt in the example above).
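
To confirm that the local trail was created and associated with the extract, it can be inspected from GGSCI (a quick check; output varies by version):

GGSCI> info exttrail ./dirdat/lt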

Edit Extract Parameter File:

From GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> edit params EXT

Edit Extract Parameter File from Command Line:

$ cd $OGG_HOME
$ cd ./dirprm
$ vi ./ext.prm

Example of Extract Parameter File:

EXTRACT EXT
USERID ggate, PASSWORD ggate 
TRANLOGOPTIONS DBLOGREADER
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.2.0/db_3")
SETENV (ORACLE_SID="orcl")
WARNLONGTRANS 1h, CHECKINTERVAL 30m
EXTTRAIL ./dirdat/lt
WILDCARDRESOLVE IMMEDIATE
TABLE SCOTT.*;

Start/Stop the Extract:

Start Extract:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> start extract EXT

Stop Extract:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> stop extract EXT

Enjoy!

about.me: http://about.me/dbasolved


Filed under: Golden Gate
Categories: DBA Blogs

"SELECT * FROM TABLE" Runs Out Of TEMP Space

Randolf Geist - Thu, 2015-01-08 12:49
Now that I've shown in the previous post in general that sometimes Parallel Execution plans might end up with unnecessary BUFFER SORT operations, let's have a look what particular side effects are possible due to this.

What would you say if someone tells you that (s)he just did a simple, straightforward "SELECT * FROM TABLE" that took several minutes to execute without returning, only to then error out with "ORA-01652 unable to extend temp segment", and the TABLE in question is actually nothing but a simple, partitioned heap table, so no special tricks, no views, synonyms, VPD etc. involved, it's really just a plain simple table?

Some time ago I was confronted with such a case at a client. Of course, the first question is, why would someone run a plain SELECT * FROM TABLE, but nowadays with power users and developers using GUI based tools like TOAD or SQLDeveloper, this is probably the GUI equivalent of a table describe command. Since these tools by default show the results in a grid that only fetches the first n rows, this typically isn't really a threat even in the case of large tables, apart from the common problem of allocated PX servers when the table is queried using Parallel Execution and the users simply keep the grid/cursor open, which prevents the PX servers from being re-used for different executions.

But have a look at the following output, in this case taken from 12.1.0.2, but assuming the partitioned table T_PART in question is marked parallel, resides on Exadata, has many partitions that are compressed via HCC, that uncompressed represent several TB of data (11.2.0.4 on Exadata produces a similar plan):


SQL> explain plan for
2 select * from t_part p;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC PARTITION PARALLEL'));
Plan hash value: 2545275170

------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | | |
| 1 | PX COORDINATOR | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | | | Q1,02 | P->S | QC (RAND) |
| 3 | BUFFER SORT | | | | Q1,02 | PCWP | |
| 4 | VIEW | VW_TE_2 | | | Q1,02 | PCWP | |
| 5 | UNION-ALL | | | | Q1,02 | PCWP | |
| 6 | CONCATENATION | | | | Q1,02 | PCWP | |
| 7 | BUFFER SORT | | | | Q1,02 | PCWC | |
| 8 | PX RECEIVE | | | | Q1,02 | PCWP | |
| 9 | PX SEND ROUND-ROBIN | :TQ10000 | | | | S->P | RND-ROBIN |
| 10 | BUFFER SORT | | | | | | |
| 11 | PARTITION RANGE SINGLE | | 2 | 2 | | | |
| 12 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART | 2 | 2 | | | |
|* 13 | INDEX RANGE SCAN | T_PART_IDX | 2 | 2 | | | |
| 14 | BUFFER SORT | | | | Q1,02 | PCWC | |
| 15 | PX RECEIVE | | | | Q1,02 | PCWP | |
| 16 | PX SEND ROUND-ROBIN | :TQ10001 | | | | S->P | RND-ROBIN |
| 17 | BUFFER SORT | | | | | | |
| 18 | PARTITION RANGE SINGLE | | 4 | 4 | | | |
| 19 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART | 4 | 4 | | | |
|* 20 | INDEX RANGE SCAN | T_PART_IDX | 4 | 4 | | | |
| 21 | PX BLOCK ITERATOR | | 6 | 20 | Q1,02 | PCWC | |
|* 22 | TABLE ACCESS FULL | T_PART | 6 | 20 | Q1,02 | PCWP | |
| 23 | PX BLOCK ITERATOR | |KEY(OR)|KEY(OR)| Q1,02 | PCWC | |
|* 24 | TABLE ACCESS FULL | T_PART |KEY(OR)|KEY(OR)| Q1,02 | PCWP | |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

13 - access("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
20 - access("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
filter(LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss')))
22 - filter("P"."DT">=TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2020-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND (LNNVL("P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2003-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))) AND (LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
24 - filter("P"."DT"<TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Can you spot the problem? It's again the "unnecessary BUFFER SORTS" problem introduced in the previous post. In particular the operation ID = 3 BUFFER SORT is "deadly" if the table T_PART is large, because it needs to buffer the whole table data before any row will be returned to the client. This explains why this simple SELECT * FROM T_PART will potentially run out of TEMP space, assuming the uncompressed table data is larger in size than the available TEMP space. Even if it doesn't run out of TEMP space, it will be a totally inefficient operation, copying all table data to PGA (unlikely to be sufficient) or TEMP before returning any rows to the client.

But why does a simple SELECT * FROM TABLE come up with such an execution plan? A hint is the VW_TE_2 alias shown in the NAME column of the plan: It's the result of the "table expansion" transformation that was introduced in 11.2 allowing to set some partition's local indexes to unusable but still make use of the usable index partitions of other partitions. It takes a bit of effort to bring the table into a state where such a plan will be produced for a plain SELECT * FROM TABLE, but as you can see, it is possible. And as you can see from the CONCATENATION operation in the plan, the transformed query produced by the "table expansion" then triggered another transformation, the "concatenation" transformation mentioned in the previous post, that then results in the addition of unnecessary BUFFER SORT operations when combined with Parallel Execution.

Here is a manual rewrite that corresponds to the query that results from both the "table expansion" and the "concatenation" transformations:

select * from (
select /*+ opt_param('_optimizer_table_expansion', 'false') */ * from t_part p where
("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
union all
select * from t_part p where
("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
and
(LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')))
union all
select * from t_part p where
("P"."DT">=TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2020-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND (LNNVL("P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2003-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))) AND (LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE('
2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
)
union all
select * from t_part p where
("P"."DT"<TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
;

But if you run an EXPLAIN PLAN on above manual rewrite, then 12.1.0.2 produces the following simple and elegant plan:

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | | |
| 1 | PX COORDINATOR | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | | | Q1,00 | P->S | QC (RAND) |
| 3 | UNION-ALL | | | | Q1,00 | PCWP | |
| 4 | VIEW | | | | Q1,00 | PCWP | |
| 5 | UNION-ALL | | | | Q1,00 | PCWP | |
| 6 | PX SELECTOR | | | | Q1,00 | PCWP | |
| 7 | PARTITION RANGE SINGLE | | 2 | 2 | Q1,00 | PCWP | |
| 8 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART | 2 | 2 | Q1,00 | PCWP | |
|* 9 | INDEX RANGE SCAN | T_PART_IDX | 2 | 2 | Q1,00 | PCWP | |
| 10 | PX SELECTOR | | | | Q1,00 | PCWP | |
| 11 | PARTITION RANGE SINGLE | | 4 | 4 | Q1,00 | PCWP | |
| 12 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART | 4 | 4 | Q1,00 | PCWP | |
|* 13 | INDEX RANGE SCAN | T_PART_IDX | 4 | 4 | Q1,00 | PCWP | |
| 14 | PX BLOCK ITERATOR | | 6 | 20 | Q1,00 | PCWC | |
|* 15 | TABLE ACCESS FULL | T_PART | 6 | 20 | Q1,00 | PCWP | |
| 16 | PX BLOCK ITERATOR | |KEY(OR)|KEY(OR)| Q1,00 | PCWC | |
|* 17 | TABLE ACCESS FULL | T_PART |KEY(OR)|KEY(OR)| Q1,00 | PCWP | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - access("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
13 - access("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
filter(LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')))
15 - filter((LNNVL("P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2003-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))) AND (LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
17 - filter("P"."DT"<TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

I've disabled the "table expansion" transformation in this case, because it kicks in again when optimizing this query and just adds some harmless (and useless) branches to the plan that confuse the issue. Without those additional, useless branches it is very similar to the above plan, but without any BUFFER SORT operations, hence it doesn't cause any overhead and should return the first rows rather quickly, no matter how large the table is.

The 11.2.0.4 optimizer unfortunately again adds unnecessary BUFFER SORT operations even to the manual rewrite above, so as mentioned in the previous post the problem of those spurious BUFFER SORTs isn't limited to the CONCATENATION transformation.

Of course, since all this is related to Parallel Execution, a simple workaround to the problem is to run the SELECT * FROM TABLE using a NO_PARALLEL hint, and all those strange side effects of BUFFER SORTS will be gone. And not having unusable local indexes will also prevent the problem, because then the "table expansion" transformation won't kick in.
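
For completeness, a sketch of the serial workaround mentioned above; this only illustrates the hint placement and is not a recommendation to give up Parallel Execution in general:

select /*+ no_parallel(p) */ * from t_part p;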

Interestingly, if the optimizer is told about the true intention of initially fetching only the first n rows from the SELECT * FROM TABLE - for example simply by adding a corresponding FIRST_ROWS(n) hint - at least in my tests using 12.1.0.2 all the complex transformations were rejected and a plain (parallel) FULL TABLE SCAN was preferred instead, simply because it is now differently costed, which would allow working around the problem, too.
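
And a sketch of the FIRST_ROWS(n) variant; the value 100 is just an arbitrary example for the number of rows initially fetched:

select /*+ first_rows(100) */ * from t_part p;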

If you want to reproduce the issue, here's a sample table definition, along with some comments what I had to do to bring it into the state required to reproduce:

-- The following things have to come together to turn a simple SELECT * from partitioned table into a complex execution plan
-- including Table Expansion and Concatenation:
--
-- - Unusable index partitions to trigger Table Expansion
-- - Partitions with usable indexes that are surrounded by partitions with unusable indexes
-- - And such a partition needs to have an index access path that is cheaper than a corresponding FTS, typically by deleting the vast majority of rows without resetting the HWM
-- - All this also needs to be reflected properly in the statistics
--
-- If this scenario is combined with Parallel Execution the "Parallel Concatenation" bug that plasters the plan with superfluous BUFFER SORT will lead to the fact
-- that the whole table will have to be kept in memory / TEMP space when running SELECT * from the table, because the bug adds, among many other BUFFER SORTs, one deadly BUFFER SORT
-- on top level before returning data to the coordinator, typically operation ID = 3
--
create table t_part (dt not null, id not null, filler)
partition by range (dt)
(
partition p_1 values less than (date '2001-01-01'),
partition p_2 values less than (date '2002-01-01'),
partition p_3 values less than (date '2003-01-01'),
partition p_4 values less than (date '2004-01-01'),
partition p_5 values less than (date '2005-01-01'),
partition p_6 values less than (date '2006-01-01'),
partition p_7 values less than (date '2007-01-01'),
partition p_8 values less than (date '2008-01-01'),
partition p_9 values less than (date '2009-01-01'),
partition p_10 values less than (date '2010-01-01'),
partition p_11 values less than (date '2011-01-01'),
partition p_12 values less than (date '2012-01-01'),
partition p_13 values less than (date '2013-01-01'),
partition p_14 values less than (date '2014-01-01'),
partition p_15 values less than (date '2015-01-01'),
partition p_16 values less than (date '2016-01-01'),
partition p_17 values less than (date '2017-01-01'),
partition p_18 values less than (date '2018-01-01'),
partition p_19 values less than (date '2019-01-01'),
partition p_20 values less than (date '2020-01-01')
)
as
with generator as
(
select /*+ cardinality(1000) */ rownum as id, rpad('x', 100) as filler from dual connect by level <= 1e3
)
select
add_months(date '2000-01-01', trunc(
case
when id >= 300000 and id < 700000 then id + 100000
when id >= 700000 then id + 200000
else id
end / 100000) * 12) as dt
, id
, filler
from (
select
(a.id + (b.id - 1) * 1e3) - 1 + 100000 as id
, rpad('x', 100) as filler
from
generator a,
generator b
)
;

delete from t_part partition (p_2);

commit;

exec dbms_stats.gather_table_stats(null, 't_part')

create unique index t_part_idx on t_part (dt, id) local;

alter index t_part_idx modify partition p_1 unusable;

alter index t_part_idx modify partition p_3 unusable;

alter index t_part_idx modify partition p_5 unusable;

alter table t_part parallel;

alter index t_part_idx parallel;

set echo on pagesize 0 linesize 200

explain plan for
select * from t_part p;

select * from table(dbms_xplan.display(format => 'BASIC PARTITION PARALLEL'));

Oracle Documents Cloud Service R2 – Now Available!

WebCenter Team - Thu, 2015-01-08 12:38
By: John Klinke, Oracle WebCenter Product Management
Business success depends on effective collaboration both inside and outside of your organization – with customers, partners, suppliers, remote employees – anytime, anywhere, on any device.
The latest release of Oracle Documents Cloud Service came out this week, and it delivers on this promise. With its fast, intuitive web interface and easy-to-use desktop and mobile applications, people can view and collaborate on files even when offline, keeping your organization running efficiently and your employees staying productive from any location.
Improved User Experience
Getting employees started on Oracle Documents Cloud Service has never been easier. New users now have access to welcome tours to educate them on how to use each client. Overlay help is also available to point out features. Welcome emails have been updated to provide new users extra guidance in getting started, and the online help provides quick answers to frequently asked questions.
Enhanced Productivity
Improving productivity when it comes to content collaboration is one of the key objectives of Oracle Documents Cloud Service. Productivity improvements in this release can be seen across all the end-user applications – web, desktop, and mobile.  For quickly navigating to the most important files, you can now mark content as a favorite from the web UI and access all your favorite files with one click.  Another new capability is the ability to reserve content for editing so others are notified that you are working on it.  For mobile users, this release includes updated mobile apps with improvements around managing and sharing documents and folders from your smartphone and tablet.
Increased Security

This release of Oracle Documents Cloud Service includes additional security around the use of public links for content sharing. Public links now have optional expiration dates so you can control how long links are active. You can add security around your public links by setting up access codes: an 8-character or longer password that must be entered before anyone can access your link.

Seamless Integration
The Oracle Documents Cloud Service Developer Platform also saw numerous improvements in this release. Both the REST APIs and the embedded UI capabilities were enhanced to make it easier for customers and partners to integrate document-centric collaboration into existing applications and business processes. Developer platform enhancements include additional APIs for securely managing folders and fetching user information. For embedding Oracle Documents Cloud Service into other applications, enhancements include support for previewing documents and videos in an embedded iFrame as well as role-based embedded links.
Oracle Documents Cloud Service is Oracle’s next-generation content collaboration solution. Combined with Oracle WebCenter, you get a unified hybrid ECM solution for secure cloud-based content collaboration tied back to your ECM system of record, minimizing the costs, risks and complexity of managing your enterprise content.
Next-Gen ECM
Oracle Documents Cloud Service delivers next-generation ECM -- providing the file synchronization and sharing capabilities employees need, with the control and security your organization requires. Oracle Documents Cloud Service has the tools to provide the hybrid solution of secure cloud-based file sharing and collaboration, as well as on-premise ECM integration, that embodies a next-generation content management system.
For more information about Oracle Documents Cloud Service and Oracle’s hybrid ECM solution, please visit: https://cloud.oracle.com/documents
To make a purchase, you can do so here

Oracle GoldenGate Processes – Part 1 – Manager

DBASolved - Thu, 2015-01-08 12:00

Oracle GoldenGate is made up of processes that are used to ensure replication.  These processes include a manager process, an extract process, and a replicat process.  The premise of this post is to focus on the manager process.

In order to configure and run an Oracle GoldenGate environment, a manager process must be running on all of the source, target, and intermediary servers in the configuration.  This is because the manager process is the controller of the other Oracle GoldenGate processes: it allocates ports and performs file maintenance.  The manager process performs the following functions within the Oracle GoldenGate instance:

  • Starts processes
  • Starts dynamic processes
  • Starts the collector process
  • Manages the port numbers for the processes
  • Manages trail files
  • Creates reports for events, errors, and thresholds

Note: There is only one manager process per Oracle GoldenGate instance.

Configure Manager Process
Before a manager process can be started, it needs to be configured.  There are many different parameters that can be used in the manager parameter file, but the only required one is the PORT parameter.  The default port for a manager is 7809.  The manager parameter file can be edited either from the command line or from within the GGSCI utility.

Edit via command line:

$ cd $OGG_HOME/dirprm
$ vi mgr.prm

Edit via GGSCI:

GGSCI> edit params mgr

Example of a Manager parameter file:

PORT 15000
AUTOSTART ER *
AUTORESTART ER * , RETRIES 5, WAITMINUTES 5
PURGEOLDEXTRACTS ./dirdat/lt*, USECHECKPOINTS, MINKEEPHOURS 2

Start the Manager Process

Starting the manager process is pretty simple.  The process can be started either from the command line or from the GGSCI utility.

Start from command line:

$ cd $OGG_HOME
$ mgr paramfile ./dirprm/mgr.prm reportfile ./dirrpt/mgr.rpt

Start from GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> start manager  
or
GGSCI> start mgr
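
To confirm the manager (and the rest of the instance) is up, the standard status commands can be used; output details vary by release:

GGSCI> info manager
GGSCI> info all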

Stop the Manager Process

Stopping the manager process is pretty simple as well (from the GGSCI).

Stop from GGSCI:

$ cd $OGG_HOME
$ ./ggsci
GGSCI> stop manager [ ! ]

Enjoy!

about.me: http://about.me/dbasolved


Filed under: Golden Gate
Categories: DBA Blogs

555 Timer simulator with HTML5 Canvas

Paul Gallagher - Thu, 2015-01-08 11:02
The 555 timer chip has been around since the '70s, so does the world really need another website for calculating the circuit values?

No! But I made one anyway. It's really an excuse to play around with HTML5 canvas and demonstrate a grunt & coffeescript toolchain.

See this running live at visual555.tardate.com, where you can find more info and links to projects and source code on GitHub.


(blogarhythm ~ Time's Up / Living Colour)

Securing Big Data - Part 3 - Security through Maths

Steve Jones - Thu, 2015-01-08 09:00
In the first two parts of this I talked about how Securing Big Data is about layers, and then about how you need to use the power of Big Data to secure Big Data.  The next part is "what do you do with all that data?"   This is where Machine Learning and Mathematics come in; in other words, it's about how you use Big Data analytics to secure Big Data. What you want to do is build up a picture of
Categories: Fusion Middleware

Oracle University Leadership Circle 2015

The Oracle Instructor - Thu, 2015-01-08 04:56

I’m in the Leadership Circle again :-) That is a quarterly Corporate Award for the best instructors worldwide according to customer feedback.

Oracle University has generally high quality standards, so it is hard to stand out individually, which makes me all the more proud to be listed together with these great colleagues:

Oracle University Leadership Circle 2015


Categories: DBA Blogs

How an employee mishap can reveal database login credentials

Chris Foot - Thu, 2015-01-08 04:02

Sometimes, the most grievous data breaches are not incited by sophisticated cybercriminals using the latest hacking techniques, but by everyday employees who ignore basic protocols. 

Internal threat 
Last year, Symantec and the Ponemon Institute conducted a study on data breaches that occurred throughout 2012. The two organizations discovered that an astounding two-thirds of these incidents were caused by human errors and system issues. Most of these situations were spawned by workers mishandling confidential information, organizations neglecting industry and government regulations and lackluster system controls. 

"While external attackers and their evolving methods pose a great threat to companies, the dangers associated with the insider threat can be equally destructive and insidious," said Ponemon Institute Chairman Larry Ponemon. "Eight years of research on data breach costs has shown employee behavior to be one of the most pressing issues facing organizations today, up 22 percent since the first survey."

Facebook's mistake 
ITWire's David Williams noted that Facebook employees accidentally divulged the username and password of its MySQL database by using Pastebin.com. For those who aren't familiar with the service, Pastebin allows IT specialists to send bits of code via a compact URL, allowing professionals to share the code through an email, social media post or simple Web search. 

As URLs are designed so that anyone can view a Web page, it's possible for a random individual to accidentally come across a URL created by Pastebin, allowing him or her to read the content within the URL. As it turns out, Sintehtic Labs' Nathan Malcolm learned that Facebook programmers were exchanging error logs and code snippets with one another through Pastebin. 

By perusing the Pastebin URLs, Malcolm discovered Facebook shell script and PHP code. Williams maintained that none of this data was obtained illegally, nor did he receive it from a Facebook engineer. Instead, the code was "simply lying around the Internet in public view." 

MySQL entry 
It just so happened that one of the URLs contained source code that revealed Facebook's MySQL credentials. The server address, the database name as well as the username and password were available to the public. Although Facebook has likely changed these access permissions since the accident occurred, it's still an example of how neglect can lead to stolen information. 

Implementing database security monitoring software is one thing, but ensuring workers are following policies that prevent data from accidentally being divulged to the public is another – it's a step that shouldn't be ignored. 

The post How an employee mishap can reveal database login credentials appeared first on Remote DBA Experts.

5 Linux distributions for servers

Chris Foot - Thu, 2015-01-08 01:06

When a professional says he or she specializes in Linux operating systems, some may be cheeky enough to ask "which one?"

The truth is, depending on how knowledgeable a Linux administrator is, he or she could create dozens of unique iterations of the OS. Generally, there are a handful of mainstream distributions, several of which have been developed by companies that then redistribute the open-source OS. Iterations vary depending on the functions and settings certain professionals require of the OS. Listed below are five different Linux distributions for servers.

1. Debian 
According to Tecmint contributor Avishek Kumar, Debian is an OS that works best in the hands of system administrators or users possessing extensive experience with Linux. He described it as "extremely stable," making it a good option for servers. It has spawned several other iterations, Ubuntu and Kali being two of them. 

2. SUSE Linux Enterprise Server 
TechTarget's Sander Van Vugt lauded SUSE Linux as one of the most accessible Linux distributions available, also recognizing it for its administrator-friendly build. The latter feature may be due to its integration with Yet another Setup Tool (YaST), a Linux OS configuration program that enables admins to install software, configure hardware, set up networks and servers, and perform several other much-needed tasks. 

3. Red Hat Enterprise Linux 
Kumar maintained that RHEL was the first Linux distribution designed for the commercial market, and is compatible with x86 and x86_64 server architectures. Due to the support that Red Hat provides for this OS, it is often the server OS of choice for many sysadmins. The only "drawback" of this solution is that it isn't available for free distribution, although a beta release can be downloaded for educational use. 

4. Kali Linux 
As was mentioned above, this particular iteration is an offshoot of Debian. While not necessarily recommended for servers (and one of the latest Linux distributions), it has primarily been developed to conduct penetration testing. One of the advantages associated with Kali is that Debian's binary packages can be installed on Kali. It serves as a fantastic security assessment program for users concerned with database or WiFi security.

5. Arch Linux 
Kumar maintained that one of the advantages associated with Arch is that it is designed as a rolling-release OS, meaning that every time a new version is released, those who have already installed it won't have to re-install the OS. It is designed for the x86 processor architecture. 

The post 5 Linux distributions for servers appeared first on Remote DBA Experts.

Here Are Your First Links of 2015

Oracle AppsLab - Wed, 2015-01-07 20:16

Our team has been busy since the New Year, competing in the AT&T Developer Summit hackathon, which is Noel’s (@noelportugal) Everest, i.e. he tries to climb it every year, see 2013 and 2014.

If you follow our Twitter (@theappslab) or Facebook page, you might have seen the teaser. If not, here it is:

Image courtesy of AT&T Developer Program

Image courtesy of AT&T Developer Program’s Facebook page

Look for details later this week.

While you wait for that, enjoy these tidbits from our Oracle Applications User Experience colleagues.

Fit for Work: A Team Experience of Wearable Technology

Wearables are a thing, just look at the CES 2015 coverage, so Misha (@mishavaughan) decided to distribute Fitbits among her team to collect impressions.

Good idea, get everyone to use the same device, collect feedback, although it seems unfair, given Ultan (@ultan) is perhaps the fittest person I know. Luckily, this wasn’t a contest of fitness or of most-wrist-worn-gadgets. Rather, the goal was to gather as much anecdotal experience as possible.

Bonus, there’s a screenshot of the Oracle HCM Cloud Employee Wellness prototype.

¡Viva Mexico!

Fresh off a trip to Jolly Old England, the OAUX team will be in Santa Fe, Mexico in late February. Stay tuned for details.

Speaking of, one of our developers in Oracle’s Mexico Development Center, Sarahi Mireles (@sarahimireles) wrote up her impressions and thoughts on the Shape and ShipIt we held in November, en español.

And finally, OAUX and the Oracle College Hire Program

Oracle has long had programs for new hires right out of college. Fun fact, I went through one myself many eons ago.

Anyway, we in OAUX have been graciously invited to speak to these new hires several times now, and this past October, Noel, several other OAUX luminaries and David (@dhaimes) were on a Morning Joe panel titled “Head in the Clouds,” focused loosely around emerging technologies, trends and the impact on our future lives.

Ackshaey Singh (from left to right), DJ Ursal (@djursal), Misha Vaughan (@mishavaughan), Joe Goldberg, Noel Portugal (@noelportugal), and David Haimes (@dhaimes)

 

Interesting discussion to be sure, and after attending three of these Morning Joe panels now, I’m happy to report that the attendance seems to grow with each iteration, as does the audience interaction.

Good times.

APEX Conferences in 2015

Dimitri Gielis - Wed, 2015-01-07 17:30
In 2015 there are more APEX-only conferences scheduled than ever before. I definitely recommend attending at least one of them, as it's a great time to meet other people, industry experts, and the APEX Development Team.

You'll find me at following conferences:

25-MAR-2015: APEX World - Rotterdam, the Netherlands

Every year a hit - and this year extra special as it will be at a different location - the beautiful SS Rotterdam. It's not decided yet which topics I'll present in the Netherlands.



9/10-JUN-2015 - APEXConnect - Düsseldorf, Germany

It will be my first time at a conference hosted by DOAG. I've heard so many great things about the excitement there is in Germany for APEX, so I look forward to meeting the people over there.

On the 10th at 11h I'll present about the integration of APEX and Microsoft Sharepoint.



21/25-JUN-2015 - ODTUG KScope - Hollywood, Florida

I attended KScope for the first time in 2006 and have returned almost every year since. One of my favourite conferences for sure. On Sunday the APEX Development Team is hosting the APEX Symposium: a full day of talks by the development team themselves. It also gives you an opportunity to meet them and many other experts in a nice informal setting.

This year these are my talks:

  • A primer on Web Components in APEX
  • How to make APEX print through node.js




There are some other conferences with great APEX tracks, for example GLOC, Collaborate, DOAG, UKOUG, ...

Next to the conferences there are many local APEX Meetups too. It's a totally different setting and concept compared to a conference, so definitely something to check out too. You can take your code to the meetups and ask questions or show what you did yourself.

You'll find me at most (if not all) of the Belgium APEX Meetups, and I might attend some other meetups or conferences depending on my schedule.



I look forward to seeing you at one of the conferences or meetups.

Categories: Development

Contribution by Angela Golla, Infogram Deputy Editor

Oracle Infogram - Wed, 2015-01-07 15:56
Contribution by Angela Golla, Infogram Deputy Editor

My Oracle Support Essentials Webcast Series
The My Oracle Support Essentials Series brings interactive expertise straight to your desktop. The goal of this program is to communicate with our customers and partners, provide tips and tricks on how to effectively work with Oracle Support, and take advantage of the PROACTIVE tools that are provided with your Oracle Support contract.  The January schedule is available in Doc Id 553747.1





Recommended Reading: Oracle Database 12c NUMA-Related Topics

Kevin Closson - Wed, 2015-01-07 15:42

This is a short post to recommend some recent blog posts by Nikolay Manchev and Bertrand Drouvot on the topic of Oracle Database 12c NUMA awareness.

Nikolay provides a very helpful overview on Linux Control Groups and how they are leveraged by Oracle Database 12c. Bertrand Drouvot carried the topic a bit further by leveraging SLOB to assess the impact of NUMA remote memory on a cached Oracle Database workload. Yes, SLOB is very useful for more than physical I/O! Good job, Bertrand!

These are good studies and good posts!

Also, one can refer to MOS 1585184.1 for more information on Control Groups and a helpful script to configure CGROUPS.

The following links will take you to Nikolay and Bertrand’s writings on the topic:

http://manchev.org/2014/03/processor-group-integration-in-oracle-database-12c/

https://bdrouvot.wordpress.com/2015/01/07/measure-the-impact-of-remote-versus-local-numa-node-access-thanks-to-processor_group_name/

 


Filed under: oracle

"2015: Year Of The Enterprise Selfie And 9 Other Predictions For Business Technology" by Michael Hickens

Linda Fishman Hoyle - Wed, 2015-01-07 13:29

We ponder the posts in late December that list the highs and lows of the previous 12 months. We read about the most popular food trends, baby names, and music videos of 2014. There are lists of top legal stories, top celebrities, and top tech quotes. We remember the ice bucket challenge, ISIS, Ebola, Ukraine, Robin Williams, Cuba, Sony, Taylor Swift, and Uber—all included on some 2014 list.

And then the crystal ball drops in Times Square on New Year’s Eve and we’re flung into 2015 on a short night’s sleep. Come January 1 every news service leads with a list of predictions for the coming year.

Our own Michael Hickins was out in front of the reviewers and the predictors and published a great OracleVoice piece for the Forbes audience on December 29. He says, “The annual ritual of predicting the future of technology is as humbling as it is fun.”

You can tell Hickins had fun writing the article. His ten predictions are snappy, smart, and useful.

Most Recent

Jonathan Lewis - Wed, 2015-01-07 12:21

There’s a thread on the OTN database forum at present asking for advice on optimising a query that’s trying to find “the most recent price” for a transaction given that each transaction is for a stock item on a given date, and each item has a history of prices where each historic price has an effective start date. This means the price for a transaction is the price as at the most recent date prior to the transaction date.

There is an absolutely standard way of expressing “the most recent occurrence” in SQL. Assume we have a table of (item_code, effective_date, price) with the obvious primary key of (item_code, effective_date), then a requirement to find “the most recent price for item XXXX as at 25th Dec 2014” would give us code like the following (note – all the examples in this note were run against Oracle 11.2.0.4):


select  *
from    prices  pri1
where   item_code = 'XXXX'
and     effective_date = (
                select  max(effective_date)
                from    prices  pri2
                where   pri2.item_code = 'XXXX'
                and     pri2.effective_date <= date'2014-12-25'
        )
/

The ideal execution plan that we should expect to see for this query is as follows (with a small variation if you had created the prices table as an index-organized table – which would probably be sensible in many cases):


-----------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |     1 |    52 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | PRICES |     1 |    52 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN            | PRI_PK |     1 |       |     1   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE              |        |     1 |    32 |            |          |
|   4 |     FIRST ROW                  |        |     1 |    32 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN (MIN/MAX)| PRI_PK |     1 |    32 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"= (SELECT
              MAX("EFFECTIVE_DATE") FROM "PRICES" "PRI2" WHERE
              "PRI2"."EFFECTIVE_DATE"<=TO_DATE(' 2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "PRI2"."ITEM_CODE"='XXXX'))

   5 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
             TO_DATE('2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As you can see, this plan is using the “driving subquery” approach – the order of operation is 5, 4, 3, 2, 1, 0: we do an index min/max range scan in line 5 to find the maximum effective date for the item, then pass that up through the (essentially redundant) First Row and Sort Aggregate operations to use as an input to the index unique scan at operation 2 which passes the rowid up to operation 1 to find the specific row. In my case this was 2 consistent gets for the range scan, 2 more for the unique scan, and one for the table access.

You might point out that my example uses the item_code ‘XXXX’ twice, once in the main query, once in the subquery; and you might decide that this was in very poor taste since we should clearly be using a correlated subquery – the correlating predicate ought to be: pri2.item_code = pri1.item_code. Here’s the execution plan I got when I made that change:


----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     1 |    78 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |     1 |    78 |     3   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |     1 |    78 |     3   (0)| 00:00:01 |
|   3 |    VIEW                      | VW_SQ_1 |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |     FILTER                   |         |       |       |            |          |
|   5 |      HASH GROUP BY           |         |     1 |    32 |     2   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN       | PRI_PK  |     1 |    32 |     2   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN         | PRI_PK  |     1 |       |     0   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| PRICES  |     1 |    52 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("PRI2"."ITEM_CODE"='XXXX')
   6 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
              TO_DATE('2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"="MAX(EFFECTIVE_DATE)")

The plan changes dramatically, the optimizer has unnested the subquery. In my case this didn’t make any difference to the overall performance as my data set was small, I only had one or two prices per item code, and the query was very basic; but in most other cases the change could be catastrophic.
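
If you want to keep the correlated form but avoid the unnesting, one option (used again later in this note against the OTN query) is a no_unnest hint inside the subquery. This is just a sketch of the hint placement; whether the optimizer then falls back to the ideal driving-subquery plan or runs the subquery as a FILTER depends on version and circumstances:


select  *
from    prices  pri1
where   item_code = 'XXXX'
and     effective_date = (
                select  /*+ no_unnest */
                        max(effective_date)
                from    prices  pri2
                where   pri2.item_code = pri1.item_code
                and     pri2.effective_date <= date'2014-12-25'
        )
/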

The Problem Query

The requirement on OTN had a stock transactions (xo_stock_trans) table and a prices (xo_prices) table, and the OP had supplied some code to create and populate these tables with 6.4 million and 4.5 million rows respectively. Unfortunately the xo_prices table didn’t have a suitable unique constraint on it and ended up with lots of items having multiple prices for the same date.  The OP had created a function to return a price for an item given a driving date and price_type, and had a query that called that function three times per row (once for each of three price types); but this did not perform very well and the OP wanted to know if there was a way of addressing the requirement efficiently using pure SQL; (s)he had already tried the following:


select tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = pr.item
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014';  

That was SO close – it’s clearly implementing the right sort of strategy: but it didn’t perform well, so let’s check the execution plan:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   1 |  NESTED LOOPS                 |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   2 |   NESTED LOOPS                |                    |     9 |    70 |       |   168M(100)|234:06:13 |
|   3 |    NESTED LOOPS               |                    |     9 |   450 |       |   168M(100)|234:06:13 |
|   4 |     VIEW                      | VW_SQ_1            |   286 | 10010 |       |   168M(100)|234:06:11 |
|   5 |      HASH GROUP BY            |                    |   286 |  7722 |       |   168M(100)|234:06:11 |
|   6 |       MERGE JOIN              |                    |   456G|    11T|       |  9153K(100)| 12:42:50 |
|   7 |        SORT JOIN              |                    |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  8 |         INDEX RANGE SCAN      | XO_STOCK_TRANS_IX2 |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  9 |        SORT JOIN              |                    |  4045K|    46M|   154M| 19043   (6)| 00:01:36 |
|* 10 |         INDEX FAST FULL SCAN  | XO_PRICES_IX1      |  4045K|    46M|       |  1936  (10)| 00:00:10 |
|* 11 |     TABLE ACCESS BY USER ROWID| XO_STOCK_TRANS     |     1 |    15 |       |     1   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | XO_PRICES_IX1      |     1 |       |       |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY INDEX ROWID | XO_PRICES          |     1 |    20 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
       filter(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  11 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  12 - access("ITEM_1"="PR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
       filter("PR"."ITEM"="TR"."ITEM")

The query was limited to August 2014, which covered about 198,000 rows in my table, so we might expect some signs of a brute-force approach (tablescans and hash joins rather than indexes and nested loops) – but what we get is a high-precision approach with a very bad cardinality estimate after a brute-force unnesting of the “max(price_date)” subquery. To unnest the subquery Oracle has done a range scan over 200,000 stock_trans rows and an index fast full scan on 4.5 million prices, feeding a merge join and hash aggregation to find the maximum price_date for each target row in the xo_stock_trans table. (See my earlier posting on table duplication for a variation and explanation of what Oracle has done here.) This step is a lot of work, but the optimizer thinks it’s going to produce only 286 rows in the aggregated result, so the next steps in the plan are indexed nested loops – which actually operate 198,000 times.

With the clue from my initial description, we need to aim for a strategy where Oracle doesn’t unnest that subquery – so let’s experiment with a basic /*+ no_unnest */ hint in the subquery and see what happens. The hinted query is sketched below, followed by the resulting execution plan:
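
The only change to the original statement is the hint inside the correlated subquery; everything else is exactly as the OP wrote it:

select tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select /*+ no_unnest */ max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = pr.item
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014';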


--------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |   527 | 18445 |       |  6602M  (1)|999:59:59 |
|*  1 |  FILTER                       |                |       |       |       |            |          |
|*  2 |   HASH JOIN                   |                |  3423M|   111G|  5336K| 76973  (90)| 00:06:25 |
|*  3 |    TABLE ACCESS FULL          | XO_STOCK_TRANS |   202K|  2960K|       |  2531  (13)| 00:00:13 |
|   4 |    TABLE ACCESS FULL          | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
|   5 |   SORT AGGREGATE              |                |     1 |    12 |       |            |          |
|   6 |    FIRST ROW                  |                |     1 |    12 |       |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM
              "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   2 - access("PR"."ITEM"="TR"."ITEM")
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

The subquery now survives, and we can see a min/max range scan in the plan – but the subquery is a filter() subquery and is applied to the result of joining the 200,000 transactions to every price that applies for the item in each transaction. The optimizer thinks that this join will produce roughly 3.4 million rows, but in fact with the sample data set (which had many prices per item) the join resulted in 4.4 billion rows. The min/max subquery is as efficient as it can be, but it’s running far too often; ideally we would like it to run at most once per transaction, so why is it running late? We could try adding the /*+ push_subq */ hint to the subquery, but if we do the plan doesn’t change.

Our rapid “most recent occurrence” example revolved around accessing the prices table by index while “pre-querying” for the date using a min/max subquery that knew the relevant item code already. In this case, though, we’re doing a full tablescan of the xo_prices table, so the method doesn’t apply. So let’s manipulate the query to force an indexed access path for the join to the xo_prices table by adding the hints /*+ leading(tr pr) use_nl(pr) index(pr) */ to the main body of the query. The query at this stage, and the resulting plan, look like this:
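
This is simply the original statement with the two sets of hints added; note that the subquery is still correlated to pr.item rather than tr.item, a detail that becomes important in a moment:

select /*+ leading(tr pr) use_nl(pr) index(pr) */
       tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select /*+ no_unnest */ max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = pr.item        -- still correlated to pr
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014';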


--------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |   527 | 18445 |  6614M  (1)|999:59:59 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |    52   (2)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |    59   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM")
       filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM
              "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

We’re nearly there: the shape of the execution plan – lines 4 to 7, at any rate – matches the shape of the very simple example at the start of this article, and we seem to be driving from the min/max subquery at line 7. Unfortunately, when we look at the predicate section of line 4 of the plan we can see that the subquery is still a filter() subquery, not an access() subquery – it’s (nominally) being performed for every index entry in the range scan of the xo_prices index that we do for each xo_stock_trans row. What we want to see is an access() subquery – and checking the SQL we can see how to get there: the subquery currently correlates the item back to the xo_prices table, not to the xo_stock_trans table, so let’s correct that correlation. Here’s our final query (though not formatted to my preference) with execution plan:


select /*+ leading(tr pr) use_nl(pr) index(pr) */  -- hint added
       tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select /*+ no_unnest */  -- hint added
                                         max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = tr.item  -- correlate to tr, not pr
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;

--------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |  3423M|   111G|  1824K  (1)| 02:32:02 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */
              MAX("PR2"."PRICE_DATE") FROM "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND
              "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

Finally we can see (from the predicate for line 4) that we run the subquery at most once for each row from xo_stock_trans, and we use the result of each subquery execution to drive the index range scan that picks up the matching rows from xo_prices with no further filtering. The order of operation is: 3, 7, 6, 5, 4, 2, 8, 1, 0.

The only thing we can do now is decide whether the strategy of indexing into the xo_prices table 200,000 times (for our 30-day requirement) is better than a brute-force approach that does a massive join and sort, or a data-duplication approach that puts a “price end date” on each xo_prices row to avoid the need to check all prices for an item to find the appropriate one (a sketch of that option is shown below). Ultimately the choice may depend on trading off human development resources against machine run-time resources, with an eye on the number of times the query runs and the size of the date range typically involved.
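
Purely as an illustration of the data-duplication option (none of this comes from the original thread; the column name is my invention, and it glosses over the duplicate-price problem mentioned earlier), the idea would be something like this:

alter table xo_prices add (price_end_date date);

-- Set each price's end date to the start date of the next price for the same item;
-- null means "still current". A merge using the lead() analytic function would be
-- more efficient than this correlated update, but it keeps the sketch short.
update xo_prices pr
set    pr.price_end_date = (
           select min(pr2.price_date)
           from   xo_prices pr2
           where  pr2.item       = pr.item
           and    pr2.price_date > pr.price_date
       );

-- With an end date in place the "most recent price" lookup becomes a simple range test:
select tr.item, tr.trans_date, tr.quantity,
       pr.gross_price, pr.net_price, pr.special_price
from   xo_stock_trans tr
join   xo_prices pr
  on   pr.item = tr.item
 and   tr.trans_date >= pr.price_date
 and   (tr.trans_date < pr.price_end_date or pr.price_end_date is null)
where  tr.trans_date between date '2014-08-01' and date '2014-08-31';

The trade-off, of course, is that the end date has to be maintained whenever a new price row arrives.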

Footnote:

There’s plenty more I could say about this query and how to handle it – but there are too many questions about the correctness of the data definition and content to make it worth pursuing in detail. You will note, however, that the various execution plans, which logically should be returning the same data, report dramatically different cardinalities for the final row source; if nothing else this should warn you that the optimizer may have trouble producing a good plan because its model produced a bad cardinality estimate at some point in a series of transformations.

In fact, when I first saw this query I converted it to traditional Oracle syntax (anticipating, incorrectly, a need to do something messy with hints), corrected the subquery correlation to the “obvious” choice, and put in a cardinality hint, /*+ cardinality(tr 100) */, for the xo_stock_trans table; that gave me the execution plan I’ve shown as the final plan above. A sketch of that rewritten version follows.
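
A minimal sketch of what that rewritten version might have looked like; the layout is my reconstruction, but the cardinality hint and the corrected correlation are the ones described above:

select /*+ cardinality(tr 100) */
       tr.item, tr.trans_date, tr.quantity,
       pr.gross_price, pr.net_price, pr.special_price
from   xo_stock_trans tr,
       xo_prices      pr
where  tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
and    pr.item       = tr.item
and    pr.price_date = (select max(pr2.price_date)
                        from   xo_prices pr2
                        where  pr2.item       = tr.item        -- correlated to tr
                        and    pr2.price_date <= tr.trans_date
                       );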

Tactically the correlation column is the really important bit – if that can be set up suitably we just have to work around the optimizer’s arithmetic assumptions.

Securing Big Data - Part 2 - understanding the data required to secure it

Steve Jones - Wed, 2015-01-07 09:00
In the first part of Securing Big Data I talked about the two different types of security: the traditional IT and ACL security that needs to be done to match traditional solutions with an RDBMS. That is pretty much where those systems stop in terms of security, which means they don't address the real threats out there, which are to do with cyber attacks and social engineering.  An ACL is only
Categories: Fusion Middleware