Fine Grained Access Control, aka Virtual Private Database (VPD), has been around since Oracle 8 and has been enhanced in each subsequent version. One minor new feature of 11.2 was the addition of the parameter UPDATE_CHECK to the DBMS_RLS.ADD_POLICY procedure. During the OCM Preparation Workshop that I am currently delivering, one of the attendees asked me what this parameter actually does – the documentation is not very clear about it – which is why I came up with this simplified example. I hope you find it useful as well.
SQL> grant dba to adam identified by adam;
Grant succeeded.
SQL> connect adam/adam
Connected.
SQL> create table emp (ename varchar2(5), salary number);
Table created.
SQL> insert into emp values ('SCOTT',3000);
1 row created.
SQL> insert into emp values ('KING',9000);
1 row created.
SQL> commit;
Commit complete.
SQL> grant create session to scott identified by tiger;
Grant succeeded.
SQL> grant select, update on adam.emp to scott;
Grant succeeded.
The user SCOTT is not supposed to see the salary of the other employees and VPD is an elegant way to achieve that. The following technique will silently attach a WHERE-condition to statements hitting the table emp:
SQL> connect adam/adam
Connected.
SQL> create or replace function whoisit(schema varchar2, tab varchar2) return varchar2 as
     begin
      return '''' || sys_context('userenv','session_user') || ''' = ename ';
     end;
     /
Function created.
SQL> begin
      dbms_rls.add_policy (object_schema=>'ADAM', object_name=>'EMP',
                           policy_name=>'EMP_POLICY', function_schema=>'ADAM',
                           policy_function=>'WHOISIT', update_check=>true);
     end;
     /
PL/SQL procedure successfully completed.
SQL> connect scott/tiger
Connected.
SQL> select * from adam.emp;

ENAME     SALARY
----- ----------
SCOTT       3000
Although there are two rows in the table, SCOTT sees only his own salary! So far, this behavior is the same as in earlier versions. Now to the effect of update_check:
SQL> update adam.emp set ename='KING';
update adam.emp set ename='KING'
       *
ERROR at line 1:
ORA-28115: policy with check option violation
Without that parameter setting, the update would succeed – and the row would vanish for the user SCOTT, as if the update had deleted it. Imagine the user's confusion about that weird behavior.
Talking about weird, by the way:
SQL> connect adam/adam
Connected.
SQL> select * from adam.emp;

no rows selected

SQL> connect / as sysdba
Connected.
SQL> select * from adam.emp;

ENAME     SALARY
----- ----------
SCOTT       3000
KING        9000

SQL> grant exempt access policy to adam;
Grant succeeded.
SQL> connect adam/adam
Connected.
SQL> select * from adam.emp;

ENAME     SALARY
----- ----------
SCOTT       3000
KING        9000
That was funny, wasn’t it?
Conclusion: The new parameter UPDATE_CHECK in the DBMS_RLS.ADD_POLICY procedure rejects updates that would otherwise cause the updated rows to fall outside the visibility allowed for that user. Check out the old behavior by simply omitting that parameter. Because: don't believe it, test it!
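If you want to reproduce the old behavior, here is a minimal sketch (output omitted, since it depends on your data): drop and re-create the policy without update_check, then repeat the update as SCOTT.

SQL> connect adam/adam
SQL> exec dbms_rls.drop_policy('ADAM','EMP','EMP_POLICY')
SQL> begin
      dbms_rls.add_policy (object_schema=>'ADAM', object_name=>'EMP',
                           policy_name=>'EMP_POLICY', function_schema=>'ADAM',
                           policy_function=>'WHOISIT');
     end;
     /
SQL> connect scott/tiger
SQL> update adam.emp set ename='KING';   -- now succeeds
SQL> select * from adam.emp;             -- and SCOTT no longer sees "his" row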
Tagged: 11g New Features, DBMS_RLS, fine grained access control, security, vpd
What happens if you drop a tablespace after it is backed up ?
Here's the backup:
RMAN> backup as compressed backupset database;
Starting backup at 16-JUN-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/orcl/system01.dbf
input datafile file number=00004 name=/home/oracle/app/oracle/oradata/orcl/users01.dbf
input datafile file number=00003 name=/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_hemant_8pnowslc_.dbf
input datafile file number=00005 name=/home/oracle/app/oracle/oradata/orcl/example01.dbf
input datafile file number=00010 name=/home/oracle/app/oracle/oradata/orcl/APEX_2614203650434107.dbf
channel ORA_DISK_1: starting piece 1 at 16-JUN-13
channel ORA_DISK_1: finished piece 1 at 16-JUN-13
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_06_16/o1_mf_nnndf_TAG20130616T080419_8vvnzn37_.bkp tag=TAG20130616T080419 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:36
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 16-JUN-13
channel ORA_DISK_1: finished piece 1 at 16-JUN-13
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_06_16/o1_mf_ncsnf_TAG20130616T080419_8vvo4k8w_.bkp tag=TAG20130616T080419 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 16-JUN-13
Here's the subsequent DROP TABLESPACE:
SQL> drop tablespace APEX_2614203650434107 including contents and datafiles;
What happens when I try to LIST the BACKUP of the datafile / tablespace ?
RMAN> list backup of datafile 10;
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03002: failure of list command at 06/16/2013 08:07:44
RMAN-20201: datafile not found in the recovery catalog
RMAN-06010: error while looking up datafile: 10
RMAN> list backup of tablespace APEX_2614203650434107;
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03002: failure of list command at 06/16/2013 08:12:12
RMAN-20202: Tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "APEX_2614203650434107"
What does a full LIST BACKUP OF DATABASE show ?
RMAN> list backup of database ;
List of Backup Sets
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
4 Full 799.58M DISK 00:02:30 16-JUN-13
BP Key: 4 Status: AVAILABLE Compressed: YES Tag: TAG20130616T080419
Piece Name: /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_06_16/o1_mf_nnndf_TAG20130616T080419_8vvnzn37_.bkp
List of Datafiles in backup set 4
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
1 Full 14093203 16-JUN-13 /home/oracle/app/oracle/oradata/orcl/system01.dbf
2 Full 14093203 16-JUN-13 /home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
3 Full 14093203 16-JUN-13 /home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
4 Full 14093203 16-JUN-13 /home/oracle/app/oracle/oradata/orcl/users01.dbf
5 Full 14093203 16-JUN-13 /home/oracle/app/oracle/oradata/orcl/example01.dbf
10 Full 14093203 16-JUN-13
11 Full 14093203 16-JUN-13 /home/oracle/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_hemant_8pnowslc_.dbf
Datafile 10 appears as a NULL entry. It cannot be restored as it no longer belongs to the database.
A RESTORE obviously fails:
RMAN> restore datafile 10;
Starting restore at 16-JUN-13
using channel ORA_DISK_1
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03002: failure of restore command at 06/16/2013 08:15:32
RMAN-20201: datafile not found in the recovery catalog
RMAN-06010: error while looking up datafile: 10
So, there you have it. Once a datafile doesn't belong to the database it cannot be restored.
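Note that the backup piece itself is still catalogued (that is why datafile 10 showed up as a NULL entry in the LIST output). A quick way to confirm that, using the backup set key from the listing above, would be something along these lines:

RMAN> list backupset 4;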
If you manage Oracle on Windows, you probably have wondered why it is so difficult to work out what Oracle instances are running and which ORACLE_HOMEs they use. On Unix or Linux, this is a very simple task. Oracle services and their ORACLE_HOMEs are listed in the oratab file, located in /etc/ on most platforms, and in /var/opt/oracle/ on Solaris. To find what is running, we would usually use the ‘ps’ command, and pipe it through grep to find any running PMON processes.
On Windows, it just isn’t this easy. Each Oracle instance runs in a single monolithic oracle.exe process. Nothing about the process indicates the name of the instance. When we want to find all of the configured Oracle services, we can use the ‘sc’ command, and pipe the results through find (I have added emphasis to the ASM and database instances):
C:\> sc query state= all | find "SERVICE_NAME" | find "Oracle"

SERVICE_NAME: Oracle Object Service
SERVICE_NAME: OracleASMService+ASM1
SERVICE_NAME: OracleClusterVolumeService
SERVICE_NAME: OracleCRService
SERVICE_NAME: OracleCSService
SERVICE_NAME: OracleDBConsoleorcl1
SERVICE_NAME: OracleEVMService
SERVICE_NAME: OracleJobSchedulerORCL1
SERVICE_NAME: OracleOraAsm11g_homeTNSListener
SERVICE_NAME: OracleProcessManager
SERVICE_NAME: OracleServiceORCL1
SERVICE_NAME: OracleVssWriterORCL1
For any one of these services, you can get the current state with ‘sc query’, and the path of the ORACLE_HOME it is using with ‘sc qc’.
C:\> sc query OracleServiceORCL1

SERVICE_NAME: OracleServiceORCL1
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

C:\> sc qc OracleServiceORCL1

SERVICE_NAME: OracleServiceORCL1
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 3   DEMAND_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : c:\oracle\product\11.2.0\db\bin\ORACLE.EXE ORCL1
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : OracleServiceORCL1
        DEPENDENCIES       :
        SERVICE_START_NAME : LocalSystem
As you can see, the ORACLE_HOME and SID are visible on the line labeled ‘BINARY_PATH_NAME’. Once you have this information, you can set your environment accordingly. It might even be worth your time to write a simple script to do this for you. Maybe you could call it ‘oraenv’!
C:\> set ORACLE_SID=ORCL1
C:\> set ORACLE_HOME=c:\oracle\product\11.2.0\db
C:\> set PATH=%ORACLE_HOME%\bin;%PATH%
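For example, here is a minimal sketch of such an ‘oraenv’ batch script. It assumes the service naming and BINARY_PATH_NAME layout shown above; treat it as a starting point, not a polished tool.

@echo off
rem oraenv.cmd - minimal sketch: derive ORACLE_SID and ORACLE_HOME from a Windows service
rem Usage: oraenv.cmd OracleServiceORCL1   (the service name is an example; pass your own)

rem Token 3 of the BINARY_PATH_NAME line is the oracle.exe path, token 4 is the SID
for /f "tokens=3,4" %%a in ('sc qc %1 ^| findstr BINARY_PATH_NAME') do (
    set EXEPATH=%%a
    set ORACLE_SID=%%b
)

rem Strip "\bin\ORACLE.EXE" from the executable path to get the ORACLE_HOME
set ORACLE_HOME=%EXEPATH:\bin\ORACLE.EXE=%
set PATH=%ORACLE_HOME%\bin;%PATH%

echo ORACLE_SID=%ORACLE_SID%
echo ORACLE_HOME=%ORACLE_HOME%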
One of the biggest selling features of Oracle’s flagship engineered system – Exadata – is its storage offloading capability. Storage offloading allows data to be filtered as soon as it is read from storage, reducing the amount of data that the database hosts need to process. The storage layer is a shared-nothing MPP architecture, and in theory can scale linearly to extremely large data volumes through Exadata Storage Expansion units.
Offloading is applicable only to full table scans and index fast full scans. Index access paths that rely on following B-tree pointers (RANGE SCAN, FULL SCAN) use block access and thus do not benefit in any way from storage offloading.
Offloading performs the following filter operations:
- Projections – extracting only the columns which are present in the select list
- Predicate Filtering – applying all fixed predicate filters like “city=’Boston’” or “city=:the_city”. Bind variables are supported – but must match datatypes. Otherwise implicit conversion will happen. Oracle’s Data Mining scoring can also leverage this feature.
- Join filtering via bloom filters – this only happens for parallel operations and it’s a CBO (Cost Based Optimizer) activated decision
Join filtering at the storage layer in Exadata is implemented via bloom filters. Here’s a nice presentation explaining the concepts of bloom filters. There are many variations of and approaches to implementing bloom filters. Bloom filtering functionality actually predates Exadata, having first been introduced in Oracle 10g. It was used to reduce the amount of traffic between parallel query processes when joining tables. This functionality was extended to the Exadata storage layer, but still requires parallel query, in contrast to other Exadata-specific filtering that happens for serial queries too. The decision to use bloom filtering is made by the Oracle optimizer, and is based on estimated CPU and I/O usage. As with all other optimizer decisions, the Oracle cost-based optimizer (CBO) can get it wrong. If the CBO uses bloom filters when not needed, there’s a slight increase in CPU usage with minimal memory overhead. If the CBO does not use a bloom filter-based join when it should, you may see a dramatic performance difference.
The Test Data and System
In this post I will be using my favorite CKK test table. It is a table with 10 million rows (10,000 * 1,000) with 2 rows per block. I have a couple of columns in this table that allow various tests, for example mod5_id is a column that has the values 0 to 4 and can be used in various tests. It is built as follows:
create table ckk nologging tablespace ckbig as
select rownum id,
       mod(rownum,5) mod5_id,
       mod(rownum,5000) mod5000_id,
       sysdate dt_fixed,
       sysdate - rownum/24/60 dt_dec,
       sysdate + rownum/24/60 dt_pos,
       sysdate + ora_hash(rownum,65,535)/24 dt_rand,
       sysdate + mod(rownum,10) dt_mod10,
       rpad('x',3500,'x') filler
from   (select rownum r from dual connect by level <= 10000) r1,
       (select rownum r from dual connect by level <= 1000);
I am running these tests on an Exadata X2-2 quarter rack with high-capacity disks. This system can read at 4 GB/sec from disk and 14.5 GB/sec from disk and flash combined. These are the numbers I calculated during my testing. Given that my CKK table is 39 GB, it can be scanned in about 10 seconds from disk and in about 3.5 seconds from flash+disk. Note that to use flash+disk the table needs to have the cell_flash_cache keep attribute defined.
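For reference, the attribute is set via the segment storage clause; a one-line sketch using the table from this post:

alter table ckk storage (cell_flash_cache keep);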
Warming up the cell flash cache and confirming scan speeds:
select /*+ FULL(c) parallel(4) */ count(*) from ckk c; --> ~4.5 seconds
Given that I use a relatively low parallelism level I get 4.5 seconds. This is good enough for my testing.
Creating a simple 2-row table for joining:
create table small (id number);
insert into small select rownum from dual connect by level <= 2;
commit;
First join experiment:
select /*+ FULL(c) parallel(4) */ count(*) from ckk c join small s on (s.id = c.id);
--> 4.1 seconds
-- cell IO uncompressed bytes          - 40,960,229,376
-- cell physical IO interconnect bytes -    645,974,512
-- CPU used by this session            - 4.06 seconds
Notice column projection at work here. The table is 39 GB, however since I am doing only a count(*) and joining on one key only, the storage layer returns only the “rowid” and “id” columns. We have a reduction in data volume of nearly 99%. Projections really make a difference in storage layer traffic.
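As an aside, the cell statistics quoted in these tests can be pulled for your own session with a query along these lines (standard v$ views; the statistic names are as they appear in v$statname):

select sn.name, st.value
from   v$statname sn, v$mystat st
where  sn.statistic# = st.statistic#
and    sn.name in ('cell IO uncompressed bytes',
                   'cell physical IO interconnect bytes',
                   'cell physical IO bytes saved by storage index',
                   'CPU used by this session');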
Next: a proper join experiment
select /*+ FULL(c) parallel(4) */ * from ckk c join small s on (s.id = c.id);
--> 12.8 seconds
-- cell IO uncompressed bytes          - 41,249,734,656
-- cell physical IO interconnect bytes - 37,131,346,192
-- CPU used by this session            - 27.64 seconds -> 27.64 / 4 = 6.9
In the above example, we ask for all the data from both tables – as a result we send much more data to the client session. I am not sure why uncompressed bytes is nearly 300 MB higher, but I am not going to investigate this now. Notice that we still get some data-size savings from sending a rowset as opposed to individual data blocks. Not all data blocks are full, and there’s metadata in the blocks. By sending a rowset, we avoid sending all that to the database servers.
Also note the increased execution time – it takes time to transfer all that data. At about 3.2 GB/sec (a single 40 Gbit InfiniBand link) per node, this gives me a 6.4 GB/sec consumption rate. At this rate it should have run in about 6 seconds. Why the extra delay?
Well, it takes time to process this data. With 4 CPUs we can only process so much – if we increase parallelism, we can reach higher speeds:
select /*+ FULL(c) parallel(8) */ * from ckk c join small s on (s.id = c.id);
--> 8.1 seconds
-- CPU used by this session - 29.67 -> 29.67/8 = 3.7
We need about 28 CPU seconds to receive 37 GB of data, process it, and join it to our small table. As we increase parallelism, the “penalty” for receiving a large amount of data shrinks, but we end up consuming more resources on the database nodes.
select /*+ FULL(c) parallel(16) */ * from ckk c join small s on (s.id = c.id);
--> 6.8 seconds
-- CPU used by this session - 29.52 -> 29.52/16 = 1.8
As we reach parallelism 16, we are no longer limited by CPU processing, but by the InfiniBand network transfer time.
A quick recap:
- 4.1 seconds to read data
- 6.0 seconds to transfer the data
- 1.8 seconds (at parallel 16) to process the data
These things all happen at the same time, as the data is being streamed.
In this case, no join offloading is happening. This is one of those cases where the CBO decided it’s not worthwhile. We can use a PX_JOIN_FILTER optimizer hint to force it though:
select /*+ FULL(c) parallel(4) PX_JOIN_FILTER(c) */ * from ckk c join small s on (s.id = c.id);
--> 0.3 seconds
-- cell IO uncompressed bytes          - 57,098,240
-- cell physical IO interconnect bytes -  1,808,488
-- CPU used by this session            - 0.23 seconds
Whoa. What happened here? How come we joined a 39 GB table with no indexes in 0.3 seconds, by reading 57 MB and sending only 2 MB back to the database layer? A few things:
1. When join filtering happens at the storage layer, Oracle can leverage storage indexes and skip entire regions of the table. In this case – “cell physical IO bytes saved by storage index – 40,903,131,136”. The exact mechanism of this is not clear to me, but there are technical articles on the internet about range-based bloom filters.
2. Join filtering eliminates most of the data transfer to database servers, since only 2 rows match and they are pretty much on the same block – most of the data sent back is probably metadata of “empty”.
As you can see, a simple optimization can have a dramatic effect on both wallclock time and overall system resource utilization. In this case we did not perform 39 GB of I/O. We did not send 37 GB of data over the InfiniBand network. And we did not use nearly 30 CPU seconds to receive and process it. All due to a join filter optimization available exclusively to Exadata.
Here’s the explain plan for reference:
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name     | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                 |          |      2 |  7114 |  376K  (1) | 00:00:15 |       |      |            |
|  1 |  PX COORDINATOR                  |          |        |       |            |          |       |      |            |
|  2 |   PX SEND QC (RANDOM)            | :TQ10001 |      2 |  7114 |  376K  (1) | 00:00:15 | Q1,01 | P->S | QC (RAND)  |
|* 3 |    HASH JOIN                     |          |      2 |  7114 |  376K  (1) | 00:00:15 | Q1,01 | PCWP |            |
|  4 |     PX RECEIVE                   |          |      2 |     6 |     2  (0) | 00:00:01 | Q1,01 | PCWP |            |
|  5 |      PX SEND BROADCAST           | :TQ10000 |      2 |     6 |     2  (0) | 00:00:01 | Q1,00 | P->P | BROADCAST  |
|  6 |       PX BLOCK ITERATOR          |          |      2 |     6 |     2  (0) | 00:00:01 | Q1,00 | PCWC |            |
|  7 |        TABLE ACCESS STORAGE FULL | SMALL    |      2 |     6 |     2  (0) | 00:00:01 | Q1,00 | PCWP |            |
|  8 |     PX BLOCK ITERATOR            |          |    10M |   33G |  376K  (1) | 00:00:15 | Q1,01 | PCWC |            |
|* 9 |      TABLE ACCESS STORAGE FULL   | CKK      |    10M |   33G |  376K  (1) | 00:00:15 | Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("S"."ID"="C"."ID")
   9 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"C"."ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"C"."ID"))
The key element that shows the bloom filter will be offloaded is step 9 – the function storage(SYS_OP_BLOOM_FILTER(:BF0000,"C"."ID")) – this means the filter will be applied at the storage layer.
In most cases the speed difference will not be this dramatic. It is very dependent on the physical data ordering. However, armed with the knowledge that such dramatic optimizations are possible, you can build an ETL process, for example, to take advantage of this. Or have the ETL prepare the data so that, when queried, these optimizations can be leveraged.
Here’s a slightly different query – notice the join key is now mod5000_id, which spreads the values 0 to 5000 equally across the entire dataset:
select /*+ FULL(c) parallel(4) PX_JOIN_FILTER(c) */ * from ckk c join small s on (s.id = c.mod5000_id);
--> ~8.8 seconds (includes all fetching)
-- cell IO uncompressed bytes          - 9,047,941,120
-- cell physical IO interconnect bytes -   859,995,712
-- CPU used by this session            - 0.97
In this case, we still benefit significantly from storage indexes. We only had to read about 1/4 of the data – 9 GB. Projections and bloom filtering further reduced that to the 850 MB finally sent to the database host, which in turn produced some 4,000 rows at about 4,000 bytes each – 16 MB to the end client.
There are some restrictions on bloom filtering; for example, it doesn’t work with outer joins. Why is this the case? In inner joins either table can be “the second” table and have bloom filters applied to it. In Oracle, outer joins can also be performed in either order. However, even when the outer-joined table is second, bloom filters appear to be unusable.
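If you want to check this on your own system, a quick sketch: run the same join as an outer join (the hint is kept deliberately) and look at the plan – the storage(SYS_OP_BLOOM_FILTER(...)) predicate should not appear.

select /*+ FULL(c) parallel(4) PX_JOIN_FILTER(c) */ *
from   ckk c left outer join small s on (s.id = c.id);

select * from table(dbms_xplan.display_cursor());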
The CBO is your best friend and your worst enemy. As you saw in my simple example, there are obvious cases where join-based bloom filtering should happen, but doesn’t. I suspect it has something to do with the original purpose of join bloom filters, and the way costing was implemented.
And finally – bloom filters are based on hash values and are thus very data-dependent. If you have skewed data and there are significant hash collisions with popular values, this will make the join filtering much less efficient at the storage layer.
A blog post is a composition in an informal style, usually characterized by the sharing of ideas, experiences, and opinionated news. This vivid collection of valuable ramblings about database technologies is what makes up this Log Buffer Edition once again.
Owen Allen got a question about the Management Credentials that are used for managing assets in Ops Center.
Nicolas Gasparotto carries on with his Prompt-free Appliance deployment series.
An introduction from the Cubegeek: Big Data – Part One.
When you create a table with a primary key or a unique constraint, Oracle automatically creates a unique index to ensure that the column does not contain duplicate values; or so you have been told.
Big Data, the cloud and mobile computing are reshaping the office of the CFO.
Eli Weinstock-Herman rants: SQL is hard. Who’s to say whether it’s harder for the person that has no technical background or the one that is comfortable with object oriented, procedural, or functional styles and has to cross the great divide to set-based, declarative queries.
So with Denis, we are going to try to figure out how many uppercase and how many lowercase characters are in a column; we are also interested in how many are neither uppercase nor lowercase.
Valentino Vranken is answering a question related to formatting certain things for display in a Reporting Services report.
Scary DBA is praising and giving suggestions for AlwaysOn, a new addition to the SQL Server 2012.
K. Brian Kelley was working with an auditor today who is working through a system with an external audit agency.
MySQL Workbench 6.0 is the new major update of the Development and Administration tool for MySQL. Michael Zinner informs.
Sean Hull shares how to Optimize MySQL UNION For High Speed.
Be careful while executing DCL with replicate-ignore-db=mysql, Ramesh Sivaraman suggests.
Dathan Pattishall is in China and Spreading mySQL/MariaDB/XtraDB Ganglia, GearmanD, Memcache, MongoDB, HAProxy, Nginx, PHP, and Python.
So it has happened at last. MariaDB replaces MySQL in RHEL7.
Someone just asked the following question on Pythian’s internal forum: “The team here is evaluating DB load testing tools (Hammerdb, Orion, SLOB, and Swingbench) and was wondering about our experience in using different tools and what is our opinion?” I have experience using some of the tools mentioned, so I decided to answer in this public blog post, as it could be useful for you too and easier for me to reference in the future :)

Right tool for the right job
Most of the load testing tools could be grouped in the following categories:
- IO subsystem testing tools
- RDBMS Level testing tools
- Application Level testing tools (DB Side)
- Application Level testing tools (Apps Side)
Each successive category addresses a wider range of business testing problems/tasks. However, each successive category also requires slightly more resources and involves a few more challenges. You should clearly define your testing goals before starting to consider the right tool for the job. I would suggest that you consider tools from the first category first, then move to the next category if the tools in the first category don’t satisfy your business requirements. I hope that this overview will give you enough input to start putting your testing plans together.

IO subsystem testing tools
These tools are easy to configure and run. However, they do nothing but send “simple/meaningless/no real data” IO requests to a storage device and measure response times. Most tools have the ability to run read-only tests, read-write tests, random IO tests, sequential IO tests, increase reader/writer counts, warm up the storage’s cache, and run a mixture of the tests mentioned.
- Easy to run
- Short learning curve ( add 2-4 hours to first few runs for an initial learning)
- Low budget (4-16 hours should be enough for a testing project)
- Easy to get and reproduce results
- Easy to compare results with other platforms
- May not represent your RDBMS or Application IO patterns
- Doesn’t test anything else but Physical IO (fair enough)
- orion (ORacle IO Numbers) – This tool is developed and maintained by Oracle. It is part of the latest Oracle DB distributions (e.g. 11.2). Orion was created to test Oracle DB IO workload. It uses some of the typical system calls that the Oracle database kernel uses to retrieve and write data from/to data files.
- Experience: I have a lot of experience running it myself and comparing different storages’ performance. Other DBAs at Pythian have experience running the tool as well. However, the best guru is my friend and Pythian’s CTO Alex Gorbachev. Nobody knows more about the orion utility than Alex does :)
- winsat – Modern Microsoft Windows distributions include the winsat utility, which is capable of doing some basic IO testing.
- iometer – My friend (Artur Gaffanov) suggested iometer as an alternative tool in the same category. This utility was originally developed by Intel and is currently maintained by an open source community.
- Experience: I do not have personal experience with the tool. However, a quick search through the Pythian Knowledge Base (Support Track) retrieved several references, so I would say there is some experience at Pythian.
- References: http://www.iometer.org/
This group of tools works from within an Oracle Instance. Therefore some people (I bet if they read this post they know exactly whom I am referencing ;) say this group of tools can reproduce the RDBMS IO calls much more closely than the first group. I may not 100% agree that the first group doesn’t imitate RDBMS calls closely enough, but there are other benefits that this group of tools provides. Some of them can be used to test other system resources’ performance, like CPU and memory.
- Use RDBMS calls to test system resources (uses database the same way as your application does)
- Can be used to test all main system’s resources (CPU, Memory, IO)
- Relatively easy to run
- Short learning curve ( add 4-8 hours to first few runs for an initial learning)
- Relatively low budget (1-3 days should be enough for a testing project)
- May not represent your Application IO patterns
- Need a good basic understanding of RDBMS concepts
- You may need to spend a bit more time verifying and adjusting an instance/test’s configuration to ensure that it tests exactly what you want to test
- It will take you more time to run the same number of tests than with a tool from the first group.
- You may need to spend a bit more time to configure your Oracle Instance to test performance the way you want
- You need an Oracle Database to be up and running to use those testing tools
- SLOB – The Silly Little Oracle Benchmark from Kevin Closson. The framework uses a simple and typical database operation to put some load on a system. Depending on the amount of memory you allocate to the data cache, it will test either CPU and memory (logical IOs) or storage (physical IO). The tool uses index range scans and table block look-ups.
- Experience: I did spend quite a bit of time running tests using the SLOB utility. There are quite a few folks around the world who are using SLOB for testing Oracle systems’ performance. From my experience I should say that it takes a bit of time to understand the things to be careful with during testing. However, as soon as you know what you are doing, it takes less time to test and compare different systems’ performance.
- Oracle Database I/O calibration feature – Some people know it as DBMS_RESOURCE_MANAGER.CALIBRATE_IO (a minimal invocation sketch follows below). Strictly speaking, the procedure tests physical IO only. However, if you add DBMS_STATS.GATHER_SYSTEM_STATS, which calibrates CPU speed, then with a certain skepticism we can add these tools to this category, as both procedures work within an Oracle Instance. The biggest disadvantage here is the lack of detail on how the tools run tests. They are limited from a configuration perspective too; for example, you don’t have good control over which data files the IOs are issued against.
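Here is a minimal invocation sketch of the calibration procedure. The disk count and latency values are assumptions and should reflect your own storage; the database also needs asynchronous I/O enabled for datafiles before the call will run.

SET SERVEROUTPUT ON
DECLARE
  l_max_iops    PLS_INTEGER;
  l_max_mbps    PLS_INTEGER;
  l_act_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO (
    num_physical_disks => 12,   -- assumption: number of spindles behind the database
    max_latency        => 20,   -- assumption: acceptable latency in milliseconds
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_act_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || l_max_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || l_max_mbps);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || l_act_latency);
END;
/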
Before going any further, I must mention that this category includes tools that require more time to set up, test and get comparable results from. I have been part of several projects that aimed to mimic application behavior. Depending on the tool, the application complexity and the results you want to achieve, a project from this category may take anything from ~20 hours to several months.
In the interest of keeping this blog post reasonably short, I mention some of the tools from this category below with a few comments. If you consider using one of these tools, I would encourage you to run a pre-study, which can itself take from several hours to several days.
- Hammerora – this is a free multi-database testing tool. Originally it was created to run TPC-C and TPC-H application workloads. Today it also supports application activity replay functionality (for Oracle Databases). I have used this tool to run TPC-C tests in the past. A typical use case would be to run a set of tests on two different platforms to get a general idea of the difference in performance characteristics. You may not get results as precise as with the previous group of tools, but you can get a reasonably good general performance comparison.
- Swingbench – from Dominic Giles (Oracle UK). This is a free tool similar to Hammerora that has a set of supplied benchmarks you may use to test the performance of your Oracle-based system. It also allows you to build your own basic testing scripts. Originally the tool was developed to test Oracle RAC based systems. Dominic’s presentation gives a good overview of the tool. Several folks at Pythian have been using this tool. My team members configured and ran the tests recently to compare a source (old Solaris) and target (VM-based Linux) system.
- Simora – from James Morle and Scale Abilities. This tool mines Oracle SQL trace files and generates SQL to be executed to reproduce the load. Obviously you can replay the application load on a copy of the system where the trace files were generated. As a good use case, you may consider using it for testing an application across database version upgrades. As with Hammerora, such testing projects need careful planning (how and when to recover the database to get the right data, how to synchronize data updates to make sure that no application constraints are violated, etc). I would estimate such a testing project at anything from a week to 8 weeks depending on the complexity and other parameters.
- Oracle Real Application Testing – This is an Enterprise Database option from Oracle that allows you to record a load on a source system and replay it on the destination environment. I have tried this functionality several times in test environments. It works well. However, RAT-based projects have the same common challenges as the other two tools. It is worth mentioning that the product isn’t free and needs additional licenses. That said, I have heard about a few clients who successfully used the product in their testing/migration projects.
Testing tools in this category mimic end users’ behavior as users would work with the system from their workstations. That means that tools like HP LoadRunner interact with application servers using a variety of protocols, e.g. HTTP, HTTPS, Oracle Forms, etc., to test all components of the system at the same time. Needless to say, in order to do this, test scenarios need to be scripted, and test data prepared, tested and maintained based on changes on the application side. I personally was part of such a project. The project length was several months. However, we achieved good results and spotted several critical inefficiencies in custom application code.

Conclusions
As I mentioned at the beginning of this post, you need to find the right testing tool for the task you have been given. I hope that this overview helps you get an idea of the options available and the resources you may line up for your testing activities. I would appreciate it if you share your experience with any of the tools mentioned or, even better, mention some good tools that I didn’t list in this blog post in the comments section below.
Here is my response to that question:
Standby is standby. It is a technique to support disaster recovery. And it is still called Standby Database, not Data Guard, even now.
For a long time, in order to automate the disaster recovery technique, people have written scripts. For Oracle Database Server, Laurence To and the Oracle SPG group assembled a number of these scripts back with Oracle7 and Oracle8 and released them as a feature of the Enterprise Edition called Data Guard, which initially consisted only of the 'best practices' scripts. The core feature was, and still is, available at no additional cost.
Data Guard has since progressed significantly and become more of a set of executables, rather than scripts. But the primary purpose still is to automate the steps of synchronizing the standby and automating the switchover/failover.
Standby is standby. With Oracle Database Server, it consists of two databases: the first or primary actively handling transactions and query requests; the 'standby' being available to take over the load if the primary fails.
Over the years, we in the industry have refined the term to distinguish between Cold and Hot standby, the difference being in how much effort is involved, and how quickly the standby environment is available for use.
A Cold Standby environment may have the software installed, but the environment does not use any CPU cycles to keep the data in sync. In general, that will require some sort of restore from backups. Since the Cold Standby does not use CPU cycles, Oracle has not traditionally charged for it.
A Hot Standby environment keeps the data in sync fairly closely to the primary. The more similar the standby environment needs to be to the primary at the data and configuration level, the more it will cost to do that, and the more complicated the sync method needs to be. The Hot Standby does use CPU cycles, and therefore must be licensed the same way as the primary unless you have an exception within YOUR Oracle license contract.
Oracle database server - whether Express Edition, Standard Edition or Enterprise Edition - has the ability to perform crash and media recovery from intact redo log files. Oracle's hot standby capability is simply continuous media recovery. However, that requires the redo information from the primary to be sent to the standby when it is available, and it requires the standby to apply the redo once it has arrived.
The Enterprise Edition feature called Data Guard is simply a 'guardian application' that detects when the redo information is available, extracts it, transmits it, and controls the application at the standby system(s). What it does can also be done manually, or through your own scripts. Indeed, in Standard Edition, DbVisit (http://www.dbvisit.com) has created their own commercially available executable that does the same thing and more.
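To illustrate, here is a bare-bones sketch of the manual approach; the paths and host names are placeholders, and a real script would add error checking and gap detection.

-- On the primary: switch out the current redo and ship the resulting archived log
SQL> alter system archive log current;
$ scp /arch/primary/*.arc standby-host:/arch/standby/

-- On the standby (mounted, not open): apply whatever has arrived
SQL> recover standby database;
-- or let it apply logs without prompting:
SQL> recover automatic standby database;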
Data Guard has been enhanced to allow several 'levels' of similarity, from "minimum data loss" through "absolutely no loss permitted". What used to be scripts is now compiled executables with many test points and with the ability to control the database server.
And the database kernel has been modified to allow the standby server to be opened in read-only while applying the redo information which may happen under the control of the Data Guard application. This is called Active Data Guard, and it DOES require additional licenses.
Also check out the Software Investment Guide at http://www.oracle.com/us/corporate/pricing/index.html
And remember: the final authority is Oracle, not me. "I read it on the Internet" is a shoddy defense in a contract dispute and will likely NOT be accepted by the Judge in a Court of Law.
Relational Theory for Computer Professionals What Relational Databases Are Really All About by C.J. Date
I was talking about a book titled Relational Theory for Computer Professionals: What Relational Databases Are Really All About by C.J. Date.
This is a good book for people who are interested in relational databases and who want to know what exactly relational databases are all about. Readers should, however, have a basic understanding of relational databases. If you don't, you should read other books by C.J. Date first. Why do I say that? I think this book is not easy to read if you have no basic idea about relational databases. By the way, you might read the "Free Sampler" first.
The book has 3 parts. Part I, "Foundations," is concerned with the relational model. Part II, “Transactions and Database Design,” is devoted to a discussion of certain material that (a) you do need to know in order to understand what databases in general, and relational databases in particular, are all about, but (b) doesn’t really have all that much to do with the relational model. Part III, “SQL,” then revisits the material of Part I and shows how the concepts discussed in that first part are realized in the SQL language.
In the book, after you finish reading each chapter there are exercises to help you improve your understanding, which is very cool. This book would be useful in college, but I think it should be an advanced course because it requires a basic understanding of relational databases and SQL. Something pops into my head: watch Relational Theory for Computer Professionals: A Hands-On Video Workshop by C.J. Date; that will help you a lot with the book.
Written By: Surachart Opun http://surachartopun.com
To be precise, I wonder if the OUTERmost tracks of a spinning HDD are faster than the INNERmost tracks. Should we put physical-IO-performance-sensitive data on the OUTERmost parts of the disk and less critical data on the INNERmost parts, as several vendors suggest? Well, I didn't find a better solution than to grab all the HDDs I had and start testing :) Yes! It is a work in progress ….

Disclaimer
Before you start criticizing my testing results, I would like to make a few points clear:
- I am not an expert in the hardware space
- I am just a curious Oracle Administrator open to any suggestions on how to improve the testing results and get closer to clarifying things
- Some of you will say that it is a useless exercise, as nowadays we don’t have any control over which areas of a single HDD data is placed on
- Well, in some rare configurations like Exadata or Oracle Database Appliance we actually do have this power and can possibly impact the IO performance
- In other cases it is sometimes good to understand how things work in order to explain why there is a certain performance impact
- As I am an Oracle DBA and Oracle databases are most often random-IO bound, I have focused my attention on random (8k) IO testing
- I do believe there are better ways to test HDDs. Unfortunately I don’t have enough knowledge about other options. I am open to your suggestions on how to do it in a better way.
- Just keep in mind that I have Windows 7 (64 bit), Dell Latitude E6410 for this testing
- At the moment I am waiting for a SATA adapter to arrive. I will re-run some of the tests I did to confirm or adjust the testing results.
- This is a work-in-progress project and I am not ready to make final conclusions (if I will be ready to make those at all ;)
Based on previous experience I expect that
- OUTERmost tracks will not be that much faster than INNERmost tracks
- The worst performance should be when data is accessed from both OUTERmost and INNERmost tracks at the same time
I have focused my attention to 3 tests:
- Data on OUTERmost tracks
- Data on INNERmost tracks
- Data distributed equally through full HDD surface
Note: I made a fourth test where I ran a random IO test accessing data from the OUTERmost and INNERmost tracks at the same time. The test results were very close to the full surface tests, therefore I do not provide those in this blog post.

How and what did I test

Hardware
To start, I took 7 HDDs that I happened to have and 2 SATA to USB adapters.

Software
I have used 2 options to test and confirm IOPS results:
- Windows 7 has a nice little (silly ;) utility, winsat. It didn’t take me too long to figure out how to make it do random 8k reads:
winsat disk -ran -ransize 8192 -read -drive E
- Oracle 11gR2 comes with Oracle’s native orion (ORacle IO Numbers) utility. I just installed it and used the command below to test random IOs:
orion -testname e_hdd -duration 20 -cache_size 0 -write 0 -num_disks 1 -run advanced -size_small 8 -matrix row -num_large 0
# e_hdd.lun \\.\e:

Assumption
To keep things simple, I have assumed that the HDD controller (or Windows, or whoever… remember, I am not an expert) allocates space starting from the OUTERmost (fastest) tracks of the disk. Therefore I allocated 1 GB of unformatted space to the E: drive first, then filled all the space but the last 1 GB with an empty partition and created a 1 GB G: drive. My assumption is that partition/drive E: is located on the OUTERmost tracks and partition/drive G: on the INNERmost (slowest) tracks. As you will see from the results below, this assumption isn’t correct for all HDDs.
Please let me know (using the comment section below) if there is a better way to ensure that a partition is created on the OUTERmost tracks of an HDD, or at least how to check which tracks a partition is created on (if it is possible).

Results

“Good” results
The following reflects IOPS testing for the 3 HDDs that confirmed my expectations. It is clear that there is no more than a 10% gain between OUTERmost and INNERmost tracks. However, there is a significant performance impact if an HDD's head has to move across the whole HDD surface to access data.

HDD Name                                       Outer  Inner  Full  Outer vs Inner  Outer vs Full
WD 2.5" 1TB 5400RPM / WDBBEP0010BRD              127    121    64      4.96%           98.44%
Hitachi 3.5" 320GB 7200RPM / HDT725032VLA380     133    124    58      7.26%          129.31%
WD 2.5" 160GB 5400RPM / WD1600BEVT               112    103    61      8.74%           83.61%

“Other” results
A careful reader will notice that I haven’t provided all the test results so far, and you are right. The reason is that the rest of the results don’t confirm my theory :). Have a look at the other 3 HDDs’ test results below (note: one of the 7 HDDs had data on it, therefore I excluded it from the OUTERmost and INNERmost tracks testing).

HDD Name                                       Outer  Inner  Full  Outer vs Inner  Outer vs Full
HGST 2.5" 1TB 7200RPM / AT-0J22423               107    132    69    -18.94%           55.07%
Seagate 2.5" 250GB 7200RPM / ST9250410AS          88     71    63     23.94%           39.68%
Seagate 3.5" 1TB 7200RPM / ST31000333AS          141     92    69     53.26%          104.35%
The 3 HDDs above either showed results that I can’t explain as of now (AT-0J22423), or the performance difference between OUTERmost and INNERmost tracks is more significant than for the first set of HDDs. However, for both sets it is fairly clear that if an HDD moves its head across the whole surface there is a significant performance penalty. Those are expected results, aren’t they?

Intermediate conclusions
There are some HDD models where there is no significant performance difference between accessing data located on the OUTERmost or INNERmost tracks. However, in some cases IO performance could be up to 130% slower if the HDD head has to travel across the whole surface to return data.
This has a few possible practical implications:
- If someone states that a partition is created on the OUTERmost tracks of an HDD, it doesn’t necessarily mean IO is significantly faster from that partition than from any other region of the HDD
- It could significantly slow down IO operations if data is to be accessed from both OUTERmost and INNERmost tracks (e.g. if DATA is located on the OUTERmost tracks but FRA on the INNERmost tracks)
- You may find that your storage performance degrades the more “active” data you put on your hard drives (i.e. the fuller the hard drives get)
Keep in mind that there are possible exceptions. Based on my initial tests, there are some HDD models where the difference between OUTERmost and INNERmost tracks is significant.

Please help to improve the test results
As I stated at the beginning of this post: a) I am not an expert in this space; b) this is a work-in-progress project and I am looking for better ways to test random IOs; c) I need your help to understand why there are exceptions, and I welcome suggestions on how to adjust the HDD testing process to improve the results and get closer to good conclusions.
I’ve been thinking about how to use my blog to describe different individuals who have helped me learn about Oracle performance tuning. I was thinking about having a single blog entry or page describing several people but have decided that will be too long and overwhelming to read or to write. So, I got the idea of writing about a single individual at a time. Now, the challenge about writing about someone is that this is only my perspective of them and I could have some details wrong. So, if there are any errors or negative implications then that is on me and no reflection on the individual I’m describing. I’m including this person because their work has helped me in my job as an Oracle DBA in the performance tuning area so the intent is to be positive and to encourage other people to benefit from their work. In most cases these individuals have free or inexpensive materials available that can be of great worth to an Oracle DBA who is working on tuning.
So, I’m starting with Craig Shallahamer because he was the first individual who really helped me learn about Oracle performance. I started as an Oracle DBA in December of 1994 while working with Campus Crusade for Christ. At CCC we migrated all of our applications off of a small IBM mainframe and on to an Oracle database running PeopleSoft applications. But, at some point our new donations system was having bad performance. It was slower than the mainframe system we had just replaced. A couple of managers in our IT area ran into Craig and connected him up with me. It ended up that we were saturating one of our disk arrays with random I/O while our other arrays were underutilized. Craig’s free papers on Oracle waits and the use of operating system monitoring utilities such as sar -d really helped me understand how to diagnose our disk I/O issues.
If you want to check out Craig’s resources go to his OraPub website. I have read several of his papers – most notably ones about Oracle waits and about response time/queuing theory. Also, I’ve learned from his free tool called OSM which records V$ table values and operating system tool outputs. Craig sells a new tool called Stori which I know he is excited about, and he has a number of other free tools as well. I haven’t been able to take one of his paid courses but I’ve read the materials for his firefighting class several years ago and it was very helpful. I also read Craig’s first book which is on forecasting performance.
It is hard to do justice to someone’s work and I’m not sure I’m explaining it that well. I think of Craig in terms of operating system/hardware, Oracle internals, waits, mathematical models, performance tuning theory. It has been helpful for me to just take one of his excel spreadsheets and play with the values and think about what the curve means in terms of queuing and performance. Most DBAs I know aren’t that comfortable with operating system utilities like sar and don’t know what values to look for. And, few people seem to understand queuing theory and other mathematical models of Oracle systems.
So, Craig is a good source for Oracle performance information that may not be readily available to the average DBA unless they know where to look. I hope others find his work as helpful in their jobs as I have.
Amazon Web Services has offered Relational Database Service as part of their cloud offering since 2011. These days, RDS provides easy to deploy, on-demand database-as-a-service for MySQL, Oracle and SQL Server. When you compare it to essentially any other method of hosting and licensing Oracle, RDS seems to have a variety of really appealing qualities:
With RDS/Oracle, you don’t really need a DBA to take care of your database. With the notable exception of tuning, most of the DBA tasks, such as database creation and configuration, backups, upgrades and disaster recovery are simply features of the service.
Oracle databases on RDS can be created with “license included.” This means that for certain Oracle editions, you can pay as you go for your license. Currently this is limited to Standard Edition One, but rumours abound that higher editions, including Enterprise Edition, will be available with license-included in the near future.
The Oracle versions available on RDS are limited to a few modern, stable releases. This keeps customers from encountering oddball bugs and version quirks.
So far, RDS seems like a clean, simple, elegant solution, and it is. It clearly has a place with certain enterprises that use or want to use Oracle. So the question you might have is, “Is it right for me?” Since no solution is perfect for every deployment, it is helpful to explore the factors that can help you decide if RDS/Oracle will fit your needs.
Availability of technical personnel: If you already run an enterprise that employs DBAs, there may not be as great an upside to deploying a largely DBA-free solution like RDS. On the other hand, if your in-house database expertise is not deep, RDS has the advantage of low technical barriers to entry. With RDS/Oracle, provisioning, backups, monitoring, upgrades and patching are managed and controlled via the web API. The major missing component is tuning. With RDS/Oracle, you still need to have some knowledge of Oracle and SQL tuning to run a successful RDS service.
Tuning: While we are on the topic, let’s discuss Oracle tuning and RDS. As with conventionally-hosted databases, diagnostic pack (and ASH / AWR) is available and supported, as long as you are running Enterprise Edition and you are licensed for those options. AWS even provides Enterprise Manager DB Control as an option. For all other editions however, there is a major hitch. Statspack, the tried and true alternative to AWR, is not supported on RDS. You can still query the v$ views to access current and aggregated wait event information, but the lack of Statspack support is a big stumbling block. SQL tracing and event 10046 (and many other diagnostic events) are available on RDS, and a recent enhancement has provided access to these files via the web API. Previously, access to alert and trace files was via external tables and SQL only.
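For example, even without Statspack you can still get a rough top-waits picture from the cumulative v$ views; a sketch of such a query (plain dictionary views, no extra privileges assumed):

select * from (
  select event, total_waits, round(time_waited_micro/1e6, 1) time_waited_sec
  from   v$system_event
  where  wait_class <> 'Idle'
  order  by time_waited_micro desc
) where rownum <= 10;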
Privileges: RDS grants you limited management privileges for Oracle, but it stops short of the SYSDBA role, which would have total control over the system. Some applications require SYSDBA, especially during schema installation. If you have an application that absolutely cannot function without SYSDBA privileges, RDS is off the table. On the other hand, most of the application schema deployment scripts that purport to need SYSDBA privilege actually need no such thing. In many of these cases, minor changes to schema build scripts would make them RDS-compatible.
Loading/migration: Most Oracle customers are accustomed to migrating their databases from one hosting solution to another via datafile copy. In the case of very large databases, migration by physical standby switchover is the method of choice. With RDS, there is no access to the underlying filesystem, so datafile-based migration methods are impossible. Since the only access to RDS/Oracle is via SQL*Net, data must be loaded using database links. This means using Data Pump, DML over a database link, or materialized views. This final option is particularly interesting. If they first pre-create all of the tables and indexes to be migrated in RDS, customers can then build fast-refresh materialized views on the tables, and continually refresh them from the source system. When the customer wants to cut over to RDS, it can be accomplished simply by stopping application access to the source, refreshing all MVs one more time, and converting the MVs in RDS back to tables by dropping the MV objects. While this method is prone to problems stemming from schema design, high rates of change and large transactions, it is likely the best and only solution for a minimal-downtime migration to RDS/Oracle.
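To make the materialized-view approach concrete, here is a minimal sketch for a single table; the table and database link names (EMP, SRC_LINK) are placeholders, and the source table needs a materialized view log for the fast refresh to work.

-- 1. Pre-create the target table in RDS with the source structure
create table emp as select * from emp@src_link where 1 = 0;

-- 2. Layer a fast-refresh materialized view over the prebuilt table
create materialized view emp
  on prebuilt table
  refresh fast
  as select * from emp@src_link;

-- 3. Refresh repeatedly until cutover
exec dbms_mview.refresh('EMP', method => 'F');

-- 4. At cutover: stop application access to the source, do one final refresh,
--    then drop only the MV definition, keeping the underlying table
drop materialized view emp preserve table;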
Database versions: If you are planning to migrate to RDS from a conventional hosting solution, and you don’t already use one of the Oracle versions supported by RDS, your migration to RDS will also amount to a database upgrade. There is nothing fundamentally wrong with this, since you will be moving to a version well tested and certified by Amazon. On the other hand, any third party (or homegrown) software will have to be checked and tested to make sure it is runs and is supported on one of the available versions under RDS. Also be aware that database upgrade can come with their share of issues. The most common upgrade issues are small numbers of SQL statements that perform worse after upgrade, because of optimizer regressions.
Backup and recovery: RDS/Oracle backs up the database using storage snapshots, and boasts point-in-time recovery. There are some clear advantages to this method. Backups complete quickly, and you can execute them as often as you want. On the other hand, because Recovery Manager is not supported, some of the nice things you can do with that tool are missing from RDS. For instance, simple small repairs such as single block, single datafile, or single tablespace recovery are impossible with RDS. Recovery using storage snapshots is an all-or-nothing proposition.
High availability and disaster recovery: On the plus side, RDS/Oracle provides a very easy way to set up redundancy across Amazon availability zones (which you can think of as separate datacenters in the same region). This configuration, called multi-zone, provides synchronous storage replication between your production RDS database and a replica in one of the other zones within the same region. For the MySQL version of RDS, the replica is readable. However, this is not so for Oracle or SQL Server. So multi-zone RDS provides redundancy for Oracle, but not a read replica. Significantly, because nearly all viable replication options are unsupported, including Data Guard (standby database) and Streams, RDS does not provide customers with a cross-region DR solution.
Limitations on features, parameters and character sets: RDS/Oracle has enabled and supports a broad range of Oracle features, parameter settings and character sets. However, a subset of each of these categories is not supported, either because of how RDS is architected, or because Amazon has not seen the demand for those things to be great enough to merit the engineering effort involved in supporting them. Depending on the needs of the application, any limitations arising from the following lists may or may not affect you.
Features supported / not supported on RDS/Oracle (note that RAC is not supported on EC2 either)
Character sets supported (note that this list does not include UTF8 or WE8ISO8859P1, AKA Latin-1, both very common)
The available parameters and allowed settings are dictated by the edition, options and version of Oracle you have deployed. The complete list can be obtained via the RDS web API.
To summarize, Amazon RDS for Oracle is a really exciting option. The right application with limited requirements might find RDS to be a totally suitable platform for hosting a database. However, as the enterprise requirements and resulting degree of complexity of the database solution increase, RDS is gradually ruled out as an option. For larger and more complex deployments, the issues around tuning, migration and HA/DR completely eliminate RDS as a solution. For these more complex cases, Amazon’s Elastic Compute Cloud (EC2) can provide a much broader range of possible solutions. I would never be one to discount anything Amazon Web Services is offering. Any deficiencies I call out in a blog posting like this one will probably be made obsolete as Amazon announces new RDS capabilities.
Would you deploy your databases on Amazon RDS for Oracle? Why or why not?
The onslaught of blogs is hard to ignore and harder to underestimate. Blogging has become part of the routine of technologists everywhere, including database professionals and evangelists. This Log Buffer Edition appreciates that, and collects blogs from Oracle, SQL Server and MySQL.
It has often been mentioned (in at least three articles on this blog) that Oracle 11 can, under certain circumstances, perform large serial full table scans (FTS) using direct path reads – most recently, Tanel Poder pointed out that since 11.2 the decision for this access path…
Michael Snow highlights an active WebCenter partner with a recent success in the transportation vertical. Read on below to learn more about Redstone Content Solutions along with a great profile of their customer, Standard Forwarding LLC.
Mdinh has over 100 saved sessions in PuTTY on Windows and was looking for a way to move those sessions to Linux.
Data Pump is much faster than the old exp and imp client commands. It also has a feature that makes it faster still by trading extra resource consumption (CPU) for shorter elapsed time; this feature is the “parallel” option.
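The same option is exposed through the DBMS_DATAPUMP API; here is a minimal sketch of a schema-mode export with four parallel workers (directory name and degree are illustrative):
SQL> declare
  h number;
begin
  h := dbms_datapump.open(operation => 'EXPORT', job_mode => 'SCHEMA');
  -- the %U template generates one dump file per parallel worker
  dbms_datapump.add_file(h, 'exp%U.dmp', 'DATA_PUMP_DIR');
  dbms_datapump.set_parallel(handle => h, degree => 4);
  dbms_datapump.start_job(h);
  dbms_datapump.detach(h);
end;
/
The command-line equivalent is the PARALLEL=4 parameter of expdp combined with a %U dump file template.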
SQL Gone Bad – But Plan Not Changed?
Just when you thought you got a handle on SQL Server 2012, here comes the next version! SQL Server 2014 was announced yesterday during the TechEd North America 2013 Day 1 Keynote.
Melissa Coates is integrating data from Data Explorer into PowerPivot.
How Do You Handle Changes? Chris Shaw asks.
Mike Hillwig was looking at the syntax of DBCC SHRINKDATABASE today and came across a little gem.
Richard Douglas has an interview with the chairman of SQLRelay 2013.
Tokutek created the iiBench benchmark back in 2008. The point of the benchmark is to measure the performance of indexed insertions over time.
Madrid MySQL Users Group worth creating?
When looking at high availability for any CMS, and particularly for Drupal, the list of contenders for part or all of the solution is growing and can be daunting.
Engine condition pushdown is a MySQL-internal mechanism intended to avoid sending non-matching rows from the storage engine to the SQL layer.
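The feature targets the NDB (MySQL Cluster) storage engine, and since MySQL 5.5 it is controlled through the optimizer switch; whether a condition was actually pushed shows up in EXPLAIN. A quick sketch (table and column names are hypothetical):
mysql> SET optimizer_switch = 'engine_condition_pushdown=on';
mysql> EXPLAIN SELECT * FROM ndb_t WHERE non_indexed_col = 42;
-- for an NDB table, the Extra column reads "Using where with pushed condition"
-- when the filter is evaluated on the data nodes rather than in the SQL layer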
Ever wondered how a database cluster ticks? Here’s how!
I often come across the requirement to quickly propagate changes from cloud platforms like Salesforce.com to on-premise data stores. You can do this with web services, but that is not middleware, and it requires coding.
How about doing it with a data integration tool?
Informatica Corporation's Informatica PowerCenter can achieve this by using the CDC (Change Data Capture) feature of the Informatica PowerCenter Salesforce connector when Salesforce is the source in a mapping.
The configuration is simple. All you really have to set up is two properties in the Mapping tab of a Session task in Informatica Workflow Manager.
These are the properties:
- CDC Time Limit property to -1
- Flush Interval property to 60 seconds (60 seconds is the minimum)
And here is what these two settings mean from the PowerCenter PowerExchange for Salesforce.com User Guide:
CDC Time Limit
Time period (in seconds) that the Integration Service reads changed Salesforce data. When you set the CDC Time Limit to a non-zero value, the Integration Service performs a full initial read of the source data and then captures changes to the Salesforce data for the time period you specify. Set the value to -1 to capture changed data for an infinite period of time. Default is 0.
Flush Interval
Interval (in seconds) at which the Integration Service captures changed Salesforce data. Default is 300. If you set the CDC Time Limit to a non-zero value, the Integration Service captures changed data from the source every 300 seconds. Otherwise, the Integration Service ignores this value.
That's it, you don't have to configure anything else!
Once you set these properties in the Mapping tab of a session, save, and restart the task in the workflow, the task will run continuously, non-stop. The connector will poll the Salesforce org continuously and propagate any changes you make in Salesforce downstream to the on-premise database system, including INSERT, UPDATE and DELETE operations.
SFDC CDC implementation in Informatica PowerCenter
When I heard that Intel announced their own Hadoop distribution, my first thought was “Why would they do that?”. This blog post is an attempt to explore why anyone would need their own Hadoop distribution, what Intel can gain by having one, and who is likely to adopt Intel’s distribution.
Why does anyone need a Hadoop distribution? Hadoop is open source, and it would make sense for RedHat and Canonical to package Hadoop and add it to their own distributions – just like they do for MySQL and other open source applications. Instead, we have Cloudera, Hortonworks, MapR, EMC, Intel and probably many more, each with their own Hadoop distribution.
When you try to pick a Hadoop distribution, the first thing you’ll notice is that each one has a slightly different set of components. Cloudera includes Flume and Sqoop, which Hortonworks doesn’t; Hortonworks includes Ambari and a platform by Talend. Having a distribution gives companies a chance to define Hadoop. This matters a lot to new adopters, and especially to larger companies – we look at the distribution as an indication of which components are safe to use, and are reluctant to add components outside the distribution. As an example, Oozie and Azkaban are similar tools performing the similar task of managing jobs in Hadoop. In my experience, Oozie is far more popular, not because it’s a superior tool, but because it is part of the popular Cloudera distribution.
There’s a reason Hadoop users prefer to use a distribution as a whole rather than mix and match toolchains: with the many components in a Hadoop production system, matching versions so that all the tools work well together is a challenging task. Companies that release their own distribution pick the correct versions, test a lot, and furiously patch to make sure all the components work as one whole. This is somewhat similar to the way Oracle will announce that 11g is supported on RHEL5 but not RHEL6, except much more so. Of course, RedHat could do the same, as they do for all software in their Linux distribution, but as you can see, they don’t.
When users choose a well-known distribution, they don’t just get a well-chosen and tested mix of components; they also get the option of purchasing support for it. That’s the main benefit for companies selling their own Hadoop distribution: you go to all the trouble of picking components and testing them, so you are well positioned to provide support for them. Other companies can of course sell support for the same distribution – Pythian will happily support any Hadoop distribution you choose – but the owner of the distribution has an advantage, since it is much more difficult for third-party supporters to offer bug fixes in Hadoop code.
Of course, all this doesn’t apply to Intel, who show no intention of selling support.
So why would Intel need their own distribution?
Let’s start with the basics: Intel sells CPUs. That’s their main line of business. But they also write software. For example, Intel’s C compiler is first rate – I used to love working with it. Intel wrote their own compiler so that executables generated with it will always use the best Intel features. This means that popular software runs faster on Intel CPUs, because their performance features are used even when developers don’t know about them (the Oracle optimizer attempts to do the same, but with less success).
How does this apply to Hadoop? Clearly Intel noticed that Hadoop clusters tend to have lots of CPUs, and they are interested in making sure that these CPUs are always Intel – possibly by making sure that Hadoop runs faster on Intel CPUs.
Let’s look at Intel’s blog post on the topic: http://blogs.intel.com/technology/2013/02/big-data-buzz-intel-jumps-into-hadoop
“The Intel Distribution for Apache Hadoop software is a 100% open source software product that delivers Hardware enhanced performance and security (via features like Intel® AES-NI™ and SSE to accelerate encryption, decryption, and compression operation by up to 14 times).”
“With this distribution Intel is contributing to a number of open source projects relevant to big data such as enabling Hadoop and HDFS to fully utilize the advanced features of the Xeon™ processor, Intel SSD, and Intel 10GbE networking.”
“Intel is contributing enhancements to enable granular access control and demand driven replication in Apache HBase to enhance security and scalability, optimizations to Apache Hive to enable federated queries and reduce latency. ”
Intel is doing for Hadoop the same thing it did for C compilers – making sure the software uses the best hardware enhancements available in Intel’s CPUs and other hardware components. The nice thing is that the enhancements are available as open source – Intel doesn’t care that the software is free, since they are selling the hardware!
And since it’s open source, we can take a peek at Intel’s Github repository: https://github.com/intel-hadoop
What can we find there? We have Project Rhino (https://github.com/intel-hadoop/project-rhino) with Intel’s security enhancements and Project Panthera (https://github.com/intel-hadoop/project-panthera) offering advanced SQL support for Hive and improved performance for HBase. There’s also a benchmarking suite (https://github.com/intel-hadoop/HiBench) and a performance analyzer (https://github.com/intel-hadoop/HiTune).
Improved Hadoop security is at the top of the list of things the enterprise needs from Hadoop (http://tdwi.org/Blogs/Philip-Russom/2013/04/Hadoop-Functionality-that-Needs-Improvement.aspx) – mixing Intel’s well-known encryption support on the CPU with the enterprise requirement for improved security is a very smart move in my book. I know that security is much more than just fast encryption, but if Intel can leverage their security brand to create a strong security model for Hadoop, it’s a welcome effort. The security offerings are promising indeed – key management, unified and integrated access management, and possibly even replacing Kerberos with something better integrated? Sign me on, and from what I hear, my customers are ready to sign too.
None of these has been officially released yet, and I haven’t tried to compile and run the code, so I can’t say much about what is actually delivered. Perhaps someone has and can comment. But I did notice another interesting detail. The Project Rhino README lists all the Hadoop components that Intel intends to include in its unified and integrated security model:
- Core: A set of shared libraries
- HDFS: The Hadoop filesystem
- MapReduce: Parallel computation framework
- ZooKeeper: Configuration management and coordination
- HBase: Column-oriented database on HDFS
- Hive: Data warehouse on HDFS with SQL-like access
- Pig: Higher-level programming language for Hadoop computations
- Oozie: Orchestration and workflow management
- Mahout: A library of machine learning and data mining algorithms
- Flume: Collection and import of log and event data
- Sqoop: Imports data from relational databases
Look familiar? That’s more or less identical to Cloudera’s Hadoop distribution. Why did Intel choose to use CDH? Possibly because of its focus on the enterprise toolchain – those are the tools you’ll need to build an ETL pipeline and a data-science practice on Hadoop. If Intel’s unified solution doesn’t include these tools, getting the enterprise adoption they are looking for will be a much bigger challenge. However, it does open new questions: will Intel offer support for the distribution, or will they leave that to Cloudera, who already support all the components? And can you have a “unified security solution” that leaves Hortonworks and MapR completely out of the plan?
It’s far too early to tell where this will all go, but so far Intel has made interesting decisions that make me look forward to the day when they’ll have more to download than just a PDF. If you have thoughts on where this is all going, I’m looking forward to reading your comments too.
Here's a demo. (tested in 188.8.131.52)
Start with a table and a function-based index (FBI):
SQL> create table test_10013177 (col_1 number(6,4));
SQL> insert into test_10013177 values (12.3456);
1 row created.
SQL> create index index_10013177 on test_10013177(col_1+1);
SQL> exec dbms_stats.gather_table_stats('','TEST_10013177');
PL/SQL procedure successfully completed.
SQL> select col_1+1 from test_10013177;
SQL> select max(col_1+1) from test_10013177;
SQL> select /*+ full (t) */ max(col_1+1) from test_10013177 t;
Funnily, the results in the last query are truncated!
Log Buffer’s collection of blogs from across the database world is all about nifty features, cool tricks, pragmatic tips, real-world war stories, and much more.
Ops Center uses an Oracle Database to store product data. Normally you’ll use the Ops Center UI and won’t access the database directly, but it can occasionally be useful to see the database contents directly.
The “Oracle Fusion Financials 11g General Ledger Certified Implementation Specialist” certification is designed for implementation consultants who possess a strong foundation and expertise implementing and configuring the Fusion General Ledger application.
In Oracle, when you compare a character and a number in a SQL WHERE clause, Oracle converts the character value to a number rather than the other way around.
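The direction matters: with a character column, the conversion is applied to the column itself, which can break queries and defeat indexes. A quick illustration (hypothetical table):
SQL> create table t (code varchar2(10));
SQL> insert into t values ('42');
SQL> insert into t values ('ABC');
SQL> select * from t where code = 42;
-- Oracle evaluates TO_NUMBER(code) = 42, so the row 'ABC' raises
-- ORA-01722: invalid number
Quoting the literal instead (where code = '42') keeps the comparison in character terms and avoids the conversion entirely.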
Enterprises look to new tools and tactics for big data ROI.
New Basel III Directives Make Sure You Are Not Kicking the Can Down The Road.
Greg Low was at a site where they decided to change the service account for the SQL Server services on a set of systems. After changing the service accounts, SQL Server restarted just fine on all machines except one.
Stored procedure debugging is a feature that has existed in SQL Server Management Studio (SSMS) for some time (since before SQL Server 2005, I believe), yet in Jamie Thomson’s experience not many folks make use of it; indeed, many folks aren’t even aware it exists.
As soon as Davide Mauri had the chance to test HDInsight on Azure, he promptly started using it.
INSERT SELECT is Broken. Ask Microsoft to Fix It. Adam Machanic opines.
Generally Joe Chang does not get involved in high-end SAN systems. It is almost impossible to find meaningful information on the hardware architecture from the vendor.
Shard-Query inserts data into a “coordinator” table when answering queries. When there is a GROUP BY on the original query, the coordinator table contains a UNIQUE KEY over the GROUP BY attributes.
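That unique key is what allows partial aggregates arriving from each shard to be merged incrementally. A minimal sketch of the general pattern (illustrative only, with hypothetical names – not Shard-Query’s actual code):
-- Coordinator table for: SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
CREATE TABLE coord (
  customer_id INT NOT NULL,
  sum_amount  DECIMAL(18,2) NOT NULL,
  UNIQUE KEY (customer_id)
);
-- Each shard's partial SUM is folded in via the unique key:
INSERT INTO coord (customer_id, sum_amount)
VALUES (101, 250.00)
ON DUPLICATE KEY UPDATE sum_amount = sum_amount + VALUES(sum_amount);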
It’s now two… three… four weeks since the Percona MySQL Conference finished – and there are still many thoughts and new ideas running through Dmitri’s mind, inspired by this great MySQL tech + community event.
Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality.
The main benefit of using GTIDs is to have much easier failover than with file-based replication.
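With GTIDs (MySQL 5.6), repointing a replica no longer involves hunting down binary log file names and positions. A sketch – host and credentials are placeholders:
mysql> CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;
With MASTER_AUTO_POSITION = 1, the replica negotiates the correct starting point from its own set of executed GTIDs.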
MySQL can optimize aggregate functions like MIN and MAX as long as the columns specified are indexed. This means that, in the case of MIN and MAX, the optimizer should be able to identify the highest and lowest values of an indexed column from the B-Tree index.
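This is visible in the execution plan: when both values can be read straight off the ends of the B-Tree, the table itself is never visited. A sketch (table and column names are hypothetical, and indexed_col is assumed to be the leading column of an index):
mysql> EXPLAIN SELECT MIN(indexed_col), MAX(indexed_col) FROM t;
-- the Extra column shows "Select tables optimized away":
-- the values come from the first and last entries of the index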