
DBA Blogs

MySQL Sounds Like Fun

Pythian Group - Mon, 2015-03-16 07:48

I love finding out new things about MySQL. Last week, I stumbled on a query that had the phrase “SOUNDS LIKE” in it. Sounds made-up, right? Turns out MySQL uses the well-known “soundex” algorithm, common to most databases and popular in genealogy use cases.

The basic idea is that words are encoded according to their consonants. Consonants that sound similar (like M and N) are given the same code. Here’s a simple example:

(“soundex” and “sounds like” are different ways of doing the same thing in these queries)

MariaDB> select soundex("boom");
+-----------------+
| soundex("boom") |
+-----------------+
| B500            |
+-----------------+

MariaDB> select soundex("bam");
+----------------+
| soundex("bam") |
+----------------+
| B500           |
+----------------+

MariaDB> select soundex("bin");
+----------------+
| soundex("bin") |
+----------------+
| B500           |
+----------------+

This simple example isn’t terribly useful, but if you were trying to find similar, but differently spelled, names across continents, it could be helpful:

MariaDB> select soundex("William");
+--------------------+
| soundex("William") |
+--------------------+
| W450               |
+--------------------+

MariaDB> select soundex("Walaam");
+-------------------+
| soundex("Walaam") |
+-------------------+
| W450              |
+-------------------+

MariaDB> select soundex("Willem");
+-------------------+
| soundex("Willem") |
+-------------------+
| W450              |
+-------------------+

MariaDB> select soundex("Williama");
+---------------------+
| soundex("Williama") |
+---------------------+
| W450                |
+---------------------+

And you could probably agree these variations match as well:

MariaDB> select soundex("Guillaume");
+----------------------+
| soundex("Guillaume") |
+----------------------+
| G450                 |
+----------------------+

MariaDB> select soundex("Uilleam");
+--------------------+
| soundex("Uilleam") |
+--------------------+
| U450               |
+--------------------+

MariaDB> select soundex("Melhem");
+-------------------+
| soundex("Melhem") |
+-------------------+
| M450              |
+-------------------+

MariaDB> select soundex("Uilliam");
+--------------------+
| soundex("Uilliam") |
+--------------------+
| U450               |
+--------------------+

Well, that’s pretty neat. Of course, I want to try the silliest word I can think of:

MariaDB> select soundex("supercalifragilisticexpealidocious");
+-----------------------------------------------+
| soundex("supercalifragilisticexpealidocious") |
+-----------------------------------------------+
| S162416242321432                              |
+-----------------------------------------------+

So the algorithm doesn’t stop at 3 digits; good to know.

What does the algorithm do? Luckily MySQL is open source, so we can look in the source code. This is the raw mapping, which a function then applies while looping through the characters of the word:

/* ABCDEFGHIJKLMNOPQRSTUVWXYZ */
/* :::::::::::::::::::::::::: */
const char *soundex_map= "01230120022455012623010202";

Note that even though it’s called “sounds like,” it is really just a character mapping based on which characters the developers’ ears agreed sound similar. That’s of course an oversimplification, and I see the following in the code comments:

/****************************************************************
 * SOUNDEX ALGORITHM in C                                       *
 *                                                              *
 * The basic Algorithm source is taken from EDN Nov.            *
 * 14, 1985 pg. 36.                                             *

But despite hitting up several librarians, I can’t seem to get a copy of this. Someone out there has a copy sitting around, right?
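While we wait for a copy to surface, here is a minimal Python sketch of the classic soundex loop using that same character map. It is an illustration, not MySQL’s exact code; among other differences, MySQL does not truncate long codes, as the supercalifragilistic example above shows.

SOUNDEX_MAP = "01230120022455012623010202"  # codes for A..Z

def soundex(word, min_len=4):
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    if not letters:
        return ""
    code = letters[0]                       # keep the first letter as-is
    prev = SOUNDEX_MAP[ord(letters[0]) - ord('A')]
    for ch in letters[1:]:
        digit = SOUNDEX_MAP[ord(ch) - ord('A')]
        if digit != '0' and digit != prev:  # skip vowels and repeated codes
            code += digit
        prev = digit
    return code.ljust(min_len, '0')         # pad short codes with zeros

print(soundex("boom"), soundex("William"))  # B500 W450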

As a side note, this is obviously specific to the English language. I found references to German and other languages having soundex mappings, and would be curious to see those and hear of any language-specific ways to do this.

Curiosity aside, here’s a real use.

I pulled down some government climate data. Let’s say the location field has some of my favorite misspellings of “Durham” in it:

MariaDB [weather]> select distinct(two), count(two) from weather.temps group by two;
+--------------------------------------------+------------+
| two                                        | count(two) |
+--------------------------------------------+------------+
| NULL                                       |          0 |
| DRM                                        |         51 |
| DURHAM                                     |    1101887 |
| DURM                                       |         71 |
| NCSU                                       |    1000000 |
| RALEIGH DURHAM INTERNATIONAL AIRPORT NC US |    1096195 |
| RDU AIRPORT                                |    1000000 |
+--------------------------------------------+------------+

A “LIKE” clause won’t work terribly well here.

I confirmed the misspellings would match as I expected:

MariaDB [weather]> select soundex("Durham"), soundex("Durm"), soundex("DRM");
+-------------------+-----------------+----------------+
| soundex("Durham") | soundex("Durm") | soundex("DRM") |
+-------------------+-----------------+----------------+
| D650              | D650            | D650           |
+-------------------+-----------------+----------------+

So instead of manually creating a query like:

MariaDB [weather]> select count(two) from weather.temps where two='DRM' or two='DURHAM' or two='DURM';
+------------+
| count(two) |
+------------+
|    1102009 |
+------------+

I can simply do this:

MariaDB [weather]> select count(two) from weather.temps where two sounds like 'Durham';
+------------+
| count(two) |
+------------+
|    1102009 |
+------------+

There are more than several ways to do string comparisons, but I enjoyed finding this one.

(Bonus points will be granted to the first person who comments that RDU is also Durham and submits a unique query to include it in the count.)
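Here is one hedged take on that bonus query, assuming simple substring matches are enough to catch the airport rows shown above:

MariaDB [weather]> select count(two) from weather.temps
                   where two sounds like 'Durham'
                      or two like '%RDU%'
                      or two like '%RALEIGH DURHAM%';

This would add the RDU AIRPORT and RALEIGH DURHAM INTERNATIONAL AIRPORT rows to the 1,102,009 matched by SOUNDS LIKE alone.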

Categories: DBA Blogs

Monitoring Cassandra with Grafana and Influx DB

Pythian Group - Mon, 2015-03-16 07:37

Hello,

In this post I will explain how to set up Cassandra monitoring with influxDB and Grafana. This can also be used to connect to other monitoring systems (Graphite, Collectd, etc.), but since both influxDB and Grafana are hot topics at the moment, I decided to follow the trend! I was asked why I was doing this when a tool like OpsCenter is available, but sometimes you want to have all your systems reporting to a single dashboard. And if your dashboard is Grafana and your backend is influxDB, then you will learn here how to connect Cassandra to it!

Assumptions:
– You are running a Linux system (This post is based on CentOS 7)
– You are using Cassandra 1.2+ (I’m using 2.1.3 in this case)

Prerequisites
  • Cassandra Installation
  • Graphite Metrics Jar
  • influxDB – http://influxdb.com/
  • Grafana – http://grafana.org/
  • Apache (Any webserver would do)
Installing and configuring influxDB

This one is dead easy: once you have the package, install it (rpm -i, dpkg -i) and start the service:

service influxdb start

Once the service is running, go to the configuration (/opt/influxdb/shared/config.toml) and edit the file so that under [input_plugins] it looks like this:

# Configure the graphite api
[input_plugins.graphite]
enabled = true
# address = "0.0.0.0" # If not set, is actually set to bind-address.
port = 2003
database = "cassandra-metrics" # store graphite data in this database
udp_enabled = true

Save the file, reload the service:

service influxdb reload

Now go to your browser at localhost:8083 and click Connect (no credentials should be needed). After you’ve logged in, enter a database name (use cassandra-metrics) and click Create (this should be your only option). Now you can click the database and add a user to it (and make it admin). Then create another database named “grafana” and create an admin for that database as well.
Now you are done with influxDB.
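Before moving on, you can sanity-check the Graphite listener by hand. This is a hedged example: it assumes nc (netcat) is installed and that the listener accepts the plaintext Graphite protocol (metric path, value, unix timestamp):

# Send one test metric to the Graphite listener configured above
echo "Node1.test.metric 42 $(date +%s)" | nc localhost 2003

The test point should then be queryable in the cassandra-metrics database from the influxDB explorer.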

Installing Grafana

Grafana is a bit more tricky, since you also need to configure a webserver. Let’s assume apache is installed, and the home directory for www is /var/www/html.

Get the grafana package and extract it to /var/www/html, so the end result is something like /var/www/html/grafana.

Now do the following:

cd /var/www/html/grafana
cp config.sample.js config.js

Now let’s configure the connection between influxDB and Grafana. Open the newly copied config.js for editing and change it so it looks like this:

datasources: {
  influxdb: {
    type: 'influxdb',
    url: "http://localhost:8086/db/cassandra-metrics",
    username: 'admin',
    password: 'admin',
  },
  grafana: {
    type: 'influxdb',
    url: "http://localhost:8086/db/grafana",
    username: 'admin',
    password: 'admin',
    grafanaDB: true
  },
},

Now redirect your browser to localhost/grafana and you will have the Grafana default dashboard.

Preparing Cassandra

Now for the final piece of the puzzle. We follow more or less the Cassandra guide that exists here, but we need to make some changes to make it more valuable (and to allow multiple nodes to provide metrics).

So, first of all, put the metrics-graphite-2.2.0.jar in every Cassandra node’s /lib directory.
Now create a YAML file similar to the DataStax example; let’s call it influx-reporting.yaml and store it in the /conf directory. Edit the file so it looks like this:

graphite:
-
  period: 60
  timeunit: 'SECONDS'
  prefix: 'Node1'
  hosts:
  - host: 'localhost'
    port: 2003
  predicate:
    color: "white"
    useQualifiedName: true
    patterns:
    - ".*"

What did we change here? First, we added a prefix field; this allows us to identify the node that is providing the metrics. It must be different for every node, otherwise the metrics will overwrite or mix with each other. Then we decided to allow all patterns (“.*”), which means that Cassandra will push all of its metrics into influxDB. You can decide whether or not this is too much and enable only the metrics you want (find out more about it here).
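For example, a narrower predicate might look like the sketch below. The pattern strings are assumptions based on Cassandra’s usual metric naming (org.apache.cassandra.metrics.*); check the names your version actually emits before relying on them:

  predicate:
    color: "white"
    useQualifiedName: true
    patterns:
    - "^org.apache.cassandra.metrics.Cache.+"
    - "^org.apache.cassandra.metrics.ClientRequest.+"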

Now edit the cassandra-env.sh so that it will read the yaml file to provide the metrics. Add the following line to the end of the file:

JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=influx-reporting.yaml"

If all is done correctly, you can restart the Cassandra node (or nodes, but don’t restart them all at the same time; waiting 2 minutes between each is recommended). If the log file has the following messages:

INFO [main] YYYY-MM-DD HH:MM:SS,SSS CassandraDaemon.java:353 - Trying to load metrics-reporter-config from file: influx-reporting.yaml
INFO [main] YYYY-MM-DD HH:MM:SS,SSS GraphiteReporterConfig.java:68 - Enabling GraphiteReporter to localhost:2003

All is good!

Graphing!

Grafana is not that difficult to use, so once you start exploring a bit (and reading the documentation) you will soon be producing nice graphs. Graphing alone could fill a long post, so I will just share some images of the graphs I’m getting out of Grafana, so you can see how powerful it can be and how it can help you keep your Cassandra healthy.

(Screenshots: Grafana dashboards from the Cassandra test cluster.)
Categories: DBA Blogs

Cassandra 101 : Understanding What Cassandra Is

Pythian Group - Mon, 2015-03-16 07:35

As some of you may know, in my current role at Pythian I am tackling OSDB, and currently Cassandra is on my radar. One of the things I have been trying to do is learn what Cassandra is, so in this series I’m going to share a bit of what I have been able to learn.

According to the whitepaper “Solving Big Data Challenges for Enterprise Application Performance Management”, Cassandra is a “distributed key value store developed at Facebook. It was designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service without single point of failure allowing replication even across multiple data centers as well as for choosing between synchronous or asynchronous replication for each update.”

Cassandra, in layman’s terms, is a NoSQL database developed in Java. One of Cassandra’s many benefits is that it’s an open source DB with deep developer support. It is also a fully distributed DB, meaning that there is no master DB, unlike Oracle or MySQL, so the database has no single point of failure. It also touts being linearly scalable, meaning that if you have 2 nodes and a throughput of 100,000 transactions per second, adding 2 more nodes would give you 200,000 transactions per second, and so forth.


Cassandra is based on 2 core technologies, Google’s Big Table and Amazon’s Dynamo. Facebook originally developed it to power their Inbox Search feature, released it as an open source project on Google Code, and then incubated it at Apache, where it is nowadays a Top-Level Project. Currently there exist 2 versions of Cassandra.

Since Cassandra is a distributed system, it follows the CAP Theorem, which is awesomely explained here, and it states that, in a distributed system, you can only have two out of the following three guarantees across a write/read pair:

  • Consistency: a read is guaranteed to return the most recent write for a given client.
  • Availability: a non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
  • Partition Tolerance: the system will continue to function when network partitions occur.

Also, Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system, not an ACID (Atomicity, Consistency, Isolation, Durability) type system. BASE is optimistic and accepts that database consistency will be in a state of flux, unlike ACID, which is pessimistic and forces consistency at the end of every transaction.

Cassandra stores data according to the column family data model where:

  • Keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together; typically, a cluster has one keyspace per application. A keyspace also defines the replication strategy, and data objects belong to a single keyspace.
  • Column Family is a set of one or more individual rows with a similar structure.
  • Row is a collection of sorted columns; it is the smallest unit that stores related data in Cassandra, and any component of a row can store data or metadata:
    • Row Key uniquely identifies a row in a column family.
    • Column Key uniquely identifies a column value in a row.
    • Column Value stores one value or a collection of values.
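As a concrete illustration, here is a hedged CQL sketch of that hierarchy (a CQL table corresponds to a column family; the keyspace, table, and column names here are made up for the example):

-- Keyspace: the container, with its replication strategy (set per data center)
CREATE KEYSPACE app_data
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

-- Table (column family): rows identified by a row key
CREATE TABLE app_data.users (
  user_id uuid PRIMARY KEY,  -- the row key
  name    text,              -- column key/value pairs hang off the row
  email   text
);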

Also we need to understand the basic architecture of Cassandra, which has the following key structures:

  • Node is one Cassandra instance and is the basic infrastructure component in Cassandra. Cassandra assigns data to nodes in the cluster, each node is assigned a part of the database based on the Row Key. Usually corresponds to a host, but not necessarily, specially in Dev or Test environments.
  • Rack is a logical set of nodes
  • Data Center is a logical set of Racks, a data center can be a physical data center or virtual data center. Replication is set by data center
  • Cluster contains one or more data centers and is the full set of nodes which map to a single complete token ring

Conclusion

Hopefully this will help you understand the basic Cassandra concepts. In the next installment, I will go over architecture concepts: what a Seed node is, the purpose of the Snitch and topologies, the Coordinator node, replication factors, etc.

Note 1:

André Araújo, a great friend of mine and previous Pythianite, wrote about his first experience with Cassandra : My First Experience with Cassandra – Part 1

Note 2:

This post was originally published in my personal blog: rene-ace.com

Categories: DBA Blogs

Log Buffer #414, A Carnival of the Vanities for DBAs

Pythian Group - Mon, 2015-03-16 07:22

This Log Buffer Edition picks sea shells from blogs across the seas of Oracle, SQL Server, and MySQL, and arranges them for you. Enjoy.

Oracle:

12c Parallel Execution New Features: Concurrent UNION ALL

Visualizing Statspack Performance Data in SQL Developer

Optimizer statistics – Gathering Statistics and Histograms

Big Data Made Actionable with Omar TawaKol at SXSW

Mobile backend with REST services and JSON payload based on SOA Suite 12c

SQL Server:

Setting Different Colors for Connections in SSMS

Defusing Database Time Bombs: Avoiding the Need to Refactor Databases

This article shows a step by step tutorial to create a virtual machine in 15 min on Windows Azure.

What SQL Statements Are Currently Using The Transaction Logs?

SQL Server Random Sorted Result Set

MySQL:

Oracle Linux 7.1 and MySQL 5.6

MySQL Workbench 6.3.2 RC has been released

MariaDB CONNECT storage engine now offers access to JSON

Avoiding MySQL ERROR 1052 by prefixing column names in multi table queries

MySQL 5.7.6 DMR: Packages, Repos, Docker Images

Categories: DBA Blogs

Conditions Based On Inequalities Can’t Use Indexes – How To Resolve?

Oracle in Action - Mon, 2015-03-16 02:55


Conditions based on inequalities (!=, <>) cannot make use of index(es). I will illustrate this limitation and show you how to optimize SQL statements hitting it.

For the demonstration, I have a students table with a column named result that can contain the values ‘Pass’ (P), ‘Fail’ (F), and ‘To be evaluated’ (T). The column has a very non-uniform distribution, with most of the rows set to the value P. Here’s the example:

SQL>drop table students purge;
    create table students (id , result )
    as
    select rownum, decode (mod(rownum, 30), 0, 'F', 1, 'T',  'P')
    from  all_tables;

    create index students_idx on students (result);
    exec dbms_stats.gather_table_stats (USER, 'STUDENTS', cascade => TRUE);

     SELECT result , count(*)
     FROM students
     GROUP BY result;
RESULT   COUNT(*)
------ ----------
P             100
T               4
F               3

Let’s execute a query to select all students who have not passed (result ‘T’ or ‘F’). Even though the query has very strong selectivity and the result column is indexed, the query optimizer chooses a full table scan to read 7 rows, because the predicate involves an inequality.

SQL>select * from students where result <> 'P';
    select * from table(dbms_xplan.display_cursor);

ID RESULT
---------- ----------
1 T
30 F

....

7 rows selected.

PLAN_TABLE_OUTPUT
---------------------------------------------------------
SQL_ID f2wkxqy3b6b5h, child number 0
-------------------------------------
select * from students where result <> 'P'

Plan hash value: 4078133427
---------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | TABLE ACCESS FULL| STUDENTS | 71 | 355 | 3 (0)| 00:00:01 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RESULT"<>'P')

In a case like this, where the inequality condition has strong selectivity, we can take advantage of an index using the following three techniques:

First, the inequality condition can be rewritten into an IN condition. This is an option only when the number of values to be selected is known and limited. For example, if the query is modified as shown, an index range scan is employed.

SQL>select * from students where result in ('F', 'T');
select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------
SQL_ID 672mnj9pggkq7, child number 0
-------------------------------------
select * from students where result in ('F', 'T')

Plan hash value: 2871222462
---------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 71 | 355 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | STUDENTS_IDX | 71 | | 1 (0)| 00:00:01 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("RESULT"='F' OR "RESULT"='T'))

Second, manually rewrite the query so that both component queries can take advantage of an index range scan. This technique can be applied when the values are unknown or the number of values to be specified is too high. If the query is rewritten as shown, it is able to take advantage of the OR-expansion query transformation:

SQL>select * from students where result < 'P'
    union all
    select * from students where result > 'P' ;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------- 
SQL_ID gqrp063y9c5a5, child number 0
-------------------------------------
select * from students where result < 'P' union all select * from
students where result > 'P'

Plan hash value: 2171568329
--------------------------------------------------------- 
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------- 
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | UNION-ALL | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 76 | 380 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | STUDENTS_IDX | 76 | | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 36 | 180 | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | STUDENTS_IDX | 36 | | 1 (0)| 00:00:01 |
--------------------------------------------------------- 

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("RESULT"<'P')
5 - access("RESULT">'P')

The third technique simply forces an index full scan with, for example, the index hint. From a performance point of view it’s not optimal, as the full index has to be scanned even for a query with very strong selectivity.

SQL>SELECT /*+ index(students) */ * FROM students where result != 'P';
select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------- 
SQL_ID 2hyrf6n7kb8pr, child number 0
-------------------------------------
SELECT /*+ index(students) */ * FROM students where result != 'P'

Plan hash value: 635752001
---------------------------------------------------------  
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------- 
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| STUDENTS | 71 | 355 | 2 (0)| 00:00:01|
|* 2 | INDEX FULL SCAN | STUDENTS_IDX | 71 | | 1 (0)| 00:00:01 |
--------------------------------------------------------- 

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("RESULT"<>'P')

Conclusion:

In cases where an inequality condition with strong selectivity is not able to make use of an index, we can take advantage of an index using the following three techniques:

  • First, the inequality condition can be rewritten into an IN condition. This is an option only when the number of values to be selected is known and limited.
  • Second, manually rewrite the query so that both component queries can take advantage of an index range scan. This technique can be applied when the values are unknown or the number of values to be specified is too high.
  • Third, force an index full scan with, for example, the index hint. From a performance point of view this is not optimal, as the full index has to be scanned even for a query with very strong selectivity.

References:
Troubleshooting Oracle Performance (second edition ) by Christian Antognini


Categories: DBA Blogs

Loads of fun with DBA_HIST_OSSTAT

Bobby Durrett's DBA Blog - Fri, 2015-03-13 17:35

I saw a load of 44 on a node of our production Exadata and it worried me.  The AWR report looks like this:

Host CPU
            Load Average
 CPUs     Begin       End     %User   %System      %WIO     %Idle
----- --------- --------- --------- --------- --------- ---------
   16     10.66     44.73      68.3       4.3       0.0      26.8

So, why is the load average 44 and yet the CPU is 26% idle?

I started looking at ASH data and found samples with 128 processes active on the CPU:

select sample_time, count(*)
from dba_hist_active_sess_history a
where session_state = 'ON CPU'
and instance_number = 3
and sample_time between
    to_date('05-MAR-2015 01:00:00','DD-MON-YYYY HH24:MI:SS') and
    to_date('05-MAR-2015 02:00:00','DD-MON-YYYY HH24:MI:SS')
group by sample_time
order by sample_time;

SAMPLE_TIME                    COUNT(*)
---------------------------- ----------
05-MAR-15 01.35.31.451 AM           128

... lines removed for brevity

Then I dumped out the ASH data for one sample and found all the sessions on the CPU were running the same parallel query:

select /*+  parallel(t,128) parallel_index(t,128) dbms_stats ...

So, for some reason we are gathering stats on a table with a degree of 128, and that spikes the load.  But why does the CPU idle percentage sit at 26.8% when the load starts at 10.66 and ends at 44.73?  Best I can tell, load in DBA_HIST_OSSTAT is a point measurement of load, not an average over a long period.  The 11.2 manual describes load in v$osstat this way:

Current number of processes that are either running or in the ready state, waiting to be selected by the operating-system scheduler to run. On many platforms, this statistic reflects the average load over the past minute.

So, load could spike at the end of an hour-long AWR report interval and still CPU could average 26% idle for the entire hour?  So it seems.
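To see how spiky the load really is between snapshots, you can trend the point-in-time LOAD values across AWR history. Here is a hedged sketch (instance 3, as in the example above):

-- Point-in-time OS load per AWR snapshot for one instance
select sn.end_interval_time, os.value as load
from dba_hist_osstat os
join dba_hist_snapshot sn
  on os.snap_id = sn.snap_id
 and os.dbid = sn.dbid
 and os.instance_number = sn.instance_number
where os.stat_name = 'LOAD'
and os.instance_number = 3
order by sn.end_interval_time;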

– Bobby

Categories: DBA Blogs

Not NULL Constraint Influences Access Path

Oracle in Action - Thu, 2015-03-12 23:12


The optimizer can make use of explicitly defined NOT NULL constraints to take advantage of an index and avoid a full table scan, since a B-tree index stores only non-NULL values.
When count(constant) or count(*) is queried, we want to count the number of rows in the table. Hence, if there is a column which is defined as NOT NULL and has an index on it, the number of entries in the index is bound to be the same as the number of rows, and the query optimizer can use the index to count the rows in the table.

Similarly, when count(not-nullable-column) is queried, we want to count the number of rows having non-NULL values in the column. Since the column has a NOT NULL constraint on it, every row in the table has a non-NULL value in it, and count(not-nullable-column) is the same as count(*). As a result, the query optimizer can use the index on the column to process the query.
In fact, in both of the cases above, any B-tree index containing at least one not-nullable column can serve the purpose.

When count(nullable-column) is queried, we want to count the number of rows having non-NULL values in that column. If we have an index on the column, the index stores only non-NULL values and hence can be used by the query optimizer to give the result.
In fact, the optimizer can use any index containing the nullable column for this purpose.

To demonstrate the above functionality, I have created a table HR.TEST with two columns:

  • NOTNULL, which has a NOT NULL constraint
  • NULLABLE, which holds the same data as NOTNULL but has not been declared NOT NULL, and has a B-tree index on it

SQL>drop table hr.test purge;
    create table hr.test (notnull number not null, nullable number);
    insert into hr.test select rownum, rownum from all_tables;
    create index hr.test_idx on hr.test(nullable);
    exec dbms_stats.gather_table_stats ('HR','TEST', cascade => true);

Now I will query count for various arguments and check if optimizer can use the index on NULLABLE column.

Note that to process count(*), count(1) and count(notnull), the query optimizer uses a full table scan. Although the column NULLABLE has non-NULL values in all the rows, it has not been explicitly declared NOT NULL, so the optimizer does not know that the number of entries in the index reflects the count correctly and hence does not use the index.

SQL>select count(*) from hr.test;
            select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 1mat065c25crk, child number 0
-------------------------------------
select count(*) from hr.test

Plan hash value: 1950795681
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST | 108 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------

SQL>select count(1) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID gzpsn7ff3ncmc, child number 0
-------------------------------------
select count(1) from hr.test

Plan hash value: 1950795681
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST | 108 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------

SQL>select count(notnull) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 6kxdzxbac62b4, child number 0
-------------------------------------
select count(notnull) from hr.test

Plan hash value: 1950795681
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST | 108 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------

To process count(nullable), the optimizer uses the index on column NULLABLE, because we want to count the non-NULL values in that column and a B-tree index stores only non-NULL values.

SQL> select count(nullable) from hr.test;
select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID bz8rxw5rmmv8g, child number 0
-------------------------------------
select count(nullable) from hr.test

Plan hash value: 2284640995
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INDEX FULL SCAN| TEST_IDX | 108 | 432 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------

Now I will declare a NOT NULL constraint on column NULLABLE.

SQL> alter table hr.test modify (nullable not null);

Now if we query count(*), count(1), count(notnull) and count(nullable), the optimizer is able to avoid the full table scan by making use of the index on the NULLABLE column in all the cases. Since the indexed column NULLABLE has been declared NOT NULL, the optimizer knows that the entries in the index represent all the rows of the table.

SQL>select count(*) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID 1mat065c25crk, child number 0
-------------------------------------
select count(*) from hr.test

Plan hash value: 2284640995
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FULL SCAN| TEST_IDX | 108 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------

SQL>select count(1) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID gzpsn7ff3ncmc, child number 0
-------------------------------------
select count(1) from hr.test

Plan hash value: 2284640995
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FULL SCAN| TEST_IDX | 108 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------

SQL>select count(notnull) from hr.test;
    select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID 6kxdzxbac62b4, child number 0
-------------------------------------
select count(notnull) from hr.test

Plan hash value: 2284640995
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FULL SCAN| TEST_IDX | 108 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------

SQL> select count(nullable) from hr.test;
     select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------- 
SQL_ID bz8rxw5rmmv8g, child number 0
-------------------------------------
select count(nullable) from hr.test

Plan hash value: 2284640995
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FULL SCAN| TEST_IDX | 108 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------

Hence, it is advisable to declare NOT NULL constraints on relevant columns so that the optimizer can choose index access in relevant cases.
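As a starting point for that exercise, a hedged sketch like the following can flag indexed columns that contain no NULLs (according to the last statistics gathering) but lack the constraint:

-- Indexed columns with no NULLs per optimizer stats, yet not declared NOT NULL
select distinct c.table_name, c.column_name
from user_tab_columns c
join user_ind_columns i
  on i.table_name = c.table_name
 and i.column_name = c.column_name
where c.nullable = 'Y'
and c.num_nulls = 0
and c.num_distinct > 0;

Remember that num_nulls reflects the data as of the last stats gathering, so verify against the live data before adding the constraint.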

References:
Troubleshooting Oracle Performance (second edition ) by Christian Antognini


Categories: DBA Blogs

Arizona Oracle User Group Meeting March 18

Bobby Durrett's DBA Blog - Thu, 2015-03-12 09:39
I just found out that this meeting was cancelled.  We will have to catch the next one. :) – 3/16/2015

Sign up for the Arizona Oracle User Group (AZORA) meeting next week: signup url

The email that I received from the meeting organizer described the topic of the meeting in this way:

“…the AZORA meetup on March 18, 2015 is going to talk about how a local business decided to upgrade their Oracle Application from 11i to R12 and give you a first hand account of what went well and what didn’t go so well. ”

Description of the speakers from the email:

Becky Tipton

Becky is the Director of Project Management at Blood Systems located in Scottsdale, AZ. Prior to coming to Blood Systems, Becky was an independent consultant for Tipton Consulting for four years.

Mike Dill

Mike is the Vice President of Application Solutions at 3RP, a Phoenix consulting company. Mike has over 10 years of experience implementing Oracle E-Business Suite and managing large-scale projects.

I plan to attend.  I hope to see you there too. :)

– Bobby

Categories: DBA Blogs

Delphix User Group Presentation

Bobby Durrett's DBA Blog - Wed, 2015-03-11 16:30

My Delphix user group presentation went well today. 65 people attended.  It was great to have so much participation.

Here are links to my PowerPoint slides and a recording of the WebEx:

Slides: PowerPoint

Recording: WebEx

Also, I want to thank two Delphix employees, Ann Togasaki and Matthew Yeh.  Ann did a great job of converting my text bullet points into a visually appealing PowerPoint, and she also translated my hand drawn images into useful drawings.  Matthew did an amazing job of taking my bullet points and my notes and adding meaningful graphics to my text-only slides.

I could not have put the PowerPoint together in time without Ann and Matthew’s help and they did a great job.

Also, for the first time I wrote out my script word for word and added it to the notes on the slides.  So, you can see what I intended to say with each slide.

Thank you to Adam Leventhal of Delphix for inviting me to do this first Delphix user group WebEx presentation.  It was a great experience for me and I hope that it was useful to the user community as well.

– Bobby

Categories: DBA Blogs

#db12c now certified for #em12c repository (MOS Note: 1987905.1) with some restrictions

DBASolved - Wed, 2015-03-11 11:06

Last October (2014), at Oracle Open World 2014, I posted about a discussion where there was confusion about whether Oracle Database 12c was supported as the Oracle Management Repository (OMR).  At the time, Oracle had put a temporary suspension on support for the OMR running on Oracle Database 12c.

Over the last week or so, in discussions with some friends, I heard that there might be an announcement on this topic soon.  As of yesterday, I was provided a MOS note number to reference (1987905.1) for OMR support on database 12c.  Checking out the note, it appears that the OMR can now be run on a database 12c instance (12.1.0.2) with some restrictions.

These restrictions are:

  • Must apply database patch 20243268
  • Must apply patchset 12.1.0.2.1 (OCT PSU) or later

This note (1987905.1) is welcomed by many in the community who want to build their OMS on the latest database version.  What is missing from the note is whether installing the OMR into a pluggable database (PDB) is supported.  I guess the only way to find out is to try building a new Oracle Enterprise Manager 12c on top of a pluggable database and see what happens.  At least for now, Oracle Database 12c is supported as the OMR.
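As a quick sanity check before pointing an OMR at a 12.1.0.2 home, you can confirm the required fix is in the inventory. A hedged one-liner, assuming OPatch sits in its usual location under the Oracle home:

# Look for database patch 20243268 in the home's patch inventory
$ORACLE_HOME/OPatch/opatch lsinventory | grep 20243268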

Enjoy!

about.me: http://about.me/dbasolved


Filed under: OEM
Categories: DBA Blogs

Partner Webcast – Oracle Private Cloud: Database as a Service (DBaaS) using Oracle Enterprise Manager 12c

Large enterprises today have hundreds and thousands of databases of various versions, configurations and patch levels. Another challenge is around time to provision new databases. When an end...

Categories: DBA Blogs

Blog third anniversary

Bobby Durrett's DBA Blog - Thu, 2015-03-05 09:31

My first blog post was March 5, 2012, three years ago today.

I have enjoyed blogging.  Even though I am talking about topics related to my work, blogging does not feel like work. The great thing about blogging is that it’s completely in my control: I control the content and the timetable. I pay a small amount each year for hosting and for the domain name, but the entertainment value alone is worth the price of the site.  It also has career value, because this blog has given me greater credibility both with my employer and outside the company.  Plus, I think it makes me better at my job, because blogging forces me to put into words the technical issues that I am working on.

It’s been three good years of blogging.  Looking forward to more in the future.

– Bobby

Categories: DBA Blogs

Oracle Information Security Partner Community Forum - March 26-27, 2015

FEBRUARY 2015 ...

Categories: DBA Blogs

Joined twitter

Bobby Durrett's DBA Blog - Wed, 2015-03-04 17:27

I joined Twitter.  I don’t really know how to use it.  I’m set up as Bobby Durrett, @bobbydurrettdba, if that means anything to you. :)

– Bobby

Categories: DBA Blogs

Different plan_hash_value same plan

Bobby Durrett's DBA Blog - Mon, 2015-03-02 15:38

I mentioned this same effect in an earlier post about SQL profiles: link

I get different plan_hash_value values for a query each time I run an explain plan or run the query.  I see this in queries whose plan includes a system generated temporary segment, like this:

|   1 |  TEMP TABLE TRANSFORMATION   |                             |
...
|  72 |    TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9D668C_764DD84C |

For some reason the system generated temporary table name gets included in the plan_hash_value calculation.  This makes plan_hash_value a less than perfect way to compare two plans to see if they are the same.
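One hedged workaround is to normalize the generated names out of the plan before comparing. This sketch substitutes a fixed placeholder for any SYS_TEMP_ object name when pulling plan lines for a given sql_id (the placeholder is arbitrary):

-- Plan lines with system generated temp table names normalized
select id, operation, options,
       regexp_replace(object_name, '^SYS_TEMP_.*$', 'SYS_TEMP_#') object_name
from v$sql_plan
where sql_id = '&sql_id'
and child_number = 0
order by id;

Comparing the normalized rows of two plans side by side avoids the false mismatch that plan_hash_value reports.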

Last week I was using my testselect package to test the effect of applying a patch to fix bug 20061582.  I used testselect to grab 1160 select statements from production and got their plans with and without the patch applied on a development database.  I didn’t expect many, if any, plans to change based on what the patch does.  Surprisingly, 115 out of the 1160 select statements had a changed plan, but all the ones I looked at had the system generated temporary table names in their plan.

Now, I am going to take the queries that have different plans with and without the patch and execute them both ways.  I have a feeling that the plan differences are mainly due to system generated temp table names and their execution times will be the same with and without the patch.

I’ve run across other limitations of plan hash value as I mentioned in an earlier post: link

I’m still using plan_hash_value to compare plans but I have a list of things in my head that reminds me of cases where plan_hash_value fails to accurately compare two plans.

– Bobby

P.S. After posting this I realized that I didn’t know how many of the 115 select statements with plans that differed with and without the patch had system generated temp tables.  Now I know.  114 of the 115 have the string “TEMP TABLE TRANSFORMATION” in their plans.  So, really, there is only one select statement for which the patch may have actually changed its plan.

P.P.S. I reapplied the patch and verified that the one sql_id didn’t really change plans with the patch.  So, that means all the plan changes were due to the system generated name.  Also, all the executions times were the same except for one query that took 50 seconds to parse without the patch and 0 with the patch.  So, one of the queries with the system generated temp table name happened to benefit from the patch.  Very cool!

P.P.P.S This was all done on an 11.2.0.4 Exadata system.

Categories: DBA Blogs

Webcast - Oracle Database 12c High Availability New Features

Organizations today are dependent on IT to run efficient operations, quickly analyze information and compete more effectively. Consequently, it is essential that their IT infrastructure and databases...

Categories: DBA Blogs

Even More Oracle Database Health Checks with ORAchk 12.1.0.2.1 and 12.1.0.2.3 (Beta)

As we have discussed before, it can be a challenge to quantify how well your database is meeting operational expectations and identify areas to improve performance. Database health checks are...

Categories: DBA Blogs

Partner Webcast – Oracle Business Process Management 12c : The Game Changer for your Business

The Oracle Business Process Management Suite 12c (BPM) is one of the most complete BPM suites in the market and the most feature rich BPM suite offering. There have been a wide variety of changes...

Categories: DBA Blogs

Oracle REST data services on Oracle Database Cloud Application Express

For those familiar with Oracle Application Express (aka APEX), Oracle’s web-based application development tool, you probably know that Oracle Application Express Listener is now known...

Categories: DBA Blogs

Hot off the press : Latest Release of Oracle Enterprise Manager 12c (R4)

Pankaj Chandiramani - Tue, 2014-06-03 06:53

Read more here about the PRESS RELEASE:  Oracle Delivers Latest Release of Oracle Enterprise Manager 12c


Richer Service Catalog for Database and Middleware as a Service; Enhanced Database and Middleware Management Help Drive Enterprise-Scale Private Cloud Adoption.


In coming weeks, I will be covering topics like:

  1. DbaaS Service Catalog incorporating High Availability and Disaster Recovery
  2. New Rapid Start kit
  3. Other new features

Stay tuned!

Categories: DBA Blogs