Pythian Group

Official Pythian Blog - Love Your Data

Auditing Files in Linux

Tue, 2014-07-15 08:25

The stat command in Linux can be used to display the status of a file or a file system.

I came across an issue on RHEL4 where a file's 'Change time' was far ahead of its 'Modification time', without any change in uid, gid, or mode.

# stat /etc/php.ini
File: `/etc/php.ini'
Size: 45809 Blocks: 96 IO Block: 4096 regular file
Device: 6801h/26625d Inode: 704615 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-06-25 13:22:15.000000000 -0400
Modify: 2012-10-01 13:21:41.000000000 -0400
Change: 2014-06-01 20:06:35.000000000 -0400 

To explain why this can be considered unusual, I will start by explaining the time values associated with a file:

  • Access (atime) – Time the file was last accessed. This involves syscalls like open(); for example, running the cat command on the file would update it.
  • Modify (mtime) – Time the file's content was last modified. For example, if a file is edited and some content is added, this value changes.
  • Change (ctime) – Time any of the file's inode attributes last changed. Any change to an inode attribute other than access time updates this value; those attributes are mode, uid, gid, size, and modification time.

So ctime is normally updated together with mtime, and a change in file size implies a change in mtime. If a file's ctime has changed independently of its mtime, without a change in mode, uid, or gid, the behaviour can be considered unexpected.
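A quick way to watch the three timestamps react is to exercise a scratch file (a minimal sketch; on filesystems mounted with relatime the atime update may be deferred):

f=$(mktemp); echo initial > "$f"
stat -c 'atime=%x  mtime=%y  ctime=%z' "$f"
sleep 1; cat "$f" > /dev/null         # read only: atime moves
sleep 1; echo more >> "$f"            # content change: mtime and ctime move
sleep 1; chmod 640 "$f"               # attribute change: only ctime moves
stat -c 'atime=%x  mtime=%y  ctime=%z' "$f"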

On checking the upstream stat (coreutils) source, I came across a known issue: running chmod on a file without actually changing its permissions still touches the inode and causes the same behaviour. It is documented in the TODO of the coreutils upstream source.

Modify chmod so that it does not change an inode's st_ctime
when the selected operation would have no other effect.
First suggested by Hans Ecke  in

http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/2920

Discussed more recently on http://bugs.debian.org/497514.

This behaviour has not been fixed upstream.

Now we can assume that a process or user ran a chmod command which did not actually change the attributes of php.ini. That would update ctime but none of the other attributes.

I can reproduce the same behaviour in my Fedora system as well.

For example,

# stat test
File: ‘test’
Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: 803h/2051d    Inode: 397606      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-14 16:26:10.996128678 +0530
Modify: 2014-07-14 16:26:10.996128678 +0530
Change: 2014-07-14 16:26:10.996128678 +0530
Birth: -
# chmod 644 test
# stat test
File: ‘test’
Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: 803h/2051d    Inode: 397606      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-14 16:26:10.996128678 +0530
Modify: 2014-07-14 16:26:10.996128678 +0530
Change: 2014-07-14 16:26:41.444377623 +0530 
Birth: -

But this is just an assumption. To get a conclusive answer on what is causing this behaviour on this specific system, we need to find out which process is responsible.

auditd in Linux can be used to watch a file and capture audit records about it in /var/log/audit/.

To watch the file, I edited /etc/audit.rules and added the following:

-w /etc/php.ini
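On systems with a newer audit package, the same watch can be narrowed to write and attribute-change events and tagged with a key to make searching easier later (a sketch; the key name is arbitrary):

-w /etc/php.ini -p wa -k phpini_watch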

Then I restarted auditd:

# service auditd start
Starting auditd:                                           [  OK  ]
# chkconfig auditd on
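The watch can also be loaded and verified at runtime with auditctl, without touching the rules file (a quick sketch):

# auditctl -w /etc/php.ini
# auditctl -l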

Running a cat command on the php.ini file produces the following logs:

type=SYSCALL msg=audit(1404006436.500:12): arch=40000003 syscall=5 success=yes exit=3 a0=bff88c10 a1=8000 a2=0 a3=8000 items=1 pid=19905 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 comm="cat" exe="/bin/cat"
type=FS_WATCH msg=audit(1404006436.500:12): watch_inode=704615 watch="php.ini" filterkey= perm=0 perm_mask=4
type=FS_INODE msg=audit(1404006436.500:12): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:01 inode_rdev=00:00
type=CWD msg=audit(1404006436.500:12):  cwd="/root"
type=PATH msg=audit(1404006436.500:12): name="/etc/php.ini" flags=101 inode=704615 dev=68:01 mode=0100644 ouid=0 ogid=0 rdev=00:00

The ausearch command is available for searching through the audit logs. The following command displays the audit entries related to the /etc/php.ini file since July 6th:

# ausearch -ts 7/6/2014 -f /etc/php.ini | less

When I noticed the ctime had changed again, I ran ausearch and saw multiple events on the file. Most of the accesses are from syscall=5, which is the open system call.

The following entries seem to point to the culprit. You can see that the system call is 271.

type=SYSCALL msg=audit(1404691594.175:37405): arch=40000003 syscall=271 success=yes exit=0 a0=bff
09b00 a1=bff07b00 a2=7beff4 a3=bff0a1a0 items=1 pid=9830 auid=4294967295 uid=0 gid=0 euid=0 suid=
0 fsuid=0 egid=0 sgid=0 fsgid=0 comm="bpbkar" exe="/usr/openv/netbackup/bin/bpbkar"
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=CWD msg=audit(1404691594.175:37405):  cwd="/etc"
type=PATH msg=audit(1404691594.175:37405): name="/etc/php.ini" flags=1 inode=704615 dev=68:01 mod
e=0100644 ouid=0 ogid=0 rdev=00:00

With ausearch you can also search by system call; there is only one record with system call number 271. Another advantage of ausearch is that it converts the timestamps to human-readable form.

# ausearch -ts 7/6/2014 -sc 271 -f /etc/php.ini 

You can see the time at the start of each block of search output.

----
time->Sun Jul  6 20:06:34 2014
type=PATH msg=audit(1404691594.175:37405): name="/etc/php.ini" flags=1 inode=704615 dev=68:01 mod
e=0100644 ouid=0 ogid=0 rdev=00:00
type=CWD msg=audit(1404691594.175:37405):  cwd="/etc"
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=SYSCALL msg=audit(1404691594.175:37405): arch=40000003 syscall=271 success=yes exit=0 a0=bff
09b00 a1=bff07b00 a2=7beff4 a3=bff0a1a0 items=1 pid=9830 auid=4294967295 uid=0 gid=0 euid=0 suid=
0 fsuid=0 egid=0 sgid=0 fsgid=0 comm="bpbkar" exe="/usr/openv/netbackup/bin/bpbkar"

The timestamps match.

# stat /etc/php.ini
File: `/etc/php.ini'
Size: 45809         Blocks: 96         IO Block: 4096   regular file
Device: 6801h/26625d    Inode: 704615      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-07 01:06:47.000000000 -0400
Modify: 2012-10-01 13:21:41.000000000 -0400
Change: 2014-07-06 20:06:34.000000000 -0400

From the RHEL4 kernel source code we can see that syscall 271 is utimes.

# cat ./include/asm-i386/unistd.h |grep 271
#define __NR_utimes        271

utimes is a syscall that changes a file's last access and modification times (a finer-grained successor to the older utime call).

The NetBackup process bpbkar is issuing a utimes syscall on the file, possibly setting the mtime to the value it already has, which is what updates the ctime.
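The effect is easy to reproduce from the shell, since touch goes through the same utimes family of calls; re-applying the mtime a file already has leaves its content and permissions alone but still bumps the ctime (a sketch):

# f=/tmp/ctime_demo; touch $f
# stat -c 'mtime=%y  ctime=%z' $f
# sleep 2
# touch -m -d "$(stat -c %y $f)" $f    # set mtime to the value it already has
# stat -c 'mtime=%y  ctime=%z' $f      # ctime has moved forward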

This example shows the power of the Linux Auditing System. Auditing is a kernel feature that provides an interface to daemons like auditd to capture and log events related to system and user-space processes.

Categories: DBA Blogs

Lock Timeout Error While Creating Database

Mon, 2014-07-14 08:07

Recently I worked on an issue where a third-party application was failing during installation. Below is the error returned by the application:

Lock request time out period exceeded. (Microsoft SQL Server, Error: 1222)

The application was failing while trying to create a database. It appeared to have a default timeout of about 60 seconds, after which it gave up because the command had not yet returned any results.

I tried creating a test database directly from SQL Server Management Studio and noticed that it was taking a long time as well. When I checked sys.sysprocesses, I found that the session creating the database was waiting on I/O.

Some of the reasons why creating a database might take longer than expected are:

  • IO bottleneck on the drive where we are creating the database files
  • Large size of Model database
  • Instant File Initialization is not enabled

I verified the size of the model database files and found that the data file was 5 GB and the log file 1 GB. After reducing the data file to 1 GB and the log file to 512 MB, I was able to create the test database quickly.

We then restarted the installation of the application, and it completed successfully.

Categories: DBA Blogs

Script to Get the Create Date of an Object from Multiple Databases

Mon, 2014-07-14 08:04

As a DBA, it is common to get requests to run scripts against production databases. In some environments, however, the script needs to be run against multiple databases on the same instance. I have seen an environment with 50+ databases sharing the same schema but holding different data, each database serving a different customer.

When running large scripts against many databases, a simple oversight can mean the script is missed against one or more databases, so we need a way to verify that it was executed against all of them. One way is to pick an object (stored procedure, table, view, function) created as part of the script, get that object's create date, and check that it matches the date and time the script was run. The challenge is to get the create date of a specific object from all databases at once with little effort.

Below is the code that fetches the create date of the specified object (stored procedure, table, view, or function) from all user databases on an instance. Set the object name in @ObjName on the 7th line of the code, run it, and verify from the output that the object was created at the time the script was run.

-- Script Originally Written By: Keerthi Deep | http://www.SQLServerF1.com/

Set NOCOUNT ON
Declare @ObjName nvarchar(300)
declare @dbn nvarchar(200)

Set @ObjName = 'Object_Name' -- Specify the name of the Stored Procedure/ Table/View/Function

create table #DatabaseList(dbname nvarchar(2000)) 

Insert into #DatabaseList select name from sys.sysdatabases
where name not in ('master', 'msdb', 'model','tempdb')
order by name asc 

--select * from #DatabaseList
Create table #Output_table (DB nvarchar(200), crdate datetime, ObjectName nvarchar(200))
declare c1 cursor for select dbname from #DatabaseList
open c1
Fetch next from c1 into @dbn
WHILE @@FETCH_STATUS = 0
BEGIN

declare @Query nvarchar(2048)
Set @Query = 'select ''' + @dbn + ''' as DBName, crdate, [name] from ' + @dbn + '.sys.sysobjects where name = ''' + @ObjName + ''''
--print @Query
Insert into #Output_table Exec sp_executesql @Query

FETCH NEXT FROM c1 into @dbn
END
CLOSE c1
DEALLOCATE c1

select * from #Output_table
Drop table #Output_table
Drop table #DatabaseList

Limitations:
This will work only if the object was created with a CREATE command; it will not work if an ALTER command was used, since altering an object does not change its create date.

Any suggestions are welcome.

Categories: DBA Blogs

A Ringleader Proxy for Sporadically-Used Web Applications

Fri, 2014-07-11 08:46

As you might already know, I come up with my fair share of toy web applications.

Once created, I typically throw them on my server for a few weeks but, as the resources of good ol’ Gilgamesh are limited, they eventually have to be turned off to make room for the next wave of shiny new toys. Which is a darn shame, as some of them can be useful from time to time. Sure, running all webapps all the time would be murder for the machine, but there should be a way to only fire up the application when it’s needed.

Of course there's already a way of doing just that. You might have heard of it: it's called CGI. And while it's perfectly possible to run PSGI applications under CGI, it's also… not quite perfect. The principal problem is that there is no persistence at all between requests (of course, with the help of mod_perl there could be persistence, but that would defeat the purpose), so it's not exactly snappy. Although, to be fair, it'd probably still be fast enough for most small applications. But still, it feels clunky. Plus, I'm just plain afraid that if I revert to using CGI, Sawyer will burst out of the wall like a vengeful Kool-Aid Man and throttle the life out of me. He probably wouldn't, but I prefer not to take any chances.

So I don't want single executions and I don't want perpetual running. What I'd really want is something in-between. I'd like the applications to be disabled by default but, if a request comes along, to be awakened and run for as long as there is traffic. And only once the traffic has abated for a reasonable amount of time do I want the application to be turned off once more.

The good news is that it seems that Apache’s mod_fastcgi can fire dynamic applications upon first request. If that’s the case, then the waking-up part of the job comes for free, and the shutting down is merely a question of periodically monitoring the logs and killing processes when inactivity is detected.

The bad news is that I only heard that after I was already halfway done shaving that yak my own way. So instead of cruelly dropping the poor creature right there and then, abandoning it with a punk-like half-shave, I decided to go all the way and see how a Perl alternative would look.

It’s all about the proxy

My first instinct was to go with Dancer (natch). But a quick survey of the tools available revealed something even more finely tuned to the task at hand: HTTP::Proxy. That module does exactly what it says on the tin: it proxies http requests, and allows you to fiddle with the requests and responses as they fly back and forth.

Since I own my domain, all my applications run on their own sub-domain name. With that setting, it’s quite easy to have all my sub-domains point to the port running that proxy and have the waking-up-if-required and dispatch to the real application done as the request comes in.


use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

my $proxy = HTTP::Proxy->new( port => 3000 );

my $wait_time = 5;
my $shutdown_delay = 10;

my %services = (
    'foo.babyl.ca' => $foo_config,
    'bar.babyl.ca' => $bar_config,

);

$proxy->push_filter( request => 
    HTTP::Proxy::HeaderFilter::simple->new( sub {

            my( $self, $headers, $request ) = @_;

            my $uri = $request->uri;
            my $host = $uri->host;

            my $service = $services{ $host } or die;

            $uri->host( 'localhost' );
            $uri->port( $service->port );

            unless ( $service->is_running ) {
                $service->start;
                sleep 1;
            }

            # store the latest access time
            $service->store_access_time(time);
    }),
);

$proxy->start;

With this, we already have the core of our application; we only need a few more pieces and some details to iron out.

Enter Sandman

An important one is how to detect if an application is running, and when it goes inactive. For that I went with a simple mechanism: using CHI to provide me with a persistent and central place to keep information about my applications. As soon as an application comes up, I store the time of the current request in its cache, and each time a new request comes in, I update the cache with the new time. That way, the existence of the cache entry tells me if the application is running, and knowing if the application should go dormant is just a question of seeing if the last access time is old enough.


use CHI;

# not a good cache driver for the real system
# but for testing it'll do
my $chi = CHI->new(
    driver => 'File',
    root_dir => 'cache',
);

...;

# when checking if the host is running
unless ( $chi->get($host) ) {
    $service->start;
    sleep 1;
}

...;

# and storing the access time becomes
$chi->set( $host => time );

# to check periodically, we fork a sub-process 
# and we simply endlessly sleep, check, then sleep
# some more

sub start_sandman {
    return if fork;

    while( sleep $shutdown_delay ) {
        check_activity_for( $_ ) for keys %services;
    }
}

sub check_activity_for {
    my $s = shift;

    my $time = $chi->get($s);

    # no cache? assume not running
    return if !$time or time - $time <= $shutdown_delay;

    $services{$s}->stop;

    $chi->remove($s);
}

Minding the applications

The final remaining big piece of the puzzle is how to manage the launching and shutting down of the applications. We could do it in a variety of ways, beginning with plain system calls. Instead, I decided to leverage the service manager Ubic. With the help of Ubic::Service::Plack, setting up a PSGI application as a service is as straightforward as one could wish for:


use Ubic::Service::Plack;

Ubic::Service::Plack->new({
    server => "FCGI",
    server_args => { listen => "/tmp/foo_app.sock",
                     nproc  => 5 },
    app      => "/home/web/apps/foo/bin/app.pl",
    port     => 4444,
});

Once the service is defined, it can be started and stopped from the CLI and, more interestingly for us, straight from Perl-land:


use Ubic;

my %services = (
    # sub-domain      # ubic service name
    'foo.babyl.ca' => 'webapp.foo',
    'bar.babyl.ca' => 'webapp.bar',
);

$_ = Ubic->service($_) for values %services;

# and then to start a service
$services{'foo.babyl.ca'}->start;

# or to stop it
$services{'foo.babyl.ca'}->stop;

# other goodies can be gleaned too, like the port...
$services{'foo.babyl.ca'}->port;

Now all together

And that’s all we need to get our ringleader going. Putting it all together, and tidying it up a little bit, we get:


use 5.20.0;

use experimental 'postderef';

use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

use Ubic;

use CHI;

my $proxy = HTTP::Proxy->new( port => 3000 );

my $wait_time      = 5;
my $shutdown_delay = 10;

my $ubic_directory = '/Users/champoux/ubic';

my %services = (
    'foo.babyl.ca' => 'webapp.foo',
);

$_ = Ubic->service($_) for values %services;

# not a good cache driver for the real system
# but for testing it'll do
my $chi = CHI->new(
    driver => 'File',
    root_dir => 'cache',
);


$proxy->push_filter( request => HTTP::Proxy::HeaderFilter::simple->new(sub{
            my( $self, $headers, $request ) = @_;
            my $uri = $request->uri;
            my $host = $uri->host;

            my $service = $services{ $host } or die;

            $uri->host( 'localhost' );
            $uri->port( $service->port );

            unless ( $chi->get($host) ) {
                $service->start;
                sleep 1;
            }

            # always store the latest access time
            $chi->set( $host => time );
    }),
);

start_sandman();

$proxy->start;

sub start_sandman {
    return if fork;

    while( sleep $shutdown_delay ) {
        check_activity_for( $_ ) for keys %services;
    }
}

sub check_activity_for {
    my $service = shift;

    my $time = $chi->get($service);

    # no cache? assume not running
    return if !$time or time - $time <= $shutdown_delay;

    $services{$service}->stop;

    $chi->remove($service);
}

It's not complete yet. The configuration should go in a YAML file, we should have some more safeguards in case the cache and the real state of the application aren't in sync, and the script itself should be started by Ubic too to make everything Circle-of-Life-perfect. Buuuuut as it is, I'd say it's already a decent start.

Categories: DBA Blogs

Log Buffer #379, A Carnival of the Vanities for DBAs

Fri, 2014-07-11 07:34

With summer in the Northern hemisphere and winter in the Southern hemisphere, bloggers are solving key problems while sitting beside a bonfire or enjoying a barbecue. This Log Buffer Edition shares a selection of their posts.


Oracle:

3 Key Problems To Solve If You Want A Big Data Management System

OpenWorld Update: Content Catalog NOW LIVE!

Interested in Showcasing your Solutions around Oracle Technologies at Oracle OpenWorld?

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 4: Start Journalizing!

What You Need to Know about OBIEE 11.1.1.7

SQL Server:

Interoperability between Microsoft and SOA Suite 12c

This article describes a way to speed up various file operations performed by SQL Server.

The Mindset of the Enterprise DBA: Creating and Applying Standards to Our Work

Stairway to T-SQL: Beyond The Basics Level 8: Coding Shortcuts using += and -= Operators

Microsoft Azure Diagnostics Part 2: Basic Configuration of Azure Cloud Service Diagnostics

MySQL:

MySQL Enterprise Monitor 2.3.18 has been released

Harnessing the power of master/slave clusters to operate data-driven businesses on MySQL

NoSQL Now! Conference – coming to San Jose, CA this August!

Manually Switch Slaves to new Masters in mySQL 5.6 (XTRADB 5.6)

How to Configure ClusterControl to run on nginx

Categories: DBA Blogs

Comparing CPU Throughput of Azure and AWS EC2

Thu, 2014-07-10 08:11

After observing CPU core sharing with Amazon Web Services EC2, I thought it would be interesting to see if Microsoft Azure platform exhibits the same behavior.

Signing up for Azure's 30-day trial gives $200 in credit to use over the next 30-day period: more than enough for this kind of testing. I created a new virtual machine using the "quick create" option with Oracle Linux, choosing a 4-core "A3" standard instance.

I must say I like the machine naming under the built-in "cloudapp.net" DNS that Azure uses: no mucking around with IP addresses. The VM provisioning definitely takes longer than on AWS, though no more than a few minutes. And speaking of IP addresses, both start with 191.236. addresses assigned to Microsoft's Brazilian subsidiary through the Latin American LACNIC registry, due to the lack of North American IP addresses.

Checking out the CPU specs as reported to the OS:

[azureuser@marc-cpu ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
processor       : 1
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
processor       : 2
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
processor       : 3
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

2.2GHz rather than 2.6GHz, but otherwise the same family and architecture as the E5-2670 under AWS. Identified as a single-socket, 4-core processor, without hyperthreads at all.

Running the tests
[azureuser@marc-cpu ~]$ taskset -pc 0 $$
pid 1588's current affinity list: 0-3
pid 1588's new affinity list: 0
[azureuser@marc-cpu ~]$ dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 36.9319 s, 58.8 MB/s
[azureuser@marc-cpu ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 72.8379 s, 29.8 MB/s
2170552320 bytes (2.2 GB) copied, 73.6173 s, 29.5 MB/s

Pretty low; that’s half the throughput we saw on AWS, albeit with a slower clock speed here.

[azureuser@marc-cpu ~]$ taskset -pc 0,1 $$
pid 1588's current affinity list: 0
pid 1588's new affinity list: 0,1
[azureuser@marc-cpu ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
[azureuser@marc-cpu ~]$ 2170552320 bytes (2.2 GB) copied, 36.4285 s, 59.6 MB/s
2170552320 bytes (2.2 GB) copied, 36.7957 s, 59.0 MB/s

[azureuser@marc-cpu ~]$ taskset -pc 0,2 $$
pid 1588's current affinity list: 0,1
pid 1588's new affinity list: 0,2
[azureuser@marc-cpu ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
[azureuser@marc-cpu ~]$ 2170552320 bytes (2.2 GB) copied, 36.3998 s, 59.6 MB/s
2170552320 bytes (2.2 GB) copied, 36.776 s, 59.0 MB/s

Pretty consistent results, so no core sharing, but running considerably slower than we saw with AWS.

Kicking off 20 runs in a row:

[azureuser@marc-cpu ~]$ taskset -pc 0-3 $$
pid 1588's current affinity list: 0,2
pid 1588's new affinity list: 0-3
[azureuser@marc-cpu ~]$ for run in {1..20}; do
>  for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2>> output | gzip -c > /dev/null & done
> wait
> done
...
[azureuser@marc-cpu ~]$ cat output | awk '/copied/ {print $8}' | sort | uniq -c
      1 59.1
      4 59.2
      1 59.3
      2 59.4
      2 59.5
      8 59.6
     12 59.7
      7 59.8
      3 59.9

We get very consistent results, between 59.1 and 59.9 MB/s.

Results from “top” while running:

cat > ~/.toprc <<-EOF
RCfile for "top with windows"           # shameless braggin'
Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=3.000, Curwin=0
Def     fieldscur=AEHIOQTWKNMbcdfgjplrsuvyzX
        winflags=25913, sortindx=10, maxtasks=2
        summclr=1, msgsclr=1, headclr=3, taskclr=1
Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
        winflags=62777, sortindx=0, maxtasks=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
        winflags=62777, sortindx=13, maxtasks=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
        winflags=62777, sortindx=4, maxtasks=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
EOF
[azureuser@marc-cpu ~]$  top -b -n20 -U azureuser
...
top - 14:38:41 up 2 min,  2 users,  load average: 2.27, 0.78, 0.28
Tasks: 205 total,   3 running, 202 sleeping,   0 stopped,   0 zombie
Cpu0  : 95.4%us,  4.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 94.4%us,  5.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu3  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1606 azureuse  20   0  4292  800  400 R 97.0  0.0   0:03.49 gzip
 1604 azureuse  20   0  4292  796  400 R 96.7  0.0   0:03.50 gzip

top - 14:38:44 up 2 min,  2 users,  load average: 2.25, 0.80, 0.29
Tasks: 205 total,   3 running, 202 sleeping,   0 stopped,   0 zombie
Cpu0  : 94.4%us,  5.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 72.3%us,  3.9%sy,  0.0%ni, 23.4%id,  0.0%wa,  0.0%hi,  0.4%si,  0.0%st
Cpu3  : 12.0%us,  0.7%sy,  0.0%ni, 85.6%id,  1.4%wa,  0.0%hi,  0.4%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1604 azureuse  20   0  4292  796  400 R 96.8  0.0   0:06.42 gzip
 1606 azureuse  20   0  4292  800  400 R 96.4  0.0   0:06.40 gzip

top - 14:38:47 up 2 min,  2 users,  load average: 2.25, 0.80, 0.29
Tasks: 205 total,   3 running, 202 sleeping,   0 stopped,   0 zombie
Cpu0  : 94.9%us,  5.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  9.7%us,  0.3%sy,  0.0%ni, 89.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  : 51.8%us,  2.8%sy,  0.0%ni, 45.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 17.9%us,  1.4%sy,  0.0%ni, 80.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1604 azureuse  20   0  4292  796  400 R 96.5  0.0   0:09.34 gzip
 1606 azureuse  20   0  4292  800  400 R 95.5  0.0   0:09.29 gzip

The load is entirely from gzip using full CPUs, so there is no large system overhead here. Also, "%st", the time reported as "stolen" by the hypervisor, is zero. We're simply getting half the throughput of AWS.

Basic instances

In addition to standard instances, Microsoft makes available basic instances, which claim to offer “similar machine configurations as the Standard tier of instances offered today (Extra Small [A0] to Extra Large [A4]). These instances will cost up to 27% less than the corresponding instances in use today (which will now be called “Standard”) and do not include load balancing or auto-scaling, which are included in Standard” (http://azure.microsoft.com/blog/2014/03/31/microsoft-azure-innovation-quality-and-price/)

Let's have a look at throughput here by creating a basic A3 instance, "marc-cpu-basic", that otherwise exactly matches the marc-cpu instance created earlier.

[azureuser@marc-cpu-basic ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
processor       : 1
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
processor       : 2
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
processor       : 3
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

CPU specs are identical to marc-cpu. Running the same tests:

[azureuser@marc-cpu-basic ~]$ taskset -pc 0 $$
pid 1566's current affinity list: 0-3
pid 1566's new affinity list: 0
[azureuser@marc-cpu-basic ~]$  dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 54.6678 s, 39.7 MB/s
for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
[azureuser@marc-cpu-basic ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 107.73 s, 20.1 MB/s
2170552320 bytes (2.2 GB) copied, 107.846 s, 20.1 MB/s

Now that's very slow: even with the same stated CPU specs as marc-cpu, marc-cpu-basic comes in with 33% less throughput.

Doing 20 runs in a row:

[azureuser@marc-cpu-basic ~]$ taskset -pc 0-3 $$
pid 1566's current affinity list: 0
pid 1566's new affinity list: 0-3
[azureuser@marc-cpu-basic ~]$ for run in {1..20}; do
> for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2>> output | gzip -c > /dev/null & done
> wait
> done
...
[azureuser@marc-cpu-basic ~]$ cat output | awk '/copied/ {print $8}' | sort | uniq -c
      4 40.4
     15 40.5
     14 40.6
      7 40.7

Very consistent results, but consistently slow. They do show that cores aren’t being shared, but throughput is lower than even a shared core under AWS.

Wrapping up

Under this simple gzip test, we are testing CPU integer performance. The Azure standard instance got half the throughput of the equivalent AWS instance, in spite of a clock speed only 15% slower. But the throughput was consistent: no drops when running on adjacent cores. The basic instance was a further 33% slower than a standard instance, in spite of having the same CPU configuration.

Under Azure, we simply aren't getting a full physical core's worth of throughput. Perhaps the hypervisor is capping throughput, and capping even lower for basic instances? Or maybe the actual CPU is different from the E5-2660 reported? For integer CPU-bound workloads like our gzip test, we would need to purchase at least twice as much capacity under Azure as under AWS, making Azure considerably more expensive as a platform.

Categories: DBA Blogs

Making it Easier to Graph Your Infrastructure’s Performance Data

Tue, 2014-07-08 07:46

Today I would like to share a story with you about the development of a Puppet module for Grafana and its release on the Puppet Forge. But first, I’d like to provide some context so you can understand where Grafana fits in and why I feel this is important.

Those of you that know me have likely heard me talk about the importance of data-driven decision making, and more specifically some of the tools that can be used to help enable individuals to make smart decisions about their IT infrastructure. A common approach is to deploy a graphing system such as Graphite, which stores performance data about your infrastructure to aid you in performing a number of functions including problem diagnosis, performance trending, capacity planning, and data analytics.

If you are unfamiliar with the software, I’ll briefly describe its architecture. Graphite consists of a daemon, called carbon, which listens for time series data and writes it to a fixed-size database called whisper. It also provides a web application to expose the data and allow the user to create and display graphs on demand using a powerful API.

While Graphite does a good job of storing time series data and providing a rich API for visualizing it, one of the things it does not really focus on is providing a dashboard for the data. Thankfully we have Grafana to fill this role and it happens to do it quite well.

If you have ever worked with the ELK stack (Elasticsearch, Logstash, and Kibana) before, Grafana’s interface should be familiar to you, as it is based on Kibana. It is a frontend for Graphite or InfluxDB, and runs as a client side application in your browser. Its only (optional) external dependency is Elasticsearch, as it can use it to store, load and search for dashboards.

Below are some of Grafana’s most notable features (see its feature highlights for a more comprehensive list):

  • Dashboard search
  • Templated dashboards
  • Save / load from Elasticsearch and / or JSON file
  • Quickly add functions (search, typeahead)
  • Direct link to Graphite function documentation
  • Graph annotation
  • Multiple Graphite or InfluxDB data sources
  • Ability to switch between data sources
  • Show graphs from different data sources on the same dashboard

We like to make use of IT automation software whenever possible to deploy tools for our clients. Most tools already have Puppet modules or Chef cookbooks available for them, including the other components of the graphing system: Graphite itself, and a great Python-based collector named Diamond. Grafana, however, had no Puppet module available so I decided to rectify the situation by creating one and publishing it to the Puppet Forge.

The module would be pretty simple: all that is required is to download and extract Grafana into an installation directory, and ensure appropriate values for the Elasticsearch, Graphite and InfluxDB servers / data sources are inserted into its configuration.

I decided to offload the work of downloading and extracting the software to another module, namely gini/archive. And managing the configuration file, config.js, would be done with a combination of module parameters and ERB template.

The only real complication arose when it came time to test serving Grafana with a web server such as Apache or Nginx. I decided not to have my module manage the web server in any way, so I would leverage Puppet Labs’ own Apache module for this purpose.

My test environment consisted of a CentOS virtual machine provisioned by Vagrant and Puppet, with Graphite and Grafana on the same server. I decided to use Daniel Werdermann’s module to deploy Graphite on my virtual machine as it had worked well for me in the past.

I quickly ran into problems with duplicate resources, however, due to the Graphite module managing Apache for creation of its virtual host etc. I moved to separate virtual machines for Graphite and Grafana, and that made my life easier. If you do decide to run both pieces of software on the same server, and are also using Daniel’s module, you can work around the problem by setting gr_web_server to ‘none’ like this:

class { 'graphite':
  gr_web_server			=> 'none',
  gr_web_cors_allow_from_all	=> true,
}

Since my module does not manage Apache (or Nginx), it is necessary to add something like the following to your node’s manifest to create a virtual host for Grafana:

# Grafana is to be served by Apache
class { 'apache':
  default_vhost   => false,
}

# Create Apache virtual host
apache::vhost { 'grafana.example.com':
  servername      => 'grafana.example.com',
  port            => 80,
  docroot         => '/opt/grafana',
  error_log_file  => 'grafana-error.log',
  access_log_file => 'grafana-access.log',
  directories     => [
    {
      path            => '/opt/grafana',
      options         => [ 'None' ],
      allow           => 'from All',
      allow_override  => [ 'None' ],
      order           => 'Allow,Deny',
    }
  ]
}

And the Grafana declaration itself:

class { 'grafana':
  elasticsearch_host  => 'elasticsearch.example.com',
  graphite_host       => 'graphite.example.com',
}

Now that my module was working, it was time to publish it to the Puppet Forge. I converted my Modulefile to metadata.json, added a .travis.yml file to my repository and enabled integration with Travis CI, built the module and uploaded it to the Forge.

Since its initial release, I have updated the module to deploy Grafana version 1.6.1 by default, including updating the content of the config.js ERB template, and have added support for InfluxDB. I am pretty happy with the module and hope that you find it useful.

I do have plans to add more capabilities to the module, including support of more of Grafana’s configuration file settings, having the module manage the web server’s configuration similar to how Daniel’s module does it, and adding a stronger test suite so I can ensure compatibility with more operating systems and Ruby / Puppet combinations.

I welcome any questions, suggestions, bug reports and / or pull requests you may have. Thanks for your time and interest!

Project page: https://github.com/bfraser/puppet-grafana
Puppet Forge URL: https://forge.puppetlabs.com/bfraser/grafana

Categories: DBA Blogs

Pro-active AWR Data Mining to Find Change in SQL Execution Plan

Mon, 2014-07-07 11:11

Many times we have been called in about poor database performance that was narrowed down to a single SQL statement. Subsequent analysis showed that the execution plan had changed and a wrong execution plan was being used.

The resolution is normally to fix the execution plan, in 11g by running:

variable x number
begin
:x :=
    dbms_spm.load_plans_from_cursor_cache(
    sql_id=>'&sql_id',
    plan_hash_value=>&plan_hash,
    fixed=>'YES');
end;
/

or, for 10g, by creating a SQL_PROFILE as described on Carlos Sierra's blog.

A proactive approach is to mine AWR data for any SQL execution plan changes.

The following query against dba_hist_sqlstat retrieves the list of SQL IDs whose plans have changed. It orders the SQL IDs so that those for which the maximum gain can be achieved by fixing the plan are listed first.

 
spool sql_with_more_than_1plan.txt
set lines 220 pages 9999 trimspool on
set numformat 999,999,999
column plan_hash_value format 99999999999999
column min_snap format 999999
column max_snap format 999999
column min_avg_ela format 999,999,999,999,999
column avg_ela format 999,999,999,999,999
column ela_gain format 999,999,999,999,999
select sql_id,
       min(min_snap_id) min_snap,
       max(max_snap_id) max_snap,
       max(decode(rw_num,1,plan_hash_value)) plan_hash_value,
       max(decode(rw_num,1,avg_ela)) min_avg_ela,
       avg(avg_ela) avg_ela,
       avg(avg_ela) - max(decode(rw_num,1,avg_ela)) ela_gain,
       -- max(decode(rw_num,1,avg_buffer_gets)) min_avg_buf_gets,
       -- avg(avg_buffer_gets) avg_buf_gets,
       max(decode(rw_num,1,sum_exec))-1 min_exec,
       avg(sum_exec)-1 avg_exec
from (
  select sql_id, plan_hash_value, avg_buffer_gets, avg_ela, sum_exec,
         row_number() over (partition by sql_id order by avg_ela) rw_num , min_snap_id, max_snap_id
  from
  (
    select sql_id, plan_hash_value , sum(BUFFER_GETS_DELTA)/(sum(executions_delta)+1) avg_buffer_gets,
    sum(elapsed_time_delta)/(sum(executions_delta)+1) avg_ela, sum(executions_delta)+1 sum_exec,
    min(snap_id) min_snap_id, max(snap_id) max_snap_id
    from dba_hist_sqlstat a
    where exists  (
       select sql_id from dba_hist_sqlstat b where a.sql_id = b.sql_id
         and  a.plan_hash_value != b.plan_hash_value
         and  b.plan_hash_value > 0)
    and plan_hash_value > 0
    group by sql_id, plan_hash_value
    order by sql_id, avg_ela
  )
  order by sql_id, avg_ela
  )
group by sql_id
having max(decode(rw_num,1,sum_exec)) > 1
order by 7 desc
/
spool off
clear columns
set numformat 9999999999

The sample output of this query looks like this:

SQL_ID        MIN_SNAP MAX_SNAP PLAN_HASH_VALUE          MIN_AVG_ELA              AVG_ELA             ELA_GAIN     MIN_EXEC     AVG_EXEC
------------- -------- -------- --------------- -------------------- -------------------- -------------------- ------------ ------------
ba42qdzhu5jb0    65017    67129      2819751536       11,055,899,019       90,136,403,552       79,080,504,532           12            4
2zm7y3tvqygx5    65024    67132       362220407       14,438,575,143       34,350,482,006       19,911,906,864            1            3
74j7px7k16p6q    65029    67134      1695658241       24,049,644,247       30,035,372,306        5,985,728,059           14            7
dz243qq1wft49    65030    67134      3498253836        1,703,657,774        7,249,309,870        5,545,652,097            1            2

MIN_SNAP and MAX_SNAP are the minimum/maximum snap id where the SQL statement occurs

PLAN_HASH_VALUE is the hash_value of the plan with the best elapsed time

ELA_GAIN is the estimated improvement in elapsed time by using this plan compared to the average execution time.

Using the output of the above query, SQL execution plans can be fixed after proper testing. This method can help DBAs pinpoint and resolve problems with SQL execution plans faster.

Categories: DBA Blogs

Salt Stack for Remote Parallel Execution of Commands

Mon, 2014-07-07 11:08

There are many scenarios where a SysAdmin has to do a "box walk" of the entire infrastructure to execute a command across many servers. This is universally accepted as one of the less glamorous parts of our job. The larger the infrastructure, the longer these box walks take, and the greater the chance that human error will occur.

Even giving this task to a junior resource, as is often the case, is not sustainable as the infrastructure grows, and does not represent the best value to the business in terms of resource utilization. Additionally, too much of this type of “grind” work can demoralize even the most enthusiastic team member.

Thankfully the days of having to do these box walks are over. Thanks to configuration management and infrastructure automation tools, the task has been automated and no longer requires the investment in time by a human SysAdmin that it once did. These tools allow you, at a very high level, to offload this repetitive work to the computer, with the computer doing the heavy lifting for you.

 

Introducing Salt Stack

Salt Stack is a distributed remote execution system used to execute commands and query data on remote nodes, either individually or by arbitrary selection criteria. Salt Stack is also a configuration management system in its own right, but this post focuses on Salt from a "command and control" point of view.

Salt has two main components: the "salt master" (server) and the "salt minions" (clients). Once the minions are accepted by the master, commands can be executed directly from the central salt master server.

Once you have installed the packages, the minion needs to be configured to know where its master is. This can be accomplished through a DNS or hosts-file entry, or by setting the variable in the /etc/salt/minion config:


master: XXX.XXX.XXX.XXX

Here, "XXX.XXX.XXX.XXX" is the IP address of your master server. Once that is done and the salt-minion service has been started, the minion generates a key and ships it back to the master to ensure all communication is secure.

The master must accept the key from the minion before any control can begin.


# Listing the Keys

[root@ip-10-154-193-216 ~]# salt-key -L
Accepted Keys:
Unaccepted Keys:
ip-10-136-76-163.ec2.internal
Rejected Keys:

# Adding The Key

[root@ip-10-154-193-216 ~]# salt-key -A
The following keys are going to be accepted:
Unaccepted Keys:
ip-10-136-76-163.ec2.internal
Proceed? [n/Y] y
Key for minion ip-10-136-76-163.ec2.internal accepted.

# Nailed It! Now the Master can control the Minion!

[root@ip-10-154-193-216 ~]# salt-key -L
Accepted Keys:
ip-10-136-76-163.ec2.internal
Unaccepted Keys:
Rejected Keys:

Note: Not Shown – I added a 2nd Minion

Now that your master has minions the fun begins. From your master you can now query information from your minions such as disk space:


[root@ip-10-154-193-216 ~]# salt '*' disk.percent

ip-10-136-76-163.ec2.internal:
----------
/:
15%
/dev/shm:
0%
ip-10-147-240-208.ec2.internal:
----------
/:
14%
/dev/shm:
0%

You can also execute remote commands, such as checking a service's status and restarting it:


[root@ip-10-154-193-216 ~]# salt '*' cmd.run "service crond status"

ip-10-136-76-163.ec2.internal:
crond (pid 1440) is running...
ip-10-147-240-208.ec2.internal:
crond (pid 1198) is running...

[root@ip-10-154-193-216 ~]# salt '*' cmd.run "service crond restart"
ip-10-136-76-163.ec2.internal:
Stopping crond: [ OK ]
Starting crond: [ OK ]
ip-10-147-240-208.ec2.internal:
Stopping crond: [ OK ]
Starting crond: [ OK ]

These are only the most basic use cases for what Salt Stack can do, but even from these examples it is clear that Salt can become a powerful tool that reduces the potential for human error and increases the efficiency of your SysAdmin team.

By implementing configuration management and infrastructure automation tools such as Salt Stack, you can free up your team members' time for higher-quality work that delivers more business value.

Salt Stack (depending on your setup) can be deployed in minutes. On RHEL/CentOS/Amazon Linux using the EPEL repo, I was able to be up and running with Salt in about 5 minutes on the 3 nodes I used for the examples in this post. Salt can be deployed using another configuration management tool, baked into your provisioning environment or base images, or, if all else fails, (ironically) installed with one last box walk across your existing servers.
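For reference, the whole setup on an EPEL-enabled RHEL/CentOS/Amazon Linux box looks roughly like this (a sketch; package and service names as provided by EPEL at the time of writing):

# On the master
yum install -y salt-master
chkconfig salt-master on && service salt-master start

# On each minion
yum install -y salt-minion
echo "master: XXX.XXX.XXX.XXX" >> /etc/salt/minion
chkconfig salt-minion on && service salt-minion start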

Even if you have another configuration management solution deployed, depending on what you are trying to accomplish, using Salt for parallel command execution rather than the configuration management system can often prove a much simpler and more lightweight solution.

Salt is also a great choice for giving other teams the ability to execute commands on a subset of boxes without requiring them to have shell access to all of the servers. This allows those teams to get their job done without the SysAdmin team becoming a bottleneck.
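Targeting that subset is just a matter of the matcher used on the command line; for example (a sketch, assuming the minion IDs and grain values shown exist in your environment):

# a fixed list of minions
salt -L 'web01,web02' cmd.run 'uptime'

# everything matching a grain, e.g. all CentOS boxes
salt -G 'os:CentOS' service.status crond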

Categories: DBA Blogs

Log Buffer #378, A Carnival of the Vanities for DBAs

Fri, 2014-07-04 08:43

New technologies, new ideas, and new tips are forthcoming in abundance in numerous blog posts across Oracle, SQL Server, and MySQL. This Log Buffer Edition covers many of the salient ones.

Oracle:

Whether you use a single OEM and are migrating to a new OEM, or have multiple OEMs, the need to move templates between environments will arise.

Oracle Coherence is the industry’s leading in-memory data grid solution that enables applications to predictably scale by providing fast, reliable and scalable access to frequently used data.

Needless to say, some ATG applications are more complex than others.  Some ATG applications support a single site, single language, single catalog, single currency, have a single development staff, single business team, and a relatively simple business model.

The purpose of this article is to describe some of the important foundational concepts of ATG.

You can use Ops Center to perform some very complex tasks. For instance, you might use it to provision several operating systems across your environment, with multiple configurations for each OS.

SQL Server:

SSRS In a Flash – Level 1 in the Stairway to Reporting Services.

The “Numbers” or “Tally” Table: What it is and how it replaces a loop.

Arshad Ali demonstrates granular level encryption in detail and explains how it differs from Transparent Data Encryption (TDE).

There were many new DMVs added in SQL Server 2012, and some that have changed since SQL Server 2008 R2.

There are some aspects of tables in SQL Server that a lot of people get wrong, purely because they seem so obvious that one feels embarrassed about asking questions.

MySQL:

A much awaited release from the MariaDB project is now stable (GA) – MariaDB Galera Cluster 10.0.12.

Failover with the MySQL Utilities: Part 2 – mysqlfailover.

HowTo: Integrating MySQL for Visual Studio with Connector/Net.

Single database backup and restore with MEB.

Externally Stored Fields in InnoDB.

Categories: DBA Blogs

What’s New with Apps Password Change in R12.2 E-Business Suite ?

Wed, 2014-07-02 08:39

The apps password change routine in Release 12.2 of E-Business Suite has changed a little. We now have extra options for changing the password, as well as some manual steps to perform after changing it using FNDCPASS.

A new utility called AFPASSWD has been introduced. Unlike FNDCPASS, it does not require you to enter the APPS and SYSTEM user passwords, which makes it possible to separate duties between the database administrator and the application administrator. In most cases both roles are handled by the same DBA, but in large organizations there may be separate teams managing the database and the application. You can read about the different options available in AFPASSWD in the EBS maintenance guide.

Whether you use FNDCPASS or AFPASSWD to change the APPLSYS/APPS password, you must also perform some additional steps. This is because in R12.2 the old AOL/J connection pooling is replaced with a WebLogic connection pool (JDBC data source). Currently this procedure is not automated; it would be good if it could be automated using some WLS scripting.

  • Shut down the application tier services
  • Change the APPLSYS password, as described for the utility you are using.
  • Start AdminServer using the adadminsrvctl.sh script from your RUN filesystem
  • Do not start any other application tier services.
  • Update the “apps” password in WLS Datasource as follows:
    • Log in to WLS Administration Console.
    • Click Lock & Edit in Change Center.
    • In the Domain Structure tree, expand Services, then select Data Sources.
    • On the “Summary of JDBC Data Sources” page, select EBSDataSource.
    • On the “Settings for EBSDataSource” page, select the Connection Pool tab.
    • Enter the new password in the “Password” field.
    • Enter the new password in the “Confirm Password” field.
    • Click Save.
    • Click Activate Changes in Change Center.
  • Start all the application tier services using the adstrtal.sh script.

I will be posting more of these "What's new with R12.2" articles in the future. Post your experiences changing passwords in Oracle EBS in the comments section; I will be happy to hear your stories and give my input.

Categories: DBA Blogs

Essential Hadoop Concepts for Systems Administrators

Wed, 2014-07-02 08:38

Of course, everyone knows Hadoop as the solution to Big Data. What’s the problem with Big Data? Well, mostly it’s just that Big Data is too big to access and process in a timely fashion on a conventional enterprise system. Even a really large, optimally tuned, enterprise-class database system has conventional limits in terms of its maximum I/O, and there is a scale of data that outstrips this model and requires parallelism at a system level to make it accessible. While Hadoop is associated in many ways with advanced transaction processing pipelines, analytics and data sciences, these applications are sitting on top of a much simpler paradigm… that being that we can spread our data across a cluster and provision I/O and processor in a tunable ratio along with it. The tune-ability is directly related to the hardware specifications of the cluster nodes, since each node has processing, I/O and storage capabilities in a specific ratio. At this level, we don’t need Java software architects and data scientists to take advantage of Hadoop. We’re solving a fundamental infrastructure engineering issue, which is “how can we scale our I/O and processing capability along with our storage capacity”? In other words, how can we access our data?

The Hadoop ecosystem at its core is simply a set of RESTfully interacting Java processes communicating over a network. The base system services, such as the data node (HDFS) and task tracker (MapReduce), run on each node in the cluster, register with an associated service master, and execute assigned tasks in parallel that would normally be localized on a single system (such as reading some data from disk and piping it to an application or script). The result of this approach is a loosely coupled system that scales in a very linear fashion. In real life, the service masters (famously, NameNode and JobTracker) are a single point of failure and a potential performance bottleneck at very large scales, but much has been done to address these shortcomings. In principle, Hadoop uses the MapReduce algorithm to extend parallel execution from a single computer to an unlimited number of networked computers.

MapReduce is conceptually a very simple system. Here’s how it works. Given a large data set (usually serialized), broken into blocks (as for any filesystem) and spread among the HDFS cluster nodes, feed each record in a block to STDIN of a local script, command or application, and collect the records from STDOUT that are emitted. This is the “map” in MapReduce. Next, sort each record by key (usually just the first field in a tab-delimited record emitted by the mapper, but compound keys are easily specified). This is accomplished by fetching records matching each specific key over the network to a specific cluster node, and accounts for the majority of network I/O during a MapReduce job. Finally, process the sorted record sets by feeding the ordered records to STDIN of a second script, command or application, collecting the result from STDOUT and writing them back to HDFS. This is the “reduce” in MapReduce. The reduce phase is optional, and usually takes care of any aggregations such as sums, averages and record counts. We can just as easily pipe our sorted map output straight to HDFS.

Any Linux or Unix systems administrator will immediately recognize that using STDIO to pass data means that we can plug any piece of code into the stream that reads and writes to STDIO… which is pretty much everything! To be clear on this point, Java development experience is not required. We can take advantage of Linux pipelines to operate on very large amounts of data. We can use ‘grep’ as a mapper. We can use the same pipelines and commands that we would use on a single system to filter and process data that we’ve stored across the cluster. For example,

grep -i ${RECORD_FILTER} | cut -f2 | cut -d’=’ -f2 | tr [:upper:][:lower:]

We can use Python, Perl and any other languages with support configured on the task tracker systems in the cluster, as long as our scripts and applications read and write to STDIO. To execute these types of jobs, we use the Hadoop Streaming jar to wrap the script and submit it to the cluster for processing.
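Submitting such a job is a single command with the streaming jar; a sketch (the jar location and HDFS paths are assumptions and vary by distribution, and a multi-command pipeline like the one above would normally be wrapped in a small script shipped to the task nodes with the -file option):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /logs/raw \
    -output /logs/processed \
    -mapper /bin/cat \
    -reducer /usr/bin/wc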

What does this mean for us enterprise sysadmins? Let’s look at a simple, high level example. I have centralized my company’s log data by writing it to a conventional enterprise storage system. There’s lots of data and lots of people want access to it, including operations teams, engineering teams, business intelligence and marketing analysts, developers and others. These folks need to search, filter, transform and remodel the data to shake out the information they’re looking for. Conventionally, I can scale up from here by copying my environment and managing two storage systems. Then 4. Then 8. We must share and mount the storage on each system that requires access, organize the data across multiple appliances and manage access controls on multiple systems. There are many business uses for the data, and so we have many people with simultaneous access requirements, and they’re probably using up each appliance’s limited I/O with read requests. In addition, we don’t have the processor available… we’re just serving the data at this point, and business departments are often providing their own processing platforms from a limited budget.

Hadoop solves this problem of scale above the limits of conventional enterprise storage beautifully. It’s a single system to manage that scales in a practical way to extraordinary capacities. But the real value is not the raw storage capacity or the advanced algorithms and data analytics available for the platform… it’s about scaling our I/O and processing capabilities to provide accessibility to the data we’re storing, thereby increasing our ability to leverage it for the benefit of our business. The details of how we leverage our data are what we often leave for the data scientists to figure out, but every administrator should know that the basic framework and inherent advantages of Hadoop can be leveraged with the commands and scripting tools that we’re already familiar with.

Categories: DBA Blogs

Differences Between R12.1 and R12.2 Integration with OAM

Wed, 2014-07-02 08:37

With the revamp of the technology stack in R12.2 of Oracle E-Business Suite (EBS), the way we integrate Oracle Access Manager (OAM) has changed. R12.2 is now built on the WebLogic tech stack, which drastically changed how it integrates with other Fusion Middleware products like OAM.

Here is an overview of the steps to configure OAM with EBS R12.1:

  • Install Oracle HTTP Server (OHS) 11g
  • Deploy & Configure Webgate on OHS 11g
  • Install Weblogic
  • Deploy & Configure Accessgate on Weblogic
  • Integrate Webgate, Accessgate with EBS and OAM/OID

R12.2 has both OHS and Weblogic built in, so we no longer have to install OHS and Weblogic for Webgate and Accessgate. All we have to do is deploy and configure them: Webgate is deployed on top of the R12.2 OHS 11g home, and Accessgate is deployed as a separate managed server (oaea_server1) on top of the R12.2 Weblogic.

Here is a pictorial version of the same:

R12.1 and 11i EBS integration with OAM/OID

11iand12_Reference_Architecture

 

R12.2 Integration with OAM/OID

12.2_Reference_Architecture_2

Basically, R12.2 reduces the number of moving parts in the OAM integration with EBS. It saves DBAs a lot of time, as it reduces the number of servers to manage.

References:

Integrating Oracle E-Business Suite Release 12.2 with Oracle Access Manager 11gR2 (11.1.2) using Oracle E-Business Suite AccessGate (Doc ID 1576425.1)

Integrating Oracle E-Business Suite Release 12 with Oracle Access Manager 11gR2 (11.1.2) using Oracle E-Business Suite AccessGate (Doc ID 1484024.1)

Images are courtesy of Oracle from note “Overview of Single Sign-On Integration Options for Oracle E-Business Suite (Doc ID 1388152.1)”

 

Categories: DBA Blogs

vCPU sharing in EC2: HVM to the rescue?

Fri, 2014-06-27 07:51

I’ve been doing some testing to clarify what a vCPU in Amazon Web Services actually is. Over the course of the testing, I experienced inconsistent results on a 2-thread test on a 4-vCPU m3.xlarge system, due to the mislabeling of the vCPUs as independent single-core processors by the Linux kernel. This issue manifests itself in a CPU-bound, multithreaded workload where there is idle CPU time.

My test environment used a paravirtualized (PV) kernel, which moves some of the virtualization logic into the Linux kernel, reducing the need for high-overhead hardware emulation. One drawback is that the kernel cannot be modified to, for example, resolve the CPU mislabeling. But there is an alternative: an HVM system relying on virtualization extensions in the CPU hardware and allowing custom kernels or even non-Linux operating systems to run. Historically the drawback has been a performance hit, though I read a very interesting post on Brendan Gregg’s blog indicating that what’s called HVM in Amazon EC2 is actually a hybrid of PV and HVM, combining aspects of both. A test run by Phoronix on EC2 showed HVM performance on par with PV, and in some cases even better. So it definitely seems worth repeating my earlier tests on an HVM instance.

As before, I fire up an instance, but this time using the latest HVM Amazon Linux image:

$ ec2-describe-images ami-76817c1e -H
Type    ImageID Name    Owner   State   Accessibility   ProductCodes    Architecture    ImageType       KernelId        RamdiskId Platform        RootDeviceType  VirtualizationType      Hypervisor
IMAGE   ami-76817c1e    amazon/amzn-ami-hvm-2014.03.2.x86_64-ebs        amazon  available       public          x86_64  machine                           ebs     hvm     xen
BLOCKDEVICEMAPPING      /dev/xvda               snap-810ffc56   8
$ ec2-run-instances ami-76817c1e -k marc-aws --instance-type m3.xlarge --availability-zone us-east-1d
RESERVATION     r-a4f480da      462281317311    default
INSTANCE        i-c5d5b6ef      ami-76817c1e                    pending marc-aws        0               m3.xlarge       2014-06-23T19:02:18+0000  us-east-1d                              monitoring-disabled                                     ebs                                       hvm     xen             sg-5fc61437     default

Checking in on CPUs:

[ec2-user@ip-10-145-187-117 ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2593.949
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
processor       : 1
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2593.949
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
processor       : 2
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2593.949
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
processor       : 3
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2593.949
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

It’s the same 2.6GHz E5-2670 processor, but it is now reported as a single-socket, non-hyperthreaded quad-core processor. That is still not the dual-core, hyperthreaded processor we’re actually getting, though.

Time to run a few tests.

[ec2-user@ip-10-145-187-117 ~]$ taskset -pc 0 $$
pid 1768's current affinity list: 0-3
pid 1768's new affinity list: 0
[ec2-user@ip-10-145-187-117 ~]$ dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 18.1955 s, 119 MB/s
[ec2-user@ip-10-145-187-117 ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 36.4968 s, 59.5 MB/s
2170552320 bytes (2.2 GB) copied, 36.506 s, 59.5 MB/s

In the same range as with PV, but 1-2% slower, meaning we’re seeing a small amount of HVM overhead. Let’s try across processors:

[ec2-user@ip-10-145-187-117 ~]$ taskset -pc 0,1 $$
pid 1768's current affinity list: 0
pid 1768's new affinity list: 0,1
[ec2-user@ip-10-145-187-117 ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 27.8401 s, 78.0 MB/s
2170552320 bytes (2.2 GB) copied, 27.8398 s, 78.0 MB/s
[ec2-user@ip-10-145-187-117 ~]$ taskset -pc 0,2 $$
pid 1768's current affinity list: 0,1
pid 1768's new affinity list: 0,2
[ec2-user@ip-10-145-187-117 ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 18.1849 s, 119 MB/s
2170552320 bytes (2.2 GB) copied, 18.2014 s, 119 MB/s

Again, a tiny bit slower than with PV. To test variability, I’ll kick off 20 consecutive runs, and print a histogram of output:

[ec2-user@ip-10-145-187-117 ~]$ taskset -pc 0-3 $$
pid 1768's current affinity list: 0,2
pid 1768's new affinity list: 0-3
[ec2-user@ip-10-145-187-117 ~]$ for run in {1..20}; do
>  for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2>> output | gzip -c > /dev/null & done
>  wait
> done
...
[ec2-user@ip-10-145-187-117 ~]$ cat output | awk '/bytes/ {print $8,$9}' | sort -n | uniq -c
      1 113 MB/s
      3 114 MB/s
      4 115 MB/s
      6 116 MB/s
     10 117 MB/s
     10 118 MB/s
      6 119 MB/s

Running between 113 and 119 MB/s per thread: much less variability than before. In chart form:
aws-cpu-hvm

Looking at “top”:

[ec2-user@ip-10-145-187-117 ~]$ cat > ~/.toprc <<-EOF
> RCfile for "top with windows"           # shameless braggin'
> Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=3.000, Curwin=0
> Def     fieldscur=AEHIOQTWKNMbcdfgjplrsuvyzX
>         winflags=25913, sortindx=10, maxtasks=2
>         summclr=1, msgsclr=1, headclr=3, taskclr=1
> Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
>         winflags=62777, sortindx=0, maxtasks=0
>         summclr=6, msgsclr=6, headclr=7, taskclr=6
> Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
>         winflags=62777, sortindx=13, maxtasks=0
>         summclr=5, msgsclr=5, headclr=4, taskclr=5
> Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
>         winflags=62777, sortindx=4, maxtasks=0
>         summclr=3, msgsclr=3, headclr=2, taskclr=3
> EOF
[ec2-user@ip-10-145-187-117 ~]$ top -b -n20 -U ec2-user

top - 20:31:51 up 28 min,  2 users,  load average: 1.37, 1.17, 0.63
Tasks:  82 total,   4 running,  78 sleeping,   0 stopped,   0 zombie
Cpu0  : 22.9%us,  0.3%sy,  0.0%ni, 76.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 74.0%us,  3.0%sy,  0.0%ni, 23.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 59.7%us,  4.0%sy,  0.0%ni, 36.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 33.7%us,  2.7%sy,  0.0%ni, 63.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1951 ec2-user  20   0  4444  608  400 R 97.1  0.0   0:08.92 gzip
 1953 ec2-user  20   0  4444  608  400 R 97.1  0.0   0:08.92 gzip                                                         

top - 20:31:54 up 28 min,  2 users,  load average: 1.37, 1.17, 0.63
Tasks:  82 total,   4 running,  78 sleeping,   0 stopped,   0 zombie
Cpu0  : 72.3%us,  4.3%sy,  0.0%ni, 23.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 94.4%us,  5.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 21.3%us,  2.0%sy,  0.0%ni, 76.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1953 ec2-user  20   0  4444  608  400 R 97.1  0.0   0:11.84 gzip
 1951 ec2-user  20   0  4444  608  400 R 96.8  0.0   0:11.83 gzip                                                         

top - 20:31:57 up 28 min,  2 users,  load average: 1.34, 1.17, 0.64
Tasks:  82 total,   3 running,  79 sleeping,   0 stopped,   0 zombie
Cpu0  : 95.3%us,  4.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 92.4%us,  7.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1951 ec2-user  20   0  4444  608  400 R 96.8  0.0   0:14.74 gzip
 1953 ec2-user  20   0  4444  608  400 R 96.8  0.0   0:14.75 gzip                                                         

top - 20:32:00 up 28 min,  2 users,  load average: 1.32, 1.17, 0.64
Tasks:  82 total,   4 running,  78 sleeping,   0 stopped,   0 zombie
Cpu0  : 29.9%us,  1.7%sy,  0.0%ni, 68.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 63.0%us,  3.7%sy,  0.0%ni, 33.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 40.5%us,  2.3%sy,  0.0%ni, 57.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 55.3%us,  3.7%sy,  0.0%ni, 41.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1951 ec2-user  20   0  4444  608  400 R 97.1  0.0   0:17.66 gzip
 1953 ec2-user  20   0  4444  608  400 R 97.1  0.0   0:17.67 gzip

We see that work is split between adjacent CPUs, but that the scheduler is doing a good job of keeping the adjacent CPUs near 100% usage between them.

So based on these tests, it looks like, even though the CPU is still mislabeled, HVM has almost entirely avoided the issue of variability due to shared-core scheduling, at the cost of a small reduction in overall throughput.

Categories: DBA Blogs

Log Buffer #377, A Carnival of the Vanities for DBAs

Fri, 2014-06-27 07:49

Improvement and progress are all about growth. Pythian just recently grew more by entering into an agreement to acquire Blackbird.io as announced by Paul Vallee. This Log Buffer also adds one more edition to its growing history.

Oracle:

WebLogic 12.1.3 is a major step forward: SOA Suite 12c, BPM Suite 12c and all the other Fusion Middleware components will run on it!

The Internet of Things is changing everything. From the way you start your day in the morning, to the way products are manufactured and cities operate their services, machines all around you will collect data, share it with each other, and use it to recommend and even make decisions along the way.

Pivotal Cloud Foundry installed? Let’s create an ORG / USER to get started.

An in-line sub process can best be thought of as a callable scope. It is a scope that is not part of the normal process flow, but rather a unit of encapsulated logic that can be called zero, one or multiple times from anywhere within the BPEL process.

Do you need to create PDF reports from PL/SQL?

SQL Server:

Unique indexes are the database developer’s responsibility. Non-unique indexes can be more easily maintained directly on the production database by an automated process.

Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of software delivery and deployment by making build, integration and delivery into a routine.

When you are writing TSQL code there are times when you want to perform the same logic over and over again. To accomplish this you can use the WHILE keyword.

What you need for a Multi Subnet Configuration for AlwaysOn in SQL Server 2012

All the wonderful functionality that in-memory tables and natively compiled procedures provide in SQL Server 2014 is pretty cool. But changes to the core of the engine result in changes to things that we may have developed a level of comfort with. Grant Fritchey explains.

MySQL:

Optimistic updates for Edit Data operations in MySQL for Excel

Here are a few videos those of you who are just getting started with Sphinx may find useful.

Harnessing the power of master/slave clusters to operate data-driven businesses on MySQL

Every SELECT from your Python program may acquire a metadata lock!

Why %util number from iostat is meaningless for MySQL capacity planning

Categories: DBA Blogs

Welcome to Blackbird.io Employees and Clients

Thu, 2014-06-26 11:29

Today, we announced that Pythian has entered into an agreement to acquire Blackbird.io, itself the result of a recent merger between PalominoDB and DriveDev.

I want to start with a hearty welcome to the 40+ new esteemed collaborators joining our firm today. Simultaneously, I want to welcome Blackbird.io’s valued clients to the Pythian family.

I am looking forward to cultivating a long-lasting collaboration and friendship with each one of you, many of whom I have already counted as friends for years.

To that point, I want to highlight my longstanding friendship and collaboration with Laine Campbell, the CEO of Blackbird.io. I first met Laine in 2007 and was impressed by her intelligence, her energy, her charisma and, most of all, her remarkable passion for doing the right thing by her team, her clients, and her community.

In February 2008, I sent Laine an email with the subject “Join us?”, the most important line of which was “I’m looking for a founder for a new office in the Bay Area.”

Laine was gracious in her reply: “At this point, I’m absolutely packed with long-term clients.  I’m quite comfortable with revenue and challenge and location.  I really am flattered you’d consider me for the position, but I’m going to have to pass.” That was only about a year after she had founded PalominoDB.

Laine and I have been friends ever since and have made a discipline of trading notes and advice about our respective businesses.

As we fast-forward six years to the present, Laine and her team have achieved what many might have thought impossible. Out of thin air, with no venture capital and in only eight short years, Blackbird.io is a business eerily reminiscent of Pythian in 2008… a feat that took us 11 years.

Earlier this year, PalominoDB joined forces with DriveDev, itself a highly successful DevOps business transformation company founded in 2007, to create Blackbird.io. Blackbird.io delivers a coherent and differentiated vision that helps transform businesses through breakthrough velocity, availability, security, performance, and cost.

In what has to be one of the longest corporate romances our niche has known, Laine reached out to me in May indicating that she’d like to accept my original offer and join forces with us. It was my turn to be flattered and go through a week’s soul searching.  I was not alone in the exercise. A lot of soul searching, strategic thinking, and sheer hard work has gone into this announcement today. By the end of our efforts, it became clear that joining forces would dramatically accelerate our ability to reshape the enterprise IT services landscape.

I would like to specifically thank Laine Campbell, Aaron Lee, and Vicki Vance as owners of Blackbird.io for their courage, vision, and determination through these demanding weeks. On the Pythian side, I would like to especially thank  Andrew Waitman, without whom this deal would be impossible to contemplate, Alain Tardif and Siobhan Devlin, and the rest of the executive team at Pythian who’ve moved mountains on our behalf to make it real. I don’t want to forget to highlight as well the external support of Bob Ford at Kelly Santini and our financing partners.

We have months of hard work ahead of us integrating our businesses. It’s our goal and imperative to listen and learn from each other, and pick and choose the best parts of each respective business as we weave a coherent and integrated whole. This will be the first meaningful merger Pythian undertakes.

Together we are almost 350 strong and are home to the industry’s largest open-source database managed services capability. Together we will accelerate the adoption of Enterprise DevOps and help countless SaaS, retail, media, and online businesses leave their competitors in the dust. And that is a vision worth getting excited about.

Categories: DBA Blogs

Oracle Database: Query to List All Statistics Tables

Wed, 2014-06-25 08:00

If you were a big fan of manual database upgrade steps, perhaps you would have come across this step many times in your life while reading MOS notes, upgrade guides, etc.

Upgrade Statistics Tables Created by the DBMS_STATS Package
If you created statistics tables using the DBMS_STATS.CREATE_STAT_TABLE procedure, then upgrade these tables by executing the following procedure:
EXECUTE DBMS_STATS.UPGRADE_STAT_TABLE('SYS','dictstattab');

In my experience, statistics tables can be created from Oracle RDBMS version 8i onwards, so this step has been part of the database upgrade documents ever since. I also noticed that the structure of the statistics table stayed the same up to 10gR2, but Oracle modified the structure marginally in the 11g and 12c versions.

I have been using this single query to list all statistics tables that exist in a database; it can still be used despite the changes to the table structure.

SQL> select owner,table_name from dba_tab_columns where COLUMN_NAME='STATID' AND DATA_TYPE='VARCHAR2';

Though this is not a critical step, it is required as part of the post-upgrade tasks. Here is a small action plan to generate and run the required commands to upgrade all statistics tables.

Connect as SYS database user and run these steps:
SQL> set pages 1000
SQL> set head off
SQL> set feedback off
SQL> spool /home/oracle/stattab_upg.sql
SQL> select 'EXEC DBMS_STATS.UPGRADE_STAT_TABLE('''||owner||''','''||table_name||''');' from dba_tab_columns where COLUMN_NAME='STATID' AND DATA_TYPE='VARCHAR2';
SQL> spool off
SQL> @/home/oracle/stattab_upg.sql
SQL> exit
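
If you prefer a single shot from the shell, the same action plan can be wrapped in one sqlplus call. This is only a sketch of the steps above, assuming OS authentication as SYSDBA and that /home/oracle is writable:

# generate the UPGRADE_STAT_TABLE calls into a spool file, then run that file in the same session
$ sqlplus -s "/ as sysdba" <<'EOF'
set pagesize 1000
set linesize 200
set trimspool on
set heading off
set feedback off
spool /home/oracle/stattab_upg.sql
select 'EXEC DBMS_STATS.UPGRADE_STAT_TABLE('''||owner||''','''||table_name||''');'
  from dba_tab_columns
 where column_name = 'STATID' and data_type = 'VARCHAR2';
spool off
@/home/oracle/stattab_upg.sql
exit
EOF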

Categories: DBA Blogs

Virtual CPUs with Amazon Web Services

Tue, 2014-06-24 15:41

Some months ago, Amazon Web Services changed the way they measure CPU capacity on their EC2 compute platform. In addition to the old ECUs, there is a new unit to measure compute capacity: vCPUs. The instance type page defines a vCPU as “a hyperthreaded core for M3, C3, R3, HS1, G2, and I2.” The description seems a bit confusing: is it a dedicated CPU core (which has two hyperthreads in the E5-2670 v2 CPU platform being used), or is it a half-core, single hyperthread?

I decided to test this out for myself by setting up one of the new-generation m3.xlarge instances (with thanks to Christo for technical assistance). It is stated to have 4 vCPUs running an E5-2670 v2 processor at 2.5GHz on the Ivy Bridge-EP microarchitecture (or sometimes 2.6GHz in the case of xlarge instances).

Investigating for ourselves

I’m going to use paravirtualized Amazon Linux 64-bit for simplicity:

$ ec2-describe-images ami-fb8e9292 -H
Type    ImageID Name    Owner   State   Accessibility   ProductCodes    Architecture    ImageType       KernelId        RamdiskId Platform        RootDeviceType  VirtualizationType      Hypervisor
IMAGE   ami-fb8e9292    amazon/amzn-ami-pv-2014.03.1.x86_64-ebs amazon  available       public          x86_64  machine aki-919dcaf8                      ebs     paravirtual     xen
BLOCKDEVICEMAPPING      /dev/sda1               snap-b047276d   8

Launching the instance:

$ ec2-run-instances ami-fb8e9292 -k marc-aws --instance-type m3.xlarge --availability-zone us-east-1d
RESERVATION     r-cde66bb3      462281317311    default
INSTANCE        i-b5f5a2e6      ami-fb8e9292                    pending marc-aws        0               m3.xlarge       2014-06-16T20:23:48+0000  us-east-1d      aki-919dcaf8                    monitoring-disabled                              ebs                                      paravirtual     xen             sg-5fc61437     default

The instance is up and running within a few minutes:

$ ec2-describe-instances i-b5f5a2e6 -H
Type    ReservationID   Owner   Groups  Platform
RESERVATION     r-cde66bb3      462281317311    default
INSTANCE        i-b5f5a2e6      ami-fb8e9292    ec2-54-242-182-88.compute-1.amazonaws.com       ip-10-145-209-67.ec2.internal     running marc-aws        0               m3.xlarge       2014-06-16T20:23:48+0000        us-east-1d      aki-919dcaf8                      monitoring-disabled     54.242.182.88   10.145.209.67                   ebs                      paravirtual      xen             sg-5fc61437     default
BLOCKDEVICE     /dev/sda1       vol-1633ed53    2014-06-16T20:23:52.000Z        true

Logging in as ec2-user. First of all, let’s see what /proc/cpuinfo says:

[ec2-user@ip-10-7-160-199 ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2599.998
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
processor       : 1
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2599.998
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
processor       : 2
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2599.998
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
processor       : 3
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
cpu MHz         : 2599.998
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1

Looks like I got some of the slightly faster 2.6GHz CPUs. /proc/cpuinfo shows four processors, each with physical id 0 and core id 0. Or in other words, one single-core processor with 4 threads. We know that the E5-2670 v2 processor is actually a 10-core processor, so the information we see at the OS level is not quite corresponding.

Nevertheless, we’ll proceed with a few simple tests. I’m going to run “gzip”, an integer-compute-intensive compression test, on 2.2GB of zeroes from /dev/zero. By using synthetic input and discarding output, we can avoid the effects of disk I/O. I’m going to combine this test with taskset commands to impose processor affinity on the process.

A simple test

The simplest case: a single thread, on processor 0:

[ec2-user@ip-10-7-160-199 ~]$ taskset -pc 0 $$
pid 1531's current affinity list: 0-3
pid 1531's new affinity list: 0
[ec2-user@ip-10-7-160-199 ~]$ dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 17.8837 s, 121 MB/s

With the single processor, we can process 121 MB/sec. Let’s try running two gzips at once. Sharing a single processor, we should see half the throughput.

[ec2-user@ip-10-7-160-199 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 35.8279 s, 60.6 MB/s
2170552320 bytes (2.2 GB) copied, 35.8666 s, 60.5 MB/s
Sharing those cores

Now, let’s make things more interesting: two threads, on adjacent processors. If they are truly dedicated CPU cores, we should get a full 121 MB/s each. If our processors are in fact hyperthreads, we’ll see throughput drop.

[ec2-user@ip-10-7-160-199 ~]$ taskset -pc 0,1 $$
pid 1531's current affinity list: 0
pid 1531's new affinity list: 0,1
[ec2-user@ip-10-7-160-199 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 27.1704 s, 79.9 MB/s
2170552320 bytes (2.2 GB) copied, 27.1687 s, 79.9 MB/s

We have our answer: throughput has dropped by a third, to 79.9 MB/sec, showing that processors 0 and 1 are threads sharing a single core. (But note that Hyperthreading is giving performance benefits here: 79.9 MB/s on a shared core is higher than the 60.5 MB/s we see when sharing a single hyperthread.)

Trying the exact same test, but this time, non-adjacent processors 0 and 2:

[ec2-user@ip-10-7-160-199 ~]$ taskset -pc 0,2 $$
pid 1531's current affinity list: 0,1
pid 1531's new affinity list: 0,2
[ec2-user@ip-10-7-160-199 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 17.8967 s, 121 MB/s
2170552320 bytes (2.2 GB) copied, 17.8982 s, 121 MB/s

All the way up to full-speed, showing dedicated cores.

What does this all mean? Let’s go back to Amazon’s vCPU definition:

Each vCPU is a hyperthreaded core

As our tests have shown, a vCPU is most definitely not a core. It’s half of a shared core, or one hyperthread.

A side effect: inconsistent performance

There’s another issue at play here too: the shared-core behavior is hidden from the operating system. Going back to /proc/cpuinfo:

[ec2-user@ip-10-7-160-199 ~]$ grep 'core id' /proc/cpuinfo
core id         : 0
core id         : 0
core id         : 0
core id         : 0

This means that the OS scheduler has no way of knowing which processors have shared cores, and can not schedule tasks around it. Let’s go back to our two-thread test, but instead of restricting it to two specific processors, we’ll let it run on any of them.

[ec2-user@ip-10-7-160-199 ~]$ taskset -pc 0-3 $$
pid 1531's current affinity list: 0,2
pid 1531's new affinity list: 0-3
[ec2-user@ip-10-7-160-199 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 18.041 s, 120 MB/s
2170552320 bytes (2.2 GB) copied, 18.0451 s, 120 MB/s
[ec2-user@ip-10-7-160-199 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 21.2189 s, 102 MB/s
2170552320 bytes (2.2 GB) copied, 21.2215 s, 102 MB/s
[ec2-user@ip-10-7-160-199 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 26.2199 s, 82.8 MB/s
2170552320 bytes (2.2 GB) copied, 26.22 s, 82.8 MB/s

We see throughput varying between 82 MB/sec and 120 MB/sec for the exact same workload. To get some more performance information, we’ll configure top to take 10 samples at three-second intervals with per-processor usage information:

[ec2-user@ip-10-7-160-199 ~]$ cat > ~/.toprc <<-EOF
RCfile for "top with windows"           # shameless braggin'
Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=3.000, Curwin=0
Def     fieldscur=AEHIOQTWKNMbcdfgjplrsuvyzX
        winflags=25913, sortindx=10, maxtasks=2
        summclr=1, msgsclr=1, headclr=3, taskclr=1
Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
        winflags=62777, sortindx=0, maxtasks=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
        winflags=62777, sortindx=13, maxtasks=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
        winflags=62777, sortindx=4, maxtasks=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
EOF
[ec2-user@ip-10-7-160-199 ~]$ top -b -n10 -U ec2-user
top - 21:07:50 up 43 min,  2 users,  load average: 0.55, 0.45, 0.36
Tasks:  86 total,   4 running,  82 sleeping,   0 stopped,   0 zombie
Cpu0  : 96.7%us,  3.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  1.4%sy,  0.0%ni, 97.9%id,  0.0%wa,  0.3%hi,  0.0%si,  0.3%st
Cpu2  : 96.0%us,  4.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  1.0%sy,  0.0%ni, 97.9%id,  0.0%wa,  0.7%hi,  0.0%si,  0.3%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1766 ec2-user  20   0  4444  608  400 R 99.7  0.0   0:06.08 gzip
 1768 ec2-user  20   0  4444  608  400 R 99.7  0.0   0:06.08 gzip

Here two non-adjacent CPUs are in use. But 3 seconds later, the processes are running on adjacent CPUs:

top - 21:07:53 up 43 min,  2 users,  load average: 0.55, 0.45, 0.36
Tasks:  86 total,   4 running,  82 sleeping,   0 stopped,   0 zombie
Cpu0  : 96.3%us,  3.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 96.0%us,  3.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.3%hi,  0.0%si,  0.3%st
Cpu3  :  0.3%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.3%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1766 ec2-user  20   0  4444  608  400 R 99.7  0.0   0:09.08 gzip
 1768 ec2-user  20   0  4444  608  400 R 99.7  0.0   0:09.08 gzip

Although usage percentages are similar, we’ve seen earlier that throughput drops by a third when cores are shared, and we see varied throughput as the processes are context-switched between processors.

This type of situation arises where compute-intensive workloads are running, and when there are fewer processes than total CPU threads. And if only AWS would report correct core IDs to the system, this problem wouldn’t happen: the OS scheduler would make sure processes did not share cores unless necessary.
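
Until that happens, a manual workaround for long-running, CPU-bound processes is to pin them to vCPUs that testing showed do not share a core (0 and 2 on this instance). The compressed files below are just hypothetical stand-ins for real workloads:

# pin each CPU-heavy process to a vCPU known (from the tests above) to be on a separate core
$ taskset -c 0 gzip -c /data/file1 > /data/file1.gz &
$ taskset -c 2 gzip -c /data/file2 > /data/file2.gz &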

Here’s a chart summarizing the results:

aws-cpu

Summing up

Over the course of the testing I’ve learned two things:

  • A vCPU in an AWS environment actually represents only half a physical core. So if you’re looking for equivalent compute capacity to, say, an 8-core server, you would need a so-called 4xlarge EC2 instance with 16 vCPUs. So take it into account in your costing models!
  • The mislabeling of the CPU threads as separate single-core processors can result in performance variability as processes are switched between threads. This is something the AWS and/or Xen teams should be able to fix in the kernel.

Readers: what has been your experience with CPU performance in AWS? If any of you has access to a physical machine running E5-2670 processors, it would be interesting to see how the simple gzip test runs.
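
For anyone who wants to try it, the core of the test is just the commands below; on a physical box, pick the CPU numbers according to the core and physical IDs reported in /proc/cpuinfo:

$ taskset -pc 0 $$        # pin the current shell (and its children) to CPU 0
$ dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
$ taskset -pc 0,1 $$      # then allow two CPUs and run two compressors at once
$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done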

Categories: DBA Blogs

Working in Pythian’s Advanced Technology Consulting Group

Mon, 2014-06-23 08:22

Before I joined Pythian, I had the fortune of having a lot of good jobs across various industries. My favorite jobs were the ones that were fast paced and required me to ramp up my skills on the fly while learning new technology. My least favorite jobs were the ones where my skills were not used and the challenges were few and far between. When I joined Pythian, I hadn’t yet realized I had found my first great job.

In April 2012, I joined Pythian’s Professional Consulting Group (PCG). The members of PCG represented some of the world’s leading data experts, but the name did not adequately represent the skills of the members. Members of PCG were experts in many complementary technologies and many, if not all, were quickly becoming experts in emerging technologies such as Big Data. Because of this, the Professional Consulting Group became the Advanced Technology Consulting Group (ATCG).

As a member of ATCG, my main responsibility is to deliver consulting services to our customers either on site or remotely. Examples of some of the work we might do include: troubleshooting performance problems, migrating databases into Exadata, setting up replication with Oracle GoldenGate, and data integration with numerous sources using Oracle Data Integrator. While all of the items I mentioned deal with Oracle technologies, ATCG also has members who specialize in Microsoft SQL Server and MySQL.

The services we provide to our customers do not stop at traditional database services, ATCG also delivers Big Data services using Hadoop. Examples of some of the Hadoop work I have been involved with include: installing and configuring Cloudera Hadoop, securing Hadoop with Kerberos, and troubleshooting performance. As you can see, ATCG has the opportunity to gain valuable experience across a broad range of technologies.

All of our projects begin with a call with the potential customer. Members of ATCG serve as a technical resource on the call. It is our responsibility to understand the customer’s issue and estimate the effort required to perform the work. Sometimes this can be challenging because the customer might not have a technical resource on their side who can articulately convey the issue. Even if there is a technical resource on the customer’s side, we have to be mindful to not alienate others on the call, so it is vitally important that we are able to convey our message in a way everybody on the call can understand.

You might be thinking “I am not a salesperson!” and “I have never used some of these technologies.” You would not be alone. ATCG are not sales people; we simply assist Sales by providing our technical knowledge on a call. Imagine that you are speaking with your boss or customer about a problem or issue: it really is no different. Dealing with new technology at Pythian is also little different from your current job: if you don’t understand something, you can talk to a few coworkers or research it on the net. At Pythian we can reach out to 200+ coworkers and find quite a few who have experience with the technology in question. We can search our internal technical documents, which are quite vast as they detail all of the work we have done, and as a last resort we can search the net. At Pythian, you are never alone and you are never without resources.

There are times when we might not have a project to work on, a.k.a. downtime. During our downtime, we can build our knowledge of technologies that we have interest in or that we may need a refresher for. We can practice our new found knowledge assisting other teams. We can help build the Pythian knowledge base by posting blogs and contributing to our internal documentation.

The work in ATCG is very challenging and you are always learning something new, whether it is a new technology or a new way of thinking about a particular topic. Being bored or pigeonholed is not a problem in ATCG; we are involved in some of the toughest problems and work with the latest technologies. And when we are not, we are in control of our workday so we can pursue interests in new and emerging database technologies.

Categories: DBA Blogs

Ambari Blueprints and One-Touch Hadoop Clusters

Fri, 2014-06-20 11:11

For those who aren’t familiar, Apache Ambari is the best open source solution for managing your Hadoop cluster: it’s capable of adding nodes, assigning roles, managing configuration and monitoring cluster health. Ambari is HortonWorks’ version of Cloudera Manager and MapR’s Warden, and it has been steadily improving with every release. As of version 1.5.1, Ambari added support for a declarative configuration (called a Blueprint) which makes it easy to automatically create clusters with many ecosystem components in the cloud. I’ll give an example of how to use Ambari Blueprints, and compare them with existing one-touch deployment methods for other distributions.

Why would I want that?

I’ve been working on improving the methodology used by the Berkeley Big Data Benchmark. Right now spinning up the clusters is a relatively manual process, where the user has to step through the web interfaces of Cloudera Manager and Ambari, copy-paste certificates and IPs, and assign roles to nodes. The benchmark runs on EC2 instances, so I’ve been focused on automatic ways to create clusters on Amazon:

  • Apache Whirr can create a Hadoop cluster (or a number of other Big Data technologies), including CDH5, MapR and HDP. Documentation is sparse, and there doesn’t appear to be support for installing ecosystem projects like Hive automatically.
  • Amazon EMR supports installing Hive and Impala natively, and other projects like Shark via bootstrap actions. These tend to be older versions which aren’t suitable for my purposes.
  • MapR’s distribution is also available on EMR, but I haven’t used that since the different filesystem (MapRFS vs. HDFS) would impact results.

Hive-on-Tez is only supported on HDP at the moment, so it’s crucial that I have a one-touch command to create not only CDH5 clusters but also HDP clusters. Ambari Blueprints provide a crucial piece of the solution.

The Blueprint

Blueprints themselves are just JSON documents you send to the Ambari REST API. Every Ambari Blueprint has two main parts: a list of “host groups”, and configuration.

Host Groups

Host groups are sets of machines with the same agents (“components” in Ambari terms) installed – a typical cluster might have host groups for the NameNode, SecondaryNameNode, ResourceManager, DataNodes and client nodes for submitting jobs. The small clusters I’m creating have a “master” node group with the NameNode, ResourceManager, and HiveServer components on a single server and then a collection of “slaves” running the NodeManager and DataNode components. Besides a list of software components to install, every host group has a cardinality. Right now this is a bit of a pain, since the cardinality is exact: your blueprint with 5 slave nodes must have 5 slaves. Hopefully the developers will add an option for “many”, so we don’t have to generate a new blueprint for every different-sized cluster. (Update: thanks to John from HortonWorks for a correction: cardinality is an optional hint which isn’t validated by Ambari. This wasn’t clear from the docs.)

To provide a concrete example, the sample host groups I’m using look like this:

"host_groups" : [
 {
 "name" : "master",
 "components" : [
 {
 "name" : "NAMENODE"
 },
 {
 "name" : "SECONDARY_NAMENODE"
 },
 {
 "name" : "RESOURCEMANAGER"
 },
 {
 "name" : "HISTORYSERVER"
 },
 {
 "name" : "ZOOKEEPER_SERVER"
 },
 {
 "name" : "HIVE_METASTORE"
 },
 {
 "name" : "HIVE_SERVER"
 },
 {
 "name" : "MYSQL_SERVER"
 }
 ],
 "cardinality" : "1"
 },
{
 "name" : "slaves",
 "components" : [
 {
 "name" : "DATANODE"
 },
 {
 "name" : "HDFS_CLIENT"
 },
 {
 "name" : "NODEMANAGER"
 },
 {
 "name" : "YARN_CLIENT"
 },
 {
 "name" : "MAPREDUCE2_CLIENT"
 },
 {
 "name" : "ZOOKEEPER_CLIENT"
 },
 {
 "name" : "TEZ_CLIENT"
 },
 {
 "name" : "HIVE_CLIENT"
 }
 ],
 "cardinality" : "5"
 }
]

This host_groups section describes a single node with all of the “master” components installed, and five slaves with just the DataNode, NodeManager and client components installed. Note that some components have dependencies: it’s possible to build an invalid blueprint which contains a HIVE_METASTORE but not a MYSQL_SERVER. The REST API provides appropriate error messages when such a blueprint is submitted.

Configuration

Configuration allows you to override the defaults for any services you’re installing, and it comes in two varieties: global and service-specific. Global parameters are required for different services: to my knowledge Nagios and Hive require global parameters to be specified – these parameters apply to multiple roles within the cluster, and the API will tell you if any are missing. Most cluster configuration (your typical core-site.xml, hive-site.xml, etc. parameters) can be overridden in the blueprint by specifying a configuration with the leading part of the file name, and then providing a map of the keys to overwrite. The configuration below provides a global variable that Hive requires, and it also overrides some of the default parameters in hive-site.xml. These changes will be propagated to the cluster as if you changed them in the Ambari UI.

"configurations": [
  {
    "global": {
      "hive_metastore_user_passwd": "p"
    }
  },
  {
    "hive-site": {
      "javax.jdo.option.ConnectionPassword": "p",
      "hive.security.authenticator.manager": "org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator",
      "hive.execution.engine": "tez",
      "hive.exec.failure.hooks": "",
      "hive.exec.pre.hooks": "",
      "hive.exec.post.hooks": ""
    }
  }
]

This config will override some parameters in hive-site.xml, as well as set the metastore password to ‘p’. Note that you can specify more configuration files to override (core-site.xml, hdfs-site.xml, etc.), but each file must be its own object in the configurations array, similar to how global and hive-site are handled above.

Once you’ve specified the host groups and any configuration overrides, the Blueprint also needs a stack – the versions of software to install. Right now Ambari only supports HDP – see this table for the stack versions supported in each Ambari release. As a weird constraint, the blueprint name is inside the blueprint itself, along with the stack information. This name must be the same as the name you provide to the REST endpoint, for some reason. To upload a new blueprint to an Ambari server you can use:

$ curl -X POST -H 'X-Requested-By: Pythian' <ambari-host>/api/v1/blueprints/<blueprint name> -d @<blueprint file>

The X-Requested-By header is required, and as noted the blueprint name must match the file.
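
To confirm that the upload worked, you can list the blueprints the server knows about; this is just the read side of the same API (add credentials with -u if your Ambari installation requires authentication):

$ curl -H 'X-Requested-By: Pythian' <ambari-host>/api/v1/blueprints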

You can see the entire blueprint file from this example here, feel free to use it as a baseline for your cluster.

Creating the Cluster

Once you’ve written a blueprint with the services and configuration you want, you need to do the following (a rough shell sketch follows the list):

  • Create EC2 instances with the correct security groups
  • Install ambari-master on one, and ambari-agent on the others
  • Configure the agents to report to the master
  • Write a file mapping hosts to host groups
  • Push both files (the blueprint and the mapping) to the REST API
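
In shell form, the Ambari portion of those steps looks roughly like this. It is only a sketch: it assumes a yum-based image with the Ambari repository already configured, and the host name is a placeholder:

# on the Ambari master
$ sudo yum install -y ambari-server
$ sudo ambari-server setup -s        # -s accepts the setup defaults non-interactively
$ sudo ambari-server start

# on every other node, install the agent and point it at the master
$ sudo yum install -y ambari-agent
$ sudo sed -i 's/^hostname=.*/hostname=<ambari-master-fqdn>/' /etc/ambari-agent/conf/ambari-agent.ini
$ sudo ambari-agent start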

Fortunately, we have a Python script that can do that for you! This script will create a benchmarking cluster with a specific number of data nodes, an Ambari master and a separate Hadoop master. It can easily be modified to create multiple classes of machines, if you want to have more host groups than “master” and “slave”. The core of the script (the EC2 interaction and Ambari RPM installation) is based on work by Ahir Reddy from Databricks, with the Ambari Blueprints support added by yours truly.

If you’re curious about the host mapping file: it has the blueprint name, and an array of host names for every host_group. Corresponding to the example above, the cluster definition would be:

{
  "blueprint": "hadoop-benchmark",
  "host_groups": [
    {
      "name": "master",
      "hosts": [{"fqdn": "host-1"}]
    },
    {
      "name": "slaves",
      "hosts": [
        {"fqdn": "host-2"},
        {"fqdn": "host-3"}
      ]
    }
  ]
}

You could replace “host-n” with the real domain names for your Amazon instances (use the internal ones!), and create a new cluster over those machines using:

$ curl -X POST -H 'X-Requested-By: Pythian' <ambari-host>/api/v1/clusters/<cluster name> -d @<mapping file>
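
The POST returns immediately while Ambari installs and starts the services in the background. One simple way to watch progress, assuming the install is the first request against the new cluster and therefore has ID 1, is to poll the request resource:

$ curl -H 'X-Requested-By: Pythian' <ambari-host>/api/v1/clusters/<cluster name>/requests/1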
Conclusion

Ambari Blueprints have some rough edges right now, but they provide a convenient way to deploy all of the services supported by the HDP stack. Watch this space for more posts about my effort to create a repeatable, one-touch, cross-distribution Hadoop SQL benchmark on EC2.

Categories: DBA Blogs