Skip navigation.

Pythian Group

Syndicate content
Official Pythian Blog - Love Your Data
Updated: 4 hours 56 min ago

SQL Server Replication Quick Tips

Tue, 2014-08-26 07:56

There is a time in every SQL Server DBA career where a mail came in with a “replication is not working, please check” message. This article is intended to provide with quick tips on how to handle common replication errors and performance problems in a one way transactional replication topology

Oh boy, there is a data problem:


You check replication monitor and get a :

“Transaction sequence number: 0x0003BB0E000001DF000600000000, Command ID: 1″

The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)

Note the sequential number will be used in the following scripts, also the commandID is important to note as not necessarily the whole sequential number has issues, it might be tied to just one command.

Go to the distributor database en run the following command to get the list of articles involved in this issue:

select * from dbo.MSarticles
where article_id in (
select article_id from MSrepl_commands
where xact_seqno = 0x0003BB0E000001DF000600000000)

To get the whole list of commands you can run below query

exec sp_browsereplcmds
@xact_seqno_start = ’0x0003BB0E000001DF000600000000′,
@xact_seqno_end = ’0x0003BB0E000001DF000600000000′

With this last query you can get to the exact command that is failing (by searching the command number in the commandID column)

You will notice that a transactional replication will typically(depending on setup) use insert, delete, update stored procedures to replicate the data, so the command you will see over here will look something like:

{CALL [sp_MSdel_dboMyArticle] (118)}

That is the stored procedure generated to process delete statement over dbo.MyArticle table, and in this case it is trying to delete ID 118. Based on the error reported you will now realize that the issue is that the replication is trying to delete MyArtcile on ID 118 and is not there, so it is trying to delete a non existent record.


  1. You can either check the publisher for this record and manually insert it in the subscriber, this will cause the replication command to succeed and will fix the issue.
  2. You can skip the command, for this specific example you can skip the command as there is no need to delete something that has been already deleted, by removing the command from the MSrepl_commands table. (Beware, only do this when you know what you are doing, manually removing transactions can result in a unstable replication.) In this example you would use something like
    Delete from MSrepl_commands
    where xact_seqno = 0x0003BB0E000001DF000600000000 and commandID=1
  3. Reinitialize, this option is the least famous, you should try to fix the issue before doing this, however if after skipping the command you still get new errors everywhere, something definitely went wrong and there is no easy way to guarantee that your subscription is up to date and stable, this can be indicator that someone or something messed around with the data, there was some type of modification at the subscription and this is causing issues with the replication. Remember most likely a one way transactional replication is intended to have a copy of the data so it can be queried, no modification should be made to the data as this won´t replicate back to the publisher.

Query time outs:

After checking the replication monitor you get a message like:ID-10054415

Query timeout expired
The process is running and is waiting for a response from the server

and then terminating with this error…
Agent ‘MyAgent’ is retrying after an error, YY retries attempted

This can be due to several reasons:

  • Your transaction is taking a long time and needs some tuning. If your transaction is touching too much data or is using a bad query plan it can result in a long running query, check your TSQL and see if the execution plan is optimal
  • There is a problem with the network. If you normally don´t have this issue and this just happened out of the blue, you can try to check the network, sometimes a network failure or saturated endpoint can increase transfer rates affecting your replication.
  • Server performance, either the publisher or subscriber can have a performance problem, either too much CPU or Memory usage can eventually impact a replication transaction causing it to timeout
  • The query just needs some more time to complete. If this is the case you can tweak the time out setting to give the transaction some more time so it can process properly. To do this:
  1. Right click the Replication folder
  2. Click Distributor Properties and select General
  3. Click ‘Profile Defaults’
  4. Choose ‘Distribution Agents’ on left
  5. Click ‘New’ to create a new default agent profile
  6. Choose ‘Default Agent Profile’ from the list displayed, (to copy this)
  7. Pick a name for your new profile and upate the QueryTimeout value in right column
  8. Save
  9. Choose to use this profile across all your replication sets. However I would recommend to only apply to the agent that requires this change
  10. To individually assign the profile, open Replication Monitor and then in the left pane click your replication set
  11. In the right pane, select your desired agent, right click and change the profile to the new one you just created

 Mini Hack on expired subscriptionsID-10098834

When a replication is marked as expired, it will tell you that you need to reinitialize.

To activate it “under the hood”, check your replication monitor last error, it will show you the last sequential number that tried to process, then run this command(using the corresponding seq_no):

update MSsubscriptions
set status=2
where subscription_seqno=0x0002AADE00005030000100000002

The status column means:

0 = Inactive.

1 = Subscribed.

2 = Active.

You can change it to Active and it will try to process again. Why would you use this? if the subscription expired but your distribution cleanup job haven´t run, then it can try to reprocess everything again, if the issue was related to a network time out and now you have your network back up, you can try this as it will try to start from the last sequential number. Also you can try to do this to reproduce the last error reported, so it will fail and eventually expire again but you will have a better idea on why it failed in the first place.

Multi threading or “Streams”

A slow replication, and by slow I mean when you know that your replication is experiencing a delay when your command goes from the distributor to the subscription, you can check this with performance counters or quickly insert a token(

You can improve the performance by adding streams, normally a default setting will write sequentially the replication transactions one by one, with Streams you can add more threads, say you specify to use 4 strems, you will be processing 4 transactions at a time meaning a faster turnaround. This can work beautifully but it can also generate deadlocks and inconsistencies, I would recommend to start low and just add 1 stream at a time and stop when you start seeing a problem. Do not go crazy and feel this is a turbo button and add 30 streams, and like most features, test it in QA first!

To Enable this option follow these steps:

  1. Open Replication Monitor, expand the Publisher and select the Publication in the left pane.
  2. On the right pane window , under “All Subscriptions” , you will see a list of all the Subscribers.
  3. Right Click the Subscriber you want to modify and click on “View Details”. A new Window will appear with the distribution agent session details.
  4. Now click on “Action” in the Menu bar at top and select “Distribution Agent Job Properties”, this will open the corresponding job properties.ID-100203331
  5. Go to  “Steps” in the left pane window followed by highlighting “Run Agent” on the Right pane window, click Edit.
  6. A new Windows will popup , scroll to the right end of the command section and append this parameter “ -SubscriptionStreams 2”
  7. Save the settings and restart the Distribution Agent job.

You might encounter some issues when implementing this, you can read this KB for further info:


There are many tips on how to fix a replication, sometimes is easier to just reinitialize, but sometimes this is not an option when critical systems depend on the subscription to be up to date or your database is so huge that it will take days to complete. When possible try to troubleshoot instead of just restarting the replication from scratch as it will give you a lot more insight on what is going on.

Categories: DBA Blogs

Log Buffer #385, A Carnival of the Vanities for DBAs

Fri, 2014-08-22 08:00

This Log Buffer edition combs through top notch blog posts from Oracle, MySQL and SQL Server postings around the globe.


You want to test the Oracle Java Cloud? You can get your own 30 day test account & instance. Or you can get a Java account with our permanent test system.

Some ramblings about Oracle R Distribution 3.1.1.

Scott is demystifying Oracle Unpivot.

Java 8 for Tablets, Pis, and Legos at Silicon Valley JUG – 8/20/2014

A new version of Oracle BPM Suite with Adaptive Case Management (ACM) is now available.

SQL Server:

Data Mining: Part 14 Export DMX results with Integration Services

Should you be planning to move from Exchange to Office 365? If so, why?

Stairway to T-SQL DML Level 12: Using the MERGE Statement

From SQL Server Management Studio it’s hard to look through the first few rows of a whole lot of tables in a database.

Special Characters can lead to many problems. Identifying and reporting on them can save a lot of headache down the road.


MariaDB Galera Cluster 5.5.39 now available

A closer look at the MySQL ibdata1 disk space issue and big tables

How-To: Guide to Database Migration from MS Access using MySQL Workbench

Using resource monitoring to avoid user service overload

How to use MySQL Global Transaction IDs (GTIDs) in production

Categories: DBA Blogs

Microsoft PowerBI: News from WPC

Wed, 2014-08-20 14:24

During the Worldwide Partner Conference (WPC) that happened last month, Microsoft made public some of the new functionalities that we can expect to see soon on Microsoft PowerBI.

If you were on another planet in the last few months, and didn’t heard about PowerBI for Office 365, I’ve included a list of the set of tools that comprise Microsoft BI stack in the cloud. Keep in mind, they are different tools and not a single product, each tool serving a different purpose. At the date of writing of this article the tools that comprise the entire solutions are:

  • Power Pivot
  • Power View
  • Power Query
  • Power Map
  • Power Q&A
  • Power BI Windows Store App

Some of those were already available as an Excel add-in or built-in as part of the Excel product for a long time and now had being re-packed on this cloud solution, while others, like the Power Q&A are a cloud-only solution.

So, what are the big news from the WPC 2014? During the conference, we watched a demonstration of what to expect for the coming months, so today I’ll discuss the ones I believe were the most important. If you want to see the entire presentation you can find it here.


New visualizations!

One of the key issues we were facing with the current version of Power BI, was the lack of some visualizations the customer always asks for, the main one being the gauge. It is incredible how popular those little gauges have become with time – decision makers love it, so it’s really good to see they are finally available in Power BI.

Besides the gauge, we can see in the image above taken from the WPC demonstration, other data visualizations like the radar chart and the treemap.

Edition capabilities in the browser

Another important thing that was announced was the possibility to edit the dashboard and change the data visualizations on the browser, without using Excel. It doesn’t seem like much, but this can be very important, and indicate a shift in Microsoft behaviour. If you look at all the Power BI functionalities, and in fact, at the entire BI stack, Excel was always the central tool, you needed Excel to basically everything. And now we are seeing some nice features that you can manage and control inside the browser. Let’s wait for the next steps.

Important to mention that everything is HTML5 instead of Silverlight, meaning we can have the same experience in basically any device.

Partner Solution Pack

If I was asked to name just a single announcement that was made that could drastically change the market, it would be this one.

Partner Solution Pack is the ability for the Microsoft partners to create a bundled BI solution including the data, connectivity to the data sources and all the interactive reports. That means that we can as a user buy a solution pack from Salesforce for instance, connect it with our Salesforce account and it would automatically create a fully interactive dashboard with our own data.

Now, imagine the other way around: you, as a Microsoft Partner now has the ability to create a complete BI solution to your customers and make it available on the cloud. And your customers can buy this package, connect it with their data and make use of the solution in a manner of seconds.

The Partner Solution Pack in my opinion will create a huge market for all Microsoft partners and provide us, the users, with tons of good packages and the ability to have a BI solution paying much less than what would cost creating everything from scratch.

PowerBI-Image2 PowerBI-Image3

But you may tell me that we have other tools in the market that can do this, connect on partner applications and build a dashboard, what would be the advantage of using Power BI over the existing tools?

The biggest advantage is the PowerBI Q&A, as you can see in the screenshot in every screen we have a simple search box at the top of the page, which allows the user to do simple natural language questions to query the data. So, if the user wants to know the “opportunity size by month and by industry” all you have to do is ask and PowerBI will find the data, and choose the best visualization method for you. After that, you can just pin this report in the dashboard and that’s it, now you can keep track of this important information on a daily basis. Without requiring a single line of code, without asking for a change request to the IT department and going to a huge queue of requests that would take months to be addressed.


I hope that in this article I was able to show you the potential this new functionalities can bring to your company. If you need more information about PowerBI, or if you’re as excited as I am with it and want to start using it right away, just contact us and our team will be glad to work with your company to either develop a BI solution that consumes your data, or to plan the development of your own Partner Solution Pack so you can offer your customers a complete BI solution using PowerBI for Office 365.

Click here to watch the entire WPC announcement and see the PowerBI reports in action. The PowerBI demonstration starts at 21:10 minutes.


Categories: DBA Blogs

SSAS Database Doesn’t Show Up in SharePoint 2013 Dashboard Designer

Wed, 2014-08-20 07:43

Howdy everyone,

Just a quick tip for everyone that is struggling to configure SharePoint Server 2013 PerformancePoint  to connect to a SQL Analysis Services 2012 or 2014 cube.

After a new SharePoint Server 2013 installation, I have tried to create a new connection to my Analysis Services cube through the SharePoint Dashboard Designer, but no matter what, the Database option always shows up as empty and I can select my Analysis Services database.



In the Windows server log event I could find the following message:

The data source provider for data sources of type ‘ADOMD.NET’ is not registered. Please contact an administrator.

PerformancePoint Services error code 10115.

The reason you would receive this error message, believe it or not, is because even if you are using SQL Server 2012 or newer, and SharePoint Server 2013, it will try to load the SQL Server 2008 version of the ADMD.NET dll.

If you install the SQL Server 2008 R2 ADMD.NET component, that you can download from the following location: , and restart IIS you will fix this issue and will be able to successfully connect to your SQL Server 2012/2014 Analysis Services database.

Hope this helps.


Categories: DBA Blogs

SQL Server Error: 18056, Severity: 20, State: 29

Wed, 2014-08-20 07:40

Howdy everyone,

One of the most frequent error messages I come across when dealing with SQL Server, particularly SQL Server 2008 R2, is Error 18056, Severity: 20, State: 29. Not only do I constantly see this error message in the SQL logs, but also in the Microsoft community forums. I often see inquiries likes, “I have SQL Server 2008 R2 SP2 installed and I see the error below. I have the most recent Service Pack installed, but the problem remains.”

Error: 18056, Severity: 20, State: 29.
The client was unable to reuse a session with SPID XXXX, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Now, this error message can be triggered by many things. For instance, lack of user permissions in any database, or in the worst case, lack of resources in the server that made it stop accepting new connection requests. However, in any of those cases you would probably have a lot of angry users calling you and complaining that they can’t do their work. If that’s not the case, you are probably just seeing a generic and internal error message that shouldn’t be in the SQL error log in the first place.

The first step to addressing this issue, or simply investigating if it’s something more serious, is to update to the latest Cumulative Update.

Microsoft released a fix that makes SQL Server stop logging this generic error message in some cases, when it’s not important for you to investigate the issue. This fix is documented in the following Knowledge Base article:

I hope this help you to get rid of this error message.


Categories: DBA Blogs

On the Road with Laerte

Mon, 2014-08-18 09:21

For the month of October, Microsoft PowerShell MVP, Laerte Junior will be touring Brazil and Europe for various SQL Server-related speaking engagements.

“Thankfully, I am working at a company that fully supports their employees to speak and participate in community events.” Laerte says. “I can travel to Europe for 5 SQL Server conferences, and then go to the USA to attend the MVP Global Summit and SQL PASS Summit.”

While most European speaking sessions have been confirmed, we’ll be updating the schedule as the topics become available. You can follow Laerte on his personal blog at

Date Location Event Topic Speaking Schedule September 27, 2014 São Paulo, Brazil SQL Saturday #325 Criando suas próprias solouções usando PowerShell See speaking schedule October 1, 2014 Schelle, Belgium SQL Server Days Mastering PowerShell for SQL Server See speaking schedule October 2, 2014 Utrecht, Holland SQL Saturday #336 Full day pre-conference training session: Mastering PowerShell for SQL Server October 3/4 2014 Utrecht, Holland SQL Saturday #336 TBD See speaking schedule October 11, 2014 Sophia, Bulgaria SQL Saturday #311 Writing Your Solutions Using PowerShell See speaking schedule October 18, 2014 Oporto, Portugal SQL Saturday #341 Criando suas próprias solouções usando PowerShell See speaking schedule October 24, 2014 Barcelona, Spain SQL Saturday #338 TBD See speaking schedule

Will you be attending any of these sessions? If so, which ones?

Categories: DBA Blogs

Log Buffer #384, A Carnival of the Vanities for DBAs

Fri, 2014-08-15 07:32

This Log Buffer Edition starts with some great posts from Oracle arena, then passes through the world of SQL Server, and stops at the MySQL field.


OAG/OES Integration for Web API Security: skin and guts by Andre Correa

Showing Foreign Key Names in your Data Modeler Diagrams

walkmod : A Tool to Apply Coding Conventions

Oracle VM Virtual Appliances for E-Business Suite 12.1.3 Now Available

RMAN Catalog requires Enterprise Edition (EE) since Oracle Database

SQL Server:

Restore Gene : Automating SQL Server Database Restores

With a hybrid cloud, can you get the freedom and flexibility of a public cloud with the security and bandwidth of a private cloud?

A clear understanding of SQL Data Types and domains is a fundamental requirement for the Database Developer, but it is not elementary.

Automating SQL Server Agent Notification

Adding Custom Reports to SQL Server Management Studio


The Road to MySQL 5.6 — A DBA Perspective

Virtual servers for MySQL are popular but are they the answer? Should we be containing our instances instead.

Jeremy Cole recently blogged about the feature SET GLOBAL sql_log_bin.

Which SQL queries take all the time? Using MaxScale to answer that age old question.

SBR vs RBR when using On Duplicate Key Update for High Availability

Categories: DBA Blogs

Managing Files with SaltStack

Thu, 2014-08-14 09:06

Before we begin, take a look at my previous two blog posts, SaltStack for Remote Parallel Execution of Commands and Using SaltStack for Configuration Management.

We manage files using configurations management much for the same reasons that we manage packages and services – we want/need consistency across all of our boxes, and to spend our time on tasks that add more business value than logging into all of our servers to change a line in a config file.

Using Salt I will show you have to manage the message of the day (MOTD) on the server.


There are many examples of configuration files which differ only by a few lines between staging and production. A great example of this is a config file which has database host/user/pass information. The drive for consistency tells us that we would like the environment that our release is tested on to match (as closely as possible) the production environment where the code will run.

Using templates allows us to affect the few lines in a configuration file, which we would like to change, while leaving the rest alone. This also simplifies the management of our servers and the configuration repository, allowing us to maintain one config file for many servers.

Salt grains

The salt minions know a lot of information about the boxes they run on. Everything from the hostname, to the IP address, to the kernel, and more is available to be queried by the salt master. These pieces of information are called “grains” in the salt world and allow us to insert dynamic variables into our templates.

A great use case for grains would be the expansion of our Apache formula from my last post. On Red Hat-based machines, the Apache package is called “httpd” but on Debian-based machines the package is called “Apache2″ Using the “osfamily” grain we can dynamically redefine the package name for each minion while maintaining a single formula for all servers.

Likewise, any configurations files which need to have the current box IP address can benefit from grains. As each minion installs that configuration file it will see that the variable needs to be populated with a “grain” and will then do so as requested. Rather than maintaining an Apache vhost file for each of your 10 web servers where the only difference is the IP address declaration you can maintain one dynamic template which will ensure that everything else in that config file matches on each of the 10 nodes other then the one thing that needs to be dynamic (the IP address).

Putting it all together – the MOTD

In your /srv/salt dir we are going to create a subdir called motd. inside of that directory you will find 2 files. an init.sls which is the default top level formula for the director and an motd.template file which is our config file. The init.sls looks like this:

    - user: root 
    - group: root 
    - mode: 0644 
    - source: salt://motd/motd.template 
    - template: jinja

For the file /etc/motd we are telling Salt that we want to manage the file that its owner and group should be root, that we want the file to have 0644 permissions. We are letting Salt know that it will find the config file (source) under the motd subdir, the salt:// maps to /srv/salt and that our template will be in the jinja format.

Our template will look like:

Welcome to {{ grains['fqdn'] }}

Server Stats at a Glance:

OS: {{ grains['osfullname'] }}
Kernel: {{ grains['kernelrelease'] }}
Memory: {{ grains['mem_total'] }} MB

This server is managed using a configuration management system (
Changes made to this box directly will likely be over-written by SALT. Instead
modify server configuration via the configuration management git repository.

As each minion installs this MOTD file it will see the variables in use because they are grains the minion will know that it has the information required to populate the variable and will do so for each server. This will give you a final MOTD that looks like this:

[root@ip-10-0-0-172 ~]# cat /etc/motd

Welcome to ip-10-0-0-172.ec2.internal

Server Stats at a Glance:

OS: Amazon Linux AMI
Kernel: 3.10.42-52.145.amzn1.x86_64
Memory: 996 MB

This server is managed using a configuration management system (
Changes made to this box directly will likely be over-written by SALT.  Instead
modify server configuration via the configuration management git repository.
[root@ip-10-0-0-172 ~]#

As you can see each variable was populated with the information specific to the node.

If we wanted to add, remove, or change anything in the MOTD, rather than having to box walk the entire infrastructure (which depending on your side, could tie up a resource for days), we can edit the single template file on the master and allow the tool to propagate the change out to the boxes for us, reducing that task from a very boring day (or more) to a few minutes!

Categories: DBA Blogs

How to Configure an Azure Point-to-Site VPN – Part 2

Thu, 2014-08-14 08:47

This blog post is the second in a series of three which will demonstrate how to configure a Point-to-Site VPN step-by-step. In my first blog post, I demonstrated how to configure a virtual network and a dynamic routing gateway. Today’s post will be about creating certificates.


At this step, we will create and upload a certificate. This certificate will be used to authenticate the VPN clients and are performed in few steps:

  • Generate the certificate
  • Upload the root certificate to the Azure Management Portal
  • Generate a client certificate
  • Export and install the client certificate

Let’s start …

  1. We will need to use the MakeCert tool. MakeCert is part of “Microsoft Visual Studio Express” available here.
  2. After successfully downloading the tool, start the setup and follow the installation steps. Note that you can generate this certificate in any computer, not only in the computer where you are configuring the VPN.
    After the installation, you can find MakeCert at:

    • C:\Program Files (x86)\Windows Kits\8.1\bin\x64
    • C:\Program Files (x86)\Windows Kits\8.1\bin\x86
  3. Launch the command prompt as Administrator. Point the path to one of the folders referred in the previous step and execute the following command (note: keep the command line opened):
    makecert -sky exchange -r -n “CN=RootCertificateMurilo” -pe -a sha1 -len 2048 -ss My “RootCertificateMurilo.cer”
    (where “RootCertificateMurilo” is teh certificate name).Screen Shot 2014-07-30 at 11.38.38
    This command will create and install a root certificate in the Personal certificate store and create the define RootCertificateMurilo.cer file in the same directory that you are executing the command.Screen Shot 2014-07-30 at 11.59.41Note: Store this certificate in a safe location.
  4. Now, go to the Windows Azure Management Portal in order to upload the certificate.
  5. In the networks section, select the previously created network and go to the certificate page.Screen Shot 2014-07-30 at 13.02.05
  6. Click Upload a root certificate, select your certificate, and click in the check mark.Screen Shot 2014-07-30 at 13.04.10
    • Depending on the time zone of the server where you created the certificate, you might receive an error message, “The certificate is not valid yet, effective date is [date and time].” To work around this, delete the created certificate, and create another one adding the following parameter (change the date):-b “07/30/2014″It will be valid form 00:00:00 hours for the day you set.
  7. Now we need to create a Client Certificate. We will use the Root Certificate to do this.
    In the same command line window, opened before, execute the following command:makecert.exe -n “CN=ClientCertificateMurilo” -pe -sky exchange -m 96 -ss My -in “RootCertificateMurilo” -is my -a sha1This certificate will be stored in your personal certificate store.
  8. Now we need to export this certificate, as this should be installed on each computer that needs to be connected to the virtual network. To achieve this, enter the command “mmc”, still in the opened command line. The following window will be shown: Screen Shot 2014-07-30 at 16.52.59
    • Go to File->Add/Remove Snap-in.
    • Select “Certificates” and click on “Add >”.Screen Shot 2014-07-30 at 16.54.02
    • Select My user account and click Finish.Screen Shot 2014-07-30 at 16.54.56
    • Click OK in the remaining window.
    • Now you will be able to see your certificates under the “Personal\Certificates” folder:Screen Shot 2014-07-30 at 16.56.13
  9. To export the certificate, right click the Client certificate and click on “All Tasks->Export…”, as shown:Screen Shot 2014-07-30 at 17.00.46
  10. A wizard will be presented. Choose Yes, export the private key and click.Screen Shot 2014-07-31 at 11.15.06
  11. Leave this as default, and click Next.Screen Shot 2014-07-31 at 11.22.15
  12. Choose a strong password (try to remember this) and click Next.Screen Shot 2014-07-31 at 11.23.39
  13. Now you need to set the path to store you .pfx file.Screen Shot 2014-07-31 at 11.25.01
  14. Click Next, then Finish.
  15. To finalize the “Certificates part”, we will need to install the certificate on all the servers where we want to setup the VPN.To accomplish this, you just need to:
    • Copy the exported .pfx file (step 13) to all the servers.
    • Double-click the pfx on all the servers.
    • Enter the password.
    • Proceed with the installation, maintaining the default location.

Stay tuned for my next blog post on how to configure the VPN client.

Categories: DBA Blogs

Michael Abbey: Still Presenting After All These Years

Thu, 2014-08-14 07:50

A cool, wintery day in late 1989. This kid’s working for the Office of the Auditor General of Canada. I’d been working with Oracle and in my fourth year. I had cut my teeth on after first seeing V3 some four years prior. I stumbled across a well-placed ad for a show happening in Anaheim USA in September 1990. I’ve got the bug. I apply to go to the show and was told by my employer ,”Just a sec, David and I were thinking of going to that show – let us get back to you.” Some three weeks I am told it’s a go.

I am off to sunny California for six wonderful days of International Oracle User Week (IOUW); this was a joint effort put on by Oracle and the International Oracle User Group (IOUG). I had spent the better part of the summer of 1969 in southern Cali so this was shaping up to be a resurrection. I toddle off to Cali and have a wonderful time. It’s magic – such a learning opportunity. I even came away knowing how to place a database in archivelog mode. I was so pleased with myself and got to meet one of my heroes. I had only been working with the software for 4 years, but already knew of Ken Jacobs (a.k.a. Dr. DBA).

I had the bug to present almost from day one. I saw an ad in one of the bazillion pieces of paper I brought home from that IOUW about a show in DC – Sheraton Woodley Park to be exact. I don’t even think that it exists anymore. I figured I’d attend ECO then present an abstract for IOUW 1991 in Miami. Some of the history is described in a blog post I made in 2013 located here. Enough said about that. It was quite a whirlwind of activity on the presentation circuit in those days. Starting in 1992 I became very active in the IOUG holding a handful of board positions up to the 2006 or maybe 2007 time frame. I attended a gazillion conferences in those days and the pinnacle was a show in Philly in 1995. I had been on the board of the IOUW for a few years and the paid attendance count at that show was staggering. Chubby Checker played at the big bash and arrangements were made for me to sit in on the bass guitar for the Twist. That got cancelled at the last minute but it was close. My paper was in one of the biggest rooms in the convention centre. There were over 1,500 people in attendance and it was intoxicating. I was pleased when I got my evals to find out the attendees were as pleased as I was. It was all (or close to all) about the CORE database technology in those days. In 1995, Oracle7 was the hot item having been on the street for over 3 years.

As guests of Oracle, a handful of us had the pleasure of attending the launch of Oracle7 at the Hudson Theatre in the Hotel Macklowe on 44th St. in beloved NYC. We were thrilled to be very close in those days to Ray Lane, then President of Oracle Corp. and we introduced Ray to a lot of his direct reports at that “party.” A mere four years later we were back for the release launch of Oracle8 at Radio City Music Hall. Again, a pleasant time was had by all. There turned out to be surprisingly little coverage/mention of Oracle8 at that event. It was more concentrated on Oracle Network Computer (NC) designed to bring computing power to every desktop at a low cost. Once during that Oracle8 launch, the operator of the boom mic in then pit swept the stage to get from one side to the other and almost hit LJE in the side of the head. I think I was the only one who heard what Larry said – “Watch out Bill.” Does anyone get the reference but me?

My torrid Oracle technology career was just that. Between 1991 and the date of this post I have probably given over 100 papers at shows from Ottawa to Hyderabad, Brighton to San Diego, and Vienna to Addis Ababa. There is still a voracious hunger out there for the heart of my expertise – anything that starts with an “O” and ends in an “E” and has the word database tagged on the end. After becoming very close to some of the kernel developers at Oracle, we discussed how they were still in the middle of their workday when the Loma Prieta quake hit in October 1989. Me and a few close friends hung out with the guys whose names litter the bottom of the “this change was done when” section of the ?/rdbms/admin directory on Oracle database software installs. We were in David Anderson’s office schmoozing and asked what he happened to be up to that day. He was ftp’ing source code from a VAX to a Sun box in preparation for the base-platform change that happened in the early 1990s. It was a magic carpet ride.

In some ways it still is. To finish off this year I am appearing at:

  • OOW (Oracle Open World) in San Francisco – September 29-October 2
  • ECO (East Coast Oracle) event in Raleigh/Durham – November 3-5
  • MOUS (Michigan Oracle User Summit) in Livonia – November 13
  • UKOUG in Liverpool – December 8-10

My personal top 10 moments (actually top 11 – the exchange rate) in my still developing tech career you say … drum roll:

Rank Event Date 11 First ever tech show 1990 10 Longest lasting tech contact – Yossi Amor 25 years 9 Number of IOUG yearly events attended 23 8 Books published in Oracle Press series (including translations) 42 7 Most attendees at a presentation – 1500 (Philadelphia) 1995 6 Fewest attendees at a presentation – 1 2013 5 Most exciting event attended – CODA in Burlingame CA 1993 4 First PL/SQL code block to compile – Oracle7 1993 3 Favourite version of SQL*Forms – 2.3 1993 2 First got hands wet with this famous technology – 5.1.22 1986 1 Biggest thrill – the rush of speaking to live audiences 1991-??
Categories: DBA Blogs

Offline Visualization of Azkaban Workflows

Mon, 2014-08-11 07:51

As mentioned in my past adventures, I’m often working with the workflow management tool ominously called Azkaban. Its foreboding name is not really deserved; it’s relatively straightforward to use, and offers a fairly decent workflow visualization. For that last part, though, there is a catch: to be able to visualize the workflow, you have to (quite obviously) upload the project bundle to the server. Mind you, it’s not that much of a pain, and could easily managed by, say, a Gulp-fueled watch job. But still, it would be nice to tighten the feedback loop there, and be able to look at the graphs without having to go through the server at all.

Happily enough, all the information we need is available in the Azkaban job files themselves, and in a format that isn’t too hard to deal with. Typically, a job file will be called ‘foo.job’ and look like

command=echo "some command goes here"

So what we need to do to figure out a whole workflow is to begin at its final job, and recursively walk down all its dependencies.

use 5.12.0;

use Path::Tiny;

sub create_workflow {
  my $job = path(shift);
  my $azkaban_dir = $job->parent;

  my %dependencies;

  my @files = ($job);

  while( my $file = shift @files ) {
    my $job = $file->basename =~ s/\.job//r;

    next if $dependencies{$job}; # already processed

    my @deps = map  { split /\s*,\s*/ }
               grep { s/^dependencies=\s*// }
                    $file->lines( { chomp => 1 } );

    $dependencies{$job} = \@deps;

    push @files, map { $azkaban_dir->child( $_.'.job' ) } @deps;

  return %dependencies;

Once we have that dependency graph, it’s just a question of drawing the little boxes and the little lines. Which, funnily enough, is a much harder job one would expect. And better left off to the pros. In this case, I decided to go with Graph::Easy, which output text and svg.

use Graph::Easy;

my $graph = Graph::Easy->new;

while( my( $job, $deps ) = each %dependencies ) {
    $graph->add_edge( $_ => $job ) for @$deps;

print $graph->as_ascii;

And there we go. We put those two parts together in a small script, and we have a handy cli workflow visualizer.

$ target/azkaban/foo.job

  |                        v
+------+     +-----+     +-----+     +-----+
| zero | --> | baz | --> | bar | --> | foo |
+------+     +-----+     +-----+     +-----+
               |                       ^

Or, for the SVG-inclined,

$ -f=svg target/azkaban/foo.job

which gives us

Screen Shot 2014-08-10 at 3.09.42 PM
Categories: DBA Blogs

12c: Fun with WITH!

Fri, 2014-08-08 11:30

Last night I couldn’t sleep and what else you’re going to do? I was thinking about Oracle stuff.

In Oracle version 12, Oracle has enhanced the WITH clause – traditionally used for sub-query factoring – to allow the declaration of functions and procedures. This can be (ab)used to create a very interesting scenario, that is not very common in Oracle: Reading data within the same SELECT statement, but from two different points in time. And the points in time are in the future, and not in the past.

Let’s say I want to take a snapshot of the current SCN, and then another one 5 or 10 seconds after that. Traditionally we’d have to store that somewhere. What if I could take two snapshots – at different SCNs – using a single SELECT statement ? Without creating any objects ?

col value for a50
set lines 200 pages 99

procedure t (secs in number, scn out varchar2)
    pragma autonomous_transaction;
    select 'at ' || to_char(sysdate,'HH24:MI:SS') || ' SCN: ' 
                 || dbms_flashback.get_system_change_number 
      into scn 
      from dual;
function wait_for_it (secs in number) 
 return varchar2 is
    l_ret varchar2(32767);
    t(secs, l_ret);
    return l_ret;
select 1 as time, 'at ' || to_char(sysdate,'HH24:MI:SS') || ' SCN: ' 
                || dbms_flashback.get_system_change_number as value 
  from dual
union all
select 5, wait_for_it(5) from dual
union all
select 10, wait_for_it(5) from dual

And the result is:

---------- --------------------------------------------------
         1 at 09:55:49 SCN: 3366336
         5 at 09:55:54 SCN: 3366338
        10 at 09:55:59 SCN: 3366339


We can clearly see there, that the SCN is different, and the time shown matches the intervals we’ve chosen, 5 seconds apart. I think there could be some very interesting uses for this. What ideas can you folks come up with ?

Categories: DBA Blogs

Log Buffer #383, A Carnival of the Vanities for DBAs

Fri, 2014-08-08 07:34

This Log Buffer Edition picks few of the informative blog posts from Oracle, SQL Server, and MySQL fields of database.


g1gc logs – Ergonomics -how to print and how to understand

In Solaris 11.2, svcs gained a new option, “-L”.  The -L option allows a user to easily look at the most recent log events for a service.

ADF Thematic Map component from DVT library was updated in ADF 12c with marker zoom option and area layer styling

When cloning pluggable databases Oracle gives you also SNAPSHOT COPY clause to utilize storage system snapshot capabilities to save on storage space.

It is normal for bloggers including myself to post about the great things they have done.

SQL Server:

In six years Microsoft has come from almost zero corporate knowledge about how cloud computing works to it being an integral part of their strategy.

A brief overview of Columnstore index and its usage with an example.

The Road To Hell – new article from the DBA Team

Encryption brings data into a state which cannot be interpreted by anyone who does not have access to the decryption key, password, or certificates.

How to test what a SQL Server application would do in the past or in the future with date and time differences.


MySQL for Visual Studio 1.2.3 GA has been released

An approach to MySQL dynamic cross-reference query.

The MySQL replication and load balancing plugin for PHP, PECL/mysqlnd_ms, aims to make using a cluster of MySQL servers instead of a single server as transparent as possible.

Picking the Right Clustering for MySQL: Cloud-only Services or Flexible Tungsten Clusters? New webinar-on-demand.

Collation options for new MySQL schemas and tables created in MySQL for Excel

Categories: DBA Blogs

Alter Session Kill on Steroids

Wed, 2014-08-06 10:27

Perhaps you have encountered something like this: A session that is consuming too many resources needs to be killed. You locate the session and use ALTER SYSTEM KILL SESSION ‘SID,SERIAL#’ to kill the session. As you continue to monitor the database you find that the status of the session in v$session is ‘KILLED’, but the session does not go away. You also notice that the SERIAL# is continually changing.

Now you find there is no OS process associated with the session, but the session continues as PMON is unable to finish cleanup for the session. Usually when this happens, the session will be holding a lock. When that happens, the only method to release the lock is to bounce the database. There are some bugs that may be responsible for this problem, such as this one described by Oracle Support:

Pmon Spins While Cleaning Dead Process (Doc ID 1130713.1)

This particular bug affects Oracle – I have personally seen this same behavior happen on many versions of the database from 7.0 on. To avoid these hanging sessions many DBA’s have adopted the habit of first killing the OS process with an OS utility, and if the session is still visible in v$session, issue the ALTER SYSTEM KILL command.

The OS command used on linux/unix is usually ‘kill -9′. On windows it is OraKill. This method usually avoids the problems encountered when killing a session that is holding a lock and processing DML.

I don’t know just what circumstances trigger this behavior, as I have never been able to reproduce it at will. When it does happen though, it is more than annoying as the only way to clear locks held by the recalcitrant session is to bounce the database.

Quite some time ago (at least as far back as Oracle 8i) Oracle introduced the new IMMEDIATE keyword to use with ALTER SYSTEM KILL SESSION. Using this keyword removes the need to use an OS command to kill a session – Oracle will do it for you! To test this I am using Oracle on Oracle Linux 5.5. I have previously run these same tests in with the same results. Had I access to an 8i or 9i database I would have run the tests there. To start with let’s see what happens when a session is killed without the immediate keyword.

Login to the session to be killed:

$ sqlplus scott/tiger@10gr2

Login as SYSDBA from another terminal and check for scott’s session:

SQL> l
  1  select
  2     s.username,
  3     s.sid,
  4     s.serial#,
  5     p.spid spid
  6  from v$session s, v$process p
  7  where s.username = 'SCOTT'
  8*    and p.addr = s.paddr
SQL> /

USERNAME                              SID    SERIAL# SPID
------------------------------ ---------- ---------- ------------
SCOTT                                 133         35 22870

1 row selected.

All that has happened at this point is that Oracle has made an internal call that has disconnected Scott’s session. (tracing that operation is a different topic.) The process on the server has not been terminated. This can be seen by the following experiment:

Logon again as Scott.

In a SYSDBA session check for Scott’s:

 SQL> @scott

USERNAME                              SID    SERIAL# SPID
------------------------------ ---------- ---------- ------------
SCOTT                                 146         81 23678

Now check for the shadow process associated with scott’s session on the server:

[root@ora10gR2 tmp]# ps -fp 23678
oracle 23678 1 0 16:56 ? 00:00:00 oraclejs01 (LOCAL=NO)

Kill the session and check the status:

SQL> alter system kill session '146,81';

SQL> l
  1  select
  2     s.username,
  3     s.sid,
  4     s.serial#,
  5     p.spid spid
  6  from v$session s, v$process p
  7  where s.username = 'SCOTT'
  8*    and p.addr = s.paddr

no rows selected

Check again on the server for the process:

[root@ora10gR2 tmp]# ps -fp 23678
oracle 23678 1 0 16:56 ? 00:00:00 oraclejs01 (LOCAL=NO)

Interesting, isn’t it? We know the process is still alive on the server, but the session information is no longer associated with the process. This happens because Oracle has disconnected the session, which allows the process to continue until the sqlplus session is terminated. The session information is still available in v$session, but is no longer associated with a server process:

  2     s.username,
  3     s.status,
  4     s.sid,
  5     s.serial#
  6  from v$session s
  7* where s.username = 'SCOTT'

USERNAME                       STATUS          SID    SERIAL#
------------------------------ -------- ---------- ----------
SCOTT                          KILLED          146         81

1 row selected.

 1* select pid,spid from v$process where pid = 146

no rows selected

When exiting the Scott session, I can see that the session was killed:

SQL> exit
ORA-00028: your session has been killed

Let’s perform the experiment again, but this time use the IMMEDIATE keyword.

Logon as scott:

> sqlplus scott/tiger@10gr2

SQL*Plus: Release Production on Tue Aug 5 17:18:53 2014

Logon as SYSDBA and check for the scott session;

SQL> @scott

USERNAME                              SID    SERIAL# SPID
------------------------------ ---------- ---------- ------------
SCOTT                                 146         83 23939

1 row selected.

Before killing scott’s session:

  • get my OS PID
  • enable 10046 trace

The OS PID will be used for strace on the SYSDBA session shadow process on the server.
The 10046 trace is so we can see what is happening in the strace output.

SQL> l
1 select
2 s.username,
3 s.sid,
4 s.serial#,
5 p.spid spid
6 from v$session s, v$process p
7 where s.username is not null
8 and p.addr = s.paddr
9 and userenv('SESSIONID') = s.audsid
10* order by username, sid

USERNAME                              SID    SERIAL# SPID
------------------------------ ---------- ---------- ------------
SYS                                   145         65 23947

1 row selected.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

Now ssh to the db server , check for Scott session shadow process and start strace:

[root@ora10gR2 tmp]# strace -o 23947.strace -p 23947
[1]+ Stopped strace -o 23947.strace -p 23947
[root@ora10gR2 tmp]# bg
[1]+ strace -o 23947.strace -p 23947 &

[root@ora10gR2 tmp]# ps -p 23939
23939 ? 00:00:00 oracle

Now kill Scott’s session and exit the SYSDBA session:

SQL> alter system kill session '146,83' immediate;

System altered.

The strace command will now have exited on the server.

First check again for Scott’s session:

[root@ora10gR2 tmp]# ps -p 23939
[root@ora10gR2 tmp]#

So the Scott shadow process has terminated.

As the 10046 trace was enabled, the output to the oracle trace file will appear in the strace file, which allows searching for ‘alter system kill’ in the strace file.

From the strace file:

write(5, "alter system kill session '146,8"..., 44) = 44

Now searching for the PID of scott’s session 23939:

read(10, "23939 (oracle) S 1 23939 23939 0"..., 999) = 228
close(10) = 0
open("/proc/23939/stat", O_RDONLY) = 10
read(10, "23939 (oracle) S 1 23939 23939 0"..., 999) = 228
close(10) = 0
kill(23939, SIGKILL) = 0
kill(23939, SIGCONT) = 0
open("/proc/23939/stat", O_RDONLY) = 10
read(10, "23939 (oracle) Z 1 23939 23939 0"..., 999) = 178
close(10) = 0

From the previous text I can see that Oracle opened the status file for PID 23939.
Why it did so twice I am not sure.

What happens after that is the interesting part.

kill(23939, SIGKILL) = 0

That line means that the SIGKILL signal was successfully sent to Scott’s shadow process.

What does that mean? Run kill -l to get a list of signals:

kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3

Notice that signal 9 (kill -9) is SIGKILL.

So when killing a session with ALTER SYSTEM KILL SESSION ‘PID,SERIAL#’ IMMEDIATE Oracle is actually doing the kill -9 for you, and has been for many years now.

Though not shown here, this same test was run when the session was killed without using the IMMEDIATE keyword, and there were no attempts to kill the session. This was inferred as well by the fact the the process was still running on the server up until the time the Scott sqlplus session was exited.

Categories: DBA Blogs

Some Observations on Puppetrun with Foreman

Tue, 2014-08-05 11:50

After joining Pythian I was introduced to several configuration management systems and Puppet was one of them. Foreman is a system management tool which can be integrated with Puppet to manage puppet modules and to initiate puppet runs on hosts from web interface. This is very useful if you want to configure large number of systems.

Puppet kick, which was previously used to initiate puppet run from foreman is deprecated now.

For initiating puppet run from foreman interface, I used mcollective. Mcollective can be used to execute parallel jobs in remote systems. There are 3 main components,

    Client – Connects to the mcollective Server and send commands.
    Server – Runs on all managed systems and execute commands.
    Middleware – A message broker like activemq.

I used mcollective puppet module from Puppetlabs for my setup.

# puppet module install puppetlabs-mcollective

My setup includes middleware(activemq) and mcollective client in the puppet server and mcollective servers in all managed systems.

After the implementation, I found that Puppet run from foreman web interface is failing for some servers.

I found following in /var/log/foreman-proxy/proxy.log,

D, [2014-04-18T07:20:54.392392 #4256] DEBUG — : about to execute: /usr/bin/sudo /usr/bin/mco puppet runonce -I
W, [2014-04-18T07:20:56.167627 #4256] WARN — : Non-null exit code when executing ‘/usr/bin/sudo/usr/bin/’
E, [2014-04-18T07:20:56.175034 #4256] ERROR — : Failed puppet run: Check Log files

You can see that mco command is trying to execute a puppet run in and failing. mco command uses several sub commands called ‘applications’ to interact with all systems and ‘puppet’ is one of them.

While running the command in commandline, I received following,

# mco puppet runonce -I| [ > ] 0 / 1warn 2014/04/11 08:05:34: client.rb:218:in `start_receiver’ Could not receive all responses. Expected : 1. Received : 0

Finished processing 0 / 1 hosts in 22012.79 ms

No response from:

I am able to ping the server.

When I ran ‘mco ping’ I found that the server with issue is identified with short hostnames and others with fqdn.

$ mco pingserver time=89.95 ms time=95.26 ms time=96.16 ms

So mcollective is exporting a short hostname when foreman is expecting an FQDN (Fully Qualified Domain Name) from this server.

Foreman takes node name information from puppet certificate name and that is used for filtering while sending mco commands.

Mcollective exports identity differently. From,

The node’s name or identity. This should be unique for each node, but does not need to be.Default: The value of Ruby’s Socket.gethostname method, which is usually the server’s FQDN.
Sample value:
Allowed values: Any string containing only alphanumeric characters, hyphens, and dots — i.e. matching the regular expression /\A[\w\.\-]+\Z/

I passed FQDN as identity in the servers using mcollective module, which resulted in following setting,

# cat /etc/mcollective/server.cfg |grep identity
identity =

This allowed the command to run successfully and getting ‘Puppet Run’ from foreman to work.

# mco puppet runonce -I* [ ============================================================> ] 1 / 1

Now ‘mco ping’ looks good as well.

$ mco time=91.34 ms time=91.23 ms time=82.16 ms

Now let us check why this was happening.

mcollective identity is exported from ruby function Socket.gethostname.

From ruby source code you can see that Socket.gethostname is getting the value from gethostname().

./ext/socket/socket.c#ifdef HAVE_GETHOSTNAME
* call-seq:
* Socket.gethostname => hostname
* Returns the hostname.
* p Socket.gethostname #=> “hal”
* Note that it is not guaranteed to be able to convert to IP address using gethostbyname, getaddrinfo, etc.
* If you need local IP address, use Socket.ip_address_list.
static VALUE
sock_gethostname(VALUE obj)
#if defined(NI_MAXHOST)
#elif defined(HOST_NAME_MAX)
# define RUBY_MAX_HOST_NAME_LEN 1024


if (gethostname(buf, (int)sizeof buf – 1) < 0)

buf[sizeof buf - 1] = ”;
return rb_str_new2(buf);

gethostname is a glibc function which calls uname system call and copy the value from returned nodename.

So when foreman uses the FQDN value which it collects from puppet certificate name, mcollective exports the hostname returned by gethostname().

Now let us see how gethostname() gives different values in different systems.

When passing the complete FQDN in HOSTNAME parameter in /etc/sysconfig/network, we can see that Socket.gethostname is returning FQDN.

[root@centos ~]# cat /etc/sysconfig/network
NETWORKING=yes[root@centos ~]# hostname -v

[root@centos ~]# irb
1.9.3-p484 :001 > require ‘socket’
=> true
1.9.3-p484 :002 > Socket.gethostname
=> “”
1.9.3-p484 :003 >

The system which was having problem was having following configuration.

[root@centos ~]# cat /etc/sysconfig/network
HOSTNAME=server[root@centos ~]# hostname -v

[root@centos ~]# irb
1.9.3-p484 :001 > require ‘socket’
=> true
1.9.3-p484 :002 > Socket.gethostname
=> “server”
1.9.3-p484 :003 >

Here ruby is only returning the short hostname for Socket.gethostname. But it was having following entry in /etc/hosts. server

This allowed system to resolve FQDN.

[root@centos ~]# hostname -f -v
Resolving `server’ …
Result: h_name=`’
Result: h_aliases=`server’
Result: h_addr_list=`′

From ‘man hostname’.

The FQDN of the system is the name that the resolver(3) returns for the
host name.Technically: The FQDN is the name gethostbyname(2) returns for the host name returned by gethost-
name(2). The DNS domain name is the part after the first dot.

As the resolver is able to resolve the hostname from /etc/hosts, puppet is able to pick up the fqdn value for certificate which it later used by foreman.
But mcollective exports the short hostname returned by gethostname().

To fix the issue in Red Hat based linux distributions, we can try any of the following,

* Pass an FQDN in /etc/sysconfig/network like below.

# cat /etc/sysconfig/network


* Use a short hostname as HOSTNAME but make sure that it would not resolve to an FQDN in /etc/hosts or DNS (not really suggested).


* Pass short hostname or FQDN as HOSTNAME but, make sure that there is an entry like below in /etc/hosts and mcollective is exporting fqdn as identity. server
Categories: DBA Blogs

How to Configure an Azure Point-to-Site VPN – Part 1

Tue, 2014-08-05 06:24

This blog post is the first in a series of three which will demonstrate how to configure a Point-to-Site VPN step-by-step. Today’s post will teach you how to configure a virtual network and a dynamic routing gateway, and the following blog posts will demonstrate how to create the certificates, and how to configure the VPN client.

Nowadays we are opting to move parts of, or even entire systems to the cloud. In order to build a hybrid environment, we need to find a way to connect our enterprise/local network, also known as on-premises, and the cloud.

Currently, we have two options to connect Azure and On-Premises:

  1. Using a Point-to-Site VPN
  2. Using a Site-to-Site VPN

The first option, using a Point-to-Site VPN is the option I’ll be demonstrating. It is recommended when you need to connect only some servers of your network to Azure. On the other hand, the Site-to-Site VPN connects your entire on-premises network to Azure.


To start, connect to your Azure account ( and click in the “add button”, in the bottom left corner.

    1. Now follow the options that you can see in the image, and create a custom virtual network:|Screen Shot 2014-07-29 at 23.41.53
    2. Fill the Virtual Network name and the location you want to create.Screen Shot 2014-07-29 at 23.44.36
    3. Check “Configure a Point-to-Site VPN” (DNS server is an option setting, used for name resolution between this virtual network and your on-premises network):Screen Shot 2014-07-29 at 23.45.59
    4. Set the the IP range accordingly, after verify if this range is not overlapping with your on-premises network.Screen Shot 2014-07-29 at 23.54.26
    5. Click in the “add gateway subnet” button and than in the finish button (check mark).Screen Shot 2014-07-29 at 23.57.52
    6. Now you need to wait few minutes, while the virtual network is being created.Screen Shot 2014-07-29 at 23.58.11
    7. You will see a message like this when the process is done:Screen Shot 2014-07-30 at 00.00.24
    8. At this stage, you will be able to see the network created, under the network section.Screen Shot 2014-07-30 at 00.22.20
    9. Now we need to create a “Dynamic Routing Gateway”. To complete this, click on the network you just created and go to the Dashboard.Screen Shot 2014-07-30 at 00.31.00
    10. Click on “CREATE GATEWAY” button, in the page bottom and confirm your intention by selecting “Yes”.Screen Shot 2014-07-30 at 00.58.58
    11. It may take few minutes. You will see the message “CREATING GATEWAY”, as shown in the image bellow:Screen Shot 2014-07-30 at 00.59.47
    12. After a successfully creating, you will see the following:Screen Shot 2014-07-30 at 01.22.39

At this point, we are done with the Virtual Network creation. Now we can proceed to the certificate creation steps… Stay tuned for my next two posts.

Categories: DBA Blogs

Log Buffer #382, A Carnival of the Vanities for DBAs

Fri, 2014-08-01 07:41

Leading the way are the blogs which are acting as beacons of information guiding the way towards new vistas of innovation. This Log Buffer edition appreciates that role and presents you with few of those blogs.


Is there any recommended duration after which Exalytics Server should be rebooted for optimal performance of Server?

GlassFish On the Cloud Consulting Services by C2B2

This introduction to SOA Governance series contains two videos. The first one explains SOA Governance and why we need it by using a case study. The second video introduces Oracle Enterprise Repository (OER), and how it can help with SOA Governanc.

Oracle BI APPs provide two data warehouse generated fiscal calendars OOTB.

If you’re a community manager who’s publishing, monitoring, engaging, and analyzing communities on multiple social networks manually and individually, you need a hug.

SQL Server:

Spackle: Making sure you can connect to the DAC

Test-Driven Development (TDD) has a misleading name, because the objective is to design and specify that the system you are developing behaves in the ways that the customer expects, and to prove that it does so for the lifetime of the system.

Set a security standard across environments that developers can see and run, but not change.

Resilient T-SQL code is code that is designed to last, and to be safely reused by others. The goal of defensive database programming, the goal of this book, is to help you to produce resilient T-SQL code that robustly and gracefully handles cases of unintended use, and is resilient to common changes to the database environment.

One option to get notified when TempDB grows is to create a SQL Alert to fire a SQL Agent Job that will automatically send an email alerting the DBA when the Tempdb reaches a specific file size.


By default when using MySQL’s standard replication, all events are logged in the binary log and those binary log events are replicated to all slaves (it’s possible to filter out some schema).

Testing MySQL repository packages: how we make sure they work for you

If your project does not have something that you can adapt that quote to, odds are your testing is inadequate.

Compare and Synchronize with Updated Comparison Tools!

Beyond the FRM: ideas for a native MySQL Data Dictionary.

Categories: DBA Blogs

SQL Server and OS Error 1117, Error 9001, Error 823

Thu, 2014-07-31 08:32

small__3212904193 Along with other administrators, life of us, the DBAs are no different but full of adventure.  At times, we encounter an issue which is very new for us, rather, one that we have not faced in the past.  Today, I will be writing about such case.  Not so long back, in the beginning of June, I was having my morning tea I got a page from a customer we normally do not receive pages from. While I was analyzing the error logs, I noticed several lines of error like the ones below:

2014-06-07 21:03:40.57 spid6s Error: 17053, Severity: 16, State: 1.
LogWriter: Operating system error 21(The device is not ready.) encountered.
2014-06-07 21:03:40.57 spid6s Write error during log flush.
2014-06-07 21:03:40.57 spid67 Error: 9001, Severity: 21, State: 4.
The log for database 'SSCDB' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
2014-06-07 21:03:40.58 spid67 Database SSCDB was shutdown due to error 9001 in routine 'XdesRMFull::Commit'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.
2014-06-07 21:03:40.65 spid25s Error: 17053, Severity: 16, State: 1.
fcb::close-flush: Operating system error (null) encountered.
2014-06-07 21:03:40.65 spid25s Error: 17053, Severity: 16, State: 1.
fcb::close-flush: Operating system error (null) encountered.

I had never seen this kind of error in the past so my next step was to check Google , which returned too many results. There were two sites that were worthwhile: The first site covers the OS Error 1117 , a Microsoft KB article, whereas the second site by Erin Stellato ( B | T ) talks about other errors like Error 823, Error 9001.  Further, I checked the server details and found that it’s exactly what the issue is here,  the server is using  PVSCSI (Para Virtualized SCSI) controller to LSI on the VMWare host. 

Resolving the issue

I had a call with client and have his consent to restart the service. This was quick, and after it came back, I ran checkdb – “We are good!” I thought.

But wait. This was the temporary fix. Yes, you read that correctly. This was the temporary fix, and this issue is actually lies with the VMWare, it’s a known issue according to VMWare KB Article. To fix this issue, we’ll have to upgrade to vSphere 5.1 according to the VMWare KB article.

Please be advised that the first thing that I did here is to apply the temporary fix, the root cause analysis – I did that last, after the server is up and running fine.

photo credit: Andreas.  via photopin CC

Categories: DBA Blogs

Interesting Behavior of MaxCmdsInTran Parameter

Wed, 2014-07-30 10:06

I recently worked on transactional replication issue and discovered interesting behavior of the log reader agent switch called MaxCmdsInTran and wanted to share it with you guys.

Lets take a look at  the use of this switch by looking at the msdn documentation below,

MaxCmdsInTran number_of_commands

Specifies the maximum number of statements grouped into a transaction as the Log Reader writes commands to the distribution database. Using this parameter allows the Log Reader Agent and Distribution Agent to divide large transactions (consisting of many commands) at the Publisher into several smaller transactions when applied at the Subscriber. Specifying this parameter can reduce contention at the Distributor and reduce latency between the Publisher and Subscriber. Because the original transaction is applied in smaller units, the Subscriber can access rows of a large logical Publisher transaction prior to the end of the original transaction, breaking strict transnational atomicity. The default is 0, which preserves the transaction boundaries of the Publisher.

However, I observed that if you do any update on Primary Column which won’t be split into multiple smaller transactions as described in the documentation.

Looking further on this reveals that  it probably the effect of bounded update. Bounded update has to be processed as a whole since it send all delete followed by all insert, can’t break into smaller transactions as it won’t know what would be a safe boundary.

The key difference comes from the fact that how updates are replicated when you update PK column and non-PK column. Let’s take an example to look at this (In this example C1 is non-PK and C2 is PK column)

If you update the non-PK column it replicated as update.

– Updating non-PK column

begin tran My_Deferred_Update_1_Row

update T1 set c1 = 1 where C1=2

commit tran My_Deferred_Update_1_Row

– Below is what gets added in msrepl_commands

exec sp_replshowcmds 1000

xact_seqno                                      command

0x0000016E000005330004 {CALL [dbo].[sp_MSupd_dbot1] (1,,2,0×01)}

What is bounded update?

However when you do a update on PK/Clustered index columns are replicated as Delete/Insert pair.

– Updating unique column

begin tran My_Bounded_Update_2_Rows

update T1 set c2 = c2 + 1000

commit tran My_Bounded_Update_2_Rows

– Below is what get added in msrepl_commands

exec sp_replshowcmds 1000

xact_seqno                                        command

0x00000017000000B5000E  {CALL?[dbo].[sp_MSdel_dboT1] (1)}

0x00000017000000B5000E  {CALL?[dbo].[sp_MSdel_dboT1] (2)}

0x00000017000000B5000E  {CALL?[dbo].[sp_MSins_dboT1] (1,3000,1)}

0x00000017000000B5000E  {CALL?[dbo].[sp_MSins_dboT1] (2,1002,2)}

As you can see in above case when we do update on PK/clustered index column, the updates are sent as deletes followed by inserts( this is called bounded update). This is one single transaction which is converted into delete and update pair. All deletes are sent first followed by insert.

We cannot break this transaction (PK update) as it will cause the delete (few or all) to happen first and then insert in separate transaction and will break transaction boundary, breaking this operation into multiple transaction will cause inconsistency and that’s most probably reason for this switch won’t work in this situation.

Why replication sending all deletes first and then all inserts and not the pairs delete/insert in order?

Let’s assume table A contains two rows, unique column C1 values being 1 and 2.

Now user runs the following: update A set c1 = c1 + 1.

The log records will be like


Del 1

Ins 2

Del 2

Ins 3


And the commands posted in the distribution database will be like

{CALL [sp_MSdel_dboA] (1)}

{CALL [sp_MSdel_dboA] (2)}

{CALL [sp_MSins_dboA] (1,2)}

{CALL [sp_MSins_dboA] (2,3)}

But if its send update directly, you’ll see

Update A set c1 = 2

Update A set c1 = 3

In that case, the first update will fail since c1 = 2 already exist. that’s why it deletes the row first before inserting them back to the new value.

I would recommend to look at the option of publishing the stored procedure execution to avoid this kind of huge updates which will cause performance issues in replication.

Happy Reading!



Categories: DBA Blogs

In-Memory Column Store: 10046 May Be Lying to You!

Wed, 2014-07-30 07:46

The Oracle In-Memory Column Store (IMC) is a new database option available to Oracle Database Enterprise Edition (EE) customers. It introduces a new memory area housed in your SGA, which makes use of the new compression functionality brought by the Oracle Exadata platform, as well as the new column oriented data storage vs the traditional row oriented storage. Note: you don’t need to be running on Exadata to be able to use the IMC!


Part I – How does it work?

In this part we’ll take a peek under the hood of the IMC and check out some of its internal mechanics.

Let’s create a sample table which we will use for our demonstration:

create table test inmemory priority high
select a.object_name as name, rownum as rn,
sysdate + rownum / 10000 as dt
from all_objects a, (select rownum from dual connect by level <= 500)

Almost immediately upon creating this table, the w00? processes will wake up from sleeping on the event ‘Space Manager: slave idle wait’ and start their analysis to check out the new table. By the way, the sleep times for this event are between 3 and 5 seconds, so it’s normal if you experience a little bit of a delay.

The process who picked it up will then create a new entry in the new dictionary table compression$, such as this one:

SQL> exec pt('select ts#,file#,block#,obj#,dataobj#,ulevel,sublevel,ilevel,flags,bestsortcol, tinsize,ctinsize,toutsize,cmpsize,uncmpsize,mtime,spare1,spare2,spare3,spare4 from compression$');
TS# : 4
FILE# : 4
BLOCK# : 130
OBJ# : 20445
DATAOBJ# : 20445
ILEVEL : 1582497813
TINSIZE : 16339840
TOUTSIZE : 9972219
MTIME : 13-may-2014 23:14:46
SPARE1 : 31
SPARE2 : 5256
SPARE3 : 571822

Plus, there is also a BLOB column in compression$, which holds the analyzer’s findings:

SQL> select analyzer from compression$;

004B445A306AD5025A0000005A6B8E0200000300000000000001020000002A0000003A0000004A(output truncated for readability)

A quick check reveals that this is indeed our object:

SQL> exec pt('select object_name, object_type, owner from dba_objects where data_object_id = 20445');

PL/SQL procedure successfully completed.

And we can see the object is now stored in the IMC by looking at v$im_segments:

SQL> exec pt('select * from v$im_segments');
INMEMORY_SIZE : 102301696
BYTES : 184549376
CON_ID : 0

PL/SQL procedure successfully completed.

Thus, we are getting the expected performance benefit of it being in the IMC:

SQL> alter session set inmemory_query=disable;

Session altered.

Elapsed: 00:00:00.01
SQL> select count(*) from test;


Elapsed: 00:00:03.96
SQL> alter session set inmemory_query=enable;

Session altered.

Elapsed: 00:00:00.01
SQL> select count(*) from test;


Elapsed: 00:00:00.13

So far, so good.

Part II – Execution Plans

Some things we need to be aware of, though, when we are using the IMC in One of them being that we can’t always trust in the execution plans anymore.

Let’s go back to our original sample table and recreate it using the default setting of INMEMORY PRIORITY NONE.

drop table test purge

create table test inmemory priority none
select a.object_name as name, rownum as rn,
sysdate + rownum / 10000 as dt
from all_objects a, (select rownum from dual connect by level <= 500)

Now let’s see what plan we’d get if we were to query it right now:

SQL> explain plan for select name from test where name = 'ALL_USERS';


SQL> @?/rdbms/admin/utlxpls

Plan hash value: 1357081020

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 614 | 12280 | 811 (73)| 00:00:01 |
|* 1 | TABLE ACCESS INMEMORY FULL| TEST | 614 | 12280 | 811 (73)| 00:00:01 |

Predicate Information (identified by operation id):

1 – inmemory(“NAME”=’ALL_USERS’)

14 rows selected.

Okay, you might say now that EXPLAIN PLAN is only a guess. It’s not the real plan, and the real plan has to be different. And you would be right. Usually.

Watching the slave processes, there is no activity related to this table. Since it’s PRIORITY is NONE, it won’t be loaded into IMC until it’s actually queried for the first or second time around.

So let’s take a closer look than, shall we:

SQL> alter session set tracefile_identifier='REAL_PLAN';

Session altered.

SQL> alter session set events ’10046 trace name context forever, level 12′;

Session altered.

SQL> select name from test where name = ‘ALL_USERS’;

Now let’s take a look at the STAT line on that tracefile. Note: I closed the above session to make sure that we’ll get the full trace data.

PARSING IN CURSOR #140505885438688 len=46 dep=0 uid=64 oct=3 lid=64 tim=32852930021 hv=3233947880 ad='b4d04b00' sqlid='5sybd9b0c4878'
select name from test where name = 'ALL_USERS'
PARSE #140505885438688:c=6000,e=10014,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=32852930020
EXEC #140505885438688:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=32852930241
WAIT #140505885438688: nam='SQL*Net message to client' ela= 25 driver id=1650815232 #bytes=1 p3=0 obj#=20466 tim=32852930899
WAIT #140505885438688: nam='direct path read' ela= 13646 file number=4 first dba=21507 block cnt=13 obj#=20466 tim=32852950242
WAIT #140505885438688: nam='direct path read' ela= 2246 file number=4 first dba=21537 block cnt=15 obj#=20466 tim=32852953528
WAIT #140505885438688: nam='direct path read' ela= 1301 file number=4 first dba=21569 block cnt=15 obj#=20466 tim=32852955406

FETCH #140505885438688:c=182000,e=3365871,p=17603,cr=17645,cu=0,mis=0,r=9,dep=0,og=1,plh=1357081020,tim=32857244740
STAT #140505885438688 id=1 cnt=1000 pid=0 pos=1 obj=20466 op='TABLE ACCESS INMEMORY FULL TEST (cr=22075 pr=22005 pw=0 time=865950 us cost=811 size=12280 card=614)'

So that’s still the wrong one right there, and the STAT line even clearly shows that we’ve actually done 22005 physical reads, and therefore likely no in-memory scan, but a full scan from disk. There’s clearly a bug there with the execution plan reported, which is plain wrong.

Thus, be careful about using INMEMORY PRIORITY NONE, as you may not get what you expect. Since the PRIORITY NONE settings may also be overridden by any other PRIORITY settings, your data may get flushed out of the IMC, even though your execution plans will say otherwise. And I’m sure many of you know it’s often not slow response times on queries which cause a phone ringing hot. It’s inconsistent response times. This feature, if used inappropriately will pretty much guarantee inconsistent response times.

Apparently, what we should be doing is size up the In Memory Column store appropriately, to hold the objects we actually need to be in there. And make sure they’re always in there by setting a PRIORITY of LOW or higher. Use CRITICAL and HIGH to ensure the most vital objects of the application are populated first.

There was one other oddity that I noticed while tracing the W00? processes.

Part III – What are you scanning, Oracle ?

The m000 process’ trace file reveals many back-to-back executions of this select:

PARSING IN CURSOR #140670951860040 len=104 dep=1 uid=0 oct=3 lid=0 tim=23665542991 hv=2910336760 ad='fbd06928' sqlid='24uqc4aqrhdrs'
select /*+ result_cache */ analyzer from compression$ where obj#=:1 and ulevel=:2

They all supply the same obj# bind value, which is our table’s object number. The ulevel values used vary between executions.

However, looking at the related WAIT lines for this cursor, we see:

WAIT #140670951860040: nam='direct path read' ela= 53427 file number=4 first dba=18432 block cnt=128 obj#=20445 tim=23666569746
WAIT #140670951860040: nam='direct path read' ela= 38073 file number=4 first dba=18564 block cnt=124 obj#=20445 tim=23666612210
WAIT #140670951860040: nam='direct path read' ela= 38961 file number=4 first dba=18816 block cnt=128 obj#=20445 tim=23666665534
WAIT #140670951860040: nam='direct path read' ela= 39708 file number=4 first dba=19072 block cnt=128 obj#=20445 tim=23666706469
WAIT #140670951860040: nam='direct path read' ela= 40242 file number=4 first dba=19328 block cnt=128 obj#=20445 tim=23666749431
WAIT #140670951860040: nam='direct path read' ela= 39147 file number=4 first dba=19588 block cnt=124 obj#=20445 tim=23666804243
WAIT #140670951860040: nam='direct path read' ela= 33654 file number=4 first dba=19840 block cnt=128 obj#=20445 tim=23666839836
WAIT #140670951860040: nam='direct path read' ela= 38908 file number=4 first dba=20096 block cnt=128 obj#=20445 tim=23666881932
WAIT #140670951860040: nam='direct path read' ela= 40605 file number=4 first dba=20352 block cnt=128 obj#=20445 tim=23666924029
WAIT #140670951860040: nam='direct path read' ela= 32089 file number=4 first dba=20612 block cnt=124 obj#=20445 tim=23666962858
WAIT #140670951860040: nam='direct path read' ela= 36223 file number=4 first dba=20864 block cnt=128 obj#=20445 tim=23667001900
WAIT #140670951860040: nam='direct path read' ela= 39733 file number=4 first dba=21120 block cnt=128 obj#=20445 tim=23667043146
WAIT #140670951860040: nam='direct path read' ela= 17607 file number=4 first dba=21376 block cnt=128 obj#=20445 tim=23667062232

… and several more.

Now, compression$ contains only a single row. Its total extent size is neglibile as well:

SQL> select sum(bytes)/1024/1024 from dba_extents where segment_name = 'COMPRESSION$';


So how come Oracle is reading so many blocks ? Note that each of the above waits is a multi-block read, of 128 blocks.

Let’s take a look at what Oracle is actually reading there:

pt('select segment_name, segment_type, owner
from dba_extents where file_id = 4
and 18432 between block_id and block_id + blocks - 1');


PL/SQL procedure successfully completed.

There’s our table again. Wait. What ?

There must be some magic going on underneath the covers here. In my understanding, a plain select against table A, is not scanning table B.

If I manually run the same select statement against compression$, I get totally normal trace output.

This reminds me of the good old:

SQL> select piece from IDL_SB4$;
ORA-00932: inconsistent datatypes: expected CHAR got B4

But I digress.

It could simply be a bug that results in these direct path reads being allocated to the wrong cursor. Or it could be intended, as it’s indeed this process’ job to analyze and load this table, and using this the resource usage caused by this is instrumented and can be tracked?

Either way, to sum things up we can say that:

- Performance benefits can potentially be huge
- Oracle automatically scans and caches segments marked as INMEMORY PRIORITY LOW|MEDIUM|HIGH|CRITICAL (they don’t need to be queried first!)
- Oracle scans segments marked as INMEMORY PRIORITY NONE (the default) only after they’re accessed the second time – and they may get overridden by higher priorities
- Oracle analyzes the table and stores the results in compression$
- Based on that analysis, Oracle may decide to load one or the other column only into IMC, or the entire table, depending on available space, and depending on the INMEMORY clause used
- It’s the W00? processes using some magic to do this analysis and read the segment into IMC.
- This analysis is also likely to be triggered again, whenever space management of the IMC triggers again, but I haven’t investigated that yet.

Categories: DBA Blogs