
Pythian Group

Official Pythian Blog - Love Your Data

The MySQL Community Pay-Your-Own-Way Dinner

Mon, 2014-02-10 10:26

Once again, Pythian is organizing an event that by now may be considered a tradition: The MySQL community dinner at Pedro’s! This dinner is open to all MySQL community members, as many of you will be in town for the MySQL Conference that week. Here are the details:

What: The MySQL Community Pay-Your-Own-Way Dinner

When: Wednesday April 2, 2014 – Meet us at 6:30 PM in the lobby of the Hyatt Santa Clara, or at 7 PM at Pedro’s (You are welcome to show up later, too!)

Cost: The meal will be $25 USD including tax and gratuities. Please bring cash. (See menu below)

Where: Pedro’s Restaurant and Cantina – 3935 Freedom Circle, Santa Clara, CA 95054

How: RSVP through Eventbrite

Please note: Due to the historically high attendance for this event, Pedro’s has asked that each person pay in cash to simplify billing. Pedro’s can handle large groups, but we would like to give them an accurate count of attendees so that they can be adequately prepared.

Pythian attendees:

1. Paul Vallee

2. Wagner Bianchi

3. Danil Zburivsky

4. Alex Gorbachev

5. Derek Downey

6. Chad Scheiter

7. Add your name…

Looking forward to seeing you all at the event!

Categories: DBA Blogs

2013 Year in Review – Oracle E-Business Suite

Mon, 2014-02-10 10:25

Here are the Top 5 things in Oracle E-Business Suite world that will have major impact in 2014 and beyond.

1. Oracle E-Business Suite 12.2 Now Available

2013 started on a low note in the Oracle E-Business Suite (EBS) world. Many people were expecting an announcement about the upcoming EBS release during OpenWorld 2012, but all they got was an extension of the support deadline for existing 11i EBS customers. Oracle finally announced EBS R12.2 a few days before OpenWorld 2013. This release packs exciting features like Online Patching, which elevates Oracle E-Business Suite’s ranking in the ERP systems domain. Online Patching will enable large multi-national customers to consolidate their ERP systems in different countries into one single global Oracle E-Business Suite instance, as it cuts the downtime required for a patching maintenance window to almost nil. This is a big plus for clients who cannot afford downtime because their user base is spread all over the world. 2014 will be a year of upgrades to R12.2 for many clients.

2. 12c Database Certified with Oracle E-Business Suite

Around the same time as the R12.2 announcement, Oracle certified the 12c Database with Oracle EBS. The good news is that they certified Oracle 11i with the 12c Database as well. This gives EBS clients the option to get onto the newest version of Oracle Database and take advantage of the new 12c features. The effort involved in upgrading the database is significantly less than upgrading to a newer version of EBS, so I believe many customers will take up the 12c database upgrade before the R12.2 EBS upgrade. Upgrading the database ahead of EBS will also save some hours in a future R12.2 upgrade downtime window.

3. E-Business Suite Support Timelines Updated at OpenWorld 2013

Oracle once again extended the support timelines for 11i customers. They named it Exception Support, and it ends in December 2015. During this Exception Support period, Oracle will primarily provide fixes for Sev 1 issues and security patches. This gives customers on 11i two additional years to migrate to the latest R12.2. With typical R12 upgrades taking around a year, the sooner you plan and start your R12.2 migration, the better.

4. No to Third-Party Tools to Modify Your EBS Database

Oracle Development officially warned in their blog about using third-party tools to modify, archive and purge data in Oracle E-Business Suite. Managing data growth in Oracle EBS is a known problem. Oracle now wants customers to use Oracle Database technologies like ILM, Advanced Compression and Partitioning to archive the data instead of using third-party utilities. Note that all these database features will cost customers additional money in licensing. So get your bargaining hat on with your Oracle Account Manager and score some discounts using this Oracle Achilles’ heel: purging and archiving EBS data.

5. Sign E-Business Suite JAR Files Now

Do you remember the days when Oracle EBS moved from Oracle JInitiator to the Sun JRE for Oracle Forms? Then be prepared for one more similar change around Oracle Forms. The stream of viruses and malware exploiting bugs in the Oracle/Sun JRE made Oracle tighten security around the JRE. It is now required to sign Forms JAR files with a real certificate. In future releases of JRE 7, unsigned Oracle Forms applications will stop working completely. Customers caught unaware of this will be in for big trouble with user complaints.

Categories: DBA Blogs

Dancer In Chains

Mon, 2014-02-10 08:46

Sometimes, you idly think about a problem, and an answer comes to you. And it has the simplicity and the elegance of a shodo brush-stroke. It is so exquisitely perfect, you have to wonder… Have you reached the next level of enlightenment, or did the part of your brain responsible for discernment suddenly call it quits?

I’ll let you be the judge of it.

Chains of Command

Something that Catalyst has and Dancer doesn’t is chained routes. A little while ago, I came up with a way to mimic chains using megasplats. People agreed: it was cute, but not quite on par with Catalyst’s offering.

Which brings us to my epiphany of yesterday…

Forging the Links

What if we did things just a tad different than what Catalyst does? What if the bits making the chain segments were defined outside of routes:

my $country = chain '/country/:country' => sub {
    # silly example. Typically much more work would
    # go on in here
    var 'site' => param('country');
};

my $event = chain '/event/:event' => sub {
    var 'event' => param('event');
};

And what if we could put them together as we define the final routes:

# will match /country/usa/event/yapc
get chain $country, $event, '/schedule' => sub {
    return sprintf "schedule of %s in %s\n", map { var $_ } qw/ site event /;
};

Or we could forge some of those segments together as in-between steps too:

my $continent = chain '/continent/:continent' => sub {
    var 'site' => param('continent');
};

my $continent_event = chain $continent, $event;

# will match /continent/europe/event/yapc
get chain $continent_event, '/schedule' => sub {
    return sprintf "schedule of %s in %s\n", map { var $_ } qw/ event site /;
};

Or, heck, we could even insert special in-situ operations directly in the route when the interaction between two already-defined segments needs a little bit of fudging:

# will match /continent/asia/country/japan/event/yapc
# and will do special munging in-between!

get chain $continent,
          sub { var temp => var 'site' },
          $country,
          sub {
              var 'site' => join ', ', map { var $_ } qw/ site temp /;
          },
          $event,
          '/schedule'
            => sub {
                return sprintf "schedule of %s in %s\n", map { var $_ }
                               qw/ event site /;
            };

Wouldn’t that be something nice?

The Wind of Chains…ge

Here’s the shocker: all the code above is functional. Here’s the double-shocker: the code required to make it happen is ridiculously short.

First, I had to create a class that represents chain bits. The objects are simple things keeping track of the pieces of path and code chunks constituting the segment.

package Chain;

use Moose;

has path_segments => (
    traits  => [ qw/ Array / ],
    isa     => 'ArrayRef',
    is      => 'ro',
    default => sub { [] },
    handles => {
        add_to_path       => 'push',
        all_path_segments => 'elements',
    },
);

sub path {
    my $self = shift;
    return join '', $self->all_path_segments;
}

has code_blocks => (
    traits  => [ qw/ Array / ],
    isa     => 'ArrayRef',
    is      => 'ro',
    default => sub { [] },
    handles => {
        add_to_code     => 'push',
        all_code_blocks => 'elements',
    },
);

sub code {
    my $self = shift;

    my @code = $self->all_code_blocks;
    return sub {
        my $result;
        $result = $_->(@_) for @code;
        return $result;
    };
}

sub BUILD {
    my $self = shift;
    my @args = @{ $_[0]{args} };

    my $code;
    $code = pop @args if ref $args[-1] eq 'CODE';

    for my $segment ( @args ) {
        if ( ref $segment eq 'Chain' ) {
            $self->add_to_path( $segment->all_path_segments );
            $self->add_to_code( $segment->all_code_blocks );
        }
        elsif ( ref $segment eq 'CODE' ) {
            $self->add_to_code( $segment );
        }
        else {
            $self->add_to_path( $segment );
        }
    }

    $self->add_to_code($code) if $code;
}

sub as_route {
    my $self = shift;

    return ( $self->path, $self->code );
}

Then, I had to write the chain() DSL keyword making the junction between the objects and the Dancer world:

sub chain(@) {
    my $chain = Chain->new( args => [ @_ ] );

    return wantarray ? $chain->as_route : $chain;
}

And then… Well, I was done.

What Happens Next

Obviously, there will be corner cases to consider. For example, the current code doesn’t deal with regex-based paths, and totally ignores prefixes. But I have the feeling it could go somewhere…

So, yeah, that’s something that should make it to CPAN relatively soon. Stay tuned!

Categories: DBA Blogs

Log Buffer #358, A Carnival of the Vanities for DBAs

Fri, 2014-02-07 08:36

Here in your hands is yet another fresh, sizzling hot edition of the celebrated Log Buffer. Enjoy the collection of assorted gems from Oracle, SQL Server and MySQL.

Oracle:

Oracle Cloud—The Modern Cloud for Your Modern Business.

Oracle ADF Virtual Developer Day – On Demand in YouTube.

When Impala started, many of the early adopters were experts in HiveQL, so the Impala documentation assumed some familiarity with Hive SQL syntax and behavior.

Improve Your Customer Experience Through World-Class Data Quality.

There continues to be a disproportionate amount of hype around ‘NoSQL’ data stores.

SQL Server:

What are your options for sending a variable number of choices in a parameter to a stored procedure? Alex Grinberg looks at three techniques you can use.

Window Functions in SQL greatly simplify a whole range of financial and statistical aggregations on sets of data.

When should you use a SQL CLR Aggregate? Lots of people have struggled with this one, but David Poole found a use, and has some interesting performance data analysis as well.

Free eBook: Understanding SQL Server Concurrency.

Fix SQL Server Log Shipping After a New Database File has been Added.


MySQL:

MariaDB 5.5.35 was recently released (it is the latest MariaDB 5.5), and is available for download.

jQuery and GIS distance in MariaDB.

Filesort optimization in 5.7.3: pack values in the sort buffer

Keith Hollman has written about --use-tts backup & restore.

Why delegating code to MySQL Stored Routines is poor engineering practice.

Categories: DBA Blogs

Untold Secrets of SQL Server: What if?

Wed, 2014-02-05 10:45

It’s common practice for IT vendors to keep some commands and statements for their own use, avoiding publishing or documenting these tools for two reasons:

  1. Some of these commands are unsafe for production environments and should be used carefully on non-production systems only; otherwise they may cause adverse effects.
  2. To maintain an edge over third parties, so the vendor can always be superior in analysis and quality of support.

Microsoft is no exception. Today I’ll share one of the cool secrets that Microsoft did not publish, document or even support: the what-if statement.

In several cases, individuals confront a situation where they want to test the execution of a query or an application on different hardware, whether an upsize or a downsize. One solution is to bring in an actual server, perform the setup of the OS and SQL Server, and restore the database to the new hardware.

Another alternative is made possible by the what-if statement: simply run the query or application session after issuing a few statements that tell the optimizer to simulate the needed hardware. Of course, if you are planning to downsize, you will get the full benefit of the command, as the actual hardware in terms of CPUs and RAM is superior to the one being simulated. However, if you are planning to upgrade the server, such as when the actual server uses 4 CPUs while the emulated one uses 16, you still get a feel for what execution plans will be used. Unfortunately, you will never get the performance of 16 cores using only 4!

The syntax of the what-if statement goes like this:

DBCC Optimizer_WhatIf (1, 4)    -- tells the optimizer that your server has 4 CPUs
DBCC Optimizer_WhatIf (2, 3327) -- tells the optimizer to emulate a machine with only 3 GB of RAM
DBCC Optimizer_WhatIf (3, 64)   -- sets your system to simulate a 64-bit system

-- insert your query here

DBCC Optimizer_WhatIf (1, 0)    -- clear the CPU setting
DBCC Optimizer_WhatIf (2, 0)    -- clear the RAM setting
DBCC Optimizer_WhatIf (3, 0)    -- clear the 64-bit setting


Although virtualization of SQL Server might resolve the issue of testing applications on higher-end or lower-end hardware, a significant amount of time is still consumed installing and/or restoring the database. The what-if statement can give you the look and feel of how queries will perform on other hardware, without the hassle of actually provisioning it.

Nevertheless, it is important to emphasize that the what-if statement is only good in the development and pre-testing stages; you still need to get servers (real or virtualized) for official testing, quality testing, user testing, stress testing, and of course production.

Categories: DBA Blogs

Even Posting in Forums Requires Etiquette

Wed, 2014-02-05 09:40

Etiquette — as if we do not have enough rules for dining, dress, meetings, corporate events and opening doors for ladies!

From time to time, I browse forums which are mainly SQL Server related. I check on threads, and reply to some if I have something to add. I have to say that forums are an excellent way to enhance your knowledge of any product; people come with different and new problems and scenarios, and you have to read more about the subject or build a proof of concept to be able to contribute.

Forums aren’t always tightly regulated or moderated — sometimes they are, but in general people are free to post whatever they think as long as it is not overly offensive. Unlike some newspapers that moderate comments, forums (including MSDN) do not do much of that. In my personal experience, I have seen many times that a forum post can slip in a non-productive direction, and this can discourage some people from posting or even contributing to discussions.

I’ve included some points about forums, that I’ve learned through personal experience:

  • Forums are not intended for product support.

From time to time, someone poses a question and stresses how urgent the matter is — their business is down, and they need help. My advice is simply to open a case with your vendor if it is that urgent, as the time spent getting answers on forums can cost you much more than vendor support would. For critical issues, you do not want advice from forums but rather from experienced vendor support. Remember that the people who answer you do it for free, and they may be busy supporting their own environments or have personal arrangements. Do not pester them for a quicker reply!

  • Forums are not to answer your homework or interview questions.

Asking, “How can I do that?” without even trying is not appealing. Yes, we all struggle sometimes, but we have to try. It is much better to outline an approach or an idea and ask for advice.

Also, do not bring your interview questions to the forums and ask for answers. I have seen this personally because I interview people from time to time; we give them some questions and they go on forums asking for answers. Who do you think you are cheating? Sorry pal, just yourself!

  • Put in some effort, and do a little research before posting.

This may be relative, but if you are asking, “What is the difference between a clustered and a non-clustered index?” then you may not have heard about search engines! Research your topic a bit, and if you cannot find answers then you can ask.

In addition, it is common to find repliers quoting other or previous forum posts that deal with almost the same topic, so you should also search forum topics beforehand.

Same goes for the people who respond — Research your answer, too. If possible, support it with a vendor documentation, blog posts or a whitepaper.

  •  Avoid robotic responses.

Back in the day, MSDN forums would award points for every reply, even if you just said “Hello”, and some people abused that. It makes the thread longer with no value.
I remember one reply specifically, where someone would always jump in and ask this:

Please run this query and paste the output:

SELECT SERVERPROPERTY('ProductVersion'), SERVERPROPERTY('ProductLevel'), …

Many times, the poster would have already included the information in the original post, declaring he has SQL 200X, SPX, Edition X with X CPU cores and X GB of memory!

  • Not everyone is a Guru, so try to avoid getting personal.

This applies to those who post and those who respond to questions. Regardless of whether the question looks novice, it is still worth an answer. No matter how a reply may look, it is still a contribution. (Obviously, restrictions apply.)

On the MSDN T-SQL forum, for example, a known figure is famous for his bullying replies, criticizing posters and contributors for their questions, approach or answers. Examples of some of his replies: “your screwed up design”, “you failed this basic IT standard”, “Your code is also bad”, “I would fire you for this coding”, etc. It’s vicious!

  • Avoid a very generic/vague title.

Try to avoid titles like “Problem with SQL Server,” “Someone help me with this issue,” “An urgent question,” or “TSQL query help.” Potential responders usually prefer to get a glimpse of the topic from the title, and someone browsing the forums may find an issue relevant to his or her experience if the title is more descriptive.

  •   Please – no more spaghetti posts.

How about some formatting? Posting a few hundred words all in one paragraph is a turn-off: it hurts the eyes, is hard to read, and is hard to come back to when more details are needed.
Use spacing, and group relevant information together. You can start with a brief on your topology, technology used, versions et al. Follow that with your situation, the problem you are facing, and your attempted resolutions, and then ask what we can do for you.

If your post has a HUGE code snippet, it is preferable to attach it as a query file. Use a reputable file-sharing provider like SkyDrive (soon to be OneDrive), Google Drive or Dropbox.

In addition, please avoid ALL CAPS; it makes me nervous!

  • Format your code

MSDN forums, for example, have a nice code formatter — use it! It makes the post easier to read. If the forum does not have a code formatter, you can format your code elsewhere using one of the many formatters available online.

  • Stop those cheeky edits.

If you post an answer, people add to it or discuss it, and then you go back and edit your earlier post to look better or more sophisticated, that is rather cheating. It makes subsequent answers look weird and irrelevant. As long as it is not rude, there is no harm in a less-than-optimal answer; just try harder next time.

  •  Include a disclaimer if you work for a certain company.

If someone asks for a recommendation for a third party tool, and you work for a vendor of one of those tools, it is OK to tout your product! But ensure that you disclose that you work for the vendor.

  • Follow up on your posts.

Dead posts clog the forum, and can push more valuable posts down the stack. It is quite common for people to ask for more details, so you should follow up and reply. If an answer satisfies you, mark it as the answer. It helps those in similar situations, and pays back those who contributed. If you found an answer on your own, post it and mark it as the answer.

How about you? If you could change ONE thing in forums, what would it be?

Categories: DBA Blogs

Puppet and MySQL: A How-To Guide

Tue, 2014-02-04 12:24

Today’s blog post is a how-to guide on setting up a Puppet master and agents for streamlined server management. First, you’ll have to configure each of your puppet agent nodes (i.e. clients).

Get the latest repo from Puppet Labs for your distro; it will have to be added both to the puppet master and to the puppet agent nodes.

Log in to each node and install/configure the puppet agent:

apt-get install puppet #[or yum install puppet for RHEL derivatives]

Edit your /etc/puppet/puppet.conf and add your puppet master servername under the [main] section:

server=<your puppet master hostname> # e.g. dev-laptop in my case

Also, edit your /etc/default/puppet file and set START to yes:

# Start puppet on boot?
START=yes

Now restart the puppet agent service on all nodes and enable the puppet service:

service puppet restart
puppet resource service puppet ensure=running enable=true

After running the above commands on each of your agent nodes, it’s now time to hop over onto your puppet master server node and configure your puppet master:

apt-get install puppetmaster #[or yum install puppet-server for RHEL]

sudo puppet resource service puppetmaster ensure=running enable=true

By running the following command, you should now see the certificate signing requests made by your puppet agents:

sudo puppet cert list
"deb-box-1.lan" (09:00:F5:B9:CF:D0:E7:BF:C5:D3:8B:74:BC:3E:87:E2)
"deb-box-2.lan" (AC:B5:D0:BD:CF:AC:C8:9C:83:21:86:09:40:D3:ED:1E)
"deb-box-3.lan" (67:4E:81:48:18:73:47:16:2F:2C:3D:31:4D:4D:80:F8)

You will have to certify each request in order to enable the agent:

puppet cert sign "deb-box-1.lan"
notice: Signed certificate request for deb-box-1.lan
notice: Removing file Puppet::SSL::CertificateRequest deb-box-1.lan at '/var/lib/puppet/ssl/ca/requests/deb-box-1.lan.pem'

puppet cert sign "deb-box-2.lan"
notice: Signed certificate request for deb-box-2.lan
notice: Removing file Puppet::SSL::CertificateRequest deb-box-2.lan at '/var/lib/

puppet cert sign "deb-box-3.lan"
notice: Signed certificate request for deb-box-3.lan
notice: Removing file Puppet::SSL::CertificateRequest deb-box-3.lan at '/var/lib/puppet/ssl/ca/requests/deb-box-3.lan.pem'

Finally, test the client puppet agent by running:

puppet agent --test

Now you are ready to create/install modules to manage various server components. Activities such as managing your MySQL databases can be performed by installing the puppetlabs/mysql module as follows:

puppet module install puppetlabs/mysql

Your module has been added! From here it is as simple as adding your requirements to the site.pp file in /etc/puppet/manifests, e.g. (1.) to ensure mysql-server is installed on the nodes, with (2.) a root password of 'root_pass_string' and (3.) max_connections = 1024, the following configuration will suffice (all options are configurable; just specify the section you would like modified and the variable, as in the example):

# vi /etc/puppet/manifests/site.pp

node 'deb-box-1.lan' {
  class { '::mysql::server':
    root_password    => 'root_pass_string',
    override_options => { 'mysqld' => { 'max_connections' => '1024' } },
  }
}

node 'deb-box-2.lan' {
  class { '::mysql::server':
    root_password    => 'root_pass_string',
    override_options => { 'mysqld' => { 'max_connections' => '1024' } },
  }
}

node 'deb-box-3.lan' {
  class { '::mysql::server':
    root_password    => 'root_pass_string',
    override_options => { 'mysqld' => { 'max_connections' => '1024' } },
  }
}

Now that site.pp is configured, run the following command on each puppet agent node to update the configuration:

puppet agent -t

Voilà! Your MySQL server is now under new management =)

Optionally you can also install puppet dashboard by performing the following steps:

apt-get install puppet-dashboard # [or yum install puppet-dashboard for RHEL]

Edit your database.yml (using standard YAML notation), adding the dashboard database name, username and password. It is recommended to use the same database for the production, development and testing environments:

cd /usr/share/puppet-dashboard/config;
vi database.yml # add the database name, username & password you intend to setup in the database (see the following step for details)
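For reference, the production section of database.yml might look like the following sketch (the database name, user and password match the hypothetical ones created in the next step; the adapter and encoding keys are typical settings you should adjust to your setup):

```yaml
production:
  adapter:  mysql
  database: dashboard
  username: dashboard
  password: my_password
  encoding: utf8
```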

Connect to your mysql instance (hopefully installed already) and create the puppet-dashboard database & users:


CREATE DATABASE dashboard;

CREATE USER 'dashboard'@'localhost' IDENTIFIED BY 'my_password';

GRANT ALL PRIVILEGES ON dashboard.* TO 'dashboard'@'localhost';

Now it is time to populate the database using rake by running:

rake RAILS_ENV=production db:migrate

rake db:migrate db:test:prepare

Your dashboard is now set up; you can run it with the built-in webserver, WEBrick:

sudo -u puppet-dashboard ./script/server -e production

Here are some useful Puppet commands that will come in handy:

# Apply changes with noop
puppet apply --noop /etc/puppet/modules/testit/tests/init.pp

# Fully apply changes
puppet apply /etc/puppet/modules/testit/tests/init.pp

# Print a puppet agent's full configuration
puppet --configprint all

Categories: DBA Blogs

Merge Replication Identity Range Management – An Identity Disorder? Part 2

Tue, 2014-02-04 11:58

In Part 1 of this series, we explored Merge Replication identity range management for tables with identity columns. We outlined how important it is to select the right datatype for identity columns and to assign suitable identity ranges, to reduce the overhead of allocating new ranges frequently.

Some cases can affect range management because of the changes they make to identity value allocation. These include batches and transactions!

The case of batches and transactions on the publisher

I recently worked on a case where the client’s apps failed intermittently with this error:

The insert failed. It conflicted with an identity range check constraint in database %, replicated table %, column %. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.

The client just followed the advice and executed sp_adjustpublisheridentityrange to assign a new range on the publisher, but that was not convenient. The user was a db_owner, so a new range should have been assigned automatically, yet the error kept appearing.

Initially, as a best practice, we advised changing the range to something larger in order to accommodate new rows, but this did not work. We began looking at the processes inserting the data, and found a few stored procedures that insert single rows — nothing abnormal.

It wasn’t until we ran a trace capturing the load for a few hours that we found a job running an SSIS package on another instance. The ETL SSIS package calls a stored procedure on the publisher to load data from another system, and dumps thousands of rows inside a transaction.

Worse, the job never failed, because the SSIS package suppressed the stored procedure’s failure through conditional constraints; it also did not include logic to email anyone if the SP step failed.

The code was something like this, logic-wise:

BEGIN TRAN Tran1

SELECT DISTINCT rows INTO #temp
FROM staging_instance.staging_database.dbo.table

INSERT INTO dbo.table1 (col1, col2, ....., colZ)
SELECT * FROM #temp

INSERT INTO dbo.table1_details (col1, col2, ....., colZ)
SELECT t.*, stag.col1, stag.col2
FROM table1 t1 JOIN #temp t
    ON  =
    AND t.date_ < getdate() - 2
JOIN stag
    ON t.Detail_id = stag.detail_id

INSERT INTO tracking SELECT 'Complete_ETL', getdate(), @@ROWCOUNT

COMMIT TRAN Tran1

The problem became obvious: triggers fire once per statement, not once per row. It is possible to write triggers that cope with multiple rows using INSERTED, but a replication trigger is not something we want to mess with! Let us test this on our existing simple schema (please refer to Part 1 of this series for the schema). We will insert 400 rows into a temp table, then insert them into the published table at once.

use pub
go

Create table #tmp (col2 datetime)
go

Insert into #tmp (col2) select getdate();
Go 400

select count(*) from #tmp

Let’s record the identity value of the table on the publisher:

Dbcc checkident (tbl_pub , noreseed)

Checking identity information: current identity value '4602'

The primary range is 4601-4701 and the secondary is 4701-4801, so adding 400 rows should overflow this:

Insert into pub.dbo.tbl_pub (col2) select * from #tmp

Msg 548, Level 16, State 2, Line 1
The insert failed. It conflicted with an identity range check constraint in database 'Pub', replicated table 'dbo.tbl_pub', column 'col1'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.

Dbcc checkident (tbl_pub , noreseed)

Checking identity information: current identity value '4802'

The current identity was pushed past the limit, but no rows were inserted.

Here is what happened:

  1. SQL Server began inserting the data and assigning new identity values; since the trigger fires per statement, it does not fire until the batch completes.
  2. Because the batch has more rows than the available free identity slots (fewer than 200 in this case), it fails when the constraint is violated. However, the next identity value is already greater than the max value defined by the constraint.
  3. Because the batch failed, the trigger never fired and never allocated a new range.
  4. Any subsequent process, even one inserting a single row, will fail because the next available identity slot (4802) is greater than the limit of the secondary range (4801). This is when the applications fail and return the error.
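The four steps above can be sketched as a toy model (a pure illustration; the class and its behavior are simplified assumptions, not SQL Server internals):

```python
# Toy model of the failure mode above (illustrative only; the class,
# names and behavior are hypothetical simplifications, not SQL Server
# internals).
class PublishedTable:
    def __init__(self, current_identity, range_limit):
        self.current = current_identity   # last identity value handed out
        self.limit = range_limit          # upper bound of the secondary range
        self.rows = []

    def insert_batch(self, n):
        staged = []
        for _ in range(n):
            self.current += 1             # identity values are consumed per row...
            if self.current > self.limit:
                # ...but the range-allocation trigger only fires after the
                # whole statement succeeds, so the batch fails wholesale
                raise ValueError("identity range check constraint violated")
            staged.append(self.current)
        self.rows.extend(staged)          # commit only if every row fit

tbl = PublishedTable(current_identity=4602, range_limit=4801)
try:
    tbl.insert_batch(400)                 # 400 rows, but only 199 slots left
except ValueError:
    pass

print(tbl.current)                        # 4802: pushed past the limit, yet...
print(len(tbl.rows))                      # 0: no rows were inserted
```

The model reproduces the symptom the client saw: the identity value ends up past the range limit while the table gains no rows, so even subsequent single-row inserts fail.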

With single-row operations, the trigger fires and assigns new ranges when both the primary and secondary ranges are exhausted, and the new ranges are ready for future operations.
The temporary resolution was to follow the advice of the error message and run sp_adjustpublisheridentityrange. The disadvantage, however, is that this creates gaps in the identity column, because the insert rolled back and never used the identity slots. Kindly note that rolling back a transaction does NOT reset the identity seed to its value before the transaction started.

If we want to reduce the gap, we can reseed the column to a value close to the most recent value inserted within the range:

select max(col1) + 5 [reseed] from pub.dbo.tbl_pub where col1 between

(select max(range_begin)
from Distribution..MSmerge_identity_range_allocations
where publication='pub1' and article='tbl_pub' and subscriber_db='Pub' and subscriber='MYINSTANCE')

and

(select max(next_range_end)
from Distribution..MSmerge_identity_range_allocations
where publication='pub1' and article='tbl_pub' and subscriber_db='Pub' and subscriber='MYINSTANCE');

Both are manual workarounds, and this was not desired.

A few workarounds can work here; they all revolve around ensuring that the triggers fire to allocate new ranges:

  1. Increase the range assuming that a larger range can accommodate larger batches. Well,not quite true since the batches have varied sizes and user processes can reduce available identity slots to a number less than the batch rows count.
  2. Always assign new ranges when batches start. This creates gaps.
  3. Add some logic in the code that will check for the available Identity range VS batch row counts and insert if there is enough slots or assign new ranges if the rows count are more. We tried this but sometimes failed because some user processes added rows, exhausting range mid-way. It also created big gaps sometimes>If Current ID in the primary range then check the available ID count
    >>>> If the Available empty IDs > batch count then proceed
    >>>>>>>>>> If Available empty IDs If Current ID in the Secondary range then check the available ID count
    >>>> If the Available empty IDs > batch count then proceed
    >>>>>>>>>> If Available empty IDs < batch count then run sp_adjustpublisheridentityrange to assign new ranges
  4. Insert everything as single rows. This works, but it’s very slow sometimes.
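The range check at the heart of options 3 and 4 is easy to state abstractly: the batch can go in as one insert only if it fits in whichever range the current identity value sits in. Here is a toy Python sketch of that decision (the function name and the boundary values are ours for illustration, not from the actual T-SQL):

```python
def choose_insert_mode(current, range_end, next_range_begin, next_range_end, batch_count):
    """Decide between one batch insert and row-by-row inserts, given the current
    identity value, the primary/secondary range boundaries (as reported in
    MSmerge_identity_range_allocations), and the batch row count."""
    if current < range_end:                # still inside the primary range
        available = range_end - current
    elif current > next_range_begin:       # inside the secondary range
        available = next_range_end - current
    else:                                  # between ranges: be conservative
        available = 0
    return "batch" if batch_count < available else "row-by-row"

# Primary range ends at 1000, secondary range is 1001-2000
print(choose_insert_mode(100, 1000, 1001, 2000, 50))   # batch fits in the primary range
print(choose_insert_mode(990, 1000, 1001, 2000, 50))   # batch would exhaust the primary range
```

The second call is exactly the failure scenario above: a batch larger than the remaining slots must fall back to single-row inserts so the trigger can allocate new ranges mid-stream.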

We ended up with a “hybrid” approach:

  • Batch insert if the range permits, which is much faster.
  • Row-by-row if the range does NOT accommodate all the rows. We could have stretched it a bit further by batch-inserting up to the end of the range and inserting the rest row by row, but that would complicate things even more.
-- Staging table with its own sequence number, used to drive row-by-row inserts
Create table #temp (
        SNo INT IDENTITY(1,1),
        CARTONID nvarchar(40),
        STOREID int,
        STORECODE nvarchar(20)
        -- ... remaining columns
        )

    -- insert the rows required into the temp table.
    INSERT INTO #temp (CARTONID, STOREID, STORECODE)
    SELECT DISTINCT CARTONID, STOREID, STORECODE
    from staging_instance.staging_database.dbo.table

    declare @count int, @i int
    select @count = COUNT(*) + 500 from #temp -- five hundred extra in case user processes add rows in the meantime

    declare @range_begin bigint, @range_end bigint, @next_range_begin bigint, @next_range_end bigint,
            @current bigint

    -- latest identity range allocation for the published table
    select top 1
           @range_begin = range_begin, @range_end = range_end,
           @next_range_begin = next_range_begin, @next_range_end = next_range_end
    from Distribution..MSmerge_identity_range_allocations
    where article = 'table1' and subscriber = 'Publisher'
    order by range_begin desc

    select @current = IDENT_CURRENT('dbo.table1')

    set @i = 1

    -- Batch insert if the whole batch fits in the remaining primary or secondary range
    If (@current < @range_end and @count < (@range_end - @current)) or (@current > @next_range_begin AND @count < (@next_range_end - @current))
    BEGIN
        INSERT INTO dbo.table1 (CARTONID, STOREID, STORECODE)
        SELECT CARTONID, STOREID, STORECODE
        FROM   #temp
    END

    -- Row-by-row otherwise, so the merge trigger can allocate new ranges mid-stream
    If (@current < @range_end and @count > (@range_end - @current)) or (@current > @next_range_begin AND @count > (@next_range_end - @current))
    BEGIN
        WHILE (@i <= @count)
        BEGIN
            INSERT INTO dbo.table1 (CARTONID, STOREID, STORECODE)
            SELECT CARTONID, STOREID, STORECODE
            FROM   #temp
            where SNo = @i
            SELECT @i = @i + 1
        END
    END



Besides the recommendations in Part 1 of this series, the lesson we learned from that engagement was basic: you need to understand your environment! If you have transactions or batches that run against tables with identity columns engaged in merge replication, you want to ensure that the batch rows can fit within the available ranges.

Categories: DBA Blogs

SQL Server versions – Should you upgrade to the next release?

Tue, 2014-02-04 00:58

Today there are five versions of SQL Server serving at the hearts of business enterprises, and a sixth is coming soon. The current SQL Server versions are 2000, 2005, 2008, 2008 R2, and 2012, plus the upcoming SQL Server 2014. The question CIOs, VPs of IT, and IT managers are asking themselves is, “Shall I upgrade my SQL Server boxes to the next release, or not?”

The answer is seemingly easy — yes. You need to upgrade to the next version to take advantage of more features, reduced total cost of ownership, more stability, more scalability, fewer bugs, etc. However, there are major challenges that oblige IT managers to count to 10 before committing to an upgrade.

A study conducted by RockSolidSQL suggests that a majority of 60% of the installation base is using SQL Server 2008/2008 R2. Around 27% are using SQL Server 2005, 11% are using SQL Server 2000, and a humble 2% have only recently started to use SQL Server 2012.

SQL Server Version Usage

While there is evidence of rapid adoption of the newest SQL Server 2012 over the recent period, especially among new installations and development projects, we still see reluctance to migrate to newer versions among SQL Server 2000/2005 installations, which amount to a combined 38% of installations according to the same source.

The reluctance to upgrade to SQL Server 2008/2008 R2 or later editions, though nine years have passed since the release of SQL Server 2005 and 14 years since the release of SQL Server 2000, might be attributed – in our opinion – to two main reasons:

  • A financial aspect: We all remember the financial crisis that emerged at the end of 2008 and severely hit most business enterprises on the planet; even today, the US and world economies have not fully recovered.
    Enterprises had to cut costs, and Information Technology departments, often viewed as cost centers, had to absorb major cuts to their budgets. CIOs and IT managers therefore gave little priority to upgrading their existing applications and database management systems; their top priority went to acquiring new technologies and applications, while capping or even cutting operational expenses.
    Upgrading applications and services is a lengthy task that requires significant investment. Migration project costs include acquiring new hardware, planning for re-coding features deprecated in prior SQL Server versions, unit testing, quality-control testing, user testing, and so on. This amounts to enough cost and time that CIOs gave it low priority and, in many cases, skipped it for years over the previous decade.
  • A technical aspect: There were significant changes at the core of SQL Server 2005, 2008, and 2008 R2 that made many enterprises and application vendors hesitant to upgrade from earlier versions. These changes go beyond deprecated features and reach into the core database engine. Enterprises and application vendors therefore had to verify not only syntax validity and retired functions and tools; in many cases they had to rewrite large portions of their code, or even adopt different programming strategies and techniques, to make the application compliant with later versions. This is true for the SQL Server 2000-to-2005 migration path, and for the 2005-to-2008/2008 R2 paths. The core differences between SQL Server 2008 and SQL Server 2008 R2 were not as significant.

While it is fairly easy for a programmer to rewrite code to eliminate deprecated features and tools, which are well documented, it is extremely difficult to re-code or tune an application for the undocumented changes within the SQL Server engine, which is a black box for both developers and enterprises. In many scenarios, rewriting the code completely for the newer version may be a better alternative than tuning the performance of code written for a previous version.

As such, while it is always recommended to upgrade to newer versions of SQL Server, or at least to stay no more than a version or two behind the current release, other factors should be considered: having a sufficient budget to cover hardware, testing, licensing, and application support; and planning to re-code and test the applications not only for features deprecated in the newer version, but also for unannounced changes to the SQL Server core, which can only be caught by running the code through extensive cycles of functional testing, stress testing, and other types of tests.

Categories: DBA Blogs

How to Configure Transactional Replication-Mirroring Failover

Mon, 2014-02-03 11:20

Configure SQL Server Transactional Replication with Mirroring – this was one of the recent interesting and unusual projects that came our way. At first it looked really odd and intriguing at the same time. The question was, can it really be done?

Well, as it happens, Microsoft SQL Server has a pretty straightforward method to set it up – at least the publisher-failover part. The subscriber-failover part is not as straightforward, but it is definitely possible through workarounds.

In this blog post, we’ll discuss how to set up a simple transactional replication-mirroring failover configuration, focusing on failover of publications. This may be of some help if you do not want to reset publications and subscribers when the published (and mirrored) database fails over to its mirror. While setting up the actual configuration, we’ll also discuss a couple of tweaks involved in the process.

Here’s the Scenario to begin with:

  1. Current setup contains 2 servers.
    • The OLTP Server (S_OLTP1) contains databases which support main portal application, and are highly volatile.
    • The application middle tier server (S_Apps) hosts SSIS and SSRS instances, besides basic configuration and metadata databases, like, SSISDB (for SSIS) and ReportServer (for SSRS).
    • Both the servers have SQL Server 2008 installed.
  2. Following requirements must be met for the project:
    • Provisioning a DR solution for main portal databases, by adding a new mirror server to the setup. So, Database Mirroring is the selected DR strategy.
    • Offloading the reporting requirements from S_OLTP1 server, by adding a new reporting server. This new server should be configured to subscribe data from S_OLTP1 and make it available for reporting purposes, by means of transactional replication.
    • A “complementary” requirement that comes up is that the transactional replication should not break in case any of the mirrored databases failover.

Note: Although, all the servers used in our project are SQL Server 2008’s, this blog post is valid for all implementations of SQL Server versions 2005 and higher.

Based on above requirements, we now have the following servers in our configuration:

  • S_OLTP1: The primary OLTP server, hosting all application databases. Transactional replication publisher. Mirror member server.
  • S_OLTP2: The new OLTP DR server. Mirror member server, with S_OLTP1 as Principal. Transactional replication publisher (obviously, as we want the replication to be working after failover).
  • S_Apps: The original middle tier server. (This will also be used as replication Distributor. Optionally, this server could be used as a Witness server for the mirrored pair of S_OLTP1 and S_OLTP2.)
  • S_Rept1: The new reporting server. (This server will host all databases required for reporting. These databases will be subscribers of publication(s) from S_OLTP1/2.)

Note: It is a MUST to configure a common (ideally remote) Distributor if you want the publishers to fail over.

Diagrammatic representation:


The Setup will be done as discussed below.

The first step is to set up mirroring on all the required databases. Since no special configuration is needed on the mirroring side for replication failover, we will exclude the details of the mirroring setup from this blog post.

Note: Database Mirroring can also be done after setting up publication. So, this step can be performed after publishing steps discussed later in this blog post.

Next, we will configure replication. As per the required setup, we will have to perform following actions:

  1. Setup Distributor on central remote distributor server (S_Apps1).
  2. Add S_OLTP1 and S_OLTP2 as publishers on Distributor.
  3. Add new Publications on the Publisher server.
  4. Configure the agent profiles or jobs to failover.
  5. Add subscription to S_Rept1.
  6. Repeat steps 3 & 4 for all the databases that require publishing.

Now that we know the process, let’s discuss these steps in detail starting with setting up the distributor.

  1. To setup a common remote distributor, we will connect to S_Apps1 server, using SSMS. Once connected, expand the server and right click on Replication and select “Configure Distribution…” from the context action menu. See figure 2 below.

    Figure 2:

  2. In the first page of Configure Distribution Wizard, press Next, as shown below in Figure 3.Figure 3:
  3. On the next page, keep the default option (Server will act as its own distributor) on, and press Next button. Refer Figure 4 below.Figure 4:
  4. As shown in Figure 5 below, select path for generating snapshots, and click Next to proceed.Figure 5:
  5. On the next screen, specify the details for distribution database, as shown in Figure 6.Figure 6:
  6. Next, enable this server to act as Distributor for the mirrored pair – S_OLTP1 and S_OLTP2. By default, you will find the server itself is listed as Publisher (refer Figure 7).
    a) Click on Add, to open “Connect to Server” dialog box, as shown in Figure 8.
    b) Specify the name for Primary server (S_OLTP1).
    c) Click Connect. This adds a new server to list of Publishers.Figure 7:

    Figure 8:

    d) Click Add again from main screen, and connect to Secondary server (S_OLTP2).
    e) Finally, you will see both servers are listed as Publishers, along with the distributor. Your screen should look like the snapshot shown in Figure 9 below.

    Figure 9:
      Note: I always use the SQL Server Agent account, which is a domain account, for such connections — this saves a lot of connectivity and permissions issues.
  7. On the next screen, we will need to enter a custom password, which will be used by the Publishers to connect to the Distributor. This password is not related to SQL Server or Agent service. Make a note of the password you enter and keep it safe, because this is the one which you will need to provide when configuring the publishers to use remote distributor. A snapshot of this screen is shown in Figure 10 below.Figure 10:
  8. As in Figure 11, we are now done with configuration. Click Next.Figure 11:
  9. On the next screen, review the summary and press Finish button.
  10. The “Configuring…” screen will appear, and at the end of a successful completion, the final screen will look something like Figure 12 below.Figure 12:

Now that the Distributor is successfully set up, let’s proceed to creating the publications. I won’t get into the details of creating a publication, to keep from deviating from the topic; instead, let’s look at which actions are performed differently when creating a publication for a failover scenario. Here is how it goes:

  1. In the “New Publication Wizard”, when selecting distributor, choose to use remote distributor. Click Add and connect to the distributor configured in the steps above. The “Distributor” screen of “New Publication Wizard” will look something like Figure 13 below.Figure 13:
  2. On the next screen, we will specify the password that we provided while configuring the distributor for “Distributor Password” – refer Step 7, in configuring Distributor steps above. The wizard screen to specify the password will look like Figure 14 below.Figure 14:
  3. We will continue with rest of the wizard normally, as we would while creating a usual transactional publication.

Once we have the publication in place, we are done with setting up Distributor, Publishers and the Publication.

Next step up will be to configure the agents to failover.

Let’s discuss what needs to be done to make the replication agents recognize the failover partner. There are a few ways we can do it:

a) Modify agent profile to change the “PublisherFailoverPartner” property, using Distributor properties dialog box.
b) Modify agent profile to change the “PublisherFailoverPartner” property, using TSQL.
c) Modify each agent’s job individually, to add the parameter “-PublisherFailoverPartner”.

The difference between changing agent profiles and agent job properties is that changing a profile alters the behaviour of all publisher-side agents, while changing job properties affects only those agents whose jobs are modified. We will look at options A and C in this blog post.

To modify agent profile using Distributor properties, we will go through the following steps:

  1. Connect to the distributor. Right click “Replication” and select “Distributor Properties” from the context action menu. See Figure 15 below.Figure 15:
  2. This will open up the Distributor Properties dialog box for your distributor, as show below in Figure 16.Figure 16:
  3. Click the “Profile Defaults…” button to open the “Agent Profiles” dialog box. Choose the profiles for log reader agents, as shown in Figure 17.Figure 17:

  4. Click on the ellipsis button “…” in the “Type” column for Default agent profile, to open up “Default agent profile Properties” dialog box. See Figure 18 below.Figure 18:
  5. By default, the dialog box shows only the properties that are configured. To see the complete list, including the property we are interested in, uncheck the Check box “Show only parameters used in this profile”.  Figure 19 shows what the screen looks like. Profile property “PublisherFailoverPartner” is highlighted in the snapshot below.Figure 19:
  6. We will not be able to change values here, as this is a System type profile. We will need to create a new agent profile and enable it for Log Reader agents to use. To do so, close the current properties and click the “New” button to open “New Agent Profile” dialog, shown below in Figure 20.Figure 20:
  7. Click “OK”, and open the “New Agent Profile” dialog box. Specify the Secondary Mirror server (S_OLTP2 in our case) as value of parameter “PublisherFailoverPartner”, as shown in Figure 21 below, and click “OK”.Figure 21:

  8. Now the new profile is shown in the agent profiles list for Log Reader Agents. Click the check box next to the “NewProfile” profile. This makes it the default for any new log reader agents that will be created. Figure 22 shows how the screen will look.Note: If you want all existing agents to use the newly created settings, click the “Change Existing Agents” button at the bottom of the box.

    Figure 22:

Click OK, and voila! The log reader agents for this project are all set for automatic failover.

Alternatively, we can also customize automatic failover for specific agents by just modifying Agent job parameters.

To achieve this, simply open the agent job’s properties and modify the job step “Run agent”. Figure 23 shows how it should be put in. We will simply add the parameter “-PublisherFailoverPartner” to the command. A typical command parameter-set will look like this:
-Publisher [S_OLTP1] -PublisherDB [DB1] -Distributor [S_Apps1] -DistributorSecurityMode 1  -Continuous -PublisherFailoverPartner [S_OLTP2]

The underlined portion is what we need to add to the job. This is how it will look once complete:

Figure 23:

That concludes the failover configuration for publishers. Since no configuration is required at the subscriber end for the publisher-failover scenario, we can now proceed with adding subscribers normally. And there we are, with the project successfully implemented!

Food for thought:

Although we have configured publisher failover nicely, this is not a fully established topic and needs some more exploration and discussion. Clearly, not all types of replication setups support publisher failover; it would be interesting to note which ones do.

Moreover, we are combining two major techniques! That increases the possibilities and options, but it also introduces some challenges to be dealt with. Replication is highly robust, but what if mirroring breaks, or the principal server is simply unavailable for some time? We often rely on Replication Monitor, and we would love to think that it provides a complete picture of the replication state – even when mirrored!

Categories: DBA Blogs

A Song of Graph and Report, Part I: A Gathering of Stats

Mon, 2014-02-03 11:02

Statistics and graphs — they don’t always mean much, but they can be so darn mesmerizing.

It’s in that optic that I was looking at the Map of CPAN the other day and thought, “Wouldn’t it be cool to have the same view within a distribution?” It’s not that I need to do that for anything specific, mind you. But to have a distribution split into territories based on authors and files… Maybe animated throughout different releases, to see the ebbs of contributors and code. It would be so nifty!

Now, considering that there are many things that can be done with those statistics — and because I should probably do some real work between bouts of fooling around — I decided to cut the exercise into different parts. Therefore, today we’ll begin with the essential, if a tad dry, step required for everything that will follow. Namely: the gathering of the statistics.

Lay the blames around.

The good news is that Git already has its git blame sub-command, which provides us with a per-line author attribution of the codebase. All we have to do is interface with it, munge its data to our liking, and we’ll be good to go.

To talk to Git, we’ll use Git::Repository, with the help of its plugin Git::Repository::Plugin::AUTOLOAD to make things a little neater.

Round up the files.

So, first step: get the list of the files of the project for a given version tag.

use Git::Repository 'AUTOLOAD';

my $project_dir = '.';
my $version     = 'v1.2.3';

my $git = Git::Repository->new( work_tree => $project_dir );

my @files = sort $git->ls_tree( '-r', '--name-only', $version );
Who did what?

So far, so good. Now for every file, we want to have a list of its authors, and how many lines they claim as their own:

my %file_attribution = map { 
    $_ => file_authors( $git, $version, $_ ) 
} @files;

sub file_authors {
    my( $git, $version, $file ) = @_;

    return [
        map { /\(\s*(.*?)\s*\d{4}-/ } $git->blame( $version, $file )
    ];
}
Same thing, but compressed.

With the code above, we now have, for each file, an array holding the name of the author of each line. If we just want to tally the overall contribution of each author, that’s slight overkill, but it’ll come in handy when we’re ready to draw the author map of each file.

Still… one entry per line, that’s quite verbose. Instead, let’s try to scrunch that into a tally of the successive lines associated with the same author:

use List::AllUtils qw/ reduce /;

my %file_attribution = map { 
    $_ => file_authors( $git, $version, $_ ) 
} @files;

sub file_authors {
    my( $git, $version, $file ) = @_;

    return reduce {
        if ( @$a and $a->[-1][0] eq $b ) {
            $a->[-1][1]++;          # same author as the previous line: extend the run
        }
        else {
            push @$a, [ $b, 1 ];    # new author: start a new run
        }
        $a;
    } [ ], map { /\(\s*(.*?)\s*\d{4}-/ } 
               $git->blame( $version, $file );
}

Aaaah, yes. Not as dirt-simple as before but the data structure is now much less wasteful.
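For readers who prefer to see the compression step outside Perl, the same run-length collapse can be sketched in a few lines of Python (illustrative only):

```python
def scrunch(line_authors):
    """Collapse a per-line author list into [author, run_length] pairs,
    mirroring the reduce() above."""
    runs = []
    for author in line_authors:
        if runs and runs[-1][0] == author:
            runs[-1][1] += 1          # same author as the previous line: extend the run
        else:
            runs.append([author, 1])  # new author: start a fresh run
    return runs

print(scrunch(["alice", "alice", "bob", "alice"]))
# → [['alice', 2], ['bob', 1], ['alice', 1]]
```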

Massively parallel blaming.

Next challenge: git blame is a little bit on the slow side. Not sloth-on-a-diet-of-snails kind of slow, but it does have to spit out all lines of every file. For big projects, that takes a few seconds.

But… doesn’t our little file_authors() function work in isolation for each file? Wouldn’t this be a perfect moment to whip out some parallelization ninja trick? Like… oh, I don’t know, trying out that neat new MCE module that’s been making the rounds these days?

use MCE::Map;

my %file_attribution = mce_map { 
    $_ => file_authors( $git, $version, $_ ) 
} @files;

And indeed, with the 4 cores of my faithful Enkidu, the overall script suddenly goes 4 times faster. Considering that all it took was changing a map to a mce_map… Yeah, I’ll take that.
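The map-to-mce_map swap has a direct analogue in most languages. As a sketch only (the file list and the stand-in worker are invented for the example), here is the same shape in Python, fanning the per-file work across a pool:

```python
from concurrent.futures import ThreadPoolExecutor

def file_authors(path):
    # stand-in for the real per-file `git blame` work
    return len(path)

files = ["lib/Dancer.pm", "lib/Dancer/App.pm", "Makefile.PL"]

# Serial version: the shape of the plain `map`
serial = {f: file_authors(f) for f in files}

# Parallel version: same shape, work fanned out across a pool of workers
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = dict(zip(files, pool.map(file_authors, files)))

assert serial == parallel  # same result, computed concurrently
```

The appeal is the same as with MCE: the per-file function is independent and side-effect free, so the only change needed is swapping the mapping primitive.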

Sum everything up.

We now have our raw numbers. Getting the stats per author is now simple bookkeeping:

my $nbr_lines;
my %authors;

for my $x ( map { @$_ } values %file_attribution ) {
    $nbr_lines += $x->[1];
    $authors{$x->[0]} += $x->[1];
}

# switch into percents
$_ = 100 * $_ / $nbr_lines for values %authors;
Put it in a big book of stats.

We’ll want to return to those stats, so let’s keep them in a file. Let’s make it a JSON file. Everybody loves JSON.

use JSON_File;

tie( my %stats, 'JSON_FILE', 'stats.json', pretty => 1 );

$stats{$version} = {
    nbr_lines   => $nbr_lines,
    authors     => \%authors,
    files       => \%file_attribution,
};
Time to generate a report.

The numbers are now ours to do with as we see fit. Pretty graphical stuff will have to wait for the next installments, but why not create a quick credit roll for the project version we just munged?

use 5.10.0;

use List::AllUtils qw/ part /;

my( $minor, $major ) = part    { $authors{$_} >= 1 } 
                       sort    { $authors{$b} <=> $authors{$a} } 
                       keys    %authors;

my $lines = $nbr_lines;

# 1000000 => 1,000,000 
1 while $lines =~ s/^(\d+)(\d{3})/$1,$2/;

say <<"END";


This is the list of all persons who had their hands in
crafting the $lines lines of code that make
this version of $project, according to `git blame`.

This being said, don't take those statistics too seriously, 
as they are at best a very naive way to judge the contribution 
of individuals.  Furthermore, it doesn't take into account the army 
of equally important people who report bugs, worked on previous
versions and assisted in a thousand different ways. For a glimpse of
this large team, see the CONTRIBUTORS file.

## The Major Hackarna

All contributors claiming at least 1% of the lines of code of the project.

END

printf "    * %-50s %2d %%\n", $_, $authors{$_} for @$major;

if ( $minor ) {
    say "\n\n## The Minor Hackarna\n\n",
        "With contributions of: \n";

    say "    * ", $_ for @$minor;
}

Running that against the latest release of Dancer gives us:


This is the list of all persons who had their hands in
crafting the 32,944 lines of code that make
this version of Dancer, according to `git blame`.

This being said, don't take those statistics too seriously, 
as they are at best a very naive way to judge the contribution 
of individuals.  Furthermore, it doesn't take into account the army 
of equally important people who report bugs, worked on previous
versions and assisted in a thousand different ways. For a glimpse of
this large team, see the CONTRIBUTORS file.

## The Major Hackarna

All contributors claiming at least 1% of the lines of code of the project.

    * Alexis Sukrieh                                     30 %
    * franck cuny                                        17 %
    * Sawyer X                                           10 %
    * Damien Krotkine                                     7 %
    * David Precious                                      6 %
    * ambs                                                5 %
    * Yanick Champoux                                     4 %
    * Michael G. Schwern                                  2 %
    * sawyer                                              1 %
    * Alberto Simoes                                      1 %
    * Mark Allen                                          1 %
    * Ovid                                                1 %

## The Minor Hackarna

With contributions of: 

    * mokko
    * Fabrice Gabolde
    * Philippe Bruhat (BooK)
    * Alex Kapranoff
    * LoonyPandora
    * chromatic
    * Anton Gerasimov
    * geistteufel
    * Paul Driver
    * Pedro Melo
    * sdeseille
    * David Moreno
    * Naveed Massjouni
    * Gabor Szabo
    * Rick Myers
    * Alex Kalderimis
    * Sam Kington
    * Max Maischein
    * Colin Keith
    * pdl
    * David Golden
    * JT Smith
    * Flavio Poletti
    * Nate Jones
    * William Wolf
    * Dagfinn Ilmari Mannsåker
    * Stefan Hornburg (Racke)
    * Chris Andrews
    * Kaitlyn Parkhurst
    * Oliver Gorwits
    * Perlover
    * Squeeks
    * Rowan Thorpe
    * CPAN Service
    * John Wittkoski
    * David Steinbrunner
    * Felix Dorner
    * Alessandro Ranellucci
    * Danijel Tasov
    * Al Newkirk
    * Mikolaj Kucharski
    * Kent Fredric
    * Kirk Kimmel
    * Vyacheslav Matyukhin
    * niko
    * Tom Hukins
    * jamhed
    * Franck Cuny
    * Joshua Barratt
    * James Aitken
    * Jesse van Herk
    * rowanthorpe
    * Roman Galeev
    * Marc Chantreux
    * Alex C
    * Martin Schut
    * Naveed
    * Michal Wojciechowski
    * Olof Johansson
    * Naveen
    * Ilya Chesnokov
    * Roberto Patriarca
    * boris shomodjvarac
    * Tom Wyant
    * Lee Carmichael
    * Jacob Rideout
    * Richard Simões
    * Rik Brown
    * Alan Haggai Alavi
    * jonathan schatz
    * Dennis Lichtenthaeler
    * Emmanuel Rodriguez
    * smashz
    * David Cantrell
    * Xaerxess
    * Ask Bjørn Hansen
    * Paul Fenwick
    * Lee Johnson
    * Jonathan "Duke" Leto
    * Tim King
    * Olivier Mengué
    * Brian Phillips
    * Fayland Lam
    * adamkennedy
    * Craig Treptow
    * Jonathan Scott Duff
    * Joel Roth
    * Mark A. Stratman
    * ironcamel
    * John Barrett
    * Brian E. Lozier
    * Florian Larysch
    * Nicolas Oudard
    * Grzegorz Rożniecki
    * Christian Walde
    * Maurice
    * Sebastian de Castelberg
    * scoopio
    * Paul Tomlin
    * François Charlier
    * Anirvan Chatterjee
    * jonasreinsch
    * Neil Hooey
    * Hans Dieter Pearcey
    * Lars Thegler
    * Murray
    * Adam J. Foxson
    * Tatsuhiko Miyagawa
    * sebastian de castelberg
    * Craig
    * Jakob Voss
    * isync
    * Mark Stosberg
    * Assaf Gordon
    * Hagen Fuchs
    * Ivan Paponov
    * Roman Nuritdinov
    * Tom Heady
    * Michael Genereux
    * Duncan Hutty
    * Alfie John
    * Matthew Horsfall (alh)
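As a side note, the `1 while $lines =~ s/^(\d+)(\d{3})/$1,$2/;` line in the script above is the classic regex trick for inserting thousands separators. The same loop, sketched in Python (where the built-in format spec `f"{n:,}"` would normally do the job):

```python
import re

def commify(n):
    """Insert thousands separators the way the Perl `1 while s///` idiom does:
    keep splitting three digits off the leading digit run until no match remains."""
    s = str(n)
    while True:
        s, count = re.subn(r"^(\d+)(\d{3})", r"\1,\2", s)
        if not count:
            return s

print(commify(32944))     # → 32,944
print(f"{1000000:,}")     # → 1,000,000 (the idiomatic Python equivalent)
```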
On the next episode…

Things get interesting as we’ll tackle transforming those stats into an author-colored map of the distribution. Weeee!

Categories: DBA Blogs

How To Improve SQL Statements Performance: Using SQL Plan Baselines

Mon, 2014-02-03 09:25

The performance of any Oracle database relies heavily on query execution. The reasons for an SQL statement’s execution plan changing can include a variety of actions, like gathering optimizer statistics (table, index, schema, etc.), changing optimizer parameters or schema definitions, adding indexes, and so on. As experienced Oracle DBAs, we should be aware that the above actions, meant to improve SQL performance, will not always guarantee positive results.

So in this case, many of us would try to freeze execution plans (stored outlines) or lock the optimizer statistics. However, doing so prevents such environments/databases from taking advantage of new optimizer functionality or access paths that would improve the SQL statements’ performance. That is where SQL Plan Management comes in very handy…

SQL Plan Management (SPM) provides a framework for completely transparent controlled execution plan evolution. With SPM the optimizer automatically manages execution plans and ensures only known or verified plans are used. When a new plan is found for a SQL statement it will not be used until it has been verified by the database to have comparable or better performance than the current plan.


Next, I will explain the steps for forcing a bad query to use a better execution plan by loading SQL Plans into SPM using AWR.

Identifying the Slow Query

We have the following scenario:

- Oracle version single instance database

- Performance issue caused by the following bad query:


with the initial explain plan:


As shown in the plan output, a full table scan was used, resulting in excessive IO for this query. It seemed the query needed an index to reduce the IO, so I added two indexes, on the ‘status’ and ‘prevobjectid’ columns of the EMPLOYEES table, gathered table statistics, and then checked the explain plan again. Due to the index creation, the DISPLAY_AWR program now shows a newly generated explain plan with an improved cost, using an index range scan instead of the full table scan used by the initial plan (Plan hash value: 2172072736).


Now we have obtained a new, better execution plan in AWR for the SQL statement, but our next question would be, “How can we make sure it will be the only plan picked by the cost-based optimizer for future executions?”

The answer: “Create a SQL Tuning Set for the SQL, then create a new SQL baseline from the STS so the optimizer will choose the preferred execution plan.”

Each time a SQL statement is compiled, the optimizer first uses a cost-based search method to build a best-cost plan, then tries to find a matching plan in the SQL plan baseline. If a match is found, the optimizer proceeds with that plan. Otherwise, it evaluates the cost of each accepted plan in the SQL plan baseline and selects the plan with the lowest cost.
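That selection rule can be caricatured in a few lines of Python (a conceptual sketch only, not Oracle internals; the plan records are invented for the example):

```python
def pick_plan(best_cost_plan, baseline):
    """Sketch of SPM plan selection: use the optimizer's best-cost plan if it
    matches an accepted baseline plan; otherwise fall back to the lowest-cost
    accepted plan in the baseline."""
    accepted = [p for p in baseline if p["accepted"]]
    if any(p["hash"] == best_cost_plan["hash"] for p in accepted):
        return best_cost_plan
    return min(accepted, key=lambda p: p["cost"])

baseline = [
    {"hash": "full_scan",   "cost": 2500, "accepted": True},
    {"hash": "index_range", "cost": 12,   "accepted": True},
    {"hash": "new_plan",    "cost": 5,    "accepted": False},  # found but not yet verified
]

# The optimizer's best-cost plan is unverified, so an accepted baseline plan wins
print(pick_plan({"hash": "new_plan", "cost": 5}, baseline)["hash"])  # → index_range
```

The point of the sketch: a cheaper but unverified plan does not get used until it is evolved into the baseline, which is exactly the safety net SPM provides.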

Here are the steps for loading SQL Plans into SPM using AWR by implementing SQL Baselines for the bad query.

Step 1: Set up a SQL Baseline using known-good plan, sourced from AWR snapshots.

To do so, SQL Plan Management must be active; the easiest check is to confirm that the optimizer_use_sql_plan_baselines parameter is set to TRUE.
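A quick sketch of that check (the defaults shown are the 11g out-of-the-box values):

```sql
-- Confirm SPM is active for plan selection
show parameter optimizer_use_sql_plan_baselines

select name, value
from   v$parameter
where  name in ('optimizer_use_sql_plan_baselines',
                'optimizer_capture_sql_plan_baselines');

-- If it has been disabled, re-enable it:
alter system set optimizer_use_sql_plan_baselines = true;
```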

Step 2: Create a SQL Tuning Set (STS).

SQL tuning set (STS) is a database object that includes one or more SQL statements along with their execution statistics and execution context, and could include a user priority ranking. You can load SQL statements into a SQL tuning set from different SQL sources, such as AWR, the shared SQL area, or customized SQL provided by the user. An STS includes:

- A set of SQL statements

- Associated execution context, such as user schema, application module name and action, list of bind values, and the cursor compilation environment

- Associated basic execution statistics, such as elapsed time, CPU time, buffer gets, disk reads, rows processed, cursor fetches, the number of executions, the number of complete executions, optimizer cost, and the command type

- Associated execution plans and row source statistics for each SQL statement (optional)

The concept of SQL tuning sets, along with the DBMS_SQLTUNE package to manipulate them, was introduced in Oracle 10g as part of the Automatic SQL Tuning functionality. Oracle 11g makes further use of SQL tuning sets with the SQL Performance Analyzer, which compares the performance of the statements in a tuning set before and after a database change. The database change can be as major or minor as you like, such as:

  • Database, operating system, or hardware upgrades.
  • Database, operating system, or hardware configuration changes.
  • Database initialization parameter changes.
  • Schema changes, such as adding indexes or materialized views.
  • Refreshing optimizer statistics.
  • Creating or changing SQL profiles.

Now I create a SQL Tuning Set based on the slow query with a SQL_ID of 9kt723m2u5vna.
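The creation itself would look like this sketch (the STS name is illustrative):

```sql
begin
  dbms_sqltune.create_sqlset(
    sqlset_name => 'STS_9kt723m2u5vna',
    description => 'STS for the slow query, to seed a SQL Plan Baseline');
end;
/
```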


Step 3: Populate STS from AWR.

Now I will locate the AWR snapshots required to populate the STS, and load the STS based on those snapshot ID’s and the SQL_ID.
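A sketch of that load, using DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY (the snapshot IDs 93700/93776 are illustrative — use the snapshot range that contains the desired plan):

```sql
declare
  l_cursor dbms_sqltune.sqlset_cursor;
begin
  open l_cursor for
    select value(p)
    from   table(dbms_sqltune.select_workload_repository(
                   begin_snap     => 93700,
                   end_snap       => 93776,
                   basic_filter   => 'sql_id = ''9kt723m2u5vna''',
                   attribute_list => 'ALL')) p;

  -- Load the captured statements (plans and statistics) into the STS
  dbms_sqltune.load_sqlset(
    sqlset_name     => 'STS_9kt723m2u5vna',
    populate_cursor => l_cursor);
end;
/
```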


Step 4: List out SQL Tuning Set contents.

Now I can query the STS to verify it contains the expected data.
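A sketch of such a verification query (the STS name matches the earlier illustrative one):

```sql
select sql_id, plan_hash_value,
       round(elapsed_time/1e6, 2) elapsed_sec,
       buffer_gets, disk_reads, executions
from   dba_sqlset_statements
where  sqlset_name = 'STS_9kt723m2u5vna';
```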


Step 5: List out SQL Plan Baselines.

Though I have created and verified the STS, the Baseline has not yet been created.
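A query along these lines (a sketch; the text filter is illustrative) would return no rows at this point, and can be re-run after the load in Step 6 to confirm the baseline exists:

```sql
select sql_handle, plan_name, origin, enabled, accepted, fixed
from   dba_sql_plan_baselines
where  sql_text like '%EMPLOYEES%';
```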


Step 6: Load desired plan from STS as SQL Plan Baseline

Now I will load the known good plan that uses the newly created index into the Baseline.
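A sketch of the load via DBMS_SPM, filtering the STS down to the good plan (hash value 2172072736 from the earlier plan output):

```sql
set serveroutput on
declare
  l_plans pls_integer;
begin
  l_plans := dbms_spm.load_plans_from_sqlset(
               sqlset_name  => 'STS_9kt723m2u5vna',
               basic_filter => 'plan_hash_value = 2172072736');
  dbms_output.put_line(l_plans || ' plan(s) loaded');
end;
/
```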


Step 7: List out the Baselines again.

Now verify the Baseline contains the desired plan.


Step 8. Flush the current bad SQL Plan.

After loading the baseline, the current cursor must be flushed from the cache to make sure the new plan will be used on the next execution of sql_id 9kt723m2u5vna.
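One way to do this (a sketch; the address and hash value shown are illustrative — use the values your query returns) is DBMS_SHARED_POOL.PURGE:

```sql
-- Locate the cursor's address and hash value
select address, hash_value
from   v$sqlarea
where  sql_id = '9kt723m2u5vna';

-- Purge that cursor ('C' = cursor) from the shared pool
begin
  sys.dbms_shared_pool.purge('00000000859EA3C8, 3450457820', 'C');
end;
/
```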

Conclusion

As this blog post demonstrates, SQL Plan Management (SPM) allows database users to maintain stable yet optimal performance for a set of SQL statements, and baselines are a definite step in the right direction. Baselines can be captured from multiple sources, and SPM allows new plans to be used if they perform better than the baseline, which can improve overall application and system performance.

Categories: DBA Blogs

Log Buffer #357, A Carnival of the Vanities for DBAs

Fri, 2014-01-31 09:16

This Log Buffer captures the excitement flowing unbridled after the New Year frenzy, and the blogosphere is bubbling with new and novel technical features, tricks, and tips. Enjoy!

Oracle:


Jeffrey Kemp asks. Need to change the label of an item on-the-fly?

As a system grows, it accumulates technical debt – improvements and cleanup that your really ought to get around to doing.

Sometimes statements take longer than you’d like to complete. This can happen for a number of reasons including data changes, table growth, stale statistics and the occasional optimizer bug you haven’t encountered before.

The JSON-P API as described in JSR-353 is a limited API for working with JSON; but at the basic level it will do the job. It is possible to add a few utility methods that can make your code, in my eyes, prettier.

Oracle Database 11gR2 ( installation on Oracle Linux 6.4

SQL Server:

Finding “Active” Rows for the Previous Month (SQL Spackle)

How do you configure where your database backups are created? Do you have problems updating all of your backup jobs if the backup location needs to change?

Stairway to Advanced T-SQL Level 2: Writing Subqueries

Getting Starting with Change Data Capture (CDC) in SQL Server – Part 1

In this article, Gail Shaw looks at how you can identify common types of deadlock, the difference between a deadlock and severe blocking, and how to avoid and fix the most common deadlock types.


Percona XtraDB Cluster 5.6 GA release now available.

In MySQL 5.7.2 a new include file “–source include/“ made its appearance in most of the .test files in the MTR suite.

Daniel thought it would be nice to be able to query the FOSDEM schedule for the MySQL devroom with SQL.

Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics.

They say there are two hard things in software development: cache invalidation, naming things, and off-by-one errors.

Categories: DBA Blogs

Purge statspack snaps of source db on test db having different DBID?

Fri, 2014-01-31 09:07

As most DBAs are aware, clean deletion of old statspack snaps is very difficult up to RDBMS version 9i. Thanks to the statspack.purge procedure introduced by Oracle in version 10gR1, it is now possible to purge un-referenced data too. This blog post explains how to use the statspack.purge procedure on test/clone databases that use a different DataBase IDentifier (DBID) than the source database. Please remember the steps explained in this blog are not required when the source and test databases have the same DBID.

Normally the DBID gets changed on cloned databases in the following scenarios:

1. The most commonly used RMAN ‘duplicate database’ feature to create the test database.

2. The database utility ‘nid’ used to change the test database name and dbid.

3. The test database controlfile gets created using syntax based on the scripts available on text backup controlfile.

Consider a production/source database configured to generate statspack snaps once every 20 minutes, with a retention of 90 days. When this source database gets cloned using the above methods to create a test database, the clone inherits the same behavior. The test database now contains statspack snaps belonging to the source database as well as to the current database. Even if you modify the existing purge script to retain fewer snaps, it would be valid only for the snaps belonging to the current DBID. The snaps belonging to any other DBID would never get purged by this script, even though they are no longer valid for this test database.

1. Gather the DBID details from stats$snapshot table on the test database.

For example,

SQL> select distinct dbid from stats$snapshot;

1215068670 ==> This is the source database DBID
393689388 ==> This is the test database DBID

2. Gather the snaps range handled by the source database using the following queries.

For eg:

SQL> select min(snap_id) from stats$snapshot where dbid=1215068670;


SQL> select max(snap_id) from stats$snapshot where dbid=1215068670;


3. Gather the row count on various tables to verify the successful purge activity completion.

For eg:

SQL> select count(1) from stats$snapshot where dbid=1215068670;


SQL> select count(1) from stats$sqltext where last_snap_id < 93776;


SQL> select count(1) from STATS$STATSPACK_PARAMETER where dbid=1215068670;


SQL> select count(1) from STATS$UNDOSTAT where snap_id < 93776;


4. Copy the $ORACLE_HOME/rdbms/admin/sppurge.sql to your home directory and modify it accordingly.

i) Remove the dbid column from this select statement on the script.

select d.dbid dbid ==> Remove this column being selected.
, db_name
, i.instance_number inst_num
, i.instance_name inst_name
from v$database d,
v$instance i;

ii) Substitute the source database DBID on this location on the script.

:dbid := &dbid; ==> Change the ‘&dbid’ value as 1215068670
:inst_num := &inst_num;
:inst_name := ‘&inst_name’;
:db_name := ‘&db_name’;

iii) Change the variable “i_extended_purge” value as ‘true’ on the script.

:snapshots_purged := statspack.purge( i_begin_snap => :lo_snap
, i_end_snap => :hi_snap
, i_snap_range => true
, i_extended_purge => false ==> Change the value as true
, i_dbid => :dbid
, i_instance_number => :inst_num);

5. Execute this custom purge script on the test database and provide the snaps range when requested.

For eg:

Specify the Lo Snap Id and Hi Snap Id range to purge
Enter value for losnapid: 90920
Using 90920 for lower bound.

Enter value for hisnapid: 93775
Using 93775 for upper bound.

Deleting snapshots 90920 – 93775.
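For reference, the modified script ultimately issues a call equivalent to this sketch (using the example DBID and snap range from above; instance number 1 is assumed for a single-instance database):

```sql
variable snaps_purged number
begin
  -- Purge the source database's snaps, including extended (un-referenced) data
  :snaps_purged := statspack.purge(
      i_begin_snap      => 90920,
      i_end_snap        => 93775,
      i_snap_range      => true,
      i_extended_purge  => true,
      i_dbid            => 1215068670,
      i_instance_number => 1);
end;
/
print snaps_purged
```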

6. Now logged into the test database and execute the queries to verify the deletion happened.

SQL> select count(1) from stats$snapshot where dbid=1215068670;


SQL> select count(1) from stats$sqltext where last_snap_id < 93776;


SQL> select count(1) from STATS$STATSPACK_PARAMETER where dbid=1215068670;


SQL> select count(1) from STATS$UNDOSTAT where snap_id < 93776;


As you can see, this is a very simple action plan, though the custom purge script may require some more modification when used on a RAC database.

Categories: DBA Blogs

Do AWR Reports Show the Whole Picture?

Mon, 2014-01-27 09:59

The AWR report is a great source of aggregated information on the top activities happening in our databases. I use data collected in AWR quite often, and obviously the easiest way of getting the data out of AWR is by running the AWR report. In most cases that’s not an issue, but there are certain scenarios when it hides the information one is looking for, just because of how it’s designed.

If I’m trying to collect information about top queries by physical reads, I would normally look at the “SQL ordered by Reads” section and this is what I’d see:

I have the top SQLs by physical reads – just what I’ve been looking for (except that the AWR report covers only one of my RAC nodes).

But wait a second, what if there are queries that don’t use bind variables? This might be a problem, as each query would have its own SQL_ID and probably wouldn’t make it into the TOP 10 just because each of them is treated separately. Nothing to worry about – AWR also collects FORCE_MATCHING_SIGNATURE values (read this blog post to understand why I know they would help) and we can use them to identify and group “similar” statements, we just need a custom script to do that.

Here I use my custom script to report the TOP 20 SQL_IDs by physical reads in the last 7 days (and I’m reporting data from both RAC nodes in the same list) – you can see a few of the TOP SQLs are the same as reported in the AWR report, but because I’m reporting database-wide statistics instead of instance-wide as AWR does, I have other SQLs on the list too. I’ve also included 2 additional columns:

  • DIFF_PLANS – number of different PLAN_HASH_VALUE values reported for this SQL_ID, and if only one is found – it shows the actual PLAN_HASH_VALUE
  • DIFF_FMS - number of different FORCE_MATCHING_SIGNATURE values reported for this SQL_ID, and if only one is found – it shows the actual FORCE_MATCHING_SIGNATURE
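A sketch of such a custom query against the AWR history views (the 7-day window and TOP 20 limit match the description above; column choices are illustrative):

```sql
select * from (
  select   s.sql_id,
           sum(s.physical_reads_delta)                physical_reads,
           count(distinct s.plan_hash_value)          diff_plans,
           count(distinct s.force_matching_signature) diff_fms
  from     dba_hist_sqlstat s
           join dba_hist_snapshot n
             on  n.snap_id         = s.snap_id
             and n.dbid            = s.dbid
             and n.instance_number = s.instance_number
  where    n.begin_interval_time > sysdate - 7
  group by s.sql_id
  order by physical_reads desc
)
where rownum <= 20;
```

Swapping `s.sql_id` for `s.force_matching_signature` (or `s.plan_hash_value`) in the SELECT and GROUP BY gives the other aggregations discussed below.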

Now, I can adjust the custom script to aggregate the data by FORCE_MATCHING_SIGNATURE instead of SQL_ID. I’ll still keep the DIFF_PLANS column and will add a new one – DIFF_SQLID.

The situation is a little bit different now. Notice how the second row reports FORCE_MATCHING_SIGNATURE = 0; this typically shows PL/SQL blocks that execute the SQL statements and aggregate statistics from them, so we’re not interested in them. Otherwise, the original report by SQL_ID showed quite accurate data in this situation, and my suspicions regarding the misuse of literal values where binds should be used didn’t materialize. Could I be missing anything else? Yes: even the FORCE_MATCHING_SIGNATURE could be misleading in identification of TOP resource consumers. You can write two completely different SQLs (i.e. “select * from dual a” and “select * from dual b”) that will do the same thing and will use the same execution plan. Let’s query the top consumers by PLAN_HASH_VALUE to check this theory!

Custom Script - planI’ve highlighted the third row as the same PLAN_HASH_VALUE is reported for 20 different SQL_IDs, which allowed it to take the third place in the TOP list by physical reads (actually it’s the second place as PLAN_HASH_VALUE=0 is ignorable). The next query expands the third row:

And here are all the SQL statements:

What I have here is 20 different views generated by Oracle Discoverer that query the database using exactly the same execution plan. A closer look revealed the views included hardcoded query parameters (date intervals for reporting), but in the end, this was the same query! It’s the TOP 2 query by physical reads in the database, and if I tune it – all 20 Discoverer views will benefit.

I think one of the drawbacks of AWR reports is that they are not able to identify such situations; it would be great if the user could choose the column by which the aggregation is done. In the situation I described, I was able to identify one of the top queries by physical reads only when I aggregated data by PLAN_HASH_VALUE.

Categories: DBA Blogs

Log Buffer #356, A Carnival of the Vanities for DBAs

Fri, 2014-01-24 07:38

This Log Buffer edition drops right from the clouds to check what bloggers are doing with their Oracles, SQL Servers, and MySQLs. Enjoy!

Oracle:


The Oracle E-Business Suite provides a large number of diagnostic and monitoring solutions.

A free e-book: Consolidate Your Applications on Oracle WebLogic Server.

How to read the values of bind variables for currently executing statements real time monitoring kicks in.

Problems when doing a V2V or P2V of a windows server to Oracle VM.

Oracle Social Relationship Management: Access the Specialization Exams Today.

SQL Server:

Skipping Columns (& other tricks) Using BCP Format Files (SQL Spackle) .

Search multiple SQL Server Error Logs at the same time.

An Overview of Project Conversion – Level 14 of the Stairway to Integration Services.

Free e-book: Performance Tuning with SQL Server Dynamic Management Views.

Calculating the Median Value within a Partitioned Set Using T-SQL.

MySQL:


Percona Server: Improve Scalability with Percona Thread Pool.

MySQL and PostgreSQL JSON functions: Do they differ much?

Select into outfile and load data infile are not complementary by default.

MySQL Workbench 6.1.1 BETA has been released.

Noel Herrick have covered some strange default behavior around nulls in MySQL.

Categories: DBA Blogs

Merge Replication Identity Range Management,an identity disorder? Part 1

Fri, 2014-01-24 07:32

No no, we are not opening a new psychology practice at Pythian; unless we consider something like picking an optimal execution plan a form of schizophrenia!

Those who work with merge replication know that replicating articles with identity columns needs some planning, especially if the identity column is part of the primary key or if there are many subscribers. Merge replication provides automatic identity range management across replication nodes that can offset the overhead of managing them manually. This blog will briefly shed some light on identity range management and how it works. In part 2, we will explore a specific case where batch inserts can affect identity range management.

Why Identity Columns ?

Identity columns are very popular (and even touted by DBAs) for use as primary keys instead of GUIDs, for multiple reasons, including:

- They are smaller in size; GUIDs are four times larger than INT.

- Indexes over GUIDs tend to be larger. If the GUID is the clustered index key, it will affect every non-clustered index as well. Check out these two excellent blogs about GUIDs as primary keys and as clustered indexes.

- Because they are random, index fragmentation is common on indexes over GUIDs. You can reduce this by using the NEWSEQUENTIALID() function, but it has a security disadvantage.

Here’s a very good blog about GUIDs comparing them to INT/BIGINT

How Merge Replication Manages Identity Ranges ?

So, you have a few articles with identity columns that you want to replicate between a publisher and a few subscriptions in a merge replication setup, so that the subscribers will also be able to update data back. Each “node” in the replication topology should be assigned a range which the user processes will use, and which is then replicated to the other nodes. It is important, especially if the identity column is part of the primary key, that no duplicates happen.

Let’s say we want to publish one article with an identity column as the primary key to two subscribers. If we do not assign correct ranges to each “node” here, then it’s possible that user processes will create the same identity values. Once the merge agent tries to synchronize, it will insert duplicate values into the tables on the other nodes, and will fail if the identity column is part of a primary key.

If we designate an isolated range of identity values to each “node”, then the user application will write values that cannot be used elsewhere. This is simply what automatic identity range management does!

- On the publisher, the identity pool is used to assign ranges when adding new articles. There are two kinds of pools (ranges):

1) A pool assigned to the publisher and subscribers that will be used directly to insert data on those nodes.

This is controlled through the @identity_range parameter of sp_addmergearticle.

2) If any of the subscribers are SERVER subscriptions, which can possibly republish data, they are assigned a pool (range) to serve their own subscribers. Usually this pool is larger than the pool in point one because it can serve multiple subscribers. This is controlled through the @pub_identity_range parameter of sp_addmergearticle.


Example: If we configure the publisher to have the range 1-1000, subscriber 1 to have 1100-2100, and subscriber 2 to have 2200-3200, then merge replication should NOT insert any duplicate values at any node. Kindly note that merge agent processes do NOT increase the identity values when synchronizing; they just insert the plain values, like the case when we use “set identity_insert ON”!

How the Ranges are Enforced

Through the usage of constraints on the identity column. Starting with SQL Server 2005, each node is assigned two ranges, primary and secondary. Both primary and secondary ranges are equal in size, and once the primary range is exhausted, the secondary one is used and the merge agent assigns a new range to the subscriber (which becomes the new secondary range); this happens when automatic range management is chosen as the article is added (please see the @identityrangemanagementoption parameter of sp_addmergearticle).

If you need more documentation, please refer to the following links:

Now some hands on!

A Simple Tutorial

Let us demonstrate a very simple merge replication topology with a publisher and two subscriptions. I am doing this on a single instance on my machine. The instance has to have SQL Server Agent to run jobs, so a Standard, Developer, or Enterprise edition is needed.

Let’s create three user databases and one table under the database that will be published:

Use Master
Create database Pub;
Create database sub1;
Create database sub2;
create table pub.dbo.tbl_pub (col1 int identity(1,1), col2 datetime)

Enable distributor feature and create distribution database:

use master
Declare @instance nvarchar(1000)

Set @instance = @@servername;

exec sp_adddistributor @distributor = @instance, @password = N''
exec sp_adddistributiondb @database = N'distribution', @log_file_size = 2, @min_distretention = 0, @max_distretention = 72, @history_retention = 48, @security_mode = 1
exec sp_adddistpublisher @publisher = @instance , @distribution_db = N'distribution', @security_mode = 1,@trusted = N'false', @thirdparty_flag = 0, @publisher_type = N'MSSQLSERVER'

Create publication and add article:

use [Pub]
exec sp_replicationdboption @dbname = N'Pub', @optname = N'merge publish', @value = N'true'
-- Adding the merge publication
use [Pub]
exec sp_addmergepublication @publication = N'Pub1',
@sync_mode = N'native', @retention = 14, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'false', @publication_compatibility_level = N'100RTM', @replicate_ddl = 1
exec sp_addpublication_snapshot @publication = N'Pub1', @frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0,@publisher_security_mode = 1

use [Pub]
exec sp_addmergearticle @publication = N'Pub1',
@article = N'tbl_pub', @source_owner = N'dbo', @source_object = N'tbl_pub', @type = N'table', @pre_creation_cmd = N'drop', @schema_option = 0x000000010C034FD1, @identityrangemanagementoption = N'auto', @pub_identity_range = 1000, @identity_range = 100, @threshold = 80, @destination_owner = N'dbo', @force_reinit_subscription = 1, @column_tracking = N'false',  @allow_interactive_resolver = N'false'

The important parameters here are @pub_identity_range = 1000 and @identity_range = 100.
@identity_range is what is allocated to the publisher and the subscribers (i.e. this is the range that controls the values inserted on the publisher and on the subscribers).

@pub_identity_range: has a role when the subscribers also republish data, because those republishing subscribers will synchronize data with their own subscribers and eventually with the original publisher. It’s not a common practice but can happen; we won’t talk about it here.

Let’s create the subscriptions and run the snapshot:

use [Pub]
exec sp_addmergesubscription @publication = N'Pub1', @subscriber = N'MYINSTANCE',
@subscriber_db = N'sub1', @subscription_type = N'Push', @sync_type = N'Automatic',
@subscriber_type = N'Global', @subscription_priority = 75, @use_interactive_resolver = N'False'

Declare @instance nvarchar(1000)
Set @instance = @@servername;

exec sp_addmergepushsubscription_agent @publication = N'Pub1', @subscriber =@instance,
@subscriber_db = N'sub1', @subscriber_security_mode = 1, @publisher_security_mode = 1;

exec sp_addmergesubscription @publication = N'Pub1', @subscriber = N'MYINSTANCE',
@subscriber_db = N'sub2', @subscription_type = N'Push', @sync_type = N'Automatic',
@subscriber_type = N'Global', @subscription_priority = 75, @use_interactive_resolver = N'False'

Declare @instance nvarchar(1000)
Set @instance = @@servername;

exec sp_addmergepushsubscription_agent @publication = N'Pub1', @subscriber =@instance,
@subscriber_db = N'sub2', @subscriber_security_mode = 1, @publisher_security_mode = 1;

-- Start snapshot agent

Exec pub..sp_startpublication_snapshot 'pub1'
Waitfor delay '00:00:15'

Get snapshot agent output:

Exec sp_MSenum_replication_agents @type = 1

dbname        name                   status  publisher   publisher_db  publication  start_time             time                   duration  comments
distribution  MYINSTANCE-Pub-Pub1-2  2       MYINSTANCE  Pub           Pub1         20140118 18:05:30.750  20140118 18:05:40.523  10        [100%] A snapshot of 1 article(s) was generated.

Start Merge agent for each subscription to push initial snapshot:
-- job_name = <merge agent job name for each subscription>

Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-3';
Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-4';

Waitfor delay '00:00:15'
-- Get output of the merge agent
use [master]
exec [distribution].sys.sp_replmonitorhelpmergesession @agent_name = N'MYINSTANCE-Pub-Pub1-MYINSTANCE-3', @hours = -1, @session_type = 1
Session_id  Status  StartTime           EndTime             Duration  UploadedCommands  DownloadedCommands  ErrorMessages  PercentageDone  LastMessage
3           2       1/18/2014 18:14:42  1/18/2014 18:14:54  13        0                 0                   0              100             Applied the snapshot and merged 0 data change(s) (0 insert(s), 0 update(s), 0 delete(s), 0 conflict(s)).

This is where merge replication metadata tracks the history of allocated ranges:

order by range_begin
publication  article  subscriber  subscriber_db  range_begin  range_end  is_pub_range  next_range_begin  next_range_end
Pub1         tbl_pub  MYINSTANCE  Pub            1            101        0             101               201
Pub1         tbl_pub  MYINSTANCE  sub2           201          301        0             301               401
Pub1         tbl_pub  MYINSTANCE  sub1           401          501        0             501               601
Pub1         tbl_pub  MYINSTANCE  sub2           601          1601       1             1601              2601
Pub1         tbl_pub  MYINSTANCE  sub1           2601         3601       1             3601              4601

is_pub_range=1 is reserved for the subscribers in case they will re-publish the data.

Let’s check the schema on each table of the replication nodes:

select 'Publisher' [Node], name, definition from pub.sys.check_constraints where type = 'C' and parent_object_id =
(select object_id from pub.sys.objects where name ='tbl_pub')
union all
select 'Sub1', name, definition from Sub1.sys.check_constraints where type = 'C' and parent_object_id =
(select object_id from Sub1.sys.objects where name ='tbl_pub')
union all
select 'Sub2', name, definition from Sub2.sys.check_constraints where type = 'C' and parent_object_id =
(select object_id from Sub2.sys.objects where name ='tbl_pub')
Node       name                                                      definition
Publisher  repl_identity_range_21A0E50F_D0F6_4E43_9FE1_349C1210247F  ([col1]>=(1) AND [col1]<=(101) OR [col1]>(101) AND [col1]<=(201))
Sub1       repl_identity_range_21A0E50F_D0F6_4E43_9FE1_349C1210247F  ([col1]>(401) AND [col1]<=(501) OR [col1]>(501) AND [col1]<=(601))
Sub2       repl_identity_range_21A0E50F_D0F6_4E43_9FE1_349C1210247F  ([col1]>(201) AND [col1]<=(301) OR [col1]>(301) AND [col1]<=(401))

Here we can see that each node has a primary and a secondary range, enforced through the constraints.

Let’s add some data!

Insert into pub.dbo.tbl_pub (col2) select getdate();
Go 25
Insert into sub1.dbo.tbl_pub (col2) select getdate();
Go 25
Insert into sub2.dbo.tbl_pub (col2) select getdate();
Go 25

Start sync:

Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-3';
Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-4';

Count rows in each table, should be 75 (3 * 25 rows)
select (select count(*) from pub.dbo.tbl_pub),(select count(*) from sub1.dbo.tbl_pub), (select count(*) from sub2.dbo.tbl_pub)

75 — 75 — 75

Now, let’s see what will happen when we insert more rows than the range allows. Let’s start with the publisher. We had inserted 25 rows, and the primary + secondary ranges together cover 200 values.

Insert into pub.dbo.tbl_pub (col2) select getdate();
Go 177

Let’s see the new range:

publication  article  subscriber  subscriber_db  range_begin  range_end  is_pub_range  next_range_begin  next_range_end
Pub1         tbl_pub  MYINSTANCE  Pub            4601         4701       0             4701              4801

How? Well, the highest range boundary previously allocated was 4601 (the end of the last republishing range).

How “new range” was allocated

Under each table, there are 3 triggers, corresponding to INSERT, UPDATE & DELETE. The insert trigger ***on the publisher*** has an extra part to check the identity range and use the secondary range if the primary range is exhausted. If both primary and secondary ranges are exhausted, a new range is allocated.

Here’s what the trigger code looks like:

ALTER trigger [dbo].[MSmerge_ins_21A0E50FD0F64E439FE1349C1210247F] on [Pub].[dbo].[tbl_pub] for insert   as
    declare @is_mergeagent bit, @at_publisher bit, @retcode smallint

    set rowcount 0
    set transaction isolation level read committed

        select @is_mergeagent = convert(bit, sessionproperty('replication_agent'))
        select @at_publisher = 0
    if (select trigger_nestlevel()) = 1 and @is_mergeagent = 1
    if is_member('db_owner') = 1
        -- select the range values from the MSmerge_identity_range table
        -- this can be hardcoded if performance is a problem
        declare @range_begin numeric(38,0)
        declare @range_end numeric(38,0)
        declare @next_range_begin numeric(38,0)
        declare @next_range_end numeric(38,0)

        select @range_begin = range_begin,
               @range_end = range_end,
               @next_range_begin = next_range_begin,
               @next_range_end = next_range_end
            from dbo.MSmerge_identity_range where artid='21A0E50F-D0F6-4E43-9FE1-349C1210247F' and subid='95208516-4B98-451D-B264-C1F27B20449A' and is_pub_range=0

        if @range_begin is not null and @range_end is not NULL and @next_range_begin is not null and @next_range_end is not NULL
            if IDENT_CURRENT('[dbo].[tbl_pub]') = @range_end
                DBCC CHECKIDENT ('[dbo].[tbl_pub]', RESEED, @next_range_begin) with no_infomsgs
            else if IDENT_CURRENT('[dbo].[tbl_pub]') >= @next_range_end
                exec sys.sp_MSrefresh_publisher_idrange '[dbo].[tbl_pub]', '95208516-4B98-451D-B264-C1F27B20449A', '21A0E50F-D0F6-4E43-9FE1-349C1210247F', 2, 1
                if @@error<>0 or @retcode<>0
                    goto FAILURE


The important part is this:

if @range_begin is not null and @range_end is not NULL and @next_range_begin is not null and @next_range_end is not NULL
            if IDENT_CURRENT('[dbo].[tbl_pub]') = @range_end
                DBCC CHECKIDENT ('[dbo].[tbl_pub]', RESEED, @next_range_begin) with no_infomsgs
            else if IDENT_CURRENT('[dbo].[tbl_pub]') >= @next_range_end
                exec sys.sp_MSrefresh_publisher_idrange '[dbo].[tbl_pub]', '95208516-4B98-451D-B264-C1F27B20449A', '21A0E50F-D0F6-4E43-9FE1-349C1210247F', 2, 1

A very important note: only members of db_owner will be able to allocate a new range. This is a limitation because, in many instances, you want to limit the permissions of your connected users. If the user inserting the data is not a member of db_owner, then the merge agent has to run, OR you need to manually allocate a new range using sp_adjustpublisheridentityrange.
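A sketch of that manual allocation for our example article (run against the publication database; parameter names per Books Online):

```sql
-- Manually refresh the publisher's identity range without
-- waiting for the merge agent to run
EXEC pub.sys.sp_adjustpublisheridentityrange
     @table_name  = N'tbl_pub',
     @table_owner = N'dbo';
```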

What about subscribers? Should we expect a new range to be generated if we insert more rows than the allowed range? Let’s see:

Insert into sub2.dbo.tbl_pub (col2) select getdate();
Go 175

use sub2
Dbcc checkident (tbl_pub , noreseed)

--Checking identity information: current identity value '401', current column value '426'.

--We just reached the limit of the secondary range, any new insert will violate the constraint unless a new range is allocated

Insert into sub2.dbo.tbl_pub (col2) select getdate();

Msg 548, Level 16, State 2, Line 1
The insert failed. It conflicted with an identity range check constraint in database 'sub2', replicated table 'dbo.tbl_pub', column 'col1'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.

Why wasn’t this handled like the publisher? Simply put, it’s by design. New ranges are allocated to subscribers when they sync with the publisher. This is even reflected in the trigger code: the following part is missing from the trigger on the subscriber side:

exec sys.sp_MSrefresh_publisher_idrange '[dbo].[tbl_pub]', '95208516-4B98-451D-B264-C1F27B20449A', '21A0E50F-D0F6-4E43-9FE1-349C1210247F', 2, 1

Some people complained about this, but, no dice!


Here we learned a few points:

- Merge replication can handle identity ranges automatically.

- Starting with SQL Server 2005, there are two range pools assigned to each publisher and subscriber.

- A republishing subscriber will have a “republishing” range to use for its own subscribers.

- The size of each pool can be configured.

Some recommendations here as well:

1- Choose the datatype of your identity column carefully, because you do not want the subscribers to run out of range. SMALLINT, for example, allows -2^15 (-32,768) to 2^15-1 (32,767), which is quite low for any production system, while BIGINT is generous with a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807), though it consumes more storage.
2- When adding articles, choose a range size that can cover your operations without the need to frequently assign new ones. If you choose a datatype like BIGINT, you can be a bit generous as a safety factor.
3- Plan subscriber syncs before the subscribers consume all of their range values, so that new ranges are assigned in time. You can monitor range consumption and kick off a sync manually.
4- On the publisher, if the processes inserting data are not members of db_owner, you will need to run the agent or manually run sp_adjustpublisheridentityrange.
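To put recommendation 1 in perspective, here is a back-of-the-envelope calculation (plain Python, purely illustrative; the 10,000-value range size is an assumption, not a replication default) of how many identity ranges of a given size each datatype can supply before a subscriber runs out:

```python
# Illustrative arithmetic only: how many whole identity ranges of a given
# size fit between a starting seed and each integer datatype's upper bound.
RANGES = {
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT":      (-2**31, 2**31 - 1),
    "BIGINT":   (-2**63, 2**63 - 1),
}

def ranges_available(datatype, range_size, seed=1):
    """Whole ranges of range_size values between seed and the upper bound."""
    lo, hi = RANGES[datatype]
    return (hi - seed + 1) // range_size

for dt in RANGES:
    print(dt, ranges_available(dt, 10000))
# SMALLINT can hand out only 3 ranges of 10,000 values before it is exhausted.
```

The gap is striking: a SMALLINT identity column survives only a handful of range allocations, while BIGINT effectively never runs out.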

What about you folks? How do you manage identities on Merge replication? Did you face any problems?

Happy reading!

Categories: DBA Blogs

Script to Collect Database Information Quickly

Thu, 2014-01-23 15:38
As a DBA, there are occasions where we are required to collect details from a database for further analysis: for example, the status of the database (file size, free space, state, and current owner). This kind of information is required, and very useful, for auditing purposes, in addition to tracking the database and database file sizes over time. I had a script which does this job for me exactly the way I want it; however, I had to run it for each database separately.

One fine day, while answering some questions on a forum, I found a script by Dan Guzman which retrieved most of the information I needed, and did so for all the databases at once. I have adapted Dan Guzman's script for my use and modified it by adding some more details.

Please review the script below, and let me know if you like it or dislike it. I will try to make further improvements on this script.
-- Script Originally Written By: Dan Guzman | 
-- Modified by: Hemantgiri S. Goswami 
-- Reference: 
DECLARE
    @SqlStatement NVARCHAR(MAX)
    ,@DatabaseName SYSNAME;

IF OBJECT_ID(N'tempdb..#DatabaseSpace') IS NOT NULL
    DROP TABLE #DatabaseSpace;

-- One row per database file, collected across all online databases
CREATE TABLE #DatabaseSpace
(
    ServerName      SYSNAME,
    DBID            INT,
    Database_Name   SYSNAME,
    Recovery_Model  VARCHAR(15),
    DBOwner         VARCHAR(25),
    Logical_Name    SYSNAME,
    File_Path       VARCHAR(260),
    File_Size_GB    DECIMAL(12, 2),
    Space_Used_GB   DECIMAL(12, 2),
    Free_Space_GB   DECIMAL(12, 2),
    Growth_Option   VARCHAR(10),
    Maximum_Size    INT,
    AutoGrowth_MB   INT,
    DB_Status       VARCHAR(100)
);

DECLARE DatabaseList CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE state = 0;

OPEN DatabaseList;
WHILE 1 = 1
BEGIN
    FETCH NEXT FROM DatabaseList INTO @DatabaseName;
    IF @@FETCH_STATUS = -1 BREAK;
    SET @SqlStatement = N'USE '
        + QUOTENAME(@DatabaseName)
        + CHAR(13) + CHAR(10)
        + N'INSERT INTO #DatabaseSpace
            SELECT
                [ServerName]        = @@SERVERNAME
                ,[DBID]             = SD.dbid
                ,[Database_Name]    = DB_NAME()
                ,[Recovery_Model]   = D.recovery_model_desc
                ,[DBOwner]          = SUSER_SNAME(SD.sid)
                ,[Logical_Name]     = F.name
                ,[File_Path]        = SF.filename
                ,[File_Size_GB]     = (CONVERT(DECIMAL(12,2), ROUND(F.size/128.000, 2))/1024)
                ,[Space_Used_GB]    = (CONVERT(DECIMAL(12,2), ROUND(FILEPROPERTY(F.name, ''SpaceUsed'')/128.000, 2))/1024)
                ,[Free_Space_GB]    = (CONVERT(DECIMAL(12,2), ROUND((F.size - FILEPROPERTY(F.name, ''SpaceUsed''))/128.000, 2))/1024)
                ,[Growth_Option]    = CASE SF.status & 0x100000
                                          WHEN 1048576 THEN ''Percentage''
                                          WHEN 0       THEN ''MB''
                                      END
                ,[Maximum_Size]     = SF.maxsize
                ,[AutoGrowth_MB]    = (SF.growth*8/1024)
                ,[DB_Status]        = CASE SD.status
                                          WHEN 0 THEN ''Normal''
                                          WHEN 1 THEN ''autoclose''
                                          WHEN 2 THEN ''2 not sure''
                                          WHEN 4 THEN ''select into/bulkcopy''
                                          WHEN 8 THEN ''trunc. log on chkpt''
                                          WHEN 16 THEN ''torn page detection''
                                          WHEN 20 THEN ''Normal''
                                          WHEN 24 THEN ''Normal''
                                          WHEN 32 THEN ''loading''
                                          WHEN 64 THEN ''pre recovery''
                                          WHEN 128 THEN ''recovering''
                                          WHEN 256 THEN ''not recovered''
                                          WHEN 512 THEN ''offline''
                                          WHEN 1024 THEN ''read only''
                                          WHEN 2048 THEN ''dbo use only''
                                          WHEN 4096 THEN ''single user''
                                          WHEN 8192 THEN ''8192 not sure''
                                          WHEN 16384 THEN ''16384 not sure''
                                          WHEN 32768 THEN ''emergency mode''
                                          WHEN 65536 THEN ''online''
                                          WHEN 131072 THEN ''131072 not sure''
                                          WHEN 262144 THEN ''262144 not sure''
                                          WHEN 524288 THEN ''524288 not sure''
                                          WHEN 1048576 THEN ''1048576 not sure''
                                          WHEN 2097152 THEN ''2097152 not sure''
                                          WHEN 4194304 THEN ''autoshrink''
                                          WHEN 1073741824 THEN ''cleanly shutdown''
                                      END
            FROM sys.database_files F
                JOIN master.dbo.sysaltfiles SF
                    ON F.name = SF.name
                   AND SF.dbid = DB_ID()
                JOIN master.dbo.sysdatabases SD
                    ON SD.dbid = SF.dbid
                JOIN sys.databases D
                    ON D.database_id = SD.dbid
            WHERE DATABASEPROPERTYEX(SD.name, ''Updateability'') <> ''OFFLINE''
            ORDER BY [File_Size_GB] DESC';
    EXECUTE (@SqlStatement);
END

CLOSE DatabaseList;
DEALLOCATE DatabaseList;

SELECT * FROM #DatabaseSpace;

DROP TABLE #DatabaseSpace;

-- Hemantgiri S. Goswami |Linkedin | Twitter

Categories: DBA Blogs

Log Buffer: Marching from 2013 to 2014

Thu, 2014-01-23 15:15

Pythian’s cherished Log Buffer carnival is celebrating the year 2013, while looking passionately towards yet another promising year for 2014. So much digital water has flown under the bridges in 2013, and Log Buffer covered it all from the eyes of bloggers across database technologies like Oracle, SQL Server, MySQL, and big data.

Almost 50 Log Buffer episodes in 2013 threw light on new software releases, cool features, obscure technical corners, conference stories, database news, quirks, nifty tricks, trivia, and much more.

Log Buffer remained focused on every facet of database technology, and while doing so, noticed that there were a few salient technologies which interested bloggers and their readers a lot. Some of these technologies and areas were old and some new.

One evergreen area of interest for database bloggers is performance tuning. This is the one area of databases where interest carried over from the past year, remained hot throughout this one, and will be the center of discussion next year.

The hot topics among the database bloggers this year included big data, cloud computing, engineered systems, and the new version of the Oracle database, 12c. Regardless of the database vendor, bloggers loved to explore, discuss, share, and present these brave new bleeding-edge technologies, which are still evolving.

One prime takeaway from the Log Buffer carnival is the unique ability of the database bloggers to set the tone of the technology. Database vendors take these bloggers very seriously and improve their offerings based on the blog posts and resulting discussions. There is so much to learn, and database blog posts are a quick and efficient way to stay on top of the ever-changing and ever-evolving data oriented technologies.

Log Buffer is all geared up to keep encompassing all things data related in 2014. Stay tuned.

Categories: DBA Blogs

2013 in Review — SQL Server

Mon, 2014-01-20 13:24

It's that time of year, when I take some time away from work, hide out, and reflect on the past year.

2013 was a big year for Microsoft, particularly SQL Server, Windows Azure, and HDInsight. While I’d love to recap every Microsoft moment that excited me, there are too many to count, so I’ll just list the highlights and leave you with some things to look forward to in 2014.


HDInsight went from a selective beta to a general release. Microsoft's Hadoop partner, Hortonworks, released version 2.0 of its platform. Although Windows Azure has not upgraded to HDP 2.0 yet, I'm told it will soon. I think there is a lot of potential in using Hadoop in conjunction with SQL Server, and it's something I'll be experimenting with.

SQL Server 2012

Microsoft released several patches for SQL Server 2012. I’ve seen a lot of organizations begin migrating to the platform, which is excellent. There is a lot of interest in AlwaysOn Availability Groups, which is one of my favorite new(er) technologies in SQL Server.

SQL Server 2014

Microsoft announced SQL Server 2014, and then proceeded to release not one but two community technology previews (CTP1 and CTP2) to the public. A full release (RTM) is expected sometime in 2014. I'm very excited about SQL Server 2014 and its full list of features. My favorites are the expanded replicas for AlwaysOn Availability Groups, the change in how SQL Server responds when a replica is offline, and in-memory tables. The SQL Server 2014 CTP releases were placed in the image gallery, making it easy to explore the new product in minutes. There are also some new features slated for the full product that will solve a lot of technical challenges for DBAs.

Windows Azure upgrades

Microsoft continued to enhance Windows Azure, and also announced support for Oracle databases. I’m a regular Windows Azure user and a big fan of the platform. Gartner predicts that cloud adoption will continue to grow and accelerate, so it’s time to accept the shift. If you haven’t checked out Windows Azure, I recommend you spend some time on the platform. It’s a strong player in the market and will only get better in the future.

These are a few of my highlights. What are yours?

Categories: DBA Blogs