
Yann Neuhaus

All blog entries from http://www.dbi-services.com/

Using Windows 2012 R2 & dynamic witness feature with minimal configurations

Mon, 2015-02-23 15:30

Have you ever seen the following message while trying to validate your cluster configuration with your availability groups or FCIs on Windows Server 2012 R2?


blog_32_-_0_-_cluster_validation


 

Microsoft recommends adding a witness even if you have only two cluster members with dynamic weights. This recommendation may make sense given the new witness capabilities. Indeed, Windows Server 2012 R2 improves quorum resiliency with the new dynamic witness behavior. However, we need to be careful with it, and I would like to say at this point that I'm reluctant to follow this recommendation with a minimal cluster configuration of only 2 nodes. In my experience, it is very common to implement SQL Server AlwaysOn availability groups or FCI architectures with only two cluster nodes at customer sites. Let's talk about the reason in this blog post.

 

First of all, let’s demonstrate why I don’t advice my customers to implement a witness by following the Microsoft recommendation. In my case it consists in adding a file share witness on my existing lab environment with two cluster nodes that use the dynamic weight behavior:


blog_32_-_1_-_cluster_2_nodes_configuration_nodeweight


 

Now let’s introduce a file share witness (\DC2WINCLUST-01) as follows:


blog_32_-_2_-_adding_FSW

 

We may notice after introducing the FSW that the node weight configuration has changed:

 

blog_32_-_3_-_cluster_new_configuration

 

 

blog_32_-_4_-_cluster_fsw_config

 

The total number of votes equals 3 here because we are in the situation where we have an even number of cluster members plus the witness. As a reminder, we are supposed to benefit from the dynamic witness feature, according to the Microsoft documentation here.

 

In Windows Server 2012 R2, if the cluster is configured to use dynamic quorum (the default), the witness vote is also dynamically adjusted based on the number of voting nodes in current cluster membership. If there is an odd number of votes, the quorum witness does not have a vote. If there is an even number of votes, the quorum witness has a vote.

 

The quorum witness vote is also dynamically adjusted based on the state of the witness resource. If the witness resource is offline or failed, the cluster sets the witness vote to "0."

 

The last sentence drew my attention, so let's now introduce a failure of the FSW. In my case I will just turn off the share used by my WSFC as follows:

 

blog_32_-_5_-_disable_fileshare


 

As expected, the file share witness state has been changed from online to failed by the resource control manager:

 

blog_32_-_6_-_fileshare_witness_failed

 

At this point, according to the Microsoft documentation, we may expect the WitnessDynamicWeight property to be changed by the cluster, but to my surprise this was not the case:

 

blog_32_-_62_-_fileshare_witness_configuration


 

In addition, after taking a look at the cluster log, I noticed this sample among the log records:

000014d4.000026a8::2015/02/20-12:45:43.594 ERR   [RCM] Arbitrating resource 'File Share Witness' returned error 67
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] Res File Share Witness: OnlineCallIssued -> ProcessingFailure( StateUnknown )
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] TransitionToState(File Share Witness) OnlineCallIssued-->ProcessingFailure.
000014d4.00001ea0::2015/02/20-12:45:43.594 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message
000014d4.000026a8::2015/02/20-12:45:43.594 ERR   [RCM] rcm::RcmResource::HandleFailure: (File Share Witness)
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [QUORUM] Node 1: PostRelease for ac9e0522-c273-4da8-99f5-3800637db4f4
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [QUORUM] Node 1: quorum is not owned by anyone
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] resource File Share Witness: failure count: 0, restartAction: 0 persistentState: 1.
000014d4.00001e20::2015/02/20-12:45:43.594 INFO [GUM] Node 1: executing request locally, gumId:281, my action: qm/set-node-weight, # of updates: 1
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] numDependents is zero, auto-returning true
000014d4.00001e20::2015/02/20-12:45:43.594 WARN [QUORUM] Node 1: weight adjustment not performed. Cannot go below weight count 3 in a hybrid configuration with 2+ nodes

 

The last line (the WARN [QUORUM] entry) is the most important. I guess that "hybrid configuration" here means that my environment includes 2 cluster nodes and one witness (given its type). The interesting thing to notice is a potential limitation of the dynamic witness behavior: the weight adjustment cannot go below a weight count of 3 in this configuration. Unfortunately, I didn't find any documentation from Microsoft about this message. Is it a bug, a missing entry in the documentation, or have I overlooked something concerning the cluster behavior? At this point I can't tell, and I hope to get a response from Microsoft soon. The only thing I can claim at this point is that if I lose a cluster node, the cluster availability will be compromised. This issue is not specific to my lab environment: I have faced the same behavior several times at customer sites.

Let's demonstrate this by shutting down one of my cluster nodes. After a couple of seconds, the connection to my Windows failover cluster is lost, and here is what I found in the Windows event log:


blog_32_-_7_-_quorum_lost

 

As I said earlier, with a minimal configuration of two cluster nodes, I always recommend my customers to skip this warning. After all, having only two cluster members with the dynamic quorum behavior is sufficient to get good quorum resiliency. Indeed, according to the Microsoft documentation, to allow the system to recalculate the quorum correctly, a simultaneous failure of a majority of voting members must not occur (in other words, the failure of cluster members must be sequential), and with two cluster nodes we can only lose one node at a time anyway.

What about more complex environments? Let's say an FCI with 4 nodes (two cluster nodes in each datacenter) and a file share witness in the first datacenter. In contrast, in this case, if the file share witness fails, the cluster will correctly adjust the overall node weight configuration, both on the cluster nodes and on the witness. This is completely consistent with the message found above: "Cannot go below weight count 3".


blog_32_-_8_-_quorum_adjustement_with_4_nodes



 

The bottom line is that the dynamic witness feature is very useful, but you have to pay attention to its behavior with minimal configurations based on only two cluster nodes, which may yield unexpected results in some cases.

 

Happy cluster configuration!




12c online statistics gathering and startup restrict

Sat, 2015-02-21 16:15

I've written about 12c online statistics gathering in a UKOUG OracleScene article. My opinion is clear about it: you still need to gather stale stats afterwards or you have missing, stale and inconsistent object statistics. This post is about cases where online statistics gathering does not occur (and that are not documented) - which is another reason why we can't rely on it.

The case where it works

You can check the article to see how online statistics gathering works (or come to our 12c new features workshop where we cover and practice all 12c optimizer new features).
In order to do something else here I'm showing how to trace it by activating the 0x10000 trace flag for dbms_stats:

SQL> connect demo/demo@//localhost/PDB1
Connected.
SQL> set serveroutput on
SQL> exec dbms_stats.set_global_prefs('TRACE',1+65536);
PL/SQL procedure successfully completed.

SQL> drop table DEMO;
Table dropped.

SQL> create table DEMO ( n number ) pctfree 99;
Table created.

SQL> insert /*+ append */ into DEMO select rownum from dual connect by 1000>=level;
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 1         1         1         to_char(count("N"))                        100
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 2         1         1         substrb(dump(min("N"),16,0,64),1,240)      9
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 3         1         1         substrb(dump(max("N"),16,0,64),1,240)      17
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 1         1         1         to_char(count("N"))                        100
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 2         1         1         substrb(dump(min("N"),16,0,64),1,240)      9
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 3         1         1         substrb(dump(max("N"),16,0,64),1,240)      17
DBMS_STATS: postprocess online optimizer stats gathering for DEMO.DEMO: save statis
DBMS_STATS: RAWIDX    SELMAPPOS RES                            NNV       NDV       
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 1         1          1000      1000      0         2891      1000
DBMS_STATS: RAWIDX    SELMAPPOS RES                            NNV       NDV       
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 2         2         Typ=2 Len=2: c1,2              NULL      NULL      
DBMS_STATS: RAWIDX    SELMAPPOS RES                            NNV       NDV       
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 3         3         Typ=2 Len=2: c2,b              NULL      NULL      
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 1         1         1         to_char(count("N"))                        100
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 2         1         1         substrb(dump(min("N"),16,0,64),1,240)      9
DBMS_STATS: SELMAPPOS CLISTIDX  INTCOL    SELITEM                                    GATHFLG
DBMS_STATS: ------------------------------------------------------------------------
DBMS_STATS: 3         1         1         substrb(dump(max("N"),16,0,64),1,240)      17

1000 rows created.

From the trace, online statistics gathering occurred for that direct-path load.
We can see it also in the execution plan:

SQL> select * from table(dbms_xplan.display_cursor('1k2r9n41c7xba'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
SQL_ID  1k2r9n41c7xba, child number 0
-------------------------------------
insert /*+ append */ into DEMO select rownum from dual connect by 1000>=level

Plan hash value: 1600317434

---------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |      |       |     2 (100)|          |
|   1 |  LOAD AS SELECT                  |      |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |     1 |     2   (0)| 00:00:01 |
|   3 |    COUNT                         |      |       |            |          |
|   4 |     CONNECT BY WITHOUT FILTERING |      |       |            |          |
|   5 |      FAST DUAL                   |      |     1 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

and statistics are there:

SQL> select last_analyzed,num_rows,blocks from user_tables where table_name='DEMO';

LAST_ANAL   NUM_ROWS     BLOCKS
--------- ---------- ----------
21-FEB-15       1000        179

Don't forget to set the trace off:

SQL> exec dbms_stats.set_global_prefs('TRACE',0);
PL/SQL procedure successfully completed.

Ok. That is the known case. Table statistics are there.
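Before moving on, it is worth checking what that online gathering actually produced; a quick sketch on this demo table (online gathering is not expected to create histograms or index statistics):

SQL> -- column statistics left by the online statistics gathering
SQL> select column_name,num_distinct,num_nulls,histogram,last_analyzed
     from user_tab_col_statistics where table_name='DEMO';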

 

startup restrict

When you want to do some online maintenance and be sure that the application is not connected, you start the database in restricted mode.

SQL> alter system enable restricted session;
System altered.
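Before running the load, whether logins are currently restricted can be double-checked from v$instance; a quick sketch:

SQL> -- LOGINS shows ALLOWED or RESTRICTED for the current instance
SQL> select instance_name, status, logins from v$instance;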

Then you can do your imports, reorgs, bulk loads, etc. and be sure that nobody will write to or read from the table you are working on. Imagine you have tested the previous load and you have observed that the online gathered statistics are sufficient. Now you run the same in production in restricted mode.

SQL> connect demo/demo@//localhost/PDB1
Connected.
SQL> set serveroutput on
SQL> exec dbms_stats.set_global_prefs('TRACE',1+65536);
PL/SQL procedure successfully completed.

SQL> drop table DEMO;
Table dropped.

SQL> create table DEMO ( n number ) pctfree 99;
Table created.

SQL> insert /*+ append */ into DEMO select rownum from dual connect by 1000>=level;
1000 rows created.

No trace related to online statistics gathering.

SQL> select * from table(dbms_xplan.display_cursor('1k2r9n41c7xba'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID  1k2r9n41c7xba, child number 0
-------------------------------------
insert /*+ append */ into DEMO select rownum from dual connect by 1000>=level

Plan hash value: 1600317434

-------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |      |       |     2 (100)|          |
|   1 |  LOAD AS SELECT                |      |       |            |          |
|   2 |   COUNT                        |      |       |            |          |
|   3 |    CONNECT BY WITHOUT FILTERING|      |       |            |          |
|   4 |     FAST DUAL                  |      |     1 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

no STATISTICS GATHERING operation.

SQL> select last_analyzed,num_rows,blocks from user_tables where table_name='DEMO';

LAST_ANAL   NUM_ROWS     BLOCKS
--------- ---------- ----------

and no statistics.

 

10053 trace

Because we can't see the STATISTICS GATHERING operation in the execution plan, I know that it's an optimizer decision made at compile time. I've dumped the 10053 trace and got the following lines:

ONLINEST: Checking validity of online stats gathering
ONLINEST: Failed validity check: database not open, in restricted/migrate mode, suspended, readonly, instance not open or OCI not available.

So there are a few cases where online statistics gathering does not occur that are not documented in the Restrictions for Online Statistics Gathering for Bulk Loads, and restricted mode is one of them.
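For reference, here is a minimal sketch of how such an optimizer trace can be produced (the statement must be hard parsed while the event is set):

SQL> alter session set events '10053 trace name context forever, level 1';
SQL> -- run the statement to analyze (force a hard parse, e.g. by changing a comment)
SQL> alter session set events '10053 trace name context off';
SQL> -- the trace file location:
SQL> select value from v$diag_info where name='Default Trace File';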

 

Thin JDBC

Because the preceding line mentions OCI, I wanted to be sure that online statistics gathering occurs even when connected through thin JDBC, so I used the sqlcl beta from SQL Developer 4.1 Early Adopter. Note that I'm not in restricted session anymore.

sql.bat demo/demo@//192.168.78.113/pdb1

SQLcl: Release 4.1.0 Beta on Sat Feb 21 21:10:59 2015

Copyright (c) 1982, 2015, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL> show jdbc
-- Database Info --
Database Product Name: Oracle
Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Database Major Version: 12
Database Minor Version: 1
-- Driver Info --
Driver Name: Oracle JDBC driver
Driver Version: 12.1.0.2.0
Driver Major Version: 12
Driver Minor Version: 1
Driver URL: jdbc:oracle:thin:@//192.168.78.113/pdb1

SQL> create table DEMO ( n number ) pctfree 99;

Table DEMO created.

SQL> insert /*+ append */ into DEMO select rownum from dual connect by level<=1000;

SQL> select last_analyzed,num_rows,blocks from user_tables where table_name='DEMO';

LAST_ANALYZED                 NUM_ROWS     BLOCKS
--------------------------- ---------- ----------
21.02.15                          1000        100


Ok, no problem. I don't know what that 'OCI not available' check is, but it works even through JDBC Thin.

 

Conclusion

As I already said for other reasons, don't rely on online statistics gathering and always gather stale stats afterwards. It's good to have it as it saves dbms_stats some work later. There are cases where it is better than no statistics at all (when combined with GTT private statistics for example), but don't rely on it.
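To make that concrete, here is a minimal sketch of the kind of follow-up gathering I mean, using the schema of this demo:

SQL> -- gather statistics for objects with stale or missing statistics
SQL> exec dbms_stats.gather_schema_stats(ownname=>'DEMO',options=>'GATHER AUTO');
SQL> -- or only the table that was just loaded
SQL> exec dbms_stats.gather_table_stats(ownname=>'DEMO',tabname=>'DEMO');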

How to set NLS for SQL Developer

Sat, 2015-02-21 14:24

I've been using Oracle SQL Developer 4.1 Early Adopter for a while and I like it. That version comes with a command line (in beta) whose goal is to be fully compatible with sqlplus, but running in Java and having a lot more features.

Because it connects with the thin Java driver by default, it doesn't use NLS_LANG. It's Java. It's Unicode. So here is how to set the language and character set with the Java options.

QlikView Tips & Tricks

Thu, 2015-02-19 08:00

Qlik.png
For several months now, I've been working on some QlikView projects, which is quite an interesting discovery for me. Generally, these projects are limited to the management of QlikView at the administrator level (installation, upgrade, configuration of the QlikView Management Console, and so on), but I was still able to accumulate some knowledge that I want to share with you today. In this blog entry, I will try to explain how to debug the QlikView Offline Service, how to properly configure the access to remote Shared Folders and how to enable Single Sign-On between QlikView and a third party software. I will try to describe the required steps as best I can to avoid any problems.


I. QlikView Offline Service for QlikView 11.2 SR7 or below


In a complete QlikView environment that is using SSL (I don't know if it can happen without SSL), if you try to set up the QlikView Offline Service, you may face an issue where the Offline Service doesn't work at all. This happens even if the component was installed successfully and even if there are no errors in the QlikView log files. This issue comes from the fact that, by default, QlikView enforces FIPS compliance when using the Offline Service, which can cause some problems depending on your enterprise network restrictions. After a feedback on that point to the QlikView Support Team, they confirmed to us that it was a bug and they fixed it in the next QlikView version (11.2 SR8 and above). A simple workaround for this issue can be set up by following these steps:

  1. SSL must be properly configured
  2. The QlikView Offline Service must be properly installed
  3. Login to the Windows Server with any Administrator account
  4. Open the file: C:/Windows/Microsoft.NET/Framework64/v4.0.30319/Config/machine.config
    1. Find the line with: ˂runtime /˃
    2. Replace this line with:
                ˂runtime˃
                         ˂enforceFIPSPolicy enabled="false" /˃
                ˂/runtime˃
  5. Save the file
  6. Open a command prompt as Administrator and execute the command: services.msc
  7. Restart all QlikView Services

Modification of the machine.config file to disable the FIPS enforcement


After doing so, you should be able to access your QlikView documents from a smartphone or a tablet to work offline.


II. Access to remote Shared Folders


As before, depending on your Windows Server GPOs, you may face some issues regarding access to files stored on a remote Shared Folder (accessed via the user who runs QlikView). By remote I mean another city, country, continent or whatever. This tip can help to solve some Shared Folder access issues even if you aren't using QlikView; it's more of a Windows Server tip ;). Regarding QlikView, this issue can easily be found in the log file because you will see something like this during a task execution:

ShareIssue.png


The configuration I will show you below worked for me but, depending on your network restrictions, it may not work as it is. The important thing here is to understand each parameter and the consequences of this configuration:

  1. Login to the Windows Server with any Administrator account
  2. Open a command prompt as Administrator and execute the command: regedit
  3. Open: HKLM ˃ SYSTEM ˃ CurrentControlSet ˃ Services ˃ LanmanServer ˃ Parameters
    1. Set "enablesecuritysignature" to 1
    2. Set "requiresecuritysignature" to 1
  4. Open: HKLM ˃ SYSTEM ˃ CurrentControlSet ˃ Services ˃ LanmanWorkstation ˃ Parameters
    1. Set "EnableSecuritySignature" to 1
    2. Set "RequireSecuritySignature" to 0
  5. Reboot the Windows Server

Share1.pngConfiguration of the LanmanServer registry keys to 1-1

Share2.pngConfiguration of the LanmanWorkstation registry keys to 1-0


As you can see, there are two different sections named "LanmanServer" and "LanmanWorkstation":

  • LanmanServer controls the parameters of the current Windows Server when it acts as a Server
  • LanmanWorkstation controls the parameters of the current Windows Server when it acts as a Client


For example, if you access a remote Shared Folder from the QlikView Windows Server, then you are acting as a Client and therefore, with this configuration, you can access everything whatever the LanmanServer configuration of the Shared Folder's Windows Server is. Indeed, the local SecuritySignature is enabled but not required (Enable=1, Required=0; I will shorten this as "1-0"), so it's the most generic case, which covers all possible LanmanServer configurations of the remote host (3 combinations: 0-0, 1-0 or 1-1).


In the same way, if a user tries to access a Shared Folder on the QlikView Server, then the QlikView Server will act as a LanmanServer and therefore the configuration taken into account is (1-1). This configuration can be changed, but if the LanmanWorkstation configuration of the user's laptop is 1-1, then the LanmanServer configuration will need to be 1-1, otherwise the user will not be able to access the Shared Folder of the QlikView Server. The 1-1 configuration is of course the most secure and therefore it's often (always?) chosen on the users' workstations. That's why it's generally a good idea to set the LanmanServer of the QlikView Server to 1-1 too.


It's really hard to explain this kind of stuff but I hope I was clear enough!


III. SSO between QlikView and "X"


Again, this configuration isn't something only related to QlikView, but it can be useful if you need, for example, to allow QlikView to automatically store some documents into another system "X", which can be a Document Management System (Alfresco, SharePoint, Documentum, and so on) or something else. You may not need to do this because it can be allowed by default in your enterprise, but it's generally a good practice to restrict the SSO features on Windows Servers and therefore this kind of configuration is often required. For this configuration, let's define X as a third party software and https://x.domain.com as the URL related to it.


From the Windows Server, if you try to access the real URL of your third party software (e.g. for Alfresco Share it would be: https://x.domain.com/share) and if you get a pop-up window asking you for credentials, then follow the steps below:

  1. Ensure that your Active Directory is properly configured for SSO (this is a very large topic and I will not describe it here)
  2. Login to the Windows Server with the account under which QlikView is running
  3. Open: Internet Explorer ˃ Internet Options ˃ Security ˃ Trusted Sites ˃ Sites
    1. Write: https://x.domain.com
    2. Click on: Add
    3. Write: about:blank (this step may not be mandatory)
    4. Click on: Add (this step may not be mandatory)
    5. Close the window
  4. Click on: Custom Level...
    1. Scroll down and find: User Authentication
    2. Set "Logon" to "Automatic logon with current username and password"
    3. Click on: OK
    4. Click on: Apply
    5. Click on: OK
  5. Restart Internet Explorer
  6. Ensure that the account under which QlikView is running has the proper permissions on the third party software

SSO_1.png
Trusted sites configuration with the list of URL for all "X"

SSO2.pngEnable the automatic logon with current user name and password for the Trusted Sites


After that configuration, if you try again to access the real URL of your third party software, then the pop-up window should not be displayed and the login should be successful. Of course, the last step is important because the user that is running QlikView must have access to the third party software, otherwise the SSO is useless...


This concludes this first approach of some QlikView Tips & Tricks. I'm sure that more will come soon but I will need to find some time to share that with you. I hope you will find this blog entry useful and don't hesitate to give me your feedback using the comments below!

SQL Server monitoring with nagios: utilisation

Wed, 2015-02-18 07:14

We saw in my last blog how to install the SQL Server plugin for Nagios.
In this new blog, I will explain:

  • how to create a wrapper to avoid having to unset the PERLLIB variable manually
  • how to create and configure a configuration file per monitored instance
  • how to create an SQL Login for Nagios
  • how to subscribe to an alert

As we saw in the first Nagios blog, to be able to run the check_mssql_health plugin we have to unset the PERLLIB variable and export the libraries.
We unset the PERLLIB variables because they are set for the Oracle Client (by the check_oracle_health plugin) and this creates conflicts: we force the check_mssql_health plugin to use the Perl libraries of the OS instead of those of the Oracle client.
We will do that in a wrapper file:

 b2ap3_thumbnail_Nagios_SQLServer_blog_correction1.jpg

After that, we are able to test our wrapper successfully:

 b2ap3_thumbnail_Nagios_SQLServer_nagios_b2.jpg

We have decided to create one configuration file per SQL Server instance to monitor.
This file is as follows:

 b2ap3_thumbnail_Nagios_SQLServer_nagios_b3.jpg

The important things to see here are:

In the host definition line:

  • mssql_sid with the name of the instance given in the freetds.conf file (see first nagios blog)
  • mssql_usr with the user name used for the SQL Server connection
  • address with the IP Address of the SQL Server host

In the command line:

  • the call to the check_mssql_health_wrapped file
  • the counter used in this call, here cpu-busy
  • the warning threshold fixed at 80% of cpu used
  • the alert threshold fixed to 90% of cpu used

In the service line:

  • normal_check_interval, which defines the interval to check the service under normal conditions, here 10 minutes
  • retry_check_interval, which determines the number of minutes to wait before scheduling a re-check when the service has changed to a non-OK state, here 2 minutes
  • max_check_attempts: once the service has been retried max_check_attempts times (here 3) without a change in its status, it reverts to being scheduled at the normal_check_interval rate and a mail is sent to the contact_groups
  • contact_groups, which will receive the alerts
  • notification_interval, which determines every how many minutes alerts will be sent

It means that the cpu-busy counter will be checked every 10 minutes by Nagios; in case of a non-OK state the next check will be scheduled after 2 minutes, and after 3 checks without change an alert will be sent to the contact group and another check will be scheduled 10 minutes later. If the status stays non-OK, the next alert will be sent after 6 hours.

A lot of counters are available for this plugin; for a list, please go here.

 

An SQL Login can be created to connect to the SQL Server instance and databases, to avoid using the sa user for example.
First create a windows domain account:

 b2ap3_thumbnail_Nagios_SQLServer_Account1.jpg

b2ap3_thumbnail_Nagios_SQLServer_Account2.jpg

Execute the following script in an SSMS query window to create the new login and grant permissions:

USE [master]
GO
CREATE LOGIN [DBITEST\nagios_mssql_health] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
GO
GRANT VIEW SERVER STATE to [DBITEST\nagios_mssql_health]

USE [model]
GO
CREATE ROLE db_nagios_mssql_health
GRANT EXECUTE TO db_nagios_mssql_health
GRANT VIEW DATABASE STATE TO db_nagios_mssql_health
GRANT VIEW DEFINITION TO db_nagios_mssql_health
CREATE USER [DBITEST\nagios_mssql_health] FOR LOGIN [DBITEST\nagios_mssql_health]
EXEC sp_addrolemember 'db_nagios_mssql_health', [DBITEST\nagios_mssql_health]

Then execute the following script to grant permissions at the database level. Take care: if a new database is created later, this script has to be run for that new db.

execute sp_MSforeachdb 'use [?]
print ''?''
USE [?]
CREATE ROLE db_nagios_mssql_health
GRANT EXECUTE TO db_nagios_mssql_health
GRANT VIEW DATABASE STATE TO db_nagios_mssql_health
GRANT VIEW DEFINITION TO db_nagios_mssql_health
CREATE USER [DBITEST\nagios_mssql_health] FOR LOGIN [DBITEST\nagios_mssql_health]
EXEC sp_addrolemember ''db_nagios_mssql_health'', [DBITEST\nagios_mssql_health]'

 

To subscribe to an alert, new groups and contacts can be defined.
Those objects will be created in a contact configuration file, which is added to the nagios config.cfg file with this line:

# dbi-services contacts and contactgroups
cfg_file=/usr/local/nagios/etc/objects/DBI_contacts.cfg

The contact configuration file has the following structure:

 b2ap3_thumbnail_Nagios_SQLServer_nagios_b4.jpg

I hope this information will help you to configure the SQL Server plugin for Nagios and will give you service status details like the picture below:

 b2ap3_thumbnail_Nagios_SQLServer_nagios_b5.jpg

See You.

Case of monitoring snapshot replication from the subscriber side

Tue, 2015-02-17 13:48

I don't work often with SQL Server replication. The main reason is that the number of customers that use replication is pretty low, and each time I have to deal with it, it's a good opportunity to improve my skills in this area. A couple of months ago I had to face an interesting issue with snapshot replication (yes, I was lucky ... this is not the most complex replication mechanism you have to deal with). My customer had a lot of databases (approximately 250) to replicate between two datacenters on the same continent.

The global architecture includes two servers that act as publishers, with a total of 250 articles (one table in one database each), and a central subscriber that gathers these 250 articles (250 databases with one table in each database). All articles are part of push subscriptions and all replication jobs run on the publishers. This replication process is part of a wider ETL process which delivers financial metric reports to the business users.

Before computing the financial metrics, we need to ensure that the replication of all databases has completed correctly. Unfortunately, due to the customer's security context, we are stuck because we are not allowed to access the different publishers. So the main question that remains is: how do we ensure, from the subscriber side only, that we don't start the calculation of the financial metrics while database replication processes are still running?

After some thought, I came up with a solution, but it is not perfect. I will explain why later in this blog post. My solution includes two things:

1- Recording the last time the concerned table was created. As a reminder, snapshot replication recreates the concerned articles on the subscriber.

2- Tracking bulk insert activity issued by the snapshot agent before performing the calculation of the financial metrics

Here is my script:

 

SELECT
    DB_NAME() as database_name,
    tb.name as table_name,
    tb.create_date as last_creation_date,
    txt.text as current_statement,
    txt.program_name as interface,
    txt.request_mode,
    txt.request_owner_type,
    txt.request_status
FROM sys.tables as tb
OUTER APPLY (
    SELECT
        db_name(r.database_id) as database_name,
        t.text,
        s.program_name,
        l.request_mode,
        l.request_status,
        l.request_owner_type
    FROM sys.dm_exec_requests AS r
        JOIN sys.dm_exec_sessions as s
            on r.session_id = s.session_id
        JOIN sys.dm_tran_locks as l
            on l.request_session_id = s.session_id
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE t.text LIKE 'insert bulk %' + tb.name + '%TABLOCK%'
        AND r.session_id <> @@SPID
        AND l.resource_type = 'OBJECT'
        AND l.resource_database_id = r.database_id
        AND l.resource_associated_entity_id = tb.object_id
) AS txt
WHERE tb.name = 'mytable'

 

I finally created a stored procedure that fetches each concerned database and executes the above script in the context of the current database. In the context of my customer, I implemented an additional step which sends an email twice a day. Notice also that I added a custom business rule that flags a replication issue if no replication has occurred during the last 6 hours. You can modify and adjust the script at your convenience.

 

SET NOCOUNT ON;

DECLARE @database_name SYSNAME;
DECLARE @sql NVARCHAR(MAX);

-- working table : alert_replication_monitoring
TRUNCATE TABLE msdb.dbo.alert_replication_monitoring;

-- for each concerned database we will verify if the t_replixxx table is updated
-- from snapshot replication
DECLARE C_DB CURSOR FAST_FORWARD FOR
SELECT name
FROM sys.databases
WHERE name LIKE 'repli_%';

OPEN C_DB;

FETCH NEXT FROM C_DB INTO @database_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'USE ' + QUOTENAME(@database_name) + '
        IF EXISTS (SELECT 1
                   FROM sys.tables WHERE name LIKE N''t_repli%'')
        BEGIN
            INSERT INTO msdb.dbo.alert_replication_monitoring
            SELECT
                DB_NAME() as database_name,
                tb.name as table_name,
                tb.create_date as last_creation_date,
                txt.text as current_statement,
                txt.program_name as interface,
                txt.request_mode,
                txt.request_owner_type,
                txt.request_status
            FROM sys.tables as tb
            OUTER APPLY (
                SELECT
                    db_name(r.database_id) as database_name,
                    t.text,
                    s.program_name,
                    l.request_mode,
                    l.request_status,
                    l.request_owner_type
                FROM sys.dm_exec_requests AS r
                    JOIN sys.dm_exec_sessions as s
                        on r.session_id = s.session_id
                    JOIN sys.dm_tran_locks as l
                        on l.request_session_id = s.session_id
                    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
                WHERE t.text LIKE ''insert bulk %'' + tb.name + ''%TABLOCK%''
                    AND r.session_id <> @@SPID
                    AND l.resource_type = ''OBJECT''
                    AND l.resource_database_id = r.database_id
                    AND l.resource_associated_entity_id = tb.object_id
            ) AS txt
            WHERE tb.name LIKE ''t_repli%''
        END';

    EXEC sp_executesql @sql;

    FETCH NEXT FROM C_DB INTO @database_name;
END

CLOSE C_DB;
DEALLOCATE C_DB;

-- Check status of each database replication
SELECT *
FROM (
    SELECT
        database_name AS [database],
        table_name AS [table],
        last_creation_date AS [last synchro],
        DATEDIFF(MI, last_creation_date, CURRENT_TIMESTAMP) AS [time since the last update (minutes)],
        current_statement,
        CASE
            WHEN DATEDIFF(MI, last_creation_date, CURRENT_TIMESTAMP) <= 360
                AND current_statement IS NOT NULL THEN '1 - Replication in progress.'
            WHEN DATEDIFF(MI, last_creation_date, CURRENT_TIMESTAMP) <= 360
                AND current_statement IS NULL THEN '2 - Replication done.'
            WHEN DATEDIFF(MI, last_creation_date, CURRENT_TIMESTAMP) > 360
                AND current_statement IS NOT NULL THEN '3 - Replication in progress and takes longer than expected (6 hours).'
            ELSE '4 - No replication has occured during the six last hours.'
        END replication_status
    FROM msdb.dbo.alert_replication_monitoring
) AS repli
ORDER BY [database], replication_status DESC;
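The email notification mentioned above is not shown here. Below is a hedged sketch of what it could look like with Database Mail; the profile name and recipients are placeholders to adapt to your environment:

-- send the content of the working table by email (hypothetical profile and recipients)
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'dba_profile',
    @recipients = 'dba-team@mycompany.com',
    @subject = 'Snapshot replication status (subscriber side)',
    @query = 'SELECT database_name, table_name, last_creation_date
              FROM msdb.dbo.alert_replication_monitoring',
    @attach_query_result_as_file = 0;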

 

Here is a picture of the script result while the snapshot agent is running...

 

blog_31_1_replication_status_monitor

 

... and when there is no activity, but we can still track the last time the table was synchronized.

 

blog_31_2_replication_status_monitor_21

 

As I said earlier, this method has one main caveat:

We can only claim that a replication process is not running at a given time, but we cannot know whether the replication process completed correctly or with errors.

Remember that this is more a workaround than a perfect solution.

Hope it helps! Please feel free to share your comments about this script!

Is CDB stable after one patchset and two PSU?

Mon, 2015-02-16 15:23

There has been the announcement that the non-CDB architecture is deprecated, and the reaction that CDB is not yet stable.

Well. Let's talk about the major issue I've encountered. Multitenant is there for consolidation. What is the major requirement of consolidation? It's availability. If you put all your databases onto one server, managed by one instance, then you don't expect a failure of one of them to affect the others.

When 12c came out (and even earlier, as we were beta testers) - 12.1.0.1 - David Hueber encountered an important issue: when a SYSTEM datafile was lost, we could not recover it without stopping the whole CDB. That's bad, of course.

When Patchset 1 came out (and we were beta testers again) I checked whether that had been solved. I saw that they had introduced the undocumented "_enable_pdb_close_abort" parameter in order to allow a shutdown abort of a PDB. But it was worse: when I dropped a SYSTEM datafile, the whole CDB instance crashed immediately. I opened an SR and Bug 19001390 'PDB system tablespace media failure causes the whole CDB to crash' was created for it. All is documented in that blog post.

Now the bug status is: fixed in 12.1.0.2.1 (Oct 2014) Database Patch Set Update

Good. I've installed the latest PSU, which is 12.1.0.2.2 (Jan 2015), and I tested the most basic recovery situation: the loss of a non-system tablespace in one PDB.

Here it is:

 

RMAN> report schema;
Report of database schema for database with db_unique_name CDB

List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 800 SYSTEM YES /u02/oradata/CDB/system01.dbf
3 770 SYSAUX NO /u02/oradata/CDB/sysaux01.dbf
4 270 UNDOTBS1 YES /u02/oradata/CDB/undotbs01.dbf
5 250 PDB$SEED:SYSTEM NO /u02/oradata/CDB/pdbseed/system01.dbf
6 5 USERS NO /u02/oradata/CDB/users01.dbf
7 490 PDB$SEED:SYSAUX NO /u02/oradata/CDB/pdbseed/sysaux01.dbf
11 260 PDB2:SYSTEM NO /u02/oradata/CDB/PDB2/system01.dbf
12 520 PDB2:SYSAUX NO /u02/oradata/CDB/PDB2/sysaux01.dbf
13 5 PDB2:USERS NO /u02/oradata/CDB/PDB2/PDB2_users01.dbf
14 250 PDB1:SYSTEM NO /u02/oradata/CDB/PDB1/system01.dbf
15 520 PDB1:SYSAUX NO /u02/oradata/CDB/PDB1/sysaux01.dbf
16 5 PDB1:USERS NO /u02/oradata/CDB/PDB1/PDB1_users01.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 60 TEMP 32767 /u02/oradata/CDB/temp01.dbf
2 20 PDB$SEED:TEMP 32767 /u02/oradata/CDB/pdbseed/pdbseed_temp012015-02-06_07-04-28-AM.dbf
3 20 PDB1:TEMP 32767 /u02/oradata/CDB/PDB1/temp012015-02-06_07-04-28-AM.dbf
4 20 PDB2:TEMP 32767 /u02/oradata/CDB/PDB2/temp012015-02-06_07-04-28-AM.dbf


RMAN> host "rm -f /u02/oradata/CDB/PDB1/PDB1_users01.dbf";
host command complete


RMAN> alter system checkpoint;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00601: fatal error in recovery manager
RMAN-03004: fatal error during execution of command
ORA-01092: ORACLE instance terminated. Disconnection forced
RMAN-03002: failure of sql statement command at 02/19/2015 22:51:55
ORA-03113: end-of-file on communication channel
Process ID: 19135
Session ID: 357 Serial number: 41977
ORACLE error from target database:
ORA-03114: not connected to ORACLE

 

Ok, but I have the PSU:

 

$ /u01/app/oracle/product/12102EE/OPatch/opatch lspatches
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)

 

Here is the alert.log:

 

Completed: alter database open
2015-02-19 22:51:46.460000 +01:00
Shared IO Pool defaulting to 20MB. Trying to get it from Buffer Cache for process 19116.
===========================================================
Dumping current patch information
===========================================================
Patch Id: 19769480
Patch Description: Database Patch Set Update : 12.1.0.2.2 (19769480)
Patch Apply Time: 2015-02-19 22:14:05 GMT+01:00
Bugs Fixed: 14643995,16359751,16870214,17835294,18250893,18288842,18354830,
18436647,18456643,18610915,18618122,18674024,18674047,18791688,18845653,
18849537,18885870,18921743,18948177,18952989,18964939,18964978,18967382,
18988834,18990693,19001359,19001390,19016730,19018206,19022470,19024808,
19028800,19044962,19048007,19050649,19052488,19054077,19058490,19065556,
19067244,19068610,19068970,19074147,19075256,19076343,19077215,19124589,
19134173,19143550,19149990,19154375,19155797,19157754,19174430,19174521,
19174942,19176223,19176326,19178851,19180770,19185876,19189317,19189525,
19195895,19197175,19248799,19279273,19280225,19289642,19303936,19304354,
19309466,19329654,19371175,19382851,19390567,19409212,19430401,19434529,
19439759,19440586,19468347,19501299,19518079,19520602,19532017,19561643,
19577410,19597439,19676905,19706965,19708632,19723336,19769480,20074391,
20284155
===========================================================
2015-02-19 22:51:51.113000 +01:00
db_recovery_file_dest_size of 4560 MB is 18.72% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Setting Resource Manager plan SCHEDULER[0x4446]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager CDB plan DEFAULT_MAINTENANCE_PLAN via parameter
2015-02-19 22:51:54.892000 +01:00
Errors in file /u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ckpt_19102.trc:
ORA-63999: data file suffered media failure
ORA-01116: error in opening database file 16
ORA-01110: data file 16: '/u02/oradata/CDB/PDB1/PDB1_users01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ckpt_19102.trc:
ORA-63999: data file suffered media failure
ORA-01116: error in opening database file 16
ORA-01110: data file 16: '/u02/oradata/CDB/PDB1/PDB1_users01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
USER (ospid: 19102): terminating the instance due to error 63999
System state dump requested by (instance=1, osid=19102 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_diag_19090_20150219225154.trc
ORA-1092 : opitsk aborting process
2015-02-19 22:52:00.067000 +01:00
Instance terminated by USER, pid = 19102

 

You can see the bug number in the 'Bugs Fixed' list, yet the instance still terminates after a media failure on a PDB datafile. That's bad news.

 

I've lost one datafile, and at the first checkpoint the whole CDB crashed. I'll have to open an SR again. But for sure, consolidation through the multitenant architecture is not yet ready for sensitive production systems.
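For the record, this is the kind of online repair I would expect to be able to run for that lost datafile while the rest of the CDB stays up; just a sketch, assuming a valid backup and RMAN connected to the CDB root:

RMAN> alter pluggable database PDB1 close;
RMAN> restore datafile 16;
RMAN> recover datafile 16;
RMAN> alter pluggable database PDB1 open;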

OracleText: deletes and garbage

Sun, 2015-02-15 15:21

In the previous post we have seen how the OracleText index tables are maintained when a new document arrives: at sync, the new documents are read, up to the available memory, and the words are inserted in the $I table with their mapping information. Now we will see how removed documents are processed. We will not cover updates as they are just a delete + insert.

Previous state

Here is the state from the previous post where I had those 3 documents:

  • 'Hello World'
which was synced alone, and then the two following ones were synced together:
  • 'Hello Moon, hello, hello'
  • 'Hello Mars'
The $K is an IOT which maps the OracleText table ROWID to the DOCID (the fact that the primary key TEXTKEY is not at the start is a bit misleading):
SQL> select * from DR$DEMO_CTX_INDEX$K;

     DOCID TEXTKEY
---------- ------------------
         1 AAAXUtAAKAAABWlAAA
         2 AAAXUtAAKAAABWlAAB
         3 AAAXUtAAKAAABWlAAC
The $R is a table mapping the opposite navigation (docid to rowid) storing a fixed-length array of ROWIDs indexed by the docid, and split into several lines:
SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000
The $I table stores the tokens, the first 5 columns being indexed ($X) and the TOKEN_INFO blob stores detailed location of the token:
SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
We have seen that the $I table can be fragmented for 3 reasons:
  • Each sync inserts its tokens (instead of merging them with existing ones)
  • TOKEN_INFO size is limited to fit in-row (we will see 12c new features later)
  • Only tokens that fit in the allocated memory can be merged
And the $N is empty for the moment:
SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected

Delete

Do you remember how inserts go to the CTXSYS.DR$PENDING table? Deletes go to the CTXSYS.DR$DELETE table:

SQL> delete from DEMO_CTX_FRAG where num=0002;

1 row deleted.

SQL> select * from CTXSYS.DR$DELETE;

DEL_IDX_ID DEL_IXP_ID  DEL_DOCID
---------- ---------- ----------
      1400          0          2
I've deleted docid=2 but the tokens are still there:
SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
as well as their mapping to the ROWID:
SQL> -- $R is for rowid - docid mapping (IOT)
SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000
However, the $N has been maintained to know that docid=2 has been removed:
SQL> select * from DR$DEMO_CTX_INDEX$N;

 NLT_DOCID N
---------- -
         2 U
This is the goal of the $N (Negative) table, which records the docids that should not be there and that must be deleted at the next optimization (garbage collection).
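That garbage is only reclaimed when the index is optimized; a minimal sketch of such a call on this demo index (FULL mode removes the deleted docids and also merges fragmented $I rows):

SQL> exec ctx_ddl.optimize_index('DEMO_CTX_INDEX','FULL');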

From there, a search by words ('normal lookup') will give docids and rowids, and the CTXSYS.DR$DELETE table must be read in order to know that the document is not there anymore. It's an IOT and the docid can be found with an index unique scan.

However for the opposite way, having a ROWID and checking if it contains some words ('functional lookup') we need to know that there is no document. In my case I deleted the row, but you may update the document, so the ROWID is still there. There is no pending table for that. It is maintained immediately in the $K table:

SQL> select * from DR$DEMO_CTX_INDEX$K;

     DOCID TEXTKEY
---------- ------------------
         1 AAAXUtAAKAAABWlAAA
         3 AAAXUtAAKAAABWlAAC
the entry that addressed docid=2 has been deleted.

Commit

All those changes were done within the same transaction. Other sessions still see the old values, and they have no need to read CTXSYS.DR$DELETE. What I described above is only for my session: the normal lookup reading the queuing table, and the functional lookup stopping at $K. We don't have to wait for a sync to process CTXSYS.DR$DELETE; it's done at commit:

SQL> commit;

Commit complete.

SQL> select * from CTXSYS.DR$DELETE;

no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100000000000000000000000000000000
Of course we can't read it but we see that part of it has been zeroed. That $R table is definitely special: it's not stored in a relational way, and its maintenance is deferred at commit time.

But nothing has changed in $I which contains garbage (and sync is not changing anything to that):

SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
And of course $N row is still there to record the deleted docid:
SQL> select * from DR$DEMO_CTX_INDEX$N;

 NLT_DOCID N
---------- -
         2 U
Sync

I've not reproduced it here, but sync is not changing anything. Sync is for new documents - not for deleted ones.

Conclusion

What you need to remember here is:
  • New documents are made visible through OracleText index at sync
  • Removed document are immediately made invisible at commit
Of course, you can sync at commit, but the second thing to remember is that
  • New documents brings fragmentation
  • Removed document brings garbage
and both of them increase the size of the $I table and its $X index, making range scans less efficient. We will see more about that later, but the next post will be about queries: I've talked about normal and functional lookups, and we will see how they are done.

OracleText: inserts and fragmentation

Sun, 2015-02-15 14:37

I plan to write several posts about OracleText indexes, a feature that is not used enough in my opinion. It's available in all editions and can index small text or large documents to search by words. When you create an OracleText index, a few tables are created to store the words and the association between those words and the table row that contains the document. I'll start by showing how document inserts are processed.

Create the table and index

I'm creating a simple table with a CLOB

SQL> create table DEMO_CTX_FRAG
     (num number constraint DEMO_CTX_FRAG_PK primary key,txt clob);

Table created.
and a simple OracleText on that column
SQL> create index DEMO_CTX_INDEX on DEMO_CTX_FRAG(txt)
     indextype is ctxsys.context;

Index created.
That creates the following tables:
  • DR$DEMO_CTX_INDEX$I which stores the tokens (e.g words)
  • DR$DEMO_CTX_INDEX$K which indexes the documents (docid) and links them to the table ROWID
  • DR$DEMO_CTX_INDEX$R which stores the opposite navigation (get the ROWID from a docid)
  • DR$DEMO_CTX_INDEX$N which stores the docids for deferred maintenance cleanup.

Inserts

I'm inserting a row with some text in the clob column

SQL> insert into DEMO_CTX_FRAG values (0001,'Hello World');

1 row created.
I commit
SQL> commit;

Commit complete.
And here is what we have in the OracleText tables:
SQL> select * from DR$DEMO_CTX_INDEX$K;
no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$R;
no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$I;
no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$N;
no rows selected
Nothing is stored here yet, which means that we cannot find our newly inserted row with an OracleText search.

By default, all inserts maintain the OracleText tables asynchronously.
The inserted row is referenced in a CTXSYS queuing table that stores the pending inserts:

SQL> select * from CTXSYS.DR$PENDING;

   PND_CID    PND_PID PND_ROWID          PND_TIMES P
---------- ---------- ------------------ --------- -
      1400          0 AAAXUtAAKAAABWlAAA 13-FEB-15 N
and we have a view over it:
SQL> select pnd_index_name,pnd_rowid,pnd_timestamp from ctx_user_pending;

PND_INDEX_NAME                 PND_ROWID          PND_TIMES
------------------------------ ------------------ ---------
DEMO_CTX_INDEX                 AAAXUtAAKAAABWlAAA 13-FEB-15
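Note that this asynchronous, manual maintenance is only the default. The index could have been created to synchronize automatically instead; a sketch of that alternative creation (not what I used for this demo):

SQL> -- synchronize at each commit (the committing session does the work)
SQL> create index DEMO_CTX_INDEX on DEMO_CTX_FRAG(txt)
     indextype is ctxsys.context
     parameters ('sync (on commit)');
SQL> -- or with a background job: parameters ('sync (every "freq=minutely;interval=5")')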

Synchronization

let's synchronize:

SQL> exec ctx_ddl.sync_index('DEMO_CTX_INDEX');

PL/SQL procedure successfully completed.
The queuing table has been processed:
SQL> select pnd_index_name,pnd_rowid,pnd_timestamp from ctx_user_pending;

no rows selected
and here is how that document is stored in our OracleText tables.

$K records one document (docid=1) and the table rowid that contains it:

SQL> select * from DR$DEMO_CTX_INDEX$K;

     DOCID TEXTKEY
---------- ------------------
         1 AAAXUtAAKAAABWlAAA
The $R table stores the docid -> rowid mapping in a non-relational way:
SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D0002800000569404141
How is it stored? It's an array of fixed-length ROWIDs, so from the docid we can go directly to the right offset and get the ROWID. Because DATA is limited to 4000 bytes, there can be several rows, but a docid determines the ROW_NO as well as the offset within DATA.

$I stores the tokens (which are the words here, as we have TEXT tokens - type 0) as well as their location information:

SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
For each word it stores the range of docids that contain the word (token_first and token_last are those docids), and token_info stores the occurrences of the word within the documents in a binary way (pairs of docid and offset within the document). It's a BLOB, but it is limited to 4000 bytes so that it is stored inline. This means that if a token is present in a lot of documents, several rows in $I will be needed, each covering a different range of docids. This has changed in 12c and we will see that in future blog posts.

Thus, we can have several rows for one token. This is the first cause of fragmentation: searching for documents that contain such a word will have to read several rows of the $I table. The $N table has nothing here because we synchronized only inserts and there is nothing to clean up.

SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected

Several inserts

I will insert two more rows, which also contain the word 'hello'.

SQL> insert into DEMO_CTX_FRAG values (0002,'Hello Moon, hello, hello');

1 row created.

SQL> insert into DEMO_CTX_FRAG values (0003,'Hello Mars');

1 row created.

SQL> commit;

Commit complete.
And I synchronize:
SQL> exec ctx_ddl.sync_index('DEMO_CTX_INDEX');

PL/SQL procedure successfully completed.
So, I've now 3 documents:
SQL> select * from DR$DEMO_CTX_INDEX$K;

     DOCID TEXTKEY
---------- ------------------
         1 AAAXUtAAKAAABWlAAA
         2 AAAXUtAAKAAABWlAAB
         3 AAAXUtAAKAAABWlAAC
The reverse mapping array has increased:
SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000
And now the tokens:
SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
What is interesting here is that the previous lines (docid 1) have not been updated and new lines have been inserted for docid 2 and 3.
  • 'moon' is only in docid 2
  • 'mars' is only in docid 3
  • 'hello' is in 2 (token_count) documents, from docid 2 to docid 3 (token_first and token_last)

This is the second cause of fragmentation, coming from frequent syncs: each sync adds new rows. However, when multiple documents are processed in the same sync, only one $I entry per token is needed.

There is a third cause of fragmentation. We see here that the token_info is larger for the HELLO entry covering docid 2 to 3 because there are several occurrences of the token. All of that must fit in memory when we synchronize. So it's good to synchronize when we have several documents (so that the common tokens are not too fragmented), but we also need enough memory. The default is 12M, which is usually too small. It can be increased with the 'index memory' parameter of the index, and there is also a maximum, set by ctx_adm.set_parameter, for which the default (50M) is probably too low as well. A sketch is shown below.
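Here is a minimal sketch of how to act on that memory (the parameter names are the documented CTX_ADM / CTX_DDL ones; the values are only examples):

-- Raise the instance-wide maximum index memory (run as CTXSYS); example value.
exec ctx_adm.set_parameter('MAX_INDEX_MEMORY', '500M');
-- Set the default index memory used when nothing is specified at sync time.
exec ctx_adm.set_parameter('DEFAULT_INDEX_MEMORY', '100M');
-- Or pass the memory explicitly for one synchronization of the demo index.
exec ctx_ddl.sync_index('DEMO_CTX_INDEX', '100M');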

Nothing yet in the $N table, which we will see in the next post:

SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected

Summary

The important points here are that inserted documents are visible only after synchronization, and that synchronizing too frequently causes fragmentation. If you need to synchronize in real time (on commit) and you commit for each document inserted, then you will probably have to plan frequent index optimization. If, on the other hand, you are able to synchronize only after inserting a lot of documents, then fragmentation is reduced, provided that you have enough memory to process all the documents in one pass. Both options are sketched below.
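As a sketch of both options (documented OracleText syntax, shown here as an alternative to the way the demo index was created above):

-- Real-time visibility: synchronize at each commit (expect more fragmentation).
-- (Alternative creation of the demo index: the existing one would have to be dropped first.)
create index DEMO_CTX_INDEX on DEMO_CTX_FRAG(txt)
  indextype is ctxsys.context
  parameters ('sync (on commit)');

-- If you synchronize frequently, plan regular optimization to defragment the $I rows.
exec ctx_ddl.optimize_index('DEMO_CTX_INDEX', 'FULL');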

The next post will be about deletes and updates.

SQL Server monitoring with nagios: installation

Sun, 2015-02-15 07:37

Nagios is an IT infrastructure monitoring solution which is able to monitor SQL Server instances through a specific plugin.
I have installed this plugin to test its functionalities and I will explain here how to do it.

Prerequisites installation

FreeTDS installation

FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases.
First, we will download this set of libraries: go to the FreeTDS website and click on Stable Release:

 b2ap3_thumbnail_blog1_20150129-144605_1.jpg

The file freetds-stable.tgz will be downloaded.
Once the file is downloaded, go to the Nagios download directory and uncompress the tgz file:

b2ap3_thumbnail_script11.jpg

Go to the Freetds directory and list files:

b2ap3_thumbnail_script2.jpg

Prepare and control libraries before compilation:

b2ap3_thumbnail_script3.jpg

Now, we will compile and install the Freetds libraries:

b2ap3_thumbnail_script4.jpg

Please check that there are no errors.
If there are none, FreeTDS is now installed.

Perl-module DBD::Sybase installation

DBD::Sybase is a Perl module which works with the DBI module to provide access to Sybase databases.
To download this module go to http://search.cpan.org/~mewp/DBD-Sybase-1.15/Sybase.pm and click the download link:

b2ap3_thumbnail_blog2_20150129-144603_1.jpg

When the file DBD-Sybase-1.15.tar.gz is downloaded, go to the download directory, uncompress the file and go to the new directory:

b2ap3_thumbnail_script5.jpg

We will now compile and install this new module:

 b2ap3_thumbnail_script6.jpg

The module is now installed; check that you did not receive any errors.
Create a symbolic link:

b2ap3_thumbnail_script7.jpg

We are now able to edit the freetds.conf file to:

  • change the TDS version, following the security advice (follow the link and find the chapter Security Advice)
  • add the SQL Server parameters: IP address, TCP/IP port, and the instance name if it is not the default instance

b2ap3_thumbnail_Nagios_SQLServer_script81.jpg

The Perl module is installed.
The prerequisites are now in place and we have to install the SQL Server plugin.

Nagios SQL Server plugin Installation

First, we have to download the Nagios SQL Server plugin. To do so, connect here and click the link below:

 b2ap3_thumbnail_Nagios_SQLServer_blog3.jpg

When the file is downloaded:

  • go to the download directory
  • list the files
  • uncompress the file
  • list the files again to see the new plugin directory
  • change to the new directory

b2ap3_thumbnail_Nagios_SQLServer_script8.jpg

The next command to run is configure, which checks that all dependencies are present and writes a Makefile containing the build instructions.

 b2ap3_thumbnail_Nagios_SQLServer_script9.jpg

Then we have to execute make, which performs the compilation (it can be run as a normal user):

  b2ap3_thumbnail_Nagios_SQLServer_script91.jpg

The last command to run is make install, which installs the plugin (it has to be run as root):

b2ap3_thumbnail_Nagios_SQLServer_script92.jpg

The SQL Server Nagios plugin is successfully installed.

Set environment

As the root user, change the permissions on the freetds.conf file so that servers/instances can be added by the nagios user:

b2ap3_thumbnail_Nagios_SQLServer_script93.jpg

Add the /usr/local/freetds/lib library path to the /etc/ld.so.conf file:

 b2ap3_thumbnail_Nagios_SQLServer_script94.jpg

As the nagios user, add the FreeTDS libraries to the LD_LIBRARY_PATH environment variable:

 b2ap3_thumbnail_Nagios_SQLServer_script95.jpg

Test the connection:

 b2ap3_thumbnail_Nagios_SQLServer_script96.jpg

The connection failed due to a wrong PERLLIB path...
To resolve this error, unset the PERLLIB variable.
We force the check_mssql_health plugin to use the Perl libraries of the OS instead of those of the Oracle client by unsetting the PERLLIB path set for the check_oracle_health plugin:

 b2ap3_thumbnail_Nagios_SQLServer_script97.jpg

Verification of the connection:

 b2ap3_thumbnail_Nagios_SQLServer_script98.jpg

Verification succeeded.

The next steps will be to test this plugin: checking which counters can be used, how to configure them and with which thresholds, how to subscribe to alerts, and so on.
Those steps will be covered in a future blog post.

Considerations about SQL Server database files placement, Netapp storage and SnapManager

Thu, 2015-02-12 14:06

When you install SQL Server, you have to consider how to place the database files. At this point you will probably meet the well-known best practices and guidelines provided by Microsoft but are you really aware of the storage vendor guidelines? Let’s talk about it in this blog post.

A couple of weeks ago, I had the opportunity to implement a new SQL Server architecture on a NetApp FAS 3140 storage array. The particularity here is that my customer wants to manage his backups with NetApp SnapManager for SQL Server. This detail is very important because in this case we have to meet some requirements about the storage layout. After reading the NetApp storage documents, I was ready to plan the placement of my database files.

First of all, we should know how SnapManager works. In fact, SnapManager performs SQL Server backups by using either a snapshot of the database files or streaming backups. Streaming backups concern only the system database files and the transaction log files. To achieve this task, SnapManager coordinates several components: SnapDrive for Windows to control the storage LUNs, and SQL Server to freeze IO by using the VSS framework. It is important to point out that without SnapManager, performing snapshots from a pure storage perspective is still possible, but in this case the storage layer is not aware of the database file placement, and we may end up with a database that is not consistent. Here is a simplified scheme of what happens during the SnapManager backup process:

 

SnapManager (1) --> VDI (2) - - backup database .. with snapshot --> VSS (3) - database files IO freeze  --> SnapDrive (3) - snapshot creation

 

So why do we have to take care of database file placement with NetApp storage? According to the explanation above, when we perform snapshot backups the unit of work is in fact the volume (FlexVol is probably the better term here: a FlexVol is a kind of virtual storage pool that manages physical storage efficiently and can include one or more LUNs). In other words, when a snapshot is performed by SnapManager, all related LUNs are considered together, and all the concerned database files in a single volume are frozen when the snapshot is created. Notice that if a database is spread across several volumes, IO is frozen for all concerned volumes at the same time, but the snapshots are taken serially.

This concept is very important: as database administrators, we now also have to deal with additional constraints (described below) when placing our database files, according to the storage layout and the RPO / RTO needs.

1- System database files must be placed on a dedicated volume, together with their respective transaction log files. Only streaming backups are performed for system databases.

2- Tempdb database files must also be placed on a dedicated volume that will be excluded from SnapManager backup operations.

3- Otherwise, the placement of user databases depends on several factors, such as their size or the storage layout architecture in place. For example, we may encounter cases with small databases that share all their database files on the same volume. We may also encounter cases where we have to deal with the placement of large database files. Indeed, large databases often include different filegroups with several database files. In this case, spreading database files over different LUNs may be a performance best practice, and we have to design the storage with different LUNs that share the same dedicated volume. Also note that the database transaction log files have to be on a separate volume (remember that transaction log files are concerned by streaming backups).

4- In a consolidation scenario, we may have several SQL Server instances on the same server with database files on the same volume, or we may have dedicated volumes for each instance.

Given these constraints, keep in mind that your placement strategy may present some advantages but also some drawbacks:

For example, restoring a database on a dedicated volume from a snapshot will be quicker than using streaming backups, but at the same time we may quickly reach the maximum number of drive letters if we dedicate one or several volumes per database. Imagine that you have 5 databases, each spread over 4 volumes (system, tempdb, data and logs)... I'll let you do the math. A possible solution is to replace drive-letter volume identification with mount points.

At the same time, sharing volumes between several databases makes the in-place restore of a single database more difficult, because in this case we will have to copy data from a mounted snapshot back into the active file system. Moreover, increasing the number of shared files on a single volume may cause timeout issues during the snapshot operation. According to the NetApp Knowledgebase (here), the maximum recommended number of databases on a single volume is 35. If you want to implement a policy that checks this threshold, please see the blog post of my colleague Stéphane Haby. The number of files concerned by a snapshot creation, the size of the volumes and the storage performance are all factors to consider for database file placement. A quick check is sketched below.
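As a quick sketch (not part of the original post, just an illustration based on sys.master_files), here is one way to count databases and database files per volume root and compare it with that recommendation:

-- Sketch: databases and database files per drive / mount point root.
-- Adjust the LEFT() length if your mount point roots are longer than 3 characters.
SELECT LEFT(mf.physical_name, 3) AS volume_root,
       COUNT(DISTINCT mf.database_id) AS nb_databases,
       COUNT(*) AS nb_files
FROM sys.master_files AS mf
GROUP BY LEFT(mf.physical_name, 3)
ORDER BY nb_databases DESC;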

After digesting all these considerations, let me present briefly my context: 3 SQL Server instances on a consolidated server with the following storage configuration. We designed the storage layout as follows

 

blog_30_1_storage_layout_architecture

 

One volume for system databases, one volume for tempdb, one volume for user database data files, one volume for user database log files and finally one volume dedicated for the snapinfo. The snapinfo volume is used by SnapManager to store streaming backups and some metadata

But according to what I said earlier, my configuration has some caveats. Restoring only one database in-place is still possible in this case, but because we are actually sharing volumes between several databases and several SQL Server instances, SnapManager will be forced to copy data from a mounted snapshot back into the file system. We may also face the timeout issue (10 seconds, hard-coded into the VSS framework) during the database IO freeze step. Finally, we cannot exclude issues caused by the limit of 35 databases on a single volume, because this is a consolidated environment. Fortunately, my customer is aware of all these limitations and is waiting for an upgrade of the physical storage layout; we plan to redesign the final storage layout at that time. Until then, the number of databases should not be a problem.

See you!

SCOM: don't forget to set the license key!

Wed, 2015-02-11 07:28

Some months ago, during client testing operations, I installed a version of System Center Operations Manager 2012 on a virtual machine.
Today, I tried to open this version of SCOM to perform some new tests and was surprised to receive a beautiful error message:

b2ap3_thumbnail_Scom_SetLicense1.jpg

In fact, I had never thought to verify whether my version was licensed or not, and as SCOM gives a 180-day evaluation period by default, I had not faced the problem before...
I decided to download the new version, SCOM 2012 R2, as I prefer to have the latest version, and install it.
By default, the installation center tells me that it will upgrade my old version, which is a good point.
After the upgrade, SCOM 2012 R2 raises a warning reminding me not to forget to use the PowerShell cmdlet Set-SCOMLicense to license the current version... This is what I forgot last time!

b2ap3_thumbnail_Scom_SetLicense2.jpg

To check the version of my fresh installation, I opened the Operations Manager Shell and then I ran the following command:

b2ap3_thumbnail_Scom_SetLicense3.jpg

Get-SCOMManagementGroup | select SkuForProduct, skuforlicense, version, timeofexpiration

b2ap3_thumbnail_Scom_SetLicense4.jpg

At this point, I can see that I have an evaluation version of SCOM which will expire on the third of August, in six months.
To avoid the same disappointment as today, I decided to set my license key immediately with the following command:

Set-SCOMLicense -ProductId '?????-?????-?????-?????-?????'

b2ap3_thumbnail_Scom_SetLicense5.jpg

A confirmation is requested before the license key is validated.
I can now run my first query a second time to check that the license key is registered:

b2ap3_thumbnail_Scom_SetLicense6.jpg

Now the license term is Retail and the expiration date is further away than I will ever live ;-)
I can also check the Help/About menu of SCOM to see the same information.

b2ap3_thumbnail_Scom_SetLicense7.jpg

See you!

Security via policies

Tue, 2015-02-10 10:15

 

A few weeks ago, I presented a session on security via policies at "Les journées SQL Server 2014", organized by the French SQL Server User Group (GUSS) in Paris.

b2ap3_thumbnail_presentation.JPG

Exception from executeScript in Alfresco Share

Tue, 2015-02-10 03:00


I haven't had the opportunity to post a new entry about Alfresco on this blog for a long time, so I will fix this! In this blog entry, I will talk about a bug I encountered a few months ago. I resolved it, but so far I had not had the time to share it with you.
 

I. Description of the issue


This bug appears no matter what version of Alfresco is used, regardless of the components that are installed, and so on... So what is this bug? In fact, this bug isn't blocking anything: it has no impact on daily work. However, it fills up the Alfresco log files very quickly, which can be problematic if you are an administrator searching for information in these log files! Indeed, each time a user accesses a page of Alfresco, between 10 and 50 Java exceptions are generated (always the same one), which creates gigabytes of log files in minutes or hours. Here is the exception I'm talking about:
 

...
Jul 08, 2014 10:42:16 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 95898 ms
2013-07-08 10:45:02,300 INFO [web.site.EditionInterceptor] [http-apr-8080-exec-1] Successfully retrieved license information from Alfresco.
2013-07-08 10:45:02,417 ERROR [extensions.webscripts.AbstractRuntime] [http-apr-8080-exec-3] Exception from executeScript - redirecting to status template error: 06080001 Unknown method specified to remote store API: has
  org.springframework.extensions.webscripts.WebScriptException: 06080001 Unknown method specified to remote store API: has
  at org.alfresco.repo.web.scripts.bean.BaseRemoteStore.execute(BaseRemoteStore.java:326)
  at org.alfresco.repo.web.scripts.RepositoryContainer$3.execute(RepositoryContainer.java:426)
  at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:433)
  at org.alfresco.repo.web.scripts.RepositoryContainer.transactionedExecute(RepositoryContainer.java:495)
  at org.alfresco.repo.web.scripts.RepositoryContainer.transactionedExecuteAs(RepositoryContainer.java:533)
  at org.alfresco.repo.web.scripts.RepositoryContainer.executeScript(RepositoryContainer.java:276)
  at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:377)
  at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:209)
  at org.springframework.extensions.webscripts.servlet.WebScriptServlet.service(WebScriptServlet.java:118)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.alfresco.web.app.servlet.GlobalLocalizationFilter.doFilter(GlobalLocalizationFilter.java:61)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1002)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
  at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1813)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
...

 

The first time I encountered this exception, it was on an Alfresco v4.x installation, up and running for some years with a lot of extensions/customizations (mostly .AMP files). If you need more information about AMPs (Alfresco Module Packages), they are the best way to extend Alfresco; please take a look at some of my older blog posts to find out how to create this kind of module!
 

I had this exception on more than one Alfresco server, and because of that I first thought that it came from an AMP... Therefore I went through all these extensions, but despite hours of research I found nothing.


II. How to replicate the issue


I tried to replicate this issue with a fresh installation of Alfresco, same version, same extensions, and so on... but I wasn't able to do so at first. Finally, one day, I found out that there was something strange with the Alfresco servers on which the Java exceptions appeared: the "Sites" folder wasn't there. Indeed, after installing a new Alfresco server, deleting the default site ("Sample: Web Site Design Project") and deleting the "Sites" folder from the Repository browser, the exception magically appeared in the Alfresco log files... Now that we know where this issue comes from, it's quite easy to replicate it:

  1. Install a new Alfresco server with the same version from the bundle executable/binaries (quicker)
  2. Start the Alfresco server and open the Alfresco Share UI (http://localhost:8080/share) using the admin account
  3. Navigate to the Sites finder (http://localhost:8080/share/page/site-finder)
  4. Click on "Search" to display all existing sites (only the default one is present: "Sample: Web Site Design Project")
  5. Click on "Delete" to delete the swsdp site
  6. Navigate to the Repository (http://localhost:8080/share/page/repository)
  7. Remove the "Sites" folder on the Repository page (/Company Home/Sites)
  8. Refresh the page and take a look at your logs


After doing that, you should be able to see a lot of exceptions like the one described above. Issue replicated!


III. How to solve the issue


Being able to replicate an issue is good, but knowing how to solve it is better!

If the "Sites" folder has been deleted in the first place, it was because the Alfresco Sites weren't used at all. Therefore, the simplest solution to resolve this issue was to get the "Sites" folder back. But it's not that easy because this folder has a particular type, some particular attributes, aso... You can't just create a new folder, rename it "Sites" and hope that it will work ;). Starting from here, what you can do to solve this issue is:

  1. Restore the "Sites" folder using a backup
  2. Replicate the "Sites" folder from another Alfresco server


If you don't have a way to restore the "Sites" folder, as was my case (after some months, no backup left), here is what you can do to fix this issue:
 

Let's say that the Alfresco server, where the "Sites" folder doesn't exist anymore, is named "A". Please take a look at the end of this blog entry for some screenshots that may help you.

  1. Install a new Alfresco server with the same version as "A" from the bundle executable/binaries. This can be on your local machine. Let's name this Alfresco server "B"
  2. Start the Alfresco server "B" and open the Alfresco Share UI (http://localhost:8080/share) using the admin account
  3. Navigate to the Sites finder (http://localhost:8080/share/page/site-finder)
  4. Click on "Search" to display all existing sites (only the default one is present: "Sample: Web Site Design Project")
  5. Click on "Delete" to delete the swsdp site
  6. Navigate to the Repository (http://localhost:8080/share/page/repository) (DON'T delete the "Sites" folder)
  7. Configure a replication target on "B" to point to "A" (take a look at the Alfresco doc: http://docs.alfresco.com/4.1/tasks/adminconsole-replication-transfertarget.html)
  8. Enable the replication:
    1. Add the Alfresco Share url and the RepositoryId of "A" into the share-config-custom.xml file of "B" (take a look at the Alfresco doc: http://docs.alfresco.com/4.1/tasks/adminconsole-replication-lockedcontent.html)
    2. Add the "replication.enabled=true" into the alfresco-global.properties file of "B" (take a look at the Alfresco doc: http://docs.alfresco.com/4.1/tasks/replication-share.html)
    3. Restart "B" for the changes to be taken into account by Alfresco
  9. Configure a replication job on "B" to replicate the "Sites" folder from "B" to "A" (http://localhost:8080/share/page/console/admin-console/replication-jobs)
  10. Run the replication job on "B"


Configure the Replication Target on B (step 7 - create a folder named "TransfertToA" and edit its permissions):

CreateReplicationTarget.png


Find the Repository ID of A (step 8.1):

FindRepositoryId.png


Configure the share-config-custom.xml file of B (step 8.1):

EnableTheReplication.png


Once the replication job has run on "B", the exceptions will disappear from the log files of "A". I didn't go deeper so I don't really know if you can create new sites using this newly imported "Sites" folder but if you removed this folder in the first place, I would guess that you don't really need it ;).
 

Thank you for reading this post and I hope it will help. If you need more information, don't hesitate to leave a comment below. See you soon for more blogs!
 

 

Never gather WORKLOAD stats on Exadata...

Mon, 2015-02-09 01:56

For Exadata, Oracle has introduced an 'EXADATA' mode which sets a high transfer rate (IOTFRSPEED, as in NOWORKLOAD statistics) and sets MBRC (as in WORKLOAD statistics). Those values are set rather than gathered because all the SmartScan optimization done at the storage cell level, which makes multiblock reads less expensive, is difficult to measure from the database.
Here I will explain what I stated in a previous blog: direct-path reads are not counted as multiblock reads for the MBRC system statistic. And direct-path reads should be the main I/O path on Exadata, as you probably bought that machine to benefit from SmartScan.

With direct-path reads

On a test database that has no activity, I’m creating a 1000 blocks table. My goal is to gather WORKLOAD system statistics during a simple table full scan on that table, and see how it calculates SREADTIM, MREADTIM and MBRC.

SQL> connect demo/demo
Connected.

SQL> drop table DEMO;
Table dropped.

SQL> create table DEMO pctfree 99 as select rpad('x',1000,'x') n from dual connect by level <=1000;
Table created.
Then I run a simple select between the calls to ‘start’ and ‘stop’ procedures of the dbms_stats WORKLOAD system stats gathering.
SQL> exec dbms_stats.gather_system_stats('start');
PL/SQL procedure successfully completed.

SQL> connect demo/demo
Connected.

SQL> select count(*) from DEMO;

  COUNT(*)
----------
      1000

I check the physical read statistics (this is why I have reconnected my session, so that I can query v$mystat without computing a delta):
SQL> select name,value from v$mystat join v$statname using(statistic#) where (name like 'phy%' or name like 'cell%') and value>0;

NAME                                                              VALUE
------------------------------------------------------------ ----------
physical read total IO requests                                      22
physical read total multi block requests                              7
physical read total bytes                                       8306688
cell physical IO interconnect bytes                             8306688
physical reads                                                     1000
physical reads direct                                              1000
physical read IO requests                                            15
physical read bytes                                             8192000
cell scans                                                            1
cell blocks processed by cache layer                               1000
cell blocks processed by txn layer                                 1000
cell blocks processed by data layer                                1000
cell physical IO bytes eligible for predicate offload           8192000
cell physical IO interconnect bytes returned by smart scan       130760
cell IO uncompressed bytes                                      8192000
I’ve read 1000 blocks in 15 i/o calls so I'm sure it is multiblock reads. All of them (1000 x 8k) was eligible for SmartScan and those 1000 blocks have been processed by the storage cell.

If you wonder why I have only 7 'physical read total multi block requests', it's because it accounts only for the 'full' multiblock reads - not those that are limited by an extent boundary. See here for that analysis.

If you wonder why I have only 22 'physical read total IO requests', then I don't have the answer. The sql_trace shows only the 15 'direct path read' waits, and dbms_stats counts only the 'physical read IO requests'. If you have any idea, please comment.

I stop my WORKLOAD statistics gathering:
SQL> exec dbms_stats.gather_system_stats('stop');
PL/SQL procedure successfully completed.
And check the system statistics that have been set:

SQL> select * from sys.aux_stats$;

SNAME           PNAME           PVAL1
--------------- ---------- ----------
SYSSTATS_INFO   STATUS
SYSSTATS_INFO   DSTART
SYSSTATS_INFO   DSTOP
SYSSTATS_INFO   FLAGS               1
SYSSTATS_MAIN   CPUSPEEDNW       2300
SYSSTATS_MAIN   IOSEEKTIM          10
SYSSTATS_MAIN   IOTFRSPEED       4096
SYSSTATS_MAIN   SREADTIM
SYSSTATS_MAIN   MREADTIM         .151
SYSSTATS_MAIN   CPUSPEED         2300
SYSSTATS_MAIN   MBRC
SYSSTATS_MAIN   MAXTHR
SYSSTATS_MAIN   SLAVETHR
I have no SREADTIM, which is expected as I've done only multiblock reads. I have an MREADTIM, but MBRC is not set.

With conventional (aka buffered) reads

Let's do the same after disabling serial direct-path reads:

SQL> alter session set "_serial_direct_read"=never;
Session altered.
I do the same as before, but now my session stats show only conventional reads:
NAME                                                              VALUE
------------------------------------------------------------ ----------
physical read total IO requests                                      44
physical read total multi block requests                             28
physical read total bytes                                       8192000
cell physical IO interconnect bytes                             8192000
physical reads                                                     1000
physical reads cache                                               1000
physical read IO requests                                            44
physical read bytes                                             8192000
physical reads cache prefetch                                       956

and here are the gathered stats:
SNAME           PNAME           PVAL1
--------------- ---------- ----------
SYSSTATS_INFO   STATUS
SYSSTATS_INFO   DSTART
SYSSTATS_INFO   DSTOP
SYSSTATS_INFO   FLAGS               1
SYSSTATS_MAIN   CPUSPEEDNW       2300
SYSSTATS_MAIN   IOSEEKTIM          10
SYSSTATS_MAIN   IOTFRSPEED       4096
SYSSTATS_MAIN   SREADTIM
SYSSTATS_MAIN   MREADTIM         .028
SYSSTATS_MAIN   CPUSPEED         2300
SYSSTATS_MAIN   MBRC               23
SYSSTATS_MAIN   MAXTHR
SYSSTATS_MAIN   SLAVETHR

Now the MBRC is set with the gathered value.

This proves that MBRC is set only for conventional multiblock reads. Direct-path reads are not accounted for.

Conclusion

If you are on Exadata, you probably want to benefit from SmartScan. Then you probably want the CBO to choose a FULL TABLE SCAN, which will do direct-path reads for large tables (provided they don't have a lot of updated buffers in the SGA). If you gather WORKLOAD statistics, they will set MBRC without accounting for those direct-path reads, and it will probably be set lower than the actual average multiblock read (which - for direct-path reads - is close to the db_file_multiblock_read_count default or set value).
This is the reason why Oracle introduced the EXADATA mode: it sets MBRC from the db_file_multiblock_read_count value, as sketched below. It also sets IOTFRSPEED to a high value, because a gathered MREADTIM would probably be very low - lower than SREADTIM - thanks to SmartScan, and the CBO ignores values where MREADTIM is less than SREADTIM.
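As a sketch (to be run by a user with the privileges to gather system statistics), the EXADATA mode is simply another gathering mode of dbms_stats, available in recent versions/patchsets:

-- Sketch: set system statistics in 'EXADATA' mode, then check the resulting values.
exec dbms_stats.gather_system_stats('EXADATA');
select sname, pname, pval1 from sys.aux_stats$;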

An alternative to the EXADATA mode is to set those values as NOWORKLOAD statistics and keep db_file_multiblock_read_count set (see the sketch below). You will have the same behavior, because the CBO uses db_file_multiblock_read_count when it is set and there are no MBRC system stats. But the danger is that if someone resets db_file_multiblock_read_count (and I often advise keeping defaults), then the CBO will use a value of 8, which will probably increase the cost of full table scans too much.
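Here is a sketch of that alternative (the values are only examples, close to what the EXADATA mode would set; always validate them for your own system):

-- Reset to default NOWORKLOAD statistics, then set a high transfer rate manually.
exec dbms_stats.delete_system_stats;
exec dbms_stats.set_system_stats('IOSEEKTIM', 10);
exec dbms_stats.set_system_stats('IOTFRSPEED', 200000);
-- and keep db_file_multiblock_read_count set at the instance level,
-- so that the CBO uses it in the absence of an MBRC system statistic.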

All formulas are here with a script that shows what is used by the CBO.

Never say never

Well, that blog post title is too extreme because:

@FranckPachot @kevinclosson @fritshoogland @moustafa_dba @JLOracle never say never ;-)

— Christian Antognini (@ChrisAntognini) February 8, 2015

So I should say:
Never gather WORKLOAD stats on Exadata... except if your workload is not an Exadata optimized one.
If you are using Exadata for OLTP, then yes, you can gather WORKLOAD statistics as they probably fit OLTP behaviour. But in any case, always check the gathered stats and see if they are relevant.

Oracle multitenant dictionary: rowcache

Sun, 2015-02-08 14:23

I've not finished my investigation of the 12c multitenant dictionary. Here, I'm checking how metadata links are managed by the dictionary cache (aka rowcache).

As I did previously, I'll create a metadata link function in my lab environment (this is not for production: for the moment, metadata/object links are supported only for Oracle objects).

SQL> connect / as sysdba
Connected.

SQL> create or replace function DEMO_MDL_FUNCTION sharing=metadata
  2    return varchar2 as
  3    begin return 'XXXXXXXXXX'; end;
  4  /
Function created.

SQL> alter session set "_oracle_script"=true container=PDB1;
Session altered.

SQL> create or replace function DEMO_MDL_FUNCTION sharing=metadata
  2    return varchar2 as
  3    begin return 'XXXXXXXXXX'; end;
  4  /
Function created.
It's not difficult. You set _oracle_script to true and run the DDL in all containers (CDB$ROOT and PDBs) with the sharing=metadata undocumented syntax.

I'm checking object_id in both containers:

SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> select object_id,con_id from cdb_objects where object_name='DEMO_MDL_FUNCTION';

 OBJECT_ID     CON_ID
---------- ----------
     92013          1
     92371          3
And I flush the shared pool in order to flush the rowcache component:
SQL> alter system flush shared_pool;
System altered.

SQL> select * from v$sgastat where name like 'row cache';

POOL         NAME                            BYTES     CON_ID
------------ -------------------------- ---------- ----------
shared pool  row cache                     8640160          1

Now I connect to the PDB and call the function:

SQL> select DEMO_MDL_FUNCTION from dual;

DEMO_MDL_FUNCTION
---------------------------------------
XXXXXXXXXX

And then back to the CDB$ROOT I check what I have in the rowcache:

SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> select cache_name,con_id,utl_raw.cast_to_varchar2(key) from v$rowcache_parent
  2  where utl_raw.cast_to_varchar2(key) like chr(37)||' DEMO_MDL_FUNCTION '||chr(37) order by 1,3,2;

CACHE_NAME     CON_ID UTL_RAW.CAST_TO_VARC
---------- ---------- --------------------
dc_objects          1       DEMO_MDL_FUNCT
dc_objects          1       DEMO_MDL_FUNCT
dc_objects          3       DEMO_MDL_FUNCT
dc_objects          3       DEMO_MDL_FUNCT

I have two entries in each container. I've used the object only within the PDB (con_id=3), but I also have an entry for the CDB$ROOT (con_id=1). Is that a problem? I don't know. The dictionary cache is not a large component of the shared pool, so the size overhead is probably not an issue. However, I'll have to investigate the consequences for dictionary cache contention; a starting point is sketched below.
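A starting point for that investigation (not part of this test, just a sketch on v$rowcache) could be to compare the dictionary cache activity per container:

-- Sketch: dictionary cache gets/misses for object entries, per container.
select con_id, parameter, gets, getmisses, modifications
  from v$rowcache
 where parameter = 'dc_objects'
 order by con_id;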

Something interesting is that the 'row cache' component of the shared pool is reported only for CDB$ROOT:

SQL> select * from v$sgastat where name like 'row cache';

POOL         NAME                            BYTES     CON_ID
------------ -------------------------- ---------- ----------
shared pool  row cache                     8640160          1

SQL> alter session set container=PDB1;
Session altered.

SQL> select * from v$sgastat where name like 'row cache';
no rows selected

Then, do we have an entry for CDB$ROOT in addition to the PDB ones in that CDB$ROOT shared pool? Let's dump it:

SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> alter session set events 'immediate trace name row_cache level 8';
Session altered.
then get the trace file name:
SQL> column tracefile new_value tracefile
SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

TRACEFILE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ora_6125.trc
and grep my function name with 4 rows before and 18 rows after:
SQL> host grep -B 4 -A 18 "name=DEMO_MDL_FUNCTION" &tracefile
The result is here:
BUCKET 26928:
  row cache parent object: addr=0x6cd8c4e0 cid=8(dc_objects) conid=1 conuid=1
  hash=102a692f typ=11 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  objectno=92013 ownerid=0 nsp=1
  name=DEMO_MDL_FUNCTION
  own=0x6cd8c5b0[0x6cd8c5b0,0x6cd8c5b0] wat=0x6cd8c5c0[0x6cd8c5c0,0x6cd8c5c0] mode=N
  status=VALID/-/-/-/-/-/-/-/-
  set=0, complete=FALSE
  data=
  00000000 45440011 4d5f4f4d 465f4c44 54434e55 004e4f49 00000000 00000000
  00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 0001676d 0000ffff ffffffff 02737808 08271508
  08027378 78063115 15080273 00010631 00510000 00000000 00000000 00000000
  00000000 00000006 00000000 00000000 00000000 00000000 7a7871ab b01c11e7
  61dbc633 23ade680 00000001 00000000 00000000 102a692f 6cd8c4e0 00000000
  7ec6d318 00000000 7ec6d318 00000000 00000001 9a2a6093 6cd8c4e0 00000000
  7ec64958 00000000 7ec64958 00000000
  BUCKET 26928 total object count=1
--
BUCKET 49724:
  row cache parent object: addr=0x61783968 cid=8(dc_objects) conid=3 conuid=2451138288
  hash=f5dac23b typ=11 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  objectno=92371 ownerid=0 nsp=1
  name=DEMO_MDL_FUNCTION
  own=0x61783a38[0x61783a38,0x61783a38] wat=0x61783a48[0x61783a48,0x61783a48] mode=N
  status=VALID/-/-/-/-/-/-/-/-
  set=0, complete=FALSE
  data=
  00000000 45440011 4d5f4f4d 465f4c44 54434e55 004e4f49 00000000 00000000
  00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 000168d3 0000ffff ffffffff 02737808 08271508
  08027378 78073115 15080273 00010731 00510000 00000000 00000000 00000000
  00000000 00000006 00000000 00000000 00000000 00000000 7a7871ab b01c11e7
  61dbc633 23ade680 00000001 00000000 00000000 f5dac23b 61783968 00000000
  7ecc63d8 00000000 7ecc63d8 00000000 00000001 f452019d 61783968 00000000
  7ec059f8 00000000 7ec059f8 00000000
  BUCKET 49724 total object count=1
--
BUCKET 414:
  row cache parent object: addr=0x61783968 cid=8(dc_objects) conid=3 conuid=2451138288
  hash=f5dac23b typ=11 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  objectno=92371 ownerid=0 nsp=1
  name=DEMO_MDL_FUNCTION
  own=0x61783a38[0x61783a38,0x61783a38] wat=0x61783a48[0x61783a48,0x61783a48] mode=N
  status=VALID/-/-/-/-/-/-/-/-
  set=0, complete=FALSE
  data=
  00000000 45440011 4d5f4f4d 465f4c44 54434e55 004e4f49 00000000 00000000
  00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 000168d3 0000ffff ffffffff 02737808 08271508
  08027378 78073115 15080273 00010731 00510000 00000000 00000000 00000000
  00000000 00000006 00000000 00000000 00000000 00000000 7a7871ab b01c11e7
  61dbc633 23ade680 00000001 00000000 00000000 f5dac23b 61783968 00000000
  7ecc63d8 00000000 7ecc63d8 00000000 00000001 f452019d 61783968 00000000
  7ec059f8 00000000 7ec059f8 00000000
  BUCKET 414 total object count=1
--
BUCKET 24724:
  row cache parent object: addr=0x6cd8c4e0 cid=8(dc_objects) conid=1 conuid=1
  hash=102a692f typ=11 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  objectno=92013 ownerid=0 nsp=1
  name=DEMO_MDL_FUNCTION
  own=0x6cd8c5b0[0x6cd8c5b0,0x6cd8c5b0] wat=0x6cd8c5c0[0x6cd8c5c0,0x6cd8c5c0] mode=N
  status=VALID/-/-/-/-/-/-/-/-
  set=0, complete=FALSE
  data=
  00000000 45440011 4d5f4f4d 465f4c44 54434e55 004e4f49 00000000 00000000
  00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 0001676d 0000ffff ffffffff 02737808 08271508
  08027378 78063115 15080273 00010631 00510000 00000000 00000000 00000000
  00000000 00000006 00000000 00000000 00000000 00000000 7a7871ab b01c11e7
  61dbc633 23ade680 00000001 00000000 00000000 102a692f 6cd8c4e0 00000000
  7ec6d318 00000000 7ec6d318 00000000 00000001 9a2a6093 6cd8c4e0 00000000
  7ec64958 00000000 7ec64958 00000000
  BUCKET 24724 total object count=1
Do we learn something else from that? Not a lot, except that we have actually dumped different entries for both containers, and the object_ids match.

So it seems that sharing the dictionary does not share the dictionary cache. However, I'm not sure whether the overhead is significant or not. But if you want to play with those object/metadata links, don't forget to flush the shared pool when you see something weird (such as here).

Vsphere 6.0 improvements for Windows Failover Clusters

Wed, 2015-02-04 01:57

A couple of days ago, VMware announced vSphere 6.0. I guess many of our customers have been waiting for this new release, and I will probably see this new version rolled out underneath their SQL Server virtual machines in the next few months. If you missed this event, you can still register here.

I'm not a VMware expert, but vSphere 6.0 seems to be a storage-oriented release with a lot of improvements and new functionalities (including Virtual SAN 6.0). Among the long list of improvements, one may particularly interest many of our customers: using vMotion with pRDM disks will become possible. As a reminder, pRDM disks are a prerequisite for Microsoft failover cluster virtual machines (in CAB scenarios).

There are also other interesting features that will probably benefit our SQL Server instances, such as storage IOPS reservation and application-level backup and restore of SQL Server (as well as Exchange and SharePoint) with the new vSphere Data Protection capabilities. I will surely blog about them in the coming months...

Happy virtualization!

IOUG Collaborate #C15LV

Tue, 2015-02-03 01:48

The IOUG - Independent Oracle Users Group - has a great event each year: COLLABORATE. This year it takes place April 12-16, 2015 at The Mandalay Bay Resort & Casino in Las Vegas.

I'll be a speaker and a RAC Attack Ninja as well.

IOUG COLLABORATE provides all the real-world technical training you need – not sales pitches. The IOUG Forum presents hundreds of educational sessions on Oracle technology, led by the most informed and accomplished Oracle users and experts in the world, bringing more than 5,500 Oracle technology and applications professionals to one venue for Oracle education, customer exchange and networking.

SQL Server: Online index rebuild & minimally logged operations

Fri, 2015-01-30 01:37

A couple of days ago, I encountered an interesting case with a customer concerning an index rebuild operation in a data warehouse environment. Let me introduce the situation: a "usual DBA day" with an almost usual error message found in your dedicated mailbox: "The transaction log for database 'xxx' is full". After checking the concerned database, I noticed that its transaction log had grown and filled the entire volume. At the same time, I also identified the root cause of the problem: an index rebuild operation performed during the night on a big index (approximately 20 GB in size) of a fact table. On top of that, the transaction log had reached 60 GB before raising the error message.

As you know, in data warehouse environments the database recovery model is usually set to either SIMPLE or BULK_LOGGED to minimize the logging of bulk activity, and of course the concerned database meets this requirement. According to the Microsoft documentation, we could expect minimally logged records for index rebuild operations (ALTER INDEX REBUILD), regardless of the offline / online mode used to rebuild the index. So why has the transaction log grown so heavily in this case?

To get an answer, we first have to take a look at the index rebuild tool used by my customer: the OLA scripts with the INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE values for the FragmentationHigh parameter. Don't worry, the OLA scripts work perfectly and the truth is out there :-) In my customer's context, rebuilding indexes online was permitted because the edition of the concerned SQL Server instance was Enterprise, and this is precisely what we have to investigate here.

Let me demonstrate with a pretty simple example. In my lab environment I have a SQL Server 2014 instance in Enterprise edition. This instance hosts the well-known AdventureWorks2012 database with a dbo.bigTransactionHistory_rs1 table (this table is derived from the original script provided by Adam Machanic).

Here is the current size of the AdventureWorks2012 database:

 

select name as logical_name,
       physical_name,
       size / 128 as size_mb,
       type_desc,
       cast(FILEPROPERTY(name, 'SpaceUsed') * 100. / size as decimal(5, 2)) as [space_used_%]
from sys.database_files;

 

blog_28_1_database_situation

 

and here is the size of the dbo.bigTransactionHistory_rs1 table:

 

exec sp_spaceused @objname = N'dbo.bigTransactionHistory_rs1';
go

 

blog_28_2_bigTransactionHistory_rs1_situation

 

Total used size: 1.1 GB

Because we are in the SIMPLE recovery model, I will momentarily disable the automatic checkpoint process, using trace flag 3505, in order to have time to look at the log records inside the transaction log:

 

dbcc traceon(3505, -1);


Case 1: ALTER INDEX REBUILD OFFLINE


alter index pk_bigTransactionHistory on dbo.bigTransactionHistory_rs1
rebuild with (online = off, maxdop = 1);
go

 

Let's check the size of the transaction log of the AdventureWorks2012 database:

 

blog_28_3_database_situation_after_rebuild_offline

 

Case 2: ALTER INDEX REBUILD ONLINE


-- to initiate a tlog truncation before rebuilding the same index online
checkpoint;

alter index pk_bigTransactionHistory on dbo.bigTransactionHistory_rs1
rebuild with (online = on, maxdop = 1);
go

 

Let's check again the size of the transaction log of the AdventureWorks2012 database:

 

blog_28_3_database_situation_after_rebuild_online

 

It is clear that there is an obvious difference in the amount of transaction log generated by each operation:

- offline: 4096 MB * 0.35% = 14 MB
- online: 4096 MB * 5.63% = 230 MB

Let's stay curious and have a deeper look at the records written to the transaction log in each mode, using the undocumented function sys.fn_dblog() as follows:

 

select COUNT(*) as nb_records,
       SUM([Log Record Length]) / 1024 as kbytes
from sys.fn_dblog(NULL, NULL);
go

 

Offline: blog_28_4_tlog_detail_offline_mode
Online: blog_28_4_tlog_detail_online_mode

 

As expected, we notice many more records with the online index rebuild operation compared to the offline one (x21).

Let's continue looking at the operations performed by SQL Server during the index rebuild operation in both cases:

 

select Operation,
       COUNT(*) as nb_records,
       SUM([Log Record Length]) / 1024 as kbytes
from sys.fn_dblog(NULL, NULL)
group by Operation
order by kbytes desc;
go

 

Offline: blog_28_5_tlog_detail_offline_mode_2
Online: blog_28_5_tlog_detail_online_mode_2

 

The above picture is very interesting because we again see an obvious difference between the two modes. For example, if we consider the operations performed in the second case (on the right), some of them do not concern bulk activity, such as LOP_MIGRATE_LOCKS, LOP_DELETE_ROWS, LOP_DELETE_SPLITS and LOP_MODIFY_COLUMNS against an unknown allocation unit, which probably concerns the new structure. At this point I can't confirm it (I don't show all the details of these operations here; I let you see for yourself). Furthermore, in the first case (on the left), the majority of operations are LOP_MODIFY_OPERATION entries on the PFS page (context).

Does it mean that the online mode doesn't use the minimal logging mechanism for the whole rebuild process? I found an interesting response in this Microsoft KB, which confirms my suspicion:

Online Index Rebuild is a fully logged operation on SQL Server 2008 and later versions, whereas it is minimally-logged in SQL Server 2005. The change was made to ensure data integrity and more robust restore capabilities.

However, I guess we don't have the same behavior as with the FULL recovery model here. Indeed, there is still a difference between the SIMPLE / BULK_LOGGED and FULL recovery models in terms of the amount of log records generated. Here is a picture of the transaction log size after rebuilding the big index online in the FULL recovery model in my case:

 

blog_28_3_database_situation_after_rebuild_online_full_recovery

 

Ouch! 230 MB (SIMPLE / BULK_LOGGED) vs 7 GB (FULL). It is clear that using the FULL recovery model with online index rebuild operations has a huge impact on the transaction log compared to the SIMPLE / BULK_LOGGED recovery models. So the solution in my case consisted in switching to offline mode, or at least limiting the online rebuilds of the concerned index; a way to keep an eye on the log during such maintenance is sketched below.
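As a sketch (not the method used at the customer, just an illustration), you can monitor the transaction log while such a maintenance job runs, and fall back to the offline rebuild for the big index:

-- Sketch: transaction log usage of the current database (sys.dm_db_log_space_usage, SQL Server 2012+).
SELECT DB_NAME(database_id) AS database_name,
       total_log_size_in_bytes / 1024 / 1024 AS log_size_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- Offline alternative for the large index (minimally logged in SIMPLE / BULK_LOGGED);
-- SORT_IN_TEMPDB is only an example option, to be validated against your tempdb sizing.
ALTER INDEX pk_bigTransactionHistory ON dbo.bigTransactionHistory_rs1
REBUILD WITH (ONLINE = OFF, SORT_IN_TEMPDB = ON, MAXDOP = 1);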

Happy maintenance!

Birst: an attractive "all-in-one" Business Intelligence solution

Thu, 2015-01-29 09:02

The review of the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms has revealed a new challenger that could become one of the leaders of this market: Birst, an interesting "all-in-one" BI solution.

 

Gartner_magic_quadrant_bi_2014_20150123-152555_1.jpg

 

Birst is an American company based in Silicon Valley, near San Francisco. Its SaaS (Software as a Service) BI solution mixes simplicity and power.

Just a little reminder before going forward: a complete Business Intelligence solution has two kinds of tools:

 

b2ap3_thumbnail_BI_schema.jpg


  • Back-end BI tools: these tools are used to load and transform the data before using it for reporting
  • Front-end BI tools: these tools are used by end user to do the reporting (creating reports, dashboards, drill down reports …)

So what are the assets of Birst’s BI Solution?


Regarding the ergonomics of the solution

All in one: all the leading BI solutions on the market use more than one application for their back-end and front-end tools. Birst is the only one to provide all the tools in a single interface.

Online product: Birst is an online solution. No installation on a device is needed. You can save your data, reports, and dashboards in your own "space" (in the example above, the name of the space is "SupremEats DEV") located in the cloud.

 

Regarding the back-end tool (ETL, semantic layer, …)

Cloud connectivity. As a cloud-based solution, Birst can load data from the cloud using special connectors. So you can very easily mix your own business data with cloud data.

Task automation. The Birst admin application (used to design the data model) automates a lot of tasks, especially regarding the generation of the star schema or business model:

  • Automatic generation of the joins between the different fact and dimension tables
  • Automatic generation of the time dimension (year, quarter, month, ...)
  • Automatic generation of the measures with context (e.g. sum of revenue, max revenue, min revenue, ...)

Reduced development time. Data modelling tasks are very time consuming in a Business Intelligence project, and automating them can greatly decrease development time. Of course, Birst also offers the possibility to create or transform your own schema like a traditional ETL tool.

 

Regarding the front-end tool (report, dashboard, navigation)

Intuitive interface. For the end-user, the simplicity of the report / dashboard creation interface is another advantage of this solution. All the creation steps are guided by an assistant. If you compare it with other products like QlikView or Webi from SAP Business Objects, the interface is easy to learn:


b2ap3_thumbnail_Qlik_BO_Birst_Comparison.jpg


Powerful reporting tool. However, it remains a high-performance product with the possibility of creating professional and complex reports or dashboards:

 

b2ap3_thumbnail_Birst_reports_examples_1.jpgb2ap3_thumbnail_Birst_reports_examples_2.jpgb2ap3_thumbnail_Birst_reports_examples_3.jpg

Multiple saving format options: dashboards and reports can be saved in different formats (PDF, XLS export, Microsoft PowerPoint presentation, ...).

Scheduling options for the end user: the end user has the possibility to publish his reports using scheduling options.

b2ap3_thumbnail_Birst_scheduling_options.jpg

Conclusion

Birst could become a future leader in BI solutions. Combining simplicity and power, Birst can appeal to many mid-sized enterprises, or business units within large enterprises, with small or medium-sized BI budgets.