
Yann Neuhaus


Using VSS snapshots with SQL Server - part I

Wed, 2015-05-13 09:55

 

This is probably the start of a series of blog posts about some thoughts concerning VSS snapshots with database servers. Let's begin with this first story:

Some time ago, I implemented a backup strategy at one of my customers based on FULL / DIFF and log backups. It ran without issues for a long time, but one day the customer called to tell me that, for a few days, the differential backup had not been working anymore and was failing with the following error message:

 

Msg 3035, Level 16, State 1, Line 1
Cannot perform a differential backup for database "demo", because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE, omitting the WITH DIFFERENTIAL option.
Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.

 

After looking at the SQL Server error log, I was able to find some characteristic entries:

 

I/O is frozen on database demo. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

...

I/O was resumed on database demo. No user action is required.

 

Just in case, I asked: did you implement some kind of snapshot of your database server? And indeed, the problem came from the implementation of the Veeam backup software for bare-metal recovery purposes. In fact, after checking out the Veeam backup software user guide, I noticed that my customer had forgotten to switch the transaction log option to "perform backup only" in the application-aware image processing method.

This is a little detail that makes all the difference here. Indeed, in this case, Veeam backup software relies on the VSS framework, and using the process transaction log option doesn't preserve the chain of full/differential backup files and transaction logs. For those who like internals: you can interact with the VSS writers by specifying some options during the initialization of the backup dialog. The requestor may configure the VSS_BACKUP_TYPE option by using the IVssBackupComponents interface and its SetBackupState method.

In this case, configuring "perform backup only" means that Veeam backup software will tell the SQL writer to use the option VSS_BT_COPY rather than VSS_BT_FULL, in order to preserve the backup chain of the databases. Other specific tools probably behave the same way, so you will have to check out each related user guide.

Let's demonstrate the kind of issue you may face in this case.

First let’s perform a full database backup as follows:

 

BACKUP DATABASE demo
TO DISK = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\demo.bak'
WITH INIT, STATS = 10;

 

Next, let's take a snapshot. If you take a look at the SQL Server error log, you will find the related entries that concern the I/O freeze and I/O resume operations for your databases.

Moreover, there is another way to retrieve snapshot events. Let's have a look at the msdb.dbo.backupset table: you can identify a snapshot by referring to the is_snapshot column value.

 

USE msdb;
GO

SELECT
       backup_start_date,
       backup_finish_date,
       database_name,
       backup_set_id,
       type,
       database_backup_lsn,
       differential_base_lsn,
       checkpoint_lsn,
       first_lsn,
       last_lsn,
       is_snapshot
FROM msdb.dbo.backupset
WHERE database_name = 'demo'
ORDER BY backup_start_date DESC;

 

[Screenshot: backup history from msdb.dbo.backupset]
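
Then, let's try to run a differential backup. As an illustration only (the destination path below is just an example based on the full backup location used earlier, not necessarily the exact command used at the customer site):

-- differential backup based on the last full database backup
BACKUP DATABASE demo
TO DISK = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\demo_diff.bak'
WITH DIFFERENTIAL, INIT, STATS = 10;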

 

… and this time the backup failed with the following error message:

 

Msg 3035, Level 16, State 1, Line 1
Cannot perform a differential backup for database "demo", because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE, omitting the WITH DIFFERENTIAL option.
Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.

 

In fact, the differential database backup relies on the last full database backup (the one with the most recent database_backup_lsn value), which is the snapshot here and therefore not a valid base backup in this case.

Probably the best advice I can provide here is to double-check potential conflicts between your existing backup processes and additional mechanisms like VSS snapshots. The good news is that another of my customers using Veeam backup software was aware of this potential issue, but we had to deal with another interesting issue there. I will discuss it in the next blog post dedicated to VSS snapshots.

SharePoint 2016: new alternative to InfoPath Forms

Wed, 2015-05-13 01:26

Microsoft announced in January 2015 that it was the END OF INFOPATH and that the 2013 version would be the last one. However, Microsoft has since indicated that the InfoPath 2013 application will work with SharePoint Server 2016.
Following the evolution of users' needs, Microsoft decided InfoPath was no longer suited for the job; that's why Microsoft won't release a new version but will only propose an alternative.

 

 

 

What is InfoPath Forms?

InfoPath is used to create forms to capture information and save the contents as a file on a PC or on a web server when hosted on SharePoint. InfoPath forms can submit to SharePoint lists and libraries, and submitted instances can be opened from SharePoint using InfoPath Filler or third-party products.

 

 

InfoPath provides several features:

  • Rules
  • Data validation
  • Conditional Formatting
  • XPath Expression and Functions
  • Connection to external Datasources: SQL, Access, SharePoint
  • Coding languages: C#, Visual Basic, Jscript, HTML
  • User Roles
InfoPath History

Microsoft InfoPath is an application for designing, distributing, filling and submitting electronic forms containing structured data.
Microsoft initially released InfoPath as part of Microsoft Office 2003 family.

  • InfoPath 2003 - included in Microsoft Office 2003 Professional and Professional Enterprise - released November 19, 2003
  • InfoPath 2007 - included in Microsoft Office 2007 Ultimate, Professional Plus and Enterprise - released January 27, 2007
  • InfoPath 2010 - included in Microsoft Office 2010 Professional Plus; Office 365 - released July 15, 2010
  • InfoPath 2013 - included in Microsoft Office 2013 Professional Plus; Office 365 - released January 29, 2013

 

In other words, an InfoPath Form helps you define design, rules, data, connections, etc.

What will happen with SharePoint 2016? Which alternative?

User needs have evolved: design, deployment and intelligence should all be integrated across servers, services and clients.
Microsoft would like to offer tools available on mobiles, tablets and PCs, thanks to SharePoint Online, Windows 8 (soon 10), Windows Phone and Office 365.

 

 

Solutions:

Customized forms in SharePoint using a .NET language: easy to build with Visual Studio, but a developer or a SharePoint developer is needed.

Nintex Forms: users can easily build custom forms and publish them to a SharePoint environment.

 

What is Nintex Forms

Nintex Forms is a web-based designer that enables forms to be created within SharePoint quickly and easily. Forms can then be consumed on most common mobile devices over the internet, anywhere and at any time. Nintex Forms integrates seamlessly with Nintex Workflow to automate business processes and deliver rich SharePoint applications.

Learn more about Nintex: http://www.nintex.com/

CONCLUSION

Let’s see what will be announced but I think Nintex will find its way as a great alternative for InfoPath:

  • No specific knowledge (HTML or JavaScript) is needed to build forms
  • No client application needed
  • Nintex is completely web-based
  • Works on mobile devices


Microsoft Project Management Tools: SharePoint 2013 and Project Server 2013

Tue, 2015-05-12 06:07
Project Management?

Project management is the process and activity of planning, organizing, motivating, and controlling resources, procedures and protocols to achieve specific goals in scientific or daily problems.

Why using Project Management?

Using Project Management methods and strategies decreases risks, reduces costs and improves the success rate of your projects!

What are the Business Needs?
  • Rationalize investment
  • Align work with strategic objectives
  • Manage projects and resources effectively
  • Gain visibility and control over projects
Microsoft has 3 tools to manage projects:
  • Microsoft Project: a powerful and enhanced way to manage a wide range of projects
  • Microsoft Project Server: builds upon Microsoft SharePoint Server – it provides the basis for the Project Web Application pages, site administration, and the authentication infrastructure for Project Server users and Project Workspaces.

Standard: keep projects organized and communicate progress with reports
Professional: collaboration with others in order to start, manage and deliver projects. Synchronization with SharePoint 2013 (2016 to be confirmed) enables you to track status from anywhere.
Professional for Office 365: work virtually anywhere - go online with flexible plans that help you quickly and easily sign up for the service that best fits your business needs.

  • Microsoft SharePoint: CMS to share, organize, discover, build and manage projects / data. This is a Collaboration tool.

Without any tools, managing a project can be painful, tedious and inefficient.

You had better stop … and think about a SOLUTION!



 

 

Imagine 2 super products: pepperoni and cheese… Both are just waiting for you to make a great pizza!

 

What about Project and SharePoint together?

As with pizza, mixing both will surely give you a great Project Management tool!

It will bring success and let you manage your tasks and timelines easily.

For Management
  • Manage projects as SharePoint task lists
  • Manage projects with FULL CONTROL
For Collaboration
  • Teams connected
  • Easy way to create reports and specified dashboards
SharePoint + Project allows:
  • creating a site dedicated to the project as a central repository for project content
  • importing a SharePoint task list as an Enterprise Project for full Project Server control
  • synchronization and storage locally or in a SharePoint library
The requirements for an installation are:

Project Server 2013
  • Windows Server 2012 or 2008 R2 SP1 (minimum)
  • SQL Server 2012 or 2008 R2 with Service Pack 1
  • SharePoint 2013
SharePoint 2013
  • Windows Server 2012 or 2008 R2 SP1 (minimum)
  • SQL Server 2012 or 2008 R2 with Service Pack 1
CONCLUSION

Great things are done by a series of small things brought together: use SharePoint 2013 and Project Server 2013 TOGETHER to bring success!

Variations on 1M rows insert (2): write commit

Mon, 2015-05-11 22:29

In this blog post, I will try to do the same as my colleagues did for Oracle and PostgreSQL. As a reminder, we saw in my previous blog post that SQL Server is designed to commit transactions implicitly by default, and inserting 1M rows in this case may have a huge impact on the transaction log throughput: each transaction is synchronously committed to the transaction log. Here, we'll see a variation of the previous test.

Indeed, since SQL Server 2014, it is possible to change this behaviour a little and improve the overall performance of our test by using a feature called delayed durability. This is definitely a performance feature, but you will have to trade durability for performance. As explained here, SQL Server uses a write-ahead logging (WAL) protocol, and using this new feature will temporarily suspend this requirement.

So, let's perform the same test, but this time I will favour overall throughput by using the delayed durability option.

 

alter database demo set delayed_durability = allowed;
go

DECLARE @i INT = 1;

WHILE @i <= 1000000
BEGIN
       begin tran;

       INSERT INTO DEMO
       VALUES (@i, CASE ROUND(RAND() * 10, 0)
                        WHEN 1 THEN 'Marc' WHEN 2 THEN 'Bill' WHEN 3 THEN 'George'
                        WHEN 4 THEN 'Eliot' WHEN 5 THEN 'Matt' WHEN 6 THEN 'Trey'
                        ELSE 'Tracy'
                   END, RAND() * 10000);

       commit tran with (delayed_durability = on);

       SET @i += 1;
END

 

-- 00:00:20 – Heap table
-- 00:00:19 – table with clustered index

 

If I refer to my first test results with the implicit commit behaviour, I indeed notice a big performance improvement (86%). You may also note that I used this option at the transaction level after enabling delayed durability at the database level, but you have other possibilities: depending on your context, you may prefer to allow or to force this option directly at the database level, as shown below.
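
Here is a minimal sketch of those database-level settings (standard SQL Server 2014 syntax; adapt the database name to your environment):

-- let individual transactions request delayed durability (used in the test above)
ALTER DATABASE demo SET DELAYED_DURABILITY = ALLOWED;

-- force delayed durability for every transaction of the database
ALTER DATABASE demo SET DELAYED_DURABILITY = FORCED;

-- back to the default, fully durable behaviour
ALTER DATABASE demo SET DELAYED_DURABILITY = DISABLED;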

Do I have to enable it? If your business is comfortable trading durability for throughput and this option improves the overall performance, go ahead, but keep in mind that you also have other ways to improve your transaction log throughput before enabling this option (please read Paul Randal's blog posts).

Want to meet our experts from all technologies in one day? Come to our event In-Memory: boost your IT performance! where we talk about SQL Server, Oracle and SAP HANA.

 

 

Get SQL Server Network Configuration with PowerShell

Mon, 2015-05-11 09:29

This blog post is part of a series which tries to automate SQL Server administration with PowerShell. It explains how to retrieve the network configuration of a SQL Server instance with PowerShell.

I will refer to my previous blog, Get SQL Server services with PowerShell, where I presented how to get the SQL Server Engine service. I will assume the service has already been retrieved.

 

Disclaimer: I am not a developer but a SQL Server DBA. If you find errors or ways to improve this, I will be glad to read your comments!

 

SQL Engine service

I use the SMO objects to access the SQL Server Engine service. The object obtained has several properties.

This object can retrieve important information concerning the SQL Server Engine service, such as:

 

[Screenshot: properties of the SQL Server Engine service object]

 

TCP/IP information

This time, we will use the WMI objects exposed through SMO. To retrieve the corresponding TCP/IP object for the specific instance, proceed as follows:

[Screenshot: building the SMO WMI object for the TCP/IP protocol]

 

The “Microsoft.SqlServer.Management.Smo.Wmi” namespace contains all the classes that represent the SQL Server WMI.

 

Now, to display the TCP/IP information of your instance, proceed as follows:

[Screenshot: TCP/IP information of the instance]

 

Named Pipe information

As for TCP/IP, we build the WMI object for the Named Pipe with:

[Screenshot: building the SMO WMI object for the Named Pipes protocol]

 

Now to display the Named Pipe information, proceed as follows:

[Screenshot: Named Pipes information of the instance]

 

Shared Memory information

As for the previous ones, we build the WMI object for the Shared Memory with:

[Screenshot: building the SMO WMI object for the Shared Memory protocol]

 

Now to display the Shared Memory information, proceed as follows:

[Screenshot: Shared Memory information of the instance]
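
As a rough sketch of what such a script can look like (the machine and instance names are placeholders, and the exact code behind the screenshots may differ), the SMO WMI objects can be queried like this:

# Load the SMO WMI assembly
[void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SqlWmiManagement')

# Connect to the WMI provider of the target machine (placeholder name)
$wmi = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer 'MYSQLHOST'
$instance = $wmi.ServerInstances['MSSQLSERVER']

# TCP/IP: protocol state and IP address properties
$tcp = $instance.ServerProtocols['Tcp']
$tcp.IsEnabled
$tcp.IPAddresses | ForEach-Object { $_.IPAddressProperties | Select-Object Name, Value }

# Named Pipes: protocol state and pipe name
$np = $instance.ServerProtocols['Np']
$np.IsEnabled
$np.ProtocolProperties | Select-Object Name, Value

# Shared Memory: protocol state
$sm = $instance.ServerProtocols['Sm']
$sm.IsEnabled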

 

Next steps

With these commands, you can generate a dashboard for a specific instance. Here is an example from our Database Management Kit (DMK):

[Screenshot: instance dashboard generated by the DMK]

 

I hope this blog will help you in your work ;)

SharePoint 2016: What’s new? And What to expect?

Mon, 2015-05-11 02:11

The Ignite event took place in Chicago last week. Based on what was presented there, here is the first news about the new SharePoint 2016!

SharePoint 2016 looks like an enhancement of the existing SharePoint 2013 version.
Most of the SP2013 features will continue to work but in a more powerful way.

SharePoint 2016 will be focused on the following points:
  • Security Compliance
  • Advanced Analytics and Insights (data analysis and reporting)
  • Cloud experience
The new features for this release include:
  • Links: All URLs will be “Resource ID” based URLs. That means that if one of your documents is renamed, the document link won’t be broken.
  • The User Profile Service Application won’t be available anymore.
  • In order to use the User Profile Services, you will need to use Forefront Identity Manager, separated from the SharePoint farm (outside). The Active Directory will still be there, but there is no sync anymore.
  • SharePoint 2016 is a cloud-inspired infrastructure. The emphasis here is on building a hybrid platform to let users benefit from the enhanced cloud innovations that Microsoft regularly releases for Office 365. One of the most important experiences that Microsoft will be introducing to on-premises clients is the use of cloud-based search. The power of Delve & Office Graph can now also be applied to on-premises content and, of course, to the integrated cloud.
  • Add On-Premises Content to online Delve board
  • Zero downtime Patching: all upgrades of SharePoint can be done online.

  • New Service applications will have an integration option for Delve
  • The New Hybrid Extranet will allow many scenarios
  • New Repair button to restore Server Services depending on its role
  • The Cloud subscription of services such as Azure Rights Management Services to on-premises setup
  • The InfoPath 2013 application will work with SharePoint Server 2016
Analytics and Insights:

There is a huge improvement in how SharePoint records and displays data analytics and reporting. SharePoint 2016 will provide monitoring for:

  • Services
  • Actions
  • Usage
  • Engagement
  • Diagnostics
  • In-Memory Analytics (SQL Server 2016 release). Let's see in the next few months what will be announced. Maybe some enhancement of the integration between BI and SP2016

Social:
  • Social improvements start with better Yammer integration to avoid social silos and better integration with the micro-blogging platform, bringing new capabilities.
Content Management capabilities:
  • Improvements in content limits: the character limit in URLs has been removed, the maximum file size has been increased to 10GB, and the 5,000-item limit has been removed. The list view threshold has been increased.

  • Encryption enhancements
  • Multi-factor authentication
  • Data loss prevention
  • eDiscovery
  • Metadata management
  • Policy control
  • Mobile device management
Installation & Configuration Enhancements:
  • OS: Windows Server 2012 R2 or Windows Server 10 (Technical Preview)
  • The new Server Installation Wizard will allow users to perform role-based installations
  • Database: 64-bit SQL Server 2014
A brief summary of SharePoint 2016 using one word would be: CLOUD

 

what will be coming with the next major version of postgresql ?

Sun, 2015-05-10 13:14

A PostgreSQL major version is usually released once a year. For the next major version, which is 9.5, this probably will be in the second half of 2015.

Variations on 1M rows insert (4): APPEND_VALUES

Sun, 2015-05-10 12:39

In the first variation I did a bulk insert with the PL/SQL FORALL statement. A comment suggested adding the APPEND_VALUES hint that appeared in 11gR2. The APPEND_VALUES hint does the same as APPEND, but for inserts with a VALUES clause: it inserts in direct-path mode, directly into the datafiles, bypassing the buffer cache. Of course, this makes sense only with bulk inserts because you don't want to do that for a single row.

APPEND_VALUES

So, from the previous test, inserting 1 million rows in only one transaction (no intermediate commits) into a table with only one index (primary key) took 46 seconds. With bulk insert, the time is only 7 seconds. Let's do the same bulk, but with the APPEND_VALUES hint. I changed only the insert line:

SQL> create table DEMO ("id" number primary key, "text" varchar2(15), "number" number) ;
Table created.

SQL> set timing on
SQL> declare
  2   type people_array is varray(12) of varchar(15);
  3   people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
  4   people_count number :=people.COUNT;
  5   type DEMO_record_type is record ("user_id" number , "name" varchar2(15), "number" number) ;
  6   type DEMO_collection_type is table of DEMO_record_type index by binary_integer;
  7   DEMO_collection DEMO_collection_type;
  8  begin
  9   -- fill the collection
 10   for i in 1..1e6 loop
 11    DEMO_collection(i)."user_id":=i;
 12    DEMO_collection(i)."name":=people( dbms_random.value(1,people.count) );
 13    DEMO_collection(i)."number":=trunc(dbms_random.value(0,10000));
 14    null;
 15   end loop;
 16   forall i in 1..DEMO_collection.count insert /*+ append_values */ into DEMO values DEMO_collection(i);
 17   commit;
 18  end;
 19  /
and here is the execution time:
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.28
It's a bit faster. However this testcase is not very well suited to show the benefit.

First, you must know that inserting in direct-path locks the table. We bypass the buffer cache and it's the buffer cache that manages concurrency when updating the blocks. Only one session can modify a block, and that's available also in RAC. So, when you bypass the buffer cache you must lock the table (or the partition if you insert into a partition) to be sure you are the only one that inserts after the high water mark.

Now about the testcase. Here I insert into an empty table. The direct-path insert is very efficient for index maintenance because instead of maintaining the index row by row, the index is built afterwards (data inserted after the high water mark is read and sorted) and merged into the existing index. This optimization gives better performance on an index that has a higher depth.

APPEND

If you want an idea of how fast direct-path insert is, let's see how long it takes to insert the 1 million rows into another table - identical, with a primary key. This is easy with a CTAS:

SQL> set timing on
SQL> create table DEMO2 ("id" primary key , "text" , "number" ) as select * from DEMO;
Table created.
Elapsed: 00:00:01.01
One second. That's the fastest way to insert 1 million rows in batch when you have them in a table (or file through external table), when you can lock the whole table and you don't need to do intermediate commits.
We can also do the same with INSERT /*+ APPEND */
SQL> truncate table DEMO2 reuse storage;
Table truncated.
Elapsed: 00:00:00.17

SQL> insert /*+ append */ into DEMO2 select * from DEMO;
1000000 rows created.
Elapsed: 00:00:01.36
And it's about the same time (I won't compare when the difference is less than one second).

The index maintenance is fast, but you can go further by disabling the indexes (ALTER INDEX ... UNUSABLE) before and rebuild them afterwards.

However, my initial testcase was made with OLTP in mind: concurrent inserts from different sessions. Direct-path writes are definitely not for that, as they lock the table. But I wanted to show how small 1 million rows is when we do things in bulk from one session.

Kerberos SSO for Documentum CS 6.7 SP1

Sun, 2015-05-10 11:00


In a previous post, I shared some tips to configure Kerberos SSO with Documentum D2 3.1 SP1. Since then, I have worked on different projects to also set up Kerberos SSO on some other components of Documentum. In this post I will try to explain in detail what needs to be done to configure Kerberos SSO for the Content Server. Actually it's not that hard to do, but you may face some issues if you try to follow the official EMC documentation.


So what are the prerequisites to set up Kerberos SSO for the Content Server? Well, in fact you just need a Content Server, of course, and an Active Directory to generate the keytab(s). Just to let you know, I used a Content Server 6.7 SP1 and an Active Directory on a Windows Server 2008 R2. Let's define the following properties:

  • Active Directory - user = cskrb
  • Active Directory - password = ##cskrb_pwd##
  • Active Directory - domain = DOMAIN.COM
  • Active Directory - hostname1 = adsrv1.domain.com
  • Active Directory - hostname2 = adsrv2.domain.com
  • Documentum - repository (docbase) = REPO


I. Active Directory prerequisites


As always when working with Kerberos on an Active Directory, the first thing to do is to create a user. So let's create this user with the following properties:

  • User name: cskrb
  • Support AES 128 bits encryption
  • WARNING: This account MUST NOT support AES 256 bits encryption
  • Trust for Delegation to any service (Kerberos Only)
  • Password never expires
  • Account never expires
  • Account not locked


Once the user has been created, you can proceed with the keytab creation using the command prompt on the Active Directory host:

[Screenshot: keytab creation on the Active Directory host]
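
As an illustration only, the keytab creation typically relies on a ktpass command like the one below. The SPN format CS/REPO is an assumption based on the EMC documentation, so double-check it for your Documentum version, and adapt the user, password and realm to the properties defined above:

ktpass /out REPO.keytab ^
       /princ CS/REPO@DOMAIN.COM ^
       /mapuser cskrb@DOMAIN.COM ^
       /pass ##cskrb_pwd## ^
       /crypto AES128-SHA1 ^
       /ptype KRB5_NT_PRINCIPAL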


According to the documentation of EMC, you can create one keytab with several keys inside for the Documentum repositories. Actually, that's wrong! It's not possible in the Microsoft world to generate a keytab with more than one Service Principal Name (SPN) in it, only the Linux implementations of Kerberos allow that. If you try to do so, your Active Directory may loop forever trying to add a second SPN to the keytab. That will considerably slow down your Active Directory and it may even crash...


If you want to setup the Kerberos SSO for more than one repository, you will have to create one user per repository and generate one keytab per user. So just repeat these two steps above for each repository, replacing the user name, user password and repository name... What is possible with an Active Directory is to map more than one SPN to a user. That can be useful for a Load Balancer setting for example but the keytab will always contain one SPN and therefore it seems that this solution isn't suitable for the Content Server.


The second remark here is that the EMC documentation often uses the DES encryption only for the keytab but, as shown above, you can of course specify the encryption to use or simply specify "ALL" to add all possible encryption types to this keytab. By default Kerberos will always use the strongest encryption available. In our case, as the Content Server doesn't support AES 256 bits encryption, the AES 128 bits encryption will be used instead.


II. Configuration of the Content Server side


So let's start the configuration of the Kerberos SSO for the Content Server. The first thing to do is of course to transfer the keytab created previously (REPO.keytab) from the Active Directory to the Content Server's host. This CS's host can be a Windows Server or a Linux Server, it doesn't matter as long as the Linux Server is part of your enterprise network (well if it's properly configured). In this post, I will use a Linux server because we usually install Documentum on Linux.


During the installation of the Content Server, the installer creates some default authentication folders, some security elements, and so on. Therefore, you have to put the newly created keytab in a specific location for the Content Server to automatically recognize it. Please make sure that the keytab belongs to the Documentum installation owner (user and group) on the file system with the appropriate permissions (640). The correct location is:

$DOCUMENTUM/dba/auth/kerberos/



Then create the file "/etc/krb5.conf" with the following content:

[libdefaults]
noaddresses = true
udp_preference_limit = 1
default_realm = DOMAIN.COM
default_tgs_enctypes = aes128-cts arcfour-hmac-md5 des-cbc-md5 des-cbc-crc rc4-hmac
default_tkt_enctypes = aes128-cts arcfour-hmac-md5 des-cbc-md5 des-cbc-crc rc4-hmac
permitted_enctypes = aes128-cts arcfour-hmac-md5 des-cbc-md5 des-cbc-crc rc4-hmac
dns_lookup_realm = true
dns_lookup_kdc = true
passwd_check_s_address = false
ccache_type = 3
kdc_timesync = 0
forwardable = true
ticket_lifetime = 24h
clockskew = 72000

[domain_realm]
.domain.com = DOMAIN.COM
domain.com = DOMAIN.COM
adsrv1.domain.com = DOMAIN.COM
adsrv2.domain.com = DOMAIN.COM

[realms]
DOMAIN.COM = {
master_kdc = adsrv1.domain.com:88
kdc = adsrv1.domain.com:88
kpasswd = adsrv1.domain.com:464
kpasswd_server = adsrv1.domain.com:464
kdc = adsrv2.domain.com:88
kpasswd = adsrv2.domain.com:464
kpasswd_server = adsrv2.domain.com:464
}

[logging]
default = /var/log/kerberos/kdc.log
kdc = /var/log/kerberos/kdc.log

[appdefaults]
autologin = true
forward = true
forwardable = true
encrypt = true



You can of course customize this content with what is suitable for you depending on your environment. Moreover, if the file "/etc/krb5.conf" already exists and you don't want to or can't modify it, then you can still create this file wherever you want. For example, create the folder "$DOCUMENTUM/kerberos/" and put this file inside. Then edit the file "~/.bash_profile" and add the following line into it to reference this new location:

export KRB5_CONFIG=$DOCUMENTUM/kerberos/krb5.conf



Once done, simply restart your ssh session or source the file "~/.bash_profile" for the new environment variable to be available.


The last thing to do is to refresh the Kerberos configuration for the Content Server to know that a keytab is available on the File System (and therefore enable the dm_krb authentication plugin). This process is known as the re-initialization of the Content Server. There are two main ways to do it: with or without Documentum Administrator (DA). When using DA, you can simply click on a button while the Content Server is running and that's the main advantage of this method. If you don't have a DA installed, then I guess you will have to reboot the Content Server for the changes to take effect.


To re-initialize the Content Server using DA, here is what needs to be done:

  • Open DA in Internet Explorer
  • Log in to the repository "REPO" with the account of the Documentum installation owner or any account with sufficient permissions
  • Expand the "Basic Configuration" item
  • Click on "Content Servers"
  • Right-click on the repository you are connected to
  • Click on "Properties"
  • Check "Re-Initialize Server"
  • Click on "Ok"


Once done, you should be able to confirm that the reload of the dm_krb plugin was successful by checking the log file of the repository:

$DOCUMENTUM/dba/log/REPO.log



If everything goes well, you will see some lines showing that the Content Server was able to parse the keytab successfully. In a next blog, I will certainly explain how to configure Kerberos SSO for the DFS part. Stay tuned!


bulk loading semi structured data in postgresql

Sun, 2015-05-10 10:00

The last post took a look at how to efficiently load 1m rows into a table in PostgreSQL. In this post I'll take a look on how to do the same with semi structured data.

D2 performance issues due to KB3038314 IE patch

Sun, 2015-05-10 08:58

I ran into a strange issue at a customer site: when trying to open a huge virtual document (VD) in D2's right panel, the browser freezes.

It seems to be due to an Internet Explorer security patch, which introduces huge performance issues. So if you run into strange issues with your web browser, check the patch level of IE. The security patch which causes the issue is KB3038314.

Alfresco: some useful database queries

Sun, 2015-05-10 04:50


In my previous post, I talked about the Lifecycle of Alfresco Nodes. You may have noticed that I tried to insert into my explanations some elements that are specific to databases (tables, fields, and so on). These elements are quite essential to prepare a post like this one, which is more database oriented. I already explained what exactly the consequences are on the database side when a node is removed, and in this post I will try to share some useful queries regarding these points, but not only!


For this post, I used my local Alfresco Community 4.2.c installation with a PostgreSQL database. For your information, it just takes 30 minutes to get this test environment ready with the Alfresco installer (Windows, Mac or Unix). Of course, using the database directly for your daily administration work is certainly not the best idea, but in some cases it can really be faster and easier to just run some SQL commands at the DB level...


I. Document information


So let's start this post with some generic queries that can be used to retrieve some information about documents. In this part, all columns of the results will be the same because I just pick the same fields in my queries, but the filter part (the WHERE clause) changes a little bit to retrieve the information from different elements.

The first command I would like to show you is how to retrieve some information about documents based on the size of the content. Here, I just uploaded the document "Test_Lifecycle.docx" with a size of 52MB. So based on that, let's say that I want to retrieve all elements on my Alfresco installation with a content that is bigger than 40MB. In the same approach, you can select all elements with a content that is smaller than or between XX and YYMB. The conversion in MB is done using the round() function. Therefore, if you want this value to be in KB instead, just remove one division by 1024 in each round() function:

All documents bigger than 40MB
SELECT n.id AS "Node ID",
       n.store_id AS "Store ID",
       round(u.content_size/1024/1024,2) AS "Size (MB)",
       n.uuid AS "Document ID (UUID)",
       n.audit_creator AS "Creator",
       n.audit_created AS "Creation Date",
       n.audit_modifier AS "Modifier",
       n.audit_modified AS "Modification Date",
       p1.string_value AS "Document Name",
       u.content_url AS "Location"
FROM alf_node AS n,
     alf_node_properties AS p,
     alf_node_properties AS p1,
     alf_namespace AS ns,
     alf_qname AS q,
     alf_content_data AS d,
     alf_content_url AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND round(u.content_size/1024/1024,2)>40
ORDER BY u.content_size DESC;



I will just put it once but here is the result of this command in this case:

 Node ID | Store ID | Size (MB) |          Document ID (UUID)          | Creator |         Creation Date         | Modifier |       Modification Date       |    Document Name    |                            Location                            
---------+----------+-----------+--------------------------------------+---------+-------------------------------+----------+-------------------------------+---------------------+----------------------------------------------------------------
131856 | 6 | 52.00 | eb267742-c018-4ba5-8ca4-75ca23c860f0 | Morgan | 2015-04-30T12:05:50.613+02:00 | Morgan | 2015-04-30T12:05:50.613+02:00 | Test_Lifecycle.docx | store://2015/4/30/12/5/0e111f05-7fcf-4a44-b719-b94cd04dd5ab.bin


So why did I select these fields?!

  • Node ID: can be useful to join different tables
  • Store ID: a Store ID of 6 means that your document is in its active life. A Store ID of 5 means that this document has been deleted by a user and is now in the global trashcan
  • Size (MB): what we are searching for...
  • Document ID (UUID): the unique identifier of this document. The simplest way to preview this document is just to open the following url in any browser: http://HOSTNAME:PORT/share/page/document-details?nodeRef=workspace://SpacesStore/eb267742-c018-4ba5-8ca4-75ca23c860f0 (workspace://SpacesStore for store_id=6)
  • Creator, Modifier, Dates: well...
  • Document Name: can be useful to know the type of document without opening an URL (file extension)
  • Location: the actual location of the content's file on the File System. The "store://" refers to $ALF_DATA/contentstore/


The second command I would like to show you is how to retrieve some information based on the actual UUID of a document. As explained above, the UUID of a document can be found in the URL of its detail's page:

A document using its UUID
SELECT n.id AS "Node ID",
       n.store_id AS "Store ID",
       round(u.content_size/1024/1024,2) AS "Size (MB)",
       n.uuid AS "Document ID (UUID)",
       n.audit_creator AS "Creator",
       n.audit_created AS "Creation Date",
       n.audit_modifier AS "Modifier",
       n.audit_modified AS "Modification Date",
       p1.string_value AS "Document Name",
       u.content_url AS "Location"
FROM alf_node AS n,
     alf_node_properties AS p,
     alf_node_properties AS p1,
     alf_namespace AS ns,
     alf_qname AS q,
     alf_content_data AS d,
     alf_content_url AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND n.uuid='eb267742-c018-4ba5-8ca4-75ca23c860f0';



Another possible command would be to find some information based on the File System location. That can be useful for example if there is a big document on the File System and you want to know the type of this document with the extension, its name or maybe some other information about the creator/modifier:

A document using its path on the File System
SELECT n.id AS "Node ID",
       n.store_id AS "Store ID",
       round(u.content_size/1024/1024,2) AS "Size (MB)",
       n.uuid AS "Document ID (UUID)",
       n.audit_creator AS "Creator",
       n.audit_created AS "Creation Date",
       n.audit_modifier AS "Modifier",
       n.audit_modified AS "Modification Date",
       p1.string_value AS "Document Name",
       u.content_url AS "Location"
FROM alf_node AS n,
     alf_node_properties AS p,
     alf_node_properties AS p1,
     alf_namespace AS ns,
     alf_qname AS q,
     alf_content_data AS d,
     alf_content_url AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND u.content_url='store://2015/4/30/12/5/0e111f05-7fcf-4a44-b719-b94cd04dd5ab.bin';



II. Number of...


From a reporting point of view, let's say that you need some information regarding the number of... something. In this case, if you want to use your DB directly, there is a really simple solution (simple, but is it the best?) because Alfresco provides a database architecture that is quite simple to understand and to use to get what you need. Indeed, if you take a look at the "alf_qname" table, you will see that every element that is part of Alfresco has its QName listed here. A QName is the Qualified Name of a repository item. This can be seen as a kind of "Super-Type":

alfresco=> SELECT * FROM alf_qname;
 id | version | ns_id | local_name
----+---------+-------+------------
  1 |       0 |     1 | store_root
  2 |       0 |     1 | aspect_root
  3 |       0 |     1 | container
  4 |       0 |     1 | children
  5 |       0 |     2 | user
...
 24 |       0 |     6 | folder
...
 51 |       0 |     6 | content
...
133 |       0 |     6 | thumbnail
134 |       0 |    13 | rendition
...


As you can see above, if you are searching for something that has a content, it can be done quite easily using the id or the local_name that corresponds to it. So based on this table, here are some queries that can be useful:

Retrieve the number of users in the Repository
SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='user';

 

Retrieve the number of elements with a content in the Repository (include system's documents)
SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='content';

 

Retrieve the number of thumbnails in the Repository
SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='thumbnail';

 

Retrieve the number of renditions in the Repository
SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='rendition';



Of course you can do that for all QNames but you can also be more precise! So based on the query to retrieve the number of elements with a content, if you only want the number of documents of a specific type, then you can simply complete your query:

Retrieve the number of XML documents in the Repository
SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q,
     alf_node_properties AS p
WHERE n.type_qname_id=q.id
  AND p.node_id=n.id
  AND p.qname_id IN
    (SELECT id
     FROM alf_qname
     WHERE local_name='name')
  AND q.local_name='content'
  AND p.string_value LIKE '%.xml';

 

Retrieve the number of PDF documents in the Repository
SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q,
     alf_node_properties AS p
WHERE n.type_qname_id=q.id
  AND p.node_id=n.id
  AND p.qname_id IN
    (SELECT id
     FROM alf_qname
     WHERE local_name='name')
  AND q.local_name='content'
  AND p.string_value LIKE '%.pdf';


As the creation date, creator, modification date and modifier information are also stored on the "alf_node" table, you can also very easily filter your query based on the creation/update date of an Alfresco Node. That's pretty cool, right?! ;)
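
For example, here is a minimal sketch (the date literal is only an example; audit_created is stored as an ISO-8601 string, so a simple string comparison works) that counts the content nodes created since April 2015:

SELECT count(*)
FROM alf_node AS n,
     alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='content'
  AND n.audit_created >= '2015-04-01T00:00:00.000+02:00';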


III. Lifecycle specific


To complete the relation between this blog post and the previous one, I wanted to share some queries that can be used to identify the current state of a document. As explained in my previous post, a document that is not yet deleted will be in the store named "workspace://SpacesStore". A document that has been deleted by a user will be in the store named "archive://SpacesStore", and when this document is removed from the global trashcan, the orphan_time is set to the current timestamp. With all this information and with the "alf_node" and "alf_content_url" tables, we can easily build our own queries to find what is needed.

alfresco=> SELECT * FROM alf_store;
 id | version | protocol  |       identifier        | root_node_id
----+---------+-----------+-------------------------+--------------
  1 |       1 | user      | alfrescoUserStore       |            1
  2 |       1 | system    | system                  |            5
  3 |       1 | workspace | lightWeightVersionStore |            9
  4 |       1 | workspace | version2Store           |           10
  5 |       1 | archive   | SpacesStore             |           11
  6 |       1 | workspace | SpacesStore             |           12
(6 rows)


So let's find all documents that have been created and aren't deleted yet:

All documents created in their active life
SELECT *
FROM alf_node
WHERE store_id=6
  AND type_qname_id=51;


The next step on the lifecycle is when the documents have been deleted by a user but aren't deleted from the global trashcan (orphan_time is still NULL):

All documents created that are in the global trashcan (deleted by users)
SELECT *
FROM alf_node
WHERE store_id=5
  AND type_qname_id=51;


Finally, when the documents are removed from the global trashcan, some references/fields are removed, the QName of these documents changes from "content" (51) to "deleted" (140) in the "alf_node" table and the orphan_time is set to the current timestamp in the "alf_content_url" table:

All elements that have been removed from the global trashcan and that are now orphaned
SELECT *
FROM alf_content_url
WHERE orphan_time IS NOT NULL;



I hope you enjoyed this blog post because it was quite hard for me to write something about database queries without giving up my soul to the DB world! See you soon ;).


Variations on 1M rows insert (1): bulk insert

Fri, 2015-05-08 09:19

I think you have already read the interesting series of blog posts by my colleagues Franck and Daniel about inserting 1 million rows in Oracle and PostgreSQL. So it's time to write the first post of the same series concerning SQL Server. First of all, just to clarify, the idea is not to make a direct comparison between Oracle, PostgreSQL and SQL Server, but just to see the variations that exist for each of them to quickly insert 1 million rows.

So I will perform the same basic test as my colleagues with roughly the same environment: one virtual machine on Hyper-V with only one processor, 512MB of memory, and one SQL Server 2014 Enterprise Edition instance capped to 512 MB of memory:

 

Get-WmiObject –Class Win32_processor | ft Manufacturer, Name, NumberOfCores, NumberOfLogicalProcessors –Autosize

 

[Screenshot: CPU configuration of the test VM]

 

SELECT @@VERSION

 

[Screenshot: SQL Server version]

 

SELECT name,
       value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)'

 

[Screenshot: max server memory configuration]

 

Row-by-row method

 

Let's start by using the same test tables, with one heap table and one clustered table, and the same kind of script as well. I just modified the original script written by Franck; translating PL/SQL into T-SQL often implies using a completely different syntax, but anyway, we will produce roughly the same bunch of data.

My user objects are stored in a user database called DEMO for this first test:

 

if object_id('DEMO', 'U') is not null
       drop table DEMO;

create table DEMO("id" int , "text" varchar(15), "number" int);

if object_id('DEMO_PK', 'U') is not null
       drop table DEMO_PK;

create table DEMO_PK("id" int , "text" varchar(15), "number" int,
                 constraint demo_pk_pk primary key (id) );

...

DECLARE @i INT = 1;

WHILE @i <= 1000000
BEGIN
       INSERT INTO DEMO
       VALUES (@i, CASE ROUND(RAND() * 10, 0)
                        WHEN 1 THEN 'Marc' WHEN 2 THEN 'Bill' WHEN 3 THEN 'George'
                        WHEN 4 THEN 'Eliot' WHEN 5 THEN 'Matt' WHEN 6 THEN 'Trey'
                        ELSE 'Tracy'
                   END, RAND() * 10000);

       SET @i += 1;
END

 

Here are my first results for both tables:

-- 00:02:29 – Heap table

-- 00:02:25 – table with clustered index

 

There are no big differences between inserting data into a heap table and into a clustered table in this scenario, because we insert rows basically in the same manner (always in the last page). At this point it is important to keep in mind that by default SQL Server commits each statement implicitly. It means that each insert statement represents a transaction which has to be committed to the transaction log.

If we take a look at the related wait statistics, we can expect that most of the waits will concern the log write activity.

 

[Screenshot: wait statistics for the row-by-row inserts]

 

That's it! The average values are pretty low, but our results are far from those of my colleagues. Let's stay motivated and talk about a kind of workaround to speed up the insert query: putting the user objects in the tempdb database. The main drawback is that the tempdb database is temporary by design, so our user objects will only be persisted until the next restart of SQL Server. But let's perform the previous test on tempdb (see the sketch below).
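
Here is a minimal sketch of that variation: the table definitions are unchanged, they are simply created in tempdb this time.

USE tempdb;
GO

-- same structures as before, but they will only live until the next instance restart
create table DEMO("id" int , "text" varchar(15), "number" int);

create table DEMO_PK("id" int , "text" varchar(15), "number" int,
                 constraint demo_pk_pk primary key (id) );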

Here are the results I get from this test:

-- 00:00:16 – Heap table

-- 00:00:15 – table with clustered index


 

So, a big improvement here.  Furthermore we may notice that related wait statistics have also changed as follows:

 

[Screenshot: wait statistics for the inserts in tempdb]


 

This main wait type is just related to sustained CPU usage … so our final result is not so bad. At this point we may wonder why putting user objects in the tempdb database speeds up the overall process. In fact, we benefit from the special logging mechanism used by the tempdb database, which includes the lazy commit feature and the non-logged after-image feature for insert and update statements.


Let's go back to the user database DEMO and finish this row-by-row section by inserting data in a single transaction (explicit mode), and let's take a look at the following results:

 

begin transaction

declare @i int = 1;

while @i <= 1000000
begin
       insert DEMO
       values (@i, case cast(rand() * 10 as tinyint)
                        when 1 then 'Marc' when 2 then 'Bill' when 3 then 'George'
                        when 4 then 'Eliot' when 5 then 'Matt' when 6 then 'Trey'
                        when 7 then 'Tracy' when 8 then 'Greg' when 9 then 'Steve'
                        else 'Patricia'
                   end, rand() * 1000)

       set @i = @i + 1;
end

commit transaction

 

-- 00:00:10 – Heap table

-- 00:00:09 – table with clustered index

 

As expected, we notice a drastic drop in the duration of both tests.

 

Bulk-insert method

Row-by-row commit is not the strength of SQL Server because each commit requires flushing data to the transaction log. So let's switch to bulk insert mode now. There are several ways to bulk insert data with SQL Server (either from the client or the server side, by using for instance the bcp or SSIS tools, the BULK INSERT, SELECT INTO or OPENROWSET commands, and so on). In this test, I will use bcp to export data to a file before importing this file into my two tables.

To export my data I used the bcp command as follows:

 

[Screenshot: bcp export of the DEMO table]

 

...

 

[Screenshot: bcp export output]
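
As a hedged illustration of what is shown in the screenshots above (the server name and target folder below are placeholders), the export commands look roughly like this, with -n for the native format and -T for a trusted connection:

bcp demo.dbo.DEMO out C:\bcp\DEMO.dat -n -T -S MYSQLHOST
bcp demo.dbo.DEMO_PK out C:\bcp\DEMO_PK.dat -n -T -S MYSQLHOST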

 

I used native data types (-n option) in this case because my test concerns only transferring data from and to SQL Server. This option can improve performance but to be honest with this bunch of data the difference is not relevant.

Let's bulk import our data into the two tables DEMO and DEMO_PK in my user database (not tempdb this time). At this point I want to be more efficient, so I will use minimal logging for the bulk-import operations in order to reduce the possibility of filling up the log space and the potential contention (as a reminder, writing to the transaction log file is always synchronous). Moreover, don't forget that in this mode writing to the data file switches from asynchronous to synchronous mode, so be careful about your storage performance to avoid facing unexpected behaviour during your import process.

So for my tests, the database DEMO is configured with the SIMPLE recovery model (set as shown below) and I will use the BULK INSERT command with the TABLOCK option (which is a requirement for minimal logging). Using other options is possible, but after some testing they appear not to be helpful in this context.
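
As a small illustration of that prerequisite, the recovery model can be set and checked like this:

ALTER DATABASE demo SET RECOVERY SIMPLE;

SELECT name,
       recovery_model_desc
FROM sys.databases
WHERE name = 'demo';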

Concerning my heap table:

 

bulk insert demo.dbo.DEMO
from 'C:\bcp\DEMO.dat'
with (
       DATAFILETYPE = 'native',
       TABLOCK
)

 

SQL Server Execution Times:

   CPU time = 703 ms, elapsed time = 725 ms.

 

Concerning my clustered table:

 

bulk insert demo.dbo.DEMO_PK
from 'C:\bcp\DEMO_PK.dat'
with (
       DATAFILETYPE = 'native',
       ORDER (id ASC),
       TABLOCK
)

 

SQL Server Execution Times:

   CPU time = 1437 ms, elapsed time = 1489 ms.

 

A slightly higher execution time than the bulk import into the heap table: the table with a clustered index seems to introduce some overhead.

 

The bottom line is pretty much the same as my colleagues': inserting and committing data row by row is not an optimized way to import a lot of data. So let's continue in the same way as my colleagues with the next posts of this series, up to the famous In-Memory feature. Don't forget that if you are in Switzerland in June, our experts in Oracle, Microsoft and SAP technologies will talk about In-Memory as implemented by SQL Server, Oracle and SAP HANA, all on the same day. It's free and you can register now: Event In-Memory: boost your IT performance

Rebuild index on increasing values after deletes?

Fri, 2015-05-08 00:39

Yesterday, while giving our Oracle tuning workshop, I discussed with the customer a job they have that rebuilds indexes every Sunday. Except in very rare cases, Oracle indexes are maintained so that free space is reused by further inserts. But an index is an ordered structure: when we insert from a sequence, the value is always increasing and goes at the end of the index, and when we delete old data we delete index entries at the beginning of the index.
Is this a case where we need to manage it ourselves?

Test case

As usual I reproduce the issue. Here is my DEMO table with a DEMOPK index on the primary key:

SQL> create table DEMO (id number constraint DEMOPK primary key);
Table created.

I insert 10000 rows:

SQL> begin
  2   for i in 1..1e4 loop
  3    insert into DEMO values(i);
  4    commit;
  5   end loop;
  6  end;
  7  /
PL/SQL procedure successfully completed.

gather and check the stats:

SQL> exec dbms_stats.gather_table_stats(user,'DEMO');

PL/SQL procedure successfully completed.

SQL> select blocks,blevel,leaf_blocks from user_indexes join user_tables using(table_name) where index_name='DEMOPK';

    BLOCKS     BLEVEL LEAF_BLOCKS
---------- ---------- -----------
        20          1          18

So I have 1 branch and 18 leaf blocks.

 

Fragmentation

I'll check fragmentation from a user point of view. Having too much free space in leaf blocks is a problem for index range scans only. So let's do an index range scan from the beginning to the end of the index:

SQL> alter session set statistics_level=all;
Session altered.

SQL> select /*+ index(DEMO) */ count(*) from DEMO;

  COUNT(*)
----------
     10000

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  7b6qc9m1cw3zd, child number 0
-------------------------------------
select /*+ index(DEMO) */ count(*) from DEMO

Plan hash value: 3019291478

-------------------------------------------------------------------------------------
| Id  | Operation        | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE  |        |      1 |      1 |      1 |00:00:00.01 |      19 |
|   2 |   INDEX FULL SCAN| DEMOPK |      1 |  10000 |  10000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------

Exactly what we expected: 19 blocks reads is 1 branch and 18 leaves.

 

I have a script that does the same - range scan on an index - and shows how many index entries we have in each block. The script is here: How to measure Oracle index fragmentation. Let's run it on my index, with a bucket size large enough to see all blocks:

SQL> @ index_fragmentation.sql

        ID ->         ID rows/block bytes/block %free space     blocks free
---------- -- ---------- ---------- ----------- ----------- ---------- -----
         1 ->          1        578        8566                      1
       579 ->        579        571        8559                      1
      1150 ->       1150        571        8559                      1
      1721 ->       1721        571        8560                      1
      2292 ->       2292        571        8559                      1
      2863 ->       2863        571        8559                      1
      3434 ->       3434        571        8559                      1
      4005 ->       4005        571        8560                      1
      4576 ->       4576        571        8559                      1
      5147 ->       5147        571        8559                      1
      5718 ->       5718        571        8560                      1
      6289 ->       6289        571        8559                      1
      6860 ->       6860        571        8559                      1
      7431 ->       7431        571        8559                      1
      8002 ->       8002        571        8560                      1
      8573 ->       8573        571        8559                      1
      9144 ->       9144        571        8559                      1
      9715 ->       9715        286        4287          47          1 oo

Here are our 18 leaf blocks, covering values from 1 to 10000 (the ID displayed is the first one in each bucket - blocks here). The blocks are full (size is an approximation so this is why it's a bit higher than 8k), with about 570 entries per block. This is expected because when we insert increasing values, the block split fills the block instead of doing a 50-50 split.

 

delete insert lifecycle

Here is what I want to reproduce: delete old rows at the beginning of the index and insert new rows at the end. I'll do that for the same number of rows: 10000, so I'm sure I've deleted rows from all those 18 leaf blocks.

SQL> begin
  2   for i in 1..1e4 loop
  3    delete from DEMO where id=i;
  4    commit;
  5    insert into DEMO values(i+1e4);
  6    commit;
  7   end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Then run my index range scan:

SQL> select /*+ index(DEMO) */ count(*) from DEMO;

  COUNT(*)
----------
     10000

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  7b6qc9m1cw3zd, child number 0
-------------------------------------
select /*+ index(DEMO) */ count(*) from DEMO

Plan hash value: 3019291478

-------------------------------------------------------------------------------------
| Id  | Operation        | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |      1 |        |      1 |00:00:00.01 |      24 |
|   1 |  SORT AGGREGATE  |        |      1 |      1 |      1 |00:00:00.01 |      24 |
|   2 |   INDEX FULL SCAN| DEMOPK |      1 |  10000 |  10000 |00:00:00.01 |      24 |
-------------------------------------------------------------------------------------

Did I double the number of blocks to read? No.

 

Do you think we lost some space because we read 24 blocks instead of 19? Look at the numbers. The new values are above 10000, so they are stored in more bytes than the initial ones: 4 bytes vs. 3 bytes.
Don't believe me?

SQL> select min(rownum),max(rownum),sum(vsize(rownum)) from (select * from dual connect by 1000>=level),(select * from dual connect by 20>=level) group by ceil(rownum/10000);

MIN(ROWNUM) MAX(ROWNUM) SUM(VSIZE(ROWNUM))
----------- ----------- ------------------
          1       10000              29801
      10001       20000              39899

Yes... no place for guesses and myths... everything can be measured. Do you know how many blocks we need when the data that filled 18 blocks grows by that ratio? 18 * (4/3) = 24, so we are not bad at all.
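If you prefer to check the per-value storage directly, a quick VSIZE call on two sample values is enough (a small sketch; any value just below and just above 10000 will do):

SQL> -- compare the stored size of an old key and a new key
SQL> select vsize(9999) old_key_bytes, vsize(10001) new_key_bytes from dual;

On a plain NUMBER this should report 3 and 4 bytes respectively, which is exactly where the 4/3 ratio comes from.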

 

The fact is that the 18 leaf blocks have only grown to 20 leaf blocks:

SQL> @ index_fragmentation.sql

        ID ->         ID rows/block bytes/block %free space     blocks free
---------- -- ---------- ---------- ----------- ----------- ---------- -----
     10001 ->      10001        266        4254          47          1 oo
     10267 ->      10267        533        8523                      1
     10800 ->      10800        533        8522                      1
     11333 ->      11333        533        8523                      1
     11866 ->      11866        533        8523                      1
     12399 ->      12399        533        8522                      1
     12932 ->      12932        533        8523                      1
     13465 ->      13465        533        8523                      1
     13998 ->      13998        533        8522                      1
     14531 ->      14531        533        8523                      1
     15064 ->      15064        533        8523                      1
     15597 ->      15597        533        8522                      1
     16130 ->      16130        533        8523                      1
     16663 ->      16663        533        8523                      1
     17196 ->      17196        533        8522                      1
     17729 ->      17729        533        8523                      1
     18262 ->      18262        533        8523                      1
     18795 ->      18795        533        8522                      1
     19328 ->      19328        533        8523                      1
     19861 ->      19861        140        2237          72          1 ooo

and they are all full - except the first and the last one.

 

This is optimal. Run the same test case and you will see that if you coalesce or shrink the index, the number of blocks does not change. More about it in a previous blog post: index coalesce vs. shrink vs. rebuild.
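A quick way to verify this yourself, sketched against the same DEMOPK index - run either statement, then the fragmentation script again, and compare the number of blocks:

SQL> alter index DEMOPK coalesce;
SQL> -- or, alternatively:
SQL> alter index DEMOPK shrink space;
SQL> @ index_fragmentation.sql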

Is it new?

Ok, I've run my tests on 12c and you may wonder whether this behaviour is something new. No, it's not new.
Oracle 7.3.3 reuses the deleted space as well:

[Screenshot: output of the index fragmentation script on Oracle 7.3.3]

It's the same test case, except that here I'm using a 2k block size.

Index rebuild

Do you think an index rebuild can help, or at least that it is not harmful?

SQL> alter index DEMOPK rebuild;
Index altered.

SQL> @ index_fragmentation.sql

        ID ->         ID rows/block bytes/block %free space     blocks free
---------- -- ---------- ---------- ----------- ----------- ---------- -----
     10001 ->      10001        478        7644           5          1
     10479 ->      10479        479        7659           5          1
     10958 ->      10958        479        7659           5          1
     11437 ->      11437        479        7659           5          1
     11916 ->      11916        478        7644           5          1
     12394 ->      12394        479        7659           5          1
     12873 ->      12873        479        7659           5          1
     13352 ->      13352        479        7659           5          1
     13831 ->      13831        479        7659           5          1
     14310 ->      14310        478        7644           5          1
     14788 ->      14788        479        7659           5          1
     15267 ->      15267        479        7659           5          1
     15746 ->      15746        479        7659           5          1
     16225 ->      16225        479        7659           5          1
     16704 ->      16704        478        7644           5          1
     17182 ->      17182        479        7659           5          1
     17661 ->      17661        479        7659           5          1
     18140 ->      18140        479        7659           5          1
     18619 ->      18619        478        7644           5          1
     19097 ->      19097        479        7659           5          1
     19576 ->      19576        425        6794          15          1

The index rebuild has increased the size of the index: one more leaf block here, because the rebuild has left 5% of free space in each block. And that free space will never be reused, because no future rows will ever go into those key ranges.
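If you really have a reason to rebuild this kind of right-growing index, a small variation avoids reserving free space that will never be used again (a sketch; PCTFREE 0 only makes sense here because no future row will ever be inserted into the existing key range):

SQL> alter index DEMOPK rebuild pctfree 0;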

 

Conclusion

Is the free space reused in an index on a sequence - always increasing - when we are purging old data?
The answer is: yes... unless you are doing regular index rebuilds.

EMC World 2015 - Last day at Momentum

Thu, 2015-05-07 22:52

So Momentum is over; Philippe Schweitzer and I finished with a 4-hour hackathon session. Philippe's subject was "Developing an EMC Documentum application with REST, AngularJS, Bootstrap, Node.js and Socket.io" and I chose "From the Ground Up - Developing an EMC InfoArchive Solution".

But the main purpose of this post is to thank all the people we met during these four enriching days: the EMC people who held the sessions and gave demos at the booths, and Catherine Weiss - Partner Sales Manager - who introduced us to great people. Thank you also to the people from fme, Reveille Software and Flatiron, with whom we had good discussions.

The atmosphere was amazing, not only during the working days but also at the evening events organized by EMC :-)

So, to keep it short, Momentum 2015 was a great and successful journey.

Philippe and Gerard

the fastest way to load 1m rows in postgresql

Thu, 2015-05-07 13:00

There have been several posts over the last few days on how to load 1m rows into a database:

Variations on 1M rows insert (1): bulk insert
Variations on 1M rows insert(2): commit write
Variations on 1M rows insert (1): bulk insert - PostgreSQL
Variations on 1M rows insert(2): commit write - PostgreSQL
Variations on 1M rows insert (3): TimesTen

In this post I'll focus on how to prepare a PostgreSQL database for bulk loading in more detail.
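As a hedged preview of what bulk loading usually looks like in PostgreSQL (the table and file names below are placeholders for this sketch, not taken from the posts above), COPY is typically the starting point, ideally wrapped in a single transaction:

-- placeholder table for the sketch
CREATE TABLE t1 (id integer, val text);

-- server-side bulk load from a CSV file readable by the server process
BEGIN;
COPY t1 (id, val) FROM '/tmp/1m_rows.csv' WITH (FORMAT csv);
COMMIT;

-- from a psql client, \copy reads the file on the client side instead:
-- \copy t1 (id, val) from '/tmp/1m_rows.csv' with (format csv)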

Concrete5 CMS

Thu, 2015-05-07 03:30

Today, many CMS exist - WordPress, Joomla, Magento, and others. In this blog I will share my experience with Concrete5, gained through a specialized web agency based in Geneva: 8 Ways Media.


What is Concrete5?



It is not only an open-source CMS (Content Management System) running on a web server, coded in PHP and using a MySQL database: it is also a great framework for developers. An optional, simplified URL rewriting system is present to improve how search engines index your sites.

C5 can also be used in the development of Web applications.


Through its content management and user rights features, C5 also makes it possible to create intranets for companies (on a small scale in my opinion; for a larger intranet it is better to stay on SharePoint or Alfresco).

This CMS is designed to make life easier for the end user: handling is simple and intuitive.

It offers advanced management of dynamic websites with a modern design. Editing is done directly via the front end, and the possibilities are numerous: drag and drop, templates, and so on.

INSTALLATION


Verifying Prerequisites


The following components are required for concrete5 to run correctly: http://www.concrete5.org/documentation/developers/5.7/installation/system-requirements/


For the installation tutorial, please refer to the main site: http://www.concrete5.org/documentation/developers/5.7/installation/installation/


FEATURES


The Dashboard manages the properties related to:
 


  • Rights management


 

  • Management of imported content



  • Templates


  • Blocks, pages


  • Features: video, forms, presentations, blog, guestbook, etc ...


The GUI is configurable: C5 gives access to very fine-grained customization.
The editing of pages, texts and other content is done via the front end as soon as you are logged in as an administrator or as a user with write rights on the site. Editing offers two modes: HTML or "Composer".
      
Versioning is an asset: in case of error, the previous version of a page is easily restorable.
Updates are performed through a simple upload, followed by a single click.

CONCLUSION


Despite the dominance of the "WordPress - Joomla - Drupal" trio according to studies, having discovered Concrete5 I recommend it for its very intuitive look and ease of use. The developer community also seems to be active and growing, which makes it easier to resolve small issues when they come up. Exporting all site content as HTML is also possible, which can help if you have to change web servers. However, most of the really useful plugins have a high cost, the best-looking themes are not free (a matter of taste, of course), and the support service is paid unless you host through them.
I highly recommend this CMS! Its simplicity and its ability to adapt to different needs let you create a site with confidence and ease.

EMC World 2015 - Day 3 at Momentum

Wed, 2015-05-06 18:53

In this post I would like to relay some advice about the upgrade to Documentum 7.2 that I picked up in Patrick Walsh's session "What's New, What's Next: EMC Documentum Platform".

With Documentum 7.2 the upgrade process is cleaner and there are fewer restrictions. For instance, more upgrade scenarios are documented, which can help us define the best upgrade path to use.

There was also a slide listing the following points to take into account when defining the right time to upgrade the current installation:

- Features
- Costs
- Dates

Isn't something missing?
In my opinion, there is at least one additional point to consider. When I do the impact assessment to decide whether and when we have to upgrade, I study the list of issues that have been fixed in the new version - whether or not we are impacted by them - but also which open issues (who knows an application without bugs?) come with it and whether they are acceptable.

Another helpful piece of information - which can give customers some insight - is the time needed for a typical upgrade project.
Documentum estimates 6 to 12 months for the planning (evaluation of the release, gathering business requirements, budget approval and so on) and 6 to 24 months for the implementation and testing.
Based on that, customers still using version 6.7 (End of Support is April 30, 2018) should start thinking about upgrading to version 7.x.

To facilitate the upgrade, the clients and the platform do not have to be upgraded in one big bang: the D6.x clients can be upgraded first and then the 6.7 platform.

Documentum also introduced "Phased Service Upgrades". For instance, we can upgrade xPlore from version 1.3 to 1.5 in phase one and, a couple of months later, upgrade the platform from 6.7 to 7.2 in phase two.
Or vice versa: we start with the platform and upgrade xPlore later on.
With this approach of de-coupled services, we have more flexibility and less downtime.

And now, last but not least, the aim for the future is to have no downtime at all during the upgrade. THIS would be wonderful!

 

EMC World Las Vegas – Momentum 2015 third day D2 news

Wed, 2015-05-06 17:36

This was more a day of networking with EMC partner contacts and third-party software vendors. I also attended a session about D2 news and what is coming next.

 

EMC grouped the D2 enhancements into 3 major themes.

 

The first was about productivity and a modern look and feel, with:

 

    • graphical workflow widget

    • drag and drop from D2 to Desktop

    • better browsing with enhanced facet navigation

    • multi-document support in workflows

    • faster content transfer with new D2-BOCS for distributed environments

 

The second was about information integrity, with:

    • more SSO implementation support, like Tivoli

    • import of a folder with its inner documents as a virtual document

 

The third and final theme was about software agility, with:

 

    • ease of new user on-boarding with default configuration settings

    • PDF export of the D2 configuration for multi-environment comparison

 

I hope you enjoyed reading this summary of today at EMC World – Momentum 2015. Thanks for your attention.

getting started with postgres plus advanced server (4) - setting up the monitoring server

Wed, 2015-05-06 04:26

If you followed the first, second and third posts, the current ppas infrastructure consists of a primary database, a hot standby database and a backup and recovery server.