
Yann Neuhaus

All blog entries from http://www.dbi-services.com/

Oracle cloud control / SQL Details / Statistics

Wed, 2014-10-29 06:06

A question that I had several times: in Enterprise Manager, in the screen about one SQL statement, the 'Statistics' tab shows the number of executions, elapsed time, etc. The question is: which time window does it cover? There is a one-hour chart above, and two timestamps displayed as 'First Load Time' and 'Last Load Time', and we don't know which one is related to the execution statistics numbers. I'll explain it clearly with an example.

I'll check a query I have on my system which has several cursors, with two different execution plans. And I check from V$SQL because that is where the most detailed information is, and the columns are well documented.

From the documentation:

  • FIRST_LOAD_TIME is the Timestamp of the parent creation time
  • LAST_LOAD_TIME is the Time at which the query plan was loaded into the library cache

It's clear that, because V$SQL shows information about child cursors, the FIRST_LOAD_TIME will be the same for all children.

SQL> select sql_id,plan_hash_value,executions,first_load_time,last_load_time,last_active_time from v$sql where sql_id='dcstr36r0vz0d' order by child_number

SQL_ID        PLAN_HASH_VALUE EXECUTIONS FIRST_LOAD_TIME     LAST_LOAD_TIME      LAST_ACTIVE_TIME
------------- --------------- ---------- ------------------- ------------------- -------------------
dcstr36r0vz0d        17720163         60 2014-10-29/07:01:59 2014-10-29/07:01:59 2014-10-29/13:01:25
dcstr36r0vz0d      3798950322        102 2014-10-29/07:01:59 2014-10-29/07:03:49 2014-10-29/13:05:54
dcstr36r0vz0d      3798950322         24 2014-10-29/07:01:59 2014-10-29/07:05:55 2014-10-29/13:05:54
dcstr36r0vz0d      3798950322          1 2014-10-29/07:01:59 2014-10-29/08:11:19 2014-10-29/08:11:19
dcstr36r0vz0d      3798950322          1 2014-10-29/07:01:59 2014-10-29/08:29:34 2014-10-29/08:29:34

The plan with hash value 17720163 has been executed 60 times since 07:01:59. It was the first child cursor (child_number=0) for that parent, which is why FIRST_LOAD_TIME=LAST_LOAD_TIME.

And the plan with hash value 3798950322 has been executed 128 times (102+24+1+1) since 07:03:49, by cursors that are not shared but came to the same plan anyway.
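To double-check those numbers (60 for the first plan, 102+24+1+1=128 for the second), here is a sketch of the aggregation per plan, which also gives the MIN(LAST_LOAD_TIME) discussed further down:

select plan_hash_value,
       sum(executions) executions,
       min(last_load_time) min_last_load_time,
       max(last_active_time) last_active_time
  from v$sql
 where sql_id='dcstr36r0vz0d'
 group by plan_hash_value
 order by 3;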

Two remarks:

  • FIRST_LOAD_TIME is the same for all children because it is parent-level information
  • LAST_LOAD_TIME is different for each child, and that's important because Enterprise Manager doesn't show that detail, aggregating together the children with the same execution plan.
Time to look at the Enterprise Manager screen. I'm talking about the 'Real Time' statistics:

EMLastLoadTime1.png

and I've selected the plan hash value 17720163:

EMLastLoadTime2.png

Ok. So we have 60 executions here. This matches the line in V$SQL. And we know that it is 60 executions since 07:01:59 because both timestamps are the same. No doubt here.

Then, let's select the other plan hash value from the popup:

EMLastLoadTime3.png

128 executions for this plan. This is what we had when summing the lines from V$SQL. And look at the Shared Cursor Statistics: the number of 'Child Cursors' is 4, which is what we know. The 'First Load Time' is the one of the parent.

However, what is the 'Last Load Time' when we know that there are 4 different values for it in V$SQL? Look, they chose the latest one, 08:29:34, and that's a good choice according to the name. It is the last load time.

But what I want to know is the time from which the 128 executions are counted. And that should be the earliest one. In my example, we know from V$SQL that we had 128 executions since 07:03:49, but that timestamp is not displayed here.

If you want a date, you should take the 'First Load Time', because it is true that there were 128 executions of cursors with that plan hash value since 07:01:59.

Sometimes the first load time is very old and it would be better to have the MIN(LAST_LOAD_TIME). But anyway, if we want better time detail, we can choose the 'Historical' view instead of the 'Real Time' one and we have the numbers related to the AWR snapshots.

Here is an example for the cursor with plan hash value 17720163:

EMLastLoadTime4.png

From the historical view, we select a timestamp and we see the begin and end timestamps. Here I have 10 executions per hour.

Everything looks good there, except that 'Child Cursors' is 5, which is for the whole statement and not only for the cursors selected by the plan hash value.

Then I have two conclusions:
  • 'Last Load Time' is not useful to know the time window covered by the Real Time statistics. Use 'First Load Time' instead.
  • In case of any doubt, fall back to the V$ views, which are much better documented and give more detail.

Oracle Grid Infrastructure: fixing the PRVG-10122 error during installation

Tue, 2014-10-28 02:22

Making errors is human, and when you configure a new Oracle Grid Infrastructure environment (especially one with a large number of nodes), mistakes can happen when configuring ASMLIB on all nodes. If you get an error looking like "PRVG-10122 : ASMLib configuration...does not match with cluster nodes", there is a simple solution to fix it.

When you are installing Grid Infrastructure, the following error can occur in the cluvfy output or directly in the pre-requisites check step of the OUI:

 

ERROR:
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_UID" on the node "srvora01" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_GID" on the node "srvora01" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_ENABLED" on the node "srvora01" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_UID" on the node "srvora02" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_GID" on the node "srvora02" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_ENABLED" on the node "srvora02" does not match with cluster nodes
Result: Check for ASMLib configuration failed.

 

The three parameters ORACLEASM_UID, ORACLEASM_GID and ORACLEASM_ENABLED displayed in the error should be defined when configuring ASMLIB on the system (see Grid Infrastructure 12c pre-requisites for installation, configure ASM step). To check if the configuration is coherent between the nodes specified in the error above, run the following command as root on all concerned nodes. In my case, srvora01 and srvora02 are the involved servers:

 

On srvora01

[root@srvora01 ~]# oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

 

On srvora02

[root@srvora02 ~]# oracleasm configure
ORACLEASM_ENABLED=false
ORACLEASM_UID=..
ORACLEASM_GID=..
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

 

As we can see, it seems that ASMLIB has not been configured on srvora02: ORACLEASM_ENABLED is false, and no UID or GID is provided. These are the default values! The parameters are different between the two nodes.

To solve the issue, simply reconfigure ASMLIB on the second node by running the following command with the right parameters:

 

On srvora02

 

[root@srvora02 ~]# oracleasm configure -i

 

Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

 

Default user to own the driver interface []: grid
Default group to own the driver interface []: asmadmin
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done

 

Now, the parameters are the same between the two nodes:

 

[root@srvora02 ~]# oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

 

The Grid Infrastructure installation can now continue. I know that the error message was quite explicit, but after having spent hours configuring all the nodes of my cluster, it took me some time to understand my big mistake. I hope this will help you.

SQL Server & memory leak: Are you sure?

Mon, 2014-10-27 22:02

I have recently come across an issue with one of my customers, who told me that their SQL Server 2008 R2 instance had a memory leak. These are strong words! The server has 64 GB of RAM and the concerned SQL Server instance is limited to 54 GB according to the max server memory (MB) server option. However, he noticed that SQL Server used 60 GB of memory, which did not correspond to the configured max server memory option. What's going on? Let's investigate what is causing this memory overhead.
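For context, such a cap is set with sp_configure; here is a minimal sketch (the 55296 MB value simply corresponds to the 54 GB mentioned above):

-- 'max server memory (MB)' is an advanced option
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- cap the instance (i.e. the buffer pool on 2008 R2) at 54 GB
EXEC sp_configure 'max server memory (MB)', 55296;
RECONFIGURE;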

I admit that I mainly work on SQL Server 2012 now, because most of my customers have upgraded to this version, so I had to revisit my knowledge of the SQL Server 2008 R2 memory architecture.

Let's start with the following question: is it possible for SQL Server to consume more memory than a configured memory limit? For SQL Server versions older than 2012, the answer is yes, and this is directly related to the memory architecture design. From this point on, we will talk exclusively about the SQL Server 2005 / 2008 versions. So, as you certainly know, SQL Server only caps the buffer pool memory area, which mainly concerns the single page allocator mechanism.

Let's take a little deep dive into the SQL Server memory architecture here to understand where the single page allocator comes from. My goal is not to provide a complete explanation of SQL Server memory management with SQLOS, but just enough of the concepts to understand how to troubleshoot the issue presented in this article.

First of all, threads in SQL Server cannot interface with memory directly. They must go through memory allocator routines from a memory node, which knows which Windows APIs to use to honor a memory request. The memory node is a hidden component that provides locality of allocation. Page allocators are one of the memory allocator types, and the one most commonly used by the SQLOS memory manager, because they allocate memory in multiples of SQLOS's page size (payload = 8 KB). Because the memory node is a hidden component, a thread cannot use it directly: it must create a memory object that has its own memory clerk, depending on its type.

A memory clerk is another component that provides data caching, memory control (in case of memory pressure, for instance), memory usage statistics tracking capabilities, and supports the same types of memory allocators as a memory node. There are several types of memory allocators, like the single page allocator that can only provide one page at a time, the multipage allocator that provides a set of pages at a time, and others.

To summarize, when a thread requests memory to create a memory object, the request goes to the concerned memory clerk, which in turn calls the concerned memory allocator. Here is a simple representation of a memory allocation:

 

{Thread} -> {Memory object} --> {Memory Clerk} --> {Memory allocator in Memory Node}

 

The most interesting part here is that the buffer pool in SQL Server versions older than 2012 acts as both a memory clerk and a consumer. It means that it can provide single pages to other consumers and track its own memory consumption. You can read the detailed explanations in the excellent articles from Slava Oks here.

This is why we only have control over the buffer pool size when configuring the min / max server memory options: it covers single page allocations, unlike the other memory clerks that use multi-page allocations. Note that SQL Server 2012 memory management is completely different and, fortunately, gives us better control over the memory limitation.

So, now that we know exactly what is capped by the 'max server memory' server-level option, let's go back to my problem. Unfortunately I cannot reveal the real context of my customer here, but no problem: I am able to simulate the same issue.

In my scenario, I have a SQL Server 2008 R2 instance capped at 2560 MB and the total memory on my server is 8192 MB. If I take a look at the task manager panel like my customer did, I can see that SQL Server uses more than the configured value:

 

Blog_19_-_1_-_task_manager_

 

Quick math tells us that SQL Server is using 3.4 GB rather than 2.5 GB. Well, where does the remaining part come from? If we take a look at the DMV sys.dm_os_sys_info, we can confirm that the buffer pool is capped at 2560 MB as expected (single page allocations):

 

SELECT
       physical_memory_in_bytes / 1024 / 1024 AS physical_memory_MB,
       bpool_committed / 128 AS bpool_MB,
       bpool_commit_target / 128 AS bpool_target_MB,
       bpool_visible / 128 AS bpool_visible_MB
FROM sys.dm_os_sys_info(nolock)

 

 

Blog_19_-_2_-_sys_dm_os_sys_info_

 

Let's go ahead and take a look at the multipage allocator statistics related to the memory clerks by using the DMV sys.dm_os_memory_clerks (remember that memory clerks have memory usage tracking capabilities):

 

SELECT
       name AS clerk_name,
       memory_node_id,
       sum(single_pages_kb) / 1024 AS single_page_total_size_mb,
       sum(multi_pages_kb) / 1024 AS multi_page_total_size_mb,
       sum(awe_allocated_kb) / 1024 AS awe_allocated_size_MB
FROM sys.dm_os_memory_clerks(nolock)
WHERE memory_node_id <> 64
GROUP BY memory_node_id, name
HAVING SUM(multi_pages_kb) > 0
ORDER BY sum(single_pages_kb) + sum(multi_pages_kb) + sum(awe_allocated_kb) DESC;

 

Blog_19_-_3_-_sys_dm_os_memory_clerks_

 

Note that in my case (the same case as for my customer, but with a different order of magnitude), we have one particular memory clerk with a multipage allocator size greater than the others. A quick calculation (2560 + 863 = 3423 MB) accounts for most of the memory overhead in this case. So, at this point we can claim that SQL Server does not suffer from any memory leak. This is normal behavior, 'by design'.

Finally, let's go back to the root cause at my customer's site, which comes from the memory clerk related to the TokenAndPermUserStore cache that grows over time (approximately 6 GB of memory overhead). What exactly is the TokenAndPermUserStore cache?

Well, this is a security cache that maintains the different security token types like LoginToken, TokenPerm, UserToken, SecContextToken, and TokenAccessResult generated when a user executes a query. The problem occurs when this cache store grows: the time to search for existing security entries to reuse increases, potentially causing slower query times, because access to this cache is controlled by only one thread (please refer to the Microsoft KB 927396). In my demo, we can also notice a lot of different entries related to the TokenAndPermUserStore cache by using the following query:

 

WITH memory_cache_entries AS
(
       SELECT
             name AS entry_name,
             [type],
             in_use_count,
             pages_allocated_count,
             CAST(entry_data AS XML) AS entry_data
       FROM sys.dm_os_memory_cache_entries(nolock)
       WHERE type = 'USERSTORE_TOKENPERM'
),
memory_cache_entries_details AS
(
       SELECT
             entry_data.value('(/entry/@class)[1]', 'bigint') AS class,
             entry_data.value('(/entry/@subclass)[1]', 'int') AS subclass,
             entry_data.value('(/entry/@name)[1]', 'varchar(100)') AS token_name,
             pages_allocated_count,
             in_use_count
       FROM memory_cache_entries
)
SELECT
       class,
       subclass,
       token_name,
       COUNT(*) AS nb_entries
FROM memory_cache_entries_details
GROUP BY token_name, class, subclass
ORDER BY nb_entries DESC;

 

 

Blog_19_-_4_-_TokenAndPermUserStoreDetails_

 

The situation above was very similar to my customer's issue, but we didn't find any relevant performance problem related to it, only a large number of SQL Server logins and ad-hoc queries used by the application concerned.

The fix consisted of flushing entries from the TokenAndPermUserStore cache according to the workaround provided by Microsoft in KB 927396. I hope it will only be a temporary solution until the application code has been investigated, but this is another story!
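For reference, the flush itself can be done with DBCC FREESYSTEMCACHE, as a one-off or scheduled operation; a minimal sketch (it clears only this cache store, whose entries are then rebuilt over time):

-- flush only the security token cache store
DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');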

However, here is the moral of my story: check carefully how SQL Server uses memory before saying that it suffers from a memory leak.

Oracle: an unexpected lock behavior with rollback

Sun, 2014-10-26 15:20

Here is an odd Oracle behavior I observed when a transaction that acquired a lock is rolled back. Note that this is related to a specific combination of locks that should not be encountered in production. So it's not a bug. Just something unexpected.

 

First session

In my first session I lock the DEPT table in row share mode (RS):

20:56:56 SQL1> lock table dept in row share mode;
Table(s) Locked.

My first session (SID=53) has acquired a TM lock in mode 2 (row share):

20:56:56 SQL1> select object_name,session_id,LOCKED_MODE from v$locked_object join dba_objects using(object_id);

OBJECT_NAME SESSION_ID LOCKED_MODE
----------- ---------- -----------
DEPT                53           2

 

Second session

In my second session I lock the DEPT table in share + row exclusive mode (SRX). This is compatible with the RS.

20:56:59 SQL2> lock table dept in share row exclusive mode;
Table(s) Locked.

My second session (SID=59) has acquired a TM lock in mode 5 (share + row exclusive):

20:56:59 SQL2> select object_name,session_id,LOCKED_MODE from v$locked_object join dba_objects using(object_id);

OBJECT_NAME SESSION_ID LOCKED_MODE
----------- ---------- -----------
DEPT                59           5
DEPT                53           2

Then I rollback my transaction

20:56:59 SQL2> rollback;

My lock was released and I've only the one from Session 1 (SID=53):

OBJECT_NAME SESSION_ID LOCKED_MODE
----------- ---------- -----------
DEPT                53           2

Now comes the funny part. I run exactly the same SRX lock, which was immediately acquired before:

21:14:30 SQL2> lock table dept in share row exclusive mode wait 5 ;

But now it hangs. Let's see the blocking tree with utllockt.sql:

WAITING_SESSION   LOCK_TYPE         MODE_REQUESTED MODE_HELD      LOCK_ID1          LOCK_ID2
----------------- ----------------- -------------- -------------- ----------------- -----------------
53                None
   59             DML               Exclusive      Row-S (SS)     94228             0

What? My session wants to acquire an Exclusive lock? I've never requested that.
And finally it fails because my Session 1 has a RS lock which prevents exclusive locks.

lock table dept in share row exclusive mode wait 5
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

 

Third session

But it gets worse. I disconnect my second session and try to do the same from a third one. And I even try only an RS lock:

21:15:20 SQL3> lock table dept in row share mode wait 5 ;

And I'm blocked again. It seems that because a session had acquired an SRX lock and rolled it back while another session held an RS one, any new transaction that wants to acquire any lock on the table must temporarily acquire an exclusive one first.

 

Event 10704 "Print out information about what enqueues are being obtained"

In order to get further I traced the locks with event 10704 (see how in a previous blog).
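For completeness, here is a minimal sketch of how that event is typically set at the session level before and after running the test (the level value is an assumption; see the referenced post for details):

-- enable enqueue tracing for this session
alter session set events '10704 trace name context forever, level 3';
-- ... run the lock table statements to trace ...
alter session set events '10704 trace name context off';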

Here is the trace when it failed, filtering on '00017014' which is the object_id of DEPT in hexadecimal:

ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=2 flags=0x400 timeout=5 ***
ksqrcl: TM-00017014-00000000-00000000-00000000
ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=6 flags=0x400 timeout=5 ***
ksqcmi: TM-00017014-00000000-00000000-00000000 mode=6 timeout=5
ksqcmi: returns 51
ksqgtl: RETURNS 51
ksqrcl: returns 0

My 'lock table dept in row share mode' acquired a mode 2 lock (which is the 'row share' mode), but then released it and tried to acquire a mode 6 lock (which is exclusive).

And then here is a trace when it was successful, after the transaction in Session 1 has been committed:

ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=2 flags=0x400 timeout=5 ***
ksqrcl: TM-00017014-00000000-00000000-00000000
ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=6 flags=0x400 timeout=5 ***
ksqcmi: TM-00017014-00000000-00000000-00000000 mode=6 timeout=5
ksqcmi: returns 0
ksqgtl: RETURNS 0
ksqgtl *** TM-00017014-00000001-00000000-00000000 mode=3 flags=0x400 timeout=5 ***
ksqgtl: RETURNS 0
ksqcnv: TM-00017014-00000000-00000000-00000000 mode=2 timeout=5
ksqcmi: TM-00017014-00000000-00000000-00000000 mode=2 timeout=5
ksqcmi: returns 0
ksqrcl: TM-00017014-00000001-00000000-00000000
ksqrcl: returns 0

So it did the same, but this time the mode 6 lock could be acquired. Then we see a conversion to mode 2, which is the RS we wanted.

Finally, I tried all the other combinations of locks, but it seems that only this one (RS, then a rolled back SRX) shows that behavior. I also tried with DML instead of the 'lock table' statement, but the Share lock acquired by DML (the non-indexed foreign key case) is released immediately, so I cannot roll it back. And a failure in the statement does not trigger the same behaviour.

Final note: the only reference I've found for that behavior is this post on oracle-l

Documentum Multiple ADTS: Ratio of rendition creations between instances (Part 1)

Fri, 2014-10-24 04:34

I recently realized that there is very little information about installing and managing multiple ADTS instances. And we have a customer which has two ADTS instances running for one docbase. I therefore decided to write a couple of blog posts on "Multiple ADTS", since it is likely to be helpful for someone in the same situation.

 

Rendition processing & ADTS instance

This first blog post will tell you how to find out which rendition has been processed by which ADTS instance. Hence, we will be able to calculate what the ratio of processing between two (or more) servers is. The blog post will be split into two parts, as I know two ways of getting the same result.

The best architecture is to have 2 dedicated servers for the two different ADTS instances. Let's assume we have SERVER1 and SERVER2. Each one is hosting an instance of ADTS with the same version and patch level. We will see in another blog how to configure both servers to allow a kind of load balancing.

You have to find some audit or log in the docbase to figure out when a rendition was requested. When a rendition is requested, it is posted in the dmi_queue_items queue, but you may already be aware of this. If not, you can take a short look at my other blog post (http://www.dbi-services.com/index.php/blog/entry/documentum-useful-queries-dqliapi) which presents some DQLS to find what is pending in the rendition queue.

But unfortunately, when the rendition is done, the queued item is deleted and all your hope of getting some information vanishes with it. We can get a little hint if the rendition failed, as it is kept in this queue but with the attribute event set to DTS (go to my other post for more info). But still, it doesn't show which server failed to process it.
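As a quick illustration, here is the kind of DQL I mean (a sketch assuming the default rendition queue named dm_autorender_win31 and the standard dmi_queue_items attributes; adapt to your setup): the first query counts what is still waiting, the second lists failed requests.

select count(*) from dmi_queue_items where name = 'dm_autorender_win31' and delete_flag = 0;
select item_id, date_sent, message from dmi_queue_items where event = 'DTS';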

 

CTS Report

In this part I will talk about the CTS report method. In the next part of this blog post, I will introduce the dmr_content method.

If you have heard about Digital Asset Manager, you may be on the right track. It is an EMC tool which can be used to generate reports of ADTS activities. If you can get it, just use it, as it is exactly what you need. But in this post, I will show you how to do without it. So, just like Digital Asset Manager, you will have to enable a specific option in DA which is not enabled by default. It is a kind of audit of ADTS activities and it will populate a table with rendition information and, as you may guess, with information about which server processed each rendition.

Log in to your DA as the installation owner and navigate to Content Transformation Services and then to CTS Reporting Configuration. You should see the following form:

MA-RR-01_config_report.PNG

Now set Reporting Configuration to "On", select the Archiving Configuration mode and enter 7 for the Archiving Interval. You have just configured your repository to write an entry each time a rendition is done by whichever server, with a seven-day period before archiving. It means that each time a rendition is done, an entry will be stored in the dm_cts_response table and after seven days (starting now), all entries will be moved to dmc_cts_request_archiving. With DQL, you can look at these tables, which should be empty because no rendition has been done yet.

 

select * from dm_cts_response;
select * from dmc_cts_request_archiving;

 

After a period of time, let's say 5 days, some entries have been added to the first table - only if you have rendition activity, of course. And you can check which server is involved in the process thanks to the following query:

 

select cts_server_name, trans_completed_time from dm_cts_response order by trans_completed_time desc;

MA-RR-01_config_report2.PNG

Conclusion

These tables are used by the EMC tool Digital Asset Manager, and if you know a little DQL, you can build some stats without buying the tool. However, there is a drawback to this method: you enable a kind of audit trace which will populate 2 tables in the repository. Hence, if you have a lot of rendering processes, the tables can grow fast and will need space. In addition, you will not be able to trace older renditions, as everything is only stored from the moment you activated the reporting.

But it's quite useful as you can get the rendition ratio in the following way:

 

select cts_server_name, count(*) as val from dm_cts_response group by cts_server_name;

From 8.0 extended rowid to 12c pluggable db: Why Oracle Database is still a great software

Fri, 2014-10-24 03:31

Do you know why Oracle Database has been a leader in the database market for 30 years? Do you know any other software that is still the leading solution after decades? I think that it is because Oracle has been good software from the get-go. Many early decisions in the software architecture later revealed themselves to be the right ones: for example the C language, which made it portable to all the platforms that have been relevant during those 30 years, or the parallel server, which brought RAC when standalone servers were not able to scale anymore. Here, I will illustrate a decision made 15 years ago that has made the whole 12c pluggable database feature possible.

 

Oracle 7 rowid

The ROWID is the physical address of a row in the database. Physically, a database is just a set of files where data is stored as rows in blocks. So, at the time of Oracle 7, the ROWID was just:

  • the file number
  • the block number in that file
  • the row number in that block
That was printed in hexadecimal when we selected the ROWID pseudo-column:  

CaptureOracle7Rowid.PNG

 

Here my rows are numbered from 0 to 3 and are all in file 2 at block offset 58 (0x3a).

At that time, the maximum number of datafiles in a database was 1022. That was enough until the data warehouse days brought the need for Very Large Databases.

 

Relative file number

In Oracle 8 they wanted to increase the possible number of datafiles without changing the rowid (which would have involved updating all blocks during an upgrade). And here is how they did it:

1. they introduced the 'relative file number'. The file identification part is not unique for the database but only for the tablespace. That means that you can have 1022 datafiles per tablespace instead of 1022 datafiles per database.

2. they kept the relative_fno equal to the absolute file_id for the first 1022 datafiles, so that an Oracle 7 datafile is still compatible: the number that was the absolute file_id being now the relative_fno.

3. when going over the 1022 datafiles per database, the file_id can go beyond (it is not stored in the rowid) but the relative_fno just recycles to 1.

The consequence is that a rowid is not unique anymore. When we have to look up a row by its rowid, we must know in which tablespace it is, because the rowid identification is relative to the tablespace.

No problem. Where are rowids used? In chained rows, to get the other piece. A table cannot store parts of its rows in different tablespaces, so we know that if we are in tablespace USERS and have a pointer to a chained row, it has to be in the USERS tablespace as well. They are used in indexes as well. And that's the same: if we unique scan the index EMP_PK to get the rowid of an EMP row, we know that it is in the USERS tablespace because we know that the EMP table is in the USERS tablespace.

However, a new feature appeared at that time: a table can be partitioned. We cannot say anymore that EMP rows are in USERS tablespace because EMP may be partitioned over several tablespaces. That does not concern chained rows. That does not concern non-partitioned tables. And with partitioned tables, that does not concern local indexes because local index partitions always address rows in one table partition.

This is where Oracle introduced an extended rowid, for global indexes on partitioned tables, which is larger and is able to give the tablespace information in addition to the relative file number. It is called extended as opposed to the 'restricted' rowid, which is restricted to cases where we know the tablespace.

They could have chosen to store only the tablespace number. But they chose to store the object id instead, which - from the dictionary - can be used to know the tablespace. People were puzzled by that choice at the time, but it was later revealed as a great software architecture decision, because this is what allows us to have pluggable databases 15 years later.

But before that, I have to introduce another 8.0 change, which is the data object id.

 

Data object id

In Oracle 7, there is only one object_id, because there is a one-to-one relation between the logical object (table) and the physical object (segment). That has changed with the introduction of partitioning, where one table may have several partitions.

Partitions are also logical objects, and each of them has a unique object_id. But once again, the software architects made a very good decision by separating the identification of the physical object: they introduced the data_object_id. When you create a table, the data_object_id of the segment is equal to the object_id of the table. But that can change.

For example, what happens when you truncate a table? You just reset the high water mark without going through all the blocks. And you can insert new rows that may go into the same place. But how are you sure that concurrent users will not mix old blocks with new ones? Remember that readers do not lock anything, so they did not notice your truncate. The truncate changes the data_object_id as if it were a new segment, but still related to the same logical table.
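A quick way to see it, on a hypothetical throwaway table (DEMO_TRUNC is not from this article): OBJECT_ID stays the same across the truncate while DATA_OBJECT_ID changes.

create table DEMO_TRUNC as select * from dual;
select object_id,data_object_id from user_objects where object_name='DEMO_TRUNC';
truncate table DEMO_TRUNC;
select object_id,data_object_id from user_objects where object_name='DEMO_TRUNC';
drop table DEMO_TRUNC;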

And with partitioning you can exchange partition: the logical object_id changes but the physical data_object_id is still the same because it is still the same segment.

It's not always easy to know if we are dealing with an object_id or a data_object_id. Basically, things that are logical (for example lock table or lock partition) show the object_id and they are - depending on the place - called OBJECT_ID, OBJ# or OBJN. When it is related to the physical segment (blocks in buffer cache, block corruption) we see DATA_OBJECT_ID, DATAOBJ# or OBJD or even OBJ or OBJECT_NUMBER.

When I'm not sure about what to use in my scripts, I test it on a table that has been truncated at least once, so that both values are different. I advise you to always test your scripts on a table that has been truncated and on a tablespace that has been transported.

 

Oracle 8 extended rowid

So I said that the extended rowid does not store the tablespace id. It stores the data_object_id, which is associated with one and only one tablespace. The format of the extended rowid is:

  • the data object id (which gives the tablespace from the data dictionary)
  • the relative file number (relative to the tablespace)
  • the block number in that file
  • the row number in that block

and it is needed only when we do not navigate from an object that tells us exactly which tablespace contains the segment.

Now let's understand why the Oracle software architects decided to store the data_object_id instead of the tablespace number. By doing that, they made the tablespaces physically independent of the database.

 

Pluggable tablespaces

Yes, I know it's called transportable tablespaces. But it was the premise of pluggable databases. Anyway, pluggable databases are just transportable tablespaces that include the system tablespace (which contains the metadata for the other tablespaces).

You can transport a tablespace from another database and plug it into your database. Of course, the absolute file number will change as it is assigned by your database. The tablespace number will change. But the relative file numbers - relative to the tablespace - do not have to change.

And of course the object_id will change: a new one will be used when importing the metadata. But the data_object_id does not have to change. The reason is that the data_object_id is not expected to be unique in the database. It must be unique only within the object (two partitions of the same table cannot have the same data_object_id).

 

Oracle 8 extended rowid was designed for pluggable databases

And this is where those early decisions show all their meaning. You can plug a tablespace in, and the rowids of the rows in that tablespace do not have to be updated. This is what makes it a quick operation: only the dictionary and the file headers have to be updated. The time depends on the number of objects, but not on the volume of data. The agility brought by pluggable databases in 2013 was actually designed in 1997.

 

rowid is not unique

So I said that rowid is not unique? Let's prove it. I create a table DEMO2 in tablespace DEMO1, export that tablespace, rename the table DEMO2 to DEMO1, and import that tablespace as DEMO2, so that I now have two tables, DEMO1 and DEMO2, respectively in tablespaces DEMO1 and DEMO2.

Here is my table:

 

SQL> select dname,rowid
  2   ,dbms_rowid.rowid_object(rowid) data_object_id
  3   ,dbms_rowid.rowid_relative_fno(rowid) relative_fno
  4   ,dbms_rowid.rowid_block_number(rowid) block_id
  5   ,dbms_rowid.rowid_row_number(rowid) row_number
  6   from DEMO2
  7  /

DNAME          ROWID              DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID ROW_NUMBER
-------------- ------------------ -------------- ------------ ---------- ----------
ACCOUNTING     AAAX7uAACAAAACEAAA          98030            2        132          0
RESEARCH       AAAX7uAACAAAACEAAB          98030            2        132          1
SALES          AAAX7uAACAAAACEAAC          98030            2        132          2
OPERATIONS     AAAX7uAACAAAACEAAD          98030            2        132          3

 

I export the tablespace with transportable tablespaces:

 

SQL> host expdp '"/ as sysdba"' transport_tablespaces=DEMO1
Export: Release 12.1.0.2.0 - Production on Fri Oct 24 15:33:35 2014
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TRANSPORTABLE_01":  "/******** AS SYSDBA" transport_tablespaces=DEMO1
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Master table "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TRANSPORTABLE_01 is:
  /u02/app/oracle/admin/DEMO/dpdump/expdat.dmp
******************************************************************************
Datafiles required for transportable tablespace DEMO1:
  /tmp/demo1.dbf
Job "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully completed at Fri Oct 24 15:34:35 2014 elapsed 0 00:00:59
SQL> host cp '/tmp/demo1.dbf' '/tmp/demo2.dbf'
SQL> alter tablespace DEMO1 read write;
Tablespace altered.

Then import it to DEMO2 tablespace (after renaming my previous table)

 

SQL> rename DEMO2 to DEMO1;
SQL> host impdp '"/ as sysdba"' transport_datafiles='/tmp/demo2.dbf ' remap_tablespace='DEMO1:DEMO2';
Import: Release 12.1.0.2.0 - Production on Fri Oct 24 15:34:35 2014

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Master table "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TRANSPORTABLE_01":  "/******** AS SYSDBA" transport_datafiles=/tmp/demo2.dbf  remap_tablespace=DEMO1:DEMO2
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Job "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully completed at Fri Oct 24 15:34:49 2014 elapsed 0 00:00:13
SQL> alter tablespace DEMO2 read write;
Tablespace altered.

 

Now I have 2 tables with different object_id but same data_object_id:

 

SQL> select object_name,object_id,data_object_id from user_objects where object_name like 'DEMO_';

OBJECT_NAM  OBJECT_ID DATA_OBJECT_ID
---------- ---------- --------------
DEMO2           98076          98030
DEMO1           98029          98030

 

And 2 segments in different files (file_id) but same relative_fno:

 

SQL> select segment_name,tablespace_name,file_id,relative_fno,block_id from dba_extents where segment_name like 'DEMO_';

SEGMENT_NAME    TABLESPACE    FILE_ID RELATIVE_FNO   BLOCK_ID
--------------- ---------- ---------- ------------ ----------
DEMO1           DEMO1               2            2        128
DEMO2           DEMO2               4            2        128

 

I update the rows so that I can be sure to select from different tables:

 

SQL> update DEMO1 set dname=upper(dname);
4 rows updated.
SQL> update DEMO2 set dname=lower(dname);
4 rows updated.

 

And now showing the ROWID from both tables:

 

SQL> select dname,rowid
  2   ,dbms_rowid.rowid_object(rowid) data_object_id
  3   ,dbms_rowid.rowid_relative_fno(rowid) relative_fno
  4   ,dbms_rowid.rowid_block_number(rowid) block_id
  5   ,dbms_rowid.rowid_row_number(rowid) row_number
  6   from DEMO1
  7  /

DNAME          ROWID              DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID ROW_NUMBER
-------------- ------------------ -------------- ------------ ---------- ----------
ACCOUNTING     AAAX7uAACAAAACEAAA          98030            2        132          0
RESEARCH       AAAX7uAACAAAACEAAB          98030            2        132          1
SALES          AAAX7uAACAAAACEAAC          98030            2        132          2
OPERATIONS     AAAX7uAACAAAACEAAD          98030            2        132          3

SQL> select dname,rowid
  2   ,dbms_rowid.rowid_object(rowid) data_object_id
  3   ,dbms_rowid.rowid_relative_fno(rowid) relative_fno
  4   ,dbms_rowid.rowid_block_number(rowid) block_id
  5   ,dbms_rowid.rowid_row_number(rowid) row_number
  6   from DEMO2
  7  /

DNAME          ROWID              DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID ROW_NUMBER
-------------- ------------------ -------------- ------------ ---------- ----------
accounting     AAAX7uAACAAAACEAAA          98030            2        132          0
research       AAAX7uAACAAAACEAAB          98030            2        132          1
sales          AAAX7uAACAAAACEAAC          98030            2        132          2
operations     AAAX7uAACAAAACEAAD          98030            2        132          3

 

Conclusion: I have two different tables in my database with the same rowids, because it is a physical copy. Only the data dictionary makes the difference.

Les journées SQL Server 2014 - an event that should interest you!

Fri, 2014-10-24 02:05

As you may already know, the biggest French-speaking SQL Server event is coming soon. Les journées SQL Server 2014, organized by the GUSS (Groupement des Utilisateurs SQL Server), take place in Paris on 1 and 2 December 2014. dbi services will be there too - and you?

 

Les journées SQL Server 2014

JSS_2014.png

Here is the event website: http://guss.pro/2014/12/01/journees-sql-server-2014/

The 2 days cover SQL Server 2014 administration and development as well as BI.

On this occasion, I am pleased to announce that dbi services will be present with 2 sessions:

 

  • Monday 01/12/2014, 10:30: Infrastructure and AlwaysOn - by David Barbarin

A session based on field experience with SQL Server AlwaysOn architectures and on-premise availability groups at various customers. The topics we will cover are wide-ranging and concern both the implementation phases of an AlwaysOn architecture and day-to-day operations.

  • Tuesday 02/12/2014, 10:30: Security via policies - by Stéphane Haby

Drawing on my experience in the banking sector (Swiss, of course), I will present a simple way to control security via policies. What types of checks should be put in place at the instance level and at the database level? How do you manage them across several servers? How do you get nice little reports or e-mails in return...

Let's try to answer together all these little questions that come up in security audits.

 

You will find the full session agenda here. dbi services will also have a booth where you can come and talk with our SQL Server experts.

Don't hesitate to come and meet us at this important French-speaking event! Besides its beauty, Paris is not that far from Switzerland.

SQL Server failover cluster, VSphere, & SCSI-3 reservation nightmares

Wed, 2014-10-22 22:53

When I have to install a virtualized SQL Server FCI at a customer site as an SQL Server consultant, the virtualized environment is usually already in place. I guess this is the same for most database consultants. Since we therefore lack practice, I have to admit that we do not always know the right configuration settings to apply to the virtualization layer in order to correctly run our SQL Server FCI architecture.

A couple of days ago, I had an interesting case where I had to help my customer to correctly configure the storage layer on VSphere 5.1. First, I would like to thank my customer because I seldom have the opportunity to deal with VMWare (other than via my personal lab).

The story begins with a failover test that failed randomly on a SQL Server FCI after switching the SQL Server volumes from VMFS to RDM in physical compatibility mode. We had to switch because the first configuration was installed as a CIB configuration (aka Cluster-In-Box configuration). As you certainly know, it does not provide a full additional layer of high availability over VMWare in this case, because all the virtual machines are on the same host. So we decided to move to a CAB (Cluster Across Box) configuration, which is more reliable than the first one.

In the new configuration, a failover randomly triggers a Windows 170 error with this brief description: "The resource is busy". At this point, I suspected that the SCSI-3 reservation was not performed correctly, but the cluster validation report didn't show any errors concerning the SCSI-3 reservation. I then decided to generate the cluster log to see if I could find more information about my problem - and here is what I found:

 

00000fb8.000006e4::2014/10/07-17:50:34.116 INFO [RES] Physical Disk : HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='?scsi#disk&ven_dgc&prod_vraid#5&effe51&0&000c00#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
00000fb8.000016a8::2014/10/07-17:50:34.116 INFO [RES] Physical Disk : ResHardDiskArbitrateInternal request Not a Space: Uses FastPath
00000fb8.000016a8::2014/10/07-17:50:34.116 INFO [RES] Physical Disk : ResHardDiskArbitrateInternal: Clusdisk driver handle or event handle is NULL.
00000fb8.000016a8::2014/10/07-17:50:34.116 INFO [RES] Physical Disk : HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='?scsi#disk&ven_dgc&prod_vraid#5&effe51&0&000d00#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
00000fb8.000006e4::2014/10/07-17:50:34.117 INFO [RES] Physical Disk : Arbitrate - Node using PR key a6c936d60001734d
00000fb8.000016a8::2014/10/07-17:50:34.118 INFO [RES] Physical Disk : Arbitrate - Node using PR key a6c936d60001734d
00000fb8.000006e4::2014/10/07-17:50:34.120 INFO [RES] Physical Disk : HardDiskpPRArbitrate: Fast Path arbitration...
00000fb8.000016a8::2014/10/07-17:50:34.121 INFO [RES] Physical Disk : HardDiskpPRArbitrate: Fast Path arbitration...
00000fb8.000016a8::2014/10/07-17:50:34.122 WARN [RES] Physical Disk : PR reserve failed, status 170
00000fb8.000006e4::2014/10/07-17:50:34.122 INFO [RES] Physical Disk : Successful reserve, key a6c936d60001734d
00000fb8.000016a8::2014/10/07-17:50:34.123 ERR   [RES] Physical Disk : HardDiskpPRArbitrate: Error exit, unregistering key...
00000fb8.000016a8::2014/10/07-17:50:34.123 ERR   [RES] Physical Disk : ResHardDiskArbitrateInternal: PR Arbitration for disk Error: 170.
00000fb8.000016a8::2014/10/07-17:50:34.123 ERR   [RES] Physical Disk : OnlineThread: Unable to arbitrate for the disk. Error: 170.
00000fb8.000016a8::2014/10/07-17:50:34.124 ERR   [RES] Physical Disk : OnlineThread: Error 170 bringing resource online.
00000fb8.000016a8::2014/10/07-17:50:34.124 ERR   [RHS] Online for resource AS_LOG failed.

 

An arbitration problem! It looks related to my first guess, doesn't it? Unfortunately I didn't have access to the vmkernel.log to see potential reservation conflicts. After that, and this is certainly the funny part of this story (and probably the trigger for this article), I took a look at the multi-pathing configuration of each RDM disk. The reason for this is that I remembered some conversations I had with one of my friends (he will surely recognize himself) in which we talked about SCSI-3 reservation issues with VMWare.

As a matter of fact, the path selection policy was configured to round robin here. According to the VMWare KB 1010041, PSP_RR is not supported with VSphere 5.1 for Windows failover clusters and shared disks. This is however the default configuration when creating RDM disks with the EMC VNX storage used by my customer. After changing this setting for each shared disk, no problem occurred!

My customer inquired about the difference between VMFS and RDM disks. I don't presume to be a VMWare expert, because I'm not, but I know that database administrators and consultants can no longer afford to ignore how VMWare (or Hyper-V) works.

Fortunately, most of the time there will be virtualization administrators with strong skills, but sometimes not, and in that case you may feel alone when facing such a problem. So the brief answer I gave to the customer was the following: if we used VMDKs or virtual mode RDMs instead of physical mode RDMs, the SCSI reservation would be translated into a file lock. In a CIB configuration this is not a problem, but it is for a CAB configuration, as you can imagine. Furthermore, using PSP_RR with versions older than VSphere 5.5 can release the reservation and cause issues like the one described in this article.

Wishing you a happy and problem-free virtualization!

Documentum upgrade project: from D2-Config 3.1 to D2-Config 4.1

Mon, 2014-10-20 22:20

During a migration from D2-Config 3.0 to D2-Config 4.1, we encountered 2 issues. In order to have a clean installation and the assurance of being able to back up the configuration, we decided to perform a full config export on D2-Config 3.0 and a full import with a configuration reset to D2-Config 4.1.

 

1. Configuration export

The first issue we experienced was regarding docapps existing in the repository. As you can see in the following screenshot, D2-Config crashes very quickly.

 


 

But we were quickly able to find a solution. We just had to unselect the docapps check boxes and wait a little bit.

 


 

2. Configuration update

After the first import, we discovered that some conditions in the menu management were not manageable anymore. But they still applied in D2-Client.

Actually, a class name is displayed - i.e. “eu.c6.web.conditions.permits.HasPermit” - instead of the real condition: “User has minimum “write” permit on the selection”.

 


 

We have this result because no migration path was anticipated between the 2 releases of D2-Config and some configuration items kept their old class names...

Class names now have to start with “com.emc” instead of “eu.c6”

 

In fact (and this was confirmed by C), we need to change the menu, toolbar, and tree configuration items manually.

Manually here means exporting the full configuration and performing the changes directly inside the zip-archived XML files of the related configuration items. After a quick assessment, we counted around 100 occurrences to be changed in our case. That was too much to be done during an upgrade. So we decided to automate the process.

We used a tailor-made Linux script to do that:

 

 

$> ./updateConfig.sh <config archive name>

 

 

After execution, the result was a new D2-Config archive, ready to be deployed, with “UPDATED_” as a prefix for the archive name given as parameter.

You can see the content of the script, executed and tested under RHEL 5.10:

 

 

 #!/bin/sh
 #
 # Script to update D2-config from 3.0 objects level to 3.1/4.1

 if [[ $# -ne 1 ]] ; then
   echo "ERROR usage : $0 <config archive name>"
   exit 1
 fi

 root_dir="work"
 mkdir -p ${root_dir}
 # extract the D2-Config archive given as parameter
 unzip -d ${root_dir} "$1"

 # replace the old class name prefix in every file that contains it
 for file in `grep -RH -i eu.c6 ${root_dir}/* | cut -d ":" -f 1`
 do
   echo $file
   mv $file ${file}_old
   sed -e s/eu.c6/com.emc/g ${file}_old > $file
   rm ${file}_old
 done

 # repackage the updated configuration archive
 cd $root_dir
 zip -r "../UPDATED_$1" *
 cd ..
 chmod 777 "UPDATED_$1"

 rm -rf $root_dir

You can now try to import the updated configuration archive and see the difference.

Be careful: a full config import with configuration reset implies the deletion and recreation of some objects like public searches (d2c_query objects with a new r_object_id).

Thanks for reading!

What about ALTER INDEX ... SHRINK SPACE ?

Mon, 2014-10-20 12:37

I have recently published a script to check index fragmentation. But then, do you COALESCE or REBUILD? Well, there is also another option: ALTER INDEX ... SHRINK SPACE. Let's compare all those index defragmentation operations.
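The exact test case is not shown here, but a minimal setup sketch that produces this kind of fragmentation could look like the following (the table name, column names and row generator are assumptions; only the index name DEMO_N is used later in the post):

create table DEMO as select rownum n, rpad('x',100,'x') pad from dual connect by level <= 1000000;
create index DEMO_N on DEMO(n);
delete from DEMO where mod(n,10) > 0;
commit;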

Fragmented index

I have an index created when the table had 1 million rows. Then, I deleted 90% of the rows. Here is the index state from:

  • dbms_space.space_usage
  • index_stats after an analyze index validate structure
  • my index fragmentation checking script with 4 buckets

and here are the results:

 

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0       2230    2304

 HEIGHT  BLOCKS    LF_ROWS  LF_BLKS  BR_ROWS  BR_BLKS DEL_LF_ROWS USED_SPACE PCT_USED
------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3    2304     100404     2226     2225        5         404    1622013       10

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     250280         45         714          91     557 oooo
    250730 ->     500370         45         714          91     557 oooo
    500820 ->     750010         45         714          91     556 oooo
    750460 ->     999660         45         714          91     556 oooo

 

I have 2226 leaf blocks, the index height is 3 with 5 branch blocks. The leaves are only 9% used (91% free space).
However dbms_space shows only full blocks: the deleted entries are still there, just marked as deleted, and the blocks are still seen as full.
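For reference, here is a sketch of how the first two result sets can be gathered, assuming the index is DEMO_N, owned by the current user, in an ASSM tablespace; the bucket report at the end comes from the fragmentation script mentioned above and is not reproduced here:

set serveroutput on
declare
 v_unf number; v_unf_b number; v_fs1 number; v_fs1_b number; v_fs2 number; v_fs2_b number;
 v_fs3 number; v_fs3_b number; v_fs4 number; v_fs4_b number; v_full number; v_full_b number;
begin
 -- free space buckets (blocks and bytes) as reported by dbms_space.space_usage
 dbms_space.space_usage(user,'DEMO_N','INDEX',
  v_unf,v_unf_b,v_fs1,v_fs1_b,v_fs2,v_fs2_b,v_fs3,v_fs3_b,v_fs4,v_fs4_b,v_full,v_full_b);
 dbms_output.put_line(':UNF='||v_unf||' :FS1='||v_fs1||' :FS2='||v_fs2||' :FS3='||v_fs3||' :FS4='||v_fs4||' :FULL='||v_full);
end;
/
-- index_stats is populated for the current session by validate structure
analyze index DEMO_N validate structure;
select height,blocks,lf_rows,lf_blks,br_rows,br_blks,del_lf_rows,used_space,pct_used from index_stats;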

 

COALESCE

Now let's COALESCE:

SQL> alter index DEMO_N coalesce;

And before checking the same values about the index space I measure the amount of work that has been done by the operation (from v$mystat):

 

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                          22067
db block gets                                                         32818
session logical reads                                                 32886
db block changes                                                      40601
undo change vector size                                            35199264
redo size                                                          47878800

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0       2004          0          0        227    2304


 HEIGHT  BLOCKS    LF_ROWS  LF_BLKS  BR_ROWS  BR_BLKS DEL_LF_ROWS USED_SPACE PCT_USED
------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3    2304     100000      223      222        5           0    1591530       88


         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     500800        450        7158          11      56
    505310 ->     752020        449        7132          11      56
    756530 ->     998730        443        7038          12      55

 

COALESCE is an online operation that defragments the leaf blocks. We now have only 223 leaf blocks that are 90% full (because my pctfree is the default 10%). But the index still has the same height and still has 2304 blocks. Where are the reclaimed 2000 blocks? They are available if the index needs a new block (for a block split). They are seen as FS2 (at least 25 to 50% free space) by dbms_space because they still contain the deleted rows, but they are fully reclaimable anyway.

 

SHRINK SPACE COMPACT

Back with the same table, and doing a SHRINK SPACE COMPACT instead of COALESCE:
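For reference, the statement behind this test is simply the COMPACT variant on the same index:

alter index DEMO_N shrink space compact;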

 

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                          28794
db block gets                                                         40940
session logical reads                                                 41527
db block changes                                                      49387
undo change vector size                                            36990460
redo size                                                          51848880

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0       2003        227    2304


 HEIGHT  BLOCKS    LF_ROWS  LF_BLKS  BR_ROWS  BR_BLKS DEL_LF_ROWS USED_SPACE PCT_USED
------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3    2304     100000      223      222        5           0    1591530       88


         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     500800        450        7158          11      56
    505310 ->     752020        449        7132          11      56
    756530 ->     998730        443        7038          12      55

 

So what is the difference? Slightly more work (about 20% more logical reads and block changes) for the same result. Except that now the reclaimed blocks are in FS4 (75 to 100% free space).

 

SHRINK SPACE

What if we use SHRINK SPACE instead of SHRINK SPACE COMPACT?
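Again, the statement run is presumably:

SQL> alter index DEMO_N shrink space;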

 

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                          29352
db block gets                                                         45496
session logical reads                                                 46190
db block changes                                                      50032
undo change vector size                                            36981524
redo size                                                          51901500

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0        227     240


 HEIGHT  BLOCKS    LF_ROWS  LF_BLKS  BR_ROWS  BR_BLKS DEL_LF_ROWS USED_SPACE PCT_USED
------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3     240     100000      223      222        5           0    1591530       88


         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     500800        450        7158          11      56
    505310 ->     752020        449        7132          11      56
    756530 ->     998730        443        7038          12      55

 

With tables, a shrink space lowers the high water mark. Here it is the same idea: in addition to the compaction, the reclaimed blocks are no longer allocated to the index, so they can be used for other segments in the tablespace. We see that from dbms_space: the index is now 240 blocks only.

 

REBUILD

The previous operation requires a lock only for a short duration (provided that we did the SHRINK SPACE COMPACT before). The rebuild needs a Share lock on the table for the whole operation, blocking concurrent DML.
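The statement run here is presumably:

SQL> alter index DEMO_N rebuild;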

 

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
db block gets                                                           953
redo entries                                                           1832
db block changes                                                       1906
session logical reads                                                  4019
undo change vector size                                                9152
redo size                                                            173732

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0        222     256


 HEIGHT  BLOCKS    LF_ROWS  LF_BLKS  BR_ROWS  BR_BLKS DEL_LF_ROWS USED_SPACE PCT_USED
------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      2     256     100000      222      221        1           0    1591520       90


         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     501250        451        7170          11      56
    505760 ->     749300        451        7170          11      55
    753810 ->     997350        448        7117          11      55

 

The result is the same as with the previous operation (SHRINK SPACE), except that the index height has decreased. A rebuild is the right operation if the index blevel has become too high. And we did that offline but with much less work: minimal undo and redo, and few blocks to read (when the index is still usable, the rebuild can read the current index to build the new segment).

 

REBUILD ONLINE

Last operation, possible only in Enterprise Edition, is the rebuild online which doesn't need to lock the table.
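The statement run here is presumably:

SQL> alter index DEMO_N rebuild online;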

 

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                            660
db block changes                                                        876
db block gets                                                          1419
session logical reads                                                  4989
undo change vector size                                               24932
redo size                                                            114924

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0        222     256


 HEIGHT  BLOCKS    LF_ROWS  LF_BLKS  BR_ROWS  BR_BLKS DEL_LF_ROWS USED_SPACE PCT_USED
------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      2     256     100000      222      221        1           0    1591520       90


         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     501250        451        7170          11      56
    505760 ->     749300        451        7170          11      55
    753810 ->     997350        448        7117          11      55

 

Here we don't see any overhead from doing it online. This is because my table is small (my testcase has only one column, which is the indexed one). On a real table you will probably see that the online rebuild takes longer than the offline one, because it cannot use the current index. But anyway, the fact that it is online means that the duration is not a big issue.

 

Conclusion

This is an example on a table that had a massive purge: all blocks were touched. In that case the REBUILD is the right solution. However, if you are in Standard Edition and cannot do it online, then you will probably do a SHRINK SPACE COMPACT, because it lets you run a SHRINK SPACE later (which needs a quick lock, but only for a very short duration - to be done during a period of low activity).
COALESCE makes sense here only if you know you will insert back a lot of rows, so that you don't need to deallocate the blocks from the index.

Now, what to do if the free space to reclaim is only on small part of the index blocks? As in the following case:

 

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->      50468        374        5974          26     112 o
     50917 ->     100756        449        7179          11     112
    101205 ->     151044        449        7179          11     112
    151493 ->     201332        449        7179          11     112
    201781 ->     251620        449        7179          11     112
    252069 ->     301908        449        7179          11     112
    302357 ->     351747        449        7179          11     111
    352196 ->     401586        449        7179          11     111
    402035 ->     451425        449        7179          11     111
    451874 ->     501264        449        7179          11     111
    501713 ->     551103        449        7179          11     111

 

In that case a COALESCE is probably the best choice because you keep the blocks allocated to the index for future growth. And the few blocks reclaimed would probably not change the index height anyway. However, if you did a small purge that concerns only a small part of the index and want to reclaim the space for other segments, then SHRINK SPACE is the right solution. But do you really need to reclaim space in that case?

Now you see the purpose of my index fragmentation script: I don't need only the average free space; I need a clear picture of where the fragmentation is in order to decide what to do and how.

Windows 10: The new features

Fri, 2014-10-17 02:03

First, it should have been Windows 9, but finally the new Microsoft operating system is named Windows 10! In the same way as for my blog post on Windows 8, I have decided to write a few words about the new features of Windows 10.

Documentum upgrade project: Configuring NTLM SSO for D2 3.1 SP1

Thu, 2014-10-16 20:11

The Documentum D2 3.1 SP1 release is kind of a mix between D2 4.1 APIs (using D2FS in the backend) and the D2 3.1 front-end. For SSO to be fully implemented, the configuration has to be applied for D2 3.1 as well as for D2FS. For D2FS, the same configuration applies whether you are using the NT Lan Manager (NTLM) or Kerberos authentication.

If you want to implement Kerberos Single Sign On instead of NTLM, have a look at this blog post: http://www.dbi-services.com/index.php/blog/entry/kerberos-sso-with-documentum-d2-31-sp1

 

1. NTLM configuration for D2 3.1 SP1

The D2 3.1 documentation explains how to configure NTLM for D2 3.1 SP1.

Referring to the D2 3.1 installation guide, you can see the following:

 

Locate the file « shiro.ini » used by D2 applications and add the following lines:

 

 

[main]

D2-NTLM=eu.c6.d2.web.filters.authc.D2NtlmHttpAuthenticationFilter

D2-NTLM.domainController=<domain controller>

D2-NTLM.domainName=<domain name>

D2-NTLM.domainUser=<domain user to authentify>

D2-NTLM.domainPassword=<user's passwords>

D2-NTLM.docbases=<docbase1,superUser1,password1,domain1|docbase2,...>

 

[urls]

/** = D2-NTLM

 

 

“docbaseX”: corresponds to a docbase using NTLM

“loginX”: corresponds to a superuser login of “docbaseX”

“passwordX”: corresponds to an encrypted password of the superuser of “docbaseX”.

 

In our case, the file is located in the following path: <Tomcat root>/webapps/D2-Client/WEB-INF/classes/  

At first look, everything is there. However, some clarifications are very welcome.

 

About Active directory connection:

  • <domain controller>: enter the domain controller IP address

  • <domain name>: This is the active directory domain. You must write “InternalDom” for “InternalDom\userloginname” user principal name.

  • <domain user to authentify>: User name for the authentication concerning the domain controller. You must write “userloginname” for “InternalDom\userloginname” user principal name.

 

About Documentum repository connection:

  • <docbaseX>: enter the name of the docbase

  • <superUserX>: enter a user name which is a super user for docbaseX

  • <passwordX>: enter encoded password for related super user name

 

2. NTLM configuration for D2FS

2.1 D2 3.1 SP1

You must be aware - at least since patch 02 for D2 3.1 SP1 - that the way to store the password for the related super user name has changed.

Referring to D2 4.1 installation guide, you can see the following:

 

If d2fs-trust.properties does not exist, create the file in the webapps/D2/WEB-INF/classes/ folder using a text editor. Open d2fs-trust.properties in the folder webapps/D2/WEB-INF/classes/ and add following lines:

 

 

*.user=<administrator user>

*.password=<encoded password>

*.domain=<your domain> [not mandatory]

#or for each repository

<repository>.user=<administrator user>

<repository>.password=<encoded password>

<repository>.domain=<your domain>

 

 

  • Repository corresponds to the repository.
  • User and password are the username and password of an inline Super User account in the repository.
  • Domain means the domain of the repository and can be left out for inline accounts.

 

In our case, the file is located in the following path: <Tomcat root>/webapps/D2-Client/WEB-INF/classes/

Everything is there. However, again, some clarifications are welcome:

  • <repository>: replace it with the name of the docbase

  • <administrator user>: enter a user name which is a super user for docbaseX.

  • <encoded password>: enter the password for the related super user name.

  • <domain>: Active directory domain You must write “InternalDom” for “InternalDom\userloginname” user principal name.

 

2.2 D2 3.1 SP1 P02

Since this release of D2, you must store the <encoded password> in the D2 lockbox.

Make sure you have installed the lockbox functionality properly, and that it is already working between D2 and its Java Method Server.

Then you can remove all lines related to passwords in the d2fs-trust.properties files:

 

 

*.user=<administrator user>

*.domain=<your domain> [not mandatory]

#or for each repository

<repository>.user=<administrator user>

<repository>.domain=<your domain>

 

 

Then, you can execute the following command:


 

java -classpath "<Tomcat root>\webapps\D2-Client\WEB-INF\lib\*" com.emc.common.java.crypto.SetLockboxProperty <D2 lockbox path> D2FS-trust.<Repository name>.password <user password>

 

 

Where:

  • <Tomcat root>: Root path of the tomcat instance

  • <D2 lockbox path>: Folder path where the D2.lockbox is stored

  • <Repository name>: Name of the repository

  • <user password>: Clear password of super user name setup in d2fs-trust.properties file

 

Make sure “D2FS” is in uppercase.

 

3. Working Example with lockbox

We will now see a few examples of working configurations. Obviously, this setup may not be the only one working to achieve the goal for Single Sign On authentication. You are also certainly able to identify where some adaptation can be performed.

 

Suppose we have following environment (with the following information):

MS domain controller address : “10.0.0.1”

MS domain name: “InternalDomain”

MS domain user principal name: “InternalDomain\DomUser”

MS domain user password: “DomPasswd”

 

Tomcat root: “C:\Tomcat”

Lockbox file location: “C:\Lockbox\d2.lockbox”

 

First repository name: “DCTMRepo1”

Second repository name: “DCTMRepo2”

 

Ensure that you have stopped all D2-Client application instances on the application server, as well as for the D2-Config.

 

3.1 Inline super user creation

The user you are going to create must have the following attributes:

 

 

- State: Active

- Name: SSOAdmin

- Login Name: SSOAdmin

- Login Domain: InternalDomain

- Password: RepoPasswd

- User Source: Inline Password

- Privileges: Superuser

- Extended Privileges: None

- Client Capability: Consumer

 

 

Create such a user for all repositories. In this example, we will take it as given that the same has been done for both repositories.

 

3.2 Shiro.ini file content

First, we must encode the password of the MS domain user name and the SSOAdmin:

 

 

java -classpath "C:\Tomcat\webapps\D2-Client\WEB-INF\lib\*" com.emc.d2.api.utils.GetCryptedPassword DomPasswd

UCmaB39fRLM6gRj/Gy3MJA==

 

java -classpath "C:\Tomcat\webapps\D2-Client\WEB-INF\lib\*" com.emc.d2.api.utils.GetCryptedPassword RepoPasswd

8RLQerkftOBCedjQNEz57Q==

 

 

Then, we can fill in the file:

 

 

[main]

D2-NTLM=eu.c6.d2.web.filters.authc.D2NtlmHttpAuthenticationFilter

D2-NTLM.domainController=10.0.0.1

D2-NTLM.domainName=InternalDomain

D2-NTLM.domainUser=DomUser

D2-NTLM.domainPassword=UCmaB39fRLM6gRj/Gy3MJA==

D2-NTLM.docbases=DCTMRepo1,SSOAdmin,8RLQerkftOBCedjQNEz57Q==,InternalDomain|DCTMRepo2,SSOAdmin,8RLQerkftOBCedjQNEz57Q==,InternalDomain

 

[urls]

/** = D2-NTLM

 

 

3.3 d2fs-trust.properties file content


 

DCTMRepo1.user=SSOAdmin

DCTMRepo1.domain=InternalDomain

DCTMRepo2.user=SSOAdmin

DCTMRepo2.domain=InternalDomain

 


3.4 D2.lockbox password store


 

java -classpath "C:\Tomcat\webapps\D2-Client\WEB-INF\lib\*" com.emc.common.java.crypto.SetLockboxProperty C:\Lockbox D2FS-trust.DCTMRepo1.password RepoPasswd

 

java -classpath "C:\Tomcat\webapps\D2-Client\WEB-INF\lib\*" com.emc.common.java.crypto.SetLockboxProperty C:\Lockbox D2FS-trust.DCTMRepo2.password RepoPasswd

 

 

That's it. Restart the D2-Client application and test it.

Thanks for reading!

Oracle Cloud Control 12c: removing an agent is much easier in OEM 12.1.0.4

Thu, 2014-10-16 19:54

To remove an agent in the previous Oracle OEM Cloud Control 12c versions, you first had to delete its targets, stop the agent, and remove the host target. Only then were you able to remove the agent. In version 12.1.0.4, the decommission agent feature has been greatly improved.

From the agent menu, you select the new feature Agent Decommission:

 

ag1

 

The agent must be in stopped status to be decommissioned:

 

ag2

ag3

 

It shows the targets which will be removed.

 

ag4

 

In a very short time, the targets are deleted, and the agent is removed.

The agent decommissioning feature is just a small new feature, but it will greatly simplify the agent removal procedure :=)

Oracle IOT: when to use Index Organized Tables

Thu, 2014-10-16 12:17

When can we use IOTs? That's a question I had when recently giving the Oracle DBA Essentials workshop. The DBA Essentials workshop is very dense and there is only half a day about performance. We have the Oracle Performance Tuning workshop to go into those details. But IOTs are underused in my opinion, so I'll post a use case where they are a good idea.

Basically:

  • Index access is very fast to get all indexed columns, but can be slower when it has to get the non-indexed columns
  • IOTs are fast to retrieve all columns (except overflow) by primary key
  • IOTs are slow when retrieving by secondary index columns that are neither in the index nor in the primary key
Here is a good use case: you have a many-to-many relationship that you implement with an association table.

This is your UML diagram: CaptureIOT1.PNG (grantOption is a property of the association. It's a boolean that tells whether the user can grant the granted privilege to another user.)

And this is how you implement it in a relational database: CaptureIOT2.PNG

So you have an association table with a primary key composed of the two foreign keys, and an additional column for the association property.

You have an index on USER_PRIVS(USER_ID,PRIV_ID) to enforce the primary key. And because all foreign keys should be indexed, you have to add an index on USER_PRIVS(PRIV_ID).

Now, for performance reasons you will probably want to add all the columns to both indexes, so you finally have those two indexes:
  • USER_PRIVS(USER_ID,PRIV_ID,GRANT_OPTION)

  • USER_PRIVS(PRIV_ID,USER_ID,GRANT_OPTION)

If you have a doubt about that modeling (such as why I don't introduce a surrogate key here), please tell me, we can discuss that. But I can guarantee that this is the right approach, at least in Oracle.

So when you navigate from either of the tables, you only have to access the index. No need for the table. So why store a table? This is where IOT comes in. Let's create it as an IOT:

CREATE TABLE USER_PRIVS ( USER_ID, PRIV_ID, GRANT_OPTION, PRIMARY KEY(USER_ID,PRIV_ID) ) ORGANIZATION INDEX;

and a secondary index:

CREATE INDEX PRIV_USERS ON USER_PRIVS (PRIV_ID, USER_ID, GRANT_OPTION);

So, if you navigate from USERS you access directly to the index leaf that has all information.

And if you navigate from PRIVILEGES, you also have all the information. Someone said that access via a secondary index is slow? Yes, but that's only when we have to get to the other columns, because a secondary index on an IOT doesn't have a rowid: it stores the primary key instead and must go through the primary index to get the other columns. But there are two points in my example:

  • I already have all the primary key columns in my secondary index, and Oracle is clever enough not to store them twice
  • I don't have to get to the primary index because I have every column in my secondary index
Those are the indexes I would have anyway, so here I avoid storing a table: less storage, cheaper inserts and deletes.

In general, the use cases for IOT are:
  • we access mainly through the primary key and accept more expensive queries when accessing otherwise. Note that the secondary indexes can be optimized if we ALTER INDEX ... UPDATE BLOCK REFERENCES regularly
  • the secondary indexes have all the columns needed for our query
The association table is in the second category and is a good occasion to implement an IOT. Note that the IOT and the index defined above are also used by Oracle to avoid foreign key table locks, as both start with one of the foreign key columns.

I will illustrate the cost of accessing an IOT in a future blog post. If you want to learn more about indexes, I'll never stop advising you to read http://use-the-index-luke.com/
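As an illustration (this is my own sketch, not part of the original example), here is a runnable version of that schema with assumed datatypes, and a query navigating from PRIVILEGES that is satisfied entirely by the secondary index:

create table USER_PRIVS (
  user_id      number not null,
  priv_id      number not null,
  grant_option char(1),
  constraint USER_PRIVS_PK primary key (user_id, priv_id)
) organization index;

create index PRIV_USERS on USER_PRIVS (priv_id, user_id, grant_option);

-- navigation from PRIVILEGES: every column comes from the secondary index,
-- so there is no additional access to the IOT through the primary key
select user_id, grant_option from USER_PRIVS where priv_id = 42;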

SQL Server: Forced failover with AlwaysOn & Availability Groups

Wed, 2014-10-15 20:38

During a workshop at one of my customers about the implementation of SQL Server AlwaysOn and availability groups, I had an interesting discussion. It was about asynchronous replication between 2 replicas and the effect of the forced failover operation with allowed data loss. Does SQL Server resynchronize the availability databases if I lose some transactions during the failover process?

The quick answer is yes but probably the most interesting question is how SQL Server achieves resynchronization and this is exactly what we will see in this blog post.

Simulating data loss is pretty easy if you have a SQL Server AlwaysOn architecture with availability groups and 2 replicas enrolled in asynchronous replication. You can pause the replication on the secondary and then perform a manual forced failover from that same server, for instance. During this blog post I will use the undocumented function sys.fn_dblog() and some dynamic management views related to availability groups as well.
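For reference - this is not shown in the original post - the pause and the forced failover can be done with statements like these (the availability group name AAG-TEST is an assumption):

-- on the secondary replica (SQL143): suspend the data movement for the availability database
ALTER DATABASE AdventureWorks2012 SET HADR SUSPEND;
GO

-- still on SQL143: force the failover, accepting potential data loss
ALTER AVAILABILITY GROUP [AAG-TEST] FORCE_FAILOVER_ALLOW_DATA_LOSS;
GO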

Let's begin with my SQL Server architecture: One availability group with the adventureworks2012 database that hosts a table dbo.aag_test and 2 replicas SQL141 and SQL143 configured in asynchronous replication. SQL141 is the current primary.

 

Step 1: Create a replication issue by pausing the availability database on the secondary

After suspending the replication on the secondary, we insert an additional record in the table aag_test on the primary (SQL141), which will not be replicated to the secondary (SQL143). By viewing the content of sys.fn_dblog() on each side, we can notice that the two transaction logs are no longer aligned.

 

:connect sql141
use AdventureWorks2012;

select * from dbo.aag_test;

select * from sys.fn_dblog(null, null);
go

:connect sql143
use AdventureWorks2012;

select * from dbo.aag_test;

select * from sys.fn_dblog(null, null);
go

 

Here is the content of the AdventureWorks2012 transaction log file on the primary:

 

00000104:00003a88:0001     LOP_MODIFY_ROW      LCX_BOOT_PAGE       0000:00000000
00000104:00003a90:0001     LOP_MODIFY_ROW      LCX_SCHEMA_VERSION  0000:00000000
00000104:00003a90:0002     LOP_BEGIN_XACT      LCX_NULL            0000:000343da
00000104:00003a90:0003     LOP_BEGIN_XACT      LCX_NULL            0000:000343db
00000104:00003a90:0004     LOP_FORMAT_PAGE     LCX_HEAP            0000:000343db
00000104:00003a90:0005     LOP_MODIFY_ROW      LCX_PFS             0000:000343db
00000104:00003a90:0006     LOP_MODIFY_ROW      LCX_IAM             0000:000343db
00000104:00003a90:0007     LOP_HOBT_DELTA      LCX_NULL            0000:000343db
00000104:00003a90:0008     LOP_FORMAT_PAGE     LCX_HEAP            0000:000343db
00000104:00003a90:0009     LOP_COMMIT_XACT     LCX_NULL            0000:000343db
00000104:00003a90:000a     LOP_INSERT_ROWS     LCX_HEAP            0000:000343da
00000104:00003a90:000b     LOP_SET_FREE_SPACE  LCX_PFS             0000:00000000
00000104:00003a90:000c     LOP_COMMIT_XACT     LCX_NULL            0000:000343da

 

And this is the content of the same database transaction log file on the secondary:

 

00000104:00003a70:0018     LOP_INSERT_ROWS     LCX_HEAP       0000:000343d7
00000104:00003a70:0019     LOP_SET_FREE_SPACE  LCX_PFS        0000:00000000
00000104:00003a70:001a     LOP_COMMIT_XACT     LCX_NULL       0000:000343d7
00000104:00003a88:0001     LOP_MODIFY_ROW      LCX_BOOT_PAGE  0000:00000000

 

On the secondary, the last record in the transaction log file is identified by the LSN: 260:14984:1 (in decimal format). We notice additional records in the transaction log file on the primary after pausing the replication - we can identify among them LOP_INSERT_ROWS / LCX_HEAP records that correspond in fact to the insertion of additional records in the table aag_test.
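As a side note (my own addition), the hexadecimal LSN parts returned by sys.fn_dblog() can be converted to the decimal format used by the DMVs like this:

-- LSN 00000104:00003a88:0001 from sys.fn_dblog() converted part by part to decimal
SELECT CONVERT(int, 0x00000104) AS vlf_sequence_number,   -- 260
       CONVERT(int, 0x00003a88) AS log_block_offset,      -- 14984
       CONVERT(int, 0x0001)     AS slot_number;           -- 1

This gives 260:14984:1, the value displayed by the DMVs below.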

We can also take a look at the DMV related to the availability groups by using this query:

 

SELECT
       ar.replica_server_name as ServerName,
       drs.synchronization_state_desc as SyncState,
       drs.last_hardened_lsn,
       drs.last_redone_lsn
FROM sys.dm_hadr_database_replica_states drs
       LEFT JOIN sys.availability_replicas ar
             ON drs.replica_id = ar.replica_id
ORDER BY ServerName


The query's output below:

 

ServerName   SyncState           last_hardened_lsn    last_redone_lsn
SQL141       SYNCHRONIZED        260000001501600001   NULL
SQL143       NOT SYNCHRONIZING   260000001499200001   260000001498400001 -- (260:14984:1)

 

The column last_redone_lsn for the secondary (SQL143) represents the last log record that has been redone. The value (converted to decimal format) is the same as the one previously found in the transaction log by using the sys.fn_dblog() function. The last_hardened_lsn value represents the last transaction log block hardened to the transaction log file. This value is usually greater than the last_redone_lsn value.

 

Step 2: SQL143 becomes the new primary - after restarting SQL141 and initiating a manual forced failover with potential data loss

At this point we have some lost data that has not been replicated (LSNs between 00000104:00003a88:0001 and 00000104:00003a90:000c).

 

00000104:00003a90:0001     LOP_MODIFY_ROW      LCX_SCHEMA_VERSION  0000:00000000
00000104:00003a90:0002     LOP_BEGIN_XACT      LCX_NULL            0000:000343da
00000104:00003a90:0003     LOP_BEGIN_XACT      LCX_NULL            0000:000343db
00000104:00003a90:0004     LOP_FORMAT_PAGE     LCX_HEAP            0000:000343db
00000104:00003a90:0005     LOP_MODIFY_ROW      LCX_PFS             0000:000343db
00000104:00003a90:0006     LOP_MODIFY_ROW      LCX_IAM             0000:000343db
00000104:00003a90:0007     LOP_HOBT_DELTA      LCX_NULL            0000:000343db
00000104:00003a90:0008     LOP_FORMAT_PAGE     LCX_HEAP            0000:000343db
00000104:00003a90:0009     LOP_COMMIT_XACT     LCX_NULL            0000:000343db
00000104:00003a90:000a     LOP_INSERT_ROWS     LCX_HEAP            0000:000343da
00000104:00003a90:000b     LOP_SET_FREE_SPACE  LCX_PFS             0000:00000000
00000104:00003a90:000c     LOP_COMMIT_XACT     LCX_NULL            0000:000343da

 

The new status of each replica is as following:

 

ServerName   SyncState           last_hardened_lsn    last_redone_lsn
SQL141       NOT SYNCHRONIZING   0                    0
SQL143       NOT SYNCHRONIZING   260000001502400001   NULL

 

Some additional records are inserted into the transaction log, but they are not directly related to the replicated data previously lost from the old primary SQL141.

 

Step 3: Replication from the new primary SQL143 is resumed

Here is the new replication status of each replica:

ServerName   SyncState           last_hardened_lsn    last_redone_lsn
SQL141       NOT SYNCHRONIZING   260000001499200001   249000007658400001
SQL143       SYNCHRONIZED        26000001502400001    NULL

 

We can now confirm that we have lost the data that remained in the send queue on the old primary SQL141. The last hardened lsn is not the same between the initial situation and the new situation (260000001501600001 vs 260000001499200001). Remember that the hardened lsn (transaction log block) 260000001499200001 corresponds to the last hardened lsn on the old secondary SQL143 before the synchronization issue.

 

Step 4: Last step - resuming the database movement on the new secondary SQL141 in order to resynchronize both replicas
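The statement run on the new secondary is presumably the following:

-- on the new secondary replica (SQL141): resume the data movement for the availability database
ALTER DATABASE AdventureWorks2012 SET HADR RESUME;
GO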

At this point SQL Server has perfectly resynchronized both replicas. Take a look at the output of sys.fn_dblog() on each side: we effectively retrieve the same pattern on both sides.

Content of the transaction log on the new secondary SQL141:

 

00000104:00003a88:0001     LOP_MODIFY_ROW      LCX_BOOT_PAGE       0000:00000000
00000104:00003a90:0001     LOP_MODIFY_ROW      LCX_BOOT_PAGE       0000:00000000
00000104:00003a98:0001     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0002     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0003     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0004     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0005     LOP_BEGIN_CKPT      LCX_NULL            0000:00000000
00000104:00003aa0:0001     LOP_XACT_CKPT       LCX_BOOT_PAGE_CKPT  0000:00000000
00000104:00003aa8:0001     LOP_END_CKPT        LCX_NULL            0000:00000000

 

Content of the transaction log on the new primary SQL143:

 

00000104:00003a88:0001     LOP_MODIFY_ROW      LCX_BOOT_PAGE       0000:00000000
00000104:00003a90:0001     LOP_MODIFY_ROW      LCX_BOOT_PAGE       0000:00000000
00000104:00003a98:0001     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0002     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0003     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0004     LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000
00000104:00003a98:0005     LOP_BEGIN_CKPT      LCX_NULL            0000:00000000
00000104:00003aa0:0001     LOP_XACT_CKPT       LCX_BOOT_PAGE_CKPT  0000:00000000
00000104:00003aa8:0001     LOP_END_CKPT        LCX_NULL            0000:00000000

 

Compared to the first output from the transaction log of the old primary SQL141, we have definitely lost the entries identified at the top of this article.

Wonderful, but how did SQL Server achieve this? The answer is in the SQL Server error log of the new secondary SQL141:

  • First, we notice the recovery LSN (transaction log block) identified by SQL Server for the AdventureWorks2012 database (260:14992:1). The same value was found previously by looking at the availability groups DMV and the column last_hardened_lsn. This is the starting point for the resynchronization process.
SQL141 error log:

2014-10-07 22:40:57.120    spid17s      The recovery LSN (260:14992:1) was identified for the database with ID 5. This is an informational message only. No user action is required.

 

  • Then we manually resumed the replication, and we can see the corresponding events in the error log.

 

2014-10-07 22:40:57.130    spid17s      AlwaysOn Availability Groups data movement for database 'AdventureWorks2012' has been suspended for the following reason: "failover from partner" (Source ID 1; Source string: 'SUSPEND_FROM_PARTNER'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books On

…

2014-10-07 22:43:33.950    spid57       ALTER DB param option: RESUME
2014-10-07 22:43:33.950    spid57       AlwaysOn Availability Groups data movement for database 'AdventureWorks2012' has been resumed. This is an informational message only. No user action is required.
2014-10-07 22:43:33.960    spid17s      AlwaysOn Availability Groups connection with primary database established for secondary database 'AdventureWorks2012' on the availability replica 'SQL143' with Replica ID: {546df6b8-d07d-4aa4-afb6-5f543d5fae98}. This is an informational message only. No user action is required.

 

  • The new secondary automatically restarted to resynchronize with the current primary. The commit LSN here is probably in the transaction log block (14992), which is lower than the transaction log block related to the hardened LSN (15016).

 

2014-10-07 22:43:34.060    spid17s      Availability database 'AdventureWorks2012', which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.
2014-10-07 22:43:34.060    spid43s      Nonqualified transactions are being rolled back in database AdventureWorks2012 for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.
2014-10-07 22:43:34.070    spid43s      State information for database 'AdventureWorks2012' - Hardended Lsn: '(260:15016:1)'   Commit LSN: '(260:14992:12)'   Commit Time: 'Oct 7 2014 10:21PM'

 

  • And finally, the most important information in this error log: SQL Server reverted to the transaction log block related to the recovery LSN (260:14992:1) and reinitialized from the last log record redone by the old secondary SQL143.

 

2014-10-07 22:43:34.810    spid43s      Using the recovery LSN (260:14992:1) stored in the metadata for the database with ID 5. This is an informational message only. No user action is required.

This is exactly the picture we get from the last output of the sys.fn_dblog() function above, which starts at the LSN 00000104:00003a90:0001 (260:14992:1 in decimal format), corresponding to the beginning of the transaction log block, and goes up to the LSN 00000104:00003aa8:0001 (260:15016:1 in decimal format).

As you can see, SQL Server uses a sophisticated mechanism that allows it to revert to the situation preceding the lost data. You probably also noticed that the key unit of this revert process is the transaction log block, not the transaction itself.

Enjoy!

Documentum upgrade project: Wrong icons in D2 search results with xPlore

Wed, 2014-10-15 20:11

After migrating from the "Fast" to the "xPlore" Documentum full-text search engine, our team discovered wrong icons in D2 search results. Documents in checked-out state continued to be displayed with the normal icon, even though xPlore indexes the attribute value of "r_lock_owner". This attribute contains the name of the user who has checked out the document.

In the following picture, you can see the search result with the wrong icon, whereas in the “Version” window, the checked-out icon appears for the current version:

 

1-FulltextSearch-Wrong-Icon.png

 

This result implies that D2 displays the results provided by xPlore directly, without any further check based on “r_object_id” for updated metadata values.

At this stage, xPlore only returned document information without value for “r_lock_owner”.

The reason is that after the user had checked-out the document, no queue item was created to tell the index-agent to re-index the document with a value for the “r_lock_owner” attribute.

In order to ask the index agent to re-index the document, we have to register the "dm_checkout" event on the "dm_sysobject" type for "dm_fulltext_index_user".

However, the reverse operation must also be registered, for "dm_unlock", to inform xPlore that the document is no longer checked out.

 

First, we need to verify that the event is not already registered for the index user. Using DQL, the following query must return no result:

 

 

select * from dmi_registry where user_name='dm_fulltext_index_user' and event='dm_checkout';

 

 

Then we need to retrieve the r_object_id of the dm_sysobject type:

 

 

select r_object_id from dm_type where name ='dm_sysobject';

 

 

Using API, request a login session ticket and open a session as 'dm_fulltext_index_user':

 

 

API> getlogin,c,dm_fulltext_index_user
...
DM_TICKET=T0JKIE5VTEwgMAoxMwp2ZXJzaW9uIElOVCBTIDAKMwpmbGFncyBJTlQgUyAwCjAKc2VxdWVuY2VfbnVtIElOVCBTIDAKMjE5OApjcmVhdGVfdGltZSBJTlQgUyAwCjE0MDA3NjU1NzQKZXhwaXJlX3RpbWUgSU5UIFMgMAoxNDAwNzY1ODc0CmRvbWFpbiBJTlQgUyAwCjAKdXNlcl9uYW1lIFNUUklORyBTIDAKQSAyMiBkbV9mdWxsdGV4dF9pbmRleF91c2VyCnBhc3N3b3JkIElOVCBTIDAKMApkb2NiYXNlX25hbWUgU1RSSU5HIFMgMApBIDYgRDJETVNVCmhvc3RfbmFtZSBTVFJJTkcgUyAwCkEgOCBsYm5kczEzMQpzZXJ2ZXJfbmFtZSBTVFJJTkcgUyAwCkEgNiBEMkRNU1UKc2lnbmF0dXJlX2xlbiBJTlQgUyAwCjU2CnNpZ25hdHVyZSBTVFJJTkcgUyAwCkEgNTYgZEl2MXgyTEVFWGloM2RrTFloNUJFcmZVT1lwQnRZMkxpMFU2M293MnJZcFJna29VbWtSOVRnPT0K

 

 

Copy the login ticket result into the connect command below:

 

 

API> connect,,dm_fulltext_index_user,DM_TICKET=T0JKIE5VTEwgMAoxMwp2ZXJzaW9uIElOVCBTIDAKMwpmbGFncyBJTlQgUyAwCjAKc2VxdWVuY2VfbnVtIElOVCBTIDAKMjE5OApjcmVhdGVfdGltZSBJTlQgUyAwCjE0MDA3NjU1NzQKZXhwaXJlX3RpbWUgSU5UIFMgMAoxNDAwNzY1ODc0CmRvbWFpbiBJTlQgUyAwCjAKdXNlcl9uYW1lIFNUUklORyBTIDAKQSAyMiBkbV9mdWxsdGV4dF9pbmRleF91c2VyCnBhc3N3b3JkIElOVCBTIDAKMApkb2NiYXNlX25hbWUgU1RSSU5HIFMgMApBIDYgRDJETVNVCmhvc3RfbmFtZSBTVFJJTkcgUyAwCkEgOCBsYm5kczEzMQpzZXJ2ZXJfbmFtZSBTVFJJTkcgUyAwCkEgNiBEMkRNU1UKc2lnbmF0dXJlX2xlbiBJTlQgUyAwCjU2CnNpZ25hdHVyZSBTVFJJTkcgUyAwCkEgNTYgZEl2MXgyTEVFWGloM2RrTFloNUJFcmZVT1lwQnRZMkxpMFU2M293MnJZcFJna29VbWtSOVRnPT0K
...
s1

 


“s1” is now the 'dm_fulltext_index_user' session descriptor.

 

Use this session descriptor to register the "dm_checkout" and "dm_unlock" events for the "dm_sysobject" type.

In both register commands, replace the empty third parameter with the r_object_id you retrieved previously:

 

 

API> register,s1,,dm_checkout,,F
...
OK
API> register,s1,,dm_unlock,,F
...
OK

 

 

In case you have configured another index agent for your repository, do not forget to reproduce the same procedure for the other user: "dm_fulltext_index_user_01".
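In that case, the same verification query as above can be adapted for that user (a sketch based on the earlier DQL):

select * from dmi_registry where user_name='dm_fulltext_index_user_01' and event in ('dm_checkout','dm_unlock');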

I hope this information will be of help to you.

Oracle system statistics: Display AUX_STATS$ with calculated values and formulas

Wed, 2014-10-15 01:13

System statistics can be gathered in NOWORKLOAD or WORKLOAD mode. Different values will be set depending on that, and the others will be calculated - derived from them. We can see the defined values in SYS.AUX_STATS$, but here is a script that shows the calculated ones as well.

With no system statistics or NOWORKLOAD the values of IOSEEKTIM (latency in ms) and IOTFRSPEED (transfer in bytes/ms) are set and the SREADTIM (time to read 1 block in ms) and MREADTIM (for multiblock read) are calculated from them. MBRC depends on the defaults or the db_file_multiblock_read_count settings.

With WORKLOAD statistics, the SREADTIM and MREADTIM as well as MBRC are measured and those are the ones that are used by the optimizer.
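As a reminder - this is not part of the script below - system statistics are typically gathered with dbms_stats.gather_system_stats:

SQL> -- noworkload mode: measures IOSEEKTIM, IOTFRSPEED and CPUSPEEDNW
SQL> exec dbms_stats.gather_system_stats('NOWORKLOAD')

SQL> -- workload mode: measures SREADTIM, MREADTIM, MBRC, ... during a representative load
SQL> exec dbms_stats.gather_system_stats('START')
SQL> -- ... let the workload run, then:
SQL> exec dbms_stats.gather_system_stats('STOP')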

Here is my script:

set echo off
set linesize 200 pagesize 1000
column pname format a30
column sname format a20
column pval2 format a20

select pname,pval1,calculated,formula from sys.aux_stats$ where sname='SYSSTATS_MAIN'
model
  reference sga on (
    select name,value from v$sga 
        ) dimension by (name) measures(value)
  reference parameter on (
    select name,decode(type,3,to_number(value)) value from v$parameter where name='db_file_multiblock_read_count' and ismodified!='FALSE'
    union all
    select name,decode(type,3,to_number(value)) value from v$parameter where name='sessions'
    union all
    select name,decode(type,3,to_number(value)) value from v$parameter where name='db_block_size'
        ) dimension by (name) measures(value)
partition by (sname) dimension by (pname) measures (pval1,pval2,cast(null as number) as calculated,cast(null as varchar2(60)) as formula) rules(
  calculated['MBRC']=coalesce(pval1['MBRC'],parameter.value['db_file_multiblock_read_count'],parameter.value['_db_file_optimizer_read_count'],8),
  calculated['MREADTIM']=coalesce(pval1['MREADTIM'],pval1['IOSEEKTIM'] + (parameter.value['db_block_size'] * calculated['MBRC'] ) / pval1['IOTFRSPEED']),
  calculated['SREADTIM']=coalesce(pval1['SREADTIM'],pval1['IOSEEKTIM'] + parameter.value['db_block_size'] / pval1['IOTFRSPEED']),
  calculated['   multi block Cost per block']=round(1/calculated['MBRC']*calculated['MREADTIM']/calculated['SREADTIM'],4),
  calculated['   single block Cost per block']=1,
  formula['MBRC']=case when pval1['MBRC'] is not null then 'MBRC' when parameter.value['db_file_multiblock_read_count'] is not null then 'db_file_multiblock_read_count' when parameter.value['_db_file_optimizer_read_count'] is not null then '_db_file_optimizer_read_count' else '= _db_file_optimizer_read_count' end,
  formula['MREADTIM']=case when pval1['MREADTIM'] is null then '= IOSEEKTIM + db_block_size * MBRC / IOTFRSPEED' end,
  formula['SREADTIM']=case when pval1['SREADTIM'] is null then '= IOSEEKTIM + db_block_size        / IOTFRSPEED' end,
  formula['   multi block Cost per block']='= 1/MBRC * MREADTIM/SREADTIM',
  formula['   single block Cost per block']='by definition',
  calculated['   maximum mbrc']=sga.value['Database Buffers']/(parameter.value['db_block_size']*parameter.value['sessions']),
  formula['   maximum mbrc']='= buffer cache size in blocks / sessions'
);
set echo on

Here is an exemple with default statistics:

PNAME                               PVAL1 CALCULATED FORMULA
------------------------------ ---------- ---------- --------------------------------------------------
CPUSPEEDNW                           1519
IOSEEKTIM                              10
IOTFRSPEED                           4096
SREADTIM                                          12 = IOSEEKTIM + db_block_size        / IOTFRSPEED
MREADTIM                                          26 = IOSEEKTIM + db_block_size * MBRC / IOTFRSPEED
CPUSPEED
MBRC                                               8 = _db_file_optimizer_read_count
MAXTHR
SLAVETHR
   maximum mbrc                           117.152542 = buffer cache size in blocks / sessions
   single block Cost per block                     1 by definition
   multi block Cost per block                  .2708 = 1/MBRC * MREADTIM/SREADTIM

You see the calculated values for everything. Note the 'maximum mbrc' which limits the multiblock reads when the buffer cache is small. It divides the buffer cache size (at startup - can depend on ASMM and AMM settings) by the sessions parameter.

Here is an example with workload system statistics gathering:

PNAME                               PVAL1 CALCULATED FORMULA
------------------------------ ---------- ---------- --------------------------------------------------
CPUSPEEDNW                           1511
IOSEEKTIM                              15
IOTFRSPEED                           4096
SREADTIM                            1.178      1.178
MREADTIM                              .03        .03
CPUSPEED                             3004
MBRC                                    8          8 MBRC
MAXTHR                            6861824
SLAVETHR
   maximum mbrc                           114.983051 = buffer cache size in blocks / sessions
   single block Cost per block                     1 by definition
   multi block Cost per block                  .0032 = 1/MBRC * MREADTIM/SREADTIM

Here, all the values are explicitly set.

And here is an example with Exadata system statistics, which define the noworkload values and also set the MBRC (see Chris Antognini's post about it):

PNAME                               PVAL1 CALCULATED FORMULA
------------------------------ ---------- ---------- --------------------------------------------------
CPUSPEEDNW                           1539
IOSEEKTIM                              16
IOTFRSPEED                         204800
SREADTIM                                       16.04 = IOSEEKTIM + db_block_size        / IOTFRSPEED
MREADTIM                                       18.28 = IOSEEKTIM + db_block_size * MBRC / IOTFRSPEED
CPUSPEED
MBRC                                   57         57 MBRC
MAXTHR
SLAVETHR
   maximum mbrc                           114.983051 = buffer cache size in blocks / sessions
   single block Cost per block                     1 by definition
   multi block Cost per block                    .02 = 1/MBRC * MREADTIM/SREADTIM

And finally here is a noworkload system statistics result, but with db_file_multiblock_read_count explicitly set to 128:

PNAME                               PVAL1 CALCULATED FORMULA
------------------------------ ---------- ---------- --------------------------------------------------
CPUSPEEDNW                           1539
IOSEEKTIM                              15
IOTFRSPEED                           4096
SREADTIM                                          17 = IOSEEKTIM + db_block_size        / IOTFRSPEED
MREADTIM                                         271 = IOSEEKTIM + db_block_size * MBRC / IOTFRSPEED
CPUSPEED
MBRC                                             128 db_file_multiblock_read_count
MAXTHR
SLAVETHR
   maximum mbrc                           114.983051 = buffer cache size in blocks / sessions
   single block Cost per block                     1 by definition
   multi block Cost per block                  .1245 = 1/MBRC * MREADTIM/SREADTIM

Here you see that the MBRC in noworkload mode comes from the value set by db_file_multiblock_read_count rather than from the value 8, which is used by default by the optimizer when it is not set. And the MREADTIM is calculated from that I/O size.

For more historical information about system statistics and how multiblock reads are costed (index vs. full table scan choice), see my article in the latest OracleScene.

As usual if you find anything to improve in that script, please share.

How to measure Oracle index fragmentation

Mon, 2014-10-13 14:39

At Oracle Open World 2014, or rather the Oaktable World, Chris Antognini has presented 'Indexes: Structure, Splits and Free Space Management Internals'. It's not something new, but it's still something that is not always well understood: how index space is managed, block splits, fragmentation, coalesce and rebuilds. Kyle Hailey has made a video of it available here.
For me, it is the occasion to share the script I use to see if an index is fragmented or not.

First, forget about 'analyze index validate structure', which locks the table, and about DEL_LEAF_ROWS, which counts only the deletion flags that are transient. The problem is not the amount of free space. The problem is where that free space is. Because if you insert again into the same range of values, then that space will be reused. Wasted space occurs only when a lot of rows were deleted in a range where you will not insert again. For example, when you purge old ORDERS, then the index on the ORDER_DATE - or on the ORDER_ID coming from a sequence - will be affected. Note that the problem occurs only for sparse purges, because full blocks are reclaimed when needed and can get rows from another range of values.

I have a script that shows the number of rows per block, as well as used and free space per block, and aggregates that by range of values.

First, let's create a table with a date and an index on it:

SQL> create table DEMOTABLE as select sysdate-900+rownum/1000 order_date,decode(mod(rownum,100),0,'N','Y') delivered , dbms_random.string('U',16) cust_id
  2  from (select * from dual connect by level

My script shows 10 buckets with begin and end values, and for each of them the average number of rows per block and the free space:

SQL> @index_fragmentation 

ORDER_DAT -> ORDER_DAT rows/block bytes/block %free space     blocks free                   
--------- -- --------- ---------- ----------- ----------- ---------- -----                     
24-APR-12 -> 02-AUG-12        377        7163          11        266                        
03-AUG-12 -> 11-NOV-12        377        7163          11        266                        
11-NOV-12 -> 19-FEB-13        377        7163          11        266                        
19-FEB-13 -> 30-MAY-13        377        7163          11        265                        
30-MAY-13 -> 07-SEP-13        377        7163          11        265                        
07-SEP-13 -> 16-DEC-13        377        7163          11        265                        
16-DEC-13 -> 26-MAR-14        377        7163          11        265                        
26-MAR-14 -> 03-JUL-14        377        7163          11        265                        
04-JUL-14 -> 11-OCT-14        377        7163          11        265                        
12-OCT-14 -> 19-JAN-15        376        7150          11        265                        

Note that the script reads the whole table (it can do a sample, but here it is 100%). Well, not exactly the table, but only the index. It counts the index leaf blocks with the undocumented function sys_op_lbid(), which is used by Oracle to estimate the clustering factor.
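My script is more elaborate (it buckets the result by ranges of the index key), but the basic pattern behind it is the commonly documented usage of sys_op_lbid(): it takes the object_id of the index, the literal 'L' and a table rowid, and returns an identifier of the leaf block that holds the index entry for that row. A minimal sketch, assuming the index is named DEMOINDEX as in the COALESCE statement further down:

column object_id new_value objid
select object_id from user_objects where object_name = 'DEMOINDEX';

-- count the index entries per leaf block (order_date is the indexed column,
-- the predicate excludes rows that have no index entry)
select sys_op_lbid(&objid, 'L', t.rowid) leaf_block, count(*) rows_per_leaf
from DEMOTABLE t
where order_date is not null
group by sys_op_lbid(&objid, 'L', t.rowid);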

So here I have no fragmentation. All blocks have about 377 rows and no wasted free space. This is because I inserted the rows in increasing order, and the so-called '90-10' block split occurred.

Let's see what I get if I delete most of the rows before the 01-JAN-2014:

SQL> delete from DEMOTABLE where order_date

SQL> @index_fragmentation

ORDER_DAT -> ORDER_DAT rows/block bytes/block %free space     blocks free                   
--------- -- --------- ---------- ----------- ----------- ---------- -----                     
25-APR-12 -> 02-AUG-12          4          72          99        266 oooo                   
03-AUG-12 -> 11-NOV-12          4          72          99        266 oooo                   
11-NOV-12 -> 19-FEB-13          4          72          99        266 oooo                   
19-FEB-13 -> 30-MAY-13          4          72          99        265 oooo                   
30-MAY-13 -> 07-SEP-13          4          72          99        265 oooo                   
07-SEP-13 -> 16-DEC-13          4          72          99        265 oooo                   
16-DEC-13 -> 26-MAR-14          4          72          99        265 oooo                   
26-MAR-14 -> 03-JUL-14          4          72          99        265 oooo                   
04-JUL-14 -> 11-OCT-14         46         870          89        265 oooo                   
12-OCT-14 -> 19-JAN-15        376        7150          11        265                        

I have the same buckets and the same number of blocks. But the blocks in the range below 01-JAN-2014 now have only 4 rows each and a lot of free space. This is exactly what I want to detect: I can check whether that free space will be reused.

Here I know I will not enter any orders with a date in the past, so those blocks will never have an insert into them. I can reclaim that free space with a COALESCE:

SQL> alter index DEMOINDEX coalesce;

Index altered.
SQL> @index_fragmentation 

ORDER_DAT to ORDER_DAT rows/block bytes/block %free space     blocks free                      
--------- -- --------- ---------- ----------- ----------- ---------- -----                     
25-APR-12 -> 03-OCT-14        358        6809          15         32                        
03-OCT-14 -> 15-OCT-14        377        7163          11         32                        
15-OCT-14 -> 27-OCT-14        377        7163          11         32                        
27-OCT-14 -> 08-NOV-14        377        7163          11         32                        
08-NOV-14 -> 20-NOV-14        377        7163          11         32                        
20-NOV-14 -> 02-DEC-14        377        7163          11         32                        
02-DEC-14 -> 14-DEC-14        377        7163          11         32                        
14-DEC-14 -> 26-DEC-14        377        7163          11         32                        
27-DEC-14 -> 07-JAN-15        377        7163          11         32                        
08-JAN-15 -> 19-JAN-15        371        7056          12         32                        

I still have 10 buckets because this is defined in my script, but each bucket now has fewer rows. I've defragmented the blocks and reclaimed the free blocks.

Time to share the script now. Here it is: index_fragmentation.zip

The script is quite ugly. It's SQL generated by PL/SQL. It's generated because it selects the index columns. And as I don't want it to be too large, it is neither indented nor commented. However, if you run it with set serveroutput on, you will see the generated query.

How to use it? Just change the owner, table_name, and index name. It reads the whole index so if you have a very large index you may want to change the sample size.

Alfresco Summit 2014 - Conferences, Day 2 - Best Practices & Monitoring

Fri, 2014-10-10 07:00


As I said in my last blog, the Alfresco Summit 2014 took place in London from October 7 to 9. So yeah, unfortunately yesterday was the last day of this amazing event. It's the first time I had the chance to participate in an Alfresco Summit, but I'm quite sure it will not be the last! I can't tell what moment I preferred during this event, because the Keynote (presentation of Alfresco One 5.0) was very impressive and I really enjoyed it! All presentations were very interesting too, but if I had to pick just one thing, I think I would choose... the two evening parties.

So more seriously, on the second day of conferences, I had the opportunity to attend a lot of sessions about best practices. The principal reason why I chose this kind of session is that I wanted to compare dbi services' best practices for Alfresco with those of other Alfresco consultants (either Alfresco employees or Alfresco partners). The best practices sessions covered several different fields like:

  • Security
  • Performance
  • Replication, Backup & Disaster Recovery
  • Monitoring
  • Sizing
  • Aso...


Yesterday morning, I had the chance to attend a session presented by Miguel Rodriguez (Technical Account Manager at Alfresco) about how to monitor an Alfresco installation. For the rest of this blog, I will summarize Miguel's presentation to show you how powerful this solution is! The presentation wasn't just about how to monitor a JVM; it was really more advanced, because Miguel presented how to control all Alfresco environments (Dev, Test, Prod, aso...) from a single place with several tools.

The monitoring of Alfresco can be divided into three different categories:

  • Monitoring and treatment of Alfresco logs
  • Monitoring of the OS/JVM/Alfresco/Solr/Database
  • Sending alerts to a group of people regarding the two categories above

 

I. Monitor and treat Alfresco logs


This first part about the monitoring and treatment of Alfresco logs needs three software:

  • Logstash: it's a utility that can be used to monitor events or log files and to execute little commands (like a "top")
  • ElasticSearch: this tool is able to index log files to allow RESTful search and analytics on them afterwards. This will be very useful for this part
  • Kibana3: the last piece of software is the one used for the interface. It will just be used to query the index created by ElasticSearch and beautifully display the results. The Kibana3 page is composed of boxes that are fully customizable directly from the interface and refreshed automatically every X seconds (also customizable)


So here, we have a complete log monitoring & treatment solution. This solution can be used to display all errors in all Alfresco log files in the last X seconds/minutes/hours, aso... But it can also be used to apply filters on all log files. For example, if you want to find all log entries related to "License", then you just have to type "license" in the search box. As all log entries are indexed by ElasticSearch (using Lucene), all boxes on the interface will be refreshed to only display data related to "license".

b2ap3_thumbnail_Kibana3.jpg - Image from Jeff Potts' blog. This image displays data related to JMeter, but the same approach applies to Alfresco.

 

II. Monitoring of the OS/JVM/Alfresco/Solr/Database/...


This second part is as simple as the first one:

  • Logstash: this time, Logstash is mostly used to execute commands that retrieve numerical data. These commands can be Unix commands (e.g. "top" to retrieve the current CPU consumption), but they can also be JMX commands to retrieve certain values from the JVM, like the free heap space. More details are available below.
  • Graphite: this utility will just retrieve all the numerical information generated by Logstash and store it in a database
  • Grafana: as above, this part needs an interface... And in the same way, the page is fully customizable directly from the interface. You can add as many boxes (each box represents a diagram, often a line chart) as you want, with as much information as is available in your database.


When I first saw the interface of Grafana, I thought "Oh, it looks like jconsole". But this was really only my first impression. On this interface, you can add different boxes to display numerical data about not only one Alfresco Node but about all Alfresco Nodes! For example, you will be able to display boxes with the CPU consumption, memory consumption, the JVM heap size (permgen, used, free, aso...) and all these boxes will show data related to all Nodes of your cluster. All this kind of stuff is already available through jconsole. But in addition to what jconsole is capable of, you will be able to display boxes related to (non-exhaustive list):

  • Database sessions available and database sessions used
  • Tomcat threads
  • Disk I/O
  • Alfresco documents in the repository and in Solr (Solr stores documents in a different way)
  • Health of the indexing: based on different values, we can know if the indexing is working well or if some tuning is required
  • Index workers and tracking status: this chart shows if there is a gap between the tracking of documents by Solr and the indexing of these documents, which would mean that Solr isn't indexing at that moment (probably an issue). A sketch of how to retrieve these figures follows this list.
  • Solr transactions
  • Concurrent users
  • And so on...
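For the Solr-related figures, one possible source is the SUMMARY report exposed by Alfresco's Solr admin interface. The sketch below fetches it over HTTP; the URL (here assuming the /solr4 webapp layout) and the exact JSON keys vary between Alfresco/Solr versions, so treat it as an assumption to adapt to your installation:

# Fetch the Solr SUMMARY report, which contains the tracking/indexing figures
# (transactions remaining, lag, number of nodes in the index, ...).
import json
import requests

SOLR_CORES_URL = "http://localhost:8080/solr4/admin/cores"   # assumed URL, version-dependent

report = requests.get(
    SOLR_CORES_URL,
    params={"action": "SUMMARY", "wt": "json"},
    timeout=10,
).json()

# Dump the per-core summary (typically the "alfresco" and "archive" cores) as-is.
print(json.dumps(report.get("Summary", {}), indent=2))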

Grafana dashboard (image extracted from Miguel Rodriguez's demonstration)

 

III. Sending alerts


You are probably aware of tools like Nagios that can be used to send alerts of different sorts. So let me present a tool that you may not know: Icinga. This is the last tool that we will use for this monitoring solution, and it is pretty much the same thing as Nagios (it is a 2009 fork of Nagios). With this tool, you are able to define rules with a low threshold and a high threshold. If a threshold is reached, the status of the rule is updated to Warning or Critical (e.g. server free memory below 20% = Warning; server free memory below 10% = Critical). At the same time, an alert is sent to the group of people defined as contacts for that rule. As it works pretty much like Nagios and other tools of that kind, I will not describe Icinga any deeper.
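Since Icinga, like Nagios, runs simple check plugins that report their result through their exit code (0 = OK, 1 = Warning, 2 = Critical), the free-memory example above could be implemented roughly as follows. The thresholds and the way free memory is computed are only illustrative assumptions:

#!/usr/bin/env python3
# Minimal Nagios/Icinga-style check: free memory below 20% -> Warning, below 10% -> Critical.
import sys

WARNING_PCT = 20.0
CRITICAL_PCT = 10.0

def free_memory_percent():
    # Rough free-memory figure read from /proc/meminfo (Linux only, values in kB).
    info = {}
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            key, value = line.split(":", 1)
            info[key] = float(value.split()[0])
    return 100.0 * info["MemFree"] / info["MemTotal"]

free = free_memory_percent()
if free < CRITICAL_PCT:
    print("MEMORY CRITICAL - %.1f%% free" % free)
    sys.exit(2)
elif free < WARNING_PCT:
    print("MEMORY WARNING - %.1f%% free" % free)
    sys.exit(1)
print("MEMORY OK - %.1f%% free" % free)
sys.exit(0)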

Icinga dashboard (image from Toni de la Fuente's blog)

 

So here we are: we now have a lot of different tools that, together, provide a really good passive and active monitoring solution. But I can hear your complaints from here: "Hey, there are way too many components to install, this is too difficult!". Well, let me finish! All the tools above are Open Source and are available as a package created with Packer. This utility lets you create a virtual machine "template" with all needed components pre-installed. So you just download the package and, after the execution of a single command, you have a new virtual machine up and running with all the monitoring components. The only remaining step is to install a Logstash agent on each Alfresco component (Alfresco Nodes, Solr Nodes) and configure this agent to retrieve the useful data.


I was really impressed by this session because Miguel Rodriguez showed us a lot of very interesting tools that I didn't know about. His monitoring solution looks really good, and I'm quite sure I will take some time to try it in the next few weeks.


I hope you have enjoyed this article as much as I enjoyed Miguel's session. If you want to download your own version of this monitoring VM, take a look at Miguel's GitHub account.

 

Documentum upgrade project: ADTS local user or domain user?

Thu, 2014-10-09 18:43

ADTS local user or domain user: which user should I choose as the installation owner? This question is important if you are installing a new Advanced Document Transformation Services (ADTS) server. You can opt for a domain user in order to have one user for several ADTS installations and less AD management, or you may want to opt for local users to separate the installations, at the cost of managing all these users independently. In fact, we will see that you don't really have a choice.

As you may know, the ADTS server is a set of software components interacting with each other. You have:

  • Microsoft Office products (Word, Excel, PowerPoint ...)
  • EMC Software
  • Adlib for EMC Software

Most of these components run as Windows services, and some of these services need a user to run under:

 

Service                                             Log On As           Start Order   Stop Order
Adlib Exponent Connector                            ADTS Install Owner  auto          6
Adlib Exponent Manager                              ADTS Install Owner  3             5
Adlib FMR                                           Local System        2             4
Adlib Process Manager                               Local System        1             3
Documentum Content Transformation Monitor Service   Local System        4             auto
Documentum Content Transformation Service           ADTS Install Owner  auto          1
Documentum CTS Admin Agent                          Local System        5             2


As you can see in the table above, I put 'ADTS Install Owner' as the 'Log On As' value for several services. That means a local user must be created for these services. Why can't we use an AD user?

The reason is simple: these services are session-related and are started under that specific user. But an AD user cannot stay connected to the system forever, for instance because of domain policies. When the session is disabled, the services are stopped or can no longer be started properly, and as a result the renditions are not created anymore.

Also note that if you use Remote Desktop Connection to log in with the installation owner, do not forget to unmap your local printers, as they can cause issues with the Adlib printer, which is normally set as the default printer. Edit your .RDP file, go to the Local Resources tab, and untick Printers. Otherwise, your own printer could replace the Adlib printer as the default one, preventing the ADTS from using the correct printer.

A good practice is to use a different user than the installation owner for basic administration tasks.