
Feed aggregator

OracleText: deletes and garbage

Yann Neuhaus - Sun, 2015-02-15 15:21

In the previous post we have seen how the OracleText index tables are maintained when new documents arrive: at sync, the new documents are read, up to the available memory, and their words are inserted into the $I table with their mapping information. Now we will see how removed documents are processed. We will not cover updates, as they are just a delete + insert.

Previous state

Here is the state from the previous post where I had those 3 documents:

  • 'Hello World'
which was synced alone, and then the two following ones were synced together:
  • 'Hello Moon, hello, hello'
  • 'Hello Mars'
The $K table is an IOT which maps the OracleText table ROWID to the DOCID (the fact that the primary key TEXTKEY is not at the start is a bit misleading):
SQL> select * from DR$DEMO_CTX_INDEX$K;

---------- ------------------
The $R table provides the opposite navigation (docid to rowid), storing a fixed-length array of ROWIDs indexed by the docid and split across several rows:
SQL> select * from DR$DEMO_CTX_INDEX$R;

---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000
The $I table stores the tokens, with the first 5 columns indexed (by $X), while the TOKEN_INFO BLOB stores the detailed locations of the token:
SQL> select * from DR$DEMO_CTX_INDEX$I;

---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
We have seen that the $I table can be fragmented for 3 reasons:
  • Each sync inserts its own tokens (instead of merging with existing ones)
  • TOKEN_INFO size is limited so that it fits in-row (we will see 12c new features later)
  • Only tokens that fit in the allocated memory can be merged
And the $N is empty for the moment:
SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected


Do you remember how inserts go to the CTXSYS.DR$PENDING table? Deletes go to the CTXSYS.DR$DELETE table:

SQL> delete from DEMO_CTX_FRAG where num=0002;

1 row deleted.

SQL> select * from CTXSYS.DR$DELETE;

---------- ---------- ----------
      1400          0          2
I've deleted docid=2 but the tokens are still there:
SQL> select * from DR$DEMO_CTX_INDEX$I;

---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
as well as their mapping to the ROWID:
SQL> -- $R is for rowid - docid mapping (IOT)
SQL> select * from DR$DEMO_CTX_INDEX$R;

---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000
However, the $N table has been maintained to record that docid=2 has been removed:
SQL> select * from DR$DEMO_CTX_INDEX$N;

---------- -
         2 U
This is the goal of the $N (Negative) table: it records the docids that should no longer be there and that must be deleted at the next optimization (garbage collection).

From there, a search by words ('normal lookup') returns docids and rowids, and CTXSYS.DR$DELETE must be read in order to know that the document is no longer there. It's an IOT, so the docid can be found with an index unique scan.
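To picture how these pieces fit together before the commit, here is a minimal Python sketch of such a normal lookup. The table contents are simplified stand-ins (placeholder ROWID strings, plain docid lists instead of the real binary TOKEN_INFO), not Oracle's actual formats:

```python
# Toy model of a 'normal lookup': token -> docids from $I,
# minus the docids queued in CTXSYS.DR$DELETE, then docid -> ROWID via $R.
# All values are simplified placeholders, not real Oracle structures.
dollar_i = {"HELLO": [1, 2, 3], "WORLD": [1], "MOON": [2], "MARS": [3]}
dr_delete = {2}  # docid=2 has a pending delete (not yet committed)
dollar_r = {1: "ROWID_1", 2: "ROWID_2", 3: "ROWID_3"}

def normal_lookup(token):
    """Return ROWIDs of live documents containing the token."""
    return [dollar_r[d] for d in dollar_i.get(token, []) if d not in dr_delete]

print(normal_lookup("HELLO"))  # docid 2 is filtered out
```

The point of the sketch is only that every word search must subtract the pending deletes before mapping docids back to ROWIDs.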

However, for the opposite direction, having a ROWID and checking whether it contains some words ('functional lookup'), we need to know that the document is gone. In my case I deleted the row, but you may have updated the document, so the ROWID would still be there. There is no pending table for that: the $K table is maintained immediately:

SQL> select * from DR$DEMO_CTX_INDEX$K;

---------- ------------------
The entry that addressed docid=2 has been deleted.


All those changes were done within the same transaction. Other sessions still see the old values, so they have no need to read CTXSYS.DR$DELETE. What I described above applies only to my session: the normal lookup reads the queuing table, and the functional lookup stops at $K. We don't have to wait for a sync to process CTXSYS.DR$DELETE. It's done at commit:

SQL> commit;

Commit complete.

SQL> select * from CTXSYS.DR$DELETE;

no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$R;

---------- ------------------------------------------------------------
         0 00001752D000280000056940414100000000000000000000000000000000
Of course we can't read it directly, but we can see that part of it has been zeroed. That $R table is definitely special: it's not stored in a relational way, and its maintenance is deferred to commit time.
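That commit-time maintenance can be pictured as overwriting one fixed-width slot of the array. A minimal sketch, assuming each entry occupies 28 hex characters as observed in this demo's dump (the slot width is read off the dump, not a documented internal):

```python
SLOT = 28  # hex chars per ROWID entry, as observed in this demo's $R dump

def zero_slot(data, docid):
    """Zero the fixed-width slot of a deleted docid in the $R DATA hex dump."""
    start = (docid - 1) * SLOT
    return data[:start] + "0" * SLOT + data[start + SLOT:]

before = "00001752D000280000056940414100001752D00028000005694041420000"
after = zero_slot(before, 2)
print(after)  # second ROWID slot wiped, matching the dump after commit
```

Because the slot position is a pure function of the docid, the zeroing touches nothing else in the row.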

But nothing has changed in $I, which still contains garbage (and sync does not change that):

SQL> select * from DR$DEMO_CTX_INDEX$I;

---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
And of course the $N row is still there to record the deleted docid:
SQL> select * from DR$DEMO_CTX_INDEX$N;

---------- -
         2 U

I've not reproduced it here, but sync does not change anything either. Sync is for new documents, not for deleted ones.

Conclusion

What you need to remember here is:
  • New documents are made visible through the OracleText index at sync
  • Removed documents are immediately made invisible at commit
Of course, you can sync at commit, but the second thing to remember is that:
  • New documents bring fragmentation
  • Removed documents bring garbage
Both of them increase the size of the $I table and its $X index, making range scans less efficient. We will see more about that, but the next post will be about queries: I've talked about normal and functional lookups, and we will see in detail how they are done.

OracleText: inserts and fragmentation

Yann Neuhaus - Sun, 2015-02-15 14:37

I plan to write several posts about OracleText indexes, which is a feature that is not used enough in my opinion. It's available in all editions and can index small text or large documents for searching by words. When you create an OracleText index, a few tables are created to store the words and the association between those words and the table rows that contain the documents. I'll start by showing how document inserts are processed.

Create the table and index

I'm creating a simple table with a CLOB

SQL> create table DEMO_CTX_FRAG
     (num number constraint DEMO_CTX_FRAG_PK primary key,txt clob);

Table created.
and a simple OracleText index on that column:
SQL> create index DEMO_CTX_INDEX on DEMO_CTX_FRAG(txt)
     indextype is ctxsys.context;

Index created.
That creates the following tables:
  • DR$DEMO_CTX_INDEX$I, which stores the tokens (e.g. words)
  • DR$DEMO_CTX_INDEX$K, which indexes the documents (docid) and links them to the table ROWID
  • DR$DEMO_CTX_INDEX$R, which stores the opposite navigation (get the ROWID from a docid)
  • DR$DEMO_CTX_INDEX$N, which stores docids for deferred maintenance cleanup


I'm inserting a row with some text in the CLOB column:

SQL> insert into DEMO_CTX_FRAG values (0001,'Hello World');

1 row created.
I commit
SQL> commit;

Commit complete.
And here is what we have in the OracleText tables:
SQL> select * from DR$DEMO_CTX_INDEX$K;
no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$R;
no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$I;
no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$N;
no rows selected
Nothing is stored here yet, which means that we cannot find our newly inserted row with an OracleText search.

By default, all inserts maintain the OracleText tables asynchronously.
The inserted row is referenced in a CTXSYS queuing table that stores the pending inserts:

SQL> select * from CTXSYS.DR$PENDING;

---------- ---------- ------------------ --------- -
      1400          0 AAAXUtAAKAAABWlAAA 13-FEB-15 N
and we have a view over it:
SQL> select pnd_index_name,pnd_rowid,pnd_timestamp from ctx_user_pending;

PND_INDEX_NAME                 PND_ROWID          PND_TIMES
------------------------------ ------------------ ---------


Let's synchronize:

SQL> exec ctx_ddl.sync_index('DEMO_CTX_INDEX');

PL/SQL procedure successfully completed.
The queuing table has been processed:
SQL> select pnd_index_name,pnd_rowid,pnd_timestamp from ctx_user_pending;

no rows selected
and here is how that document is stored in our OracleText tables.

$K records one document (docid=1) and the table rowid that contains it:

SQL> select * from DR$DEMO_CTX_INDEX$K;

---------- ------------------
The $R table stores the docid -> rowid mapping in a non-relational way:
SQL> select * from DR$DEMO_CTX_INDEX$R;

---------- ------------------------------------------------------------
         0 00001752D0002800000569404141
How is it stored? It's an array of fixed-length ROWIDs, so from the docid we can go directly to the offset and get the ROWID. Because DATA is limited to 4000 bytes, there are several rows, but a docid determines the ROW_NO as well as the offset within DATA.
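The offset arithmetic can be sketched as follows. The 14-byte slot size simply matches the 28 hex characters per entry seen in this demo's dump, and the per-row capacity is an assumption for illustration, not Oracle's documented internals:

```python
SLOT_BYTES = 14   # one packed ROWID entry (28 hex chars in the dump above)
ROW_BYTES = 4000  # assumed maximum DATA bytes per $R row

def locate(docid):
    """Map a docid to (ROW_NO, byte offset within DATA) in a fixed-width array."""
    idx = docid - 1
    per_row = ROW_BYTES // SLOT_BYTES  # entries that fit in one $R row
    return idx // per_row, (idx % per_row) * SLOT_BYTES

print(locate(1))  # → (0, 0): first row, first slot
print(locate(2))  # → (0, 14): first row, second slot
```

No index structure is needed for this direction: the docid alone determines where to read.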

$I stores the tokens (which are words here, as we have TEXT tokens - type 0) as well as their location information:

SQL> select * from DR$DEMO_CTX_INDEX$I;

---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
For each word it stores the range of docids that contain the word (token_first and token_last are those docids), and token_info stores, in a binary way, the occurrences of the word within the documents (pairs of docid and offset within the document). It's a BLOB, but it is limited to 4000 bytes so that it is stored inline. This means that if a token is present in a lot of documents, several rows in $I will be needed, each covering a different range of docids. This has changed in 12c and we will see that in future blog posts.
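That splitting can be modeled with a toy size limit. The numbers here (a 4-byte budget, 2 bytes per posting) are made up purely to make the split visible; the real in-row limit is about 4000 bytes:

```python
LIMIT = 4  # toy per-row byte budget (real TOKEN_INFO in-row limit: ~4000 bytes)

def build_rows(token, postings, entry_size=2):
    """Split a token's posting list into several $I-style rows
    (token, token_first, token_last, token_count) when it exceeds LIMIT."""
    rows, current = [], []
    for docid in postings:
        if (len(current) + 1) * entry_size > LIMIT:
            rows.append((token, current[0], current[-1], len(current)))
            current = []
        current.append(docid)
    if current:
        rows.append((token, current[0], current[-1], len(current)))
    return rows

print(build_rows("HELLO", [1, 2, 3, 4, 5]))
# three rows, each covering a distinct, non-overlapping docid range
```

Each emitted row covers a contiguous docid range, which is why a popular token ends up spread over several $I rows.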

Thus, we can have several rows for one token. This is the first cause of fragmentation: searching for documents that contain such a word will have to read several rows of the $I table. The $N table has nothing here because we synchronized only inserts and there is nothing to clean up.

SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected

Several inserts

I will insert two more rows, which also contain the word 'hello'.

SQL> insert into DEMO_CTX_FRAG values (0002,'Hello Moon, hello, hello');

1 row created.

SQL> insert into DEMO_CTX_FRAG values (0003,'Hello Mars');

1 row created.

SQL> commit;

Commit complete.
And I synchronize:
SQL> exec ctx_ddl.sync_index('DEMO_CTX_INDEX');

PL/SQL procedure successfully completed.
So, I now have 3 documents:
SQL> select * from DR$DEMO_CTX_INDEX$K;

---------- ------------------
The reverse mapping array has increased:
SQL> select * from DR$DEMO_CTX_INDEX$R;

---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000
And now the tokens:
SQL> select * from DR$DEMO_CTX_INDEX$I;

---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802
What is interesting here is that the previous rows (docid 1) have not been updated; new rows have been inserted for docids 2 and 3.
  • 'moon' is only in docid 2
  • 'mars' is only in docid 3
  • 'hello' is in 2 documents (token_count), from docid 2 to docid 3 (token_first and token_last)

This is the second cause of fragmentation, coming from frequent sync: each sync adds new rows. However, when multiple documents are processed in the same sync, only one $I entry per token is needed.
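The per-sync behavior can be sketched as follows: each call merges tokens only within its own batch, which reproduces the two HELLO rows seen in the dump above. This is a simplified model of the effect, not the real sync code:

```python
from collections import defaultdict

def sync(dollar_i, batch):
    """Toy sync: build per-batch postings, then append one $I-style row
    (token, token_first, token_last, token_count) per token."""
    postings = defaultdict(list)
    for docid in sorted(batch):
        for word in batch[docid].upper().replace(",", " ").split():
            if docid not in postings[word]:
                postings[word].append(docid)
    for token, docids in postings.items():
        dollar_i.append((token, docids[0], docids[-1], len(docids)))

dollar_i = []
sync(dollar_i, {1: "Hello World"})                                # first sync
sync(dollar_i, {2: "Hello Moon, hello, hello", 3: "Hello Mars"})  # second sync
print(dollar_i)  # HELLO appears twice: one row per sync
```

Syncing the two later documents together produces a single HELLO row covering docids 2 to 3; syncing them separately would have produced one HELLO row each.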

There is a third cause of fragmentation. We see here that the token_info is larger for the HELLO entry covering docids 2 to 3 because there are several occurrences of the token. All of that must fit in memory when we synchronize. So it's good to synchronize when we have several documents (so that the common tokens are not too fragmented), but we also need enough memory. The default is 12M, which is usually too small. It can be increased with the 'index memory' parameter of the index. There is also a maximum, set by ctx_adm.set_parameter, for which the default (50M) is probably also too low.

Nothing yet in the $N table that we will see in the next post:

SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected


The important point here is that inserted documents are visible only after synchronization, and synchronizing too frequently causes fragmentation. If you need to synchronize in real time (on commit) and you commit for each document inserted, then you will probably have to plan frequent index optimization. If, on the other hand, you are able to synchronize only after inserting a lot of documents, then fragmentation is reduced, provided that there is enough memory to process all documents in one pass.

The next post will be about deletes and updates.

Wanted – A Theory of Change

Michael Feldstein - Sun, 2015-02-15 14:25

By Michael Feldstein

Phil and I went to the ELI conference this week. It was my first time attending, which is odd given that it is one of the best conferences that I’ve attended in quite a while. How did I not know this?

We went, in part, to do a session on our upcoming e-Literate TV series, which was filmed for use in the series. (Very meta.) Malcolm Brown and Veronica Diaz did a fantastic job of both facilitating and participating in the conversation. I can’t wait to see what we have on film. Phil and I also found that an unusually high percentage of sessions were ones that we actually wanted to go to and, once there, didn’t feel the urge to leave. But the most important aspect of any conference is who shows up, and ELI did not disappoint there either. The crowd was diverse, but with a high percentage of super-interesting people. On the one hand, I felt like this was the first time that there were significant numbers of people talking about learning analytics who actually made sense. John Whitmer from Blackboard (but formerly from CSU), Mike Sharkey from Blue Canary (but formerly from University of Phoenix), Rob Robinson from Civitas (but formerly from the University of Texas), Eric Frank of Acrobatiq (formerly of Flat World Knowledge)—these people (among others) were all speaking a common language, and it turns out that language was English. I feel like that conversation is finally beginning to come down to earth. At the same time, I got to meet Gardner Campbell for the first time and ran into Jim Groom. One of the reasons that I admire both of these guys is that they challenge me. They unsettle me. They get under my skin, in a good way (although it doesn’t always feel that way in the moment).

And so it is that I find myself reflecting disproportionately on the brief conversations that I had with both of them, and about the nature of change in education.

I talked to Jim for maybe a grand total of 10 minutes, but one of the topics that came up was my post on why we haven’t seen the LMS get dramatically better in the last decade and why I’m pessimistic that we’ll see dramatic changes in the next decade. Jim said,

Your post made me angry. I’m not saying it was wrong. It was right. But it made me angry.

Hearing this pleased me inordinately, but I didn’t really think about why it pleased me until I was on the plane ride home. The truth is that the post was intended to make Jim (and others) angry. First of all, I was angry when I wrote it. We should be frustrated at how hard and slow change has been. It’s not like anybody out there is arguing that the LMS is the best thing since sliced bread. Even the vendors know better than to be too boastful these days. (Most of them, anyway.) At best, conversations about the LMS tend to go like the joke about the old Jewish man complaining about a restaurant: “The food here is terrible! And the portions are so small!” After a decade of this, the joke gets pretty old. Somehow, what seemed like Jack Benny has started to feel more like Franz Kafka.

Second, it is an unattractive personal quirk of mine that I can’t resist poking at somebody who seems confident of a truth, no matter what that truth happens to be. Even if I agree with them. If you say to me, “Michael, you know, I have learned that I don’t really know anything,” I will almost inevitably reply, “Oh yeah? Are you sure about that?” The urge is irresistible. If you think I’m exaggerating, then ask Dave Cormier. He and I had exactly this fight once. This may make me unpopular at parties—I like to tell myself that’s the reason—but it turns out to be useful in thinking about educational reform because just about everybody shares some blame in why change is hard, and nobody likes to admit that they are complicit in a situation that they find repugnant. Faculty hate to admit that some of them reinforce the worst tendencies of LMS and textbook vendors alike by choosing products that make their teaching easier rather than better. Administrators hate to admit that some of them are easily seduced by vendor pitches, or that they reflexively do whatever their peer institutions do without a lot of thought or analysis. Vendors hate to admit that their organizations often do whatever they have to in order to close the sale, even if it’s bad for the students. And analysts and consultants…well…don’t get me started on those smug bastards. It would be a lot easier if there were one group, one cause that we could point to as the source of our troubles. But there isn’t. As a result, if we don’t acknowledge the many and complex causes of the problems we face, we risk having an underpants gnomes theory of change:

(Embedded video.)

I don’t know what will work to bring real improvements to education, but here are a few things that won’t:

  • Just making better use of the LMS won’t transform education.
  • Just getting rid of the LMS won’t transform education.
  • Just bringing in the vendors won’t transform education.
  • Just getting rid of the vendors won’t transform education.
  • Just using big data won’t transform education.
  • Just busting the faculty unions won’t transform education.
  • Just listening to the faculty unions won’t transform education.

Critiques of some aspect of education or other are pervasive, but I almost always feel like I am listening to an underpants gnomes sales presentation, no matter who is pitching it, no matter what end of the political spectrum they are on. I understand what the speaker wants to do, and I also understand the end state to which the speaker aspires, but I almost never understand how the two are connected. We are sorely lacking a theory of change.

This brings me to my conversation with Gardner, which was also brief. He asked me whether I thought ELI was the community that could…. I put in an ellipse there both because I don’t remember Gardner’s exact wording and because a certain amount of what he was getting at was implied. I took him to mean that he was looking for the community that was super-progressive that could drive real change (although it is entirely possible that I was and am projecting some hope that he didn’t intend). It took me a while to wrap my head around this encounter too. On the one hand, I am a huge believer in the power of communities as networks for identifying and propagating positive change. On the other hand, I have grown to be deeply skeptical of them as having lasting power in broad educational reform. Every time I have found a community that I got excited about, one of two things inevitably happened: either so many people piled into it that it lost its focus and sense of mission, or it became so sure of its own righteousness that the epistemic closure became suffocating. There may be some sour grapes in that assessment—as Groucho Marx said, I don’t want to belong to any club that would have me as a member—but it’s not entirely so. I think communities are essential. And redeeming. And soul-nourishing. But I think it’s a rare community indeed—particularly in transient, professional, largely online communities, where members aren’t forced to work out their differences because they have to live with each other—that really provides transformative change. Most professional communities feel like havens, when I think we need to feel a certain amount of discomfort for real change to happen. The two are not mutually exclusive in principle—it is important to feel like you are in a safe environment in order to be open to being challenged—but in practice, I don’t get the sense that most of the professional communities I have been in have regularly encouraged creative abrasion. At least, not for long, and not to the point where people get seriously unsettled.

Getting back to my reaction to Jim’s comment, I guess what pleased me so much is that I was proud to have provided a measure of hopefully productive and thought-provoking discomfort to somebody who has so often done me the same favor. This is a trait I admire in both Jim and Gardner. They won’t f**king leave me alone. Another thing that I admire about them is that they don’t just talk, and they don’t just play in their own little sandboxes. Both of them build experiments and invite others to play. If there is a way forward, that is it. We need to try things together and see how they work. We need to apply our theories and find out what breaks (and what works better than we could have possibly imagined). We need to see if what works for us will also work for others. Anyone who does that in education is a hero of mine.

So, yeah. Good conference.


The post Wanted – A Theory of Change appeared first on e-Literate.

e-Literate TV Case Study Preview: Middlebury College

Michael Feldstein - Sun, 2015-02-15 10:48

By Michael Feldstein

As we get closer to the release of the new e-Literate TV series on personalized learning, Phil and I will be posting previews highlighting some of the more interesting segments from the series. Both our preview posts and the series itself start with Middlebury College. When we first talked about the series with its sponsors, the Bill & Melinda Gates Foundation, they agreed to give us the editorial independence to report what we find, whether it is good, bad, or indifferent. And as part of our effort to establish a more objective frame, we started the series by going not to a school that was a Gates Foundation grantee but to the kind of place that Americans probably think of first when they think of a high-quality personalized education outside the context of technology marketing. We decided to go to an elite New England liberal arts college. We wanted to use that ideal as the context for talking about personalizing learning through technology. At the same time, we were curious to find out how technology is changing these schools and their notion of what a personal education is.

We picked Middlebury because it fit the profile and because we had a good connection through our colleagues at IN THE TELLING.[1] We really weren’t sure what we would find once we arrived on campus with the cameras. Some of what we found there was not surprising. In a school with a student/teacher ratio of 8.6 to 1, we found strong student/teacher relationships and empowered, creative students. Understandably, we heard concerns that introducing technology into this environment would depersonalize education. But we also heard great dialogues between students and teachers about what “personalized” really means to students who have grown up with the internet. And, somewhat unexpectedly, we saw some signs that the future of educational technology at places like Middlebury College may not be as different from what we’re seeing at public colleges and universities as you might think, as you’ll see in the interview excerpt below.

Jeff Howarth is an Assistant Professor of Geography at Middlebury. He teaches a very popular survey-level course in Geographic Information Systems (GIS). But it’s really primarily a course about thinking about spaces. As Jeff pointed out to me, we typically provide little to no formal education on spatial reasoning in primary and secondary schooling. So the students walking into his class have a wide range of skills, based primarily on their natural ability to pick them up on their own. This broad heterogeneity is not so different from the wide spread of skills that we saw in the developmental math program at Essex County College in Newark, NJ. Furthermore, the difference between a novice and an expert within a knowledge domain is not just about how many competencies they have racked up. It’s also about how they acquire those competencies. Jeff did his own study of how students learn in his class which confirmed broader educational research showing that novices in a domain tend to start with specific problems and generalize outward, while experts (like professors, but also like more advanced students) tend to start with general principles and apply them to the specific problem at hand. As Jeff pointed out to me, the very structure of the class schedule conspires against serving novice learners in the way that works best for them. Typically, students go to a lecture in which they are given general principles and then are sent to a lab to apply those principles. That order works for students who have enough domain experience to frame specific situations in terms of the general principles but not for the novices who are just beginning to learn what those general principles might even look like.

When Jeff thought about how to serve the needs of his students, the solution he came up with—partly still a proposal at this point—bears a striking resemblance to the basic design of commercial “personalized learning” courseware. I emphasize that he arrived at this conclusion through his own thought process rather than by imitating commercial offerings. Here’s an excerpt in which he describes deciding to flip his classroom before he had ever even heard of the term:

(Embedded video.)

In the full ten-minute episode, we hear Jeff talk about his ideas for personalized courseware (although he never uses that term). And in the thirty-minute series, we have a great dialogue between students and faculty as well as some important context setting from the college leadership. The end result is that the Middlebury case study shows us that personalized learning software tools do not just have to be inferior substitutes for the real thing that are only for “other people’s children” while simultaneously reminding us of what a real personal education looks like and what we must be careful not to lose as we bring more technology into the classroom.

  1. Full disclosure: Since filming the case study, Middlebury has become a client of MindWires Consulting, the company that Phil and I run together.

The post e-Literate TV Case Study Preview: Middlebury College appeared first on e-Literate.

Database Flashback -- 3

Hemant K Chitale - Sun, 2015-02-15 10:02
Continuing my series on Oracle Database Flashback

As I pointed out in my previous post, the ability to flash back is NOT strictly bounded by db_flashback_retention_target. The actual scope may be greater than, or even less than, the target.

[oracle@localhost Hemant]$ sqlplus '/ as sysdba'

SQL*Plus: Release Production on Sun Feb 15 23:39:24 2015

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SYS>select sysdate from dual;


SYS>show parameter db_flashback_retention_target

------------------------------------ ----------- ------------------------------
db_flashback_retention_target integer 1440
SYS>select * from v$flashback_database_log;

-------------------- --------- ---------------- -------------- ------------------------
14573520 15-FEB-15 1440 24576000 0

SYS>select to_char(oldest_flashback_time,'DD-MON HH24:MI')
2 from v$flashback_database_log
3 /

15-FEB 23:31


In my previous post, the OLDEST_FLASHBACK_TIME was a week ago. Now, it doesn't appear to be so!

SYS>select * from v$flash_recovery_area_usage;

-------------------- ------------------ ------------------------- ---------------
REDO LOG 0 0 0
ARCHIVED LOG .95 .94 5
BACKUP PIECE 29.12 .12 6

7 rows selected.


After the blog post of 08-Feb, my database had been SHUTDOWN. The instance was STARTED at 23:24 today and the database was OPENed at 23:25.

Sun Feb 08 23:08:21 2015
Shutting down instance (immediate)
Sun Feb 08 23:08:37 2015
ARCH shutting down
ARC0: Archival stopped
Thread 1 closed at log sequence 1
Successful close of redo thread 1
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Sun Feb 08 23:08:38 2015
Stopping background process VKTM
Sun Feb 08 23:08:40 2015
Instance shutdown complete
Sun Feb 15 23:24:46 2015
Starting ORACLE instance (normal)
Sun Feb 15 23:25:33 2015
QMNC started with pid=34, OS id=2449

So it seems that, this time, Oracle says I cannot flash back to 08-Feb, although on 08-Feb it did say that I could flash back to 01-Feb. I strongly recommend periodically querying V$FLASH_RECOVERY_AREA_USAGE and V$FLASHBACK_DATABASE_LOG.
I have seen DBAs referring only to the parameter db_flashback_retention_target without querying these views.
Categories: DBA Blogs

SQL Server monitoring with nagios: installation

Yann Neuhaus - Sun, 2015-02-15 07:37

Nagios is an IT infrastructure monitoring solution which is able to monitor SQL Server instances through a specific plugin.
I have installed this plugin to test its functionality and I will explain here how to do it.

Prerequisites installation

FreeTDS installation

FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases.
First we will download this set of libraries: go to the FreeTDS website and click on Stable Release:


The file freetds-stable.tgz will be downloaded.
When the download is complete, go to the download directory and uncompress the tgz file:


Go to the FreeTDS directory and list the files:


Prepare and check the libraries before compilation:


Now we will compile and install the FreeTDS libraries:


Please check that you don't have any errors.
If there are none, FreeTDS is now installed.

Perl-module DBD::Sybase installation

DBD::Sybase is a Perl module which works with the DBI module to provide access to Sybase (and SQL Server) databases.
To download this module, go to the download page and click the download link:


When the file DBD-Sybase-1.15.tar.gz is downloaded, go to the download directory, uncompress the file, and go to the new directory:


We will now compile and install this new module:


Now the module is installed. Check whether you received any errors.
Create a symbolic link:


We are now able to edit the freetds.conf file to:

  • change the TDS version, as recommended in the security advice (follow the link and see the chapter Security Advice)
  • add the SQL Server parameters: IP address, TCP/IP port, and instance name if it is not the default instance
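As an illustration, a freetds.conf entry could look like the following (the server name, IP, port and TDS version are placeholders; the sketch writes a throwaway copy so it can be run anywhere, but on the real server you would edit freetds.conf in place):

```shell
# Write a sample freetds.conf to a temp file and show it.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[global]
    tds version = 7.2         # raised per the security advice

[SQLSRV1]
    host = 192.168.1.50       # IP address of the SQL Server
    port = 1433               # TCP/IP port (default instance)
#   instance = MYINSTANCE     # use instead of port for a named instance
EOF
cat "$conf"
```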


The Perl module is installed.
Prerequisites are now installed and we have to install the SQL Server plugin.

Nagios SQL Server plugin Installation

First, we have to download the SQL Server plugin for Nagios. To do so, connect here and click the link below:


When the file is downloaded:

  • go to the download directory
  • list the files
  • uncompress the file
  • list the files again to see the new plugin directory
  • change to the new plugin directory


The next command to type is configure, which checks that all dependencies are present and writes a Makefile containing the build instructions.


Then we have to execute make, which performs the compilation (it can be run as a normal user):


The next command to run is make install, which installs the plugin (it has to be run as root):


The SQL Server Nagios plugin is successfully installed.
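Put together, the plugin installation is the classic configure/make/make install sequence; a sketch (the extracted directory name is an assumption):

```shell
# Sketch of the Nagios plugin build described above.
set -e
plugin_dir=$(ls -d check_mssql_health-*/ 2>/dev/null | head -1)
if [ -n "$plugin_dir" ]; then
  cd "$plugin_dir"
  ./configure                 # checks dependencies and writes the Makefile
  make                        # compiles (can be run as a normal user)
  make install                # installs the plugin (has to be run as root)
  PLUGIN_BUILT=yes
else
  echo "plugin sources not found: download and uncompress the tarball first"
  PLUGIN_BUILT=no
fi
```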

Set environment

As the root user, change the permissions on the file freetds.conf so that servers/instances can be added by the nagios user:


Add the library directory /usr/local/freetds/lib to the file /etc/


As the nagios user, add the FreeTDS libraries to the LD_LIBRARY_PATH environment variable:


Test the connection:


The connection failed due to a wrong PERLLIB path...
To resolve this error, unset the PERLLIB variable.
By unsetting the PERLLIB libraries set up for the check_oracle_health plugin, we force the check_mssql_health plugin to use the Perl libraries of the OS instead of those of the Oracle client:


Verification of the connection:


Verification succeeded.
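The environment settings above, plus a connection test, can be sketched as follows (the hostname and credentials are placeholders, and the check_mssql_health options should be verified against your plugin version's --help output):

```shell
# Recap of the environment setup and connection test described above.
export LD_LIBRARY_PATH=/usr/local/freetds/lib:${LD_LIBRARY_PATH}
unset PERLLIB                     # otherwise the Oracle client Perl libs get picked up
if command -v check_mssql_health >/dev/null 2>&1; then
  check_mssql_health --hostname SQLSRV1 --username nagios --password '***' \
                     --mode connection-time
else
  echo "check_mssql_health not in PATH: run it from the plugin directory"
fi
```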

The next steps will be to test this plugin: checking which counters can be used, how to configure them, with which thresholds, and how to subscribe to alerts...
Those steps will be covered in a future blog post.

Next Windows Server edition postponed 'til 2016 [VIDEO]

Chris Foot - Fri, 2015-02-13 10:50


Hi, welcome to RDX! With all the talk about Windows 10, systems administrators are wondering what’s in store for the next version of Windows Server.

Apparently, interested parties will have to wait until 2016. ZDNet’s Mary Jo Foley noted Microsoft intends to further refine its flagship server operating system instead of releasing it at the same time as Windows 10.

Windows Server 2003 will cease to receive any support from Microsoft this July. As a result, it’s expected that companies will upgrade to Windows Server 2012 R2. In light of this prediction, some have speculated that a new version of Windows Server would not receive the adoption rates Microsoft would want to see.

Thanks for watching! Check in next time for more OS news.

The post Next Windows Server edition postponed 'til 2016 [VIDEO] appeared first on Remote DBA Experts.

EM 12C RACI and Maintenance Task

Arun Bavera - Fri, 2015-02-13 10:10
TASK / RESPONSIBLE / ACCOUNTABLE / CONSULTED / INFORMED

  • Define Monitoring Requirements (Responsible: Target Owners, Infrastructure Teams, EM Admin; Accountable: EM Admin)
  • Installation planning and architecture (Responsible: EM Admin; Accountable: EM Admin; Consulted/Informed: Target Owners, Infrastructure)
  • Installation and configuration of EM (Responsible: EM Admin; Accountable: EM Admin)
  • Defining Agent deployment and patching procedures and processes (Responsible: EM Admin; Accountable: EM Admin)
  • Security and User Administration (Responsible: EM Admin/Security Admin; Accountable: EM Admin)
  • Admin Group Creation (Responsible: EM Admin; Accountable: EM Admin; Consulted/Informed: Target Owners)
  • Agent Deployment, can be performed by target owners (Responsible: Target Owners; Accountable: Target Owners; Consulted/Informed: EM Admin)
  • Agent Patching, can be performed by target owners (Responsible: Target Owners; Accountable: Target Owners; Consulted/Informed: EM Admin)
  • Target Configuration and Availability (Responsible: Target Owners; Accountable: Target Owners)
  • Agent Troubleshooting (Responsible: Target Owners, EM Admin; Accountable: EM Admin)
  • Target Troubleshooting (Responsible: Target Owners; Accountable: Target Owners; Consulted/Informed: EM Admin)
  • Weekly/Monthly/Quarterly Maintenance (Responsible: EM Admin; Accountable: EM Admin; Consulted/Informed: Target Owners)
  • OMS Patching (Responsible: EM Admin; Accountable: EM Admin; Consulted/Informed: Target Owners)

RECOMMENDED MAINTENANCE TASKS (Daily / Biweekly / Monthly / Quarterly)

  • Review critical EM component availability
  • Review events, incidents and problems for EM related infrastructure
  • Review overall health of the system including the job system, backlog, load, notifications and task performance
  • Review Agent issues for obvious problems (i.e. large percentage of agents with an unreachable status)
  • Review Agent issues (deeper/more detailed review of agents with consistent or continual problems)
  • Review metric trending for anything out of bounds
  • Evaluate database (performance, sizing, fragmentation)
  • Check for updates in Self Update (plug-ins, connectors, agents, etc.). Note that there is an out-of-box ruleset that will provide notification for the availability of new updates

References:
Categories: Development

How to convert Excel file to csv

Laurent Schneider - Fri, 2015-02-13 10:02

#excel to #csv in #powershell (New-Object -ComObject Excel.Application).Workbooks.Open("c:\x.xlsx").SaveAs("c:\x.csv",6)

— laurentsch (@laurentsch) February 13, 2015

One-liner to convert Excel to CSV (or to and from any other format).
There is a bug (320369) if you have Excel in English and your locale is not American. Just change your settings to us_en before the conversion.

Ephemeral Port Issue with Essbase Has Been Fixed!

Tim Tow - Fri, 2015-02-13 09:24
The issue that has plagued a number of Essbase customers over the years related to running out of available ports has finally been fixed!

This issue, which often manifested itself with errors in the Essbase error 10420xx range, was caused by how the Essbase Java API communicated with the server. In essence, whenever a piece of information was needed, the Essbase Java API grabbed a port from the pool of available ports, did its business, and then released the port back to the pool. That doesn’t sound bad, but the problem occurs due to how Windows handles this pool of ports. Windows will put the port into a timeout status for a period of time before it makes the port available for reuse and the default timeout in Windows is 4 minutes! Further, the size of the available pool of ports is only about 16,000 ports in the later versions of Windows. That may sound like a lot of ports, but the speed of modern computers makes it possible, and even likely, that certain operations, such as the outline APIs, that call Essbase many, many times to get information would be subject to this issue. Frankly, we see this issue quite often with both VB and the Java Essbase Outline Extractors.

We brought this issue to the attention of the Java API team and assisted them by testing a prerelease version of the Java API jars. I am happy to report that the fix was released with Essbase. In addition, there is a new setting that allows you to turn the optimization on or off:


It is our understanding that this optimization is turned on by default. I also checked the default files shipped with Essbase and did not see that setting in those files. It may be one of those settings that is there in case it messes something else up. The work of our own Jay Zuercher in our labs, and a search of Oracle Support, seems to have confirmed that thought. There is apparently an issue where EIS drill-through reports don't work in Smart View if socket optimization is turned on. It is documented in Oracle Support Doc ID 1959533.1.

There is also another undocumented setting:


According to Oracle development, this value defaults to 300 ms but there should be little need to ever change it. The only reason it is there is to tune socket optimization in case more than 2 sockets are used per Java API session.

Jay also tested the version in our labs with the Next Generation Outline Extractor. With the default settings, one large test outline we have, "BigBad", with about 120,000 members in it, extracted in 1 minute and 50 seconds.  With socket optimization turned off, the same outline was only about 25% complete after 2 hours.   In summary, this fix will be very useful for a lot of Oracle customers.
Categories: BI & Warehousing

Video: Oracle Platinum Partner Redstone Content Solutions

WebCenter Team - Fri, 2015-02-13 08:47

John Klein, Principal of Redstone Content Solutions, sits down with Sam Sharp, Senior Director at Oracle, to discuss Redstone’s expertise in integrating WebCenter products, and tying the entire WebCenter Suite together.

Health care needs and database service requirements

Chris Foot - Fri, 2015-02-13 08:03

Delivering affordable, quality care to patients is an objective every health care organization in the United States would like to achieve, but doing so necessitates the appropriate backend systems. 

From a database administration service standpoint, hospitals, care clinics and other institutions require solutions with strict authentication protocols and integrated analytics functions. In addition, these databases must be accessible by those who frequently view patient information. These needs demand a lot from DBAs, but sacrificing operability isn't an option. 

Assessing the health care arena 
Bruce Johnson, a contributor to Supply Chain Digital, outlined how U.S. health care providers are managing the industry's forced evolution. The implementation of the Affordable Care Act incited a wave of supply chain redevelopment, technology consolidation and electronic health record systems integration. It's an atmosphere that has administrators and upper management wearing more hats than they can fit on their heads. 

Consider the impact of EHR solutions on health care workflow. The primary reason why their deployment is required by law is that they promise to allow professionals in the industry to share patient information more effectively. For example, if a person's primary care physician suspects that his or her patient may have a spinal problem, the PCP may refer that individual to a chiropractor. In order to provide the specialist with as much information as he or she needs, the PCP delivers the patient's EHR to the chiropractor.

What does this mean for databases? 
EHR software primarily handles structured information, containing data regarding an individual's ailments, past medications, height, weight and so forth. Therefore, it makes sense that these solutions would operate on top of databases using structured query language. 

One of the reasons why relational engines such as MySQL, SQL Server and Oracle 12c are necessary is because of the transaction protocol these solutions abide by: atomicity, consistency, isolation and durability.

According to TechTarget, the ACID model ensures that data transfers and manipulations can be easily monitored and validated. The rule set also prevents unauthorized or invalid data transfers and changes. For example, the "consistency" component of ACID returns all information to its original state in the event of a transaction failure. 

With this protocol in mind, health care organizations using relational databases to support their EHR systems should consult a database monitoring service. As EHRs hold a wealth of sensitive information, hackers are likely to target these records whenever possible. 

The post Health care needs and database service requirements appeared first on Remote DBA Experts.

How to add a Cordova NFC plugin in MAF 2.1 application

You may have already noticed that the latest release of Oracle MAF 2.1 brings a major refresh to the MAF-Cordova story. In short, we have two big improvements here: 1. The version of Cordova has...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Fine Grained Auditing (FGA) and Protecting Oracle E-Business PII Data for Executives

With the recent news about yet another database breach of Personally Identifiable Information (PII), Integrigy had a discussion with a client about how to better protect the PII data of their executives.

The following Fine-Grained-Auditing (FGA) policy started the discussion. The policy below will conditionally log direct connections to the Oracle E-Business Suite database when the PII data of corporate executives is accessed. For example, it will ignore E-Business Suite end-user connections to the database, but will catch people directly connecting to the database from their laptop. However, it will only do so if PII data for executives is accessed:


BEGIN
  DBMS_FGA.ADD_POLICY (
   object_schema     =>  'HR',
   object_name       =>  'PER_ALL_PEOPLE_F',
   policy_name       =>  'FGA_PPF_NOT_GUI_AND_OFFICER',
   -- the subquery returning the executives' PERSON_IDs is elided here
   audit_condition   =>  ' PER_ALL_PEOPLE_F.PERSON_ID IN ( ... )
        AND SYS_CONTEXT (''USERENV'',''IP_ADDRESS'') NOT IN
          (''IP of your DB server'', ''IP of your cm server'',
           ''IP of your application server'')
        AND SYS_CONTEXT (''USERENV'',''CURRENT_USER'') = ''APPS'' ',
   audit_column      =>   NULL,
   handler_schema    =>   NULL,
   handler_module    =>   NULL,
   enable            =>   TRUE,
   statement_types   =>  'SELECT',
   audit_trail       =>   DBMS_FGA.DB,
   audit_column_opts =>   DBMS_FGA.ANY_COLUMNS);
END;
/


Here is an explanation of the policy above:

  • Audits only direct database activity and ignores database connections from the E-Business Suite user interface, the database server, the web and application servers, as well as the concurrent manager.
  • Audits SELECT activity against PER_ALL_PEOPLE_F or any view based on the table PER_ALL_PEOPLE_F. PII data exists outside of PER_ALL_PEOPLE_F, but this table is the central table within the E-Business Suite that defines a person and thus contains critical PII data such as name, birthdate and National Identifier.
  • Audits ALL columns in the table but could easily be restricted to only specific columns.
  • Audits ONLY those result sets that include current or ex-employees whose job title contains '%Executive%'. Note this policy was demonstrated using the Vision demo database. Your Job Key Flexfield definition will be different.
  • FGA comes standard with the Enterprise license of the Oracle database. If you own the Oracle E-Business Suite, you don't need an additional license to use FGA.

The policy above would certainly strengthen an overall database security posture, but it does have several immediate drawbacks:

  • While it does address risks with direct database activity, including the use of the APPS account from a laptop, it will not guard against privileged database users such as DBAs.
  • Spoofing of USERENV attributes is possible, which precludes using any USERENV attribute other than the IP address and the database username.
  • Audit data needs to be stored securely and purged regularly. Privileged users may have access to FGA data and policies. Audit data also needs to be retained and purged per corporate policies.
  • Lastly, the performance impact of the policy above would need to be carefully measured. If the policy above were to be implemented, it would need to be tested seriously, especially if modules such as Oracle Advanced Benefits and/or Payroll are to be used.

As part of a database security program, Integrigy recommends that all clients implement defense in depth. No single tool or security feature will protect your data. Oracle Traditional Auditing (TA) as well as FGA policies similar to the above should be implemented, but both TA and FGA have limitations and trade-offs.

Integrigy recommends that both Oracle TA and FGA be used with database security solutions such as Oracle Audit Vault and Database Firewall (AVDF), Splunk, Imperva, and IBM Guardium. Database monitoring and alerting needs to be automated and should be done using a commercial tool. You also need to secure and monitor privileged users such as DBAs, and database security cannot come at the cost of overall application performance.

Our client conversation about the FGA policy above concluded that while the policy could work, given the variety of different database connections, a better solution would be to utilize a variation of the policy above along with Splunk, which they already own.

If you have questions about the sample FGA policy above or about database security, please contact us at:


Tags: AuditingSensitive DataHIPAAOracle E-Business Suite
Categories: APPS Blogs, Security Blogs

Oracle Querayable Patch Interface

Pakistan's First Oracle Blog - Thu, 2015-02-12 18:39
Starting with Oracle 12c, patching information can be obtained from within SQL. A new package, DBMS_QOPATCH, offers some really nifty procedures to get the patch information. Some of that information is shared below:

To get patch information from the inventory:

SQL> select xmltransform(dbms_qopatch.get_opatch_install_info, dbms_qopatch.get_opatch_xslt) from dual;

Oracle Home      : /u01/app/oracle/product/12.1.0/db_1
Inventory      : /u01/app/oraInventory

The following is an equivalent of opatch lsinventory command at the OS level:

SQL> select xmltransform(dbms_qopatch.get_opatch_lsinventory, dbms_qopatch.get_opatch_xslt) from dual;

Oracle Querayable Patch Interface 1.0
Oracle Home      : /u01/app/oracle/product/12.1.0/db_1
Inventory      : /u01/app/oraInventory
--------------------------------------------------------------------------------
Installed Top-level Products (1):
Oracle Database 12c             
Installed Products ( 131)

Oracle Database 12c              
Sun JDK                   
Installer SDK Component           
Oracle One-Off Patch Installer          
Oracle Universal Installer          
Oracle USM Deconfiguration          
Oracle Configuration Manager Deconfiguration  
Oracle RAC Deconfiguration          
Oracle DBCA Deconfiguration          
Oracle Database Plugin for Oracle Virtual Assembly Builder
Oracle Configuration Manager Client      
Oracle Net Services              
Oracle Database 12c              
Oracle OLAP                  
Oracle Spatial and Graph          
Oracle Partitioning              
Enterprise Edition Options          

Interim patches:

Categories: DBA Blogs

Multiple Utilities Pack versions on Self Update

Anthony Shorten - Thu, 2015-02-12 15:31

Customers of the Oracle Application Management Pack for Oracle Utilities will notice that there are multiple entries for the pack within Self Update. Let me clarify:

  • The plugin named Oracle Utilities Application (marked below) is the correct pack to install and use. This is the new add-on version of the pack with the latest functionality.
  • The plugin named Oracle Utilities is the original version of the pack. While its version number is higher than the new pack's, it is listed only so that customers of that version have access to the software.

Self Update

In the future, the Oracle Utilities entries will disappear and only the Oracle Utilities Application entries will remain.

In short, use the Oracle Utilities Application plugin not the Oracle Utilities plugins.

How do I ...

Tim Dexter - Thu, 2015-02-12 15:23

An email came in this morning to an internal mailing list,

We have an Essbase customer with some reporting requirements and we are evaluating BI Publisher as the potential solution. I'd like to ask for your help with any document, blog or white paper with guidelines about using BI Publisher with Essbase as the main data source.

Is there any tutorial showing how to use BI Publisher with Essbase as the main data source?

There is not one to my knowledge but trying to be helpful I came up with the following response

I'll refer to the docs ...
First set up your connection to Essbase
Then create your data model using that Essbase connection
Use the MDX query builder to create the query or write it yourself (lots of fun :)
Add parameters (optional)
Then build layouts for your Essbase query
annnnd you're done :)

Simple, right? Well, simple in its format, but it required me to know the basic steps to build said report and then where to find the appropriate pages in the doc for the links. Leslie saw my reply and commented on how straightforward it was and how our docs are more like reference books than 'how to's'. This got us thinking. I have noticed that the new 'cloud' docs have 'How do I ...' sections where a drop-down shows maybe 10 tasks associated with the page I'm on right now in the application.

Getting that help functionality into BIP is going to take a while. We thought, in the meantime, we could carve out a section on the blog for just such content. Here's where you guys come in. What do you want to know how to do? Suggestions in the comments pleeeease!

Categories: BI & Warehousing

Considerations about SQL Server database files placement, Netapp storage and SnapManager

Yann Neuhaus - Thu, 2015-02-12 14:06

When you install SQL Server, you have to consider how to place the database files. At this point you will probably meet the well-known best practices and guidelines provided by Microsoft, but are you really aware of the storage vendor's guidelines? Let's talk about it in this blog post.

A couple of weeks ago, I had the opportunity to implement a new SQL Server architecture on a NetApp storage array, model FAS 3140. The particularity here is that my customer wants to manage his backups with NetApp SnapManager for SQL Server. This detail is very important because in this case we have to meet some requirements regarding the storage layout. After reading the NetApp storage documents, I was ready to place my database files.

First of all, we should know how SnapManager works. In fact, SnapManager performs SQL Server backups by using either a snapshot of the database files or streaming backups. Streaming backups concern only the system database files and the transaction log files. To achieve this task, SnapManager coordinates several components, like SnapDrive for Windows to control the storage LUNs and SQL Server to freeze IO by using the VSS framework. It is important to point out that without SnapManager, performing snapshots from a pure storage perspective is still possible, but in this case the storage layer is not aware of the database file placement. In such a situation, we may find that a database snapshot is not consistent. Here is a simplified scheme of what happens during the SnapManager backup process:


SnapManager (1) --> VDI (2) - - backup database .. with snapshot --> VSS (3) - database files IO freeze  --> SnapDrive (3) - snapshot creation


So why do we have to take care about database file placement with NetApp storage? Well, according to our explanation above, when we perform snapshot backups the unit of work is in fact the volume (FlexVol is probably the better term here: a FlexVol is a kind of virtual storage pool that manages physical storage efficiently and can include one or more LUNs). In other words, when a snapshot is performed by SnapManager, all related LUNs are considered together and all concerned database files in a single volume are frozen when the snapshot is created. Notice that if a database is spread across several volumes, IO is frozen for all concerned volumes at the same time, but the snapshots are taken serially.

This concept is very important, and as database administrators we now also have to deal with the additional constraints described below when placing our database files, with regard to the storage layout and the RPO/RTO needs.

1- System database files must be placed on a dedicated volume, together with their respective transaction log files. Only streaming backups are performed for system databases.

2- Tempdb database files must also be placed on a dedicated volume that will be excluded from SnapManager backup operations.

3- Otherwise, the placement of user databases depends on several factors, such as their size or the storage layout architecture in place. For example, we may encounter cases with small databases that share all their database files on the same volume. We may also encounter cases where we have to deal with the placement of large database files. Indeed, large databases often include different filegroups with several database files. In this case, spreading database files over different LUNs may be a performance best practice, and we have to design the storage with different LUNs that share the same dedicated volume. Also note that the database transaction log files have to be on a separate volume (remember that transaction log files are covered by streaming backups).

4- In a consolidated scenario, we may have several SQL Server instances on the same server with database files on the same volume, or we may have dedicated volumes for each instance.

Regarding these constraints, keep in mind that your placement strategy may present some advantages but also some drawbacks:

For example, restoring a database on a dedicated volume from a snapshot will be quicker than using streaming backups, but at the same time we may quickly reach the maximum number of drive letters we can use if we dedicate one or several volumes per database. Imagine that you have 5 databases, each spread over 4 volumes (system, tempdb, data and logs)... I'll let you do the math. A possible solution is to replace the letter-oriented volume identification by mount points.

At the same time, sharing volumes between several databases makes the in-place restore of a single database more difficult, because in this case we will have to copy data from a mounted snapshot back into the active file system. Moreover, increasing the number of shared files on a single volume may cause timeout issues during the snapshot operation. According to the NetApp Knowledge Base, the maximum recommended number of databases on a single volume is 35. If you want to implement a policy that verifies that the number of database files is lower than this threshold, please see the blog post of my colleague Stéphane Haby. The number of files concerned by a snapshot creation, the size of the volumes, and the storage performance are all factors to consider during database file placement.

After digesting all these considerations, let me briefly present my context: 3 SQL Server instances on a consolidated server with the following storage configuration. We designed the storage layout as follows:




One volume for the system databases, one volume for tempdb, one volume for the user database data files, one volume for the user database log files, and finally one volume dedicated to the snapinfo. The snapinfo volume is used by SnapManager to store streaming backups and some metadata.

But according to what I said earlier, my configuration has some caveats. Indeed, restoring only one database in-place is possible in this case, but because we are actually sharing database files between several databases and several SQL Server instances, we will force SnapManager to copy data from a mounted snapshot back into the file system. Otherwise, we may also face the timeout threshold issue (10s, hard-coded into the VSS framework) during the database IO freeze step. Finally, we cannot exclude issues caused by the limit of 35 databases on a single volume, because I am in a consolidated environment scenario. Fortunately, my customer is aware of all the limitations and is waiting for the upgrade of the physical storage layout. We will plan to redesign the final storage layout at that time. Until then, the number of databases should not be a problem.

See you!

California Community College OEI Selects LMS Vendor

Michael Feldstein - Thu, 2015-02-12 13:53

By Phil Hill

The Online Education Initiative (OEI) for California’s Community College System has just announced its vendor selection for a Common Course Management System (CCMS)[1]. For various reasons I cannot provide any commentary on this process, so I would prefer to simply direct people to the OEI blog site. Update: To answer some questions, the reason I cannot comment is that CCC is a MindWires client, and I facilitated the meetings. Based on this relationship we have a non-disclosure agreement with OEI.

Here is the full announcement.

The California Community Colleges (CCC) Online Education Initiative (OEI) announced its intent to award Instructure Inc. the contract to provide an online course management system and related services to community colleges statewide.

Support for Instructure’s Canvas system was nearly unanimous among the OEI’s Common Course Management System (CCMS) Committee members, with overwhelming support from student participants, officials said. Canvas is a course management platform that is currently being used by more than 1,000 colleges, universities and school districts across the country.

“Both the students and faculty members involved believed that students would be most successful using the Canvas system,” said OEI Statewide Program Director Steve Klein. “The student success element was a consistent focus throughout.”

The announcement includes some information on the process as well.

A 55-member selection committee participated in the RFP review that utilized an extensive scoring rubric. The decision-making process was guided by and included the active involvement of the CCMS Committee, which is composed of the CCMS Workgroup of the OEI Steering Committee, the members of OEI’s Management Team, and representatives from the eight Full Launch Pilot Colleges, which will be the first colleges to test and deploy the CCMS tool.

The recommendation culminated an extremely thorough decision-making process that included input from multiple sources statewide, and began with the OEI’s formation of a CCMS selection process in early 2014. The selection process was designed to ensure that a partner would be chosen to address the initiative’s vision for the future.

  1. Note that this is an Intent to Award, not yet a contract.

The post California Community College OEI Selects LMS Vendor appeared first on e-Literate.

MongoDB 3.0

DBMS2 - Thu, 2015-02-12 13:44

Old joke:

  • Question: Why do policemen work in pairs?
  • Answer: One to read and one to write.

A lot has happened in MongoDB technology over the past year. For starters:

  • The big news in MongoDB 3.0* is the WiredTiger storage engine. The top-level claims for that are that one should “typically” expect (individual cases can of course vary greatly):
    • 7-10X improvement in write performance.
    • No change in read performance (which however was boosted in MongoDB 2.6).
    • ~70% reduction in data size due to compression (disk only).
    • ~50% reduction in index size due to compression (disk and memory both).
  • MongoDB has been adding administration modules.
    • A remote/cloud version came out with, if I understand correctly, MongoDB 2.6.
    • An on-premise version came out with 3.0.
    • They have similar features, but are expected to grow apart from each other over time. They have different names.

*Newly-released MongoDB 3.0 is what was previously going to be MongoDB 2.8. My clients at MongoDB finally decided to give a “bigger” release a new first-digit version number.

To forestall confusion, let me quickly add:

  • MongoDB acquired the WiredTiger product and company, and continues to sell the product on a standalone basis, as well as bundling a version into MongoDB. This could cause confusion because …
  • … the standalone version of WiredTiger has numerous capabilities that are not in the bundled MongoDB storage engine.
  • There’s some ambiguity as to when MongoDB first “ships” a feature, in that …
  • … code goes to open source with an earlier version number than it goes into the packaged product.

I should also clarify that the addition of WiredTiger is really two different events:

  • MongoDB added the ability to have multiple plug-compatible storage engines. Depending on how one counts, MongoDB now ships two or three engines:
    • Its legacy engine, now called MMAP v1 (for “Memory Map”). MMAP continues to be enhanced.
    • The WiredTiger engine.
    • A “please don’t put this immature thing into production yet” memory-only engine.
  • WiredTiger is now the particular storage engine MongoDB recommends for most use cases.

I’m not aware of any other storage engines using this architecture at this time. In particular, last I heard TokuMX was not an example. (Edit: Actually, see Tim Callaghan’s comment below.)

Most of the issues in MongoDB write performance have revolved around locking, the story on which is approximately:

  • Until MongoDB 2.2, locks were held at the process level. (One MongoDB process can control multiple databases.)
  • As of MongoDB 2.2, locks were held at the database level, and some sanity was added as to how long they would last.
  • As of MongoDB 3.0, MMAP locks are held at the collection level.
  • WiredTiger locks are held at the document level. Thus MongoDB 3.0 with WiredTiger breaks what was previously a huge write performance bottleneck.

In understanding that, I found it helpful to do a partial review of what “documents” and so on in MongoDB really are.

  • A MongoDB document is somewhat like a record, except that it can be more like what in a relational database would be all the records that define a business object, across dozens or hundreds of tables.*
  • A MongoDB collection is somewhat like a table, although the documents that comprise it do not need to each have the same structure.
  • MongoDB documents want to be capped at 16 MB in size. If you need one bigger, there’s a special capability called GridFS to break it into lots of little pieces (default chunk size = 255KB) while treating it as a single document logically.

*One consequence — MongoDB’s single-document ACID guarantees aren’t quite as lame as single-record ACID guarantees would be in an RDBMS.
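To make the document/record distinction concrete, here is a sketch of a single MongoDB-style document that embeds what a relational schema might spread across an orders table, an order-items table, and an addresses table. All field names and values are invented for illustration:

```javascript
// One document standing in for a whole business object.
// In an RDBMS this would likely be rows in several joined tables.
const order = {
  _id: "ord-1001",
  customer: {
    name: "Ada",
    shippingAddress: { city: "Boston", zip: "02134" }
  },
  items: [
    { sku: "A1", qty: 2, price: 9.99 },
    { sku: "B7", qty: 1, price: 24.5 }
  ],
  total: 2 * 9.99 + 24.5
};

// Because the whole object lives in one document, a single-document
// write is atomic over all of it -- the point of the footnote above.
console.log(order.items.length, order.total.toFixed(2));
```

This is why single-document atomicity covers more ground in MongoDB than single-row atomicity would in a relational database.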

By the way:

  • Row-level locking was a hugely important feature in RDBMS about 20 years ago. Sybase’s lack of it is a big part of what doomed them to second-tier status.
  • Going forward, MongoDB has made the unsurprising marketing decision to talk about “locks” as little as possible, relying instead on alternate terms such as “concurrency control”.

Since its replication mechanism is transparent to the storage engine, MongoDB allows one to use different storage engines for different replicas of data. Reasons one might want to do this include:

  • Fastest persistent writes (WiredTiger engine).
  • Fastest reads (wholly in-memory engine).
  • Migration from one engine to another.
  • Integration with some other data store. (Imagine, for example, a future storage engine that works over HDFS. It probably wouldn’t have top performance, but it might make Hadoop integration easier.)

In theory one can even do a bit of information lifecycle management (ILM), by using different storage engines for different subsets of the database, by:

  • Pinning specific shards of data to specific servers.
  • Using different storage engines on those different servers.

That said, similar stories have long been told about MySQL, and I’m not aware of many users who run multiple storage engines side by side.
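The shard-pinning approach above can be sketched with tag-aware sharding in the mongo shell. This assumes a running sharded cluster; the shard names, tag names, and the `mydb.events` collection with its `{ year: ... }` shard key are all hypothetical:

```javascript
// mongo-shell sketch, run against a mongos of a sharded cluster.
// Each shard's storage engine is chosen in that shard server's own
// mongod configuration; tags only control where chunks are placed.
sh.addShardTag("shard-wt", "recent");    // a WiredTiger-backed shard
sh.addShardTag("shard-mmap", "archive"); // an MMAPv1-backed shard

// Route chunks of the collection to those shards by shard-key range.
sh.addTagRange("mydb.events", { year: 2014 }, { year: MaxKey }, "recent");
sh.addTagRange("mydb.events", { year: MinKey }, { year: 2014 }, "archive");
```

The balancer then migrates chunks so that each range lives on a shard carrying the matching tag.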

The MongoDB WiredTiger option ships with a couple of options for block-level compression (snappy, the default, and zlib), plus prefix compression that is used for indexes only. The full WiredTiger product also has some forms of columnar compression for data.
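Those compression choices are also set in the mongod configuration. A hedged sketch, with zlib standing in for the non-default choice:

```yaml
# WiredTiger compression settings in a mongod 3.0 config file.
# snappy is the default blockCompressor; "none" disables it.
storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: zlib     # trades CPU for a better ratio than snappy
    indexConfig:
      prefixCompression: true   # the indexes-only compression mentioned above
```

These settings apply to collections and indexes created after the setting takes effect; they don't rewrite existing data in place.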

One other feature in MongoDB 3.0 is the ability to have 50 replicas of data (the previous figure was 12). MongoDB can’t think of a great reason to have more than 3 replicas per data center or more than 2 replicas per metropolitan area, but some customers want to replicate data to numerous locations around the world.
