Adding An EMC XtremIO Volume As An ASM Disk With Oracle Database 12c On Linux – It Does Not Get Any Easier Than This.
Provisioning high-performance storage has always been a chore. Care and concern over spindle count, RAID type, RAID attributes, number of controller arms involved and a long list of other complexities have burdened storage administrators. Some of these troubles were mitigated by the advent of Automatic Storage Management–but not entirely.
Wouldn’t it be nice if the complexity of storage provisioning could be boiled down to but a single factor? Wouldn’t it be nice if that single factor was, simply, capacity? With EMC XtremIO the only factor storage administrators need to bear in mind when provisioning storage is, indeed, capacity.
With EMC XtremIO a storage administrator hears there is a need for, say, one terabyte of storage and that is the entirety of information needed. No more questions about the I/O pattern (e.g., large sequential writes ala redo logging, etc). The Database Administrator simply asks for capacity with a very short sentence and the Storage Administrator clicks 3 buttons in the XtremIO GUI and that’s all there is to it.Pictures Speak Thousands of Words
I too enjoy the simplicity of XtremIO in my engineering work. Just the other day I ran short on space in a tablespace while testing Oracle Database 12c intra-node parallel query. I was studying a two-node Real Application Clusters setup attached to an EMC XtremIO array via 8 paths of 8GFC Fibre Channel. The task at hand was a single parallel CTAS (Create Table As Select) but the command failed because my ASM disk group ran out of space when Oracle Database tried to extend the BIGFILE tablespace.
Since I had to add some space I thought I’d take a few screen shots to show readers of this blog how simple it is to perform the full cycle of tasks required to add space to an active cluster with ASM in an XtremIO environment.
The following screen shot shows the error I was reacting to:
Since the following example shows host configuration steps please note the Linux distribution (Oracle Linux) and kernel version (UEK) I was using:
The following screenshot shows the XtremIO GUI configuration tab. I selected “Add” and then typed a name and size (1TB) of the volume I wanted to create:
NOTE: Right click the embedded images for greater clarity
The following screenshot shows how I then selected the initiators (think hosts) from the right-hand column that I wanted to see the new volume:
After I clicked “apply” I could see my new volume in my “12C” folder. With the folder construct I can do things like create zero-overhead, immediate, writable snapshots with a single mouse click. As the following screenshot shows, I highlighted “data5″ so I could get details about the volume in advance of performing tasks on the host. The properties tab shows me the only information I need to proceed–the NAA Identifier. Once I had the NAA Identifier I moved on to the task of discovering the new volume on the hosts.
Host discovery consists of three simple steps:
- Multipath discovery
- Updating the udev rules file with a text editor
- Updating udev state with udevadm commands
On both nodes of the cluster I executed the following series of commands. This series of commands generates a lot of terminal output so I won’t show that in this blog post.
# multipath -F ;service multipathd restart ; rescan-scsi-bus.sh -r
After executing the multipath related commands I was able to see the new volume (0002a) on both nodes of the cluster. Notice how the volume has different multipath names (mpathab, mpathai) on the hosts. This is not an issue since the volumes will be controlled by udev:
After verifying the volumes were visible under DM-MPIO I moved on to the udev actions. The following screenshot shows how I added an ACTION line in the udev rules file and copied it to the other RAC host and then executed the udev update commands on both RAC hosts:
I then could see “/dev/asmdisk6″ on both RAC hosts:
The next task was to use ASMCA (ASM Configuration Assistant) to add the XtremIO volume to the ASM disk group called “DATA”:
As the following screenshot shows the volume is visible as /dev/asmdisk6:
I selected asmdisk6 and the task was complete:
I then saw evidence of ASM rebalancing in the XtremIO GUI Performance tab:
With EMC XtremIO you provision capacity and that allows you to speak in very short sentences with the application owners that share space in the array.
It doesn’t get any easier than this.
Filed under: oracle
Little Things Doth Crabby Make – Part XVIII. Automatic Storage Management Won’t Let Me Use My Disk For My Files! Yes, It Will!
It’s been a long time since my last installment in the Little Things Doth Crabby Make series and to be completely honest this particular topic isn’t really all that fit for a LTDCM installment because it covers something that is possible but less than expedient. That said, there are new readers of this blog and maybe it’s time they google “Little Things Doth Crabby Make” to see where this series has been. This post might rustle up that curiosity!
So what is this blog post about? It’s about stuffing any file system file into Automatic Storage Management space. OK, so maybe this is just morbid curiosity or trivial pursuit. Maybe it’s just a parlor trick. I would agree with any of those descriptions. Nonetheless maybe there are 42 or so people out there who didn’t know this. If so, this post is for them.ASMCMD cp Command
The cp sub-command of ASM lets you stuff certain database files into ASM. We all know this. However, just to make it all fresh in people’s minds I’ll show a screen shot of me trying to push a compressed tar archive of $ORACLE_HOME/bin/oracle up into ASM:
Well, that’s not surprising. But what happens if I take heed of the error message and attempt to placate? The block size is 8KB so the following screen shot shows me rounding up the size of the compressed tar archive to an 8192B blocking factor:
ASMCMD still won’t gobble up the file. That’s still not all that surprising because after ASMCMD checked the geometry of the file it then read the file looking for a header or any file magic it could understand. As you can see ASMCMD doesn’t see a file type it understands. The following screen shot shows me pre-pending the tar archive with file magic I know ASMCMD must surely understand. I have a database with a tablespace called foo that I created in a non-Oracle Disk Manager naming convention (foo.dbf). The screen shot shows me:
- Extracting the foo.dbf file
- “Borrowing” 1MB from the head of the file
- Creating a compressed tar archive of the Oracle Database executable
- Rounding up the size of the compressed tar archive to an 8192B blocking factor
So now I have a file that has the “shape” of a datafile and the necessary header information from a datafile. The next screen shot shows:
- ASMCMD cp command pushing my file into ASM
- Removal of all of my current working directory files
- ASMCMD cp command pulling the file form ASM and into my current working directory
- Extracting the contents of the “embedded” tar archive
- md5sum(1) proof the file contents survived the journey
OK, so that’s either a) something nobody would ever do or b) something that can be done with some elegant execution of some internal database package in a much less convoluted way or c) a combination of both “a” and “b” or d) a complete waste of my time to post, or, finally, e) a complete waste of your time reading the post. I’m sorry for “a”,”b”,”c” and certainly “e” if the case should be so.
Now you must wonder why I put this in the Little Things Doth Crabby Make series. That’s simple. I don’t like any “file system” imposing restrictions on file types :)
Filed under: oracle
I want to make these two points right out of the gate:
- I do not question Oracle’s IOPS claims in Exadata datasheets
- Everyone makes mistakes
Like me. On January 21, 2015, Oracle announced the X5 generation of Exadata. I spent some time studying the datasheets from this product family and also compared the information to prior generations of Exadata namely the X3 and X4. Yesterday I graphed some of the datasheet numbers from these Exadata products and tweeted the graphs. I’m sorry to report that two of the graphs were faulty–the result of hasty cut and paste. This post will clear up the mistakes but I owe an apology to Oracle for incorrectly graphing their datasheet information. Everyone makes mistakes. I fess up when I do. I am posting the fixed slides but will link to the deprecated slides at the end of this post.We’re Only Human
Wouldn’t IT be a more enjoyable industry if certain IT vendors stepped up and admitted when they’ve made little, tiny mistakes like the one I’m blogging about here? In fact, wouldn’t it be wonderful if some of the exceedingly gruesome mistakes certain IT vendors make would result in a little soul-searching and confession? Yes. It would be really nice! But it’ll never happen–well, not for certain IT companies anyway. Enough of that. I’ll move on to the meat of this post. The rest of this article covers:
- Three Generations of Exadata IOPS Capability
- Exadata IOPS Per Host CPU
- Exadata IOPS Per Flash SSD
- IOPS Per Exadata Storage Server License Cost
The following chart shows how Oracle has evolved Exadata from the X3 to the X5 EF model with regard to IOPS capability. As per Oracle’s datasheets on the matter these are, of course, SQL-driven IOPS. Oracle would likely show you this chart and nothing else. Why? Because it shows favorable, generational progress in IOPS capability. A quick glance shows that read IOPS improved just shy of 3x and write IOPS capability improved over 4x from the X3 to X5 product releases. These are good numbers. I should point out that the X3 and X4 numbers are the datasheet citations for 100% cached data in Exadata Smart Flash Cache. These models had 4 Exadata Smart Flash Cache PCIe cards in each storage server (aka, cell). The X5 numbers I’m focused on reflect the performance of the all-new Extreme Flash (EF) X5 model. It seems Oracle has started to investigate the value of all-flash technology and, indeed, the X5 EF is the top-dog in the Exadata line-up. For this reason I choose to graph X5 EF data as opposed to the more pedestrian High Capacity model which has 12 4TB SATA drives fronted with PCI Flash cards (4 per storage server). The tweets I hastily posted yesterday with the faulty data points aimed to normalize these performance numbers to important factors such as host CPU, SSD count and Exadata Storage Server Software licensing costs. The following set of charts are the error-free versions of the tweeted charts.Exadata IOPS Per Host CPU
Oracle’s IOPS performance citations are based on SQL-driven workloads. This can be seen in every Exadata datasheet. All Exadata datasheets for generations prior to X4 clearly stated that Exadata IOPS are limited by host CPU. Indeed, anyone who studies Oracle Database with SLOB knows how all of that works. SQL-driven IOPS requires host CPU. Sadly, however, Oracle ceased stating the fact that IOPS are host-CPU bound in Exadata as of the advent of the X4 product family. I presume Oracle stopped correctly stating the factual correlation between host CPU and SQL-driven IOPS for only the most honorable of reasons with the best of customers’ intentions in mind. In case anyone should doubt my assertion that Oracle historically associated Exadata IOPS limitations with host CPU I submit the following screen shot of the pertinent section of the X3 datasheet: Now that the established relationship between SQL-driven IOPS and host CPU has been demystified, I’ll offer the following chart which normalizes IOPS to host CPU core count: I think the data speaks for itself but I’ll add some commentary. Where Exadata is concerned, Oracle gives no choice of host CPU to customers. If you adopt Exadata you will be forced to take the top-bin Xeon SKU with the most cores offered in the respective Intel CPU family. For example, the X3 product used 8-core Sandy Bridge Xeons. The X4 used 12-core Ivy Bridge Xeons and finally the X5 uses 18-core Haswell Xeons. In each of these CPU families there are other processors of varying core counts at the same TDP. For example, the Exadata X5 processor is the E5-2699v3 which is a 145w 18-core part. In the same line of Xeons there is also a 145w 14c part (E5-2697v3) but that is not an option to Exadata customers.
All of this is important since Oracle customers must license Oracle Database software by the host CPU core. The chart shows us that read IOPS per core from X3 to X4 improved 18% but from X4 to X5 we see only a 3.6% increase. The chart also shows that write IOPS/core peaked at X4 and has actually dropped some 9% in the X5 product. These important trends suggest Oracle’s balance between storage plumbing and I/O bandwidth in the Storage Servers is not keeping up with the rate at which Intel is packing cores into the Xeon EP family of CPUs. The nugget of truth that is missing here is whether the 145w 14-core E5-2697v3 might in fact be able to improve this IOPS/core ratio. While such information would be quite beneficial to Exadata-minded customers, the 22% drop in expensive Oracle Database software in such an 18c versus 14c scenario is not beneficial to Oracle–especially not while Oracle is struggling to subsidize its languishing hardware business with gains from traditional software.Exadata IOPS Per Flash SSD
Oracle uses their own branded Flash cards in all of the X3 through X5 products. While it may seem like an implementation detail, some technicians consider it important to scrutinize how well Oracle leverages their own components in their Engineered Systems. In fact, some customers expect that adding significant amounts of important performance components, like Flash cards, should pay commensurate dividends. So, before you let your eyes drift to the following graph please be reminded that X3 and X4 products came with 4 Gen3 PCI Flash Cards per Exadata Storage Server whereas X5 is fit with 8 NVMe flash cards. And now, feel free to take a gander at how well Exadata architecture leverages a 100% increase in Flash componentry: This chart helps us visualize the facts sort of hidden in the datasheet information. From Exadata X3 to Exadata X4 Oracle improved IOPS per Flash device by just shy of 100% for both read and write IOPS. On the other hand, Exadata X5 exhibits nearly flat (5%) write IOPS and a troubling drop in read IOPS per SSD device of 22%. Now, all I can do is share the facts. I cannot change people’s belief system–this I know. That said, I can’t imagine how anyone can spin a per-SSD drop of 22%–especially considering the NVMe SSD product is so significantly faster than the X4 PCIe Flash card. By significant I mean the NVMe SSD used in the X5 model is rated at 260,000 random 8KB IOPS whereas the X4 PCIe Flash card was only rated at 160,000 8KB read IOPS. So X5 has double the SSDs–each of which is rated at 63% more IOPS capacity–than the X4 yet IOPS per SSD dropped 22% from the X4 to the X5. That means an architectural imbalance–somewhere. However, since Exadata is a completely closed system you are on your own to find out why doubling resources doesn’t double your performance. All of that might sound like taking shots at implementation details. If that seems like the case then the next section of this article might be of interest.IOPS Per Exadata Storage Server License Cost
As I wrote earlier in this article, both Exadata X3 and Exadata X4 used PCIe Flash cards for accelerating IOPS. Each X3 and X4 Exadata Storage Server came with 12 hard disk drives and 4 PCIe Flash cards. Oracle licenses Exadata Storage Server Software by the hard drive in X3/X4 and by the NVMe SSD in the X5 EF model. To that end the license “basis” is 12 units for X3/X5 and 8 for X5. Already readers are breathing a sigh of relief because less license basis must surely mean less total license cost. Surely Not! Exadata X3 and X4 list price for Exadata Storage Server software was $10,000 per disk drive for an extended price of $120,000 per storage server. The X5 EF model, on the other hand, prices Exadata Storage Server Software at $20,000 per NVMe SSD for an extended price of $160,000 per Exadata Storage Server. With these values in mind feel free to direct your attention to the following chart which graphs the IOPS per Exadata Storage Server Software list price (IOPS/license$$). The trend in the X3 to X4 timeframe was a doubling of write IOPS/license$$ and just short of a 100% improvement in read IOPS/license$$. In stark contrast, however, the X5 EF product delivers only a 57% increase in write IOPS/license$$ and a troubling, tiny, 17% increase in read IOPS/license$$. Remember, X5 has 100% more SSD componentry when compared to the X3 and X4 products.Summary
No summary needed. At least I don’t think so.About Those Faulty Tweeted Graphs
As promised, I’ve left links to the faulty graphs I tweeted here: Faulty / Deleted Tweet Graph of Exadata IOPS/SSD: http://wp.me/a21zc-1ek Faulty / Deleted Tweet Graph of Exadata IOPS/license$$: http://wp.me/a21zc-1ejReferences
Exadata X3-2 datasheet: http://www.oracle.com/technetwork/server-storage/engineered-systems/exadata/exadata-dbmachine-x3-2-ds-1855384.pdf Exadata X4-2 datasheet: http://www.oracle.com/technetwork/database/exadata/exadata-dbmachine-x4-2-ds-2076448.pdf Exadata X5-2 datasheet: http://www.oracle.com/technetwork/database/exadata/exadata-x5-2-ds-2406241.pdf X4 SSD info: http://www.oracle.com/us/products/servers-storage/storage/flash-storage/f80/overview/index.html X5 SSD info: http://docs.oracle.com/cd/E54943_01/html/E54944/gokdw.html#scrolltoc Engineered Systems Price List: http://www.oracle.com/us/corporate/pricing/exadata-pricelist-070598.pdf , http://www.ogs.state.ny.us/purchase/prices/7600020944pl_oracle.pdf
Filed under: oracle
This is a short post to recommend some recent blog posts by Nikolay Manchev and Bertrand Drouvot on the topic of Oracle Database 12c NUMA awareness.
Nikolay provides a very helpful overview on Linux Control Groups and how they are leveraged by Oracle Database 12c. Bertrand Drouvot carried the topic a bit further by leveraging SLOB to assess the impact of NUMA remote memory on a cached Oracle Database workload. Yes, SLOB is very useful for more than physical I/O! Good job, Bertrand!
These are good studies and good posts!
Also, one can refer to MOS 1585184.1 for more information on Control Groups and a helpful script to configure CGROUPS.
The following links will take you to Nikolay and Bertrand’s writings on the topic:
Filed under: oracle
SLOB 2.2 Not Generating AWR reports? Testing Large User Counts With Think Time? Think Processes and SLOB_DEBUG.
I’ve gotten a lot of reports of folks branching out into SLOB 2.2 large user count testing with the SLOB 2.2 Think Time feature. I’m also getting reports that some of the same folks are not getting the resultant AWR reports one expects from a SLOB test.
If you are not getting your AWR reports there is the old issue I blogged about here (click here). That old issue was related to a Redhat bug. However, if you have addressed that problem, and still are not getting your AWR reports from large user count testing, it might be something as simple as the processes initialization parameter. After all, most folks have been accustomed to generating massive amounts of physical I/O with SLOB at low session counts.
I’ve made a few changes to runit.sh that will help future folks should they fall prey to the simple processes initialization parameter folly. The fixes will go into SLOB 188.8.131.52. The following is a screen shot of these fixes and what one should expect to see in such situation in the future. In the meantime, do take note of SLOB_DEBUG as mentioned in the screenshot:
Filed under: oracle
This is Part II in a series. Part I can be found here (click here). Part I in the series covered a very simple case of SLOB data loading. This installment is aimed at how one can use SLOB as a platform test for a unique blend of concurrent, high-bandwidth data loading, index creation and CBO statistics gathering.Put SLOB On The Box – Not In a Box
As a reminder, the latest SLOB kit is always available here: kevinclosson.net/slob .
Often I hear folks speak of what SLOB is useful for and the list is really short. The list is so short that a single acronym seems to cover it—IOPS, just IOPS and nothing else. SLOB is useful for so much more than just testing a platform for IOPS capability. I aim to make a few blog installments to make this point.SLOB for More Than Physical IOPS
I routinely speak about how to use SLOB to study host characteristics such as NUMA and processor threading (e.g., Simultaneous Multithreading on modern Intel Xeons). This sort of testing is possible when the sum of all SLOB schemas fit into the SGA buffer pool. When testing in this fashion, the key performance indicators (KPI) are LIOPS (Logical I/O per second) and SQL Executions per second.
This blog post is aimed at suggesting yet another manner of platform testing with SLOB–specifically concurrent bulk data loading.
The SLOB data loader (~SLOB/setup.sh) offers the ability to test non-parallel, concurrent table loading, index creation and CBO statistics collection.
In this blog post I’d like to share a “SLOB data loading recipe kit” for those who wish to test high performance SLOB data loading. The contents of the recipe will be listed below. First, I’d like to share a platform measurement I took using the data loading recipe. The host was a 2s20c40t E5-2600v2 server with 4 active 8GFC paths to an XtremIO array.
The tar archive kit I’ll refer to below has the full slob.conf in it, but for now I’ll just use a screen shot. Using this slob.conf and loading 512 SLOB schema users generates 1TB of data in the IOPS tablespace. Please note the attention I’ve drawn to the slob.conf parameters SCALE and LOAD_PARALLEL_DEGREE. The size of the aggregate of SLOB data is a product of SCALE and the number of schemas being loaded. I drew attention to LOAD_PARALLEL_DEGREE because that is the key setting in increasing the concurrency level during data loading. Most SLOB users are quite likely not accustomed to pushing concurrency up to that level. I hope this blog post makes doing so seem more worthwhile in certain cases.
The following is a screenshot of the output from the SLOB 2.2 data loader. The screenshot shows that the concurrent data loading portion of the procedure took 1,474 seconds. On the surface that would appear to be a data loading rate of approximately 2.5 TB/h. One thing to remember, however, is that SLOB data is loaded in batches controlled by LOAD_PARALLEL_DEGREE. Each batch loads LOAD_PARALLEL_DEGREE number of tables and then creates a unique indexes and performs CBO statistics gathering. So the overall “data loading” time is really data loading plus these ancillary tasks. To put that another way, it’s true this is a 2.5TB data loading use case but there is more going on than just simple data loading. If this were a pure and simple data loading processing stream then the results would be much higher than 2.5TB/h. I’ll likely blog about that soon.
As the screenshot shows the latest SLOB 2.2 data loader isolates the concurrent loading portion of setup.sh. In this case, the seed table (user1) was loaded in 20 seconds and then the concurrent loading portion completed in 1,474 seconds.That Sounds Like A Good Amount Of Physical I/O But What’s That Look Like?
To help you visualize the physical I/O load this manner of testing places on a host, please consider the following screenshot. The screenshot shows peaks of vmstat 30-second interval reporting of approximately 2.8GB/s physical read I/O combined with about 435 MB write I/O for an average of about 3.2GB/s. This host has but 4 active 8GFC fibre channel paths to storage so that particular bottleneck is simple to solve by adding another 4 port HBA! Note also how very little host CPU is utilized to generate the 4x8GFC saturating workload. User mode cycles are but 15% and kernel mode utilization was 9%. It’s true that 24% sounds like a lot, however, this is a 2s20c40t host and therefore 24% accounts for only 9.6 processor threads–or 5 cores worth of bandwidth. There may be some readers who were not aware that 5 “paltry” Ivy Bridge Xeon cores are capable of driving this much data loading!
NOTE: The SLOB method is centered on the sparse blocks. Naturally, fewer CPU cycles are required for loading data into sparse blocks.
Please note, the following vmstat shows peaks and valleys. I need to remind you that SLOB data loading consists of concurrent processing of not only data loading (Insert as Select) but also a unique index creation and CBO statistics gathering. As one would expect I/O will wane as the loading process shifts from the bulk data load to the index creation phase and then back again.
Finally, the following screenshot shows the very minimalist init.ora settings I used during this testing.
The recipe kit can be found in the following downloadable tar archive. The kit contains the necessary files one would need to reproduce this SLOB data loading time so long as the platform has sufficient performance attributes. The tar archive also has all output generated by setup.sh as the following screenshot shows:
The SLOB 2.2 data loading recipe kit can be downloaded here (click here). Please note, the screenshot immediately above shows the md5 checksum for the tar archive.Summary
This post shows how one can tune the SLOB 2.2 data loading tool (setup.sh) to load 1 terabyte of SLOB data in well under 25 minutes. I hope this is helpful information and that, perhaps, it will encourage SLOB users to consider using SLOB for more than just physical IOPS testing.
Filed under: oracle
This is a quick blog post to show SLOB users how to determine whether they are using the latest SLOB kit. If you visit kevinclosson.net/slob you’ll see the webpage I captured in the following screenshot.
Once on the SLOB Resources page you can simply hover over the “SLOB 2.2 (Click here)” hyperlink and the bottom of your browser will show the full name of the tar archive. Alternatively you can use md5sum(1) on Linux (or md5 on Mac) to get the checksum of the tar archive you have and compare it to the md5sum I put on the web page (see the arrow).
Filed under: oracle
I invite you to please read this report.
NAND Flash is good for a lot of things but not naturally good with write-intensive workloads. Unless, that is, skillful engineering is involved to mitigate the intrinsic weaknesses of NAND Flash in this regard. I assert EMC XtremIO architecture fills this bill.
Regardless of your current or future plans for adopting non-mechanical storage I hope this lab report will show some science behind how to determine suitability for non-mechanical storage–and NAND Flash specifically–where Oracle Database redo logging is concerned.
Please note: Not all lab tests are aimed at achieving maximum theoretical limits in all categories. This particular lab testing required sequestering precious lab gear for a 104 hour sustained test.
The goal of the testing was not to show limits but, quite to the contrary, to show a specific lack of limits in the area of Oracle Database redo logging. For a more general performance-focused paper please download this paper (click here). With that caveat aside, please see the following link for the redo logging related lab report:
Filed under: oracle