Re: Question on Exadata X8 - IO
Date: Wed, 17 Feb 2021 23:38:04 +0530
Message-ID: <CAKna9VYCEw+AbHQsuLBQOvakxKd_1PnNoSK_OimoBZka5ycqCw_at_mail.gmail.com>
In our case we have an X5-2 half rack with total disk storage of 614TB and flash storage of 42TB.
On Wed, 17 Feb 2021, 11:17 pm Rajesh Aialavajjala, <r.aialavajjala_at_gmail.com> wrote:
> Lok,
> You should be able to find the X5-2 data sheets that will have the IOPS
> information you are seeking.
>
>
> https://www.oracle.com/technetwork/database/exadata/exadata-storage-expansion-x5-2-ds-2406252.pdf
>
> That is for the X5-2 storage expansion rack but it should have the
> information you want. What size drives are in your X5-2? As I mentioned
> earlier the storage size changed mid generation.
>
> Thanks,
>
> —Rajesh
>
>
>
> On Wed, Feb 17, 2021 at 12:31 Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank you, Gabriel. I was trying to compare the current X5-2 vs. the new
>> X8 figures side by side. I got most of these for the X8M-2 from your
>> suggested doc, but I am not able to get a few of the IOPS figures for the
>> X5-2, as below. Could you suggest a document that has this information?
>>
>> IOPS for half rack:
>>
>>                        Flash Read IOPS   Flash Write IOPS   Disk Read IOPS   Disk Write IOPS
>> X5-2  High Capacity    ?                 ?                  ?                ?
>> X8M-2 High Capacity    6 million         3.2 million        ?                ?
>>
>> Storage capacity for half rack:
>>
>>                        Raw Disk   Usable (High Red.)   Usable (Normal Red.)   Flash    Cores
>> X5-2  High Capacity    672TB      210TB                280TB                  42TB     ?
>> X8M-2 High Capacity    1176TB     349TB                477TB                  179TB    224
>>
>>
>> Regards
>> Lok
>>
>> On Wed, Feb 17, 2021 at 5:38 PM Gabriel Hanauer <
>> gabriel.hanauer_at_gmail.com> wrote:
>>
>>> Hello Lok,
>>>
>>> You could look at the X8M-2 Datasheet.
>>>
>>>
>>> https://www.oracle.com/a/ocom/docs/engineered-systems/exadata/exadata-x8m-2-ds.pdf
>>>
>>> There is a wealth of information in there.
>>>
>>> Regards,
>>>
>>>
>>>
>>> On Wed, Feb 17, 2021 at 7:17 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>>> Thank you so much Rajesh. It really helps.
>>>>
>>>> My thought was that ~168TB of raw storage per server means 168/2 = ~84TB
>>>> usable per cell server with NORMAL redundancy and 168/3 = ~56TB usable
>>>> per cell server with HIGH redundancy. But it seems it's not that simple,
>>>> and some additional space is lost on top of the mirroring itself.
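>>>>
>>>> A quick way to see where that extra space goes is to ask ASM itself.
>>>> A minimal sketch, assuming it is run from the ASM instance -
>>>> USABLE_FILE_MB already subtracts both the mirroring overhead and the
>>>> free space ASM reserves to re-mirror after a disk failure:
>>>>
>>>>   SELECT name,
>>>>          type,                                           -- NORMAL or HIGH
>>>>          ROUND(total_mb / POWER(1024, 2))                AS total_tb,
>>>>          ROUND(required_mirror_free_mb / POWER(1024, 2)) AS reserved_tb,
>>>>          ROUND(usable_file_mb / POWER(1024, 2))          AS usable_tb
>>>>     FROM v$asm_diskgroup;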
>>>>
>>>> In the current X5-2 High Capacity half rack, it looks like we are capped
>>>> at around ~1 to 1.5 million IOPS for flash and ~18K IOPS for hard disk.
>>>> What are the flash and hard disk IOPS limits for the X8 High Capacity
>>>> half rack?
>>>>
>>>> And I confirmed with the infra team: we are planning to use the
>>>> 19.0.0.0.0 ASM grid on the new X8 while keeping the Oracle database
>>>> version at 11.2.0.4 only, so it means we are eligible for flex disk
>>>> groups.
>>>>
>>>> Regards
>>>> Lok
>>>>
>>>> On Wed, Feb 17, 2021 at 1:52 AM Rajesh Aialavajjala <
>>>> r.aialavajjala_at_gmail.com> wrote:
>>>>
>>>>> Lok,
>>>>> Here are the numbers as they relate to "usable" storage for the
>>>>> X8-2 / X8M-2 Exadata.
>>>>>
>>>>> A 1/2 rack X8-2 will offer <this is 4 compute nodes + 7 storage
>>>>> cells> approximately
>>>>>
>>>>> Data Capacity (Usable) – Normal Redundancy = 477 TB
>>>>> Data Capacity (Usable) – High Redundancy = 349 TB
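>>>>>
>>>>> As a rough back-of-envelope (a sketch only - the factors below are
>>>>> reverse-engineered from those published figures, not an official
>>>>> formula), raw-to-usable from the 7 x 168 TB = 1176 TB raw works out to:
>>>>>
>>>>>   SELECT ROUND(1176 / 2 * 0.81) AS normal_usable_tb,  -- ~477 TB
>>>>>          ROUND(1176 / 3 * 0.89) AS high_usable_tb     -- ~349 TB
>>>>>     FROM dual;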
>>>>>
>>>>> What Grid Infrastructure version are you planning to use? You are
>>>>> correct that FLEX disk groups/redundancy was introduced with GI/ASM version
>>>>> 12.2.0.1 and higher.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --Rajesh
>>>>>
>>>>>
>>>>> On Mon, Feb 15, 2021 at 10:45 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> Thanks a lot Rajesh and Shane. It really gave valuable info about the
>>>>>> X8 configuration.
>>>>>>
>>>>>> If I sum up the data files (v$datafile) + tempfiles (v$tempfile) + log
>>>>>> files (v$log), it comes to ~150TB, of which ~45TB shows as free in
>>>>>> dba_free_space. The breakdown of that ~150TB is roughly ~144TB of data
>>>>>> files, ~1.2TB of tempfiles, and ~0.5TB of log files. So it seems this
>>>>>> ~150TB figure is the size of a single copy, not the sum of the three
>>>>>> copies maintained for HIGH redundancy. With HIGH redundancy we must be
>>>>>> occupying ~150*3 = ~450TB of space in ASM across all 7 storage cells.
>>>>>> Please correct me if I am wrong.
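>>>>>>
>>>>>> Something like this gives those figures (a sketch only; note that
>>>>>> v$log.bytes is per member, so it is multiplied by MEMBERS to count
>>>>>> every redo copy):
>>>>>>
>>>>>>   SELECT (SELECT SUM(bytes)           FROM v$datafile)     / POWER(1024, 4) AS datafile_tb,
>>>>>>          (SELECT SUM(bytes)           FROM v$tempfile)     / POWER(1024, 4) AS tempfile_tb,
>>>>>>          (SELECT SUM(bytes * members) FROM v$log)          / POWER(1024, 4) AS redo_tb,
>>>>>>          (SELECT SUM(bytes)           FROM dba_free_space) / POWER(1024, 4) AS free_tb
>>>>>>     FROM dual;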
>>>>>>
>>>>>> And as per the X8 configuration you mentioned, it looks like in the new
>>>>>> X8 half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB in the
>>>>>> current X5) and ~168*7 = ~1.2PB of hard disk (as opposed to ~614TB of
>>>>>> hard disk in the current X5). For large reads we are touching a max
>>>>>> flash IOPS of ~2 million during peak load (against the ~1.3 million
>>>>>> IOPS limit in the current X5), but the new X8 will have ~4.5 times more
>>>>>> flash overall, so hopefully both flash IOPS and storage will be a lot
>>>>>> better and well under the limit/capacity of the new X8.
>>>>>>
>>>>>> We will try to push for making the ASM redundancy HIGH. But I was
>>>>>> wondering: with HIGH redundancy, what is the maximum amount of data we
>>>>>> can hold on this X8 half rack? The new X8 will have ~1.2PB of raw
>>>>>> storage, almost twice the current ~614TB in the X5, so what will be the
>>>>>> max usable storage available to our database if we go for HIGH
>>>>>> redundancy vs NORMAL redundancy on the new X8 machine?
>>>>>>
>>>>>> I was also checking on flex disk groups, and a few docs say the RDBMS
>>>>>> must be at minimum 12.2; since we are currently on 11.2.0.4, perhaps
>>>>>> that is not an option at the moment.
>>>>>>
>>>>>> Regards
>>>>>> Lok
>>>>>>
>>>>>> On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <
>>>>>> r.aialavajjala_at_gmail.com> wrote:
>>>>>>
>>>>>>> Lok,
>>>>>>> To try and answer your question about the percentage/share of flash
>>>>>>> vs hard disk in the Exadata X8-2/X8M-2:
>>>>>>>
>>>>>>> HC storage cells are packaged with:
>>>>>>>
>>>>>>> 12x 14 TB 7,200 RPM disks           = 168 TB raw hard disk
>>>>>>> 4x 6.4 TB NVMe PCIe 3.0 flash cards = 25.6 TB raw flash
>>>>>>>
>>>>>>> So the ~168TB that Andy references is purely spinning drives/hard
>>>>>>> disks - you have another ~25TB of Flash on top of that.
>>>>>>>
>>>>>>> If your organization is acquiring the X8M-2 hardware - you will add
>>>>>>> on 1.5 TB of PMEM (Persistent Memory) to this.
>>>>>>>
>>>>>>> You may certainly add more storage cells to your environment -
>>>>>>> elastic configurations are common in Exadata, both as expansions and
>>>>>>> as initial deployments. You might want to consider expanding storage
>>>>>>> after you evaluate your new X8 configuration.
>>>>>>>
>>>>>>> Your X5-2 hardware has/had <the drive sizes changed midway through
>>>>>>> the X5-2 generation - they started out with 12x 4 TB drives, which
>>>>>>> was doubled to 8 TB drives>:
>>>>>>>
>>>>>>> 4x PCI flash cards, each with 1.6 TB (raw) Exadata Smart Flash Cache
>>>>>>> 12x 8 TB 7,200 RPM High Capacity disks
>>>>>>>
>>>>>>> So you are looking at quite an increase in raw storage capacity and
>>>>>>> Flash.
>>>>>>>
>>>>>>> +1 to Shane's mention of not using NORMAL redundancy in a production
>>>>>>> configuration. FLEX disk groups are permitted on Exadata, but to the
>>>>>>> best of my understanding they are not widely used - Oracle best
>>>>>>> practices recommend HIGH redundancy. In fact, I do not think you can
>>>>>>> configure FLEX redundancy within OEDA, as it falls outside best
>>>>>>> practices - so you would have to tear down and recreate the disk
>>>>>>> groups manually, or customize the initial install prior to moving
>>>>>>> your database(s).
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> --Rajesh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <
>>>>>>> dmarc-noreply_at_freelists.org> wrote:
>>>>>>>
>>>>>>>> I think it's a mistake to go with NORMAL redundancy in a production
>>>>>>>> system. I could probably understand the argument for a test system, but
>>>>>>>> not production. How do you think a regular storage array is configured?
>>>>>>>> Likely not with a normal redundancy scheme. Aside from all of the other
>>>>>>>> things mentioned already, you are also bound to offline patching - or,
>>>>>>>> if you try to do rolling patching, you run the risk of being able to
>>>>>>>> tolerate only one disk failure or one storage server failure. HIGH
>>>>>>>> redundancy protects not only against technical failures but also
>>>>>>>> against the human factor during patching.
>>>>>>>>
>>>>>>>> If you must consider normal redundancy, I would go with FLEX disk
>>>>>>>> groups rather than configuring the entire rack as normal redundancy.
>>>>>>>> That way, if you must, you can specify the redundancy at the file group
>>>>>>>> level (effectively per database) rather than at the disk group level.
>>>>>>>> Should you change your mind later, it's a simple ALTER command to
>>>>>>>> change the redundancy rather than tearing down the entire rack and
>>>>>>>> rebuilding it.
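>>>>>>>>
>>>>>>>> A minimal sketch of that alter, with hypothetical disk group and
>>>>>>>> file group names (FLEX disk groups track redundancy per file group,
>>>>>>>> typically one per database, and need GI/ASM 12.2 or higher):
>>>>>>>>
>>>>>>>>   -- run in the ASM instance; DATA and MYDB are hypothetical names
>>>>>>>>   ALTER DISKGROUP data MODIFY FILEGROUP mydb SET 'redundancy' = 'high';
>>>>>>>>   -- a rebalance then re-mirrors the affected files
>>>>>>>>   ALTER DISKGROUP data REBALANCE;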
>>>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>
>>>>>>>> Shane Borden
>>>>>>>> sborden76_at_yahoo.com
>>>>>>>>
>>>>>>>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks much Andy.
>>>>>>>>
>>>>>>>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>>>>>>>> basically a full rack machine logically split into two half racks, and
>>>>>>>> we have only this database hosted on this half rack.) (And we are
>>>>>>>> currently at ~150TB and keep growing, so we're planning for NORMAL
>>>>>>>> redundancy.)
>>>>>>>>
>>>>>>>> In the current X5 I am seeing ~80TB of hard disk + ~6TB of flash per
>>>>>>>> storage cell. When you said *"The Exadata X8 and X8M storage cells
>>>>>>>> have 14TB disks. With 12 per cell, that's 168TB *per cell*,"* does
>>>>>>>> that mean the sum of flash + hard disk is ~168TB per cell? What is the
>>>>>>>> percentage/share of flash disk vs hard disk in that?
>>>>>>>>
>>>>>>>> Apart from the current storage saturation, regarding the IOPS issue
>>>>>>>> in our current X5 system: in OEM I see flash IOPS reaching ~2,000K for
>>>>>>>> large reads, while the max limit shows somewhere near ~1.3 million.
>>>>>>>> Overall IO utilization for the flash disks shows ~75%. The hard disk
>>>>>>>> IO limit shows as ~20K, and most of the time both small reads and
>>>>>>>> large reads stay below this limit. Overall IO utilization stays below
>>>>>>>> ~30% for the hard disks.
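>>>>>>>>
>>>>>>>> As a cross-check of the OEM graphs from the database side (just a
>>>>>>>> sketch - these are cumulative counters since instance startup, so
>>>>>>>> you would sample twice and take the difference over an interval to
>>>>>>>> derive IOPS):
>>>>>>>>
>>>>>>>>   SELECT name, SUM(value) AS total_requests
>>>>>>>>     FROM gv$sysstat
>>>>>>>>    WHERE name IN ('physical read total IO requests',
>>>>>>>>                   'physical write total IO requests')
>>>>>>>>    GROUP BY name;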
>>>>>>>>
>>>>>>>> I just got to know from the infra team that the X8 we are planning to
>>>>>>>> move to is not Extreme Flash but High Capacity disk only (similar to
>>>>>>>> what we have on the current X5). Still, considering the larger flash
>>>>>>>> and hard disk capacity in each of the 7 storage cells, we expect the
>>>>>>>> new X8 to relieve the current capacity crunch both wrt space and IOPS.
>>>>>>>>
>>>>>>>> And as you just mentioned, adding more storage cells also helps bump
>>>>>>>> up capacity. So should we consider adding a few more storage cells on
>>>>>>>> top of the half rack, making it 8 or 9 storage cells in total - and is
>>>>>>>> that standard practice in the Exadata world?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Lok
>>>>>>>>
>>>>>>>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per
>>>>>>>>> cell, that's 168TB *per cell*. You haven't mentioned which rack size your
>>>>>>>>> X5 machine is, but from the numbers you're showing it looks like maybe a
>>>>>>>>> half rack. A half rack of X8M will come with 1PB of total disk, giving you
>>>>>>>>> over 300TB of usable space to divide between your RECO and DATA disk groups
>>>>>>>>> if you are using HIGH redundancy. That seems plenty for your 150TB
>>>>>>>>> database. But if you need more, add another storage cell.
>>>>>>>>>
>>>>>>>>> As for performance degradation from using HIGH redundancy, you
>>>>>>>>> need to consider that the additional work of that extra write is being
>>>>>>>>> taken on by the storage cells. By definition the redundant block copies
>>>>>>>>> must go to separate cells. NORMAL redundancy writes to two cells and HIGH
>>>>>>>>> goes to three. In aggregate, each write will be as fast as your slowest
>>>>>>>>> cell. So any difference in write performance is more a function of the
>>>>>>>>> total number of cells you have to share the workload. That difference would
>>>>>>>>> be diminished as you increase the number of cells in the cluster.
>>>>>>>>>
>>>>>>>>> And of course that difference would be mitigated by the write back
>>>>>>>>> cache too because writes to the flash cache are faster than writes to disk.
>>>>>>>>>
>>>>>>>>> Honestly, I can't imagine that Oracle would sell you an Exadata
>>>>>>>>> machine where any of this would be a problem for you. It would be so
>>>>>>>>> undersized from the beginning that your problems with it would be much
>>>>>>>>> greater than any marginal difference in write performance from using high
>>>>>>>>> redundancy.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Much.
>>>>>>>>>>
>>>>>>>>>> I had found some doc on this but lost it. The URL below also points
>>>>>>>>>> to high redundancy as a requirement, but maybe it's not compulsory,
>>>>>>>>>> as you stated.
>>>>>>>>>>
>>>>>>>>>> Given the size of our database (~150TB), we were thinking of saving
>>>>>>>>>> some space by using double mirroring rather than triple mirroring.
>>>>>>>>>> But I was not aware that the disk size itself is a lot bigger in the
>>>>>>>>>> X8, and as you stated, with bigger disks the re-mirroring takes a
>>>>>>>>>> lot of time (in case of crash/failure), which is why HIGH redundancy
>>>>>>>>>> is recommended. I think we have to take another look at this. Note:
>>>>>>>>>> in the current X5 machine I see ~6TB of flash per storage server and
>>>>>>>>>> ~80TB of hard disk per storage server. Not sure what those figures
>>>>>>>>>> are for the X8, though.
>>>>>>>>>>
>>>>>>>>>> And another doubt I had: is it also true that write IOPS degrade by
>>>>>>>>>> some percentage with triple mirroring compared to double mirroring,
>>>>>>>>>> because one more additional copy of each data block has to be
>>>>>>>>>> written to flash/disk?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <
>>>>>>>>>> salem.ghassan_at_gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Please, can you point to where you saw that write-back is only
>>>>>>>>>>> possible with high redundancy?
>>>>>>>>>>> High redundancy is very much recommended with X8 due to the size
>>>>>>>>>>> of the disks and the time it takes to re-mirror in case of disk
>>>>>>>>>>> loss: if you're on normal redundancy and you lose a disk, then
>>>>>>>>>>> while re-mirroring is being done you don't have a second copy of
>>>>>>>>>>> that data, and so if you lose yet another disk, you're in big
>>>>>>>>>>> trouble. With lower-capacity disks the re-mirroring takes much
>>>>>>>>>>> less time, and so the risk is lower.
>>>>>>>>>>>
>>>>>>>>>>> regards
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Basically, I am seeing many docs stating that triple mirroring is
>>>>>>>>>>>> recommended with "write back flash cache", and some others stating
>>>>>>>>>>>> that write back flash cache is not possible without HIGH
>>>>>>>>>>>> redundancy/triple mirroring. There is a real difference between
>>>>>>>>>>>> those two statements: we would like to go for NORMAL redundancy to
>>>>>>>>>>>> save some space and gain some IO benefit (in terms of not writing
>>>>>>>>>>>> one more additional data block copy), but we also want to utilize
>>>>>>>>>>>> the "write back flash cache" option to get the benefit on write
>>>>>>>>>>>> IOPS. If a hard restriction to HIGH redundancy is in place, we
>>>>>>>>>>>> won't be able to do that. Please correct me if my understanding
>>>>>>>>>>>> is wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The doc below states that a high redundancy ASM disk group (i.e.
>>>>>>>>>>>>> triple mirroring) is recommended when using write back flash
>>>>>>>>>>>>> cache, because data is first written to the flash cache and only
>>>>>>>>>>>>> flushed to disk at a later stage, so in case of failure it has to
>>>>>>>>>>>>> be recovered from a mirror copy. But I am wondering: is this not
>>>>>>>>>>>>> possible with double mirroring - will it not survive data loss in
>>>>>>>>>>>>> case of failure? I want to understand the suggested setup that
>>>>>>>>>>>>> gives optimal space usage without compromising IOPS or risking
>>>>>>>>>>>>> data loss.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Lok
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, and there
>>>>>>>>>>>>>> are multiple reasons behind it. A few of them: 1) we are about
>>>>>>>>>>>>>> to saturate the existing storage capacity (DB size reaching
>>>>>>>>>>>>>> ~150TB) on the current X5; 2) the current IOPS on the X5 is also
>>>>>>>>>>>>>> reaching its max while the system works during its peak load.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>>>>>>>> existing X5 machines for the DATA and RECO disk groups, with
>>>>>>>>>>>>>> DBFS kept at NORMAL redundancy (double mirroring). Now a few
>>>>>>>>>>>>>> folks have raised questions about the impact on IOPS and storage
>>>>>>>>>>>>>> space consumption if we use double mirroring (NORMAL redundancy)
>>>>>>>>>>>>>> vs triple mirroring (HIGH redundancy) on the new X8 machine. I
>>>>>>>>>>>>>> can see the benefit of double mirroring (NORMAL redundancy) in
>>>>>>>>>>>>>> storage space saved (around 1/3rd in terms of DATA and RECO
>>>>>>>>>>>>>> copies), but then what is the risk wrt data loss - is it okay in
>>>>>>>>>>>>>> a production system? (Note: we do use ZDLRA for taking the DB
>>>>>>>>>>>>>> backup, and for disaster recovery we have an Active Data Guard
>>>>>>>>>>>>>> physical standby in place, which runs in read-only mode.)
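>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For context, the current redundancy of each disk group can be
>>>>>>>>>>>>>> confirmed from the ASM instance (a quick sketch):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   SELECT name, type   -- TYPE shows NORMAL or HIGH per disk group
>>>>>>>>>>>>>>     FROM v$asm_diskgroup;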
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With regards to IOPS, we are going with the default write back
>>>>>>>>>>>>>> flash cache enabled here. Is it correct that with double
>>>>>>>>>>>>>> mirroring every block is written to two places vs three places
>>>>>>>>>>>>>> with triple mirroring, so there will be some degradation in
>>>>>>>>>>>>>> write IOPS with triple mirroring/HIGH redundancy compared to
>>>>>>>>>>>>>> double mirroring? If so, by roughly what percentage will IOPS
>>>>>>>>>>>>>> degrade? And would it be okay to go for double mirroring, given
>>>>>>>>>>>>>> that it would benefit us wrt IOPS and also save a good amount of
>>>>>>>>>>>>>> storage space?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lok
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>
>>>
>>> --
>>> Gabriel Hanauer
>>>
>> --
> Sent from Gmail Mobile
>
--
http://www.freelists.org/webpage/oracle-l

Received on Wed Feb 17 2021 - 19:08:04 CET