Re: Question on Exadata X8 - IO
Date: Wed, 17 Feb 2021 15:46:52 +0530
Message-ID: <CAKna9Vb-Z9Fenuxv1ZzhcEqtyw0ncOyinPcLZPGN-Oy8f81X0g_at_mail.gmail.com>
Thank you so much Rajesh. It really helps.
My thought was that ~168TB raw storage per server means 168/2 = ~84TB
usable storage per cell server with NORMAL redundancy and 168/3 = ~56TB
usable storage per cell server with HIGH redundancy. But it seems it's not
that simple, and some additional space is set aside beyond the plain
mirroring overhead.
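
For what it's worth, ASM itself shows where that extra space goes: part of
the raw capacity is held back as "required mirror free" space so that a
failed disk or cell can be re-mirrored. A minimal sketch for checking this
against the ASM instance (disk group names and figures will differ per
environment):

SELECT name,
       type,  -- NORMAL / HIGH redundancy
       ROUND(total_mb / POWER(1024, 2), 1)                AS raw_tb,
       ROUND(free_mb / POWER(1024, 2), 1)                 AS free_tb,
       ROUND(required_mirror_free_mb / POWER(1024, 2), 1) AS reserved_tb,
       ROUND(usable_file_mb / POWER(1024, 2), 1)          AS usable_tb
FROM   v$asm_diskgroup;
-- USABLE_FILE_MB is (FREE_MB - REQUIRED_MIRROR_FREE_MB) adjusted for the
-- redundancy factor, which is why usable comes out below raw/2 or raw/3.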
In the current X5-2 High Capacity half rack, it looks like we are capped
around ~1 to 1.5 million IOPS for flash and ~18K IOPS for hard disk. What
are the flash and hard disk IOPS limits for an X8 High Capacity half rack?
And I confirmed with the infra team: we are planning to use the 19.0.0.0.0
ASM grid on the new X8 while keeping the Oracle database version at
11.2.0.4 only, so it means we are eligible for flex disk groups.
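
Since the docs mentioned earlier in the thread ask for a 12.2 minimum on
the RDBMS side, we will verify the disk group type and compatibility
attributes once the new machine is set up. A minimal sketch, run against
the ASM instance:

SELECT name, type, compatibility, database_compatibility
FROM   v$asm_diskgroup;
-- TYPE shows FLEX for a flex disk group; DATABASE_COMPATIBILITY
-- (compatible.rdbms) reportedly needs to be 12.2+ before a database
-- can place files in one.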
Regards
On Wed, Feb 17, 2021 at 1:52 AM Rajesh Aialavajjala <
r.aialavajjala_at_gmail.com> wrote:
> Lok,
> Here are the numbers as they relate to "usable" storage when it comes to
> the X8-2 / X8M-2 Exadata.
>
> A 1/2 rack X8-2 <this is 4 compute nodes + 7 storage cells> will offer
> approximately:
>
> Data Capacity (Usable) – Normal Redundancy = 477 TB
> Data Capacity (Usable) – High Redundancy = 349 TB
>
> What Grid Infrastructure version are you planning to use? You are correct
> that FLEX disk groups/redundancy was introduced with GI/ASM version
> 12.2.0.1 and higher.
>
> Thanks,
>
> --Rajesh
>
>
> On Mon, Feb 15, 2021 at 10:45 AM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thanks a lot Rajesh and Shane. It really gave valuable info about the X8
>> configuration.
>>
>> If I sum up the data files (v$datafile) + temp files (v$tempfile) + log
>> files (v$log), it comes to ~150TB, out of which ~45TB shows as free in
>> dba_free_space. The breakdown of the ~150TB is ~144TB of data files,
>> ~1.2TB of temp files, and ~0.5TB of log files. So it seems this ~150TB
>> figure is the size of a single copy, not the sum of the three copies
>> maintained for HIGH redundancy. With HIGH redundancy we must be
>> occupying ~150*3 = ~450TB of space from ASM, i.e. from the total of 7
>> storage cells. Please correct me if I am wrong.
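>>
>> In case it helps anyone double-check, this is roughly how I arrived at
>> the single-copy figure (a sketch; adjust for your environment):
>>
>> SELECT (SELECT SUM(bytes) FROM v$datafile)  / POWER(1024, 4) AS data_tb,
>>        (SELECT SUM(bytes) FROM v$tempfile)  / POWER(1024, 4) AS temp_tb,
>>        (SELECT SUM(bytes) FROM v$log)       / POWER(1024, 4) AS redo_tb,
>>        (SELECT SUM(bytes) FROM dba_free_space) / POWER(1024, 4) AS free_tb
>> FROM   dual;
>> -- bytes/1024^4 converts to TB; v$log.bytes is per group (one copy), so
>> -- multiply by v$log.members if redo is multiplexed.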
>>
>> And as per the X8 configuration you mentioned, it looks like in the new
>> X8 half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB in the
>> current X5) and ~168*7 = ~1PB of hard disk (as opposed to ~614TB of hard
>> disk in the current X5). For large reads we are touching a max flash
>> IOPS of ~2 million during peak load (opposed to the ~1.3 million IOPS
>> limit in the current X5), but the new X8 will have about 4.5 times more
>> flash overall, so hopefully both flash IOPS and storage will be a lot
>> better and well under the limit/capacity on the new X8.
>>
>> We will try to push for making the ASM redundancy HIGH. But I was
>> wondering what the maximum amount of data we could hold on this X8 half
>> rack would be with HIGH redundancy. The new X8 will have ~1PB of raw
>> storage, almost twice the current ~614TB on the X5, so what will be the
>> max usable storage available for our database if we go for HIGH
>> redundancy vs NORMAL redundancy on the new X8 machine?
>>
>> I was also checking on flex disk groups, and a few docs say the RDBMS
>> must be at a minimum of 12.2; we are currently on 11.2.0.4, so perhaps
>> that is not an option at this moment, at least.
>>
>> Regards
>> Lok
>>
>> On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <
>> r.aialavajjala_at_gmail.com> wrote:
>>
>>> Lok,
>>> To try and answer your question about the percentage/share of flash vs
>>> hard disk in the Exadata X8-2/X8M-2
>>>
>>> HC Storage Cells are packaged with
>>>
>>> 12x 14 TB 7,200 RPM disks = 168 TB Raw / Hard Disk
>>> 4x 6.4 TB NVMe PCIe 3.0 flash cards = 25.6 TB Raw / Flash
>>>
>>> So the ~166TB that Andy references is purely spinning drives/hard disks
>>> - you have another ~25TB of Flash.
>>>
>>> If your organization is acquiring the X8M-2 hardware - you will add on
>>> 1.5 TB of PMEM (Persistent Memory) to this.
>>>
>>> You may certainly add more storage cells to your environment - elastic
>>> configurations are common in Exadata, whether as later expansions or as
>>> the starting configuration. You might want to consider expanding
>>> storage after you evaluate your new X8 configuration.
>>>
>>> Your X5-2 hardware has/had <the drive sizes did change mid-way through
>>> the X5-2 generation - they started out with 12x 4 TB drives and that
>>> was later doubled to 8 TB drives>
>>>
>>> 4 PCI Flash cards each with 1.6 TB (raw) Exadata Smart Flash Cache
>>> 12x 8 TB 7,200 RPM High Capacity disks
>>>
>>> So you are looking at quite an increase in raw storage capacity and
>>> Flash.
>>>
>>> +1 to Shane's mention about not using NORMAL redundancy in a production
>>> configuration. FLEX disk groups are permitted on Exadata but, to the
>>> best of my understanding, they are not widely used. The Oracle best
>>> practices recommend HIGH redundancy - in fact I do not think you can
>>> configure FLEX redundancy within OEDA, as it falls outside best
>>> practices - so you would have to tear down/recreate the disk groups
>>> manually, or customize the initial install prior to moving your
>>> database(s).
>>>
>>> Thanks,
>>>
>>> --Rajesh
>>>
>>>
>>>
>>>
>>> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <
>>> dmarc-noreply_at_freelists.org> wrote:
>>>
>>>> I think it's a mistake to go with NORMAL redundancy in a production
>>>> system. I could probably understand the argument for a test system,
>>>> but not production. How do you think a regular storage array is
>>>> configured? Likely not with a normal redundancy scheme. Aside from
>>>> everything already mentioned, you are now also bound to offline
>>>> patching, or if you try rolling patching you run the risk of being
>>>> able to tolerate only one disk failure or one storage server failure.
>>>> HIGH redundancy protects not only against technical failures but also
>>>> against the human factor during patching.
>>>>
>>>> If you must consider normal redundancy, I would go with FLEX disk
>>>> groups vs configuring the entire rack as normal redundancy. That way,
>>>> if you must, you can specify the redundancy per database (at the file
>>>> group level) rather than for the whole disk group. Should you change
>>>> your mind later, it's a simple ALTER command to change the redundancy
>>>> rather than tearing down the entire rack and rebuilding it.
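>>>>
>>>> For illustration, the change is along these lines (a sketch only -
>>>> the disk group and file group names here are hypothetical; in a FLEX
>>>> disk group each database typically gets its own file group):
>>>>
>>>> ALTER DISKGROUP data MODIFY FILEGROUP mydb_fg
>>>>   SET 'redundancy' = 'high';  -- re-mirroring runs as a rebalance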
>>>>
>>>>
>>>> ---
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Shane Borden
>>>> sborden76_at_yahoo.com
>>>>
>>>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>> Thanks much Andy.
>>>>
>>>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>>>> basically a full rack machine logically split into two half racks, and
>>>> we have only this database hosted on this half rack.) (And we are
>>>> currently ~150TB and keep growing, so we're planning for NORMAL
>>>> redundancy.)
>>>>
>>>> In the current X5 I am seeing ~80TB hard disk + 6TB flash disk per
>>>> storage cell. When you said *"The Exadata X8 and X8M storage cells
>>>> have 14TB disks. With 12 per cell, that's 168TB *per cell*,"* does
>>>> that mean the sum of flash + hard disk is ~168TB per cell? What is the
>>>> percentage/share of flash disk and hard disk in that?
>>>>
>>>> Apart from the current storage saturation, with regards to the IOPS
>>>> issue on our current X5 system, I am seeing in OEM that flash IOPS is
>>>> reaching ~2000K for large reads while the max limit shows somewhere
>>>> near ~1.3 million. Overall IO utilization for the flash disks shows
>>>> ~75%. The hard disk IO limit shows as ~20K, and most of the time both
>>>> small reads and large reads stay below this limit. Overall IO
>>>> utilization stays below ~30% for the hard disks.
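>>>>
>>>> In case anyone wants to pull these numbers outside OEM, the cells
>>>> expose similar counters through CellCLI. A rough sketch run on one
>>>> storage cell (metric names quoted from memory, so please verify
>>>> against your cell software version):
>>>>
>>>> cellcli -e "list metriccurrent where objectType = 'CELLDISK' and name like 'CD_IO_RQ_.*_LG_SEC'"
>>>> # large read/write requests per second, per cell disk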
>>>>
>>>> Just got to know from the infra team that the X8 we are planning to
>>>> move to is not Extreme Flash but High Capacity disk only (similar to
>>>> what we have on the current X5). But considering the larger flash and
>>>> hard disk storage in each of the 7 storage cells, we expect the new X8
>>>> to resolve the current capacity crunch both wrt space and IOPS.
>>>>
>>>> And as you just mentioned in your explanation, adding more storage
>>>> cells will also help bump up the capacity. So should we consider
>>>> adding a few more storage cells on top of the half rack to make it 8
>>>> or 9 storage cells in total, and is this standard practice in the
>>>> Exadata world?
>>>>
>>>> Regards
>>>> Lok
>>>>
>>>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>>> wrote:
>>>>
>>>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per
>>>>> cell, that's 168TB *per cell*. You haven't mentioned which rack size your
>>>>> X5 machine is, but from the numbers you're showing it looks like maybe a
>>>>> half rack. A half rack of X8M will come with 1PB of total disk, giving you
>>>>> over 300TB of usable space to divide between your RECO and DATA disk groups
>>>>> if you are using HIGH redundancy. That seems plenty for your 150TB
>>>>> database. But if you need more, add another storage cell.
>>>>>
>>>>> As for performance degradation from using HIGH redundancy, you need to
>>>>> consider that the additional work of that extra write is being taken on by
>>>>> the storage cells. By definition the redundant block copies must go to
>>>>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>>>>> three. In aggregate, each write will be as fast as your slowest cell. So
>>>>> any difference in write performance is more a function of the total number
>>>>> of cells you have to share the workload. That difference would be
>>>>> diminished as you increase the number of cells in the cluster.
>>>>>
>>>>> And of course that difference would be mitigated by the write back
>>>>> cache too because writes to the flash cache are faster than writes to disk.
>>>>>
>>>>> Honestly, I can't imagine that Oracle would sell you an Exadata
>>>>> machine where any of this would be a problem for you. It would be so
>>>>> undersized from the beginning that your problems with it would be much
>>>>> greater than any marginal difference in write performance from using high
>>>>> redundancy.
>>>>>
>>>>> Andy
>>>>>
>>>>>
>>>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> Thanks much.
>>>>>>
>>>>>> I had found some docs but mislaid them; the URL below also points to
>>>>>> high redundancy as a requirement, but maybe it's not compulsory, as
>>>>>> you stated.
>>>>>>
>>>>>> Given the size of our database (~150TB), we were thinking of getting
>>>>>> some space reduction by using double mirroring rather than triple
>>>>>> mirroring. But I was not aware that the disks themselves are a lot
>>>>>> bigger in the X8; as you stated, with bigger disks the re-mirroring
>>>>>> will take a lot of time (in case of a crash/failure), and thus HIGH
>>>>>> redundancy is recommended. I think we have to take another look at
>>>>>> this. Note - what I see on the current X5 machine is ~6TB of
>>>>>> flash/storage server and ~80TB of hard disk/storage server. Not sure
>>>>>> what that is in the case of the X8, though.
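>>>>>>
>>>>>> Once we have the new machine, we can presumably read the per-cell
>>>>>> split straight off a storage server - a minimal sketch, run on one
>>>>>> cell:
>>>>>>
>>>>>> cellcli -e "list physicaldisk attributes name, diskType, physicalSize"
>>>>>> # diskType distinguishes the HardDisk and FlashDisk entries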
>>>>>>
>>>>>> And another doubt I had: is it also true that IOPS will be degraded
>>>>>> by some percentage with triple mirroring as compared to double
>>>>>> mirroring, because one more additional copy of each data block has
>>>>>> to be written to flash/disk?
>>>>>>
>>>>>>
>>>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>>>
>>>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <
>>>>>> salem.ghassan_at_gmail.com> wrote:
>>>>>>
>>>>>>> Please, can you point to where you saw that write-back is only
>>>>>>> possible with high redundancy?
>>>>>>> High redundancy is very much recommended with the X8 due to the size
>>>>>>> of the disks and the time it takes to re-mirror in case of disk
>>>>>>> loss: if you're on normal redundancy and you lose a disk, then while
>>>>>>> re-mirroring is being done you don't have any second copy of that
>>>>>>> data, and so if you lose yet another disk, you're in big trouble.
>>>>>>> With lower capacity disks, the re-mirroring takes much less time,
>>>>>>> and so the risk is lower.
>>>>>>>
>>>>>>> regards
>>>>>>>
>>>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Basically, I am seeing many docs stating triple mirroring is
>>>>>>>> recommended with "write back flash cache", and some others stating
>>>>>>>> write back flash cache is not possible without HIGH
>>>>>>>> redundancy/triple mirroring. There is a real difference between
>>>>>>>> these two statements, because we may decide to go for NORMAL
>>>>>>>> redundancy to save some space and gain some IO benefit (in terms of
>>>>>>>> not writing one more additional data block copy), while still
>>>>>>>> wanting to utilize the "write back flash cache" option to improve
>>>>>>>> write IOPS. If a HIGH redundancy restriction is in place, we won't
>>>>>>>> be able to do that. Please correct me if my understanding is wrong.
>>>>>>>>
>>>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The doc below states it is recommended to go for a high redundancy
>>>>>>>>> ASM disk group (i.e. triple mirroring) when using write back flash
>>>>>>>>> cache, because data is first written to (and stays in) the flash
>>>>>>>>> cache and is flushed to disk at a later stage, and in case of
>>>>>>>>> failure it has to be recovered from a mirror copy. But I am
>>>>>>>>> wondering: is this not possible with double mirroring - will it
>>>>>>>>> not survive data loss in case of failure? I want to understand the
>>>>>>>>> suggested setup that gives optimal space usage without
>>>>>>>>> compromising IOPS or risking data loss.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Lok
>>>>>>>>>
>>>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, and there are
>>>>>>>>>> multiple reasons behind it. A few of them: 1) we are about to
>>>>>>>>>> saturate the existing storage capacity (DB size reaching ~150TB)
>>>>>>>>>> on the current X5, and 2) the current IOPS on the X5 is also
>>>>>>>>>> reaching its max while the system works at peak load.
>>>>>>>>>>
>>>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>>>> existing X5 machines for the DATA and RECO disk groups, and DBFS
>>>>>>>>>> is kept at NORMAL redundancy (double mirroring). Now a few folks
>>>>>>>>>> have raised questions about the impact on IOPS and storage space
>>>>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs
>>>>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can
>>>>>>>>>> see the benefit of double mirroring (NORMAL redundancy) in saved
>>>>>>>>>> storage space (around 1/3rd in terms of DATA and RECO copies),
>>>>>>>>>> but then what is the risk wrt data loss - is it acceptable in a
>>>>>>>>>> production system? (Note - we do use ZDLRA for taking DB backups,
>>>>>>>>>> and for disaster recovery we have an Active Data Guard physical
>>>>>>>>>> standby in place which runs in read-only mode.)
>>>>>>>>>>
>>>>>>>>>> With regards to IOPS, we are going with the default write back
>>>>>>>>>> flash cache enabled here. Is it correct that with double
>>>>>>>>>> mirroring we have to write to two places vs three places with
>>>>>>>>>> triple mirroring, so there will also be some IOPS degradation
>>>>>>>>>> with triple mirroring/HIGH redundancy compared to double
>>>>>>>>>> mirroring? If that's true, then by what percentage will IOPS
>>>>>>>>>> degrade? And is it okay to go for double mirroring, as that
>>>>>>>>>> would benefit us wrt IOPS and also save a good amount of storage
>>>>>>>>>> space?
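>>>>>>>>>>
>>>>>>>>>> As far as I can tell, the cache mode itself is just a cell
>>>>>>>>>> attribute, separate from the disk group redundancy setting. A
>>>>>>>>>> minimal sketch for checking it on one storage cell (switching
>>>>>>>>>> modes involves more steps, so this only verifies the current
>>>>>>>>>> setting):
>>>>>>>>>>
>>>>>>>>>> cellcli -e "list cell attributes flashCacheMode"
>>>>>>>>>> # expected output: WriteThrough or WriteBack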
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Lok
>>>>>>>>>>
>>>>>>>>>
>>>>
-- http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 17 2021 - 11:16:52 CET