Re: Question on Exadata X8 - IO

From: Lok P <loknath.73_at_gmail.com>
Date: Mon, 15 Feb 2021 21:15:44 +0530
Message-ID: <CAKna9Va-rBOL7hrPLB2uSmPOVdm7A5d9SZy0kDbGaDm1zqWNqQ_at_mail.gmail.com>



Thanks a lot, Rajesh and Shane. That is really valuable info about the X8 configuration.

   If I sum up the data files (v$datafile) + temp files (v$tempfile) + log files (v$log), it comes to ~150TB, of which ~45TB shows as free in dba_free_space. The breakdown of that ~150TB is roughly ~144TB of data files, ~1.2TB of temp files and ~0.5TB of log files. So this ~150TB figure appears to be the size of a single copy, not the sum of the three copies maintained for HIGH redundancy, and with HIGH redundancy we must be occupying ~150*3 = ~450TB of space in ASM across the 7 storage cells. Please correct me if I am wrong.
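For reference, this is roughly the kind of query I used to arrive at those single-copy figures (just a sketch; the rounding and unit conversions may need adjusting):

-- Single-copy footprint as allocated by the database (before ASM mirroring)
SELECT 'datafiles' AS component,
       ROUND(SUM(bytes)/1024/1024/1024/1024, 1) AS tb
FROM   v$datafile
UNION ALL
SELECT 'tempfiles', ROUND(SUM(bytes)/1024/1024/1024/1024, 1) FROM v$tempfile
UNION ALL
SELECT 'redo logs', ROUND(SUM(bytes*members)/1024/1024/1024/1024, 1) FROM v$log
UNION ALL
SELECT 'free in datafiles', ROUND(SUM(bytes)/1024/1024/1024/1024, 1)
FROM   dba_free_space;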

And as per the X8 configuration you mentioned, it looks like on the new X8 half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB in the current X5) and ~168*7 = ~1PB of hard disk (as opposed to ~614TB of hard disk in the current X5). For large reads we are touching ~2 million flash IOPS during peak load (against the ~1.3 million IOPS limit on the current X5), but the new X8 will have roughly 4.5 times more flash overall, so hopefully both flash IOPS and storage will be a lot better and well under the limits/capacity of the new X8.
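For what it's worth, besides OEM I have also been cross-checking the database-side IO demand with a query along these lines; it only shows the request rate the database is generating, not the cell-level flash/disk limits, so treat it as a rough sketch:

-- Database-wide IO request rates over the last 60-second interval
SELECT metric_name, ROUND(SUM(value)) AS requests_per_sec
FROM   gv$sysmetric
WHERE  metric_name IN ('Physical Read Total IO Requests Per Sec',
                       'Physical Write Total IO Requests Per Sec')
AND    group_id = 2   -- the 60-second ("long duration") metric group
GROUP  BY metric_name;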

We will try to push for making the ASM redundancy HIGH. But I was wondering: with HIGH redundancy, what is the maximum amount of data we can hold on this X8 half rack? The new X8 will have ~1PB of raw storage, almost twice the current ~614TB on the X5, so what will be the maximum usable storage available for our database if we go for HIGH redundancy vs NORMAL redundancy on the new X8 machine?
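My understanding (please correct me) is that once the disk groups are created, v$asm_diskgroup already shows what the chosen redundancy leaves usable, something like:

-- TYPE is the disk group redundancy (NORMAL/HIGH/FLEX); USABLE_FILE_MB
-- already accounts for mirroring plus the space ASM reserves so it can
-- re-mirror after a disk or cell failure.
SELECT name, type,
       ROUND(total_mb/1024/1024, 1)       AS raw_tb,
       ROUND(free_mb/1024/1024, 1)        AS free_raw_tb,
       ROUND(usable_file_mb/1024/1024, 1) AS usable_tb
FROM   v$asm_diskgroup;

So as a very rough rule of thumb, out of ~1PB raw, HIGH redundancy should leave somewhere around a third usable and NORMAL around a half, minus the re-mirror reserve, but the exact numbers will come from the view above.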

I was also checking on flex disk groups, and a few docs say the RDBMS needs to be at a minimum of 12.2; we are currently on 11.2.0.4, so perhaps that is not an option at the moment.
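If we do get to 12.2+ later and revisit this, my understanding from the docs (untested on our side) is that the disk group compatibility attributes gate the feature, and that with a flex disk group the redundancy becomes a per-file-group property that can be changed online. Something along these lines, where the disk group and file group names are only examples:

-- Flex disk groups need compatible.asm and compatible.rdbms >= 12.2.0.1
SELECT name, compatibility, database_compatibility
FROM   v$asm_diskgroup;

-- In a FLEX disk group, redundancy is a file group property (one file
-- group per database), so it can be raised later without rebuilding:
ALTER DISKGROUP data MODIFY FILEGROUP mydb
  SET 'datafile.redundancy' = 'high';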

Regards
Lok

On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <r.aialavajjala_at_gmail.com> wrote:

> Lok,
> To try and answer your question about the percentage/share of flash vs
> hard disk in the Exadata X8-2/X8M-2
>
> HC Storage Cells are packaged with
>
> 12x 14 TB 7,200 RPM disks = 168 TB Raw / Hard Disk
> 4x 6.4 TB NVMe PCIe 3.0 Flash cards = 25.6 TB Raw / Flash
>
> So the ~168TB that Andy references is purely spinning drives/hard disks -
> you have another ~25.6TB of Flash on top of that.
>
> If your organization is acquiring the X8M-2 hardware - you will add on 1.5
> TB of PMEM (Persistent Memory) to this.
>
> You may certainly add more storage cells to your environment - elastic
> configurations are common in Exadata, both as expansions and as initial
> deployments. You might want to consider expanding storage after you
> evaluate your new X8 configuration.
>
> Your X5-2 hardware has/had (the drive sizes changed mid-way through the
> X5-2 generation - they started out with 12x 4 TB drives, which was later
> doubled to 8 TB drives):
>
> 4 PCI Flash cards each with 1.6 TB (raw) Exadata Smart Flash Cache
> 12x 8 TB 7,200 RPM High Capacity disks
>
> So you are looking at quite an increase in raw storage capacity and Flash.
>
> +1 to Shane's mention about not using NORMAL redundancy in a production
> configuration. FLEX disk groups are permitted on Exadata but, to the best
> of my understanding, they are not widely used - the Oracle best practices
> recommend HIGH redundancy. In fact I do not think you can configure FLEX
> redundancy within OEDA, as it falls outside best practices, so you would
> have to tear down/recreate the disk groups manually, or customize the
> initial install prior to moving your database(s).
>
> Thanks,
>
> --Rajesh
>
>
>
>
> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <dmarc-noreply_at_freelists.org>
> wrote:
>
>> I think it's a mistake to go with NORMAL redundancy in a production
>> system. I could probably understand the argument for a test system, but
>> not production. How do you think a regular storage array is configured?
>> Likely not with a normal redundancy scheme. Aside from all of the other
>> things mentioned already, you are also bound to offline patching; if you
>> try to do rolling patching, you run the risk of being able to tolerate
>> only one disk failure or one storage server failure. High redundancy
>> protects not only against technical failures, but also against the human
>> factor during patching.
>>
>> If you must consider normal redundancy, I would go with FLEX disk groups
>> rather than configuring the entire rack as normal redundancy. That way,
>> if you must, you can specify the redundancy at the file group (per
>> database) level rather than at the disk group level. Should you change
>> your mind later, it's a simple ALTER command to change the redundancy
>> rather than tearing down the entire rack and rebuilding it.
>>
>>
>> ---
>>
>> Thanks,
>>
>>
>> Shane Borden
>> sborden76_at_yahoo.com
>>
>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>
>> Thanks much Andy.
>>
>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>> basically a full rack machine logically split into two half racks, and
>> we have only this database hosted on this half rack.) (And we are
>> currently at ~150TB and keep growing, so we were planning for NORMAL
>> redundancy.)
>>
>> On the current X5 I am seeing ~80TB of hard disk + 6TB of flash per
>> storage cell. When you said "The Exadata X8 and X8M storage cells have
>> 14TB disks. With 12 per cell, that's 168TB *per cell*", does that mean
>> the sum of flash + hard disk is ~168TB per cell? What is the
>> percentage/share of flash disk vs hard disk in that?
>>
>> Apart from the current storage saturation, with regard to the IOPS issue
>> on our current X5 system, I am seeing in OEM that flash IOPS reach
>> ~2000K for large reads while the max limit shows somewhere near ~1.3
>> million. Overall IO utilization for the flash disks shows ~75%. The hard
>> disk IOPS limit shows as ~20K, and most of the time both small reads and
>> large reads stay below this limit. Overall IO utilization stays below
>> ~30% for the hard disks.
>>
>> I just got to know from the infra team that the X8 we are planning to
>> move to is not Extreme Flash but High Capacity disk only (similar to
>> what we have on the current X5). But considering the larger flash and
>> hard disk storage in each of the 7 storage cells, we expect the new X8
>> to resolve the current capacity crunch, both with respect to space and
>> IOPS.
>>
>> And as you mentioned in your explanation, adding more storage cells will
>> also help in bumping up the capacity. So should we consider adding a few
>> more storage cells on top of the half rack, to make it 8 or 9 storage
>> cells in total, and is this standard practice in the Exadata world?
>>
>> Regards
>> Lok
>>
>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>> wrote:
>>
>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per cell,
>>> that's 168TB *per cell*. You haven't mentioned which rack size your X5
>>> machine is, but from the numbers you're showing it looks like maybe a half
>>> rack. A half rack of X8M will come with 1PB of total disk, giving you over
>>> 300TB of usable space to divide between your RECO and DATA disk groups if
>>> you are using HIGH redundancy. That seems plenty for your 150TB database.
>>> But if you need more, add another storage cell.
>>>
>>> As for performance degradation from using HIGH redundancy, you need to
>>> consider that the additional work of that extra write is being taken on by
>>> the storage cells. By definition the redundant block copies must go to
>>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>>> three. In aggregate, each write will be as fast as your slowest cell. So
>>> any difference in write performance is more a function of the total number
>>> of cells you have to share the workload. That difference would be
>>> diminished as you increase the number of cells in the cluster.
>>>
>>> And of course that difference would be mitigated by the write back cache
>>> too because writes to the flash cache are faster than writes to disk.
>>>
>>> Honestly, I can't imagine that Oracle would sell you an Exadata machine
>>> where any of this would be a problem for you. It would be so undersized
>>> from the beginning that your problems with it would be much greater than
>>> any marginal difference in write performance from using high redundancy.
>>>
>>> Andy
>>>
>>>
>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>>> Thanks Much.
>>>>
>>>> I had found some doc but missed it. The URL below also points to high
>>>> redundancy as a requirement, but maybe it's not compulsory, as you
>>>> stated.
>>>>
>>>> Given the size of our database (~150TB), we were thinking of getting
>>>> some space reduction by using double mirroring rather than triple
>>>> mirroring. But I was not aware that the disks themselves are a lot
>>>> bigger in X8; as you stated, with bigger disks the re-mirroring will
>>>> take a lot of time (in case of a crash/failure), and thus HIGH
>>>> redundancy is recommended. I think we have to take another look at
>>>> this. Note: on the current X5 machine I see ~6TB of flash and ~80TB of
>>>> hard disk per storage server. Not sure what that is in the case of X8
>>>> though.
>>>>
>>>> Another doubt I had: is it also true that IOPS will be degraded by
>>>> some percentage with triple mirroring compared to double mirroring,
>>>> because one more additional copy of each data block has to be written
>>>> to flash/disk?
>>>>
>>>>
>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>
>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <salem.ghassan_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Please, can you point to where you saw that write-back is only
>>>>> possible with high redundancy?
>>>>> High redundancy is very much recommended with X8 due to the size of
>>>>> the disks and the time it takes to re-mirror in case of disk loss: if
>>>>> you're on normal redundancy and you lose a disk, then while the
>>>>> re-mirroring is being done you don't have a second copy of that data,
>>>>> so if you lose yet another disk you're in big trouble. With lower
>>>>> capacity disks the re-mirroring takes much less time, and so the risk
>>>>> is lower.
>>>>>
>>>>> regards
>>>>>
>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> Basically, I am seeing many docs stating that triple mirroring is
>>>>>> recommended with "write back flash cache", and some others stating
>>>>>> that write back flash cache is not possible without HIGH
>>>>>> redundancy/triple mirroring. There is a difference between these two
>>>>>> statements, because we may decide to go for NORMAL redundancy to
>>>>>> save some space and gain some IO benefit (in terms of not writing
>>>>>> one more additional copy of each data block), but we still want to
>>>>>> utilize the "write back flash cache" option to get benefits on write
>>>>>> IOPS. If a restriction is in place requiring HIGH redundancy, we
>>>>>> won't be able to do that. Please correct me if my understanding is
>>>>>> wrong.
>>>>>>
>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>>> The doc below states that it is recommended to go for a high
>>>>>>> redundancy ASM disk group (i.e. triple mirroring) when using write
>>>>>>> back flash cache, because the data is first written to/stays in the
>>>>>>> flash cache and is flushed to disk at a later stage, and in case of
>>>>>>> a failure it has to be recovered from a mirror copy. But I am
>>>>>>> wondering: is this not possible with double mirroring - will it not
>>>>>>> survive data loss in case of a failure? I want to understand the
>>>>>>> suggested setup that gives optimal space usage without compromising
>>>>>>> on IOPS or risking data loss.
>>>>>>>
>>>>>>>
>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>
>>>>>>> Regards
>>>>>>> Lok
>>>>>>>
>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, and there are
>>>>>>>> multiple reasons behind it. A few of them are: 1) we are about to
>>>>>>>> saturate the existing storage capacity (DB size reaching ~150TB)
>>>>>>>> on the current X5, and 2) the current IOPS on the X5 is also
>>>>>>>> reaching its maximum while the system works during its peak load.
>>>>>>>>
>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>> existing X5 machines for the DATA and RECO disk groups, while DBFS
>>>>>>>> is kept at NORMAL redundancy (double mirroring). Now a few folks
>>>>>>>> have raised questions about the impact on IOPS and storage space
>>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs
>>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can
>>>>>>>> see the benefit of double mirroring in storage space saved (around
>>>>>>>> one third, in terms of DATA and REDO copies), but then what is the
>>>>>>>> risk with regard to data loss - is it okay in a production system?
>>>>>>>> (Note: we do use ZDLRA for taking the DB backups, and for disaster
>>>>>>>> recovery we have an Active Data Guard physical standby in place
>>>>>>>> which runs in read-only mode.)
>>>>>>>>
>>>>>>>> With regard to IOPS, we are going with the default of write back
>>>>>>>> flash cache enabled here. Is it correct that with double mirroring
>>>>>>>> we have to write to two places vs three places with triple
>>>>>>>> mirroring, so there will also be some IOPS degradation with triple
>>>>>>>> mirroring/HIGH redundancy compared to double mirroring? If that's
>>>>>>>> true, then by what percentage will IOPS degrade? And would it then
>>>>>>>> be okay to go for double mirroring, as that would benefit us with
>>>>>>>> regard to IOPS and also save a good amount of storage space?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Lok
>>>>>>>>
>>>>>>>
>>

--
http://www.freelists.org/webpage/oracle-l