Re: Redundant/failure-tolerant Oracle setup

From: Phil Herring <revdoc_at_uow.edu.au>
Date: 1997/10/01
Message-ID: <60spjs$429$1@wyrm.its.uow.edu.au>#1/1

In article <60smj3$pvl_at_lovecraft.nwnet.net> Anthony Talltree, aad_at_nwnet.net writes:
>We're looking at setting up a mid-sized database, and would like to do
>so with redundant machines to maximize uptime and tolerance to component
>failures. [...]

It depends on how much downtime you can tolerate in the event of a failure. Any solution that keeps downtime very close to zero will probably have some shared resources (e.g., the dual-ported RAID that you mention), which will provide a common point of failure that may stop the whole show if it breaks. You have to decide whether or not you can live with that.

OTOH, a system that minimises or eliminates common points of failure will probably leave you with a longer period of downtime during a failure, and you may also lose some data. For example, you can run two totally separate systems, one for production, the other as a hot standby; continuously FTP the archived redo logs from the production system and apply them to the standby system to keep it up to date. You'll have a longer gap between the production system failing and the standby system becoming available, and you may also lose some transactions made since the last log switch, but there are fewer shared components that can take both systems out.
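
To make the log-shipping idea a bit more concrete, here is a rough sketch of the "ship whatever the standby doesn't have yet" loop, in Python. It is only an illustration under assumptions: the function name, the directory arguments and the *.arc file extension are all made up, a local file copy stands in for the FTP transfer, and applying the logs on the standby (the recovery step) is not shown at all.

```python
import shutil
from pathlib import Path


def ship_new_logs(archive_dir, standby_dir, pattern="*.arc"):
    """Copy archived log files that the standby directory does not yet hold.

    Hypothetical helper: the "*.arc" pattern and both directories are
    assumptions, and a local copy stands in for the FTP transfer.
    Returns the names of the files shipped, in name order.
    """
    archive_dir = Path(archive_dir)
    standby_dir = Path(standby_dir)
    standby_dir.mkdir(parents=True, exist_ok=True)

    shipped = []
    # Name order only matches log sequence order if the sequence
    # numbers in the file names are zero-padded.
    for log in sorted(archive_dir.glob(pattern)):
        target = standby_dir / log.name
        if target.exists():
            continue  # already shipped on an earlier pass
        # Copy under a temporary name, then rename, so the standby
        # never picks up a half-written log file.
        partial = target.with_suffix(".part")
        shutil.copy2(log, partial)
        partial.replace(target)
        shipped.append(log.name)
    return shipped
```

Run something like this from cron or in a loop with a sleep. Anything still sitting in the current online redo log when production dies never reaches the archive directory, which is where the "transactions since the last log switch" loss comes from.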

OTOH, you might just go with redundant hardware (hot-swappable CPUs, PSUs, disks, etc.) and hope that you don't have a catastrophic failure. You'll probably want a minimal backup system available if you go this way - or a *very* good maintenance contract.

And don't forget all the other system components that you're relying on: your network, your UPS, your staff, procedures, and security. The most reliable hardware in the world won't matter a damn if a disgruntled employee runs rm -rf * as root in /, and if nobody can connect to the box because your router has died, they'll be just as annoyed as if the box wasn't there at all.

In any case, you have to ask yourself if the cost and risk of any given solution are worth it. What does an outage cost per second/minute/hour? What is the probability of a given type of outage? How much downtime can you expect in a year, and how much will that cost? Ideally, you'll be able to answer these questions with numbers, and then maybe you can make a rational decision. Or you can do what most of us mortals do, and just make the best choice that you can, within your budget.
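
If you do have numbers, the arithmetic is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the function names, the outage types, their frequencies and durations, and the cost per hour are invented purely to show the shape of the calculation, not taken from any real system.

```python
def expected_downtime_hours(scenarios):
    """Expected hours of downtime per year.

    Each scenario is (name, expected outages per year, hours per outage).
    """
    return sum(per_year * hours for _name, per_year, hours in scenarios)


def expected_annual_cost(scenarios, cost_per_hour):
    """Expected yearly cost of outages at a flat cost per hour."""
    return expected_downtime_hours(scenarios) * cost_per_hour


# Invented figures, for illustration only.
scenarios = [
    ("disk failure", 0.5, 4.0),   # once every two years, 4 hours each
    ("router failure", 1.0, 1.0), # once a year, 1 hour each
]

print(expected_downtime_hours(scenarios))       # 3.0
print(expected_annual_cost(scenarios, 1000.0))  # 3000.0
```

Do the same sum for each candidate setup, and compare the difference in expected outage cost against what the extra hardware or maintenance contract costs per year. A flat cost per hour is a simplification; a long outage often costs more per hour than a short one.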

Lastly, I'd recommend not using RAID5 for an Oracle database. (Stand back - this may start a flamewar :-) Stick with mirroring and striping. Disk is cheap, and it's too fundamental to skimp on.

Received on Wed Oct 01 1997 - 00:00:00 CDT