Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: RAC+RMAN fails on AIX 5.2 ML7

From: herbert koelman <herbert.koelman_at_urbix.fr>
Date: 17 Dec 2006 22:37:21 GMT
Message-ID: <4585c6a1$0$19254$426a74cc@news.free.fr>


On Sun, 17 Dec 2006 09:26:22 -0800, Andreas Piesk wrote:

> herbert koelman schrieb:

>> It probably did show you that I haven't investigated this stuff for weeks
>> now. And it also did tell you that Oracle support hasn't been contacted yet.

>
> you should have mentioned it, nobody knows what you have done so far.
>
>> But if I'm asking if anybody is experiencing any sort of problem running
>> rman to backup a cluster, it's because we (oracle and our team) have
>> not been able to come up with any explanation so far.

>
> no, i don't have any problems backing up RAC (10.2.0.2) on AIX (now
> 5.3TL4).
> i assume you're using 10g too because you mentioned ONS (i highly doubt
> that ONS is the root of your problem).
>
>> But if you are not competent on this subject, don't waste
>> your (and our) time with this very stupid answer.

>
> stupid or not, he made some points.
>
> back to topic (i repeat some questions asked by sybrandb):
>
> what do you mean by "backup locks up"? it stalls?
> did you use 'truss' to see what the server processes do when this "lock
> up" occurs? how exactly do you back up your database (backup script +
> tns config could help).
>
> next thing: "cluster is failing". please define "failing". do you see
> node evictions or does the cluster hang or reboot or what exactly
> happens? any hints what's going on in crs logs (especially css)?
>
> regards,
> -ap

Cluster is failing: I meant that 3 nodes out of 4 are slowly stopping. Normally, when something goes wrong, the VIP addresses are moved from the failing machine to one that is up and running, and service can resume on the rest of the cluster members. In my case nothing seems to happen. At first, when the rman backup fails (more info on this soon), everything still seems to be running fine.

An hour or two after the rman incident, the client application is no longer able to access data on the first failing node. Active connections time out. New connections (sqlplus, for example) are not possible. Nothing else happens: no reboot or shutdown, simply no more service. Everything seems to be hung.

A couple of hours later, all 4 nodes go down one after another.
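To pin down that timeline more precisely (which node stops answering first, and when), something like the following could be left running from a machine outside the cluster. This is only a rough sketch: the node names rac1 to rac4 and port 1521 are placeholders, not our real configuration, and it only checks that the listener port accepts a TCP connection. A hung instance behind a listener that still answers would not show up here, so it complements a real sqlplus login test rather than replacing it.

```python
import socket
import time

# Placeholder node names and listener port (assumptions, not the real setup).
NODES = ["rac1", "rac2", "rac3", "rac4"]
LISTENER_PORT = 1521


def port_accepts(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe_once(nodes, port, timeout=5.0):
    """Probe every node once; return a list of (timestamp, node, reachable)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return [(stamp, node, port_accepts(node, port, timeout)) for node in nodes]


def watch(nodes, port, rounds, interval=60.0):
    """Print one line per node per round, sleeping interval seconds between rounds."""
    for i in range(rounds):
        for stamp, node, ok in probe_once(nodes, port):
            print(stamp, node, "up" if ok else "NO ANSWER", flush=True)
        if i < rounds - 1:
            time.sleep(interval)
```

Calling `watch(NODES, LISTENER_PORT, rounds=600)` right after starting the rman backup would give roughly ten hours of one-minute samples to compare against the CRS logs.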

By the way, are you using Grid Control? Some nut in our company has installed it without informing the rest of us (no testing, no nothing). Do you know if there are known problems with this tool?

Received on Sun Dec 17 2006 - 16:37:21 CST
