Nodes Get Rebooted [message #404730] |
Sat, 23 May 2009 01:26 |
kumarrajnishgupta
Messages: 43 Registered: October 2008 Location: noida
Dear friends,
I followed the document "Build Your Own Oracle RAC Cluster on Oracle Enterprise Linux and iSCSI" by Jeffrey Hunter at http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html
It installed perfectly fine with no problems at all, but both of my nodes get rebooted unexpectedly, or when I create a tablespace or run RMAN to take a backup. If more information is required, I will provide it.
With regards,
Rajnish
Re: Nodes Get Rebooted [message #404901 is a reply to message #404756] |
Mon, 25 May 2009 06:15 |
kumarrajnishgupta
Messages: 43 Registered: October 2008 Location: noida
Dear sir,
While googling, I found a suggestion to increase the "Heartbeat dead threshold" in the OCFS configuration. I set it to about 500 seconds and thought the problem was resolved, but today both of my nodes hung again. I am sending the logs here.
/u01/app/crs/log/linux1/alertlinux1.log
2009-05-25 13:35:29.331
[cssd(8441)]CRS-1606:CSSD Insufficient voting files available [1 of 3]. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:24.895
[cssd(14622)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:24.949
[cssd(14622)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:24.949
[cssd(14622)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:28.682
[cssd(14622)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 14:50:29.334
[crsd(6327)]CRS-1012:The OCR service started on node linux1.
2009-05-25 14:50:29.612
[evmd(14510)]CRS-1401:EVMD started on node linux1.
2009-05-25 14:50:33.724
[crsd(6327)]CRS-1201:CRSD started on node linux1.
2009-05-25 14:51:02.995
[cssd(14622)]CRS-1603:CSSD on node linux1 shutdown by user.
2009-05-25 14:56:05.799
[cssd(8711)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:56:05.883
[cssd(8711)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:56:05.929
[cssd(8711)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:56:07.013
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 .
2009-05-25 14:56:07.668
[crsd(7125)]CRS-1012:The OCR service started on node linux1.
2009-05-25 14:56:07.677
[evmd(8580)]CRS-1401:EVMD started on node linux1.
2009-05-25 14:56:11.616
[crsd(7125)]CRS-1201:CRSD started on node linux1.
2009-05-25 14:56:45.170
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 15:17:10.338
[cssd(8709)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:17:10.338
[cssd(8709)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:17:10.338
[cssd(8709)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:20:22.769
[cssd(8709)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 .
2009-05-25 15:20:23.298
[crsd(7152)]CRS-1012:The OCR service started on node linux1.
2009-05-25 15:20:23.301
[evmd(8577)]CRS-1401:EVMD started on node linux1.
2009-05-25 15:20:27.259
[crsd(7152)]CRS-1201:CRSD started on node linux1.
2009-05-25 15:35:08.831
[cssd(8592)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:35:08.850
[cssd(8592)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:35:08.956
[cssd(8592)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:35:12.811
[cssd(8592)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 15:35:13.621
[crsd(7181)]CRS-1012:The OCR service started on node linux1.
2009-05-25 15:35:13.632
[evmd(8450)]CRS-1401:EVMD started on node linux1.
2009-05-25 15:35:18.120
[crsd(7181)]CRS-1201:CRSD started on node linux1.
2009-05-25 15:49:00.400
[crsd(4379)]CRS-1012:The OCR service started on node linux1.
2009-05-25 15:49:11.361
[crsd(4379)]CRS-1201:CRSD started on node linux1.
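The alert log excerpt above alternates a timestamp line with a message line of the form `[daemon(pid)]CRS-nnnn:text`. A minimal Python sketch, assuming that two-line layout holds for the whole file, pairs the lines up and lists every cluster membership change (CRS-1601). The names `parse_alert_log` and `membership_changes` are hypothetical helpers for illustration, not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp line, e.g. "2009-05-25 14:50:28.682"
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")
# Message line, e.g. "[cssd(14622)]CRS-1601:CSSD Reconfiguration complete. ..."
MSG_RE = re.compile(r"^\[(\w+)\((\d+)\)\](CRS-\d+):(.*)$")


def parse_alert_log(text):
    """Pair each timestamp line with the message line that follows it."""
    events = []
    pending_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            pending_ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        m = MSG_RE.match(line)
        if m and pending_ts is not None:
            daemon, pid, code, message = m.groups()
            events.append({
                "time": pending_ts,
                "daemon": daemon,
                "pid": int(pid),
                "code": code,
                "message": message.strip(),
            })
        # A timestamp only applies to the line directly after it.
        pending_ts = None
    return events


def membership_changes(events):
    """Return (time, [active nodes]) for every CRS-1601 reconfiguration."""
    changes = []
    for ev in events:
        if ev["code"] == "CRS-1601" and "Active nodes are" in ev["message"]:
            # Message ends like "Active nodes are linux1 linux2 ."
            tail = ev["message"].split("Active nodes are", 1)[1]
            changes.append((ev["time"], tail.rstrip(" .").split()))
    return changes


# Two entries copied from the excerpt above.
sample = """2009-05-25 14:56:07.013
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 .
2009-05-25 14:56:45.170
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
"""

for when, nodes in membership_changes(parse_alert_log(sample)):
    print(when, nodes)
```

Run against the full alertlinux1.log, the list of membership changes makes it easier to see when linux2 dropped out and rejoined, and to line those times up with the crsd and ocssd entries below.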
crsd log
2009-05-25 15:47:56.795: [ CRSEVT][3770674080]0CAAMonitorHandler :: 0:Action Script /u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.gsd! (timeout=600)
2009-05-25 15:47:56.795: [ CRSAPP][3770674080]0CheckResource error for ora.linux1.gsd error code = -2
2009-05-25 15:47:56.795: [ CRSEVT][3654159264]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/db_1/bin/racgwrap(check) timed out for ora.orcl.orcl1.inst! (timeout=600)
2009-05-25 15:47:56.795: [ CRSAPP][3654159264]0CheckResource error for ora.orcl.orcl1.inst error code = -2
2009-05-25 15:47:56.896: [ CRSEVT][3696118688]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/db_1/bin/racgwrap(check) timed out for ora.linux1.ASM1.asm! (timeout=600)
2009-05-25 15:47:56.896: [ CRSAPP][3696118688]0CheckResource error for ora.linux1.ASM1.asm error code = -2
2009-05-25 15:47:56.901: [ CRSEVT][3717098400]0CAAMonitorHandler :: 0:Action Script /u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.vip! (timeout=60)
2009-05-25 15:47:56.902: [ CRSAPP][3717098400]0CheckResource error for ora.linux1.vip error code = -2
2009-05-25 15:47:56.953: [ CRSEVT][3781163936]0CAAMonitorHandler :: 0:Action Script /u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.ons! (timeout=600)
2009-05-25 15:47:56.953: [ CRSAPP][3781163936]0CheckResource error for ora.linux1.ons error code = -2
2009-05-25 15:47:56.953: [ CRSEVT][3643669408]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/db_1/bin/racgwrap(check) timed out for ora.linux1.LISTENER_LINUX1.lsnr! (timeout=600)
2009-05-25 15:47:56.953: [ CRSAPP][3643669408]0CheckResource error for ora.linux1.LISTENER_LINUX1.lsnr error code = -2
2009-05-25 15:48:59.687: [ default][4143806144][ENTER]0
2009-05-25 15:49:11.361: [ CRSMAIN][4143806144]0Starting Threads
2009-05-25 15:49:11.361: [ CRSMAIN][4143806144]0CRS Daemon Started.
2009-05-25 16:08:06.537: [ OCRSRV][4098997152]th_select_handler: Failed to retrieve procctx from ht. constr = [145255720] retval lht [-27] Signal CV.
2009-05-25 16:18:09.476: [ OCRSRV][4098997152]th_select_handler: Failed to retrieve procctx from ht. constr = [145255720] retval lht [-27] Signal CV.
ocssd log
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clscsendx: (0x8354460) Connection not active
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clssgmSendClient: Send failed rc 6, con (0x8354460), client (0x8354660), proc ((nil))
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clscsendx: (0x83548a8) Connection not active
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clssgmSendClient: Send failed rc 6, con (0x83548a8), client (0x8354aa8), proc ((nil))
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clscsendx: (0x83543e8) Connection not active
evmd log
2009-05-23 15:34:27.907: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:28.910: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:29.912: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:30.915: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:31.917: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:32.920: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:33.922: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:34.925: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2009-05-23 15:34:35.928: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clssgmSendClient: Send failed rc 6, con (0x83543e8), client (0x8354f00), proc ((nil))
[ CSSD]2009-05-25 15:47:57.557 [4098169760] >TRACE: clscsendx: (0x83854b0) Connection not active
alert_orcl1.log
Cluster communication is configured to use the following interface(s) for this instance
192.168.2.100
Mon May 25 15:35:56 2009
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=10139
DIAG started with pid=3, OS id=10141
PSP0 started with pid=4, OS id=10143
LMON started with pid=5, OS id=10145
LMD0 started with pid=6, OS id=10147
LMS0 started with pid=7, OS id=10149
LMS1 started with pid=8, OS id=10159
MMAN started with pid=9, OS id=10169
DBW0 started with pid=10, OS id=10171
LGWR started with pid=11, OS id=10173
CKPT started with pid=12, OS id=10175
SMON started with pid=13, OS id=10177
RECO started with pid=14, OS id=10179
CJQ0 started with pid=15, OS id=10181
MMON started with pid=16, OS id=10183
Mon May 25 15:35:57 2009
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=17, OS id=10185
Mon May 25 15:35:57 2009
starting up 1 shared server(s) ...
Mon May 25 15:35:57 2009
lmon registered with NM - instance id 1 (internal mem no 0)
Mon May 25 15:35:57 2009
Reconfiguration started (old inc 0, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 not valid according to instance 1
* domain 0 valid = 0 according to instance 1
Mon May 25 15:35:58 2009
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon May 25 15:35:58 2009
LMS 0: 0 GCS shadows cancelled, 0 closed
Mon May 25 15:35:58 2009
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon May 25 15:35:58 2009
LMS 0: 0 GCS shadows traversed, 0 replayed
Mon May 25 15:35:58 2009
LMS 1: 0 GCS shadows traversed, 0 replayed
Mon May 25 15:35:58 2009
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=20, OS id=10241
Mon May 25 15:35:59 2009
ALTER DATABASE MOUNT
Mon May 25 15:36:07 2009
Starting background process ASMB
ASMB started with pid=22, OS id=10446
Starting background process RBAL
RBAL started with pid=23, OS id=10450
Mon May 25 15:36:15 2009
SUCCESS: diskgroup ORCL_DATA1 was mounted
SUCCESS: diskgroup FLASH_RECOVERY_AREA was mounted
Mon May 25 15:36:20 2009
Setting recovery target incarnation to 2
Mon May 25 15:36:22 2009
Successful mount of redo thread 1, with mount id 1215580665
Mon May 25 15:36:22 2009
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Mon May 25 15:36:23 2009
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Mon May 25 15:36:51 2009
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=32, OS id=11587
Mon May 25 15:36:51 2009
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=33, OS id=11589
Mon May 25 15:36:54 2009
Thread 1 opened at log sequence 8
Current log# 1 seq# 8 mem# 0: +ORCL_DATA1/orcl/onlinelog/group_1.261.687634095
Current log# 1 seq# 8 mem# 1: +FLASH_RECOVERY_AREA/orcl/onlinelog/group_1.258.687634107
Successful open of redo thread 1
Mon May 25 15:36:54 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 25 15:36:55 2009
ARC0: STARTING ARCH PROCESSES
Mon May 25 15:36:55 2009
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Mon May 25 15:36:55 2009
SMON: enabling cache recovery
Mon May 25 15:36:55 2009
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARC2 started with pid=34, OS id=11666
Mon May 25 15:37:05 2009
Successfully onlined Undo Tablespace 1.
Mon May 25 15:37:05 2009
SMON: enabling tx recovery
Mon May 25 15:37:05 2009
Database Characterset is WE8ISO8859P1
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=36, OS id=11970
Mon May 25 15:37:16 2009
Completed: ALTER DATABASE OPEN
Mon May 25 15:37:23 2009
ALTER SYSTEM SET service_names='orcl.idevelopment.info',' orcl_taf.idevelopment.info','orcl_taf' SCOPE=MEMORY SID='orcl1';
Mon May 25 15:42:51 2009
Shutting down archive processes
Mon May 25 15:42:56 2009
ARCH shutting down
ARC2: Archival stopped
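A note on the "Heartbeat dead threshold" mentioned at the top of this post: as the OCFS2 documentation describes it, the setting (`O2CB_HEARTBEAT_THRESHOLD` in /etc/sysconfig/o2cb) counts two-second heartbeat iterations rather than seconds, and a node is considered dead after (threshold - 1) * 2 seconds. The sketch below only does that arithmetic. The two-second interval and the formula are assumptions taken from that documentation, not something the logs above confirm, and the function names are hypothetical.

```python
import math

# Assumption: the o2cb disk heartbeat runs once every 2 seconds.
HEARTBEAT_INTERVAL_SECS = 2


def threshold_to_seconds(threshold):
    """Seconds without a heartbeat before a node is declared dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def seconds_to_threshold(seconds):
    """Smallest threshold value that gives at least `seconds` of timeout."""
    return math.ceil(seconds / HEARTBEAT_INTERVAL_SECS) + 1


print(threshold_to_seconds(500))   # timeout if the value 500 was entered as the threshold
print(seconds_to_threshold(500))   # threshold needed for a 500-second timeout
```

Under these assumptions, entering 500 as the threshold gives a timeout of 998 seconds, while a timeout of 500 seconds corresponds to a threshold of 251. The value also has to be the same on both nodes.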
Re: Nodes Get Rebooted [message #406228 is a reply to message #406013] |
Tue, 02 June 2009 22:57 |
kumarrajnishgupta
Messages: 43 Registered: October 2008 Location: noida
Hi,
These are the RPMs I was using for OCFS2:
ocfs2-2.6.9-78.0.0.0.1.ELhugemem-1.2.9-1.el4
ocfs2console-1.2.7-1.el4
ocfs2-tools-devel-1.2.7-1.el4
ocfs2-2.6.9-78.0.0.0.1.EL-1.2.9-1.el4
ocfs2-2.6.9-78.0.0.0.1.ELxenU-1.2.9-1.el4
ocfs2-tools-1.2.7-1.el4
ocfs2-2.6.9-78.0.0.0.1.ELsmp-1.2.9-1.el4
With regards,
Rajnish
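The OCFS2 kernel-module packages in the list above carry the kernel version and flavor in their names, while the tools packages carry only their own version. A minimal Python sketch, assuming the naming pattern `ocfs2-<kernel>-<module version>-<release>` seen in this list, splits the names so the module built for the running kernel can be looked up and the module and tools versions compared. `classify` is a hypothetical helper, and the `running_kernel` value is a placeholder to be replaced with the output of `uname -r`.

```python
import re

# Kernel module package: ocfs2-<kernel version and flavor>-<module version>-<release>
MODULE_RE = re.compile(r"^ocfs2-(\d+\.\d+\.\d+-[\w.]+)-(\d+\.\d+\.\d+)-(\S+)$")
# Userspace package: <name>-<version>-<release>
TOOLS_RE = re.compile(r"^(ocfs2-tools(?:-devel)?|ocfs2console)-(\d+\.\d+\.\d+)-(\S+)$")

PACKAGES = [
    "ocfs2-2.6.9-78.0.0.0.1.ELhugemem-1.2.9-1.el4",
    "ocfs2console-1.2.7-1.el4",
    "ocfs2-tools-devel-1.2.7-1.el4",
    "ocfs2-2.6.9-78.0.0.0.1.EL-1.2.9-1.el4",
    "ocfs2-2.6.9-78.0.0.0.1.ELxenU-1.2.9-1.el4",
    "ocfs2-tools-1.2.7-1.el4",
    "ocfs2-2.6.9-78.0.0.0.1.ELsmp-1.2.9-1.el4",
]


def classify(packages):
    """Split package names into {kernel: module version} and {tool: version}."""
    modules, tools = {}, {}
    for name in packages:
        m = MODULE_RE.match(name)
        if m:
            modules[m.group(1)] = m.group(2)
            continue
        t = TOOLS_RE.match(name)
        if t:
            tools[t.group(1)] = t.group(2)
    return modules, tools


modules, tools = classify(PACKAGES)

# Assumption: placeholder value, replace with the output of `uname -r`.
running_kernel = "2.6.9-78.0.0.0.1.ELsmp"

print("module for running kernel:", modules.get(running_kernel, "none installed"))
print("module versions:", sorted(set(modules.values())))
print("tools versions:", sorted(set(tools.values())))
```

For this list the modules are all 1.2.9 and the tools are all 1.2.7. Whether that difference has anything to do with the reboots is not established in this thread.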