Re: Requesting a RFS walkthrough
Date: Mon, 15 Jun 2009 15:41:48 +1000
Message-ID: <d6f0def50906142241g19b96b1bx86447efba556dac3_at_mail.gmail.com>
Hi Charles,
Did you find a good explanation of what happened during the time
period reflected in the alert.log snippet you published?
My version is as follows:
- A - 10:45:02 2009: both RFS processes lost their connections to the
Primary-side processes that created them.
To my understanding, RFS processes are LOCAL=NO processes on the Standby
side, so they are created from the Primary side.
Just to confirm this, you can do the following:
RAN ON STANDBY
SQL> SELECT PID, STATUS,PROCESS,CLIENT_PROCESS,thread#, sequence#, block# FROM V$MANAGED_STANDBY where PROCESS='RFS';
       PID STATUS       PROCESS   CLIENT_P    THREAD#  SEQUENCE#     BLOCK#
---------- ------------ --------- -------- ---------- ---------- ----------
      6319 IDLE         RFS       LGWR              1        127         14
      6419 IDLE         RFS       UNKNOWN           0          0          0
SQL>
$ ps -fp 6319
oracle 6319 1 0 07:27 ? 00:00:00 oracleGTST (LOCAL=NO)
$ /usr/sbin/lsof -p 6319 | grep TCP
oracle  6319 oracle   18u  IPv4  519196  TCP rhel4_1.testing.com:1528->rhel4_1.testing.com:8248 (ESTABLISHED)
# As both Standby and Primary are on the same host in my config, you see
# both records below. In a real-world case you would see just one of them
# when running this on the primary side.
$ netstat -tp | grep 8248
tcp    0    0 rhel4_1.testing.com:1528    rhel4_1.testing.com:8248    ESTABLISHED 6319/oracleGTST
tcp    0    0 rhel4_1.testing.com:8248    rhel4_1.testing.com:1528    ESTABLISHED 6301/ora_lns1_STST
===== CONCLUSION =====
> 6301/ora_lns1_STST is the parent (on primary) of the RFS LGWR process (on standby)
$ ps -fp 6419
oracle    6419     1  4 07:27 ?        00:00:11 oracleGTST (LOCAL=NO)
$ /usr/sbin/lsof -p 6419 | grep TCP
oracle  6419 oracle   18u  IPv4  519550  TCP rhel4_1.testing.com:1528->rhel4_1.testing.com:8256 (ESTABLISHED)
$ netstat -tp | grep 8256
tcp    0    0 rhel4_1.testing.com:1528    rhel4_1.testing.com:8256    ESTABLISHED 6419/oracleGTST
tcp    0    0 rhel4_1.testing.com:8256    rhel4_1.testing.com:1528    ESTABLISHED 17964/ora_arc1_STST
===== CONCLUSION =====
> 17964/ora_arc1_STST is the parent (on primary) of the RFS UNKNOWN process (on standby)
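If you want to repeat the same chase without typing everything by hand, here
is a minimal shell sketch (my own hypothetical helper, not anything Oracle
ships); it assumes lsof and netstat are available and that Primary and
Standby share one host, as in my config:

#!/bin/sh
# rfs_peer.sh <RFS_PID> -- hypothetical helper, a sketch only.
# Given the PID of an RFS process on the standby, print the remote TCP port
# it is connected to and the process that owns that port.
RFS_PID=$1

# Take the remote port from the ESTABLISHED TCP connection held by the RFS process.
PORT=$(/usr/sbin/lsof -nP -p "$RFS_PID" 2>/dev/null | \
       grep TCP | grep ESTABLISHED | head -1 | \
       sed 's/.*->[^ :]*:\([0-9][0-9]*\) .*/\1/')

echo "RFS pid $RFS_PID talks to remote port $PORT"

# Show which process owns that port; run this last part on the primary host
# if the databases live on different machines.
netstat -tnp 2>/dev/null | grep ":$PORT "

Run it as, e.g., "sh rfs_peer.sh 6319".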
This might have happened for one of two reasons:
1. The network was interrupted (you said that is unlikely).
2. Both processes disappeared on the Primary side. If this is the case, you
would find some messages in the Primary alert.log file (see the check
sketched below).
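For reason 2, a minimal check on the Primary, assuming the usual 10g
alert.log location under background_dump_dest (the path below is an
assumption; take the real one from "show parameter background_dump_dest"):

# Run on the PRIMARY host.
ALERT=$ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_${ORACLE_SID}.log

# Print some context around the disconnect time seen on the standby.
grep -n -B2 -A5 "10:45:0" "$ALERT"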
- B - 10:45:15 2009: after a redo log switch, the Primary database successfully re-established the RFS processes, and they started to ship the redo stream again. The Standby DB recognized that it was missing a bit of the redo stream needed to continue real-time apply and contacted the Primary for gap resolution using the FAL_SERVER parameter.
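To see whether the gap resolution really went through FAL, a quick look on
the Standby may help (just a sketch; the views and parameters are standard
in 10.2):

# Run on the STANDBY as sysdba: shows the FAL settings, any outstanding gap,
# and what the RFS/MRP processes are doing right now.
sqlplus -s / as sysdba <<'EOF'
show parameter fal
SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap;
SELECT process, status, thread#, sequence#, block# FROM v$managed_standby;
EOF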
That is just my $0.02,
Regards,
Yury
On Sat, Jun 6, 2009 at 4:09 AM, Charles Schultz<sacrophyte_at_gmail.com> wrote:
> Very interesting, but we are not receiving any ora-600 errors. As shown in
> the example, the issue is self-correcting; I am merely trying to understand
> the mechanics at a deeper level.
>
> On Fri, Jun 5, 2009 at 12:20, Goulet, Richard <Richard.Goulet_at_parexel.com>
> wrote:
>>
>> Check out Metalink
>> Doc ID: 386417.1
>> understanding
>>
>> Dick Goulet
>> Senior Oracle DBA
>> PAREXEL International
>>
>>
>> ________________________________
>> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
>> On Behalf Of Charles Schultz
>> Sent: Friday, June 05, 2009 12:57 PM
>> To: ORACLE-L
>> Subject: Requesting a RFS walkthrough
>>
>> Good day, list,
>> I am curious if someone would be able to divulge some of the under
>> pinnings of RFS. The snippet below is from our alert.log on the standby;
>> this is one of two 10.2.0.2 physical standby databases with no DB Broker
>> using real-time apply, and we have been seeing this situation happen more
>> and more often. Our networking group tells us there are no issues with the
>> infrastructure, so I am struggling to understand why this is happening.
>> My basic understanding of RFS explains the situation like so:
>>
>> At 10:45:02, the heartbeat process attempts to talk to Primary but is
>> unable to. I assume there are two similar messages because we have two
>> FAL_SERVERS, but I could be way off in that.
>> The next few verbose messages show a connection being reestablished with
>> the primary.
>> At 10:45:20, FAL attempts to get logseq 6045. Why does it fail? The log
>> was archived on the Primary at 10:45:15 (v$archived_log.completion_time).
>> It fails again at 10:45:55, but succeeds at 10:45:56. Why?
>>
>> Fri Jun 5 10:45:02 2009
>> RFS[143]: Possible network disconnect with primary database
>> Fri Jun 5 10:45:05 2009
>> RFS[144]: Possible network disconnect with primary database
>> Fri Jun 5 10:45:15 2009
>> RFS[142]: Successfully opened standby log 22:
>> '+DATA/gcprod/onlinelog/group_22.352.633713051'
>> Fri Jun 5 10:45:15 2009
>> RFS[139]: Successfully opened standby log 25:
>> '+DATA/gcprod/onlinelog/group_25.262.633713253'
>> Fri Jun 5 10:45:16 2009
>> Redo Shipping Client Connected as PUBLIC
>> -- Connected User is Valid
>> RFS[147]: Assigned to RFS process 18916
>> RFS[147]: Identified database type as 'physical standby'
>> Primary database is in MAXIMUM PERFORMANCE mode
>> Primary database is in MAXIMUM PERFORMANCE mode
>> Fri Jun 5 10:45:16 2009
>> Redo Shipping Client Connected as PUBLIC
>> -- Connected User is Valid
>> RFS[148]: Assigned to RFS process 18918
>> RFS[148]: Identified database type as 'physical standby'
>> Primary database is in MAXIMUM PERFORMANCE mode
>> Fri Jun 5 10:45:16 2009
>> RFS[147]: Successfully opened standby log 21:
>> '+DATA/gcprod/onlinelog/group_21.351.633713041'
>> Fri Jun 5 10:45:16 2009
>> Primary database is in MAXIMUM PERFORMANCE mode
>> RFS[148]: Successfully opened standby log 24:
>> '+DATA/gcprod/onlinelog/group_24.282.633713233'
>> Fri Jun 5 10:45:20 2009
>> Fetching gap sequence in thread 1, gap sequence 6045-6045
>> FAL[client]: Error fetching gap sequence, no FAL server specified
>> Fri Jun 5 10:45:55 2009
>> FAL[client]: Failed to request gap sequence
>> GAP - thread 1 sequence 6045-6045
>> DBID 1222165961 branch 631792716
>> FAL[client]: All defined FAL servers have been attempted.
>> -------------------------------------------------------------
>> Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
>> parameter is defined to a value that is sufficiently large
>> enough to maintain adequate log switch information to resolve
>> archivelog gaps.
>> -------------------------------------------------------------
>> Fri Jun 5 10:45:56 2009
>> RFS[145]: Archived Log: '+FRA/gcprod/archivelog/1_6045_631792716.dbf'
>> Fri Jun 5 10:46:30 2009
>> Media Recovery Log +FRA/gcprod/archivelog/1_6045_631792716.dbf
>>
>> --
>> Charles Schultz
>
>
>
> --
> Charles Schultz
>
--
Jurijs
+371 29268222 (+2 GMT)
============================================
http://otn.oracle.com/ocm/jvelikanovs.html
--
http://www.freelists.org/webpage/oracle-l