Re: One primary with two physical standbys exhibiting different behavior with regard to lag

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Fri, 4 Jan 2019 13:43:08 -0600
Message-ID: <CAEFL0sxRtNJwYBOrD7hHgcQd1Fz_7veKRrHPwWCo-pdC+YfgUA_at_mail.gmail.com>



"Burn the haystack and the needle will appear."

i'm saving that one!

On Fri, Jan 4, 2019 at 1:30 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:

> Given that you are 12.1.0.2, I **think** the relevant document starting
> point is:
>
>
>
> https://docs.oracle.com/database/121/SBYDB/log_transport.htm#SBYDB00400
>
>
>
>
>
> This has definitely evolved over versions, starting with “roll your own”
> in 6.0.36 and various changes through the route to productization.
>
>
>
> I’m not confident any lease after 8 failed to involve some significant
> change (other than “roll your own” still being a process of shipping and
> applying the logs as if you were recovering while periodically cancelling
> recovery and making a clone/rename to test whether your continuous recovery
> could actually be warm started and opened and then mounting and resuming
> recovery of the standby thread.)
>
>
>
> I’d focus on how to do it right from the document before you try to figure
> out what is wrong, since several pairs of eyes have failed to quickly find
> that needle. Burn the haystack and the needle will appear.
>
>
>
> mwf
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Mark W. Farnham
> *Sent:* Friday, January 04, 2019 12:46 PM
> *To:* jt2354_at_gmail.com; 'Sandra Becker'
> *Cc:* 'Neil Chandler'; 'Andrew Kerber'; 'oracle-l'
> *Subject:* RE: One primary with two physical standbys exhibiting
> different behavior with regard to lag
>
>
>
> Is one of these applying from lgwr and the other by shipping logs? (Sorry
> no time to read the details.) That would explain both the lag and the
> faster maximum apply rate, because after the log switches and finally is
> shipped it applies very quickly locally with zero network ack/nak of
> success application.
>
>
>
> Before it was possibly to write to the standby from lgwr, that was the way
> it worked for all the standbys.
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [
> mailto:oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>] *On
> Behalf Of *John Thomas
> *Sent:* Friday, January 04, 2019 12:27 PM
> *To:* Sandra Becker
> *Cc:* Neil Chandler; Andrew Kerber; oracle-l
> *Subject:* Re: One primary with two physical standbys exhibiting
> different behavior with regard to lag
>
>
>
> If that's not the source of your problems, you need to eliminate it before
> looking further.
>
>
>
> The max and average apply rates are very different for the two standby's.
> (But the slow one's faster. )
>
>
>
> Transport and apply lag on fitsdb1... maybe this is just the mismatch
> between redo and standby log sizes. They should all be the same size and
> you should have one more standby log than the number of redo logs on each.
> But I guess you know that already.
>
>
>
> I should've asked for the ping times., but it's probably not relevant.
>
>
>
> Database - itsdb2
>
> Role: PHYSICAL STANDBY
> Intended State: APPLY-ON
> Transport Lag: 0 seconds (computed 0 seconds ago)
> Apply Lag: 0 seconds (computed 0 seconds ago)
> Average Apply Rate: 71.00 KByte/s
> Active Apply Rate: 450.00 KByte/s
> Maximum Apply Rate: 15.18 MByte/s
>
>
>
> *The standby below is the one that lags.*
>
> DGMGRL> show database verbose fitsdb1
>
> Database - fitsdb1
>
> Role: PHYSICAL STANDBY
> Intended State: APPLY-ON
> Transport Lag: 12 minutes 45 seconds (computed 12 seconds ago)
> Apply Lag: 12 minutes 45 seconds (computed 12 seconds ago)
> Average Apply Rate: 79.00 KByte/s
> Active Apply Rate: 60.79 MByte/s
>
> Maximum Apply Rate: 61.21 MByte/s
>
>
>
>
>
> Regards
>
>
>
> Join
>
>
>
> On Fri, 4 Jan 2019, 15:41 Sandra Becker <sbecker6925_at_gmail.com wrote:
>
> I had an error pop up that indicated a problem with a size difference
> between the standby redo logs and the online redo logs. Standby on the
> primary was 10x the size of online redo logs. All the logs on the standby
> were set up the same as the online redo. I've corrected that just to be
> consistent and follow best practices. While checking correcting the sizes,
> I noticed that the standby that lags never assigns a standby redo log.
> Will be researching that next.
>
>
>
> Sandy
>
>
>
> On Fri, Jan 4, 2019 at 7:35 AM Sandra Becker <sbecker6925_at_gmail.com>
> wrote:
>
> Yes, archiving the current log allows the standby to catch up.
>
>
>
> DGMGRL> show configuration verbose
>
> Configuration - itsdb_DG_CONFIG
>
> Protection Mode: MaxPerformance
> Members:
> itsdb1 - Primary database
> itsdb2 - Physical standby database
> fitsdb1 - Physical standby database
>
> Properties:
> FastStartFailoverThreshold = '30'
> OperationTimeout = '30'
> TraceLevel = 'USER'
> FastStartFailoverLagLimit = '30'
> CommunicationTimeout = '180'
> ObserverReconnect = '0'
> FastStartFailoverAutoReinstate = 'TRUE'
> FastStartFailoverPmyShutdown = 'TRUE'
> BystandersFollowRoleChange = 'ALL'
> ObserverOverride = 'FALSE'
> ExternalDestination1 = ''
> ExternalDestination2 = ''
> PrimaryLostWriteAction = 'CONTINUE'
>
> Fast-Start Failover: DISABLED
>
> Configuration Status:
> SUCCESS
>
>
>
> DGMGRL> show database verbose itsdb1
>
> Database - itsdb1
>
> Role: PRIMARY
> Intended State: TRANSPORT-ON
> Instance(s):
> its1
>
> Properties:
> DGConnectIdentifier = 'itsdb1'
> ObserverConnectIdentifier = ''
> LogXptMode = 'ASYNC'
> RedoRoutes = ''
> DelayMins = '0'
> Binding = 'optional'
> MaxFailure = '0'
> MaxConnections = '1'
> ReopenSecs = '300'
> NetTimeout = '30'
> RedoCompression = 'DISABLE'
> LogShipping = 'ON'
> PreferredApplyInstance = ''
> ApplyInstanceTimeout = '0'
> ApplyLagThreshold = '0'
> TransportLagThreshold = '0'
> TransportDisconnectedThreshold = '30'
> ApplyParallel = 'AUTO'
> StandbyFileManagement = 'AUTO'
> ArchiveLagTarget = '0'
> LogArchiveMaxProcesses = '30'
> LogArchiveMinSucceedDest = '1'
> DbFileNameConvert = '+DATA/ITSDB2/, +DATA/ITSDB1/'
> LogFileNameConvert = '+DATA/ITSDB2/, +DATA/ITSDB1/'
> FastStartFailoverTarget = ''
> InconsistentProperties = '(monitor)'
> InconsistentLogXptProps = '(monitor)'
> SendQEntries = '(monitor)'
> LogXptStatus = '(monitor)'
> RecvQEntries = '(monitor)'
> StaticConnectIdentifier =
> '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=pitsdbs01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=itsdb1_DGMGRL)(INSTANCE_NAME=its1)(SERVER=DEDICATED)))'
> StandbyArchiveLocation = '/backup/itsdb/archives'
> AlternateLocation = ''
> LogArchiveTrace = '0'
> LogArchiveFormat = '%t_%s_%r.arc'
> TopWaitEvents = '(monitor)'
>
> Database Status:
> SUCCESS
>
>
>
> DGMGRL> show database verbose itsdb2
>
> Database - itsdb2
>
> Role: PHYSICAL STANDBY
> Intended State: APPLY-ON
> Transport Lag: 0 seconds (computed 0 seconds ago)
> Apply Lag: 0 seconds (computed 0 seconds ago)
> Average Apply Rate: 71.00 KByte/s
> Active Apply Rate: 450.00 KByte/s
> Maximum Apply Rate: 15.18 MByte/s
> Real Time Query: OFF
> Instance(s):
> its2
>
> Properties:
> DGConnectIdentifier = 'itsdb2'
> ObserverConnectIdentifier = ''
> LogXptMode = 'ASYNC'
> RedoRoutes = ''
> DelayMins = '0'
> Binding = 'optional'
> MaxFailure = '0'
> MaxConnections = '1'
> ReopenSecs = '300'
> NetTimeout = '30'
> RedoCompression = 'DISABLE'
> LogShipping = 'ON'
> PreferredApplyInstance = ''
> ApplyInstanceTimeout = '0'
> ApplyLagThreshold = '0'
> TransportLagThreshold = '0'
> TransportDisconnectedThreshold = '30'
> ApplyParallel = 'AUTO'
> StandbyFileManagement = 'AUTO'
> ArchiveLagTarget = '0'
> LogArchiveMaxProcesses = '30'
> LogArchiveMinSucceedDest = '1'
> DbFileNameConvert = '+DATA/ITSDB1/, +DATA/ITSDB2/'
> LogFileNameConvert = '+DATA/ITSDB1/, +DATA/ITSDB2/'
> FastStartFailoverTarget = ''
> InconsistentProperties = '(monitor)'
> InconsistentLogXptProps = '(monitor)'
> SendQEntries = '(monitor)'
> LogXptStatus = '(monitor)'
> RecvQEntries = '(monitor)'
> StaticConnectIdentifier =
> '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=pitsdbs02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=itsdb2_DGMGRL)(INSTANCE_NAME=its2)(SERVER=DEDICATED)))'
> StandbyArchiveLocation = '/backup/itsdb/archives'
> AlternateLocation = ''
> LogArchiveTrace = '0'
> LogArchiveFormat = '%t_%s_%r.arc'
> TopWaitEvents = '(monitor)'
>
> Database Status:
> SUCCESS
>
>
>
> *The standby below is the one that lags.*
>
> DGMGRL> show database verbose fitsdb1
>
> Database - fitsdb1
>
> Role: PHYSICAL STANDBY
> Intended State: APPLY-ON
> Transport Lag: 12 minutes 45 seconds (computed 12 seconds ago)
> Apply Lag: 12 minutes 45 seconds (computed 12 seconds ago)
> Average Apply Rate: 79.00 KByte/s
> Active Apply Rate: 60.79 MByte/s
> Maximum Apply Rate: 61.21 MByte/s
> Real Time Query: OFF
> Instance(s):
> fits1
>
> Properties:
> DGConnectIdentifier = 'fitsdb1'
> ObserverConnectIdentifier = ''
> LogXptMode = 'ASYNC'
> RedoRoutes = ''
> DelayMins = '0'
> Binding = 'optional'
> MaxFailure = '0'
> MaxConnections = '1'
> ReopenSecs = '300'
> NetTimeout = '30'
> RedoCompression = 'DISABLE'
> LogShipping = 'ON'
> PreferredApplyInstance = ''
> ApplyInstanceTimeout = '0'
> ApplyLagThreshold = '0'
> TransportLagThreshold = '0'
> TransportDisconnectedThreshold = '30'
> ApplyParallel = 'AUTO'
> StandbyFileManagement = 'AUTO'
> ArchiveLagTarget = '0'
> LogArchiveMaxProcesses = '30'
> LogArchiveMinSucceedDest = '1'
> DbFileNameConvert = '+DATA/ITSDB2/, +DATA/FITSDB1/'
> LogFileNameConvert = '+DATA/ITSDB2/, +DATA/FITSDB1/'
> FastStartFailoverTarget = ''
> InconsistentProperties = '(monitor)'
> InconsistentLogXptProps = '(monitor)'
> SendQEntries = '(monitor)'
> LogXptStatus = '(monitor)'
> RecvQEntries = '(monitor)'
> StaticConnectIdentifier =
> '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=fitsdbs1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=fitsdb1_DGMGRL)(INSTANCE_NAME=fits1)(SERVER=DEDICATED)))'
> StandbyArchiveLocation = '/backup/itsdb/archives'
> AlternateLocation = ''
> LogArchiveTrace = '0'
> LogArchiveFormat = '%t_%s_%r.arc'
> TopWaitEvents = '(monitor)'
>
> Database Status:
> SUCCESS
>
>
>
>
>
> I may be missing something, but I didn't see any differences in the config
> between the two standbys.
>
>
>
> Sandy
>
>
>
> On Fri, Jan 4, 2019 at 3:11 AM Neil Chandler <neil_chandler_at_hotmail.com>
> wrote:
>
> Sandy,
>
>
>
> So if you do a manual "alter system archive log current" on the Primary,
> it catches up?
>
>
>
> I would look at the config parameters to compare the configuration. Could
> you share the configuration info (db/server names suitably redacted):
>
>
>
> dgmgrl> show configuration verbose
>
> dgmgrl> show database verbose <primary|standby2|standby2>
>
>
>
>
>
> regards
>
>
>
> Neil Chandler
>
> Database Guy, Knows Things.
> ------------------------------
>
> *From:* Sandra Becker <sbecker6925_at_gmail.com>
> *Sent:* 03 January 2019 22:49
> *To:* John Thomas
> *Cc:* Andrew Kerber; Neil Chandler; oracle-l
> *Subject:* Re: One primary with two physical standbys exhibiting
> different behavior with regard to lag
>
>
>
> My understanding is the network setup is the same between the standbys.
> I didn't look at the network right away. :-) I made the changes
> suggested by Neil, but I'm still seeing the delay. Before I did a manual
> log switch, the delay was over 30 minutes. Not good for this critical
> production standby.
>
>
>
> Sandy
>
>
>
> On Thu, Jan 3, 2019 at 3:36 PM John Thomas <jt2354_at_gmail.com> wrote:
>
> There's no excessive delay between your primary and the lagging standby is
> there? Smaller pipe? Lots of network retries?
>
>
>
> Probably the second thing you checked...
>
>
>
> Regards,
>
>
>
> John
>
>
>
> On Thu, 3 Jan 2019 at 22:10, Sandra Becker <sbecker6925_at_gmail.com> wrote:
>
> Thanks, Andrew. That's one of the first things I checked. It's the same
> on both standbys.
>
>
>
> Sandy
>
>
>
> On Thu, Jan 3, 2019 at 2:47 PM Andrew Kerber <andrew.kerber_at_gmail.com>
> wrote:
>
> Neil most likely spotted the problem. But you should also check to make
> sure that the protection mode is the same on both standbys.If the instance
> that is behind is using the maximum performance (async) mode it can run a
> ways behind the primary.
>
>
>
> On Thu, Jan 3, 2019 at 3:36 PM Neil Chandler <neil_chandler_at_hotmail.com>
> wrote:
>
> Sandy,
>
>
>
> Have you checked the Standby Redo logs? There's a slight (annoying) change
> in Oracle 12.1 onwards which means that Standby Redo logs get created with
> Thread 0 instead of Thread 1 by default (for a single instance database).
> Redo can only use Standby Redo when the threads are the same. If this is
> RAC you need Standby Redo for each thread - and you must have 1 more
> Standby Redo than Online Redo for each thread.
>
>
>
> By coincidence, I wrote a blog post about this 10 minutes ago.
>
>
>
> https://chandlerdba.com/2019/01/03/data-guard-unexpected-lag/
>
>
>
> regards
>
>
>
> Neil Chandler
>
> Database Guy. Knows Things.
>
>
> ------------------------------
>
> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> behalf of Sandra Becker <sbecker6925_at_gmail.com>
> *Sent:* 03 January 2019 20:29
> *To:* oracle-l
> *Subject:* One primary with two physical standbys exhibiting different
> behavior with regard to lag
>
>
>
> Oracle 12.1.0.2
>
> RHEL7
>
>
>
> To begin with, I have not worked much at all with standby databases, so my
> knowledge is somewhat lacking.
>
>
>
> For business reasons, we have a primary database with two physical
> standbys. Everything is configured in dgmgrl and enabled. Monitoring with
> EM13c is reporting the lag times, so all looks good for basic setup and
> monitoring. We seem to have significant lag at times on one of the
> standbys, as much as 20 minutes. When looking at v$managed_standby, we see
> the status as "WAIT_FOR_LOG". The other standby never seems to be more
> that a few seconds behind, if at all, and the status is "APPLYING_LOG".
>
>
>
> Is this normal? I've been researching, but haven't found an answer yet.
> I didn't create or start the standby databases, so I don't have any idea
> what was actually done that could be causing this behavior. Any
> suggestions would be appreciated.
>
>
>
> Thank you,
>
>
> --
>
> Sandy B.
>
>
>
> --
>
> Andrew W. Kerber
>
> 'If at first you dont succeed, dont take up skydiving.'
>
>
>
> --
>
> Sandy B.
>
>
>
> --
>
> Sandy B.
>
>
>
> --
>
> Sandy B.
>
>
>
> --
>
> Sandy B.
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 04 2019 - 20:43:08 CET

Original text of this message