Re: ADG lag after upgrading to 12.1

From: Neil Chandler <neil_chandler_at_hotmail.com>
Date: Tue, 19 Mar 2019 10:56:08 +0000
Message-ID: <DB7PR10MB2090326FC908601DF57BF48485470_at_DB7PR10MB2090.EURPRD10.PROD.OUTLOOK.COM>



So nothing obvious there, unless you happen to be running on SMT4-level CPUs. Oracle is fine when it's on SMT2 (cpu 0 and 1) as each thread has its own L2 cache, but if it scales to use SMT4 the AIX server can struggle to be effective as the L2 is shared between threads, so the L2 is effectively halved and you get an increase in L2 cache misses. I'd compare the lag at 20% used and 60% used to see if it is really 60% used or effectively running out. Can you check the AIX metrics to see if the threads are running on SMT4 a lot, i.e. on the 3rd and 4th CPUs per processor? (E.g. use mpstat -s to see the Processor->SMT cpu relationship, and something like mpstat -d 15 to see which are active.)

I assume you're pulling the lag information from V$DATAGUARD_STATS? As you are using Active DataGuard, V$STANDBY_EVENT_HISTOGRAM should be populated with useful lag information.
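
As a rough sketch of that check, something like the following, run on the standby, would show the current lag figures and the recorded distribution of apply lag. The column lists are written from memory of the 12.1 reference, so treat them as assumptions and verify them against your version.

```sql
-- Current transport and apply lag as computed on the standby.
SELECT name, value, unit, time_computed, datum_time
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');

-- Histogram of apply lag values recorded on the standby;
-- COUNT is the number of times each TIME/UNIT value was seen.
SELECT name, time, unit, count, last_time_updated
FROM   v$standby_event_histogram
WHERE  name = 'apply lag'
AND    count > 0;
```

If the histogram is empty, it is worth confirming that the standby is open read-only with redo apply running before reading anything into it.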

It might be worth checking the settings in LOG_ARCHIVE_DEST_n to see if you are using AFFIRM or NOAFFIRM. Switching to NOAFFIRM would confirm whether the lag was caused by waiting for the write to the SRL to be acknowledged (also confirm there are no parameters in there like DELAY=nnn), although that may well be classed as transport lag.
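
A minimal sketch of how those attributes could be checked from the primary, assuming the standby is still destination 2 as in your earlier output (adjust the parameter name if it is not):

```sql
-- Transport attributes for each standby destination.
-- AFFIRM shows YES/NO; DELAY_MINS is non-zero if DELAY= has been set.
SELECT dest_id, status, transmit_mode, affirm, delay_mins, net_timeout
FROM   v$archive_dest
WHERE  target = 'STANDBY';

-- The raw parameter string, to catch any attribute not exposed above.
SELECT value
FROM   v$parameter
WHERE  name = 'log_archive_dest_2';
```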

Neil.



From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of Rich J <rjoralist3_at_society.servebeer.com>
Sent: 18 March 2019 13:18
To: oracle-l_at_freelists.org
Subject: Re: ADG lag after upgrading to 12.1

On 2019/03/17 14:58, Neil Chandler wrote:

 A lag of 20-30 seconds seems very high on the same server, with no network latency. What resource contention do you have on the server? Any CPU starvation? Slow disk?

I'm thankfully swimming in CPU -- 4 core POWER7, SMT4, hanging around 20% utilization with peaks below 60%, except when parallel RMAN incremental backups hit their daily run. Disk is XIV SAN, where I don't come near my pre-live tests of 4-5GB/s with a 50/50 read/write split. Basically, no hardware performance issues -- I've been spoiled there... :)

But since this hasn't changed for 5 years, it seems highly likely to be something related to the DB upgrade.

Any chance you could provide the config?
Show configuration verbose
Show database verbose "dbname"

I don't run the broker for various reasons, including issues I had at setup 5 years ago with EM12c BP1, IIRC. But maybe these 2 queries run from the Primary will give you what you're looking for:

SELECT
    vd.database_role,
    vd.force_logging,
    vd.flashback_on,
    vd.log_mode,
    vd.open_mode,
    vd.guard_status,
    vd.protection_mode,
    vd.switchover_status,
    vad.dest_id
FROM
    v$database vd, v$archive_dest vad
WHERE
    vad.target = 'STANDBY';

DATABASE_ROLE FORCE_LOGGING FLASHBACK_ON LOG_MODE   OPEN_MODE  GUARD_STATUS PROTECTION_MODE     SWITCHOVER_STATUS DEST_ID
------------- ------------- ------------ ---------- ---------- ------------ ------------------- ----------------- -------
PRIMARY       YES           NO           ARCHIVELOG READ WRITE NONE         MAXIMUM PERFORMANCE TO STANDBY              2

SELECT
    ad.dest_id, ad.status, ad.target, ad.archiver, ad.process,
    ad.register, ad.transmit_mode, ads.gap_status
FROM
    v$archive_dest ad
    JOIN v$archive_dest_status ads ON ad.dest_id = ads.dest_id
WHERE
    ad.dest_id = 2;

DEST_ID STATUS TARGET  ARCHIVER PROCESS REGISTER TRANSMIT_MODE GAP_STATUS
------- ------ ------- -------- ------- -------- ------------- ----------
      2 VALID  STANDBY LGWR     LGWR    YES      ASYNCHRONOUS  NO GAP

Do you have figures for the amount of redo produced at 11.2 and 12.1?

Hmmm, that sounds like a metric I should be actively monitoring, but am not. This might be in my EM repository, but a quick look at V$ARCHIVED_LOG shows no big difference after the upgrade, with an average of about 64GB/day.
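
For reference, a sketch of the kind of V$ARCHIVED_LOG query that produces a per-day figure. The dest_id = 1 filter and the 30-day window are assumptions for illustration, and how far back the view reaches depends on how long the controlfile keeps its records.

```sql
-- Archived redo per day in GB, for the local destination only.
-- Without the dest_id filter each log is counted once per destination.
SELECT TRUNC(completion_time)                              AS log_day,
       ROUND(SUM(blocks * block_size) / POWER(1024, 3), 1) AS redo_gb
FROM   v$archived_log
WHERE  dest_id = 1
AND    completion_time >= SYSDATE - 30
GROUP  BY TRUNC(completion_time)
ORDER  BY log_day;
```

Widening the window to span the upgrade date would let the 11.2 and 12.1 days be compared side by side.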

Thoughts?

Thanks!
Rich

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Mar 19 2019 - 11:56:08 CET
