Remote "Local Address" of netstat and SCAN listener and VIP on different hosts

From: Yong Huang <yong321_at_yahoo.com>
Date: Mon, 23 Apr 2012 12:51:16 -0700 (PDT)
Message-ID: <1335210676.21370.YahooMailClassic_at_web181216.mail.ne1.yahoo.com>



2-node Oracle 11.2.0.1 RAC, RHEL 5.7 x86_64, kernel 2.6.18-274.7.1.el5

One of the 3 SCAN listeners listens on an IP which exists on the other node of this 2-node RAC.

C:\>nslookup scancs4.<domainname>
...
Name: scancs4.<domainname>
Addresses: 10.111.76.85
          10.111.76.84
          10.111.76.86

dcsrpcora4a ~ $ ifconfig | egrep '10.111.76.84|10.111.76.85|10.111.76.86'
          inet addr:10.111.76.85  Bcast:10.111.76.127  Mask:255.255.255.128
          inet addr:10.111.76.84  Bcast:10.111.76.127  Mask:255.255.255.128

dcsrpcora4b ~ $ ifconfig | egrep '10.111.76.84|10.111.76.85|10.111.76.86'
          inet addr:10.111.76.86  Bcast:10.111.76.127  Mask:255.255.255.128

The problem is that the 3 IPs, each supposedly backed by one Oracle SCAN listener, do not all have SCAN listeners listening on them locally. Specifically, 10.111.76.84 is configured on node a but has no listener there, while on node b there *is* a SCAN listener that claims to be listening on that IP. (Note that the 4th field of `netstat -an' is "Local Address".)

dcsrpcora4b ~ $ netstat -anp 2>/dev/null | grep 10.111.76.84 <-- this IP exists on node a
tcp        0      0 10.111.76.84:1521           0.0.0.0:*                   LISTEN      15130/tnslsnr
tcp        0      0 10.111.76.70:55578          10.111.76.84:1521           ESTABLISHED 12061/ora_pmon_orac
dcsrpcora4b ~ $ ps -fp 15130
UID        PID  PPID  C STIME TTY          TIME CMD
oracle   15130     1  0 Jan27 ?        00:04:40 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit

How can a listener process running on its own server (node b) claim to be listening on an IP which is physically located on a different server (node a)? On node a, everything looks normal from the OS perspective, and there actually is a process, named, listening on 10.111.76.84 using port 53. (Not sure why named uses a virtual interface created by Oracle.)

[root_at_dcsrpcora4a ~]# netstat -anp | grep 10.111.76.84
tcp        0      0 10.111.76.84:53             0.0.0.0:*                   LISTEN      5001/named
udp        0      0 10.111.76.84:53             0.0.0.0:*                               5001/named

We know a SCAN VIP can "float" or relocate between the 2 nodes. But at any given point in time, when netstat says a specific IP is local to a specific host, that IP must be provided by that host (as shown by ifconfig), not by a different host, regardless of what magic Oracle's SCAN listener software does.
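
That expectation can be checked mechanically on each node. What follows is only a minimal sketch, assuming the Linux net-tools output formats shown in this message (`netstat -an` with Local Address in the 4th field, `ifconfig` printing `inet addr:`). The function name `nonlocal_listen_ips` is made up for illustration and is not part of any Oracle or OS tool.

```shell
#!/bin/bash
# Print every IPv4 address that has a TCP LISTEN socket in the netstat
# output (file $1) but is not configured on any interface according to
# the ifconfig output (file $2). Wildcard and loopback listeners are skipped.
nonlocal_listen_ips() {
    awk '
        # First file given to awk: ifconfig output.
        # Collect addresses from fields of the form "addr:a.b.c.d".
        FILENAME == ARGV[1] {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^addr:[0-9]/) {
                    a = $i
                    sub(/^addr:/, "", a)
                    have[a] = 1
                }
            next
        }
        # Second file: netstat output. Field 4 is Local Address (ip:port).
        $1 == "tcp" && $6 == "LISTEN" {
            ip = $4
            sub(/:[0-9]+$/, "", ip)
            if (ip != "0.0.0.0" && ip !~ /^127\./ && !(ip in have) && !seen[ip]++)
                print ip
        }
    ' "$2" "$1"
}
```

With the output saved to files, for example `netstat -an > netstat.out; ifconfig > ifconfig.out`, the call would be `nonlocal_listen_ips netstat.out ifconfig.out`. Given the listings above, node b should report 10.111.76.84 and node a should report nothing.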

Checking with srvctl:

dcsrpcora4a ~ $ srvctl status scan -i 3
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node dcsrpcora4a
dcsrpcora4a ~ $ srvctl status scan_listener -i 3
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node dcsrpcora4b <-- not dcsrpcora4a!
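
The same comparison can be scripted for every SCAN ordinal. This is a rough sketch that assumes the `srvctl status` message format shown in this post ("... is running on node <name>"). `running_node` and `check_scan_pair` are hypothetical helper names, not srvctl features.

```shell
#!/bin/bash
# Extract the node name from srvctl status text read on stdin, e.g.
# "SCAN VIP scan3 is running on node dcsrpcora4a" -> dcsrpcora4a
running_node() {
    sed -n 's/.* is running on node \([^ ]*\).*/\1/p' | head -n 1
}

# Compare the node hosting a SCAN VIP with the node hosting its listener.
# $1 = ordinal number
# $2 = output of "srvctl status scan -i N"
# $3 = output of "srvctl status scan_listener -i N"
check_scan_pair() {
    local vip_node lsnr_node
    vip_node=$(printf '%s\n' "$2" | running_node)
    lsnr_node=$(printf '%s\n' "$3" | running_node)
    if [ -z "$vip_node" ] || [ -z "$lsnr_node" ]; then
        echo "scan$1: UNKNOWN (could not parse a node name)"
    elif [ "$vip_node" = "$lsnr_node" ]; then
        echo "scan$1: OK (both on $vip_node)"
    else
        echo "scan$1: MISMATCH (VIP on $vip_node, listener on $lsnr_node)"
    fi
}

# On a cluster node (not run here), the three ordinals could be checked with:
# for i in 1 2 3; do
#     check_scan_pair "$i" "$(srvctl status scan -i "$i")" \
#                          "$(srvctl status scan_listener -i "$i")"
# done
```

Parsing the text as arguments rather than calling srvctl inside the function keeps the check usable against saved output from a node that is no longer reachable.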

On another RAC cluster, I tried 'srvctl relocate scan' and 'srvctl relocate scan_listener'. In both cases, the SCAN VIP and the SCAN listener were relocated *together* to a different node. I could not reproduce relocating one without the other.

I believe that to correct the problem we have now, we can simply use srvctl to relocate either the VIP (...84) or the SCAN listener (LISTENER_SCAN3). But I'd like to find out what caused this situation.

Yong Huang

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 23 2012 - 14:51:16 CDT
