Re: Client sqlplus SELECT + SPOOL knocks down server network

From: hpuxrac <johnbhurley_at_sbcglobal.net>
Date: Tue, 11 Mar 2008 16:14:53 -0700 (PDT)
Message-ID: <4c4b16ed-05bf-4321-8cfe-35090d8b2a74@8g2000hse.googlegroups.com>


On Mar 11, 9:59 am, Carlos <miotromailcar..._at_netscape.net> wrote:
> On 11 mar, 09:43, Carlos <miotromailcar..._at_netscape.net> wrote:
>
>
>
>
>
> > On 10 mar, 18:27, DA Morgan <damor..._at_psoug.org> wrote:
>
> > > Carlos wrote:
> > > > On 10 mar, 14:08, DA Morgan <damor..._at_psoug.org> wrote:
> > > >> Carlos wrote:
> > > >>> On 10 mar, 13:21, DA Morgan <damor..._at_psoug.org> wrote:
> > > >>>> Carlos wrote:
> > > >>>>> Hello all:
> > > >>>>> I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise
> > > >>>>> Linux 5 server.
> > > >>>>> This morning someone was issuing a SELECT from a windows client via
> > > >>>>> ODBC and suddenly all the server communications broke down: even no
> > > >>>>> ping from or to that server worked.
> > > >>>>> Tried stop&start network but no avail. The server appears to be dead
> > > >>>>> at communications until reboot.
> > > >>>>> So I tried to repeat the SELECT from sqlplus on a windows client to a
> > > >>>>> OS file with SPOOL and all the communications on the server are gone
> > > >>>>> once again. Ping only answers to 127.0.0.1 or its own IP address, and
> > > >>>>> Linux is telling me that the network card is OK. Stop&start network
> > > >>>>> didn't fix the problem either.
> > > >>>>> Client sqlnet.log shows:
> > > >>>>> Fatal NI connect error 12170.
> > > >>>>>   VERSION INFORMATION:
> > > >>>>>    TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
> > > >>>>>    Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version
> > > >>>>> 10.2.0.1.0 - Production
> > > >>>>>   Time: 10-MAR-2008 12:11:59
> > > >>>>>   Tracing not turned on.
> > > >>>>>   Tns error struct:
> > > >>>>>     ns main err code: 12535
> > > >>>>>     TNS-12535: Message 12535 not found; No message file for
> > > >>>>> product=NETWORK, facility=TNS
> > > >>>>>     ns secondary err code: 12560
> > > >>>>>     nt main err code: 505
> > > >>>>>     TNS-00505: Message 505 not found; No message file for
> > > >>>>> product=NETWORK, facility=TNS
> > > >>>>>     nt secondary err code: 60
> > > >>>>>     nt OS err code: 0
> > > >>>>>   Client address: <unknown>
> > > >>>>> The select is a two column (varchar2(9), char(1)) select of 11 million
> > > >>>>> rows aprox.
> > > >>>>> Could anybody tell me where to search next? (I didn't find anything at
> > > >>>>> metalink).
> > > >>>>> TIA.
> > > >>>>> Cheers.
> > > >>>>> Carlos.
> > > >>>> Some of what you've written seems self-conflicting.
> > > >>>> You start off telling us you've 10.2.0.3 and then everything
> > > >>>> thereafter indicates 10.2.0.1.
> > > >>>> Then you say "dead until reboot." Does that mean it is now
> > > >>>> working fine or not?
> > > >>>> How about queries using Oracle's native interface SQLNET
> > > >>>> rather than ODBC?
> > > >>>> Ping only answers to 127.0.0.1, I presume on that machine,
> > > >>>> have you verified that other addresses exist?
> > > >>>> Have you performed an ipconfig /renew?
> > > >>>> Perform an ipconfig /all ... what does it show?
> > > >>>> Are you using DHCP?
> > > >>>> Can you ping the server from elsewhere on the network?
> > > >>>> What are your firewall settings?
> > > >>>> What tool is connecting using ODBC?
> > > >>>> --
> > > >>>> Daniel A. Morgan
> > > >>>> Oracle Ace Director & Instructor
> > > >>>> University of Washington
> > > >>>> damor..._at_x.washington.edu (replace x with u to respond)
> > > >>>> Puget Sound Oracle Users Groupwww.psoug.org
> > > >>> Hello Daniel:
> > > >>>> You start off telling us you've 10.2.0.3 and then everything
> > > >>>> thereafter indicates 10.2.0.1.
> > > >>> Yup: 10.2.0.3 is the server on EL5. 10.2.0.1 was the windows client
> > > >>> from which I did the test.
> > > >>>> Then you say "dead until reboot." Does that mean it is now
> > > >>>> working fine or not?
> > > >>> Working fine after reboot... until I issue the SELECT again.
> > > >>>> How about queries using Oracle's native interface SQLNET
> > > >>>> rather than ODBC?
> > > >>> This is what I did on the windows 10.2.0.1 client with sqlplus.
> > > >>>> Ping only answers to 127.0.0.1, I presume on that machine,
> > > >>>> have you verified that other addresses exist?
> > > >>> Yes. If I ping the server network IP address from the server itself,
> > > >>> it answers OK, but no answer from any other IP in the network. If I
> > > >>> ping the server from another machine there is no answer.
> > > >>>>> Are you using DHCP?
> > > >>> Nope.
> > > >>>> Can you ping the server from elsewhere on the network?
> > > >>> Nope
> > > >>>> What are your firewall settings?
> > > >>> No firewall. Isolated environment.
> > > >>>> What tool is connecting using ODBC?
> > > >>> Visual FoxPro. But, as said before, sqlplus bring the server
> > > >>> communications to its knees too.
> > > >>> I'll try the ipconfig (after repeating the whole process from startup
> > > >>> the server again).
> > > >>> Thank you very much.
> > > >>> Cheers.
> > > >>> Carlos.
> > > >> A few additional thoughts.
>
> > > >> 1. Upgrade the client to 10.2.0.3
> > > >> 2. Post the SQL statement: I'd like to see it
> > > >> 3. Capture the statement and see if it contains any non-visible
> > > >>     characters when Fox sends it through
> > > >> 4. What ODBC driver? Try changing drivers
>
> > > >> Is it only this one SQL statement or can more than one statement
> > > >> do it? What happens if you take the problem statement and
> > > >> reconstruct it from scratch in SQL*Plus starting with the
> > > >> simplest SELECT single_column FROM single_table and then
> > > >> incrementally add back its full functionality? When does it
> > > >> break?
> > > >> --
> > > >> Daniel A. Morgan
> > > >> Oracle Ace Director & Instructor
> > > >> University of Washington
> > > >> damor..._at_x.washington.edu (replace x with u to respond)
> > > >> Puget Sound Oracle Users Groupwww.psoug.org
>
> > > > Daniel:
>
> > > > I am doing the test from sqlplus (no ODBC this time)
>
> > > >> 2. Post the SQL statement: I'd like to see it
>
> > > > select /*+ full(a) */
> > > > a.col1,
> > > > a.col2
> > > > from schema.table_name a;
>
> > > > It starts displaying/spooling and then breaks with:
>
> > > > ...
> > > > 955800207 T
> > > > ERROR:
> > > > ORA-03135: la conexión ha perdido contacto
>
> > > > 10078290 filas seleccionadas.
>
> > > > ERROR:
> > > > ORA-03114: no conectado a ORACLE
>
> > > > (There are 11339771 rows in the table).
>
> > > > From here on, there is no ping from or to the server.
>
> > > > ifconfig on the server shows:
>
> > > > RX packets:674109 errors:0 dropped:0 overruns:0 frame:0
> > > > TX packets:674109 errors:0 dropped:0 overruns:0 carrier:0
> > > > collisions:0 txqueuelen:1000
> > > > RX bytes:53271778 (50.8 MiB) TX bytes:259126296 (247.1 MiB)
> > > > Interrupt:169
>
> > > > And it's time to reboot again...
>
> > > > Cheers.
>
> > > > Carlos.
>
> > > Aha:
>
> > >https://metalink.oracle.com/metalink/plsql/f?p=130:14:114792536629517...
>
> > > The title is: Subject:  ORA-03113 Or ORA-03135 Reported for Failover
> > > with Network Outage
>
> > > The doc id is: Note:455771.1
>
> > > The date is: 10-MAR-2008
>
> > > That's today.
>
> > > The bug number is:
> > > Bug 5758870 Select Type TAF fails with ORA-3113 If Network Cable Pulled off
>
> > > The solution is:
> > > Problem is fixed from 11g onwards and in 10.2.0.4 Patch set release.
> > > One off patchs for some platforms are available.
> > > Apply one-off Patch 5758870 for your platform, if available.
>
> > > Are you on a RAC cluster?
> > > --
> > > Daniel A. Morgan
> > > Oracle Ace Director & Instructor
> > > University of Washington
> > > damor..._at_x.washington.edu (replace x with u to respond)
> > > Puget Sound Oracle Users Groupwww.psoug.org
>
> > Daniel:
>
> > First of all: thank you very much for your attention and time.
>
> > > Are you on a RAC cluster?
>
> > Nope. Single Instance.
>
> > I've taken a look at the metalink note that you kindly pointed, but no
> > TAF here.
>
> > I'm beggining to suspect of a faulty network card. I've noticed that
> > the problem occurs when there is a long information transfer from
> > server to client, but not based on size, but on the time the transfer
> > takes. I checked with FTP and big files, but it takes not too much
> > time to transfer (the network card and switch are 1000Mbps) and the
> > error does not arise. But when one sqlplus session takes too long to
> > pull out a query the error appears in a random fashion: sometimes at
> > 10M rows, sometimes at 7M rows, even at 700000 rows.
>
> > The server itself works OK as far as its 'local' network activity
> > goes: the listener is up & running and the enterprise manager works
> > too (from a local browser). The same query from a local sqlplus (with
> > listener connection, no bequeath) works too.
>
> > Michael:
>
> > >>This can be tested by moving the system cable to a different switch port
>
> > Yes. I thought of that too (switch issues). But the network boy swears
> > everything is OK. I'll try to convince him to test on other switch.
>
> > Thank you all for the suggestions and hints.
>
> > Cheers.
>
> > Carlos.
>
> Solved (or so it seems)...
>
> After a lot of tests, I discovered that the problem always appeared
> when the client and the server were connected to the same switch
> (!!!). When they were connected to different switches the query wenk
> OK (I think it's because there where 100mbps sections between them: I
> suspect of some kind of overflow issue at the network card).
>
> I searched for a linux driver for the network card (Marvell Yukon).
> After downloading it, I moved my way through the installation process
> (compiling with the Linux source files) and finally 'modprobed' the
> driver...
>
> ...and the problem has not appeared again since then (fingers
> crossed)!
>
> Cheers.
>
> Carlos

We are just starting to take a look at linux.

Modprobed the driver ... what does that mean? Received on Tue Mar 11 2008 - 18:14:53 CDT

Original text of this message