Re: Client sqlplus SELECT + SPOOL knocks down server network

From: DA Morgan <damorgan_at_psoug.org>
Date: Mon, 10 Mar 2008 10:27:45 -0700
Message-ID: <1205170062.458836@bubbleator.drizzle.com>


Carlos wrote:
> On 10 mar, 14:08, DA Morgan <damor..._at_psoug.org> wrote:

>> Carlos wrote:
>>> On 10 mar, 13:21, DA Morgan <damor..._at_psoug.org> wrote:
>>>> Carlos wrote:
>>>>> Hello all:
>>>>> I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise
>>>>> Linux 5 server.
>>>>> This morning someone was issuing a SELECT from a windows client via
>>>>> ODBC and suddenly all the server communications broke down: even no
>>>>> ping from or to that server worked.
>>>>> Tried stop&start network but no avail. The server appears to be dead
>>>>> at communications until reboot.
>>>>> So I tried to repeat the SELECT from sqlplus on a windows client to a
>>>>> OS file with SPOOL and all the communications on the server are gone
>>>>> once again. Ping only answers to 127.0.0.1 or its own IP address, and
>>>>> Linux is telling me that the network card is OK. Stop&start network
>>>>> didn't fix the problem either.
>>>>> Client sqlnet.log shows:
>>>>> Fatal NI connect error 12170.
>>>>>   VERSION INFORMATION:
>>>>>    TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
>>>>>    Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version
>>>>> 10.2.0.1.0 - Production
>>>>>   Time: 10-MAR-2008 12:11:59
>>>>>   Tracing not turned on.
>>>>>   Tns error struct:
>>>>>     ns main err code: 12535
>>>>>     TNS-12535: Message 12535 not found; No message file for
>>>>> product=NETWORK, facility=TNS
>>>>>     ns secondary err code: 12560
>>>>>     nt main err code: 505
>>>>>     TNS-00505: Message 505 not found; No message file for
>>>>> product=NETWORK, facility=TNS
>>>>>     nt secondary err code: 60
>>>>>     nt OS err code: 0
>>>>>   Client address: <unknown>
>>>>> The select is a two column (varchar2(9), char(1)) select of 11 million
>>>>> rows aprox.
>>>>> Could anybody tell me where to search next? (I didn't find anything at
>>>>> metalink).
>>>>> TIA.
>>>>> Cheers.
>>>>> Carlos.
>>>> Some of what you've written seems self-conflicting.
>>>> You start off telling us you've 10.2.0.3 and then everything
>>>> thereafter indicates 10.2.0.1.
>>>> Then you say "dead until reboot." Does that mean it is now
>>>> working fine or not?
>>>> How about queries using Oracle's native interface SQLNET
>>>> rather than ODBC?
>>>> Ping only answers to 127.0.0.1, I presume on that machine,
>>>> have you verified that other addresses exist?
>>>> Have you performed an ipconfig /renew?
>>>> Perform an ipconfig /all ... what does it show?
>>>> Are you using DHCP?
>>>> Can you ping the server from elsewhere on the network?
>>>> What are your firewall settings?
>>>> What tool is connecting using ODBC?
>>>> --
>>>> Daniel A. Morgan
>>>> Oracle Ace Director & Instructor
>>>> University of Washington
>>>> damor..._at_x.washington.edu (replace x with u to respond)
>>>> Puget Sound Oracle Users Groupwww.psoug.org
>>> Hello Daniel:
>>>> You start off telling us you've 10.2.0.3 and then everything
>>>> thereafter indicates 10.2.0.1.
>>> Yup: 10.2.0.3 is the server on EL5. 10.2.0.1 was the windows client
>>> from which I did the test.
>>>> Then you say "dead until reboot." Does that mean it is now
>>>> working fine or not?
>>> Working fine after reboot... until I issue the SELECT again.
>>>> How about queries using Oracle's native interface SQLNET
>>>> rather than ODBC?
>>> This is what I did on the windows 10.2.0.1 client with sqlplus.
>>>> Ping only answers to 127.0.0.1, I presume on that machine,
>>>> have you verified that other addresses exist?
>>> Yes. If I ping the server network IP address from the server itself,
>>> it answers OK, but no answer from any other IP in the network. If I
>>> ping the server from another machine there is no answer.
>>>>> Are you using DHCP?
>>> Nope.
>>>> Can you ping the server from elsewhere on the network?
>>> Nope
>>>> What are your firewall settings?
>>> No firewall. Isolated environment.
>>>> What tool is connecting using ODBC?
>>> Visual FoxPro. But, as said before, sqlplus bring the server
>>> communications to its knees too.
>>> I'll try the ipconfig (after repeating the whole process from startup
>>> the server again).
>>> Thank you very much.
>>> Cheers.
>>> Carlos.
>> A few additional thoughts.
>>
>> 1. Upgrade the client to 10.2.0.3
>> 2. Post the SQL statement: I'd like to see it
>> 3. Capture the statement and see if it contains any non-visible
>>     characters when Fox sends it through
>> 4. What ODBC driver? Try changing drivers
>>
>> Is it only this one SQL statement or can more than one statement
>> do it? What happens if you take the problem statement and
>> reconstruct it from scratch in SQL*Plus starting with the
>> simplest SELECT single_column FROM single_table and then
>> incrementally add back its full functionality? When does it
>> break?
>> --
>> Daniel A. Morgan
>> Oracle Ace Director & Instructor
>> University of Washington
>> damor..._at_x.washington.edu (replace x with u to respond)
>> Puget Sound Oracle Users Groupwww.psoug.org

>
> Daniel:
>
> I am doing the test from sqlplus (no ODBC this time)
>
>> 2. Post the SQL statement: I'd like to see it

>
> select /*+ full(a) */
> a.col1,
> a.col2
> from schema.table_name a;
>
> It starts displaying/spooling and then breaks with:
>
> ...
> 955800207 T
> ERROR:
> ORA-03135: la conexión ha perdido contacto
>
> 10078290 filas seleccionadas.
>
> ERROR:
> ORA-03114: no conectado a ORACLE
>
> (There are 11339771 rows in the table).
>
> From here on, there is no ping from or to the server.
>
> ifconfig on the server shows:
>
> RX packets:674109 errors:0 dropped:0 overruns:0 frame:0
> TX packets:674109 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:53271778 (50.8 MiB) TX bytes:259126296 (247.1 MiB)
> Interrupt:169
>
> And it's time to reboot again...
>
> Cheers.
>
> Carlos.

Aha:

https://metalink.oracle.com/metalink/plsql/f?p=130:14:11479253662951723337::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,455771.1,1,1,1,helvetica

The title is: Subject: ORA-03113 Or ORA-03135 Reported for Failover with Network Outage

The doc id is: Note:455771.1

The date is: 10-MAR-2008

That's today.

The bug number is:
Bug 5758870 Select Type TAF fails with ORA-3113 If Network Cable Pulled off

The solution is:
Problem is fixed from 11g onwards and in 10.2.0.4 Patch set release. One off patchs for some platforms are available. Apply one-off Patch 5758870 for your platform, if available.

Are you on a RAC cluster?

-- 
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Mon Mar 10 2008 - 12:27:45 CDT

Original text of this message