and yet some more feedback.....
Sun Admits to Memory Problem
By Jaikumar Vijayan
FRAMINGHAM, 28 August, 2000
Problems with a memory component that Sun Microsystems Inc. has been quietly trying to fix for the past several months are continuing to plague some large users of Sun's Ultra Enterprise Unix servers. And Sun has gone to extraordinary lengths to keep its customers quiet about the issue.
The problem involves an external memory cache on Sun's UltraSPARC II microprocessor module. Under certain conditions, it has been triggering system failures and frequent server reboots at dozens of customer locations.
Sun Executive Vice President John Shoemaker this week acknowledged that the company has been grappling with memory-related problems on "a few dozen" of its Ultra Enterprise servers for nearly a year.
Sun customers who have been affected by the problem are unwilling to speak openly about it because Sun has persuaded many of them to sign nondisclosure agreements, said Tom Henkel, an analyst at Gartner Group Inc. in Stamford, Conn.
The nondisclosure agreements were apparently offered with a claim that signing them would bolster Sun's commitment to resolving the problem quickly, Henkel said. Sun customers began reporting the problem as long as 18 months ago, he said.
Shoemaker this week acknowledged that it may have been a bad idea for Sun to get its users to sign nondisclosure agreements. But he said the company took that measure only because Sun itself was struggling to pinpoint a reason for the system failures. He added that Sun has stopped requiring such agreements.
The long-standing nature of the problem and Sun's handling of the issue raise troubling questions about the quality of Sun's hardware and support, Henkel said.
One high-profile customer that has had very public problems with Sun hardware is eBay Inc. The online auctioneer has suffered a series of hardware-related outages over the past year, including one this week. It is unclear whether eBay's problems are related to the memory issue, however.
Gartner plans soon to release an advisory on the memory component issue, updating one released in November, because of continued and "frequent client complaints of persistent downtime" caused by the problem.
Sun insisted this week that the problem hasn't caused any data loss for customers. But the frequency of reboots disrupts availability and can cause data loss if applications don't restart properly, users said.
In the past year, Henkel said, he has talked with at least 50 Sun customers who complained of hardware reliability issues caused by defective memory. Systems affected by the problem appear to be those based on 400-MHz UltraSPARC-II CPU modules using either a 4MB or 8MB cache.
"There are a lot of very unhappy campers out there," Henkel said. "Sun has been experimenting for too long now to find a solution to this problem."
Meta Group Inc. in Stamford, Conn., also has clients that have experienced the problem.
"There was a rash of reliability issues relating to this problem in the March-to-April time frame," though none since then, said Meta Group analyst Brian Richardson. Eight out of 20 of Meta's large Sun accounts reported the problem, Richardson said.
According to Shoemaker, the issue has triggered a massive overhaul of Sun's quality processes and has already directly resulted in about eight major hardware and software changes being incorporated into Sun's Ultra Enterprise server line.
Sun has also put in place far more rigorous quality and availability testing of its products and is mandating more stringent audits of customer sites, environmental conditions and planned configurations before taking orders on its high-end servers, Shoemaker said.
By year's end, Sun will release a mirrored memory module that should address this issue once and for all, Shoemaker added. In the past several months, Sun has also been in direct contact with the CIOs at several of the affected companies to explain Sun's new quality initiative, he said.
"This has been a watershed event for Sun," Shoemaker said, adding that the company has moved from the back of the class to class leader with respect to quality.
But according to an MIS manager in North Carolina who has experienced the memory problem and who spoke on condition of anonymity, Sun has offered no explanation for the problems. "Sun has not disclosed any information to me about their memory issues - not even a brief description," the manager said.
In the past three months, all of the manager's six Sun servers have crashed because of memory-related problems, he said. In each instance, Sun swapped out entire CPU modules but offered no explanation for doing so, he said.
A user at a Midwestern manufacturing company, who also spoke on condition of anonymity, had a similar experience.
"As soon as we reported the issue to Sun, the affected processors were replaced under service contract," he said.
The company was able to resolve the problem by rearranging "our data center with the express purpose of lowering system temperatures," he said. "The systems run 10 to 15 degrees Fahrenheit cooler than before, and we haven't seen a problem since."
According to Shoemaker, Sun hasn't been able to narrow the problem to any one specific cause. Sun believes the problems may have been caused by a combination of factors, including defective components from one of Sun's suppliers, poor packaging of the memory chips on the system boards and environmental factors.
Meghan Holohan contributed to this report.
"Wasserman, Sara" <sjwasserman_at_pscnet.com> on 08/09/2000 06:50:37
Please respond to ORACLE-L_at_fatcity.com
To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
cc: (bcc: GRANT G HOLYOAKE/NSO/CSDA)
Subject: RE: Sun Boxes Crashing
Sun's memory cache problem:
http://www.infoworld.com/articles/hn/xml/00/09/05/000905hnsunmemory.xml
> -----Original Message-----
> From: Rama Malladi [SMTP:rmalladi_at_inteliant.com]
> Sent: Wednesday, September 06, 2000 2:41 PM
> To: Multiple recipients of list ORACLE-L
> Subject: Sun Boxes Crashing
>
> We have several Sun boxes (Solaris 2.6) running Oracle 8, 8i. One of the
> boxes (description given below) kept rebooting, and this machine happens to
> run one of the most critical billing systems (Murphy's law!).
>
> Overall, this machine rebooted some 40 times in a period of 2 months, and
> on some nights it rebooted as many as 10 times! Our SysAdmin contacted Sun
> engineers, and they never told us exactly what the problem was; they kept
> replacing CPUs, memory boards, SCSI cards etc. This happened several
> times, and last week there was an article in Computer Weekly magazine
> saying several customers were having this kind of problem on Sun boxes and
> Sun tried to hush up the matter ...!!
>
> Has anybody else faced this kind of situation?
>
> Just curious ...
> Rama
>
> =================================
> System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
> SunOS uscaelmux06 5.6 Generic_105181-21 sun4u sparc SUNW,Ultra-Enterprise
>
> --
> Author: Rama Malladi
> INET: rmalladi_at_inteliant.com
>
> Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
> San Diego, California -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from). You may
> also send the HELP command for other information (like subscribing).
--
Author: Wasserman, Sara
INET: sjwasserman_at_pscnet.com

Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
San Diego, California -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L

Received on Thu Sep 07 2000 - 17:28:01 CDT