Conference 7.286::digital

Title: The Digital way of working
Moderator: QUARK::LIONELON
Created: Fri Feb 14 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5321
Total number of notes: 139771

2459.0. "Customer rumor about DEC order entry system" by QUARK::LIONEL (Free advice is worth every cent) Thu Apr 08 1993 21:19

    I found the following on the Internet this evening.   Given that I've
    not heard a peep about anything like this, I'm inclined to write it
    off as a vicious rumor, but I'm sure many customers will believe it.
    Can anyone authoritatively confirm or deny this rumor?
    
    					Steve
    
Article: 10643
Path: dbased.nuo.dec.com!news.crl.dec.com!deccrl!decwrl!decwrl!olivea!uunet!cs.utexas.edu!zaphod.mps.ohio-state.edu!usc!news.service.uci.edu!unogate!mvb.saic.com!info-vax
From: [email protected] (Phil Rand)
Newsgroups: comp.os.vms
Subject: DEC order-entry down nationwide
Message-ID: <Pine.3.05.9304071251.A23536-a100000@paul>
Date: 7 Apr 93 19:03:51 GMT
Organization: Info-Vax<==>Comp.Os.Vms Gateway
Lines: 19
X-Gateway-Source-Info: Mailing List
 
I hear (3rd hand) that DEC's order entry application is down nationwide
(USA), this now being the 11th day.  Apparently they tried some kind of
system upgrade (what kind I don't know) and ran into trouble.  They're
back to processing orders on paper, and you can forget about tracking ship
dates. 
 
Does anybody know details?
 
Didn't SUN have a glitch something like this back in the 80's and nearly
put the company in the toilet?  I'm surprised this hasn't hit the Wall
Street Journal.  (Or maybe it has--I haven't seen today's.)
 
Makes you feel real confident about DEC's system integration skills...
 
--Phil
 
// Phil Rand                                   [email protected]
// Computer & Information Systems                  (206) 281-2428
// Seattle Pacific University, 3307 3rd Ave W, Seattle, WA  98119
T.R  Title  User  Personal Name  Date  Lines
2459.1. "WHERE DO WE KEEP THOSE TAPES????" by GJOVAX::SEVIC Thu Apr 08 1993 22:14, 2 lines
    If true, it sounds like nothing a restore of the old software couldn't
    handle.
2459.2. by SOLVIT::REDZIN::DCOX Fri Apr 09 1993 00:24, 6 lines
    The DECdirect order I placed last week was processed immediately and I
    received the goods (floppies) within a couple of days.  So.......if
    it's broke, please do not fix it. :-)
    
    Dave 
    
2459.3. by RCOCER::MICKOL (D-FENS) Fri Apr 09 1993 02:52, 10 lines
The Field Admin (FOCUS, AQS) cluster (GREAT1::) for the northeast has been up
and down since last weekend. More down than up. I think they upgraded AQS
(Automated Quote System) last weekend. It has caused a fair amount of hassle
for us field types. I've got half a dozen quotes in the queue I can't get to
customers. I thought AQS was the only application affected, but I may be
wrong. I'm sure some heads will roll over this one. 

Anyway, when customers see our lead times for PCs, they will probably assume
something is seriously broken... 

2459.4. by DPDMAI::DAWSON (t/hs+ws=Formula for the future) Fri Apr 09 1993 09:19, 6 lines
    
    		I am not sure of the status right now but Tuesday and
    Wednesday, the system for DECDirect was down.  Maybe it still is.
    
    
    Dave
2459.5. "GREAT1 IS down...." by ODIXIE::SCRIVEN Fri Apr 09 1993 10:53, 13 lines
    
    DecDirect and PCBYDEC's GREAT1 (Focus application) has been down since
    Tuesday Afternoon.  At least they have been UNABLE to process any
    orders or ship trace requests.  
    
    I'm sitting on about 10 orders that cannot be booked in the field.  I
    wonder what their excuse is.  The FOCUS systems in the field are
    working OK to the best of my knowledge......
    
    Turns those 45 to 90 day lead times on PC's into 90 to 120 I bet.  Just
    what we need.....
    
    Toodles.....JP
2459.6. "I placed a DECDirect Order 4/8/93 1530 Hrs" by MEMIT::YOUNG_J Fri Apr 09 1993 11:18, 6 lines
    I don't know about the systems _behind_ ordering, but I called
    DECDirect yesterday around 3:30 pm EDT and placed my orders.  I'm sure
    my customer response contact was in _some_ kind of system, 'cause he
    had to look up part numbers for a new item ...... and he found 'em.
    
    ... Maybe my order went in during one of the _up_ times???
2459.7. "DECdirect is up" by RTL::LAPINE Fri Apr 09 1993 11:24, 2 lines
The DECdirect system is up now.  Apparently (finally) came up last night.

2459.8. by NETWKS::GASKELL Fri Apr 09 1993 13:26, 5 lines
    When I called DECdirect yesterday morning they were asking in-house
    orders to call again as they were having system problems.  I am
    assuming they wanted to concentrate on customer orders first.
    
    They seemed to be up and running this morning when I called.
2459.9. "HP, not DEC...(?)" by 35261::ROGERS Fri Apr 09 1993 15:37, 17 lines
    This whole thing sounds like it might be confusing us with HP.  It has
    been widely publicized that their revenues have suffered because all
    their admin systems have broken down -- they "outgrew" their capacity. 
    It has been discussed by Wall Street, and HP had to make a public
    admission.  
    
    The most recent mention in Computerworld, as I seem to recall, is that
    HP has a new, ground-up redesigned Master System that should be ready
    "real soon now."
    
    Maybe the internet author got us confused with HP?  Maybe their
    outdated, overloaded system HAS been down for 11 days?  If so, someone
    should send him a correction over internet.
    
    We aren't the only ones that have problems.  Ours might be stodgy and
    limited in flexibility, but at least they seem to work (mostly).
                                                                    
2459.10. "'Informed Customers' takes on a new meaning these days" by AUSTIN::UNLAND (Digitus Impudicus) Fri Apr 09 1993 20:40, 14 lines
    I don't believe that the author confused us with HP.  In talking to
    my HP counterpart, they haven't had any recognizable system outage
    in the past few days.  On the other hand, I *do* know customers who
    have called the local office this week because of issues with PCBYDEC
    and DECdirect, and the local reps have had many problems trying to
    get quotes out because of system problems.  
    
    I do think the Internet item suffers a bit from hyperbole and alarmism,
    but no more than expected.  People shouldn't be amazed when word gets
    around so fast.  Computer customers are *very* sensitive these days to
    glitches in performance by the vendors.  Many PC and Mini vendors are
    hanging by a thread, and the customers know it.
    
    Geoff
2459.11. "The *straight* scoop" by ODAY40::USLSAT::FRICCHIONE (Rick Fricchione (MRO1-1/L87/297-2573)) Mon Apr 12 1993 17:52, 57 lines
These are the facts... (I was one of the people working on the problem)

The Ordering and Selling systems across the US underwent a fairly major upgrade
beginning about a week ago.  It was primarily in support of the new DPP
(Digital Pricing Program), with its major pricing and discounting changes,
a new discounting system, etc., but it provides lots of other fixes and changes
as well.  As much as 30-40 percent of the code changed in some systems.
This was up to 1000 plus changed source units. 

This release was known as CTPS V4.0 (Customer Transaction Processing Systems)
and was probably the most tested release ever.  However, as Murphy would have
it, some technology issues at the layered product interaction level caused
some down time which affected systems like the Electronic Store, DECdirect,
PC Direct and field sales order administration (AQS, FOCUS, etc).  Most of
these technology issues were not something that you test for in the RTE/DTM
sense.  This release was probably the most tested release of any we had done.
There are some issues, however, that only seemed to appear when 700 to 900 sessions
of the application are running in a cluster with lots of other demands being
placed on it.  No amount of RTE test scripts and load testing can simulate it.
No application bugs caused it.  It was all at the RDB and VMS interaction level.

The basic technology issues boiled down to VMS lock remastering, RDB 4.0A, the
occurrence of FREEZELOCKS, MEMBIT locks, and a few other things which didn't
show up in testing because they are basically results of exception conditions and
differences in application initialization.  It's a bit complex to get into here,
but they go away with RDB V4.1A.  We were locked into RDB V4.0A (V4.2 is SSB)
because of a common layered product upgrade process in the US called
PASE which means *everyone* has to go to V4.x at the same time. This puts
us behind SSB by at least six months. The problem
symptoms were database servers which started up fine, serviced requests fine
(for a while), but then went into a HIB state and stayed there.  You basically
had to wait for VMS and RDB to decide whether this would occur or not.  You
couldn't tell for a while.  Obviously that makes it a tough call as to whether
to use the system or not.  
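
A rough sketch of why this symptom is hard to call, in Python (an
illustration only, not anything that was actually run on these systems).
A process can also sit in HIB legitimately while it is idle, so one
observation proves nothing.  The sketch flags a server only when it has
shown HIB with work still queued for several samples in a row.  The sample
format, the `pending` count, the server names and the threshold of three
are all assumptions made for the example; no real VMS or RDB interface is
used.

```python
from collections import defaultdict


def find_stuck_servers(samples, min_consecutive=3):
    """Return names of servers that look hung.

    samples: time-ordered (server, state, pending) tuples, where `pending`
    is a hypothetical count of requests waiting on that server.
    A server is flagged when its last `min_consecutive` samples all show
    state "HIB" with pending > 0.
    """
    history = defaultdict(list)
    for server, state, pending in samples:
        history[server].append((state, pending))

    stuck = []
    for server, seen in history.items():
        recent = seen[-min_consecutive:]
        if len(recent) == min_consecutive and all(
            state == "HIB" and pending > 0 for state, pending in recent
        ):
            stuck.append(server)
    return sorted(stuck)


# Hypothetical samples: SRV_A hibernates while its queue keeps growing,
# SRV_B hibernates only when it has nothing to do.
samples = [
    ("SRV_A", "COM", 4), ("SRV_B", "HIB", 0),
    ("SRV_A", "HIB", 5), ("SRV_B", "COM", 2),
    ("SRV_A", "HIB", 7), ("SRV_B", "HIB", 0),
    ("SRV_A", "HIB", 9), ("SRV_B", "HIB", 0),
]
print(find_stuck_servers(samples))
```

Even a check like this needs several sampling intervals before it can say
anything, which is the "you couldn't tell for a while" part.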

After some "interesting" moments, we apparently found that "by standing on
one leg and holding the TV antennae over our head", we got things to 
initialize and not go into a HIB state.  We had some good help from RDB
engineering, and we had some strong people of our own working on it as well.
Trust me.  We had no lack of "management support and attention" in getting 
this fixed.

The systems are *UP*, and have been that way for several days.  At no time
were the systems unavailable for more than a few hours.  The timing of the
installation (first weeks of the fiscal quarter) was deliberate.  If you
are going to risk down time, do it then.  Orders were always flowing from
the entry systems to the fulfillment sites (that's why the floppies came)
but we deliberately held downstream feeds for a few days to make a possible
rollback easier if it came to that.  They have been on-line for several days now.

If anyone has any questions on this, please send me mail directly.  There is
nothing in the above that really could not be said to customers, but we should
be careful of the spin we put on it, as well as just forwarding bits and pieces
of information onto the INTERNET or elsewhere.  

Rick

2459.12. "But I digress" by FUNYET::ANDERSON (OpenVMS Forever!) Mon Apr 12 1993 18:59, 9 lines
This is another example of the problems that can be caused by being forced to
run old software by CVMS or PASE or whatever.

The people who run most of Digital's production machines unfortunately adhere to
this policy which, I believe, causes more harm than good.  Rdb at V4.0A (two
versions back) is nothing compared to all those IM&T VTX servers still at VTX
V4.1!

Paul
2459.13. by RCOCER::MICKOL (D-FENS) Mon Apr 12 1993 22:55, 7 lines
Re: .11: I beg to differ with you, but AQS on GREAT1 was down for much of the 
         past week. I tried to use it frequently and it was rarely available
	 and stable. I had a bunch of quotes queued up ready to be entered and
	 it wasn't until this morning at 4am that the problems got fixed.

Jim

2459.14. "Whoa!" by LABRYS::CONNELLY (Network partner excited) Mon Apr 12 1993 23:21, 47 lines
re: .12

>This is another example of the problems that can be caused by being forced to
>run old software by CVMS or PASE or whatever.

I have to take (what i hope is a mild-tempered) exception to that, being one
of the folks who works on CVMS and has seen the PASE process at work.

I can't respond to .11, since i don't know the facts of this particular
problem, but i will say that the software contents of PASE (the Production
Applications Support Environments) are agreed upon by the developers of DEC's
internal applications (including Mr. Fricchione's group).  Why have common
software environments for applications?  Basically because we can't afford to
have dedicated hardware and software for each individual business application
needed by DEC.  We have to share hardware, especially out in the data centers
beyond the pale of GMA.  If application developers can't count on there being
a standard software environment on each system that they install on, it will
be a crap shoot as to whether the applications work from one site to the next.
Especially when multiple different business application groups in different
chains of command may be targeting their software for the same machine.

To avoid mass chaos in the implementation of important business applications,
the applications developers have a Product Architecture Committee where they
jointly decide on what the common (PASE) software environment will be.  In
some cases this will mean that the last application development group ready
to go forward to a new version of a layered product (like RDB) will hold up
all the other developers.  Not very pretty, but the choice is always stated
in terms of "break the business?"

Another comment: yes, RDB V4.0A has bugs.  The latest version of RDB has
bugs too--i'll guarantee that.  This is supposed to be production grade
software and it's bent and twisted out of shape by customers like Pfizer et
al. far more extremely than it is by DEC's mundane IS applications.  There
isn't much you can do to avoid these bugs, just hope that the power-users
uncover them first.  If anything, staying with an older version and applying
patches for known bugs to it should be safer than jumping to the latest and
greatest "bleeding edge" version.

IMHO, DEC desperately needs a strong CIO with authority over both the data
center/network infrastructure and ALL applications development.  We've been
operating for years in a twilight zone where applications software and data
have been "owned" by the sponsoring business while the IS infrastructure
has been quasi-independent but beholden to all these "special interests".
I had been hopeful that Bob Palmer was going to fix this, but the latest
news on that score has not been very encouraging.
								- paul
2459.15. by ROWLET::AINSLEY (Less than 150 kts. is TOO slow!) Mon Apr 12 1993 23:48, 6 lines
    re: .11
    
    Thanks for setting the record straight.  To summarize the gory details,
    it sounds like it was a matter of system and application tuning.
    
    Bob
2459.16. "CVMS OK with me" by CSOADM::ROTH (you just KEEP ME hangin' on...) Tue Apr 13 1993 08:43, 11 lines
I'll defend CVMS as well.

In one of my previous forms I was a systems/application jockey for a
business-critical application. Having CVMS as a base actually FORCED
those that were developing/maintaining the application to run on a
version of VMS and layered products that were reasonably close to
current... prior to that, they would lag behind clinging to that 'oldie
but goodie' release of VMS. (e.g. was still running V3.x of VMS more
than a year after 4.x came out)

Lee (who just dated himself a bit)
2459.17. by TOMK::KRUPINSKI (Slave of the Democratic Party) Tue Apr 13 1993 18:53, 8 lines
	re .11

	See TPSYS::FORMAL_INSPECTION for a method that will allow you to 
	detect and eliminate many of those problems that cannot be found
	via testing.

					Tom_K

2459.18. "PASE/CVMS is not the issue." by ODAY40::FRICCHIONE (Rick Fricchione (MRO1-1/297-2573)) Wed Apr 14 1993 08:04, 21 lines
    Since people chose to interpret my note as a "blame PASE" note and not
    as a "here's the facts" note, let *ME* set the record straight.
    
    1.  All US IM&T organizations are committed to PASE/CVMS as a process.
        It works.  My group is committed to it.
    
    2.  The characteristics of RDB V4.0a are such that it was not a tuning
        issue or application performance issue.  It was basically that when
        you did x before y in the startup of the monitors, opening of the
        database, firing up of the servers, etc, it didn't work in a high
        load situation.   We ran up to 500 sessions using RTE in a test
        environment to simulate load and we didn't run into it.  We now
        know how to simulate the situation and can test for it.  We do not
        have this problem with V4.0a.
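
    For scale, the gap between the test load and the production load can
    be worked out from the figures given here and in .11 (up to 500
    simulated sessions in test, 700 to 900 sessions in production).  The
    few lines of Python below only do that division; the function and
    constant names are made up for the illustration.

```python
def load_coverage(test_sessions, production_sessions):
    """Fraction of the production session count that the test load reached."""
    return test_sessions / production_sessions


TEST_SESSIONS = 500               # RTE sessions run in the test environment
PRODUCTION_SESSIONS = (700, 900)  # range quoted in .11

for production in PRODUCTION_SESSIONS:
    share = load_coverage(TEST_SESSIONS, production)
    print(f"{production} sessions in production: test reached {share:.0%}")
```

    In other words the simulated load topped out at roughly 56 to 71
    percent of the session counts at which the problem appeared.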
    
    I don't want to get into a PASE/CVMS discussion.  That's not the issue
    here.  The issue is that there were problems, and we believe we have
    addressed them until RDB V4.1 is implemented in these sites.
    
    Rick
    
2459.19. "How many times does history have to repeat itself ?" by PARITY::FAHERTY Wed Apr 14 1993 18:57, 41 lines
Truth is, I think this particular Rdb problem has surfaced time and time again
over the last 2 years.  I'd have to look through my old mail, but I think I've
personally helped resolve the problem for at least two projects, one in what
was at the time Al Aucoin's group, and have heard of several other projects
that resolved the problem themselves. 

Unfortunately, our system at that time, and even still, tends to reward lone
wolves who fight fires by themselves in a vacuum, rather than putting an
emphasis on good, documented engineering process, and encouraging and rewarding
such things as defect prevention, idea/solution/experience sharing, and
continuous process improvement.  When those lone wolves move on, the knowledge
they have inside their heads about the possibility, characteristics, and
solutions to such things as this Rdb problem, goes with them.  I believe the
situation described in the last paragraph of .14 is the root cause of this (the
lack of sharing and collaboration). 

This seems to me to be a glaring example of why we need to fully and
consistently embrace a mature, comprehensive software improvement model such as
the SEI Capability Maturity Model.  The SEI model, which looks at software
organizations in terms of 5 levels of successive maturity, characterizes the
lowest level of maturity as being one where the success of the organization
relies on the strengths of individuals, rather than on the strengths of the
process.  As your organization moves up the levels of maturity, the emphasis
shifts to the process, and your process becomes stronger, more refined, and
more complete.

Both the SEI model and the ISO 9000 standard also emphasize the importance of
putting controls in place to assure adequate and known quality of the products
and services provided by your subcontractors and upstream suppliers. 

Fortunately, for some, I think things will begin to get better.  Rick's group,
for example, is getting very serious about quality (independently of this
problem), is in the process of developing and implementing mechanisms and
processes which capture and leverage experience and learnings, and will be
looking into the possibility of applying the SEI model. 

It's too bad these things weren't in place 2 years ago.  They might have
prevented this problem from ever occurring again, and at least would have saved
a lot of redundant problem solving. 

John Faherty
2459.20. "IMHO not!" by ELWOOD::LANE (Half of everything is below average) Thu Apr 15 1993 09:24, 23 lines
No comments on the Rdb problem but I will comment on your implication
that one or more gifted people working as individuals are at the lower
end of the food chain while a comprehensive organization with procedures
and processes is at the top.

If all you're interested in is quality, then perhaps you're right.
But a quality what?  Compare the languages C and ADA.

C was invented by three guys whose names escape me at the moment (or
was that the transistor?) and ADA was invented by everybody and their
mother-in-law.

As a language, ADA has a much better quality than C. (Just what does
"char (*(*x())[])()" define, anyway?) but what's preferred? And why?

Quality is a property of something, not the result of some process
or procedure. Individuals can do extremely high quality work and
huge, highly structured organizations can produce junk although I'll
agree that this is usually the exception. On the other hand, individuals
usually produce innovative things while huge organizations usually produce
nothing.

Mickey.
2459.21. "Roger, Roger... Over, Over... What's our vector, Victor?" by GOTIT::harley (Pay no attention to that man behind the curtain...) Thu Apr 15 1993 12:37, 9 lines
I still want to know what the heck a

"technology issue at the layered product interaction level"

is...

Is that anything like calling a bug a "previously undocumented feature"?

/harley
2459.22. "IMHO, way !" by PARITY::FAHERTY Thu Apr 15 1993 13:51, 81 lines
Re: .20:

I'll respond to reply 20, and then get off my soap box in this particular
conference and note, since I think we may be veering too far away from
the specific issue.

> No comments on the Rdb problem but I will comment on your implication
> that one or more gifted people working as individuals are at the lower
> end of the food chain while a comprehensive organization with procedures
> and processes is at the top.

First of all, we're not talking food-chain here.  We're talking survival.

Second, the SEI model is not about comparison between organizations or
individuals, as I think you are implying.  Rather, it is a tool for you to use
to determine where your organization is, where you want it to be, and how to
get there, in a gradual, least-cost, least-risk fashion.  It's about
organizational AND individual growth.  You own the data about where you're at
and going, because you own the process of getting there.  Similar to a career
planning guide for individuals, the SEI CMM could be viewed as a growth
planning guide for software organizations (explicitly) and individuals
(implicitly).  All too many people initially view the model the way you seem to
have, both those unfamiliar with the SEI or other improvement models, as well
as those who have incorrectly applied the model (because they came at it with a
similar perspective as yours). 

Third, I'd put hiring and supporting good people at the top of the list, before
process, of important ingredients of "world-class" software organizations, but
process would be a close second, in order to be able to optimize the work of
those good people.  I think most software improvement leaders and experts,
including those at the SEI, would agree with this.

Fourth, doesn't it make sense to put mechanisms in place to leverage the good
ideas and solutions of those gifted people ? 

> 
> If all you're interested in is quality, then perhaps you're right.
> But a quality what?  Compare the languages C and ADA.
> 
> C was invented by three guys whose names escape me at the moment (or
> was that the transistor?) and ADA was invented by everybody and their
> mother-in-law.
> 
> As a language, ADA has a much better quality than C. (Just what does
> "char (*(*x())[])()" define, anyway?) but what's preferred? And why?

Precisely !  One of the benefits of a quality system based on a proven
model is that you have the best chance of those questions getting asked
in the first place, and answered for all to know.

> Quality is a property of something, not the result of some process
> or procedure. Individuals can do extremely high quality work and
> huge, highly structured organizations can produce junk although I'll
> agree that this is usually the exception. On the other hand, individuals
> usually produce innovative things while huge organizations usually produce
> nothing.

One way (certainly not the only way) of viewing quality: a system built by,
from, and in support of an organization of individuals with quality attitudes
who want to prevent problems from occurring in the first place, never make the
same mistakes twice, and always repeat successes.

Here are a couple of interesting quotes from Bill Curtis of the SEI that I
think are somewhat pertinent:

             <<< TPSYS::SYS$SYSDEVICE:[NOTES$LIBRARY]SEPF.NOTE;1 >>>
                                   -< SEPF >-
================================================================================
Note 17.7                          Boston SPIN                            7 of 7
TOHOKU::TAYLOR "e-mail is the ethernet of the 90s"    9 lines  28-MAR-1993 17:40
                  -< 2 quotes by Dr. Bill Curtis of the SEI >-
--------------------------------------------------------------------------------
    RE: Boston SPIN meeting 19-JAN-1993, talk by Dr. Bill Curtis of the SEI
    
    I found two interesting quotes in my notes:
    
    "Large projects are bus sensitive.
    If a bus hits the lead person, the project dies." 
    
    "Process maturity lets you go home at night,"
    because there is no overtime required.
2459.23. "a few more points" by ODAY40::FRICCHIONE (Rick Fricchione (MRO1-1/297-2573)) Thu Apr 15 1993 23:50, 56 lines
    I have *NO* idea what some of the previous replies have to do with the
    order processing problem we experienced.  I'd suggest taking ideas on
    who the next CIO should be, development methodologies, and hindsight in
    general to the SOAPBOX notes file (off hours).  I really don't have the
    energy for it.   It oversimplifies things and has little to do with 
    the original note.  
    
    The intent was to let people know what was going on since it seemed to 
    have some exposure internally and externally.  Let's not mix the 
    religious cable channels with CNN (please).
    
    A few RELEVANT points:
    
    1.  The problems still occur and will still occur until we go to RDB
        V4.1.   We know how to deal with them now though so the impacts are
        minimized.  Still there, but minimized. Planned upgrade: this
    	weekend.     
    
    2.  "product interaction level" is management speak :-).  To be honest
        all we know is that dynamic lock remastering, RDB V4.0A, and VMS
        V5.5-1 in this particular situation/load generate these problems. We
        understand how to prevent them at startup, but basically every time
        the cluster undergoes a state transition (as happened today:
        $#@$%#@# node crash) VMS lock management and RDB send our database
    	servers FREEZELOCKS which seem never to free up.  Also, under certain 
        conditions RDB V4.0A in this environment sends these locks when
        a database recovery is performed (even if someone just CTRL/Ys
        out of an interactive SQL read transaction).  $DELPRC same thing. 
    
    3.  We are also experiencing system performance issues due to a
        completely changed (at least it seems that way) application
        profile.  Again, we probably could have done a better job of 
        characterization here, but hindsight is 20-20.  We are working on
        that as well.  Nothing that a few 7620s couldn't fix.  Tuning is
        progressing but you need data for that and that takes time.  
        Giving everyone DECwindows terminals in the last year and some
    	group consolidations into this cluster basically tripled the 
    	number of sessions to 1200-1500 simultaneous and that's pretty 
    	"challenging". 
    
    4.  We didn't go off and lone wolf this.  We worked with Colorado, RDB
    	engineering and all the organizations who we believed could add
    	value at the time.  Lots of people had seen similar situations
    	before. We had too. Few can fix it.  Are these the wrong people?  
    
    5.  I stand corrected on the uptime statement I made.  There was some
    	additional downtime before I and others got involved.  I don't know
    	how much though, but the system was in for only 2 days at that
    	point.  Nowhere near the 11 days that someone stated.  That part
    	is clearly wrong.
    
    We continue to be up, processing orders and taking calls.  We are
    having a bumpy implementation due to these problems but we believe they
    will be behind us soon.   
    
    Rick