
Conference 7.286::digital

Title:	The Digital way of working

Moderator:	QUARK::LIONELON

Created:	Fri Feb 14 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	5321
Total number of notes:	139771

2459.0. "Customer rumor about DEC order entry system" by QUARK::LIONEL (Free advice is worth every cent) Thu Apr 08 1993 21:19

    I found the following on the Internet this evening.   Given that I've
    not heard a peep about anything like this, I'm inclined to write it
    off as a vicious rumor, but I'm sure many customers will believe it.
    Can anyone authoritatively confirm or deny this rumor?
    
    					Steve
    
Article: 10643
Path: dbased.nuo.dec.com!news.crl.dec.com!deccrl!decwrl!decwrl!olivea!uunet!cs.utexas.edu!zaphod.mps.ohio-state.edu!usc!news.service.uci.edu!unogate!mvb.saic.com!info-vax
From: [email protected] (Phil Rand)
Newsgroups: comp.os.vms
Subject: DEC order-entry down nationwide
Message-ID: <Pine.3.05.9304071251.A23536-a100000@paul>
Date: 7 Apr 93 19:03:51 GMT
Organization: Info-Vax<==>Comp.Os.Vms Gateway
Lines: 19
X-Gateway-Source-Info: Mailing List
 
I hear (3rd hand) that DEC's order entry application is down nationwide
(USA), this now being the 11th day.  Apparently they tried some kind of
system upgrade (what kind I don't know) and ran into trouble.  They're
back to processing orders on paper, and you can forget about tracking ship
dates. 
 
Does anybody know details?
 
Didn't SUN have a glitch something like this back in the 80's and nearly
put the company in the toilet?  I'm surprised this hasn't hit the Wall
Street Journal.  (Or maybe it has--I haven't seen today's.)
 
Makes you feel real confident about DEC's system integration skills...
 
--Phil
 
// Phil Rand                                   [email protected]
// Computer & Information Systems                  (206) 281-2428
// Seattle Pacific University, 3307 3rd Ave W, Seattle, WA  98119

T.R	Title	User	Personal Name	Date	Lines
2459.1	WHERE DO WE KEEP THOSE TAPES????	GJOVAX::SEVIC		`Thu Apr 08 1993 22:14`	2
	If true sounds like nothing a restore of the old software couldn't handle.
2459.2		SOLVIT::REDZIN::DCOX		`Fri Apr 09 1993 00:24`	6
	The DECdirect order I placed last week was processed immediately and I received the goods (floppies) within a couple of days. So.......if it's broke, please do not fix it. :-) Dave
2459.3		RCOCER::MICKOL	D-FENS	`Fri Apr 09 1993 02:52`	10
	The Field Admin (FOCUS, AQS) cluster (GREAT1::) for the northeast has been up and down since last weekend. More down than up. I think they upgraded AQS (Automated Quote System) last weekend. It has caused a fair amount of hassle for us field types. I've got half a dozen quotes in the queue I can't get to customers. I thought AQS was the only application affected, but I may be wrong. I'm sure some heads will roll over this one. Anyway, when customers see our lead times for PCs, they will probably assume something is seriously broken...
2459.4		DPDMAI::DAWSON	t/hs+ws=Formula for the future	`Fri Apr 09 1993 09:19`	6
	I am not sure of the status right now, but on Tuesday and Wednesday the system for DECDirect was down. Maybe it still is. Dave
2459.5	GREAT1 IS down....	ODIXIE::SCRIVEN		`Fri Apr 09 1993 10:53`	13
	DecDirect and PCBYDEC's GREAT1 (Focus application) has been down since Tuesday afternoon. At least they have been UNABLE to process any orders or ship trace requests. I'm sitting on about 10 orders that cannot be booked in the field. I wonder what their excuse is. The FOCUS systems in the field are working OK to the best of my knowledge...... Turns those 45 to 90 day lead times on PC's into 90 to 120 I bet. Just what we need..... Toodles.....JP
2459.6	I placed a DECDirect Order 4/8/93 1530 Hrs	MEMIT::YOUNG_J		`Fri Apr 09 1993 11:18`	6
	I don't know about the systems _behind_ ordering, but I called DECDirect yesterday around 3:30 pm EDT and placed my orders. I'm sure my customer response contact was in _some_ kind of system, 'cause he had to look up part numbers for a new item ...... and he found 'em. ... Maybe my order went in during one of the _up_ times???
2459.7	DECdirect is up	RTL::LAPINE		`Fri Apr 09 1993 11:24`	2
	The DECdirect system is up now. Apparently (finally) came up last night.
2459.8		NETWKS::GASKELL		`Fri Apr 09 1993 13:26`	5
	When I called DECdirect yesterday morning they were asking in-house orders to call again as they were having system problems. I am assuming they wanted to concentrate on customer orders first. They seemed to be up and running this morning when I called.
2459.9	HP, not DEC...(?)	35261::ROGERS		`Fri Apr 09 1993 15:37`	17
	This whole thing sounds like it might be confusing us with HP. It has been widely publicized that their revenues have suffered because all their admin systems have broken down -- they "outgrew" their capacity. It has been discussed by Wall Street, and HP had to make a public admission. The most recent mention in Computerworld, as I seem to recall, is that HP has a new, ground-up redesigned Master System that should be ready "real soon now." Maybe the internet author got us confused with HP? Maybe their outdated, overloaded system HAS been down for 11 days? If so, someone should send him a correction over internet. We aren't the only ones that have problems. Ours might be stodgy and limited in flexibility, but at least they seem to work (mostly).
2459.10	"Informed Customers" takes on a new meaning these days	AUSTIN::UNLAND	Digitus Impudicus	`Fri Apr 09 1993 20:40`	14
	I don't believe that the author confused us with HP. In talking to my HP counterpart, they haven't had any recognizable system outage in the past few days. On the other hand, I do know customers who have called the local office this week because of issues with PCBYDEC and DECdirect, and the local reps have had many problems trying to get quotes out because of system problems. I do think the Internet item suffers a bit from hyperbole and alarmism, but no more than expected. People shouldn't be amazed when word gets around so fast. Computer customers are very sensitive these days to glitches in performance by the vendors. Many PC and Mini vendors are hanging by a thread, and the customers know it. Geoff
2459.11	The straight scoop	ODAY40::USLSAT::FRICCHIONE	Rick Fricchione (MRO1-1/L87/297-2573)	`Mon Apr 12 1993 17:52`	57
	These are the facts... (I was one of the people working on the problem) The Ordering and Selling systems across the US underwent a fairly major upgrade beginning about a week ago. It was primarily in support of the new DPP (Digital Pricing Program) (major pricing and discounting changes, a new discounting system, etc) but provides lots of other fixes and changes as well. As much as 30-40 percent of the code changed in some systems. This was up to 1000 plus changed source units. This release was known as CTPS V4.0 (Customer Transaction Processing Systems) and was probably the most tested release ever. However, as Murphy would have it, some technology issues at the layered product interaction level caused some down time which affected systems like the Electronic Store, DECdirect, PC Direct and field sales order administration (AQS, FOCUS, etc). Most of these technology issues were not something that you test for in the RTE/DTM sense. This release was probably the most tested release of any we had done. There are some issues, however, that only seemed to appear when 700 to 900 sessions of the application are running in a cluster with lots of other demands being placed on it. No amount of RTE test scripts and load testing can simulate it. No application bugs caused it. It was all at the RDB and VMS interaction level. The basic technology issues boiled down to VMS lock remastering, RDB 4.0A, the occurrence of FREEZELOCKS, MEMBIT locks, and a few other things which didn't show up because they are basically results of exception conditions and differences in application initialization. It's a bit complex to get into here, but they go away with RDB V4.1A. We were locked into RDB V4.0A (V4.2 is SSB) because of a common layered product upgrade process in the US called PASE which means everyone has to go to V4.x at the same time. This puts us behind SSB by at least six months or more. 
The problem symptoms were database servers which started up fine, serviced requests fine (for a while), but then went into a HIB state and stayed there. You basically had to wait for VMS and RDB to decide whether this would occur or not. You couldn't tell for a while. Obviously that makes it a tough call as to whether to use the system or not. After some "interesting" moments, we apparently found that "by standing on one leg and holding the TV antenna over our head", we got things to initialize and not go into a HIB state. We had some good help from RDB engineering, and we had some strong people of our own working on it as well. Trust me. We had no lack of "management support and attention" in getting this fixed. The systems are UP, and have been that way for several days. At no time were the systems unavailable for more than a few hours. The timing of the installation (first weeks of the fiscal quarter) was deliberate. If you are going to risk down time, do it then. Orders were always flowing from the entry systems to the fulfillment sites (that's why the floppies came) but we deliberately held downstream feeds for a few days to make a possible rollback easier if it came to that. They have been on-line for several days now. If anyone has any questions on this, please send me mail directly. There is nothing in the above that really could not be said to customers, but we should be careful of the spin we put on it, as well as just forwarding bits and pieces of information onto the INTERNET or elsewhere. Rick
2459.12	But I digress	FUNYET::ANDERSON	OpenVMS Forever!	`Mon Apr 12 1993 18:59`	9
	This is another example of the problems that can be caused by being forced to run old software by CVMS or PASE or whatever. The people who run most of Digital's production machines unfortunately adhere to this policy which, I believe, causes more harm than good. Rdb at V4.0A (two versions back) is nothing compared to all those IM&T VTX servers still at VTX V4.1! Paul
2459.13		RCOCER::MICKOL	D-FENS	`Mon Apr 12 1993 22:55`	7
	Re: .11: I beg to differ with you, but AQS on GREAT1 was down for much of the past week. I tried to use it frequently and it was rarely available and stable. I had a bunch of quotes queued up ready to be entered and it wasn't until this morning at 4am that the problems got fixed. Jim
2459.14	Whoa!	LABRYS::CONNELLY	Network partner excited	`Mon Apr 12 1993 23:21`	47
	re: .12 >This is another example of the problems that can be caused by being forced to >run old software by CVMS or PASE or whatever. I have to take (what i hope is a mild-tempered) exception to that, being one of the folks who works on CVMS and has seen the PASE process at work. I can't respond to .11, since i don't know the facts of this particular problem, but i will say that the software contents of PASE (the Production Applications Support Environments) are agreed upon by the developers of DEC's internal applications (including Mr. Fricchione's group). Why have common software environments for applications? Basically because we can't afford to have dedicated hardware and software for each individual business application needed by DEC. We have to share hardware, especially out in the data centers beyond the pale of GMA. If application developers can't count on there being a standard software environment on each system that they install on, it will be a crap shoot as to whether the applications work from one site to the next. Especially when multiple different business application groups in different chains of command may be targeting their software for the same machine. To avoid mass chaos in the implementation of important business applications, the applications developers have a Product Architecture Committee where they jointly decide on what the common (PASE) software environment will be. In some cases this will mean that the last application development group ready to go forward to a new version of a layered product (like RDB) will hold up all the other developers. Not very pretty, but the choice is always stated in terms of "break the business?" Another comment: yes, RDB V4.0A has bugs. The latest version of RDB has bugs too--i'll guarantee that. This is supposed to be production grade software and it's bent and twisted out of shape by customers like Pfizer et al. far more extremely than it is by DEC's mundane IS applications. 
There isn't much you can do to avoid these bugs, just hope that the power-users uncover them first. If anything, staying with an older version and applying patches for known bugs to it should be safer than jumping to the latest and greatest "bleeding edge" version. IMHO, DEC desperately needs a strong CIO with authority over both the data center/network infrastructure and ALL applications development. We've been operating for years in a twilight zone where applications software and data have been "owned" by the sponsoring business while the IS infrastructure has been quasi-independent but beholden to all these "special interests". I had been hopeful that Bob Palmer was going to fix this, but the latest news on that score has not been very encouraging. - paul
2459.15		ROWLET::AINSLEY	Less than 150 kts. is TOO slow!	`Mon Apr 12 1993 23:48`	6
	re: .11 Thanks for setting the record straight. To summarize the gory details, it sounds like it was a matter of system and application tuning. Bob
2459.16	CVMS OK with me	CSOADM::ROTH	you just KEEP ME hangin' on...	`Tue Apr 13 1993 08:43`	11
	I'll defend CVMS as well. In one of my previous forms I was a systems/application jockey for a business-critical application. Having CVMS as a base actually FORCED those that were developing/maintaining the application to run on a version of VMS and layered products that were reasonably close to current... prior to that, they would lag behind clinging to that 'oldie but goodie' release of VMS. (e.g. was still running V3.x of VMS more than a year after 4.x came out) Lee (who just dated himself a bit)
2459.17		TOMK::KRUPINSKI	Slave of the Democratic Party	`Tue Apr 13 1993 18:53`	8
	re .11 See TPSYS::FORMAL_INSPECTION for a method that will allow you to detect and eliminate many of those problems that cannot be found via testing. Tom_K
2459.18	PASE/CVMS is not the issue.	ODAY40::FRICCHIONE	Rick Fricchione (MRO1-1/297-2573)	`Wed Apr 14 1993 08:04`	21
	Since people chose to interpret my note as a "blame PASE" note and not as a "here's the facts" note, let ME set the record straight. 1. All US IM&T organizations are committed to PASE/CVMS as a process. It works. My group is committed to it. 2. The characteristics of RDB V4.0a are such that it was not a tuning issue or application performance issue. It was basically that when you did x before y in the startup of the monitors, opening of the database, firing up of the servers, etc, it didn't work in a high load situation. We ran up to 500 sessions using RTE in a test environment to simulate load and we didn't run into it. We now know how to simulate the situation and can test for it. We do not have this problem with V4.0a. I don't want to get into a PASE/CVMS discussion. That's not the issue here. The issue is that there were problems; we believe we have addressed them until RDB V4.1 is implemented in these sites. Rick
2459.19	How many times does history have to repeat itself ?	PARITY::FAHERTY		`Wed Apr 14 1993 18:57`	41
	Truth is, I think this particular Rdb problem has surfaced time and time again over the last 2 years. I'd have to look through my old mail, but I think I've personally helped resolve the problem for at least two projects, one in what was at the time Al Aucoin's group, and have heard of several other projects that resolved the problem themselves. Unfortunately, our system at that time, and even still, tends to reward lone wolves who fight fires by themselves in a vacuum, rather than putting an emphasis on good, documented engineering process, and encouraging and rewarding such things as defect prevention, idea/solution/experience sharing, and continuous process improvement. When those lone wolves move on, the knowledge they have inside their heads about the possibility, characteristics, and solutions to such things as this Rdb problem, goes with them. I believe the situation described in the last paragraph of .14 is the root cause of this (the lack of sharing and collaboration). This seems to me to be a glaring example of why we need to fully and consistently embrace a mature, comprehensive software improvement model such as the SEI Capability Maturity Model. The SEI model, which looks at software organizations in terms of 5 levels of successive maturity, characterizes the lowest level of maturity as being one where the success of the organization relies on the strengths of individuals, rather than on the strengths of the process. As your organization moves up the levels of maturity, the emphasis shifts to the process, and your process becomes stronger, more refined, and more complete. Both the SEI model and the ISO 9000 standard also emphasize the importance of putting controls in place to assure adequate and known quality of the products and services provided by your subcontractors and upstream suppliers. Fortunately, for some, I think things will begin to get better. 
Rick's group, for example, is getting very serious about quality (independently of this problem), is in the process of developing and implementing mechanisms and processes which capture and leverage experience and learnings, and will be looking into the possibility of applying the SEI model. It's too bad these things weren't in place 2 years ago; they might have prevented this problem from ever occurring again, and at least would have saved a lot of redundant problem solving. John Faherty
2459.20	IMHO not!	ELWOOD::LANE	Half of everything is below average	`Thu Apr 15 1993 09:24`	23
	No comments on the Rdb problem but I will comment on your implication that one or more gifted people working as individuals are at the lower end of the food chain while a comprehensive organization with procedures and processes is at the top. If all you're interested in is quality, then perhaps you're right. But a quality what? Compare the languages C and ADA. C was invented by three guys whose names escape me at the moment (or was that the transistor?) and ADA was invented by everybody and their mother-in-law. As a language, ADA has a much better quality than C. (Just what does "char ((x())[])()" define, anyway?) but what's preferred? And why? Quality is a property of something, not the result of some process or procedure. Individuals can do extremely high quality work and huge, highly structured organizations can produce junk although I'll agree that this is usually the exception. On the other hand, individuals usually produce innovative things while huge organizations usually produce nothing. Mickey.
2459.21	Roger, Roger... Over, Over... What's our vector, Victor?	GOTIT::harley	Pay no attention to that man behind the curtain...	`Thu Apr 15 1993 12:37`	9
	I still want to know what the heck a "technology issue at the layered product interaction level" is... Is that anything like calling a bug a "previously undocumented feature"? /harley
2459.22	IMHO, way !	PARITY::FAHERTY		`Thu Apr 15 1993 13:51`	81
	Re: .20: I'll respond to reply 20, and then get off my soap box in this particular conference and note, since I think we may be veering too far away from the specific issue. > No comments on the Rdb problem but I will comment on your implication > that one or more gifted people working as individuals are at the lower > end of the food chain while a comprehensive organization with procedures > and processes is at the top. First of all, we're not talking food-chain here. We're talking survival. Second, the SEI model is not about comparison between organizations or individuals, as I think you are implying. Rather, it is a tool for you to use to determine where your organization is, where you want it to be, and how to get there, in a gradual, least-cost, least-risk fashion. It's about organizational AND individual growth. You own the data about where you're at and going, because you own the process of getting there. Similar to a career planning guide for individuals, the SEI CMM could be viewed as a growth planning guide for software organizations (explicitly) and individuals (implicitly). All too many people initially view the model the way you seem to have, both those unfamiliar with the SEI or other improvement models, as well as those who have incorrectly applied the model (because they came at it with a similar perspective as yours). Third, I'd put hiring and supporting good people at the top of the list, before process, of important ingredients of "world-class" software organizations, but process would be a close second, in order to be able to optimize the work of those good people. I think most software improvement leaders and experts, including those at the SEI, would agree with this. Fourth, doesn't it make sense to put mechanisms in place to leverage the good ideas and solutions of those gifted people ? > > If all you're interested in is quality, then perhaps you're right. > But a quality what? Compare the languages C and ADA. 
> > C was invented by three guys whose names escape me at the moment (or > was that the transistor?) and ADA was invented by everybody and their > mother-in-law. > > As a language, ADA has a much better quality than C. (Just what does > "char ((x())[])()" define, anyway?) but what's preferred? And why? Precisely ! One of the benefits of a quality system based on a proven model is that you have the best chance of those questions getting asked in the first place, and answered for all to know. > Quality is a property of something, not the result of some process > or procedure. Individuals can do extremely high quality work and > huge, highly structured organizations can produce junk although I'll > agree that this is usually the exception. On the other hand, individuals > usually produce innovative things while huge organizations usually produce > nothing. One way (certainly not the only way) of viewing quality: a system built by, from, and in support of an organization of individuals with quality attitudes who want to prevent problems from occurring in the first place, never make the same mistakes twice, and always repeat successes. Here are a couple of interesting quotes from Bill Curtis of the SEI that I think are somewhat pertinent: <<< TPSYS::SYS$SYSDEVICE:[NOTES$LIBRARY]SEPF.NOTE;1 >>> -< SEPF >- ================================================================================ Note 17.7 Boston SPIN 7 of 7 TOHOKU::TAYLOR "e-mail is the ethernet of the 90s" 9 lines 28-MAR-1993 17:40 -< 2 quotes by Dr. Bill Curtis of the SEI >- -------------------------------------------------------------------------------- RE: Boston SPIN meeting 19-JAN-1993, talk by Dr. Bill Curtis of the SEI I found two interesting quotes in my notes: "Large projects are bus sensitive. If a bus hits the lead person, the project dies." "Process maturity lets you go home at night," because there is no overtime required.
2459.23	a few more points	ODAY40::FRICCHIONE	Rick Fricchione (MRO1-1/297-2573)	`Thu Apr 15 1993 23:50`	56
	I have NO idea what some of the previous replies have to do with the order processing problem we experienced. I'd suggest taking ideas on who the next CIO should be, development methodologies, and hindsight in general to the SOAPBOX notes file (off hours). I really don't have the energy for it. It oversimplifies things and has little to do with the original note. The intent was to let people know what was going on since it seemed to have some exposure internally and externally. Let's not mix the religious cable channels with CNN (please). A few RELEVANT points: 1. The problems still occur and will still occur until we go to RDB V4.1. We know how to deal with them now though so the impacts are minimized. Still there, but minimized. Planned upgrade: this weekend. 2. "product interaction level" is management speak :-). To be honest all we know is that dynamic lock remastering, RDB V4.0A, and VMS V5.5-1 in this particular situation/load generate these problems. We understand how to prevent them at startup, but basically every time the cluster undergoes a state transition (as happened today: $#@$%#@# node crash) VMS lock management and RDB send our database servers FREEZELOCKS which seem never to free up. Also, under certain conditions RDB V4.0A in this environment sends these locks when a database recovery is performed (even if someone just CTRL/Ys out of an interactive SQL read transaction). $DELPRC same thing. 3. We are also experiencing system performance issues due to a completely changed (at least it seems that way) application profile. Again, we probably could have done a better job of characterization here, but hindsight is 20-20. We are working on that as well. Nothing that a few 7620s couldn't fix. Tuning is progressing but you need data for that and that takes time. 
Giving everyone DECwindows terminals in the last year and some group consolidations into this cluster basically tripled the number of sessions to 1200-1500 simultaneous and that's pretty "challenging". 4. We didn't go off and lone wolf this. We worked with Colorado, RDB engineering and all the organizations who we believed could add value at the time. Lots of people had seen similar situations before. We had too. Few can fix it. Are these the wrong people? 5. I stand corrected on the uptime statement I made. There was some additional downtime before I and others got involved. I don't know how much though, but the system was in for only 2 days at that point. Nowhere near the 11 days that someone stated. That part is clearly wrong. We continue to be up, processing orders and taking calls. We are having a bumpy implementation due to these problems but we believe they will be behind us soon. Rick