
Conference 7.286::digital

Title:	The Digital way of working

Moderator:	QUARK::LIONELON

Created:	Fri Feb 14 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	5321
Total number of notes:	139771

4424.0. "MTBF on Alphas?" by MKOTS3::TLAPOINTE () Wed Feb 14 1996 16:53

    Does anyone know where I can get MTBF (mean time between failure) data
    on the following:
    	AS 2000 4/275
    	AW 200 4/166
    	AW 250 4/266
    
    I need this data ASAP as I'm competing against HP, and they have already
    supplied data on their machine (HP715: est. 4 to 4.25 yrs MTBF, per my
    VAR).
    
    	We used to have an easy way of requesting this data but I was told
    the process was killed.  Any assistance will be greatly appreciated.
    
    Regards,
    
    Tony LaPointe

T.R	Title	User	Personal Name	Date	Lines
4424.1	+ or - .01%	ODIXIE::KING		`Wed Feb 14 1996 19:05`	8
	No MTBF STATS...but Let's connect on Friday before another weekend blows us by. You might want to give Tom Walker a call for info on MTBF data. Tom is part of the competitive watch team covering workstations. You can reach Tom by calling 407-6602100. Russ the ISSMister
4424.2	MTBF/MTTR automated system	ODIXIE::MOREAU	Ken Moreau;Technical Support;Florida	`Wed Feb 14 1996 19:29`	57
	RE: .0

I use an automated system which gives me turn-around within 24 hours: all you need is the part #. I last used this system about 3 months ago, so it should still work. Fill out the form below, and mail it to either MTBF @OGO or CSSE::MTBF. You will get an answer via the e-mail address you specify.

-- Ken Moreau

MTBF/MTTR Request Form
----------------------

Fill in the information below, and send this memo to either MTBF @OGO or CSSE::MTBF. The information, along with the required disclaimer, will be sent to you at the ALL-IN-1 address or VMS address you specify below.

**********************************************************************
* IMPORTANT - AN AUTOMATED SYSTEM WILL PROCESS AND REPLY TO YOUR     *
* REQUEST. PLEASE DO NOT DEVIATE FROM THIS FORMAT.                   *
**********************************************************************

If you have any questions, please write or call the ADEG Program Engineering Group at GSGPROGENG @MKO, or DTN 264-4727.

YOUR NAME:
YOUR ALL-IN-1 ADDRESS:   (Example: JOHN JONES @OGO)
  or YOUR VMS ADDRESS:   (Example: CSSE::JONES)
YOUR COST CENTER:
YOUR BADGE NUMBER:
CUSTOMER NAME:
CUSTOMER LOCATION:
BUSINESS REASON FOR RELEASING THIS INFORMATION:

Using one line per part number, list the part numbers for which you need reliability (MTBF/MTTR) data below between the words "BEGIN" and "END". Please use the 2-5-2 part number format (00-TK50-AA) or you WILL NOT receive the information that you requested. DO NOT REMOVE THE WORDS "BEGIN" AND "END".

BEGIN

END
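The machine-readable part of the form is the list between BEGIN and END. As a rough illustration of what the request processor had to accept (the real CSSE::MTBF software is not described here and, per .3, no longer runs), here is a minimal Python sketch. It assumes the "2-5-2" format means two characters, a middle block of up to five, and two characters, all alphanumeric; the pattern and the name `extract_part_numbers` are assumptions, not the original tool.

```python
import re

# Assumed reading of the "2-5-2" format: two characters, a middle block of up
# to five characters, and two characters, all alphanumeric and hyphen-separated
# (e.g. 00-TK50-AA). The exact rule used by the old processor is not known.
PART_NUMBER = re.compile(r"^[0-9A-Z]{2}-[0-9A-Z]{1,5}-[0-9A-Z]{2}$")


def extract_part_numbers(memo: str) -> tuple[list[str], list[str]]:
    """Return (valid, rejected) part numbers listed between BEGIN and END.

    Hypothetical helper that follows the form's instructions: one part
    number per line, between lines containing only BEGIN and END.
    """
    lines = [line.strip() for line in memo.splitlines()]
    try:
        start = lines.index("BEGIN")
        end = lines.index("END", start + 1)
    except ValueError:
        raise ValueError('memo must keep the words "BEGIN" and "END"') from None

    valid: list[str] = []
    rejected: list[str] = []
    for entry in lines[start + 1:end]:
        if not entry:
            continue  # blank lines between the markers are ignored
        if PART_NUMBER.match(entry.upper()):
            valid.append(entry)
        else:
            rejected.append(entry)
    return valid, rejected


memo = "YOUR NAME: Tony LaPointe\nBEGIN\n00-TK50-AA\nTK50\nEND\n"
print(extract_part_numbers(memo))  # (['00-TK50-AA'], ['TK50'])
```

Rejected entries are returned rather than silently dropped, so a requester could correct them before mailing the memo.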
4424.3	No automated system available	DECIDE::MOFFITT		`Wed Feb 14 1996 22:20`	25
	Ken,

Good idea, but the automated system died on Oct 31 due to lack of funding. BTW, the data had become pretty stale over the last year or so. Here's what the header looked like during its last month. Trust me, it's gone.

#14   26-OCT-1995 06:33:42.42   MTBF
From:   CSSE::MTBF "26-Oct-1995 0830 -0400"
To:     TURCOTTE,DECIDE::MOFFITT
CC:     MTBF
Subj:   COMPLETED_MTBF_REQUEST

*****************************************************************************
NOTICE: This system will END-OF-SERVICE as of OCTOBER 31, 1995.
*****************************************************************************

I made a couple of suggestions to Tony off line.

enjoy,
tim m.
4424.4	Sigh :-(	ODIXIE::MOREAU	Ken Moreau;Technical Support;Florida	`Thu Feb 15 1996 00:19`	0
4424.5	It's not easy...	TRUCKS::KEMPSTER		`Thu Feb 15 1996 03:59`	9
	I recently went through a similar exercise and for very much the same reasons. Firstly I found that entries in the relevant notes conferences helped. Secondly I was told that if I was to release these figures to a customer the source should be the product manager. Hope this helps, Tom Kempster
4424.6	Contact in APS	NETCAD::GENOVA		`Thu Feb 15 1996 07:37`	8
	Hi, Dan Riccio (wrksys::riccio) was the Mechanical Engineer for the AlphaStation 200 and 250. He could tell you the MTBF; as I remember, they were quite high. /art
4424.7		VANGA::KERRELL	salva res est	`Thu Feb 15 1996 07:59`	8
	I recently had the same problem and was told the owner of MTBF info process for the SBU is:- Rick Howe @MRO (RELYON::HOWE) If you mail him the part nos, he should be able to help. Dave.
4424.8	20,000 hours and still going	BBPBV1::WALLACE	UNIX is digital. Use Digital UNIX.	`Thu Feb 15 1996 15:06`	8
	Ignoring the politics: if you have access to TIMA/STARS (I don't), you can often find MTBF figures of some sort in the Product Service Plan for the widget you're interested in. regards john
4424.9	The usual caveats...	ATLANT::SCHMIDT	See http://atlant2.zko.dec.com/	`Thu Feb 15 1996 15:54`	28
	Folks may already know this, but it bears repeating: even if you can GET an MTBF number, be cautious in using it.

1. Digital considers its MTBF numbers proprietary information, as they can be very useful to competitors of MCS. It's a lot easier to quote a maintenance price on a product when you know (with pretty good accuracy) what the vendor considers the failure rate to be.

2. The MTBF number is often wrapped up in a complex set of assumptions about the operating conditions for the unit. For example, we might quote an MTBF of 100K hours for equipment operating in the temperature range of 5-50°C, whereas a competitor might quote an MTBF of 300K hours for identical equipment operating over a temperature range of 20-35°C. This can lead to a BIG apparent difference in the "goodness" of our product versus theirs even though no actual difference exists.

There's certainly no ultimate problem in quoting MTBF numbers, and we do it all the time; I'm just advising caution. Don't just find a number in some database somewhere and blindly quote it. The Product Manager is a good source of help when you're being asked for MTBF numbers.

Atlant
4424.10	WE HAVE HIGH AVAILABILITY SERVICES	UTROP1::KOOIJMAN	LIFE IS HELL THEN YOU DIE	`Fri Feb 16 1996 02:54`	482
	Gentlemen,

MTTR and MTBF figures are very interesting but are almost meaningless today. For some disks we specify an MTBF of 800,000 hours! (Do you believe it?) Also, we must have corporate approval to give them to customers.

Digital has developed a couple of services called High Availability Services under the leadership of Dave Varner @OGO, who is the corporate business manager for them. Engineering for the AVANTO application is located in Shrewsbury, Mass.; the engineer for AVANTO is Ron Rocheleau @SHR. In Holland we have made availability models for hundreds of systems during the past year, and we have earned a lot of revenue with it. This is a unique capability. A short write-up of what we can do is included below. Please contact Dave Varner @OGO for more information. This is the best in the IT industry. With Availability Review and Partnership Services we have beaten the competition many times in Holland, so please involve Dave. In its simplest form, AVANTO can be used to produce hardware availability figures in a very professional way.

Best Regards,
Aad Kooijman @ UTO (The Netherlands, which is over in Europe)
Business manager High Availability Services

AVANTO

Over the past four years, Digital's Multivendor Customer Services organization has developed and frequently applied an availability analysis application called AVANTO (AVailability ANalysis TOol). It has been used successfully in practice to determine the availability of many hundreds of configurations, and it has been empirically established that the predicted results are actually achieved in 95% of all cases. This means a unique tool is now available to IT managers. Of course, this leaves aside other factors, such as the management organization, the applications, etc. These aspects (domains) are covered by an Availability Review Service conducted by Digital. So how do things proceed when using AVANTO?
The essence of AVANTO is that it enables a system to be designed so that its anticipated availability matches the demands the business makes on it. AVANTO is frequently used when configuring new application systems. It enables the availability of very complex systems to be modeled, and determines beforehand how the availability demands can be met without resorting to an arbitrary approach. In the simplest application, it calculates the average availability of a system's hardware from MTTR and MTBF data. However, as stated above, this gives an incomplete picture, as many other aspects also determine availability in practice. One example is the quality of the environment in which the equipment is installed. Also very important is the organization of the help desk and the underlying second- and third-line support of the various suppliers. When establishing the potential availability of an existing system, it is necessary to investigate how the management, the environment, the software and the other domains have been set up, because besides the hardware, all these domains influence the level of availability. By incorporating these parameters and setting up a business scenario, AVANTO can show availability as a function of the business requirements.

Digital's Availability Review and Partnership Services are also conducted with the aid of AVANTO. In an Availability Review, an existing situation is scrutinized, and a very detailed investigation determines what can be done to improve availability management. Alternative situations can also be modelled. This approach is clearly preferable to one in which measures are taken more or less by guesswork, after which we have to measure what effect they have had.
Moreover, improving availability after the fact generally costs much more than conducting an analysis in advance. AVANTO is now in routine use by a large number of organizations, within Change and Availability Management, to determine in advance the availability effects of scheduled configuration changes.

Availability investigation

Let's assume that an IT organization with a given infrastructure wants to determine what availability it can offer its users, or that the existing availability has to be increased. Digital Multivendor Customer Services can answer these questions through an investigation carried out with the aid of AVANTO. So how does such an investigation proceed?

At the start, the customer is consulted to establish which areas (domains) are to be included in the investigation. If the investigation is limited to the hardware configuration, the result will be of limited value. In accordance with ITIL (Information Technology Infrastructure Library), there are five other domains besides the hardware that must be included in an Availability Review:

1. The environment, including climate control, power supply, service contracts, etc.
2. The system management, the organization, the procedures and the system configuration
3. The network
4. The system software
5. The applications and the application management

Only when all these domains have been exhaustively charted is it possible to determine what availability can be offered. If the availability that can be offered turns out to be inadequate, alternative scenarios can be formulated with the aid of AVANTO. An Availability Review provides a complete picture of all aspects of availability management.

AVANTO business model

An Availability Review Service establishes what availability can be offered with an existing configuration or with a new one yet to be installed.
Furthermore, AVANTO supports extensive modelling. Starting from an initial AVANTO model, the reference model, we can modify the redundancies and/or the service contracts to determine the cost at which the desired availability can be realized. First, however, it is necessary to indicate when a certain availability is to be offered. For example, a central system with an important database may require 100% availability during the day, while the back-up equipment requires the same level during the evening and night. It may also be that not all of the hardware is important for one application, while a different application requires all the hardware in the system in order to operate. Charting these aspects is called setting up the business model, or business scenario, in AVANTO. The business scenario is entered into AVANTO, which always displays the availability as a function of the business model.

Example of a simple business scenario

           Day 1  Day 2  Day 3  Day 4  Day 5  Day 6  Day 7
Shift 1      50%    50%   100%   100%    75%    50%    25%
Shift 2     100%   100%   100%   100%   100%    50%    50%
Shift 3     100%   100%   100%   100%   100%    50%    50%
Shift 4      75%    75%    75%   100%   100%    50%    25%

The percentages indicate when and to what extent availability is required. AVANTO enables a business scenario to be created for each part of the configuration, which means, for example, that a system's ideal service mix can be modelled.

Cost of downtime

If known, the cost of downtime can now be entered. In several cases it is possible to establish which costs are associated with the failure of the information system, and these can also be entered into AVANTO as part of the business scenario. If the cost of downtime cannot be quantified, AVANTO expresses it as a number of points per hour. The customer can then ask his financial department to convert these points into money.
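One way to read the example scenario is that an hour of downtime counts in proportion to the percentage required in that shift. The Python sketch below applies that reading to the scenario table, using the points-per-hour idea from the text. It is not AVANTO's actual cost model: equal-length shifts, a uniform spread of outages over the week, and the names `outage_points` and `expected_points` are all assumptions.

```python
# Requirement percentages from the example business scenario:
# rows are shifts 1-4, columns are days 1-7.
SCENARIO = [
    [50, 50, 100, 100, 75, 50, 25],
    [100, 100, 100, 100, 100, 50, 50],
    [100, 100, 100, 100, 100, 50, 50],
    [75, 75, 75, 100, 100, 50, 25],
]


def outage_points(shift: int, day: int, hours: float, points_per_hour: float) -> float:
    """Points charged for an outage of `hours` inside one slot (1-based shift/day).

    `points_per_hour` is the cost of an hour of downtime when 100% is required.
    """
    return hours * points_per_hour * SCENARIO[shift - 1][day - 1] / 100


def expected_points(downtime_hours: float, points_per_hour: float) -> float:
    """Expected points for a yearly downtime total, assuming outages are equally
    likely in every slot and all shifts have the same length."""
    weights = [p / 100 for row in SCENARIO for p in row]
    return downtime_hours * points_per_hour * sum(weights) / len(weights)


# A four-hour outage in shift 1 costs four times as much on day 3 as on day 7.
print(outage_points(1, 3, 4, 100), outage_points(1, 7, 4, 100))  # 400.0 100.0
print(round(expected_points(10, 100), 1))  # 767.9
```

The same downtime total is worth fewer points when it falls in lightly required shifts, which is why the text stresses indicating *when* availability is needed before comparing service options.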
Various risk-analysis techniques are available for determining the cost of downtime.

[Figure 4: Redundancy model (figures not included)]

Redundancy

The redundancy model in Figure 4 shows the topology of a simple hardware configuration, in which components A, B, D, E and F must all function correctly for the entire chain to operate. If component A fails, the chain is broken and the application comes to a standstill. If component B1 fails, B2 takes over: component B has been implemented redundantly, so B2 assumes the function of B1, automatically or via a manual procedure, and vice versa. Clearly, the entire chain is more likely to fail because of component A than because of component B, especially if A and B are equally reliable. The reliability of the entire chain decreases as more non-redundant components are included in it. For configurations like the one in Figure 1, the number of elements in the chain can easily rise to many dozens.

AVANTO also offers facilities for taking into account the effects of activating redundancies and then switching back to the normal situation. Activating a redundancy measure often affects performance during the switching time. These effects of switching functions on and off (invocation and devocation) are also included when AVANTO determines the eventual availability and cost of downtime.

When calculating the average availability, AVANTO uses the MTBF and MTTR data of all components within the given configuration. The calculation normally covers a simulated period of twenty years, and the result represents an average expectancy: the calculated average availability is realized in 95% of cases. AVANTO does not take into account unscheduled failures caused by human actions or other completely arbitrary factors.
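The chain argument above can be made concrete with the standard steady-state formulas: a component's availability is MTBF / (MTBF + MTTR), availabilities in series multiply, and a redundant group is down only when all its members are down at once. This Python sketch applies them to the A, B1/B2, D, E, F chain. It is a closed-form approximation, not the twenty-year simulation AVANTO performs, and it assumes independent failures, instant switchover (no invocation/devocation effects) and made-up MTBF/MTTR figures.

```python
from math import prod


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable component."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*parts: float) -> float:
    """All parts are needed, so their availabilities multiply."""
    return prod(parts)


def parallel(*parts: float) -> float:
    """The group is down only when every redundant part is down at once."""
    return 1 - prod(1 - a for a in parts)


HOURS_PER_YEAR = 8760
comp = availability(mtbf_h=50_000, mttr_h=10)  # hypothetical, same for every component

chain_single_b = series(comp, comp, comp, comp, comp)            # A, B, D, E, F
chain_b1_b2 = series(comp, parallel(comp, comp), comp, comp, comp)  # A, (B1 || B2), D, E, F

for name, avail in (("single B", chain_single_b), ("B1/B2 pair", chain_b1_b2)):
    print(f"{name}: {avail:.6f}, about {(1 - avail) * HOURS_PER_YEAR:.1f} h down per year")
```

With five identical components, duplicating B removes roughly one fifth of the expected downtime, since B was one of five equal contributors; the remaining single points of failure (A, D, E, F) dominate.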
Likewise, the quality of the management organization cannot be expressed as a figure. It is, however, possible to incorporate operational characteristics of the management organization in AVANTO, for example the average throughput time at the help desk and similar data. AVANTO can be used to model many hundreds of components, and each component can have three redundancies.

Levels of maintenance service

[Figure 5 (not included)]

AVANTO can be used to calculate which form of maintenance agreement best suits the business scenario of a particular customer. Wide coverage in the maintenance agreement does not necessarily benefit availability: the yield per extra dollar spent on maintenance decreases as the coverage increases. In many cases in which daytime availability must be 100% but can be lower at other times, less coverage in the maintenance agreement is sufficient. AVANTO can calculate the optimum form of maintenance, and a graph can illustrate the diminishing returns, in which the added yield decreases. In this way, the customer can determine precisely where the optimum coverage of his maintenance agreement lies. The graph in Figure 5 clearly indicates that there is no point in this customer switching to a service contract with more coverage than seven days a week, sixteen hours a day, and a four-hour response time (7 x 16 + 4): a more expensive maintenance contract does not further reduce the cost of downtime. If the customer in this example has a maintenance agreement of six days a week, ten hours a day, and a four-hour response time, each extra guilder spent on maintenance results in a 190-guilder reduction in downtime costs.

Environmental factors

A large number of parameters relating to environmental factors can be entered in AVANTO.
These particularly concern the quality of the power supply and air conditioning, no-breaks (uninterruptible power supplies) and possibly even diesel generators. AVANTO treats a diesel generator as a redundancy measure. For aspects like no-breaks, it can now be clearly justified whether the investment provides sufficient yield, since deploying a no-break lowers downtime costs.

Performance aspects

Performance is an important factor when establishing availability. One could argue that there is no downtime as long as one user can still work, but in reality performance and availability are related. AVANTO takes this into account, provided it has been configured correctly. Consultants using AVANTO must understand very clearly which performance effects arise when redundancy measures are introduced. The loss of part of the configuration, such as memory, may also affect performance. Assume that when a redundant disk comes into operation, database performance temporarily drops by 25%. In that case, AVANTO decreases availability accordingly and includes this in the final result of the calculation. So when using AVANTO, we must have a thorough knowledge of how the system elements and components operate.

Software and applications

There are, of course, very limited possibilities for including quantitative data on the reliability of system software and applications in AVANTO. In recent years, increasing numbers of figures have become available, such as Mean Time Between Crash (MTBC). However, these figures may differ considerably from one situation to the next, and it is practically impossible to take account of programming errors and software bugs. When conducting an Availability Review, the investigation focuses primarily on the total management environment and on the software and application management.
Particular attention is paid here to the procedures for reporting problems in the software and the applications, the help-desk organization, and the second- and third-line support. A well-designed problem-solving infrastructure makes a considerable positive contribution to the availability of the applications.

In several Availability Reviews, the investigation focused on the availability of a certain application in the infrastructure. When setting up the AVANTO model and the Fault Tree analysis for this, explicit account is taken of the fact that the application in question uses only part of the hardware. In this way, AVANTO can also be used to create models for a certain group of users.

System Health Check

An equally important part of the Availability Review Service is the System Health Check (SHC). As mentioned several times above, the intrinsic (hardware) availability is not the only aspect that matters for the availability of an information system; the way in which system management is exercised is particularly important for the eventual result. During an Availability Review, the Digital consultant conducts an SHC on all the systems involved in the investigation. This involves scrutinizing a large number of aspects in areas such as:

- Security
- Performance
- Capacity utilization and occupied space
- etc.

Hundreds of checks are carried out in an SHC by specially developed software, yielding a detailed fingerprint of how the systems are managed. In his report, the consultant carrying out the Availability Review Service indicates all the bottlenecks and gives concise advice on how they might be solved. Following this advice clearly contributes to improving overall system performance in all the areas investigated. Please refer to the brochure on SHC for more information.
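Modelling only the hardware an application actually uses can be sketched with the same series rule: multiply the availabilities of that application's components rather than those of the whole system. The component names, availabilities and dependency sets below are hypothetical, redundancy is ignored to keep the sketch short, and this is not how AVANTO's fault trees are built internally.

```python
from math import prod

# Hypothetical component availabilities, e.g. from MTBF / (MTBF + MTTR).
COMPONENTS = {
    "cpu": 0.9995,
    "disk_array": 0.9990,
    "network": 0.9980,
    "tape_drive": 0.9900,
}

# Hypothetical dependency sets: the hardware each application actually uses.
APPLICATIONS = {
    "order_entry": {"cpu", "disk_array", "network"},
    "nightly_backup": {"cpu", "disk_array", "tape_drive"},
}


def app_availability(app: str) -> float:
    """Series availability over only the components `app` depends on
    (independent failures, no redundancy)."""
    return prod(COMPONENTS[name] for name in APPLICATIONS[app])


for app in APPLICATIONS:
    print(f"{app:15} {app_availability(app):.4f}")
print(f"{'all hardware':15} {prod(COMPONENTS.values()):.4f}")
```

Because the order-entry application does not depend on the tape drive, its modelled availability is higher than a whole-system figure would suggest.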
Supplementary investigation

The description above gives a clear picture of the availability an IT organization can offer. An exhaustive Availability Review also uses questionnaires to gain even better insight into how system and network management are organized. Additional aspects to be scrutinized include physical security, reporting, change management, problem management and other aspects allied to availability management.

Conclusion

The previous sections describe the facilities available for modelling availability with AVANTO, and indicate in general terms the power of the Availability Review Service in combination with AVANTO. An Availability Review was conducted at an Australian bank. This involved charting the availability of a network of cash dispensers, particularly the availability at individual points of issue. Finally, AVANTO was used to formulate advice for improving the availability of certain points at the lowest possible cost. The method described here is universally applicable and not limited to Digital hardware and Digital users. It almost goes without saying that this method in combination with AVANTO is completely unique within the IT sector; there is no other application like AVANTO.

The customer receives a management summary with recommendations and a detailed report containing the background information and all the detailed results from AVANTO and the System Health Check, complemented with the results of the supplementary investigation.

The Availability Review Service is applied in situations such as the following:

- The IT management must guarantee a certain availability and is looking for ways to realize it.
- There is uncertainty about the availability that can be offered with the existing infrastructure.
- A system is to be expanded, and an investigation must determine what effect this may have on availability.
- A new application system is to be configured for a particular availability.
- The availability demands are approaching one hundred percent.
- ITIL implementations and the determination of Service Levels.

By conducting an Availability Review in combination with AVANTO, it is also possible to configure an application system so that a predetermined availability objective is realized. In practice, systems have already been designed with an average intrinsic downtime of less than four hours per year. Of course, the management organization must then also meet high quality requirements, such as those described in ITIL. Digital's Multivendor Customer Services in the Netherlands is ISO 9001 certified.

Introduction

Today, the availability of information systems is taken as much for granted as that of telephone and electricity services. Without information systems, the greater part of today's economic activity would come to a standstill. Increasing numbers of business managers are aware of this and are taking measures to help safeguard the availability of the information supply. Government also plays a role here with, among other things, its publication of the Code for information security. One of the objectives of this Code is the promotion of business confidence, which clearly indicates the relation between economic activity on the one hand and the supply of information by automated systems on the other. Indeed, many companies and government organizations rely entirely on information systems for their operations. In everyday practice, however, very little account is unfortunately taken of the availability requirements that have to be imposed when developing and configuring new application systems. Availability management, where it is deployed explicitly at all, is almost always limited in practice to measurements after the fact and adjustments where possible.
The computer industry has succeeded in developing fault-tolerant systems for highly critical applications, systems that can operate practically without unscheduled interruptions. Fault-tolerant systems are frequently deployed, particularly by financial institutions and logistics organizations. However, these systems are relatively expensive, and the alternatives are limited. In addition, for many years there has been a trend among suppliers to design normal systems that can offer very high availability. Increasing numbers of suppliers are also providing lifetime guarantees for certain components. Several years ago, personal computers broke down with frightening regularity, but today we anticipate that the technical life span of our PCs will far outstrip their economic life expectancy. There are also numerous possibilities for incorporating redundancy into computer configurations, and RAID technology is being applied with increasing frequency. On the one hand there is a greater need for information systems configured to provide high to very high availability, while on the other hand suppliers are offering more and more facilities for building reliable systems. However, the question that information managers are having to answer with increasing regularity is: How can I determine in advance what availability I should offer my users? This is partly due to a development in which functional/business management determines the conditions that the information supply must meet, at the lowest possible cost, of course. This is complicated by the fact that until recently there was no effective way to determine the availability of a complex information system before it was actually implemented. Apart from the many organizational aspects, the book Availability Management from the ITIL series ventures a mathematical approach to this problem only in Appendix B.
In practice, when availability figures for Service Level Agreements are established in advance, we usually take an arbitrary approach. On the basis of experience, a considerable dose of redundancy, negotiation and a sound maintenance agreement, it is assumed that a certain guarantee can be given to the users. Practice must then show whether the agreed availability can be achieved, and how adjustments can be made if availability is inadequate. This situation is far from ideal, since adjustments almost always bring a great deal of frustration and unexpected, often high costs. For strategic applications, this is an unacceptable and also unnecessary state of affairs. Evidently, the demand for a well-established foundation for the expected availability will become increasingly important. The solution, however, is not simply a well-designed configuration, as a few publications on this subject make clear. Unscheduled failures of information systems are largely attributable to aspects such as environmental factors, service, management, the applications and the network. We will therefore need to take a holistic approach, and not simply consider the hardware, if we wish to understand availability. It is not without reason that ITIL has become very popular. If we look at the management models described in ITIL, it becomes clear that the success or failure of availability depends on the entire system of measures, methods and procedures, complemented by clear safeguards for quality and security. Indeed, good security is the wall surrounding our availability. If we examine the situation closely, the only functions of the management organization are making and keeping available the applications the business needs.
A hardware configuration is the basis of high availability, and this hardware can offer a certain availability. This basic availability is called the intrinsic availability of a system. Intrinsic availability is always better than or equal to the actual availability realized for the user. A system or infrastructure can only approach its intrinsic, or nominal, availability if the management and all other external factors are at an optimum. According to ITIL, these external factors are the environment, the software, the network, the applications and the management of the entire system. Investigations conducted in the mid-1980s demonstrated that hardware accounts for only about twenty percent of all instances of non-availability. However, this only holds true if we look at all the causes of system failure. The greatest threat to the organization comes from unscheduled failure, and most unscheduled failures are clearly attributable to the hardware, the service, the network and the environmental factors. If they are well organized and tested, the management, the applications and the software cause far fewer unscheduled failures. This is why, when we design a new information system, we must pay great attention to the topology of the hardware configuration. It is not difficult to imagine that the failure of a single hard disk may have serious consequences for a very large database. An important role is played here not merely by the repair or replacement of the faulty unit, but also by the time required to restore the database. Moreover, increasing numbers of systems are becoming part of a much wider infrastructure that is closely linked to the systems of other organizations. These may include, for example, systems for EDI, logistics, telephone sales and electronic payment transactions, but also real-time applications for telecoms companies and the chemical industry.
The consequences of these kinds of computer systems being unavailable are catastrophic. The failure of information systems deployed in the dealing rooms of banks may have disastrous consequences for the entire banking organization, extending far beyond the limits of that company alone. The reservation systems of airline companies are a case in point. In certain cases, we need to design systems so that downtime is limited to a few hours each year. In one situation, it was demanded that the application must never fail, even in the event of a disaster. These very high availability demands are without exception prompted by the commercial importance of the application. We also know that as the need for availability increases, costs increase exponentially. But how do we design a configuration to comply with such high availability requirements? Or what must we do when we have to make a very critical application operational on an existing platform? Which measures must be taken to set up the management so that we can continue to meet the demands that have been set? We can find the answers to these questions with Digital's Availability Review and Partnership Services and with the aid of AVANTO.
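Translating a downtime budget such as "a few hours each year" into an availability target, and back, is simple arithmetic over the 8,760 hours in a non-leap year. A minimal Python sketch, assuming downtime is an average over the year rather than a guarantee; the 16-hour and four-hour figures come from this topic, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def availability_for_budget(downtime_hours_per_year: float) -> float:
    """Average availability needed to stay within a yearly downtime budget."""
    return 1 - downtime_hours_per_year / HOURS_PER_YEAR


def downtime_for_availability(availability: float) -> float:
    """Average downtime hours per year implied by an availability figure."""
    return (1 - availability) * HOURS_PER_YEAR


# 16 h/year is the customer question quoted in .12; 4 h/year is the
# intrinsic-downtime design figure mentioned in the AVANTO write-up.
for budget in (16, 4):
    print(f"{budget:>2} h/year -> {availability_for_budget(budget):.4%} availability")

print(f"99.99% -> {downtime_for_availability(0.9999) * 60:.0f} minutes/year")
```

These are averages: a year with one long outage and a year with many short ones can have the same availability figure, which is one reason the text pairs the numbers with a business scenario.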
4424.11	Think Terabytes. Video-on-demand, Commercial DBs, etc.	ATLANT::SCHMIDT	See http://atlant2.zko.dec.com/	`Fri Feb 16 1996 10:18`	24
	Even very high MTBF numbers are not meaningless. It is true for mere humans like you and me, who buy and use our disks one or two at a time, that an MTBF of 100K hours (11.4 years) isn't meaningfully different from an MTBF of 800K hours (91.3 years). After all, the disk will be obsolete in one or two years and truly ridiculous in five or ten years. But for people who assemble huge storage arrays of hundreds or thousands of disks, the law of large numbers starts to come into play. If these disks really have a uniform failure rate throughout their lives, then an array of a hundred 100K-hour disks will suffer a disk failure every 1000 hours (42 days), and a thousand-disk array will suffer one every 100 hours (4.2 days). They'd better be using RAID! But RAID requires more disks, and that means more failures! Yipes! If, on the other hand, they buy the disk that runs 800K hours on average, then the hundred-disk array runs an average of 8000 hours (333 days) between failures, and even the thousand-disk array runs 800 hours (33.3 days). RAID would still be nice to have, but ordinary backup to tape might still be a sufficiently practical strategy. Atlant
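Atlant's arithmetic can be reproduced with a short sketch. It makes the same simplifying assumption he does (a constant failure rate and independent failures), under which the failure rates of N identical disks add, so the array sees a disk failure every MTBF/N hours on average. Function names are illustrative only.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 365 * HOURS_PER_DAY


def array_mtbf_hours(disk_mtbf_hours: float, n_disks: int) -> float:
    """Mean hours between disk failures anywhere in an array of n identical
    disks, assuming constant, independent failure rates (rates simply add)."""
    return disk_mtbf_hours / n_disks


for disk_mtbf in (100_000, 800_000):
    print(f"{disk_mtbf:,} h MTBF = {disk_mtbf / HOURS_PER_YEAR:.1f} years")
    for n in (100, 1000):
        h = array_mtbf_hours(disk_mtbf, n)
        print(f"  {n:>4} disks: a failure every {h:,.0f} h ({h / HOURS_PER_DAY:.1f} days)")
```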
4424.12	Yes, but get in touch with the people who know	UTROP1::KOOIJMAN	LIFE IS HELL THEN YOU DIE	`Fri Feb 16 1996 10:56`	32
	Yes, yes, yes, you are right. But when a customer wants to know what level of availability he can guarantee to his users, you will need AVANTO to give him the right answer. With common sense and a pocket calculator you will not be able to solve problems as complex as the ones you describe. And it is not true that 10 disks of 800K hours give 800K divided by ten, i.e. an 80K-hour MTBF. That is only true if all the disks are used by the same database and application, are not redundant, and so on. We have used AVANTO many, many times in situations where we had to answer customer questions like "what do I have to do to have no more than 16 hours of downtime per year on average?" Would you recommend RAID or volume shadowing and/or clustering? We have even designed systems that will never be down, even in case of disaster. We have done this for customers with 8-node clusters and 150 GB of disk. So once again, contact Dave Varner and Ron Rocheleau, and don't start a debate here about the real value of MTBF and MTTR. As far as I'm concerned only a few of us are qualified, and the best one is Ron. Use AVANTO with the customer and see for yourself what a great thing we have. We have done it many times, and customers are paying us big bundles of money to get the real availability answers. Regards, Aad Kooijman. Business Manager, High Availability Services in Holland. So I'm not very, very technical.
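Aad's point that MTBF/10 only holds for non-redundant disks can be illustrated with the textbook Markov result for a mirrored (1-out-of-2) pair with repair: with per-disk failure rate λ = 1/MTBF and repair rate μ = 1/MTTR, the pair's mean time to data loss is (3λ + μ) / (2λ²). This is a sketch under strong assumptions: independent, constant failure rates, exponential repair, shadow sets combined as if their own failure times were exponential, and no common-mode failures (controllers, power, software, operators), which can dominate in practice. It is not a reconstruction of AVANTO's model, and the 24-hour MTTR is hypothetical.

```python
def series_mtbf(disk_mtbf: float, n_disks: int) -> float:
    """Non-redundant disks: any single failure is an outage, so rates add."""
    return disk_mtbf / n_disks


def mirrored_pair_mttf(disk_mtbf: float, mttr: float) -> float:
    """Mean time until both halves of a shadowed pair are down at once.
    Classic 1-out-of-2 result with repair: (3*lam + mu) / (2 * lam**2)."""
    lam = 1.0 / disk_mtbf
    mu = 1.0 / mttr
    return (3 * lam + mu) / (2 * lam ** 2)


def mirrored_array_mttf(disk_mtbf: float, mttr: float, n_pairs: int) -> float:
    """n_pairs shadow sets in series (approximation: pair failures treated as
    exponential, so their rates add)."""
    return mirrored_pair_mttf(disk_mtbf, mttr) / n_pairs


DISK_MTBF = 800_000  # hours, the figure used earlier in this topic
MTTR = 24            # hours, hypothetical repair-plus-resync time

print(f"10 plain disks, mean time to outage    : {series_mtbf(DISK_MTBF, 10):,.0f} h")
print(f"20 disks as 10 shadow sets, drive swaps: every {series_mtbf(DISK_MTBF, 20):,.0f} h")
print(f"20 disks as 10 shadow sets, data loss  : {mirrored_array_mttf(DISK_MTBF, MTTR, 10):,.0f} h")
```

Under these assumptions the shadowed array replaces a drive twice as often as the plain one yet loses data orders of magnitude less often, which is the gap between component MTBF and system availability the note is pointing at.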
4424.13		ATLANT::SCHMIDT	See http://atlant2.zko.dec.com/	`Fri Feb 16 1996 12:24`	20
	Aad: You're "preaching to the choir". I wasn't arguing about the usefulness of a complete model. I worked for both Field Service and CSSE, and models were our life! We knew how to get whatever answer we wanted from our models. :-) What I was debating was the statement that such large MTBFs are meaningless. They're definitely not, at least not if you have enough disks (for example) that the law of large numbers starts to apply. Ten disks? You're right, the calculation isn't simply MTBF/10. But a hundred disks? Maybe. And a thousand disks? Probably. And yes, disks have wear-out mechanisms as well as other sources of failure. But I was trying to draw a simple illustration and not clutter it up with too many details. And the original note that talked about meaningless MTBFs had used disks as an example, so I followed suit. Atlant
4424.14	Is MTBF needed?	MRKTNG::VICKERS		`Fri Feb 16 1996 12:33`	28
	Re: MTBF vs. reliability/availability/maintainability. There are some really good replies in this notes stream, and some very valid positions. Unfortunately, our customers (who emerge from the great unwashed, uneducated masses) still ask for, and in some cases demand, equipment MTBF as part of Digital's response to RFQs. Try as one may, they cannot be educated or coerced away from this position; in fact, they sometimes take on an "if you won't tell me, what are you hiding?" attitude about the subject. Interestingly enough, most don't specify or care about the method used to generate the number (DoD and other U.S. agencies being the exception: MIL-HDBK-217 only, please); they just want the number, and by not providing it Digital risks being declared non-compliant in its proposal. Also, I have never known a customer to come back with MTBF data and say, "Oh, by the way, the equipment you sold me didn't meet the MTBF you specified. I think you should compensate me." I have had them cite me chapter and verse from the IBM/HP/Sun book as to why "my" equipment was substantially inferior to the competing product in terms of "calculated" MTBF. Then the discussions get really mundane. My $.02 worth, Bill
4424.15	It's the way they've always done it	BBPBV1::WALLACE	UNIX is digital. Use Digital UNIX.	`Sat Feb 17 1996 08:17`	10
	Bill gets my vote. No numbers, no sale, in much of the OEM market I support. It doesn't matter if the numbers are meaningless, and it doesn't matter that some of the OEM customers and/or their end users can't tell the difference between availability and reliability; it just matters whether they can drive their (ISO 9000?) quality process, which says they have to crank the MTBF handle on a spreadsheet and come up with The Answer. regards john
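For readers who have not seen it, the "MTBF handle" is usually a series parts-count sum: add the failure rates of every part (often quoted in failures per million hours), then invert. The sketch below assumes that model; the part list and rates are hypothetical, and a real MIL-HDBK-217 prediction also applies quality and environment factors omitted here.

```python
# Hypothetical part list: name -> (quantity, failures per million hours each).
PARTS = {
    "CPU module": (1, 2.0),
    "memory module": (2, 0.75),
    "power supply": (1, 3.0),
    "disk drive": (1, 1.25),
    "fan": (2, 0.5),
}


def parts_count_mtbf(parts: dict) -> float:
    """Series model: total rate is the sum of quantity x rate; MTBF = 1 / rate."""
    total_fpmh = sum(qty * fpmh for qty, fpmh in parts.values())
    return 1_000_000 / total_fpmh


mtbf = parts_count_mtbf(PARTS)
print(f"calculated MTBF: {mtbf:,.0f} hours ({mtbf / 8760:.1f} years)")
```

Nothing in this arithmetic involves repair time, software or operations, which is why such a number says little about availability on its own.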
4424.16	The next point in our debate	UTROP1::KOOIJMAN	LIFE IS HELL THEN YOU DIE	`Mon Feb 19 1996 03:00`	33
	Hi guys, If your customer wants/needs MTTR and MTBF, give it to them. I do not argue with that. Just make sure you get a non-disclosure agreement. I only want to point out that: 1. These figures do not tell the whole availability story. 2. Digital has great services and applications to help our customers determine the 'real' availability. 3. We have been very successful selling these services in Holland. 4. By using these services, we can make IBM and HP look stupid. 5. By positioning our High Availability services we have a unique selling point. 6. We generated a million worth of NOR with these services within a year, especially in the OEM market and with partners. We can help partners design systems that will meet their availability specs. The account managers love it because we see a lot of product sales as a result of these services. 7. Digital has expertise second to none that you might want to use. 8. The Availability Analysis Tool (AVANTO) is just great, and we have used it hundreds of times. One of our ABU accounts bought k$ 600 worth of hardware and software as a result of an AVANTO exercise, just to improve the availability of one of his VAX clusters. Best regards, Aad Kooijman.
4424.17	Remember what MTBF means...	ADOV01::MANUEL	Over the Horizon....	`Mon Feb 26 1996 08:52`	11
	And just remember that MTBF is "mean time between failures": this statistical number is just that, the average time between successive failures of the same piece of equipment. Hopefully, with our latest technology, neither you nor your customer nor the equipment will be around to argue whether the second failure exceeded the MTBF. I just replaced an RZ23 in my VAXstation after its first failure. I've had this faithful beasty for about 7 years; the MTBF timer started at 13:00 today.... Steve.
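Steve's point can be made quantitative under the usual constant-failure-rate assumption, where the probability that one unit survives t hours is exp(-t / MTBF). Only about 37% of units then reach their rated MTBF, and because the model is memoryless, a replacement drive starts the same clock afresh. The MTBF below is hypothetical, not the RZ23's published figure.

```python
import math

HOURS_PER_YEAR = 365 * 24


def survival_probability(hours: float, mtbf_hours: float) -> float:
    """P(a single unit is still running after `hours`), constant failure rate."""
    return math.exp(-hours / mtbf_hours)


MTBF = 200_000  # hypothetical drive MTBF in hours
print(f"P(survive to MTBF)  = {survival_probability(MTBF, MTBF):.3f}")
print(f"P(survive 7 years)  = {survival_probability(7 * HOURS_PER_YEAR, MTBF):.3f}")
```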
4424.18		LILCPX::THELLEN	Ron Thellen, DTN 522-2952	`Thu Oct 31 1996 10:32`	24
4424.19	try the new call center	TROOA::MSCHNEIDER	Nothing witty to say	`Thu Oct 31 1996 12:47`	3
4424.20	Nearest Reseller	STOWOA::BLANCHARD		`Thu Oct 31 1996 13:30`	4