Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

976.0. "Probability of availability question" by UTRTSC::LUBBERS (Jan Lubbers, Software Support.) Tue Nov 15 1988 06:05

    Hi world,
    
    I have a problem which you may be able to help me with.
    
    The problem has to do with system availability. I know the 
    mean time between failures (MTBF) and mean time to repair (MTTR)
    figures of the components of a system. I can calculate the
    resulting mean availability for the total system, but I need to 
    know the availability level that is met with 95% probability.
    
    It can be done if I have the distribution and deviation of the 
    MTBF and MTTR figures, but that is unfortunately not the case.
    
    I might use simulation to generate statistical data, but
    I still don't know what the distribution must be, so I'm looking 
    for a mathematical model.
    
    So, what can I do without knowing what the distribution and deviation
    are?

    Regards, Jan.
T.R    Title    User    Personal Name    Date    Lines
976.1. by DWOVAX::YOUNG (Note early. Note Often.) Tue Nov 15 1988 11:54 (8 lines)
>    It can be done if I have the distribution and deviation of the 
>    MTBF and MTTR figures, but that is unfortunately not the case.

    Failures of most computer components are usually considered to follow a
    Poisson distribution.  MTTR is trickier: it depends on what kind of service
    contracts, guarantees, and reliability exist.  I would advise trying
    to come up with some function that approximates what you (or others)
    think will be the distribution of MTTR.
976.2. "Insufficient information." by ERLTC::COOPER (Topher Cooper) Tue Nov 15 1988 13:03 (39 lines)
    There is not enough information here to calculate what you want.
    Indeed there is not enough information here to calculate what you
    claim to have calculated.  You cannot calculate the mean time
    between failures for the system as a whole from the mean time
    between failures for its components unless you make some additional
    assumptions.
    
    	Specifically, you have probably assumed that the failures of
        the components are random in time, and independent of each other
        both within and between components.
    
    If you make those assumptions then each failure for the system
    as a whole is also independent of all the others and independent
    of the time.  That is, the probability of the system going down
    during an interval of any given length (say one second) is the
    same whether or not the system has been repaired recently and whether
    it is day, night, or weekend.  This is unlikely to be true but may
    be a reasonable approximation.
    
    In this case the distribution of inter-failure times follows the
    exponential distribution:
    
    		p(t) = (1/T)*exp(-t/T)
    
    where T is the mean time between failures (if I haven't misremembered
    the formula).  The Poisson distribution is related in that it tells
    you how many failures will occur in a given sized interval.
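
The exponential and Poisson relationship described above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (random, independent failures) plus one more that is not in the note: repair time is treated as negligible when counting failures in an interval. The function names and the sample MTBF of 1000 hours are made up for the example.

```python
import math


def failure_density(t, mtbf):
    """Exponential density p(t) = (1/T) * exp(-t/T) of the time between failures."""
    return math.exp(-t / mtbf) / mtbf


def prob_no_failure(t, mtbf):
    """Probability that the system runs for at least time t without failing."""
    return math.exp(-t / mtbf)


def prob_failure_count(k, interval, mtbf):
    """Poisson probability of exactly k failures in an interval of the given length.

    Assumes repair time is negligible compared with the interval.
    """
    expected = interval / mtbf  # mean number of failures in the interval
    return math.exp(-expected) * expected ** k / math.factorial(k)


if __name__ == "__main__":
    mtbf = 1000.0    # hours, hypothetical value
    month = 720.0    # hours in a 30-day month
    print("P(no failure in a month):", prob_no_failure(month, mtbf))
    for k in range(4):
        print("P(%d failures in a month):" % k, prob_failure_count(k, month, mtbf))
```

Note that the probability of zero failures from the Poisson formula is the same number as the exponential survival probability, which is the link between the two distributions.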
    
    It does not seem to me that these assumptions are very well justified
    for repair time, and without some knowledge of the distribution
    of the repair time beyond its mean, you cannot make your calculations.
    Off the top of my head, I would say that the best model might be
    a constant time interval plus a normal distribution.  That requires
    an estimate of the constant interval plus the variance of the variable
    part.  You might get away with a simple normal approximation.
    
    What are you willing to assume about repair times?
    
    					Topher
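
One way to explore the original question (the availability level met with 95% probability) is a small Monte Carlo experiment. The sketch below assumes exponential times between failures and a repair time made of a constant part plus a normally distributed part floored at zero, roughly the model suggested above. It also assumes the availability is measured over a fixed observation window. All names and numbers are hypothetical; this is not the poster's program.

```python
import random


def simulate_availability(mtbf, fixed_repair, repair_mean, repair_sd, window, rng):
    """Return the fraction of one observation window during which the system is up."""
    t = 0.0
    up = 0.0
    while t < window:
        run = rng.expovariate(1.0 / mtbf)   # time until the next failure
        up += min(run, window - t)          # count only uptime inside the window
        t += run
        if t >= window:
            break
        # Repair = constant part + normally distributed part, floored at zero.
        t += fixed_repair + max(0.0, rng.gauss(repair_mean, repair_sd))
    return up / window


def availability_quantile(samples, prob_met):
    """Availability level met or exceeded by a fraction prob_met of the samples."""
    ordered = sorted(samples)
    index = min(int((1.0 - prob_met) * len(ordered)), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    rng = random.Random(1988)  # fixed seed so runs are repeatable
    samples = [
        simulate_availability(mtbf=1000.0, fixed_repair=2.0, repair_mean=3.0,
                              repair_sd=1.0, window=8760.0, rng=rng)
        for _ in range(5000)
    ]
    print("mean availability:", sum(samples) / len(samples))
    print("availability met with 95% probability:", availability_quantile(samples, 0.95))
```

The 95% figure depends on the length of the window as well as on the distributions: a short window gives a wide spread around the mean availability, a long one a narrow spread.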
976.3. "Thinking....." by UTRTSC::LUBBERS (Jan Lubbers, Software Support.) Thu Nov 17 1988 02:38 (8 lines)
    Thanks, 
    
    I have to think now I think ...
    
    I'll come back to this problem later when I have defined the
    assumptions better.
    
    Regards.
976.4. "a few comments" by PULSAR::WALLY (Wally Neilsen-Steinhardt) Fri Nov 25 1988 12:48 (21 lines)
    First off, this is a well-studied problem and there is no reason
    for you to thrash around in the dark.  You might start with
    _Maintainability and Maintenance Management_, J. D. Patton, Instrument
    Society of America, Research Triangle Park, North Carolina, 1980.
    And a lot of books and Proceedings with "Reliability" in the title
    discuss this problem.
    
    Second, MTTR is defined as "the average length of working clock
    time required to complete a corrective service call..." and excludes
    response time and travel time.  Depending on the repair strategy
    these may be large, so MTTR is not really what you want to combine
    with MTBF to get availability.
    
    Third, availability depends strongly on the redundancy built into
    the hardware and the planned repair strategy.  You have to define
    these first.
    
    Fourth, if this is Digital business, you should contact the CSSE
    experts on availability, who have some computer models for making
    these calculations, lots of experience in applying them, and some
    business judgement which is often more relevant than the math.
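
The second and third points can be illustrated with the usual steady-state formulas. The sketch below assumes independent components, and it assumes that the mean down time per failure is the sum of response, travel, and repair time. These are simplifying assumptions for illustration, not the CSSE models mentioned above, and all figures are invented.

```python
import math


def availability(mtbf, mean_down_time):
    """Steady-state availability from MTBF and mean down time per failure."""
    return mtbf / (mtbf + mean_down_time)


def series(availabilities):
    """Availability of a system that needs every component to be up."""
    return math.prod(availabilities)


def parallel(availabilities):
    """Availability of a redundant group that needs at least one component up."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    mtbf = 2000.0                   # hours, hypothetical
    mttr = 2.0                      # hours of hands-on repair
    response_and_travel = 6.0       # hours, excluded from MTTR by definition
    optimistic = availability(mtbf, mttr)
    realistic = availability(mtbf, mttr + response_and_travel)
    print("using MTTR only:       ", optimistic)
    print("using total down time: ", realistic)
    print("two such units in series:  ", series([realistic, realistic]))
    print("two such units in parallel:", parallel([realistic, realistic]))
```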
976.5. "formulas?" by UTRTSC::LUBBERS (Jan Lubbers, Software Support.) Mon Nov 28 1988 02:47 (20 lines)
    Hi Wally,
    
    Thanks for your input. Yes, I understand the problems you mentioned.
    What I'm really after is a way to determine the probability of a
    certain availability figure...
    
    I do have a simulation model and a program that calculates everything,
    but I want to add some additional functionality.

    To do this I need to understand how a "correct" formula is made.
    I also know there are a lot of assumptions involved; my major 
    problem is how the different figures are distributed.
    
    E.g., I don't know if the "mean" in MTBF and MTTR is the arithmetic
    mean, the mode, or the median of those figures. I'm also looking
    for the (standard) deviation. In the programs I have, the figures are
    assumed to be lognormal for repair time and exponential for uptime.
    Is this a correct assumption?

    Regards, Jan
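
The difference between mean, median, and mode matters for exactly the two distributions named above, because both are skewed. The following sketch uses the standard closed-form expressions for the lognormal and exponential distributions. The function names and sample figures are hypothetical, and it does not settle which statistic a published MTBF or MTTR figure actually represents.

```python
import math


def lognormal_params(mean, sd):
    """Convert the arithmetic mean and standard deviation of a lognormal
    variable into mu and sigma of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def lognormal_summary(mu, sigma):
    """Mean, median, mode, and standard deviation of a lognormal distribution."""
    return {
        "mean": math.exp(mu + sigma ** 2 / 2.0),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma ** 2),
        "sd": math.sqrt((math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)),
    }


def exponential_summary(mean):
    """Mean, median, mode, and standard deviation of an exponential distribution."""
    return {"mean": mean, "median": mean * math.log(2.0), "mode": 0.0, "sd": mean}


if __name__ == "__main__":
    mu, sigma = lognormal_params(mean=4.0, sd=3.0)   # repair time in hours, made up
    print("repair time (lognormal):", lognormal_summary(mu, sigma))
    print("uptime (exponential):   ", exponential_summary(1000.0))
```

For the exponential distribution the standard deviation equals the mean, so no separate deviation figure is needed. For the lognormal, a second parameter has to come from data or from an assumption.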
976.6. "New conference on Availability" by UTRTSC::LUBBERS (Jan Lubbers, Software Support.) Mon Nov 28 1988 11:26 (14 lines)
    I have created a new conference on this subject.
    
    The conference is located at UTRROM::UPTIME
    
    Anyone interested in the subject, and anyone knowledgeable in
    mathematics and statistics, is free to contribute...
    
    I intend to improve the existing tool to incorporate
    new developments like volume shadowing and VAXsimPLUS that
    influence availability.
    
    Hit <select> or <KP7> to add it to your notebook.
    
    Regards, Jan.