
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1447.0. "Statistics can be (mis)used to "prove" anything" by ATSE::GOODWIN () Thu May 23 1991 16:26

    
    
    I figured I better take this question to the experts, because either
    I'm all wet (not unlikely), or a large portion of the rest of the world
    is (also not unlikely)...
    
    
    If the Department of Motor Vehicles were to say:
    
    	"Twenty percent of all driving fatalities involve a left-handed 
    	 driver, so we should pass a law against left-handed drivers."
    
    wouldn't your reaction -- being statistics experts -- be:
    
    	"That statistic by itself is not sufficient to determine
    	 whether or not left-handedness has any correlation to
    	 likelihood of causing a driving fatality.  We need to know
    	 the incidence of left-handedness in the general population."  ?
    
    And then if it turned out that only 10% of the general population were
    left handed, you would suspect that there might be some correlation
    between left-handedness and likelihood of causing an accident.
    
    Or if it turned out that 30% of the population was left-handed, then
    maybe you would consider lowering insurance premiums for left-handed
    drivers.
    
    But if it turned out that 20% of the population was left-handed, and
    20% of accidents involved left-handed drivers, then wouldn't you
    conclude there was NO correlation between handedness and likelihood of
    causing an accident?
    
    I certainly would, but that is from gut feel rather than from any
    in-depth knowledge of statistics.
    
    So................
    
    When they say, as they have been so fond of doing for the past 20 or
    more years to my own knowledge, that 50% of all highway fatalities
    involve alcohol (or 55% or whatever), don't we need to know something
    about the population as a whole before we can say there's any
    correlation between drinking and driving accidents?
    
    I think it's obvious we do, and I've never -- except for ONE time --
    heard any such statistic, and that was from the Maine Department
    of Motor Vehicles right around the same time as Massachusetts DMV was
    publicizing a 55% incidence of highway/drinking fatalities a few years
    ago -- when MADD was just making its debut...
    
    Maine said -- on TV news one night -- that by their estimate, around 11
    out of every 20 people "on average" (whatever that means) have been
    drinking.  The statement was vague and didn't address time of
    day/night, but specifically stated that the 11/20 applied to the
    population in general, and not just to drivers.
    
    Well, if that's really true, then 11/20 compared with 55% kinda seems
    to me to indicate there is virtually NO correlation between drinking
    and road accidents.
    
    Which we all know is false because the government tells us so, and they
    wouldn't be spending all that tax money if it weren't true, now would
    they?
    
    What's the story here?  Are we being had once again by cops who need
    work, politicians who need votes, government agencies who need
    money, and citizens groups who need to take out their frustrations 
    on someone?
    
    Or is there some basic principle of math I'm missing...
    
1447.1. "we must well define the sets of populations" by SMAUG::ABBASI () Thu May 23 1991 23:46
ref .-1
    
>    wouldn't your reaction -- being statistics experts -- be:
>    	"That statistic by itself is not sufficient to determine
>    	 whether or not left-handedness has any correlation to
>    	 likelihood of causing a driving fatality.  We need to know
>    	 the incidence of left-handedness in the general population."  ?

 actually my reaction would be to ban everyone with a nose on their face,
 since almost 100% of accidents involve such folks :-)

 I am not a statistics person either, but this is what I figure:
 I think one should look at the population that is involved in the activity
 when the statistics are gathered.

 i.e., over a time T, find out the number of drivers on the road, say N;
 out of these, find out the number that are drunk, say M, and the number
 that are not drunk, say J.  Then find out the number of accidents in time T;
 let M' be the accidents involving drunk drivers and J' the accidents
 involving sober drivers.

 then compare M/J to M'/J'

 if M/J is less than M'/J', then you can suspect that drunk driving
 is a factor.
 I am assuming that if a drunk has an accident with a non-drunk, we count
 both in the survey (i.e., J'++ and M'++).

 example: 100 drivers over 1 hour, 90% non-drunk, 10 accidents, 2 involving
 drunks.  then
 N=100, M=10, J=90, M'=2, J'=8.
 so M/J = 10/90
    M'/J' = 2/8

 so 1/9 < 2/8, so drunk driving can be a factor.
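
 in Python, the same check might look like this (a quick sketch; the
 numbers are just the example above, and the function name is mine):

    def exposure_vs_involvement(M, J, M_acc, J_acc):
        # compare the drunk/sober ratio on the road (M/J) with the
        # drunk/sober ratio among accidents (M'/J')
        on_road = M / J
        in_accidents = M_acc / J_acc
        if in_accidents > on_road:
            return "drunk drivers are over-represented in accidents"
        if in_accidents < on_road:
            return "drunk drivers are under-represented in accidents"
        return "no difference detected"

    print(exposure_vs_involvement(M=10, J=90, M_acc=2, J_acc=8))
    # 10/90 ~ 0.11 < 2/8 = 0.25, so over-represented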

 but we don't need all this really; we all "know" that if you are drunk you
 are more likely to be in an accident than if you were not drunk.  QED.

/naser

1447.2. by ATSE::GOODWIN () Fri May 24 1991 17:57
    
> but we don't need all this really; we all "know" that if you are drunk you
> are more likely to be in an accident than if you were not drunk.  QED.
    
    Exactly my point -- we don't "know" any such thing.  In fact, the only
    reason we Think we know it is because the pols tell us so and because
    we Want to believe it.
    
    In fact, some other statistics published by the state of Maine tend to 
    disprove the contention.
    
    After Maine passed a series of new strict DWI laws a couple years ago,
    they published some statistics in the Portland Press Herald after one
    year of the new laws.
    
    The article was congratulating everyone on a job well done because the
    number of accidents attributed to DWI had dropped from ~50% to ~33%,
    which they said proved that DWI was indeed causing the problem, and if
    they could just get all the rest of those nasty old drunks off the
    road, then they implied the whole 50% would go away.
    
    But at the end of the article, after all the back patting for
    politicians, police, etc., the paper happened to mention that
    unfortunately during the same period, the actual death rate on the road
    had increased by 6%.
    
    The population of the state of Maine did not increase during that year. 
    I checked.
    
    So what actually happened is:
    
    	o Deaths attributed to DWI went down by 1/3
    
    	o Deaths went up 6%
    
    Obviously sober people killed just a few more folks than DWIs had,
    keeping in mind that in Maine, "drunk" does not mean falling-down
    sloppy drooling bombed.  It means a 0.10% blood alcohol content.
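
    To make the arithmetic concrete (a sketch only -- the baseline of 100
    deaths is assumed, since the article gave only percentages):

        # year 1: assume 100 deaths, ~50% attributed to DWI
        deaths_y1 = 100.0
        dwi_y1    = deaths_y1 * 0.50     # 50 DWI-attributed deaths
        sober_y1  = deaths_y1 - dwi_y1   # 50 other deaths

        # year 2: total deaths up 6%, ~33% attributed to DWI
        deaths_y2 = deaths_y1 * 1.06     # 106 deaths
        dwi_y2    = deaths_y2 * 0.33     # ~35 DWI-attributed deaths
        sober_y2  = deaths_y2 - dwi_y2   # ~71 other deaths

        print(dwi_y2 - dwi_y1)           # ~ -15: DWI-attributed deaths fell
        print(sober_y2 - sober_y1)       # ~ +21: other deaths rose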
    
    I still think we are being hoodwinked into a witch hunt based on faulty
    statistics.
    
    Has anyone ever heard any other statistics that would contribute more
    useful information to this question?
    
1447.3. by GUESS::DERAMO (Be excellent to each other.) Fri May 24 1991 18:23
        To really tell what is going on, you need to have the
        numbers in much more detail.  The accident rate probably
        varies with time of day, type of vehicle, etc., and you
        want to control for these things.  So get the numbers
        for:
        
        	what % of the drivers of [class of vehicle] on
        	[section of highway] from 4pm-6pm [or other
        	narrow time slot] were drunk, and what % of the
        	accidents did they cause?
        
        That won't be easy.
        
        However, driving tests show very clearly that there is a
        large degradation in performance (relative to performance
        measured just before drinking) even after consuming what
        some might consider very little alcohol.
        
        Dan
1447.4. by ATSE::GOODWIN () Sat May 25 1991 09:42
    
    >    That won't be easy.
    
    With these 4 words I think you've put your finger squarely on the
    problem.  Driving law enforcement spends 99% of its efforts, including
    convincing the public of the rightness of their doing so, on things
    it can measure easily, whether or not they really have that much
    relevance to safe driving.
    
    Things that are easy to measure are speed and blood alcohol content.
    
    One thing that is hard to measure from a patrol car on the side of the
    road is another driver's driving skills and ability to make sound
    driving judgements.  But such measurements, could they be made, would
    be far more relevant than the easier measurements.
    
    It's like the old joke about the drunk crawling around under a lamp
    post; "I'm looking for my car keys".  The passer-by asks, "Where did
    you lose them?".  Drunk points to other part of lot, "Over there". 
    "Then why are you looking for them here?".  "'Cause the light's
    better here."
    
    >    However, driving tests show very clearly that there is a
    >    large degradation in performance (relative to performance
    >    measured just before drinking) even after consuming what
    >    some might consider very little alcohol.
    
    I've seen those "tests" too, but I doubt they are relevant to real life
    driving.  What those tests measure is one's best reaction time under
    conditions that, were one to encounter them on the road, would be hair
    raising to say the least.  They then measure reaction time to the same
    test after some alcohol, and sure enough, it is slower.
    
    All they measure is reaction time under conditions that push the
    person's ability to react to its limit.
    
    If you drove all the time in such a way that your reaction time is
    tested to its limit, then you should definitely voluntarily take
    yourself off the road and never never drive a car again.
    
    If you plan ahead, are observant, are able to anticipate other drivers'
    impending actions, and possess a host of other driving skills that good
    drivers learn over the years, you should be able to go for years at a
    time without ever learning just how good your best reaction time is.
    
    But the tests never measure such things, so they can't tell what
    effect, if any, a small amount of BAC would have on overall driving
    ability.
    
    This wasn't meant to be a discussion of drunk driving -- I really am
    interested in the statistical aspects of the situation.
    
    I went so far as to obtain the raw accident data for 3 years running
    on road deaths in Maine.
    
    First of all, the numbers do not directly show any 50-some % alcohol
    related accidents.  They don't show anything at all about that, and I
    have been unable to get answers from the DMV fellow who puts this stuff
    together to questions such as, "If this is the source of all their
    conclusions about causes of accidents, then where do they get the
    alcohol statistics from?"  He won't answer questions about it for some 
    reason.
    
    One thing I thought was VERY interesting though is that most accidents
    (90-some %) happened in daytime on clear-weather days on straight
    sections of good road.  Night, rain, curves, etc. accounted for only a
    very small % of accidents.
    
    If that one statistic is so non-intuitive, then I expect others are
    too, especially when our "intuition" has been brainwashed for so many
    years.
    
    That's what I'm trying mathematically to get around -- our programmed
    preconceptions -- and it ain't easy.  People seem only to want to find
    ways to prove that their preconceptions are true.
    
1447.5. by GUESS::DERAMO (Be excellent to each other.) Sat May 25 1991 13:05
        re .4,
        
>>    One thing I thought was VERY interesting though is that most accidents
>>    (90-some %) happened in daytime on clear-weather days on straight
>>    sections of good road.  Night, rain, curves, etc. accounted for only a
>>    very small % of accidents.
>>
>>    If that one statistic is so non-intuitive, [...]
        
        I would expect most accidents to occur when most of the
        driving is done.  That's hardly "non-intuitive".  It just
        means I expect sheer numbers to overwhelm any potential
        difference in rate of accidents per driver-hour.  If the
        rate difference was large enough so that most accidents
        occurred when few people were on the road, that would be
        interesting, but I wouldn't call it "non-intuitive"
        either.
        
        Dan
1447.6. by VINO::XIA (In my beginning is my end.) Sat May 25 1991 14:16
    In a complicated case like this, statistics at best provides a vague
    inference.  As Dan said it depends on a lot of factors.  Reading the
    statistics from Maine, some relevant issues that come to mind...
    
    1.  How much did the accident rate increase in the last year from the
        year before?  Is it more than 6% or less?
    
    2.  How much more did people drive this year as compared to last year?
    
    3.  Is there any change in the ratio of highway vs. back-road
        driving?
    
    4.  Is there a change in the age distribution of the population?
    
    and on and on...
    
    However, the main point against DUI is, as Dan said, that it has been
    shown that under controlled conditions, a person's driving performance
    degrades drastically under the influence of alcohol.  This result does
    not depend on how good a driver you are, but rather says that if you
    are drunk you do not perform as well as you would if you were not
    drunk.  The controlled studies are done to simulate the "stress
    condition" of an accident, which is by its nature extreme.  In the
    middle of an empty parking lot the size of, say, Kansas, a driver who
    falls asleep is no more likely to cause an accident than an alert
    driver.  But that does not mean that when an accident is about to
    happen on Rt 20, the sleeping driver can perform as well as the alert
    driver.  Performance under extreme stress is what counts when an
    accident is about to happen or has already happened.
    
    Eugene
1447.7. "Lies, damn lies, and statistics" by GIDDAY::FERGUSON (Murphy was an optimist) Sun May 26 1991 03:09
    The fallacy can be shown clearly by the following counter-argument:
    
    If 20% of all accidents involve left-handers, then 80% of those
    accidents involve people other than left-handers, i.e. right-handers
    and ambidexters. Since the proportion of ambidexters is low in the
    population at large, we should ban right-handed people from driving
    motor-vehicles!
    
    James.
1447.8. by ATSE::GOODWIN () Sun May 26 1991 09:26
    
    >    I would expect most accidents to occur when most of the
    >    driving is done.
    
    Most total accidents perhaps, but not necessarily most accidents per
    capita.  In general, rush hour driving seems safer than driving at
    other times because everyone has a common goal, drives at about the
    same speed, and seems to exhibit comparable driving skills -- very much
    unlike other times of the day.
    
    But even if you accept that, it doesn't seem intuitive at all to me
    that more accidents occur on straight sections of road than on curves. 
    You just can't make a case that more people drive on straight roads
    than on curved ones.
    
1447.9. by ATSE::GOODWIN () Sun May 26 1991 09:46
    
    First of all, I am not trying to say that a drunk person can perform as
    well at a physical or mental task as a sober one.  Or a tired person. 
    Or one who has taken cold medicine.  Or an angry person.  Or a
    depressed person.  Or anyone else who is not awake, alert, and
    fully concentrating on driving.
    
    What I am saying is that because there is one particular factor that
    happens to be easy to measure, (and because it is also morally
    questionable in many quarters) alcohol is being singled out and a large
    proportion of our tax money is being spent on an effort that is
    yielding small results by any indication I have seen so far.  But
    because of one often-quoted statistic, and some often-quoted tests of
    questionable relevance, we are all Believers.  Well, not quite all.
    
    If the anti-DWI efforts across the nation are so bloody effective, then
    how come the highway death rate has not decreased significantly over
    the past 3 or 4 years?
    
>    Performance under 
>    extreme stress is what counts when an accident is about to
>    happen or has already happened.
    
    If an accident has already happened, then whether anyone is drunk or
    not isn't going to make a lot of difference.
    
    But in the case where person A runs a red light (sober) and is about to
    collide with person B, then I think what you are saying is that if
    person B is sober he stands a better chance of taking effective evasive
    action than if he were drunk.
    
    I wouldn't disagree with that.  But I wouldn't say that if person B
    were drunk then he caused the accident.  I would say that person A
    caused the accident.
    
    What your test is testing is the ability of person B.  It's the
    ability, or lack of it, of person A that ought to be tested, and
    person A who ought to be either improved or removed from the road
    altogether.
    
    That's why I maintain the statistics are causing us to spend our
    resources unwisely.
    
1447.10. by ATSE::GOODWIN () Sun May 26 1991 10:10
    
    Re. .7
    
    Exactly.  You could just as meaningfully say that if 50% of accidents
    are caused by drunk driving then let's pass a law that requires that
    people get snockered before they plug in that car key.  I used to say
    that as a joke, but it's just as statistically "true" as the other way
    of looking at it.
    
    It's like saying that since 70% of accidents are caused by
    men (which I believe is somewhere in the ballpark), then only women
    should be allowed to drive.
    
    And it HAS been shown statistically that men have more accidents than
    women, especially men between the ages of 16 and 25.  The insurance
    statistics are far more meaningful than the state's statistics because
    they need to be accurate for the insurance company to make a profit,
    while the state only says what the politicians want them to say.
    
    Unfortunately, the insurance companies won't part with their
    statistics.  As one explained to me, their statistics and analysis
    thereof are their stock in trade, what they base their rates on, and
    are therefore highly confidential.
    
    The thing is, there are probably lots of reasons why a particular
    individual ought not to be driving -- among others, plain old lack of
    skill.  Some folks do not have the ability to become good drivers.
    Most of us have known one or more of these people.
    
    I would bet (having no supporting evidence except my own experience on
    the road), that 90% of accidents are caused simply by lack of driving
    skill, either in new drivers who haven't developed their skills and
    experience yet, or in long-time drivers who aren't capable of doing so.
    
    And the fact that half of them happen to have had a few at the time
    doesn't change the fact that they didn't have the skills in the first
    place.
    
    I still maintain that because of 1 abused statistical datum, an entire
    nation is spending its resources barking up the wrong tree.
    
    I wish there were some way to get hold of more information on accident
    circumstances, but I haven't found any yet.  The state doesn't seem to
    have too much they are willing to part with.
    
1447.11. by VINO::XIA (In my beginning is my end.) Sun May 26 1991 13:56
    Frankly, I for one am not interested in political soapboxing in
    the MATH notesfile.  No matter how many wishful-thinking people would 
    like to dismiss statistics with cliches like "damn lies and statistics", 
    statistics stand for what they are.  There may be misuse and abuse in
    the interpretation of statistics, but this type of mathematical nihilism
    is ridiculous.  To those who are so dismayed by statistics: do you have
    a better, more objective and more reliable method of surveying?  By the
    way, this type of statistic is exact.  Others, such as surveys of how
    many people like Coke or Pepsi, are backed up by mathematical theorems
    regarding large numbers.
    
    So if you have more questions on statistics such as chi-square and what
    not, let us know; otherwise, I am bowing out of this debate.  
    
    Eugene
1447.12. by ATSE::GOODWIN () Tue May 28 1991 13:32
    
    Re. .-1
    
    I was about to say something similar myself, and will make this my last
    reply on this topic unless anyone has anything mathematical to
    contribute to the discussion.
    
    Maybe .0 was not clear, but all I was asking is whether or not the
    generally accepted conclusions regarding DWI can really follow from the
    available data.
    
    I am asking not only about statistical theory but also about its
    application in this real situation.
    
    I am also asking people to try to put aside what may be strongly
    ingrained preconceptions on the subject long enough to consider the
    question objectively and logically.
    
    I reckoned if any conference would have people who could do that, this
    would be the one.
    
    Thanks to those who did try to address the question.
    
    Dick
    
1447.13. "statistics of correlation" by CSSE::NEILSEN (Wally Neilsen-Steinhardt) Tue May 28 1991 14:12
First off, I'll agree with Eugene that diatribes on DUI policy have no place
in this conference.

But I notice that the question in .0 has not received a solid answer yet.

.0>don't we need to know something
>    about the population as a whole before we can say there's any
>    correlation between drinking and driving accidents?

Yes, we need to know the percentage of drivers classified as DUI, by the same
criteria used in the accident classifications.  This is the same reasoning
as in .1, which is basically correct, although a statistician would throw in
a test of significance.

I think such data exists, although I have seen very little.  One relevant
datum has emerged from those stop-everybody roadblocks that have occasionally
been set up.  As I remember the numbers from a few, less than 1% of drivers 
meet the DUI criteria.

.2>    The article was congratulating everyone on a job well done because the
>    number of accidents attributed to DWI had dropped from ~50% to ~33%,
>    which they said proved that DWI was indeed causing the problem, and 

Another terrible use of statistics.  The relevant number is the count of DUI
related accidents and fatalities in the two years.  Again with a test of 
significance.
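
For instance (a sketch with hypothetical counts, since the article gave only
percentages), the standard significance test here is a chi-square test on the
two-year contingency table:

    from scipy.stats import chi2_contingency

    # rows: year 1, year 2; columns: DWI-related deaths, other deaths
    # (counts are made up, scaled from the ~50% -> ~33% and +6% figures)
    table = [[50, 50],
             [35, 71]]

    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p)   # a small p says the shift in DWI share is unlikely
                     # to be ordinary year-to-year variation

Whether the real change is significant depends entirely on the actual counts,
which is exactly the information the percentages hide.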

.2>   unfortunately during the same period, the actual death rate on the road
>    had increased by 6%.

Not any more relevant.  Is 6% a statistically significant change, or within the 
usual year-to-year variation?

And as .3 says, you also need more detailed information.  Of course, if DUI is
a major cause of accidents, then it would stand out even in a study where 
other causes were not controlled.

A final statistical point: correlation is not the same as causation, and the
former is much easier to test.

A hypothesis to illustrate the distinction in this case: It has been suggested
that people with little interest in long life drink a lot, and also drive
recklessly.  This results in a lot of fatal accidents.  But the drinking, under
this hypothesis, did not cause the accident.  Both drinking and reckless driving
are effects of the same cause.  I don't accept this hypothesis, but offer it
here to illustrate the point: correlation is not causation.

.8>    You just can't make a case that more people drive on straight roads
>    than on curved ones.

Sure you can.  It depends on how straight and curved are defined.  I could
classify every 100 yard stretch of road as curved or straight, and then notice
that 90% of them fell into the straight bucket.  Then I could go out with
electric traffic monitors and determine that most of the traffic occurred
on roads which were mostly straight.  Since the gummint spends a lot of money
making straight the roads with high traffic density, I would expect that in fact
more people drive on straight roads than curved ones.

1447.14. "Various comments." by CADSYS::COOPER (Topher Cooper) Tue May 28 1991 17:30
RE: .13 (Wally)

    Basically, Wally, I agree with you.

    A distinction should be made, however, between a formal statistical
    analysis and a presentation.

    If decisions are being made without proper figures being gathered on the
    percentage of drivers who are intoxicated, then there is a serious
    problem in the decision making process.  However, since people are
    likely to get lost in a lot of figures, it is quite reasonable to
    present simple summaries -- as long as reasonable care is taken that
    the simplifications do not distort.  When a figure is given as to the
    number of accidents or fatalities which involve a DWI, what is being
    relied on is people's subjective prior-probability estimates for the
    other relevant value.  This is justified as long as there is reason to
    believe that such estimates are reasonably accurate.  If, contrary to
    intuition, the figures support a 20% overall DWI rate (which would lead
    us to expect a 20% to 36% rate of DWI involvement in accidents, depending
    on the percentage of 1-car, 2-car, etc. accidents) rather than the
    intuitive 1% or less value, *then* it would be imperative to report that
    fact.
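
    A quick way to see where the 20%-to-36% band comes from (a sketch; the
    function name is mine):

        def p_at_least_one(p, n):
            # chance that at least one of n drivers in an accident is DWI,
            # if a fraction p of all drivers are DWI and DWI has no effect
            # on accident rates
            return 1 - (1 - p) ** n

        print(p_at_least_one(0.20, 1))   # 0.20 -- 1-car accidents
        print(p_at_least_one(0.20, 2))   # 0.36 -- 2-car accidents
        print(p_at_least_one(0.01, 2))   # ~0.02 -- with a ~1% overall rate

    An observed involvement rate well above that band is what would be
    informative.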

    The one problem with this is that there is one class of people who are
    quite likely to grossly overestimate the proportion of drinkers on the
    road -- and they are people who most need to know.  I'm willing to bet
    that the problem drinker who insists on driving while intoxicated will
    rationalize his/her behavior by claiming that "practically everyone
    does it".  This is part and parcel of the pattern of denial that these
    people engage in.  Of course, most of them, if presented with the
    actual figures would find a justification to reject them.

    A few other points --

    I have seen detailed figures of the DWI rates, broken down by
    time-of-day, urban vs rural, etc., as part of a technical analysis of
    this issue (probably in Science, though I wouldn't swear to that). 
    Unfortunately, I have no memory of what the figures were.  I would,
    however, *expect* to remember them if they had been at all surprising.
    Since my intuition feels comfortable with a 1% rate, you can take that
    as rather weak support of Wally's figure.  More important is the
    unsurprising fact that the figures *have* been worked through.

    I am quite sure that I have never seen such high figures for the DWI
    rate as were previously cited.  I have seen two high figures which
    might have gotten misreported as this.  One is the DWI rate in urban
    locations "in the wee hours".  This high rate is a result of the bars
    closing and the hard-core drinkers being therefore the principle
    population of drivers.  The other figure I've seen is the number of
    people who report having DWI *sometime in their life*.  Most of them,
    however, only did this once or twice when they were kids and had not
    yet learned about responsible drinking.  Again, without remembering the
    specific figures, I do remember seeing figures (detailed figures)
    demonstrating that most of the DWI arrests and DWI-related accidents
    were of a small percentage of *habitual* DWIs, not the one-time or even
    occasional DWIs (this fact is used to justify laws in some states which
    are relatively lenient for first offenses -- e.g., a few months without
    a driver's license, fines, and/or treatment -- and reserve expensive jail
    sentences for repeat offenders).

    Finally, some popular demonstrations of the effects of alcohol simply
    test reaction time.  There have probably been real experiments
    attempting to give a detailed quantification of why alcohol decreases
    driving ability.  But there have been numerous experiments which have
    directly measured driving performance either in simulators or on
    special courses -- and the deterioration of performance is dramatic
    and unmistakable.  Those same tests show that the subjects almost
    universally overestimate how well they did relative to when sober.

					Topher
1447.15. "Along the same lines, only different" by VMSDEV::HALLYB (The Smart Money was on Goliath) Wed May 29 1991 12:41
    Could I take this opportunity to ask about a similar, somewhat less
    controversial topic:  the so-called "Canadian rat" studies?
    
    In these studies, an edible substance XYZ is hypothesized to cause
    cancer in humans.  Researchers proceed to feed laboratory rats/mice
    with enormous quantities of XYZ, far more than a human would take,
    and then observe some fraction of the mice develop cancer.  Hopefully
    there is also a control group that doesn't develop (as much) cancer.
    
    About this time, red alarm bells go off in the FDA cancer control center
    :-) and the product is banned from USA store shelves.  Then we are
    treated to two camps of statisticians arguing on the morning news and
    evening MacNeil/Lehrer report; one camp decrying the methodology and
    another claiming it is proper.  
    
    Who's right?  Is it valid to deduce there is a hazard to a fraction of
    the general population because a fraction of a test group is
    affected by enormous quantities of XYZ?  
    
    Is the argument being advanced that if one has (samples) the tail of a
    normal curve, then one can infer the parameters of the distribution?
    
      John
1447.16. by ATSE::GOODWIN () Wed May 29 1991 13:09
    
    Didn't that happen with cyclamates a while back -- FDA banned 'em based
    on some tests, then later on decided the tests were not valid and
    lifted the ban?
    
    I don't remember the details of the tests.
    
    There might also be some causality vs correlation problems there too,
    depending on how the tests are conducted.  I've seen 'em inject little
    white mice at the National Institutes of Health in Bethesda, Md. for
    research purposes.  They were injecting the maximum that the mouse's
    body would hold, blowing it right up like a little balloon.
    
    If they did not inject a control group with anything, then maybe some
    increase in cancer could be due to being periodically inflated with
    fluid.
    
    Emotionally, I want to avoid anything that causes cancer in mice, no
    matter how the experiment was conducted.
    
    Logically, I can not unquestioningly accept conclusions of tests that
    are done under abnormal conditions for the purpose of speeding them up
    or magnifying their effects.
    
    Art Buchwald did a column once on the FDA's testing procedures.  He
    talked about taking a wool suit, dissolving it in boiling sulphuric
    acid, then injecting a quart or two into a mouse.  Since the mouse
    died, he concluded wool suits should be banned immediately.
    
1447.17. "Statistics don't lie, People do." by DECWET::BISHOP (F. Avery Bishop, back in the US for now) Wed May 29 1991 13:25
	If there are problems in statistics it is in the way people
	interpret them.  The question raised in the base note is a valid
	one.  It's too bad that the issue that precipitated the question
	is so emotionally charged that people couldn't address the 
	issue itself without getting involved in the social debate.

	And in answer to Eugene, I think there is nothing wrong with 
	talking about _interpretation_ of statistics in a math conference,
	e.g., pointing out the rules of inference, discussing error margins,
	control groups, etc.  That is not soapboxing.

	Avery
1447.18. "of mice and men" by CSSE::NEILSEN (Wally Neilsen-Steinhardt) Wed May 29 1991 13:30
.15>    Could I take this opportunity to ask about a similar, somewhat less
>    controversial topic:  the so-called "Canadian rat" studies?

This is usually a more controversial topic; with any luck we can confine the 
discussion here to the purely mathematical issues.

>    Who's right?  Is it valid to deduce there is a hazard to a fraction of
>    the general population because a fraction of a test group is
>    affected by enormous quantities of XYZ?  
    
>    Is the argument being advanced that if one has (samples) the tail of a
>    normal curve, then one can infer the parameters of the distribution?

No, the argument here is the linear dose-response theory: we can model a
response as a linear function of the dose.  This says that if 1 g of XYZ per
kilogram of mouse causes 1 cancer per mouse, then 1 microgram of XYZ per
kilogram of human will cause 1 cancer per million people.
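
In code form (a toy sketch of that extrapolation, using only the numbers from
the sentence above):

    # fit the slope of the assumed linear model at the high (mouse) dose
    high_dose     = 1.0      # g of XYZ per kg of body weight
    high_response = 1.0      # cancers per animal at that dose
    k = high_response / high_dose

    # extrapolate linearly down to the low (human) dose
    human_dose = 1e-6                  # 1 microgram per kg
    print(k * human_dose * 1e6)        # 1.0 expected cancer per million people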

There is also an assumption that mice and humans react similarly, but that has
been fairly well established on the average.  And there is a nit about body 
weight which is easily taken care of.

The linear dose-response theory has some evidence in its favor and a lot of 
evidence against it.  At the high dosage end, there is some evidence for a 
shock model: inject a mouse with enough of anything and you will produce cancer 
in the survivors.  At the low dosage end, there is a lot of evidence for a 
repair mechanism: XYZ damages chromosomes, but the standard repair mechanisms
can keep ahead of the damage.  Detailed discussion of this probably belongs
in BIOLOGY.

Most analyses of these problems conclude that the linear dose-response theory
builds in a margin of error, by overestimating the damage at low doses, but
this margin of error is appropriate in setting public health policy.  Detailed
discussion of this issue probably belongs in SOAPBOX.
1447.19. "on presentation" by CSSE::NEILSEN (Wally Neilsen-Steinhardt) Wed May 29 1991 13:47
.14>    A distinction should be made, however, between a formal statistical
>    analysis and a presentation.

True.  The question in .0 was whether the statistical inference in the 
presentation was correct, and the answer to that is no.

Another question is whether the thinking behind DUI campaigns is based on
better statistical thinking.  The articles mentioned in .14 suggest that it is,
but leave some room for doubt.

A third question is whether presentations should be statistically correct.  This
is a question that comes up frequently in my work.  On the one hand, I don't
want to bore the audience with a lot of numbers and greek letters.  On the
other hand, I don't want the audience jumping on a lot of sloppy statements the
way that .0 and several follow-up replies do.  This gives the audience an easy
way of avoiding the conclusion I want to bring them to.

My working rules, which I recommend to those running these campaigns:

	Be sure that the underlying statistics and inferences are solid.

	Keep the presentation simple, but correct.

	Be ready with details to meet any questions or objections.
1447.20. "The mice that roared." by CADSYS::COOPER (Topher Cooper) Wed May 29 1991 14:53
RE: .18 (Wally, re: .15)

    Substantially correct, as usual, Wally.  A few minor clarifications.

    It should be emphasized that the linear dose-response model applies
    only (or at least in general only) to cancer studies.  No one claims a
    linear dose-response for toxins in general.  There are related issues
    in the way that substances are tested for general toxicity but the
    methods and arguments are quite different.

    The linear dose-response model has been shown over and over again to be
    accurate for most carcinogens over a wide range of dosages.  In those
    relatively few cases where actual population exposure and predicted
    response has allowed epidemiological studies to check extrapolation
    into the very small dose range, it has remained accurate.

    As far as I know, there are no longer any legitimate argument about the
    "linearity" of the genetic repair mechanism.  The issue has been very
    thoroughly studied both theoretically and experimentally.  The cell has
    a pretty much constant probability of finding and correcting each
    mutation.  This results in a linear overall effect.  There is no
    "overload" of the mechanism (which would result in a non-linearity) at
    the still relatively small number of mutations *per cell* found in even
    the most extreme "rat-study".  No deviation from linearity has been
    found even in the much more extreme (per cell) Ames test of
    mutagenicity.

    The argument today is over non-linearities introduced at the
    extra-cellular level: both in the body's handling of carcinogens and
    in the body's response to cancerous cells.

    There is no doubt that there are special cases where the linear
    dose-response model is incorrect.  For example, there was a substance
    which was labeled a carcinogen in the usual high-dosage rat study,
    which turned out not to be one.  For this substance there really was a
    threshold, and it turned out to be not too much below the dosages used
    in the initial study.  It turned out that what *was* carcinogenic was
    a metabolite which was only produced as a response to gross overloading
    of the rat's normal means of metabolizing that substance.

    Such cases are, however, quite rare.  The argument is that they *might*
    be more common in the low-dose region that is so hard to study but
    precisely of interest.  Given that possibility, it is argued, the
    burden of regulation -- both on manufactures and on those who would
    benefit from the use of these substances -- should not be imposed
    without firm proof of hazard (which amounts to only after multi-decade
    epidemiological studies following introduction of the substance in
    question).

    The issue then is mostly political/moral.  Should the most likely model
    justify regulation for the public good unless industry can demonstrate
    that the model is incorrect or at least show why it is questionable in
    the specific case?  Or should regulators be required to show reasonable
    *direct* evidence for risk before restrictions are imposed?

    Some asides:

    This question is very topical.  The letter column of the current issue
    of Science is devoted to an argument closely related to this one.  It
    features a letter by the leading spokesman for the "less regulation"
    crowd, I-forget-his-first-name Ames: inventor of the Ames Assay
    mentioned previously.

    Although the cyclamate industry waged a PR campaign against the
    Canadian rat studies which resulted in the banning of cyclamates, on
    the basis of the irrelevance of the large doses involved, this was
    not the real issue.  The researchers' study was simply too small and
    used too *low* a dose.  Their positive results were apparent flukes.
    This was shown by larger studies using larger doses, which got negative
    results.  It was these later studies which resulted in the
    "reinstatement" of cyclamates.

    No scientist would take seriously a study which did not use controls
    which took into account such possible effects as edema.  Given the fuss
    made by industry whenever a substance is "banned", it is unlikely that
    a regulator would either.

				    Topher
1447.21. by JARETH::EDP (Always mount a scratch monkey.) Mon Jun 03 1991 15:35
    I've only just scanned the entries here, but I am surprised that nobody
    has pointed out one obvious flaw in the statistics:  There are often at
    least two drivers involved in an accident.  If about half the drivers
    on the road were under some influence of alcohol, then about 75% of
    two-car accidents would involve influenced drivers if such drivers were
    not significantly different in driving ability from other drivers.  So
    if only half the two-car accidents involve influenced drivers, that
    would seem to indicate influenced drivers are better.
    
    I think the path to resolving these issues is the good old scientific
    method:
    
    	1) Propose a theorem.
    	2) Describe what would be observed if the theorem were true.
    	3) Describe possible experiments that could be conducted to look
    	   for data that might contradict the descriptions found in step 2.
    	4) Conduct experiments.
    	5) Evaluate results.
    
    Deductive mathematics and logic and application of physical laws
    (theorems) is used in step 2.  Statistics is used in step 5.  These
    steps provide a guide to dealing with statistics successfully
    (correctly).
    
    
    				-- edp
1447.22. "Was sort of discussed." by CADSYS::COOPER (Topher Cooper) Mon Jun 03 1991 16:18
RE: .21 (edp)

    Actually, I alluded to this in note .14:

>    If, contrary to intuition, the figures support a 20% overall DWI rate
>    (which would lead us to expect a 20% to 36% rate of DWI involvement in
>    accidents, depending on the percentage of 1-car, 2-car, etc. accidents) ...

    20% here is what would be the expected involvement of a DWI in a 1-car
    accident if the overall DWI rate is 20% and DWI has no effect on the
    accident rate.  36% would be the expected involvement of at least one
    DWI in a 2-car accident under the same conditions.  Three car and
    more accidents do, of course, occur; but I figured that they were
    unusual enough that it was reasonable to assume that the average number
    of cars per accident was between 1 and 2.

    If the actual DWI rate is, however, about 1% as suggested, then by the
    same reasoning the expected involvement of a DWI in an accident is less
    than 2%.  Since the claimed observed rate figures we are talking about
    are substantially greater than that, there is little qualitative
    difference -- unless, of course, you feel that there is sufficient
    question as to whether the "average" number of cars per accident is
    a dozen or more.

				    Topher
1447.23. "pardon the prolixity" by EAGLE1::BEST (R D Best, sys arch, I/O) Wed Jun 12 1991 02:01
re .0

>    If the Department of Motor Vehicles were to say:
>    
>    	"Twenty percent of all driving fatalities involve a left-handed 
>    	 driver, so we should pass a law against left-handed drivers."

Why not right-handed drivers, since they would seem to cause 80% of
the driving fatalities ? (oops; almost forgot about those driving fatalities
involving no-handed or ambidextrous drivers :-)

>    wouldn't your reaction -- being statistics experts -- be:
>    
>    	"That statistic by itself is not sufficient to determine
>    	 whether or not left-handedness has any correlation to
>    	 likelihood of causing a driving fatality.  We need to know
>    	 the incidence of left-handedness in the general population."  ?

Right, and you will undoubtedly need to know a lot more than that.

Generally speaking, you can learn next to nothing (11%) from
point statistics except for the ulterior motives of the parties (visible
and hidden) providing the statistic.  It's very easy to generate large numbers
of reasonable questions about a point statistic that will cast doubt on its
meaningfulness.

The point statistic tells you nothing about the methodology for determining
the statistic.  Methodology is absolutely the most critical thing to know
since no statistic can be relied upon if the method for generating it is bad.
Luckily for the providers of most statistics, the methodology is not discussed
at all, or very insufficiently.

Admittedly, abusers of statistics have other reasons than fear of exposure
to want to avoid methodology descriptions.  Work is involved.  A good m.d.
should run somewhere from short story to novella length, and should include
the bulk statistics (if appropriate) from which point statistics were derived,
so that 3rd parties can attempt rederivation or other interpretations.

Even the wording of most (83.5%) of statistics is rife with ambiguity and
imprecision.

A perverse example: in the first statistic provided above, the claim is that
20% of fatalities involve a left handed driver.  Is the 'involved driver'
actually in the driver's seat in these accidents ? If we were to learn that
the i.d. were always in the back seat (i.e. non-driving passenger, we hope),
should we then conclude that back seats or the presence of driving-capable
passengers should be forbidden ?

Yet another example: What exactly is a driving fatality ? Does it include
any people killed in cars or buses regardless of who is driving ?  Or does
it include only the drivers themselves ? The question is relevant because
it may bear on the question of whether passengers are exposed to involuntary
risk.  If only the driver is at risk, then perhaps we need no law (unless
we are concerned about the waste of automobiles).

We will need to know more (i.e. at least the mortality rate of the passengers,
directly or inferred from other given information) in order to determine
whether there is an involuntary risk.  And as a policy question unrelated
to statistics we will have to discuss whether it's OK to let people take
risks voluntarily if they don't expose others, etc.

As you've probably noticed, statistics are frequently presented to justify a
policy.  Always figure out first what the likely agenda of the presenter is.
If, by luck or by crook, a methodological description is available, probe
deeply to see if the methodology was selected to make a favored result likely
(the typical case).  A generally safe assumption is that statistics produced
by parties with vested interests can be shredded without too much work.

Good scientific methodology requires that extraneous influences be carefully
sifted out, not conveniently ignored or deliberately introduced. 
Getting those influences out is very hard work (nearly impossible).  Many
times, the best an honest researcher can do is admit to the myriad
uncontrolled potential influences.

It gets even worse, because data can simply be made up or tweaked.

Scariest of all, as more relevant information is introduced and more
influences factored out, what seemed a fairly solid conclusion can be
reversed, and sometimes reversed again later, etc.  All of this despite
good methodologies and good data in the original sample.  You see a lot of
this in medicine, where the latest clinical studies drive therapeutic fads,
by layering complexity and provisos on top of previous study results, or
simply failing to reproduce earlier results.  Clinical studies are
generally not as good as control studies, but over time and with increasing
samples and careful cross correlation, solid results can emerge.

Enough for now.  This topic is one of great interest to me; I apologise
for turning it into a soapbox.