
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1621.0. "Multiple Linear Regression" by VIZUAL::FINNERTY (The bug stops here) Wed Jun 03 1992 13:34

    
    Does anyone have a good implementation of a multiple-linear regression
    algorithm?  (question also posted in the ALGORITHMS conference).
    
       /Jim
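
    For readers who just want something small to start from, here is a
    minimal sketch of multiple linear regression by ordinary least squares.
    It is not the FORTRAN routine or the Numerical Recipes code mentioned in
    the replies. The names solve and fit_linear, and the sample data, are
    made up for illustration. It solves the normal equations directly, which
    is fine for small, well-conditioned problems; a production routine would
    more likely use a QR or SVD decomposition.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows holds one tuple of predictor values per observation.
    Returns (coefficients with the intercept first, R squared).
    """
    design = [[1.0] + list(map(float, r)) for r in rows]
    k = len(design[0])
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve(ata, aty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Made-up data that follows y = 1 + 2*x1 - 3*x2 exactly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * a - 3 * b for a, b in rows]
coef, r2 = fit_linear(rows, y)
```

    With noise-free data like this the fit should recover the intercept 1
    and the slopes 2 and -3 up to rounding, with R squared very close to 1.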
    
T.R  Title  User  Personal Name  Date  Lines
1621.1. "note 1255" by VIZUAL::FINNERTY (The bug stops here) Wed Jun 03 1992 16:25, 3 lines
    
    note 1255 has a pointer to obtain a regression algorithm coded in
    FORTRAN...  the source is a bit intimidating, though. :)
1621.2. "Numerical Recipes" by FASDER::MTURNER (Mark Turner * DTN 425-3702 * MEL4) Thu Jun 18 1992 15:00, 4 lines
    The "Numerical Recipes" book also has routines in Chap. 14.
    
    
    							Mark
1621.3. "serial correlation" by SARAH::FINNERTY (The bug stops here) Fri Jun 19 1992 10:13, 26 lines
    
    on the topic of regression...
    
    I have some time series data with a single independent variable. 
    Unfortunately, both the dependent variable and the independent variable
    show significant serial correlation.
    
    When the data is corrected for serial correlation, the fit of the
    equation is _much_ worse.  What does this suggest?
    
    	-  That the data set size, N = 23, is too small to conclude
    	   much of anything
    
        -  That the serial correlation is _negative_, that is, errors
    	   in time period T are negatively correlated with errors in
    	   time period T+1  (seems improbable)
    
    	-  ? statistical anomaly ?
    
        -  ?
    
    and if apparent goodness of fit is the goal, should I ignore the
    fact that the data is serially correlated?
    
    /Jim
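
    One way to probe the "negative serial correlation" possibility raised
    above is to look at the residuals of the fitted equation directly. The
    sketch below computes the lag-1 autocorrelation and the Durbin-Watson
    statistic. The function names and both residual series are invented for
    illustration; they are not the data discussed in this note, and with
    N = 23 either statistic is only a rough guide.

```python
def lag1_autocorr(e):
    """Lag-1 autocorrelation of a residual series."""
    n = len(e)
    mean = sum(e) / n
    d = [v - mean for v in e]
    num = sum(d[t] * d[t - 1] for t in range(1, n))
    den = sum(v * v for v in d)
    return num / den


def durbin_watson(e):
    """Durbin-Watson statistic of a residual series.

    Near 2 suggests little lag-1 correlation, well below 2 suggests
    positive correlation, well above 2 suggests negative correlation.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


# Made-up residuals: one series drifts slowly, the other flips sign each step.
smooth = [1.0, 1.2, 1.1, 0.9, 0.4, -0.2, -0.8, -1.1, -1.0, -0.5]
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

for name, series in (("smooth", smooth), ("zigzag", zigzag)):
    print(name, round(lag1_autocorr(series), 2), round(durbin_watson(series), 2))
```

    The slowly drifting series gives a positive autocorrelation and a
    Durbin-Watson value below 2; the alternating series gives a negative
    autocorrelation and a value above 2.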
    
1621.4. "focus on the serial correlation" by MOCA::BELDIN_R (All's well that ends) Fri Jun 19 1992 13:36, 15 lines
    
    Assume a model like  y(t) = a + b x(t).
    
    Then serial correlation implies that y(t+1) is related to y(t) and
    x(t+1) is related to x(t), each with their own linear relationship.
    
    It could be that the best explanation of the data is with a
    two variable vector model <x y>, both dependent on t, the single
    independent variable.
    
    In other words, maybe the serial correlation is all there is?
    
    /rab
    
    
1621.5. "Independent variable?" by CADSYS::COOPER (Topher Cooper) Fri Jun 19 1992 15:55, 26 lines
RE: .3 (Jim)

    I'm not sure what it means to say that an independent variable shows a
    serial correlation.  You choose the values of the independent variable.
    It sounds like you are really regressing two *dependent* variables
    against each other, with hidden independent variable(s) of time
    and/or sequence.  In that case I agree with .4 -- the predictive
    ability of one of your dependent variables on your other is primarily
    (or fully) "explained" by their common dependence on time/sequence.

    What you should do about it depends on what you mean by
    "apparent goodness of fit is the goal."  If the apparent goodness of
    fit refers only to the data at hand -- if the regression is meant as a
    summary description of the existing data only -- then you can ignore
    it.  If you want the apparent goodness of fit to apply to data
    collected in the future, then whether to ignore the correlation or not
    depends on whether that future data can be collected in a way that will
    maintain the correlation as is.  You may be better off "predicting"
    both your variables from the temporal variable.

    By the way, you cannot apply standard linear regression to two sets of
    measurements, unless the measurement error is vanishingly small on
    your "X".  Another reason to regress two dependent variables against a
    reliable temporal variable.

					Topher
1621.6. "not forecastable from just 't'" by VIZUAL::FINNERTY (The bug stops here) Fri Jun 19 1992 17:37, 18 lines
    
    well, it's slightly more complex than suggested, since the independent
    variable has a time lag, i.e. the independent variable is measured at
    (t-13), whereas the dependent variable is measured at (t); furthermore,
    the relationship between t and either of the variables is by no means
    linear.
    
    consecutive values of X(t) or Y(t) are not very different from each
    other, giving rise to the serial correlation; however, X(t-13) seems to 
    be closely correlated with Y(t) {R² = .82, not accounting for serial
    correlation}.
    
    re: -.2  maybe serial correlation is all there is
    
    ...I'm still pondering over this...
    
    /Jim
    
1621.7. "complications" by MOCA::BELDIN_R (All's well that ends) Fri Jun 19 1992 17:55, 32 lines
    Ok, I'll describe a hypothetical situation and you decide if it helps
    you.
    
    Each week, I purchase raw materials to the tune of x(t) dollars.  My
    product has approximately 13 weeks of lead time (no wonder I'm losing
    customers :-) ) and the total output, y(t), certainly should be related
    to how much I buy.  So I hypothesize that y(t+13) = a + b x(t).
    
    Well, I have several alternative models:  ("e" represents an error
    variable in each case)
    
    	1) y(t+13) = a + b f(t) + e
     	   x(t)    = c + d f(t)	+ e	eg, time is the controlling factor
    
    	2) x(t-13) = a + b y(t) + e	because I used MRP to tell me how
    					much to buy based on planned output
    
    	3) x(t)    = a + b x(t-1) + e
    	   y(t)    = c + d y(t-1) + e	where (a,b) is "close" to (c,d)
    
    The statistical analysis for each of these is different.  Traditional
    linear regression assumes a very much simpler model.
    
    	4) y = a + b x + e		and the x's have no error. 
    
    Any one of these (and some other variations) can be thought of as
    linear regression, but the standard linear regression analysis is only
    appropriate if the model is very like 4).
    
    Does that help or confuse the issue more?
    
    /rab
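
    The hypothesized relation y(t+13) = a + b x(t) can be fitted with
    ordinary simple regression once the two series are aligned by the lag.
    A minimal sketch follows, assuming weekly series held in plain lists.
    The helper names lagged_pairs and fit_line and the purchase and output
    figures are made up, and the fit still treats the errors as independent,
    which the serial correlation discussed above may violate.

```python
def lagged_pairs(x, y, lag):
    """Pair x(t) with y(t + lag) for every t where both values exist."""
    n = min(len(x), len(y) - lag)
    return [(x[t], y[t + lag]) for t in range(n)]


def fit_line(pairs):
    """Simple least-squares fit of v = a + b*u over (u, v) pairs."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    sxx = sum((u - mu) ** 2 for u, _ in pairs)
    sxy = sum((u - mu) * (v - mv) for u, v in pairs)
    b = sxy / sxx
    return mv - b * mu, b


# Made-up weekly data: output 13 weeks later is exactly 5 + 2 * purchases.
lag = 13
x = [10 + (t * 7) % 11 for t in range(36)]
y = [0.0] * lag + [5 + 2 * x[t] for t in range(36 - lag)]

pairs = lagged_pairs(x, y, lag)
a, b = fit_line(pairs)
```

    Note that 36 weeks of raw data leave only 23 usable pairs after the
    13-week shift, so the lag itself eats a good part of a short series.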
1621.8. "the problem..." by VIZUAL::FINNERTY (The bug stops here) Fri Jun 19 1992 18:59, 30 lines
	the problem being considered is prediction of the % movement in
	a stock market index; the independent variable is a measure of
	sentiment in the market:  (as Topher suggested in -.3, the goal
	is accurate prediction as opposed to accurate fitting of 
	historical data)

    	1) y(t+13) = a + b f(t) + e
     	   x(t)    = c + d f(t)	+ e	eg, time is the controlling factor

	   Very improbable that y(t) can be predicted from t alone.

    	2) x(t-13) = a + b y(t) + e	because I used MRP to tell me how
    					much to buy based on planned output

	   In this case we're predicting the past :)  {we already know
	   x(t-13), so this doesn't help us much}
    
    	3) x(t)    = a + b x(t-1) + e
    	   y(t)    = c + d y(t-1) + e	where (a,b) is "close" to (c,d)
    
	   This might be useful for projecting x out to future periods,
	   and therefore allow y to be predicted a bit farther into the
	   future...  (but I'd be happy to predict y at all, at this point)

    	4) y = a + b x + e		and the x's have no error. 
    
	   In this case, in fact, there is virtually no error in the
	   measurement of x, so this model still seems reasonable.

1621.9. "DDDDDD?" by CADSYS::COOPER (Topher Cooper) Fri Jun 19 1992 21:00, 17 lines
    Is y the stock market index (in which case the serial correlation is
    characteristic of the delta index) or the delta (% movement) of the
    index (in which case the serial correlation is the delta of the delta
    of the index)?  In any case it sounds like you should be looking at
    the correlation of delta-x to either y or delta-y.  Unless you have a
    strong reason for believing that the relationship is linear, I would
    think seriously about throwing in a x^2 or delta-x^2 term as well.
    Better yet get a solid amount of data and eye-ball it to see what comes
    out.  With only a first order term, a clean U or inverted-U (i.e.,
    y increases(decreases) with increasing x for a while then decreases
    (increases)) comes out flat and unpredicted.  You might also try the
    loess procedure (sometimes called non-parametric regression).

    Rule of thumb in model fitting -- if you don't have a lot of theory you
    need a lot of data or a lot of luck.

				Topher
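
    The point about a clean U coming out flat with only a first-order term
    can be checked on a toy example. The sketch below uses made-up data on a
    symmetric range and compares the R squared of a straight line in x with
    that of a straight line in x^2. It is a simplified comparison, not the
    joint fit with both terms, and r_squared is a hypothetical helper name.

```python
def r_squared(u, v):
    """R squared of the simple least-squares line v = a + b*u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    return suv * suv / (suu * svv)


x = [i - 5 for i in range(11)]            # -5 .. 5, symmetric about zero
y = [3 + 0.5 * xi * xi for xi in x]       # a clean U, made up for illustration

r2_first_order = r_squared(x, y)                        # x alone
r2_squared_term = r_squared([xi * xi for xi in x], y)   # x^2 alone
```

    Here the first-order fit explains essentially none of the variation,
    while the squared term explains all of it, even though y is completely
    determined by x in both cases.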
1621.10. "assessment of likelihood of success" by SGOUTL::BELDIN_R (All's well that ends) Mon Jun 22 1992 09:45, 27 lines
    re .8
    
    Topher has just triggered in my head one of the standard techniques
    for getting started.  Make tables of successive differences.  If you
    can find relations among any pair of difference sets, you've got a
    start on the kind of model.
    
    On the other hand, just from the cynic's point of view, consider this.  
    
    If it were possible to predict the DJ or any of its components, I would
    expect that somebody would be doing it already for a (big) profit. 
    There have always been many professional and amateur speculators
    interested in that topic and willing to part with enough money to make
    solid scientific development of any good idea economically feasible. 
    So, I believe your prospects for (economic) success are slight.  On the
    other hand, you are bound to (re-)learn some interesting facts.
    
    From a scientific point of view, the stock market summarizes millions 
    of transactions every day.  The number of transactions makes it very
    difficult to believe that detailed movements can be predicted. 
    Certainly there are many small effects due to a general trend which can
    be predicted, but day to day changes are like the wind direction and
    velocity in a storm. 
    
    fwiw,
    
    /rab
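
    The tables of successive differences suggested above are easy to build.
    A minimal sketch, assuming the series is a plain list of index levels;
    the function names and the numbers are made up for illustration.

```python
def differences(series, order=1):
    """Apply first differencing `order` times: d[t] = s[t+1] - s[t]."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def percent_changes(series):
    """Percent movement from each value to the next."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


index = [100, 102, 105, 109, 114, 120]   # made-up index levels

d1 = differences(index)        # [2, 3, 4, 5, 6]
d2 = differences(index, 2)     # [1, 1, 1, 1]
pct = percent_changes(index)   # first entry is 2.0
```

    Constant second differences, as in this toy series, point to a
    quadratic trend; real data will be far noisier, but comparing difference
    tables of x and y is the same mechanical step.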
1621.11. by AUSSIE::GARSON Tue Jun 23 1992 00:14, 13 lines
re .10
    
>    If it were possible to predict the DJ or any of its components, I would
>    expect that somebody would be doing it already for a (big) profit. 
    
    Bear in mind also that your playing in the market affects that market -
    and the greedier you get the more the effect.
    
    Make sure that no one else has the benefit of your newfound predictive
    techniques.
    
    Make sure that your techniques are not *too* good else you might find
    yourself on the wrong end of an insider trading charge. (-:
1621.12. "Does the problem have a solution?" by UNTADH::TOWERS Tue Jun 23 1992 05:00, 17 lines
    Didn't the (now deceased) economist, Hayek, the father of monetarism,
    have something to say about this? Something which people in general 
    and economists in particular have been wilfully ignoring for about 40 
    years? 

    What he said was that any successful model must incorporate at least
    as much richness and complexity as that which it is trying to model.
    Since the flow of money is one aspect of human behaviour, a successful
    economic model must have the same level of complexity as the human mind.

    Hayek's conclusion was that economics as an exact science was a logical
    impossibility for humans. All that is possible is a rough approximation.
    Certainly, it seems unlikely that a linear, von Neumann (ie. current
    computing) model would be sufficient to generate predictions that would
    yield significant profits on the stock markets.

    Brian
1621.13. by VMSDEV::HALLYB (Fish have no concept of fire.) Tue Jun 23 1992 09:52, 10 lines
>    Certainly, it seems unlikely that a linear, von Neumann (ie. current
>    computing) model would be sufficient to generate predictions that would
>    yield significant profits on the stock markets.
    
    Some floor traders such as Barry Haigh have made money year after year 
    doing the same thing over and over for their own account.  They have a
    model for how the market works on a very short-term basis and they
    profit from it.
    
      John
1621.14. "building a theory" by VIZUAL::FINNERTY (The bug stops here) Tue Jun 23 1992 12:27, 23 lines
    
    re: .9
    
       "Y" is in fact delta-y in percent.
    
    re: if it was profitable, people would already be doing it.
    
       in fact, people _are_ doing this every day, whether or not it is
       more profitable than guessing.
    
    re: curve-fit model vs theoretically derived model
    
       I've often heard this criticism, and I must admit it does confuse me
       a little.  Putting together a model takes time and effort... surely
       you wouldn't want to measure any random thing such as astrological
       conditions or the time of cherry trees blooming in Washington.  So
       you construct a theory, gather some data, learn what the past has
       to tell you by doing some modelling and curve fitting, and then go
       back and reconsider your theory.  
    
       /Jim
    
       
1621.15. "how the stock market might also be modeled ?!" by STAR::ABBASI (i^(-i) = SQRT(exp(PI))) Tue Jun 23 1992 13:06, 2 lines
    Maybe one can model the stock market as a closed-loop feedback control
    system with disturbances thrown in, and noise modeled as stochastic processes.