Talk:Regression dilution
This article is rated C-class on Wikipedia's content assessment scale.
Example?
The subject of this article can be confusing when described only in general terms. It could greatly benefit from an example. Chris3145 (talk) 21:55, 11 April 2010 (UTC)
On the following from the article...
- Why does variability in the x variable cause bias, but not variability in the y variable? Because linear regression methods (the fitting of a straight line relating y to x) are based on statistical models that explicitly include the variability in the y variable:
Does anyone have a proof of this, or at least a reference on this specific point? This doesn't seem to make sense in general terms. The error term here can be interpreted to model the fact that there is an imperfect relationship between two sets of measures, even if the measures contain no error (or at least trivial errors, for example height and weight). I don't see that it can be interpreted to model measurement error only in y, but not x, as seems to be suggested. For regression of y on x, the slope is given by b = r·(s_y/s_x). The correlation r is attenuated by measurement error in both variables, x and y (see disattenuation). I'd have thought this implies that the regression slope is affected by measurement errors in both variables. Take the case where there is a perfect relationship between the variables and the variables are expressed in standardized form. Here, the regression slope should be 1, but it will be √(r_xx·r_yy), where r_xx and r_yy are the reliabilities of x and y. Now, it doesn't matter which variable is regressed onto which, there will be precisely the same effect due to measurement error and its corresponding effect on r. So in the special case of a perfect relationship obscured by measurement error in both variables, the claim that only noise in y but not x is modeled doesn't seem to make sense. Stephenhumphry 11:00, 13 August 2005 (UTC)
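The point can be checked numerically. Below is a minimal simulation sketch (my own illustration, not taken from the article; the error standard deviations 0.5 and 1.0 are arbitrary): with a true slope of 1, noise added to y leaves the fitted y-on-x slope essentially unbiased, noise added to x attenuates that slope by the reliability of x, and the correlation is attenuated by the reliabilities of both variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True relationship: y_true = 1.0 * x_true (perfect relationship, slope 1)
x_true = rng.normal(0.0, 1.0, n)
y_true = x_true.copy()

# Arbitrary (illustrative) measurement-error standard deviations
sigma_ex, sigma_ey = 0.5, 1.0
x_obs = x_true + rng.normal(0.0, sigma_ex, n)
y_obs = y_true + rng.normal(0.0, sigma_ey, n)

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Error in y only: slope of y on x is still ~1 (noisier, but not biased)
print(ols_slope(x_true, y_obs))

# Error in x only: slope attenuated by the reliability of x,
# var(x_true) / var(x_obs) = 1 / 1.25 = 0.8
print(ols_slope(x_obs, y_true))

# Error in both: slope still ~0.8 -- only the error in x biases the slope
print(ols_slope(x_obs, y_obs))

# The correlation, by contrast, is attenuated by error in *both* variables:
# sqrt(r_xx * r_yy) = sqrt(0.8 * 0.5) ~= 0.63
print(np.corrcoef(x_obs, y_obs)[0, 1])
```

This illustrates where the asymmetry comes from: the y-on-x slope divides the covariance by var(x) alone, so only error in x enters the bias, whereas the correlation divides by both standard deviations and so is attenuated by error in either variable.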
- Yes, that equation can be re-written as a normal model for x given y. However, the usual linear regression of y on x uses a likelihood made up of conditional probabilities for y given x, and so reaches a different estimated gradient (r·s_y/s_x) from the regression of x on y, which conditions x on y to reach an estimated gradient of r·s_x/s_y.
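To make the asymmetry concrete, here is a small numerical check (again just an illustrative sketch, not from either editor): the fitted gradient of y on x is r·s_y/s_x, the fitted gradient of x on y is r·s_x/s_y, and their product is r², so the two fits are not simply inverses of one another unless r = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + rng.normal(0.0, 1.0, n)   # arbitrary illustrative relationship

r = np.corrcoef(x, y)[0, 1]
sx, sy = x.std(), y.std()
cov_xy = np.cov(x, y, bias=True)[0, 1]

slope_y_on_x = cov_xy / x.var()          # equals r * sy / sx
slope_x_on_y = cov_xy / y.var()          # equals r * sx / sy

print(slope_y_on_x, r * sy / sx)             # same number
print(slope_x_on_y, r * sx / sy)             # same number
print(slope_y_on_x * slope_x_on_y, r ** 2)   # product is r^2, not 1
```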
- In putting the debated sentence on the page, I was trying to allude to the asymmetry of linear regression without getting too technical. I concede I didn't do a good job but haven't (yet) got a better idea. --Pstevens 16:02, 25 May 2006 (UTC)
- It is a tricky one and it seems fair enough to allude to the asymmetry of regression. I did a little digging and couldn't find anything that elaborates on this further. I think this is actually quite a difficult issue, and when I have time I want to do more investigation. If I come up with anything, I'll add to it, but I understand the difficulty. Cheers Holon 03:31, 31 May 2006 (UTC)
- For now, I've changed the text to, as you say, "allude" to the asymmetry of regression - without actually giving an explanation. This is a placeholder until we can come up with better text.--Pstevens 11:07, 10 November 2006 (UTC)