User:Synergy42/sandbox
The two-sample t-test is a special case of simple linear regression
The two-sample t-test is a special case of simple linear regression as illustrated by the following example.
A clinical trial examines six patients given drug or placebo. Three patients get 0 units of drug (the placebo group) and three patients get 1 unit of drug (the active treatment group). At the end of treatment, the researchers measure the change from baseline in the number of words that each patient can recall in a memory test.
Data and code are given for the analysis using the R programming language, with the t.test and lm functions for the t-test and linear regression. Here are the (fictitious) data generated in R.
> word.recall.data=data.frame(drug.dose=c(0,0,0,1,1,1), word.recall=c(1,2,3,5,6,7))
Patient | drug.dose | word.recall |
---|---|---|
1 | 0 | 1 |
2 | 0 | 2 |
3 | 0 | 3 |
4 | 1 | 5 |
5 | 1 | 6 |
6 | 1 | 7 |
Perform the t-test. Notice that the assumption of equal variance, var.equal=TRUE, is required to make the analysis exactly equivalent to simple linear regression; by default, R's t.test performs Welch's unequal-variance test instead.
> with(word.recall.data, t.test(word.recall~drug.dose, var.equal=TRUE))
Running the R code gives the following results.
- The mean word.recall in the 0 drug.dose group is 2.
- The mean word.recall in the 1 drug.dose group is 6.
- The difference between treatment groups in the mean word.recall is 6 – 2 = 4.
- The difference in word.recall between drug doses is significant (p=0.00805).
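The numbers reported by the t-test can be reproduced by hand from the pooled-variance formula. The following is a minimal sketch under the equal-variance assumption; the variable names (placebo, active, pooled.var, se.diff and so on) are illustrative and do not come from the original analysis.

```r
# Fictitious data from the example above
word.recall.data <- data.frame(drug.dose = c(0, 0, 0, 1, 1, 1),
                               word.recall = c(1, 2, 3, 5, 6, 7))

# Split the response by treatment group (names are illustrative)
placebo <- word.recall.data$word.recall[word.recall.data$drug.dose == 0]
active  <- word.recall.data$word.recall[word.recall.data$drug.dose == 1]

n0 <- length(placebo)
n1 <- length(active)

# Pooled variance estimate under the equal-variance assumption
pooled.var <- ((n0 - 1) * var(placebo) + (n1 - 1) * var(active)) / (n0 + n1 - 2)

# Standard error of the difference between the two group means
se.diff <- sqrt(pooled.var * (1 / n0 + 1 / n1))

mean.diff <- mean(active) - mean(placebo)   # 6 - 2 = 4
t.stat    <- mean.diff / se.diff            # about 4.899
df        <- n0 + n1 - 2                    # 4 degrees of freedom
p.value   <- 2 * pt(-abs(t.stat), df)       # two-sided p, about 0.00805
```

Note that t.test itself reports the difference as the first group minus the second (placebo minus active), so its t statistic is negative; the two-sided p-value is unaffected by the sign.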
Perform a linear regression of the same data. Calculations may be performed using the R function lm() for a linear model.
> word.recall.data.lm = lm(word.recall~drug.dose, data=word.recall.data)
> summary(word.recall.data.lm)
The linear regression provides a table of coefficients and p-values.
Coefficient | Estimate | Std. Error | t value | P-value |
---|---|---|---|---|
Intercept | 2 | 0.5774 | 3.464 | 0.02572 |
drug.dose | 4 | 0.8165 | 4.899 | 0.00805 |
The table of coefficients gives the following results.
- The estimate value of 2 for the intercept is the mean value of the word recall when the drug dose is 0.
- The estimate value of 4 for the drug dose indicates that for a 1-unit change in drug dose (from 0 to 1) there is a 4-unit change in mean word recall (from 2 to 6). This is the slope of the line joining the two group means.
- The p-value for the test that the slope of 4 differs from 0 is p = 0.00805.
The coefficients for the linear regression specify the slope and intercept of the line that joins the two group means, as illustrated in the graph. The intercept is 2 and the slope is 4.
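A plot of this kind can be drawn with base R graphics. The sketch below is one possible way to do it, not necessarily how the original graph was produced; the axis labels and the group.means variable are assumptions made for illustration.

```r
# Fictitious data and fitted model from the example above
word.recall.data <- data.frame(drug.dose = c(0, 0, 0, 1, 1, 1),
                               word.recall = c(1, 2, 3, 5, 6, 7))
word.recall.data.lm <- lm(word.recall ~ drug.dose, data = word.recall.data)

# Scatter plot of the six observations; suppress the default x axis
plot(word.recall ~ drug.dose, data = word.recall.data,
     xlab = "Drug dose", ylab = "Change in word recall", xaxt = "n")
axis(1, at = c(0, 1))

# Mark the two group means with filled points
group.means <- as.numeric(tapply(word.recall.data$word.recall,
                                 word.recall.data$drug.dose, mean))
points(c(0, 1), group.means, pch = 19)

# The fitted regression line passes through both group means
abline(word.recall.data.lm)
```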
Compare the result from the linear regression to the result from the t-test.
- From the t-test, the difference between the group means is 6-2=4.
- From the regression, the slope is also 4 indicating that a 1-unit change in drug dose (from 0 to 1) gives a 4-unit change in mean word recall (from 2 to 6).
- The t-test p-value for the difference in means, and the regression p-value for the slope, are both 0.00805. The methods give identical results.
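The agreement between the two methods can also be checked programmatically rather than by reading the printed output. This is a minimal sketch; the object names tt, fit and slope.row are illustrative.

```r
# Fictitious data from the example above
word.recall.data <- data.frame(drug.dose = c(0, 0, 0, 1, 1, 1),
                               word.recall = c(1, 2, 3, 5, 6, 7))

# Equal-variance two-sample t-test and simple linear regression
tt  <- t.test(word.recall ~ drug.dose, data = word.recall.data, var.equal = TRUE)
fit <- lm(word.recall ~ drug.dose, data = word.recall.data)

# Row of the coefficient table for the slope
slope.row <- summary(fit)$coefficients["drug.dose", ]

# t.test reports (group 0 mean) - (group 1 mean), so its t statistic has the
# opposite sign to the slope's t value; compare absolute values
same.t <- all.equal(abs(unname(tt$statistic)), abs(unname(slope.row["t value"])))
same.p <- all.equal(tt$p.value, unname(slope.row["Pr(>|t|)"]))

same.t
same.p
```

Both comparisons use all.equal, which allows for floating-point rounding rather than requiring bit-for-bit identity.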
This example shows that, for the special case of a simple linear regression where there is a single x-variable that has values 0 and 1, the t-test gives the same results as the linear regression. The relationship can also be shown algebraically.
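The algebraic argument can be sketched as follows. The notation is introduced here for illustration and is not taken from the original text: n_0 and n_1 are the group sizes, n = n_0 + n_1, \bar{y}_0 and \bar{y}_1 are the group means, s_0^2 and s_1^2 are the group sample variances, and s_p^2 is the pooled variance.

```latex
% Least-squares quantities when x_i takes only the values 0 and 1
\begin{align*}
\bar{x} &= \frac{n_1}{n}, \qquad
S_{xx} = \sum_i (x_i - \bar{x})^2 = \frac{n_0 n_1}{n}, \qquad
S_{xy} = \sum_i (x_i - \bar{x})\, y_i = \frac{n_0 n_1}{n}\,(\bar{y}_1 - \bar{y}_0) \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} = \bar{y}_1 - \bar{y}_0, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \bar{y}_0 \\
% The fitted values are the group means, so the residual variance is the pooled variance
\hat{\sigma}^2 &= \frac{(n_0 - 1)\, s_0^2 + (n_1 - 1)\, s_1^2}{n - 2} = s_p^2 \\
\operatorname{SE}(\hat{\beta}_1) &= \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}
 = s_p \sqrt{\frac{1}{n_0} + \frac{1}{n_1}} \\
t &= \frac{\hat{\beta}_1}{\operatorname{SE}(\hat{\beta}_1)}
 = \frac{\bar{y}_1 - \bar{y}_0}{s_p \sqrt{\dfrac{1}{n_0} + \dfrac{1}{n_1}}},
 \qquad \text{with } n - 2 \text{ degrees of freedom}
\end{align*}
```

The last line is the equal-variance two-sample t statistic, so the test of the slope and the t-test coincide. In the example, n_0 = n_1 = 3 and s_p^2 = 1, giving a standard error of \sqrt{2/3} \approx 0.8165 and t \approx 4.899, in agreement with the table of coefficients.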
Recognizing this relationship between the t-test and linear regression facilitates the use of multiple linear regression and multi-way analysis of variance. These alternatives to t-tests allow for the inclusion of additional explanatory variables that are associated with the response. Including such additional explanatory variables using regression or ANOVA reduces the otherwise unexplained variance, and commonly yields greater power to detect differences than do two-sample t-tests.