Part One: Brief Response

The main point of Chapter 13 was how to talk and think about factorial ANOVAs: the terminology, notation, calculations, and underlying logic. Of the other major points, I thought the most important was how to deal with unequal sample sizes: the problems they create and the ways to resolve them.

I found section 13.8 (new edition) the most confusing. I get most of the individual concepts, like sampling fractions and fixed variables, though I’m a bit confused about how random sampling can generate levels. What I couldn’t do was put it all together: I don’t understand expected mean squares.

Part Two: Problem Set

1a. Reported level of distraction had a significant effect on number of errors made (F(1, 131) = 65.696, p < .001). However, after controlling for level of distraction, there was still a significant effect of task type on number of errors (F(2, 131) = 139.638, p < .001). That is, although level of distraction had a significant effect on the number of errors made, the type of task significantly predicted number of errors above and beyond level of distraction.
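As a rough check on the F-values in 1a, each one should be the effect's mean square divided by the error mean square. This is only a sketch using the sums of squares from the Tests of Between-Subjects Effects table (Figure 2); the function and variable names are mine, not SPSS's.

```python
# Sums of squares and df copied from the Tests of Between-Subjects Effects table (Figure 2).
ss_error, df_error = 11102.535, 131
effects = {
    "distract": (5567.865, 1),
    "tasktype": (23669.320, 2),
}

# Mean square error is the denominator of every F in the table.
ms_error = ss_error / df_error


def f_ratio(ss_effect, df_effect):
    """F = (SS_effect / df_effect) / MS_error."""
    return (ss_effect / df_effect) / ms_error


for name, (ss, df) in effects.items():
    print(f"{name}: F({df}, {df_error}) = {f_ratio(ss, df):.3f}")
```

The printed values should agree with the table to within rounding.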

1b. The full model from last week, not including distractibility, had F(2, 132) = 113.474, p < .001 and R2 = .632. When distractibility was accounted for, both the F-value and the R2 increased. This is because the extra factor, distractibility, accounts for some of the variation that had previously been considered error, decreasing the SS of error from 16670.400 to 11102.535. Decreasing the error increases the power of the test.
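The R2 change in 1b can be reproduced from the error sums of squares quoted above, since R2 = 1 - SS_error / SS_total. A minimal sketch, assuming the corrected total SS (45331.926, from Figure 2) is the same for both models because the outcome data are the same:

```python
# Corrected total SS from Figure 2; the two error SS values are the ones quoted in 1b.
ss_total = 45331.926
ss_error_anova = 16670.400   # last week's model, task type only
ss_error_ancova = 11102.535  # this week's model, task type plus distractibility


def r_squared(ss_error, ss_total):
    """Proportion of variance explained: 1 - SS_error / SS_total."""
    return 1 - ss_error / ss_total


print(f"ANOVA  R2 = {r_squared(ss_error_anova, ss_total):.3f}")
print(f"ANCOVA R2 = {r_squared(ss_error_ancova, ss_total):.3f}")
```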

1c. Pattern recognition had a raw mean of 9.644 and a slightly larger adjusted mean of 9.709. The cognitive task had a raw mean of 38.778 and a smaller adjusted mean of 37.236. The driving simulation had a raw mean of 6.356 and a larger adjusted mean of 7.833. The means have changed because we have taken out the effect contributed by distractibility. The raw means are just the mean number of errors per task. The adjusted means are calculated with the effect of the covariate removed, holding it at its mean, 112.54. Figure 4 shows a plot of these means.
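One way to see what "holding the covariate at its mean" does is to plug the covariate mean into the regression form of the same model. The sketch below uses the model 2 coefficients from the Coefficients table (Figure 12) and the contrast codes from Figure 7. Because those coefficients are rounded to three decimals, the results only match the reported adjusted means to within about 0.02.

```python
# Model 2 coefficients from Figure 12 (rounded as printed by SPSS).
intercept = -16.499
b_distract = 0.309
b_cc1 = 9.488    # CT v PR DS contrast
b_cc2 = -0.938   # PR v DS contrast
covariate_mean = 112.5407

# Contrast codes (CC1, CC2) for each task, from Figure 7.
codes = {
    "Pattern Recognition": (-1, -1),
    "Cognitive Task": (2, 0),
    "Driving Simulation": (-1, 1),
}


def adjusted_mean(cc1, cc2):
    """Predicted errors for a group with distractibility fixed at its grand mean."""
    return intercept + b_distract * covariate_mean + b_cc1 * cc1 + b_cc2 * cc2


for task, (cc1, cc2) in codes.items():
    print(f"{task}: {adjusted_mean(cc1, cc2):.3f}")
```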

Descriptive Statistics
Dependent Variable: Number of Errors
Task Type Mean Std. Deviation N
Pattern Recognition 9.6444 4.51339 45
Cognitive Task 38.7778 18.05533 45
Driving Simulation 6.3556 5.70150 45
Total 18.2593 18.39288 135

Figure 1. Means are raw means.

Tests of Between-Subjects Effects
Dependent Variable: Number of Errors
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 34229.391a 3 11409.797 134.625 .000
Intercept 1213.155 1 1213.155 14.314 .000
distract 5567.865 1 5567.865 65.696 .000
tasktype 23669.320 2 11834.660 139.638 .000
Error 11102.535 131 84.752
Total 90341.000 135
Corrected Total 45331.926 134
a. R Squared = .755 (Adjusted R Squared = .749)

Figure 2. Showing a significant p-value for distractibility and for the omnibus test for type of task.

Task Type
Dependent Variable: Number of Errors
Task Type            Mean     Std. Error  95% CI Lower Bound  95% CI Upper Bound
Pattern Recognition  9.709a   1.372       6.994               12.423
Cognitive Task       37.236a  1.385       34.495              39.977
Driving Simulation   7.833a   1.384       5.095               10.572
a. Covariates appearing in the model are evaluated at the following values: Distractability = 112.5407.

Figure 3. These are the adjusted means.

Figure 4. Estimated marginal means for number of errors by task type.

2a. Distractibility significantly predicted number of errors (β = .483, t(133) = 6.355, p < .001), accounting for just over 23% of the variance in number of errors, R2 = .233. That is, the more distractible participants rated themselves, the more errors they tended to make, and about 23% of the variance in errors made can be accounted for by this relationship.

The β reported above (.483) represents the change in errors made, in terms of standard deviations, associated with a one standard deviation change in distractibility. The unstandardized coefficient for distractibility, b = .418, is the change in number of errors associated with a one raw unit change in distractibility. Both of these are statistically significantly different from zero, as indicated by p < .001 (see Figure 6), which means that the variation accounted for by distractibility is significant.
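The link between the two coefficients can be checked by hand: with a single predictor, the standardized coefficient is b scaled by the ratio of the standard deviations, and its square is R2. This sketch uses b from Figure 6 and the standard deviations from the Descriptive Statistics table (Figure 8); the variable names are mine.

```python
# Unstandardized slope from Figure 6 and standard deviations from Figure 8.
b = 0.418
sd_distract = 21.25248  # predictor SD
sd_errors = 18.39288    # outcome SD

# Standardized coefficient: change in outcome SDs per one-SD change in the predictor.
beta = b * sd_distract / sd_errors

# With one predictor, beta equals the correlation, so beta squared is R2.
r_squared = beta ** 2

print(f"beta = {beta:.3f}, R2 = {r_squared:.3f}")
```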

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .483a .233 .227 16.16918
a. Predictors: (Constant), Distractability

Figure 5.

Coefficientsa
Model              Unstandardized B  Std. Error  Standardized Beta  t       Sig.
1 (Constant)       -28.750           7.526                          -3.820  .000
  Distractability  .418              .066        .483               6.355   .000
a. Dependent Variable: Number of Errors

Figure 6.

3. An augmented model including Distractibility (b = .309, t(131) = 8.105, p < .001) and the orthogonal contrasts shown in Figure 7, namely the Cognitive Task vs. the average of the Pattern Recognition and Driving Simulation contrast (b = 9.488, t(131) = 16.696, p < .001) and the Pattern Recognition vs. Driving Simulation contrast (b = -.938, t(131) = -.962, p = .338), provided a reasonably good fit (R2 = .755), significantly predicting variance in errors made (F(3, 131) = 134.625, p < .001). In other words, this model predicted over 75% of the variation in number of errors, which was statistically significant. Note that on the correlation table, Figure 9, the contrasts are correlated at 0, showing that they are indeed orthogonal.
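The orthogonality of the two contrasts can also be checked directly from the codes in Figure 7, without the correlation table. This is a small sketch that assumes equal group sizes (45 per task here), in which case two contrasts are orthogonal when each sums to zero and their cross-products sum to zero.

```python
# Contrast codes from Figure 7, keyed by task abbreviation.
cc1 = {"PR": -1, "CT": 2, "DS": -1}  # Cognitive Task vs. average of the other two
cc2 = {"PR": -1, "CT": 0, "DS": 1}   # Pattern Recognition vs. Driving Simulation


def sums_to_zero(contrast):
    """A valid contrast has codes that sum to zero."""
    return sum(contrast.values()) == 0


def are_orthogonal(a, b):
    """With equal n per group, contrasts are orthogonal if cross-products sum to zero."""
    return sum(a[group] * b[group] for group in a) == 0


print(sums_to_zero(cc1), sums_to_zero(cc2), are_orthogonal(cc1, cc2))
```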

3a. The unstandardized coefficient for Distractibility (.309) means that for every increase of one unit in Distractibility, participants tended to make .309 more errors. The unstandardized coefficient for the Pattern Recognition vs. Driving Simulation contrast (-.938) equals half the number of errors that can be attributed to the shift between the pattern recognition task and the driving simulation task. That is, it is half of the difference between the adjusted mean number of errors in the pattern recognition task and the driving simulation task. The unstandardized coefficient for the Cognitive Task vs. the average of the Pattern Recognition and Driving Simulation contrast (9.488) is one third the number of errors that can be attributed to the shift between the cognitive task and the average of the pattern recognition and driving simulation tasks. The value under B for the constant in the Coefficients table, Figure 12, is not interpretable in the same way as that coefficient from homework 5; it can only be used to calculate the adjusted means.
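The "half" and "one third" interpretations can be checked against the adjusted means in Figure 3. For orthogonal contrasts with equal group sizes, each coefficient works out to the sum of code times mean, divided by the sum of squared codes. The sketch below assumes that setup; the helper name is hypothetical.

```python
# Adjusted means from Figure 3 and contrast codes from Figure 7.
adjusted_means = {"PR": 9.709, "CT": 37.236, "DS": 7.833}
cc1 = {"PR": -1, "CT": 2, "DS": -1}
cc2 = {"PR": -1, "CT": 0, "DS": 1}


def contrast_coefficient(codes, means):
    """b = sum(code * mean) / sum(code ** 2), for orthogonal contrasts with equal n."""
    numerator = sum(codes[g] * means[g] for g in codes)
    denominator = sum(code ** 2 for code in codes.values())
    return numerator / denominator


# CC1: (CT - average of PR and DS) / 3;  CC2: (DS - PR) / 2.
b_cc1 = contrast_coefficient(cc1, adjusted_means)
b_cc2 = contrast_coefficient(cc2, adjusted_means)
print(f"CT v PR DS: {b_cc1:.3f}   PR v DS: {b_cc2:.3f}")
```

Both values should match the B column for model 2 in Figure 12, including the negative sign on the PR v DS coefficient.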

3b. The number of errors in the Cognitive Task (adjusted mean = 37.24) was significantly higher than the average number of errors in the Pattern Recognition and Driving Simulation tasks, even after accounting for the effects of distractibility (b = 9.488, t(131) = 16.696, p < .001). Number of errors in the Pattern Recognition task (adjusted mean = 9.71) was not significantly different from number of errors in the Driving Simulation task (adjusted mean = 7.83) after controlling for the effect of distractibility (b = -.938, t(131) = -.962, p = .338). That is, after controlling for distractibility, the number of errors in the Cognitive Task is significantly higher than in the Pattern Recognition and Driving Simulation tasks, and the Pattern Recognition and Driving Simulation tasks are not significantly different from each other.

The p-value for pattern vs. driving, p = .338, means that the coefficient for that contrast is not significantly different from zero. The p-value for cognitive vs. pattern and driving, p < .001, means that the coefficient for that contrast, 9.488, is significantly different from zero. In each case, the test of the coefficient tells us whether the comparison coded into that contrast is significant.

3c. These results are similar to what the ANCOVA gave us, except that they give us more insight into what’s going on between the tasks. The adjusted means for the tasks are the same, R2 is the same, and the F values for the whole models are the same (134.625). The current t-value for distractibility, squared, equals the F from the ANCOVA (65.696) up to rounding: 8.105^2 ≈ 65.69. The difference is that the ANCOVA is an omnibus test when it comes to the differences between the task types, so we couldn’t tell exactly where the differences were until we did the regression. The degrees of freedom are also different: the omnibus test of task type has 2 numerator degrees of freedom, while each contrast is tested on 1.
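The t-to-F relationship for a single-df effect is easy to verify: squaring the regression t for distractibility should give the ANCOVA F for distractibility. A quick sketch using the printed values from Figures 12 and 2 (the small gap comes from SPSS rounding t to three decimals):

```python
# t for Distractability in model 2 (Figure 12) and F for distract in the ANCOVA (Figure 2).
t_distract = 8.105
f_distract = 65.696

# For an effect with 1 numerator df, F equals t squared.
t_squared = t_distract ** 2
print(f"t^2 = {t_squared:.3f}, F = {f_distract:.3f}")
```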

       PR  CT  DS  Check
CC1    -1   2  -1  0
CC2    -1   0   1  0
Check  -1   0   1  0

Figure 7.

Descriptive Statistics
                  Mean      Std. Deviation  N
Number of Errors  18.2593   18.39288        135
Distractability   112.5407  21.25248        135
CT v PR DS        .0000     1.41948         135
PR v DS           .0000     .81954          135

Figure 8.

Correlations
# Errors Distractability CT v PR DS PR v DS
Pearson Correlation Number of Errors 1.000 .483 .792 -.073
Distractability .483 1.000 .167 -.088
CT v PR DS .792 .167 1.000 .000
PR v DS -.073 -.088 .000 1.000
Sig. (1-tailed) Number of Errors . .000 .000 .199
Distractability .000 . .027 .154
CT v PR DS .000 .027 . .500
PR v DS .199 .154 .500 .
N Number of Errors 135 135 135 135
Distractability 135 135 135 135
CT v PR DS 135 135 135 135
PR v DS 135 135 135 135

Figure 9.

Model Summaryc
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .483a  .233      .227               16.16918                    .233             40.392    1    133  .000
2      .869b  .755      .749               9.20609                     .522             139.638   2    131  .000
a. Predictors: (Constant), Distractability
b. Predictors: (Constant), Distractability, PR v DS, CT v PR DS
c. Dependent Variable: Number of Errors

Figure 10.

ANOVAc
Model Sum of Squares df Mean Square F Sig.
1 Regression 10560.071 1 10560.071 40.392 .000a
Residual 34771.855 133 261.443
Total 45331.926 134
2 Regression 34229.391 3 11409.797 134.625 .000b
Residual 11102.535 131 84.752
Total 45331.926 134
a. Predictors: (Constant), Distractability
b. Predictors: (Constant), Distractability, PR v DS, CT v PR DS
c. Dependent Variable: Number of Errors

Figure 11.

Coefficientsa
Model              Unstandardized B  Std. Error  Standardized Beta  t       Sig.  Tolerance  VIF
1 (Constant)       -28.750           7.526                          -3.820  .000
  Distractability  .418              .066        .483               6.355   .000  1.000      1.000
2 (Constant)       -16.499           4.361                          -3.783  .000
  Distractability  .309              .038        .357               8.105   .000  .964       1.037
  CT v PR DS       9.488             .568        .732               16.696  .000  .972       1.029
  PR v DS          -.938             .974        -.042              -.962   .338  .992       1.008
a. Dependent Variable: Number of Errors

Figure 12.

4a. Testing the reduced model (just distractibility, from question 2, or see model 1 in Figure 12) against the full model (distractibility plus the Cognitive Task vs. the average of the Pattern Recognition and the Driving Simulation contrast, and the Pattern Recognition vs. Driving Simulation contrast (model 2 in Figure 12)), we can conclude that the two contrasts explain a significant amount of variance that is not accounted for by distractibility, ΔR2 = .522, ΔF(2, 131) = 139.638, p < .001. That is, adding the two contrasts to our model explains a significantly larger amount of the variation in number of errors made, above and beyond that accounted for by distractibility.
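The F-change test in 4a can be reconstructed from the two models' R2 values: ΔF = (ΔR2 / df1) / ((1 - R2_full) / df2). In this sketch I compute each R2 from the regression and total sums of squares in Figure 11 rather than from the rounded R2 values, so the result lines up with SPSS more closely. The function name is my own.

```python
# Regression SS for each model and the total SS, from the ANOVA table (Figure 11).
ss_total = 45331.926
ss_reg_reduced = 10560.071  # model 1: distractibility only
ss_reg_full = 34229.391     # model 2: distractibility plus the two contrasts

r2_reduced = ss_reg_reduced / ss_total
r2_full = ss_reg_full / ss_total
delta_r2 = r2_full - r2_reduced


def f_change(delta_r2, r2_full, df1, df2):
    """F for the increase in R2 when df1 predictors are added; df2 is the full model's error df."""
    return (delta_r2 / df1) / ((1 - r2_full) / df2)


delta_f = f_change(delta_r2, r2_full, df1=2, df2=131)
print(f"Delta R2 = {delta_r2:.3f}, Delta F(2, 131) = {delta_f:.3f}")
```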

5a. The Homogeneity of Regression assumption is not met in this data set. The model summary table in Figure 14 shows a significant change in F when the covariate interactions are added to the model (ΔR2 = .061, ΔF(2, 129) = 21.442, p < .001). That is, the slope relating distractibility to number of errors differs significantly across the task groups, so distractibility cannot be legitimately covaried out with a single common slope; it would be unwise to use ANCOVA here.
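The homogeneity-of-regression test is the same kind of nested-model comparison, this time between the ANCOVA model and the model with the two covariate-by-contrast interactions added. A sketch using the residual sums of squares from the ANOVA table in Figure 15 (names are mine):

```python
# Residual SS and df from Figure 15.
ss_resid_without, df_without = 11102.535, 131  # model 1: no interactions
ss_resid_with, df_with = 8332.500, 129         # model 2: interactions added


def nested_f(ss_reduced, df_reduced, ss_full, df_full):
    """F for the drop in residual SS when (df_reduced - df_full) terms are added."""
    df_added = df_reduced - df_full
    ms_added = (ss_reduced - ss_full) / df_added
    ms_error_full = ss_full / df_full
    return ms_added / ms_error_full


f_interaction = nested_f(ss_resid_without, df_without, ss_resid_with, df_with)
print(f"F(2, {df_with}) = {f_interaction:.3f}")
```

A large F here says the interactions explain variance the common-slope model misses, which is what a violation of the assumption looks like.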

5b. The tolerance and VIF statistics for model 2 (shown in Figure 16) are all problematic, except for the statistics for distractibility. Those for model 1 look fine. The correlation table, Figure 13, shows only two problematic correlations; each contrast is highly correlated with its respective covariate interaction. This is to be expected, since the contrasts are factors of the interactions. Looking at the data set, two of the distance values are over three, but all of the Cook’s D and leverage values look fine. Overall I would say that collinearity is not much of a problem here, except that groups are not supposed to differ on the covariate, and some of the adjusted means were higher, some lower than their raw means, suggesting that the groups were different on the covariate.
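Tolerance and VIF carry the same information, since VIF = 1 / tolerance. The sketch below checks the model 2 values from Figure 16 and flags the predictors that exceed a cutoff. The cutoff of 10 is a common rule of thumb and an assumption on my part, not something taken from the course materials. Because the tolerances are printed to three decimals, the reciprocals only match the reported VIFs to within a percent or two.

```python
# (tolerance, reported VIF) pairs copied from model 2 in Figure 16.
collinearity = {
    "Distractability": (0.909, 1.100),
    "CT v PR DS": (0.036, 28.064),
    "PR v DS": (0.028, 35.400),
    "cov_int1cc1": (0.035, 28.511),
    "cov_int2cc2": (0.028, 35.334),
}

VIF_CUTOFF = 10  # assumed rule-of-thumb threshold, not from the source


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance


# Predictors whose reported VIF exceeds the assumed cutoff.
flagged = [name for name, (_, vif) in collinearity.items() if vif > VIF_CUTOFF]

for name, (tol, vif) in collinearity.items():
    print(f"{name}: 1/tolerance = {vif_from_tolerance(tol):.2f}, reported VIF = {vif}")
print("Flagged:", flagged)
```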

5c. The results of 5a would probably have kept me from running an ANCOVA in the first place and, having done so, I would not be confident in the results. I would also be somewhat cautious interpreting the results based on 5b, at least until I had talked to someone who knew more than I do about collinearity. (See Figures 13-17 below.)

Correlations
Number of Errors Distractability CT v PR DS PR v DS cov_int1cc1 cov_int2cc2
Pearson Correlation Number of Errors 1.000 .483 .792 -.073 .838 -.059
Distractability .483 1.000 .167 -.088 .208 -.080
CT v PR DS .792 .167 1.000 .000 .981 .012
PR v DS -.073 -.088 .000 1.000 .011 .986
cov_int1cc1 .838 .208 .981 .011 1.000 .022
cov_int2cc2 -.059 -.080 .012 .986 .022 1.000
Sig. (1-tailed) Number of Errors . .000 .000 .199 .000 .249
Distractability .000 . .027 .154 .008 .178
CT v PR DS .000 .027 . .500 .000 .446
PR v DS .199 .154 .500 . .448 .000
cov_int1cc1 .000 .008 .000 .448 . .400
cov_int2cc2 .249 .178 .446 .000 .400 .
N Number of Errors 135 135 135 135 135 135
Distractability 135 135 135 135 135 135
CT v PR DS 135 135 135 135 135 135
PR v DS 135 135 135 135 135 135
cov_int1cc1 135 135 135 135 135 135
cov_int2cc2 135 135 135 135 135 135

Figure 13.

Model Summaryc
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .869a  .755      .749               9.20609                     .755             134.625   3    131  .000
2      .903b  .816      .809               8.03698                     .061             21.442    2    129  .000
a. Predictors: (Constant), PR v DS, CT v PR DS, Distractability
b. Predictors: (Constant), PR v DS, CT v PR DS, Distractability, cov_int1cc1, cov_int2cc2
c. Dependent Variable: Number of Errors

Figure 14.

ANOVAc
Model Sum of Squares df Mean Square F Sig.
1 Regression 34229.391 3 11409.797 134.625 .000a
Residual 11102.535 131 84.752
Total 45331.926 134
2 Regression 36999.426 5 7399.885 114.562 .000b
Residual 8332.500 129 64.593
Total 45331.926 134
a. Predictors: (Constant), CC2, CC1, Distractability
b. Predictors: (Constant), CC2, CC1, Distractability, cov_int1, cov_int2
c. Dependent Variable: Number of Errors

Figure 15.

Coefficientsa
Model              Unstandardized B  Std. Error  Standardized Beta  t       Sig.  Tolerance  VIF
1 (Constant)       -16.499           4.361                          -3.783  .000
  Distractability  .309              .038        .357               8.105   .000  .964       1.037
  CT v PR DS       9.488             .568        .732               16.696  .000  .972       1.029
  PR v DS          -.938             .974        -.042              -.962   .338  .992       1.008
2 (Constant)       -11.137           3.896                          -2.858  .005
  Distractability  .255              .034        .295               7.443   .000  .909       1.100
  CT v PR DS       -7.150            2.591       -.552              -2.760  .007  .036       28.064
  PR v DS          -4.091            5.040       -.182              -.812   .419  .028       35.400
  cov_int1cc1      .146              .022        1.318              6.538   .000  .035       28.511
  cov_int2cc2      .025              .045        .122               .543    .588  .028       35.334
a. Dependent Variable: Number of Errors

Figure 16.

Descriptive Statistics
Mean Std. Deviation N
Number of Errors 18.2593 18.39288 135
Distractability 112.5407 21.25248 135
CC1 .0000 1.41948 135
CC2 .0000 .81954 135
cov_int1 4.9926 166.42866 135
cov_int2 -1.5259 91.49821 135

Figure 17.
