Applied Data Analysis Homework # 6

Part One: Brief Response

The main and most important part of Chapter 13 was how to talk and think about factorial ANOVAs: the terminology, notation, calculations, and underlying logic. Of the several other major points, I thought the most important might be how to deal with unequal sample sizes: the problems they create and ways to resolve them.

I found section 13.8 (new edition) to be the most confusing. I get most of the individual concepts, like sampling fractions and fixed variables, and I’m a bit confused about how random sampling can generate levels, but I didn’t put it all together: I don’t understand expected mean squares.

Part Two: Problem Set

1a. Reported level of distraction had a significant effect on number of errors made (F(1, 131) = 65.696, p < .001). After controlling for level of distraction, there was still a significant effect of task type on number of errors (F(2, 131) = 139.638, p < .001). That is, although level of distraction had a significant effect on the number of errors made, the type of task significantly predicted number of errors above and beyond level of distraction.

1b. The full model from last week, not including distractibility, had F(2, 132) = 113.474, p < .001 and R² = .632. When distractibility was accounted for, both the F-value and the R² increased, to F(3, 131) = 134.625 and R² = .755. This is because the extra factor, distractibility, accounts for some of the variation that had previously been considered error, decreasing the SS of error from 16670.400 to 11102.535. Decreasing the error increases the power of the test.
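As a rough check on where those numbers come from, the sketch below recomputes R² and the omnibus F from the sums of squares quoted here and in Figure 2. The function and variable names are my own, and I am assuming the usual formulas, R² = SS model / SS total and F = MS model / MS error.

```python
def r2_and_f(ss_total, ss_error, df_model, df_error):
    """Return (R squared, omnibus F) from sums of squares and degrees of freedom."""
    ss_model = ss_total - ss_error
    r2 = ss_model / ss_total
    f = (ss_model / df_model) / (ss_error / df_error)
    return r2, f

SS_TOTAL = 45331.926  # corrected total SS from Figure 2

# Last week's model: task type only (SS error quoted in 1b)
r2_anova, f_anova = r2_and_f(SS_TOTAL, 16670.400, df_model=2, df_error=132)

# This week's ANCOVA: task type plus distractibility
r2_ancova, f_ancova = r2_and_f(SS_TOTAL, 11102.535, df_model=3, df_error=131)

print(f"ANOVA:  R2 = {r2_anova:.3f}, F = {f_anova:.3f}")
print(f"ANCOVA: R2 = {r2_ancova:.3f}, F = {f_ancova:.3f}")
```

The total SS is the same in both models; only the split between model and error changes, which is why a smaller error SS raises both statistics.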

1c. Pattern recognition had a raw mean of 9.644 and a slightly larger adjusted mean of 9.709. The cognitive task had a raw mean of 38.778 and a smaller adjusted mean of 37.236. The driving simulation had a raw mean of 6.356 and a larger adjusted mean of 7.833. The means have changed because we have taken out the effect contributed by distractibility. The raw means are just the mean number of errors per task. The adjusted means are calculated with the effect of the covariate removed, holding it at its mean, 112.54. Figure 4 shows a plot of these means.

Descriptive Statistics
Dependent Variable:Number of Errors
Task Type	Mean	Std. Deviation	N
Pattern Recognition	9.6444	4.51339	45
Cognitive Task	38.7778	18.05533	45
Driving Simulation	6.3556	5.70150	45
Total	18.2593	18.39288	135

Figure 1. Means are raw means.

Tests of Between-Subjects Effects
Dependent Variable:Number of Errors
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	34229.391^a	3	11409.797	134.625	.000
Intercept	1213.155	1	1213.155	14.314	.000
distract	5567.865	1	5567.865	65.696	.000
tasktype	23669.320	2	11834.660	139.638	.000
Error	11102.535	131	84.752
Total	90341.000	135
Corrected Total	45331.926	134
a. R Squared = .755 (Adjusted R Squared = .749)

Figure 2. Showing a significant p-value for distractibility and for the omnibus test for type of task.

Task Type
Dependent Variable:Number of Errors
Task Type	Mean	Std. Error	95% CI Lower Bound	95% CI Upper Bound
Pattern Recognition	9.709^a	1.372	6.994	12.423
Cognitive Task	37.236^a	1.385	34.495	39.977
Driving Simulation	7.833^a	1.384	5.095	10.572
a. Covariates appearing in the model are evaluated at the following values: Distractability = 112.5407.

Figure 3. These are the adjusted means.

Figure 4. Estimated marginal means for number of errors by task type.

2a. Distractibility significantly predicted number of errors (β = .483, t(133) = 6.355, p < .001), accounting for just over 23% of the variance in number of errors, R² = .233. That is, the more distractible participants rated themselves, the more errors they tended to make, and about 23% of the variance in errors can be accounted for by this relationship.

The β reported above (.483) represents the change in errors made, in terms of standard deviations, associated with a one standard deviation change in distractibility. The unstandardized coefficient for distractibility, b = .418, is the change in number of errors associated with a one raw unit change in distractibility. Both of these are statistically significantly different from zero, as indicated by p < .001 (see Figure 6), which means that the variation accounted for by distractibility is significant.
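To see how the two coefficients relate, here is a small sketch that converts the unstandardized b into the standardized coefficient using the standard deviations from the descriptive statistics (21.25248 for distractibility, 18.39288 for errors). The function name is my own; the relationship β = b × SD(x) / SD(y), and β² = R² when there is only one predictor, is the standard one for simple regression.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope to a standardized one."""
    return b * sd_x / sd_y

b = 0.418                 # unstandardized coefficient, Figure 6
sd_distract = 21.25248    # SD of distractibility
sd_errors = 18.39288      # SD of number of errors

beta = standardized_beta(b, sd_distract, sd_errors)
r_squared = beta ** 2     # with a single predictor, beta equals r

print(f"beta = {beta:.3f}")
print(f"R squared = {r_squared:.3f}")
```

Because b is only reported to three decimals, the result matches the SPSS values to about three decimals and no further.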

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.483^a	.233	.227	16.16918
a. Predictors: (Constant), Distractability

Figure 5.

Coefficients^a
Model		Unstandardized B	Std. Error	Standardized Beta	t	Sig.
1	(Constant)	-28.750	7.526		-3.820	.000
1	Distractability	.418	.066	.483	6.355	.000
a. Dependent Variable: Number of Errors

Figure 6.

3. An augmented model including Distractibility (b = .309, t(131) = 8.105, p < .001) and the two orthogonal contrasts shown in Figure 7, the Cognitive Task vs. the average of the Pattern Recognition and Driving Simulation contrast (b = 9.488, t(131) = 16.696, p < .001) and the Pattern Recognition vs. Driving Simulation contrast (b = -.938, t(131) = -.962, p = .338), provided a reasonably good fit (R² = .755), significantly predicting variance in errors made (F(3, 131) = 134.625, p < .001). In other words, this model accounted for over 75% of the variation in number of errors, which was statistically significant. Note that in the correlation table, Figure 9, the two contrasts correlate at .000 with each other, showing that they are indeed orthogonal.
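The orthogonality claim can also be checked directly from the contrast codes rather than from the correlation table. This is a minimal sketch with my own variable names; it assumes equal group sizes (n = 45 per task, as in Figure 1), which is what lets a zero sum of cross-products stand in for a zero correlation.

```python
# Contrast codes from Figure 7 (PR = pattern recognition,
# CT = cognitive task, DS = driving simulation)
cc1 = {"PR": -1, "CT": 2, "DS": -1}   # CT vs. average of PR and DS
cc2 = {"PR": -1, "CT": 0, "DS": 1}    # PR vs. DS

sum_cc1 = sum(cc1.values())           # each contrast should sum to zero
sum_cc2 = sum(cc2.values())
cross_products = {g: cc1[g] * cc2[g] for g in cc1}
dot = sum(cross_products.values())    # zero means the contrasts are orthogonal

print("sums:", sum_cc1, sum_cc2)
print("cross-products:", cross_products, "total:", dot)
```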

3a. The unstandardized coefficient for Distractibility (.309) means that for every increase of one unit in Distractibility, participants tended to make .309 more errors, holding task type constant. The unstandardized coefficient for the Pattern Recognition vs. Driving Simulation contrast (-.938) equals half the number of errors that can be attributed to the shift between the pattern recognition task and the driving simulation task. That is, it is half of the difference between the adjusted mean number of errors in the driving simulation task and in the pattern recognition task (driving minus pattern). The unstandardized coefficient for the Cognitive Task vs. the average of the Pattern Recognition and Driving Simulation contrast (9.488) is one third of the difference between the adjusted mean for the cognitive task and the average of the adjusted means for the pattern recognition and driving simulation tasks. The value under B for the constant in the Coefficients table, Figure 12, is not interpretable in the same way as that coefficient from homework 5; it can only be used to calculate the adjusted means.
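To make those interpretations concrete, the sketch below rebuilds the adjusted means from the model 2 coefficients in Figure 12, with distractibility held at its mean, and then recovers the two contrast coefficients from the adjusted means in Figure 3. Variable names are my own, and because the coefficients are rounded to three decimals the rebuilt means only agree with SPSS to within about .02.

```python
# Model 2 coefficients from Figure 12
b0, b_distract, b_cc1, b_cc2 = -16.499, 0.309, 9.488, -0.938
mean_distract = 112.5407

# Contrast codes (CC1, CC2) for each task, from Figure 7
codes = {"PR": (-1, -1), "CT": (2, 0), "DS": (-1, 1)}

# Predicted errors for each task at the mean level of distractibility
rebuilt = {
    task: b0 + b_distract * mean_distract + b_cc1 * c1 + b_cc2 * c2
    for task, (c1, c2) in codes.items()
}

# Adjusted means reported by SPSS in Figure 3
adjusted = {"PR": 9.709, "CT": 37.236, "DS": 7.833}

# Going the other way: coefficients as fractions of mean differences
half_diff = (adjusted["DS"] - adjusted["PR"]) / 2
third_diff = (adjusted["CT"] - (adjusted["PR"] + adjusted["DS"]) / 2) / 3

for task in codes:
    print(f"{task}: rebuilt {rebuilt[task]:.3f}, SPSS {adjusted[task]:.3f}")
print(f"PR v DS coefficient:    {half_diff:.3f}")
print(f"CT v PR DS coefficient: {third_diff:.3f}")
```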

3b. The number of errors in the Cognitive Task (adjusted mean = 37.24) was significantly higher than number of errors in the Pattern Recognition and Driving Simulation tasks, even after accounting for the effects of distractibility (b = 9.488, t(131) = 16.696, p < .001). Number of errors in the Pattern Recognition task (adjusted mean = 9.71) was not significantly different from number of errors in the Driving Simulation task (adjusted mean = 7.83) after controlling for the effect of distractibility (b = -.938, t(131) = -.962, p = .338). That is, after controlling for distractibility, the number of errors in the Cognitive Task is significantly higher than in the Pattern Recognition and Driving Simulation tasks, and the Pattern Recognition and Driving Simulation tasks are not significantly different from each other.

The p-value for pattern vs. driving, p = .338, means that the coefficient for that contrast is not significantly different from zero. The p-value for cognitive vs. pattern and driving, p < .001, means that the coefficient for that contrast, 9.488, is significantly different from zero. In each case, the test of the coefficient tells us whether the comparison coded into that contrast is significant.

3c. These results are similar to what the ANCOVA gave us, except that they give us more insight into what’s going on between the tasks. The adjusted means for the tasks are the same, R² is the same, and the F values for the whole models are the same (134.625). The current t-value for distractibility, squared, equals the F for distractibility from the ANCOVA within rounding: 8.105² ≈ 65.69, against F = 65.696. The difference is that the ANCOVA is an omnibus test when it comes to the differences between the task types, so we can’t tell exactly where the differences are until we do the regression. The degrees of freedom for the task comparisons are also different: 2 for the omnibus task-type test, 1 for each contrast.
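The t-squared-equals-F relationship for a single-df effect is easy to check by hand. This sketch uses the values as reported to three decimals, so a small gap from rounding is expected; the labels and variable names are my own.

```python
# (t from the regression coefficients table, F from the matching table)
pairs = {
    "distractibility, with contrasts": (8.105, 65.696),  # Figure 12 vs. Figure 2
    "distractibility, alone": (6.355, 40.392),           # Figure 6 vs. Figure 11
}

gaps = {}
for label, (t, f) in pairs.items():
    gaps[label] = abs(t ** 2 - f)
    print(f"{label}: t^2 = {t ** 2:.3f}, F = {f:.3f}, gap = {gaps[label]:.3f}")
```

The same shortcut does not apply to the task-type effect, since its F has 2 numerator degrees of freedom and corresponds to two contrasts rather than one t.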

	P R	C T	D S	Check
CC1	-1	2	-1	0
CC2	-1	0	1	0
Check (CC1 × CC2)	1	0	-1	0

Figure 7.

Descriptive Statistics
	Mean	Std. Deviation	N
Number of Errors	18.2593	18.39288	135
Distractability	112.5407	21.25248	135
CT v PR DS	.0000	1.41948	135
PR v DS	.0000	.81954	135

Figure 8.

Correlations
		# Errors	Distractability	CT v PR DS	PR v DS
Pearson Correlation	Number of Errors	1.000	.483	.792	-.073
	Distractability	.483	1.000	.167	-.088
	CT v PR DS	.792	.167	1.000	.000
	PR v DS	-.073	-.088	.000	1.000
Sig. (1-tailed)	Number of Errors	.	.000	.000	.199
	Distractability	.000	.	.027	.154
	CT v PR DS	.000	.027	.	.500
	PR v DS	.199	.154	.500	.
N	Number of Errors	135	135	135	135
	Distractability	135	135	135	135
	CT v PR DS	135	135	135	135
	PR v DS	135	135	135	135

Figure 9.

Model Summary^c
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	R Square Change	F Change	df1	df2	Sig. F Change
1	.483^a	.233	.227	16.16918	.233	40.392	1	133	.000
2	.869^b	.755	.749	9.20609	.522	139.638	2	131	.000
a. Predictors: (Constant), Distractability
b. Predictors: (Constant), Distractability, PR v DS, CT v PR DS
c. Dependent Variable: Number of Errors

Figure 10.

ANOVA^c
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	10560.071	1	10560.071	40.392	.000^a
	Residual	34771.855	133	261.443
	Total	45331.926	134
2	Regression	34229.391	3	11409.797	134.625	.000^b
	Residual	11102.535	131	84.752
	Total	45331.926	134
a. Predictors: (Constant), Distractability
b. Predictors: (Constant), Distractability, PR v DS, CT v PR DS
c. Dependent Variable: Number of Errors

Figure 11.

Coefficients^a
Model		Unstandardized B	Std. Error	Standardized Beta	t	Sig.	Tolerance	VIF
1	(Constant)	-28.750	7.526		-3.820	.000
1	Distractability	.418	.066	.483	6.355	.000	1.000	1.000
2	(Constant)	-16.499	4.361		-3.783	.000
	Distractability	.309	.038	.357	8.105	.000	.964	1.037
	CT v PR DS	9.488	.568	.732	16.696	.000	.972	1.029
	PR v DS	-.938	.974	-.042	-.962	.338	.992	1.008
a. Dependent Variable: Number of Errors

Figure 12.

4a. Testing the reduced model (just distractibility, from question 2, or see model 1 in Figure 12) against the full model (distractibility plus the Cognitive Task vs. the average of the Pattern Recognition and the Driving Simulation contrast, and the Pattern Recognition vs. Driving Simulation contrast (model 2 in Figure 12)), we can conclude that the two contrasts explain a significant amount of variance that is not accounted for by distractibility, ΔR² = .522, ΔF(2, 131) = 139.638, p < .001. That is, adding the two contrasts to our model explains a significantly larger amount of the variation in number of errors made, above and beyond that accounted for by distractibility.
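The F change reported by SPSS can be reproduced from the residual sums of squares in Figure 11. This is a sketch under the usual model-comparison formula, ΔF = ((SS residual reduced − SS residual full) / df added) / (SS residual full / df residual full); the function name is my own.

```python
def f_change(ss_res_reduced, ss_res_full, df_added, df_res_full):
    """F test for the predictors added in going from the reduced to the full model."""
    ms_added = (ss_res_reduced - ss_res_full) / df_added
    ms_error = ss_res_full / df_res_full
    return ms_added / ms_error

SS_TOTAL = 45331.926
ss_res_model1 = 34771.855   # distractibility only
ss_res_model2 = 11102.535   # distractibility plus the two contrasts

delta_f = f_change(ss_res_model1, ss_res_model2, df_added=2, df_res_full=131)
delta_r2 = (ss_res_model1 - ss_res_model2) / SS_TOTAL

print(f"F change = {delta_f:.3f}")
print(f"R squared change = {delta_r2:.3f}")
```

This F change is the same value as the omnibus F for task type in the ANCOVA table (Figure 2), which is what we would expect since the two contrasts together carry the task-type effect.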

5a. The homogeneity of regression assumption is not met in this data set. The model summary table (Figure 14) shows a significant change in F when the covariate interactions are added to the model (ΔR² = .061, ΔF(2, 129) = 21.442, p < .001). That is, the relationship between distractibility and number of errors differs across the task groups, so distractibility cannot legitimately be covaried out with a single common slope; it would be unwise to use ANCOVA here.

5b. The tolerance and VIF statistics for model 2 (shown in Figure 16) are all problematic, except for the statistics for distractibility. Those for model 1 look fine. The correlation table, Figure 13, shows only two problematic correlations; each contrast is highly correlated with its respective covariate interaction. This is to be expected, since the contrasts are factors of the interactions. Looking at the data set, two of the distance values are over three, but all of the Cook’s D and leverage values look fine. Overall I would say that collinearity is not much of a problem here, except that groups are not supposed to differ on the covariate, and some of the adjusted means were higher, some lower than their raw means, suggesting that the groups were different on the covariate.
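For reference, tolerance and VIF are reciprocals of each other, so either column in Figure 16 can be rebuilt from the other. The sketch below does that for model 2 and flags large values. The cutoff of VIF > 10 is a common rule of thumb that I am assuming here, not something taken from the SPSS output.

```python
# VIF values for model 2, from Figure 16
vif = {
    "Distractability": 1.100,
    "CT v PR DS": 28.064,
    "PR v DS": 35.400,
    "cov_int1cc1": 28.511,
    "cov_int2cc2": 35.334,
}

VIF_CUTOFF = 10  # assumed rule-of-thumb threshold, not from the output

# Tolerance is 1 / VIF
tolerance = {name: 1 / value for name, value in vif.items()}
flagged = [name for name, value in vif.items() if value > VIF_CUTOFF]

for name in vif:
    print(f"{name}: VIF = {vif[name]:.3f}, tolerance = {tolerance[name]:.3f}")
print("flagged:", flagged)
```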

5c. The results of 5a would probably have kept me from running an ANCOVA in the first place and, having done so, I would not be confident in the results. I would also be somewhat cautious interpreting the results based on 5b, at least until I had talked to someone who knew more than I do about collinearity. (See Figures 13-17 below.)

Correlations
		Number of Errors	Distractability	CT v PR DS	PR v DS	cov_int1cc1	cov_int2cc2
Pearson Correlation	Number of Errors	1.000	.483	.792	-.073	.838	-.059
	Distractability	.483	1.000	.167	-.088	.208	-.080
	CT v PR DS	.792	.167	1.000	.000	.981	.012
	PR v DS	-.073	-.088	.000	1.000	.011	.986
	cov_int1cc1	.838	.208	.981	.011	1.000	.022
	cov_int2cc2	-.059	-.080	.012	.986	.022	1.000
Sig. (1-tailed)	Number of Errors	.	.000	.000	.199	.000	.249
	Distractability	.000	.	.027	.154	.008	.178
	CT v PR DS	.000	.027	.	.500	.000	.446
	PR v DS	.199	.154	.500	.	.448	.000
	cov_int1cc1	.000	.008	.000	.448	.	.400
	cov_int2cc2	.249	.178	.446	.000	.400	.
N	Number of Errors	135	135	135	135	135	135
	Distractability	135	135	135	135	135	135
	CT v PR DS	135	135	135	135	135	135
	PR v DS	135	135	135	135	135	135
	cov_int1cc1	135	135	135	135	135	135
	cov_int2cc2	135	135	135	135	135	135

Figure 13.

Model Summary^c
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	R Square Change	F Change	df1	df2	Sig. F Change
1	.869^a	.755	.749	9.20609	.755	134.625	3	131	.000
2	.903^b	.816	.809	8.03698	.061	21.442	2	129	.000
a. Predictors: (Constant), PR v DS, CT v PR DS, Distractability
b. Predictors: (Constant), PR v DS, CT v PR DS, Distractability, cov_int1cc1, cov_int2cc2
c. Dependent Variable: Number of Errors

Figure 14.

ANOVA^c
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	34229.391	3	11409.797	134.625	.000^a
	Residual	11102.535	131	84.752
	Total	45331.926	134
2	Regression	36999.426	5	7399.885	114.562	.000^b
	Residual	8332.500	129	64.593
	Total	45331.926	134
a. Predictors: (Constant), CC2, CC1, Distractability
b. Predictors: (Constant), CC2, CC1, Distractability, cov_int1, cov_int2
c. Dependent Variable: Number of Errors

Figure 15.

Coefficients^a
Model		Unstandardized B	Std. Error	Standardized Beta	t	Sig.	Tolerance	VIF
1	(Constant)	-16.499	4.361		-3.783	.000
	Distractability	.309	.038	.357	8.105	.000	.964	1.037
	CT v PR DS	9.488	.568	.732	16.696	.000	.972	1.029
	PR v DS	-.938	.974	-.042	-.962	.338	.992	1.008
2	(Constant)	-11.137	3.896		-2.858	.005
	Distractability	.255	.034	.295	7.443	.000	.909	1.100
	CT v PR DS	-7.150	2.591	-.552	-2.760	.007	.036	28.064
	PR v DS	-4.091	5.040	-.182	-.812	.419	.028	35.400
	cov_int1cc1	.146	.022	1.318	6.538	.000	.035	28.511
	cov_int2cc2	.025	.045	.122	.543	.588	.028	35.334
a. Dependent Variable: Number of Errors

Figure 16.

Descriptive Statistics
	Mean	Std. Deviation	N
Number of Errors	18.2593	18.39288	135
Distractability	112.5407	21.25248	135
CC1	.0000	1.41948	135
CC2	.0000	.81954	135
cov_int1	4.9926	166.42866	135
cov_int2	-1.5259	91.49821	135

Figure 17.
