Often in an analysis, multiple variables in the data measure the same underlying quantity. For example, in the `mri` data available at Scott Emerson’s website and documented on the same page, both the `yrsquit` and `packyrs` variables describe a person's smoking history.

To fully analyze these variables, we need to run multiple-partial F-tests. Prior to the `uwIntroStats` package, performing these tests took more code than necessary: the user first had to fit a linear model (or perhaps multiple linear models) and then run an ANOVA test.
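As a rough sketch of that older workflow, the block below fits a full and a reduced model with base R's `lm()` and compares them with `anova()`. The data frame `sim` is a simulated, hypothetical stand-in that only borrows the `mri` variable names, so the numbers are not results from the real data, and a classical F-test like this one need not match the output of `regress()`.

```r
# Simulated stand-in for the mri data (values are made up, not the real data)
set.seed(1)
n <- 200
sim <- data.frame(
  age     = runif(n, 65, 95),
  packyrs = rexp(n, rate = 1 / 20),
  yrsquit = rexp(n, rate = 1 / 10)
)
sim$atrophy <- -18 + 0.7 * sim$age + 0.03 * sim$packyrs +
  0.07 * sim$yrsquit + rnorm(n, sd = 12)

# Full model, and the reduced model that drops both smoking variables
full    <- lm(atrophy ~ age + packyrs + yrsquit, data = sim)
reduced <- lm(atrophy ~ age, data = sim)

# Comparing the nested models gives the multiple-partial F-test
ftest <- anova(reduced, full)
print(ftest)
```

The second row of `ftest` holds the F-statistic and p-value for dropping `packyrs` and `yrsquit` together, on 2 numerator degrees of freedom.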

Now, using the `U()` function, the user can specify multiple-partial F-tests within a call to `regress()`, the regression function supplied by `uwIntroStats`. A full explanation of that function can be found in “Regression in uwIntroStats”.

This document provides an introduction to using the `U()` function as a supplement to regression analyses. In each case, we will use linear regression to avoid confusion, and leave the full description of the arguments to `regress()` to its own vignette.

`U()` function

To continue our example above, if we want to describe the association between cerebral atrophy and smoking and age using linear regression, we would have to use both the `yrsquit` and `packyrs` variables, in addition to the `age` variable. But as we already described, the former two both measure smoking habits, and thus should be tested together as a single group.

The `U()` function only requires a formula when it is used to create a multiple-partial F-test. However, this is not a usual formula, because the response variable has already been defined in the outer formula in the call to `regress()`. For example, the formula given to `regress()` without the multiple-partial F-test would follow the usual convention of `lm()`:

`atrophy ~ age + packyrs + yrsquit`

Now if we want to make the F-test, we give `U()` the formula

`~ packyrs + yrsquit`

and it knows to use the response variable `atrophy`. In fact, an error will be returned if a response variable is included in the `U()` formula.
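The difference between the two kinds of formula can be seen with base R alone. The sketch below only inspects formula objects; it does not call `U()` and makes no claim about how `U()` processes its argument internally. The names `inner` and `outer` are chosen here purely for illustration.

```r
# One-sided formula, as passed to U(): no response on the left of the tilde
inner <- ~ packyrs + yrsquit

# Two-sided formula, as passed to lm(): response on the left of the tilde
outer <- atrophy ~ age + packyrs + yrsquit

# A one-sided formula has two components (the tilde and the right-hand side);
# a two-sided formula has three (the tilde, the response, the right-hand side)
print(length(inner))
print(length(outer))

# Variables named in each formula
print(all.vars(inner))
print(all.vars(outer))
```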

Now we can run the regression.

`library(uwIntroStats)`

```
##
## Attaching package: 'uwIntroStats'
##
## The following object is masked from 'package:base':
##
## tabulate
```

```
data(mri)
regress("mean", atrophy ~ age + U(~packyrs + yrsquit), data = mri)
```

```
## ( 1 cases deleted due to missing values)
##
##
## Call:
## regress(fnctl = "mean", formula = atrophy ~ age + U(~packyrs +
## yrsquit), data = mri)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.673 -8.610 -0.873 7.727 52.552
##
## Coefficients:
## Estimate Naive SE Robust SE 95%L
## [1] Intercept -18.22 6.312 6.812 -31.60
## [2] age 0.7096 0.08401 0.09077 0.5314
## U(packyrs + yrsquit)
## [3] packyrs 0.02860 0.01694 0.01685 -4.488e-03
## [4] yrsquit 0.07252 0.03241 0.03221 9.288e-03
## 95%H F stat df Pr(>F)
## [1] Intercept -4.850 7.16 1 0.0076
## [2] age 0.8878 61.12 1 < 0.00005
## U(packyrs + yrsquit) 4.37 2 0.0130
## [3] packyrs 0.06168 2.88 1 0.0901
## [4] yrsquit 0.1358 5.07 1 0.0246
##
## Residual standard error: 12.27 on 730 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09961, Adjusted R-squared: 0.09591
## F-statistic: 23.05 on 3 and 730 DF, p-value: 2.882e-14
```

The regression output indicates that the smoking variables should be in the model. The F-statistic for the multiple-partial F-test, which tests the null hypothesis that the `packyrs` and `yrsquit` coefficients are both equal to zero, is 4.37 on 2 degrees of freedom, with a p-value of 0.0130. Thus we would conclude that both age and smoking are associated with cerebral atrophy. For a full example of the inference we would make from this model, see the vignette for using `regress()`.
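To make the joint test more concrete, the sketch below computes a multiple-partial F-statistic by hand in its Wald form, using base R and the classical (model-based) covariance matrix from `lm()`. The data frame `sim` is simulated and hypothetical, borrowing only the `mri` variable names, and this is not necessarily the computation `regress()` performs. The last lines check the p-value reported in the output above from its printed F-statistic and degrees of freedom.

```r
# Simulated stand-in for the mri data (values are made up, not the real data)
set.seed(1)
n <- 200
sim <- data.frame(
  age     = runif(n, 65, 95),
  packyrs = rexp(n, rate = 1 / 20),
  yrsquit = rexp(n, rate = 1 / 10)
)
sim$atrophy <- -18 + 0.7 * sim$age + 0.03 * sim$packyrs +
  0.07 * sim$yrsquit + rnorm(n, sd = 12)

full <- lm(atrophy ~ age + packyrs + yrsquit, data = sim)

# Coefficients in the group being tested, and their covariance matrix
grp <- c("packyrs", "yrsquit")
b   <- coef(full)[grp]
V   <- vcov(full)[grp, grp]

# Wald form of the F-statistic: b' V^{-1} b / q, with q coefficients tested
q      <- length(grp)
wald_F <- as.numeric(t(b) %*% solve(V) %*% b) / q
wald_p <- pf(wald_F, q, df.residual(full), lower.tail = FALSE)
print(c(F = wald_F, p = wald_p))

# p-value implied by the printed statistic: F = 4.37 on 2 and 730 df
p_doc <- pf(4.37, df1 = 2, df2 = 730, lower.tail = FALSE)
print(round(p_doc, 4))  # close to the 0.0130 reported above
```

With the classical covariance matrix, `wald_F` agrees with the statistic from comparing the full model to the model with only `age` using `anova()`.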

`U()`

In our example above, we stated that both variables were actually measuring smoking habits. Thus in our regression call we could name this group to have more informative output. The `U()` function allows us to name the groups by placing an “=” before the tilde in the formula, and assigning a name on the left. In our example above, we could name the group “smoke” by writing

`U(smoke = ~packyrs + yrsquit)`

This would return the following output.

`regress("mean", atrophy ~ age + U(smoke = ~packyrs + yrsquit), data = mri)`

```
## ( 1 cases deleted due to missing values)
##
##
## Call:
## regress(fnctl = "mean", formula = atrophy ~ age + U(smoke = ~packyrs +
## yrsquit), data = mri)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.673 -8.610 -0.873 7.727 52.552
##
## Coefficients:
## Estimate Naive SE Robust SE 95%L 95%H
## [1] Intercept -18.22 6.312 6.812 -31.60 -4.850
## [2] age 0.7096 0.08401 0.09077 0.5314 0.8878
## smoke
## [3] packyrs 0.02860 0.01694 0.01685 -4.488e-03 0.06168
## [4] yrsquit 0.07252 0.03241 0.03221 9.288e-03 0.1358
## F stat df Pr(>F)
## [1] Intercept 7.16 1 0.0076
## [2] age 61.12 1 < 0.00005
## smoke 4.37 2 0.0130
## [3] packyrs 2.88 1 0.0901
## [4] yrsquit 5.07 1 0.0246
##
## Residual standard error: 12.27 on 730 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09961, Adjusted R-squared: 0.09591
## F-statistic: 23.05 on 3 and 730 DF, p-value: 2.882e-14
```

This is more informative than above, because now we are immediately reminded that `yrsquit` and `packyrs` are measuring smoking history when we look at the output.