In many analyses, several variables in the data measure the same underlying quantity. For example, in the `mri` data available at Scott Emerson's website and documented on the same page, both the `yrsquit` and `packyrs` variables describe a person's smoking history.

To fully analyze these variables, we need to run multiple-partial F-tests, which test a group of coefficients jointly. Prior to the `uwIntroStats` package, performing these tests took more code than necessary: the user first had to fit a linear model (or perhaps multiple linear models), and then run an ANOVA test.
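The older workflow described above can be sketched with base R alone. This is a minimal illustration, not code from the package: the data frame `sim` is simulated (its values are made up and only borrow the variable names from `mri`), and `anova()` performs the classical, non-robust F-test, so its numbers need not match what `regress()` reports.

```r
# Simulated stand-in for the mri data; all values are made up for illustration.
set.seed(1)
n <- 200
sim <- data.frame(
  age     = runif(n, 65, 95),
  packyrs = rexp(n, rate = 1 / 20),
  yrsquit = rexp(n, rate = 1 / 10)
)
sim$atrophy <- -18 + 0.7 * sim$age + 0.03 * sim$packyrs +
  0.07 * sim$yrsquit + rnorm(n, sd = 12)

# Reduced model: age only. Full model: age plus both smoking variables.
reduced <- lm(atrophy ~ age, data = sim)
full    <- lm(atrophy ~ age + packyrs + yrsquit, data = sim)

# Comparing the nested models gives the multiple-partial F-test,
# with 2 numerator degrees of freedom (one per added coefficient).
ftest <- anova(reduced, full)
print(ftest)
```

Two model fits and a separate comparison step are needed here for a single grouped test, which is the overhead that `U()` is meant to remove.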

Now, using the `U()` function, the user can specify multiple-partial F-tests within a call to `regress()`, the regression function supplied by `uwIntroStats`. A full explanation of that function can be found in “Regression in uwIntroStats”.

This document provides an introduction to using the `U()` function as a supplement to regression analyses. In each case, we will use linear regression to avoid confusion, and leave the details of the arguments to `regress()` to its own vignette.

# Arguments to the `U()` function

To continue our example above, if we want to describe the association between cerebral atrophy and smoking and age using linear regression, we would have to use both the `yrsquit` and `packyrs` variables, in addition to the `age` variable. But as we already described, the former two both measure smoking habits, and thus can be treated as a single smoking variable.

The only argument `U()` requires when it is used to create a multiple-partial F-test is a formula. However, this is not a usual formula, because the response variable has already been defined in the outer formula in the call to `regress()`. For example, the formula given to `regress()` without the multiple-partial F-test would follow the usual convention of `lm()`:

``````
atrophy ~ age + packyrs + yrsquit
``````

Now if we want to make the F-test, we give `U()` the formula

``````
~ packyrs + yrsquit
``````

and it knows to use the response variable `atrophy`. In fact, an error will be returned if a response variable is included in the formula given to `U()`.

Now we can run the regression.

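``````
# Load the package and the mri data
library(uwIntroStats)
data(mri)
# Linear regression (fnctl = "mean") with packyrs and yrsquit grouped by U()
regress("mean", atrophy ~ age + U(~packyrs + yrsquit), data = mri)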
``````
``````
## ( 1  cases deleted due to missing values)
##
##
## Call:
## regress(fnctl = "mean", formula = atrophy ~ age + U(~packyrs +
##     yrsquit), data = mri)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -35.673  -8.610  -0.873   7.727  52.552
##
## Coefficients:
##                             Estimate  Naive SE  Robust SE    95%L
##  Intercept                -18.22     6.312     6.812        -31.60
##  age                       0.7096   0.08401   0.09077       0.5314
##     U(packyrs + yrsquit)
##    packyrs                0.02860   0.01694   0.01685     -4.488e-03
##    yrsquit                0.07252   0.03241   0.03221      9.288e-03
##                             95%H         F stat    df Pr(>F)
##  Intercept                -4.850           7.16 1    0.0076
##  age                       0.8878         61.12 1  < 0.00005
##     U(packyrs + yrsquit)                      4.37 2    0.0130
##    packyrs                0.06168          2.88 1    0.0901
##    yrsquit                 0.1358          5.07 1    0.0246
##
## Residual standard error: 12.27 on 730 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09961,    Adjusted R-squared:  0.09591
## F-statistic: 23.05 on 3 and 730 DF,  p-value: 2.882e-14
``````

The regression output indicates that the smoking variables should be in the model. The multiple-partial F-test, which tests the null hypothesis that the `packyrs` and `yrsquit` coefficients are both equal to zero, has an F-statistic of 4.37 on 2 degrees of freedom, with a p-value of 0.0130. At the 0.05 level, we would therefore conclude that both age and smoking are associated with cerebral atrophy. For a full example of the inference we would make from this model, see the vignette for using `regress()`.
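As a quick sanity check on the grouped row of the table, the p-value can be recomputed from the printed F-statistic with base R's `pf()`. This sketch assumes the reference distribution is an F distribution with 2 numerator degrees of freedom (one per grouped coefficient) and the 730 residual degrees of freedom shown in the output; the variable names below are introduced only for illustration.

```r
# Values read off the regression output above.
f_stat <- 4.37   # F stat for the U(packyrs + yrsquit) row (rounded in the printout)
df_num <- 2      # number of coefficients tested jointly
df_den <- 730    # residual degrees of freedom

# Upper-tail probability of the assumed F(2, 730) reference distribution.
p_value <- pf(f_stat, df1 = df_num, df2 = df_den, lower.tail = FALSE)
round(p_value, 4)
```

The result is approximately 0.013, in line with the `Pr(>F)` column; small differences can arise because the printed F-statistic is rounded.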

# Naming the groups defined by `U()`

In the example above, we noted that both variables measure smoking habits, so we can name this group in the regression call to get more informative output. The `U()` function lets us name a group by writing the name, followed by “=”, before the tilde in the formula. For instance, we could name the group “smoke” by writing

``````
U(smoke = ~packyrs + yrsquit)
``````

This would return the following output.

``````
regress("mean", atrophy ~ age + U(smoke = ~packyrs + yrsquit), data = mri)
``````
``````
## ( 1  cases deleted due to missing values)
##
##
## Call:
## regress(fnctl = "mean", formula = atrophy ~ age + U(smoke = ~packyrs +
##     yrsquit), data = mri)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -35.673  -8.610  -0.873   7.727  52.552
##
## Coefficients:
##                  Estimate  Naive SE  Robust SE    95%L       95%H
##  Intercept     -18.22     6.312     6.812        -31.60    -4.850
##  age            0.7096   0.08401   0.09077       0.5314     0.8878
##     smoke
##    packyrs     0.02860   0.01694   0.01685     -4.488e-03  0.06168
##    yrsquit     0.07252   0.03241   0.03221      9.288e-03   0.1358
##                     F stat    df Pr(>F)
##  Intercept            7.16 1    0.0076
##  age                 61.12 1  < 0.00005
##     smoke                4.37 2    0.0130
##    packyrs            2.88 1    0.0901
##    yrsquit            5.07 1    0.0246
##
## Residual standard error: 12.27 on 730 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09961,    Adjusted R-squared:  0.09591
## F-statistic: 23.05 on 3 and 730 DF,  p-value: 2.882e-14
``````

This output is more informative than the earlier one, because the group label reminds us at a glance that `yrsquit` and `packyrs` measure smoking history.
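The naming syntax used in this section is ordinary R named-argument syntax inside a formula, which is why the name can be recovered from the call. The sketch below uses only base R to inspect the formula as unevaluated code; it is an illustration of the syntax and makes no claim about how `uwIntroStats` processes `U()` internally. The object names `f` and `u_call` are hypothetical.

```r
# A formula is stored unevaluated, so U() does not need to exist to inspect it.
f <- atrophy ~ age + U(smoke = ~packyrs + yrsquit)

# f[[3]] is the right-hand side `age + U(...)`; its third element is the U() call.
u_call <- f[[3]][[3]]

# The argument name carries the group label.
names(u_call)

# The inner one-sided formula lists the grouped variables.
all.vars(as.list(u_call)$smoke)
```

Here `names(u_call)` contains `"smoke"` as the name of the argument, and `all.vars()` returns the two grouped variable names.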