Introduction to inferr

2017-05-02

Inferential statistics allow us to make generalizations about a population using data drawn from it. We use them when it is impractical or impossible to collect data on the whole population under study. Instead, we work with a sample that represents the population and use inferential techniques to generalize from the sample to the population. inferr builds upon the solid set of statistical tests provided in the stats package by accepting additional data types as inputs and by expanding and restructuring the test results.

The inferr package:

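As of version 0.1, inferr includes a select set of parametric and non-parametric statistical tests: one sample t test, paired t test, two independent sample t test, one sample test of proportion, two sample test of proportion, one sample variance test, two sample variance test, binomial probability test, ANOVA, chi square goodness of fit test, chi square test of independence, Levene’s test, Cochran’s Q test, McNemar test, and runs test for randomness.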

These tests are described in more detail in the following sections.

One Sample t Test

A one sample t-test is used to determine whether a sample of observations comes from a population with a specific mean. The observations must be continuous, independent of each other and approximately normally distributed, and they should not contain any outliers.

Example

Using the hsb data, test whether the average of write differs significantly from 50.

ttest(hsb$write, mu = 50, type = 'all')
##                               One-Sample Statistics                               
## ---------------------------------------------------------------------------------
##  Variable    Obs     Mean     Std. Err.    Std. Dev.    [95% Conf. Interval] 
## ---------------------------------------------------------------------------------
##   write      200    52.775     0.6702       9.4786       51.4537    54.0969   
## ---------------------------------------------------------------------------------
## 
##                                Ho: mean(write) ~=50                              
## 
##         Ha: mean < 50              Ha: mean ~= 50               Ha: mean > 50        
##          t = 4.141                   t = 4.141                   t = 4.141         
##        P < t = 1.0000             P > |t| = 0.0001             P > t = 0.0000
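The t statistic above is (mean - mu) / (sd / sqrt(n)) with n - 1 degrees of freedom. The base-R sketch below is not inferr's implementation; it recomputes t for write from the summary row and cross-checks a hypothetical helper, `os_t`, against `stats::t.test()` on the built-in mtcars data.

```r
# Hypothetical helper (not part of inferr): one-sample t test by hand.
os_t <- function(x, mu) {
  n     <- length(x)
  se    <- sd(x) / sqrt(n)        # standard error of the mean
  tstat <- (mean(x) - mu) / se    # t statistic
  df    <- n - 1
  c(t         = tstat,
    p_less    = pt(tstat, df),                      # Ha: mean < mu
    p_two     = 2 * pt(-abs(tstat), df),            # Ha: mean != mu
    p_greater = pt(tstat, df, lower.tail = FALSE))  # Ha: mean > mu
}

# Recompute t for write from the summary row: mean 52.775, sd 9.4786, n 200
t_write <- (52.775 - 50) / (9.4786 / sqrt(200))
print(round(t_write, 3))

# Cross-check the helper against stats::t.test() on built-in data
res <- os_t(mtcars$mpg, mu = 20)
ref <- t.test(mtcars$mpg, mu = 20)
print(res)
```

The three p-values correspond to the three alternatives printed side by side in the output, and the lower and upper tails always sum to one.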

Paired t test

A paired (samples) t-test is used when you want to compare the means between two related groups of observations on some continuous dependent variable. In a paired sample test, each subject or entity is measured twice. It can be used to evaluate the effectiveness of training programs or treatments. If the dependent variable is dichotomous, use the McNemar test.

Examples

Using the hsb data, test whether the mean of read is equal to the mean of write.

# Lower Tail Test
paired_ttest(hsb$read, hsb$write, alternative = 'less')
##                          Paired Samples Statistics                          
## ---------------------------------------------------------------------------
## Variables    Obs    Mean     Std. Err.    Std. Dev.    [95% Conf. Interval] 
## ---------------------------------------------------------------------------
##    read       200    52.23      0.72         10.25         50.8      53.66   
##    write      200    52.77      0.67         9.48         51.45      54.09    
## ---------------------------------------------------------------------------
##    diff       200    -0.55      0.63         8.89         -1.79       0.69    
## ---------------------------------------------------------------------------
## 
##          Paired Samples Correlations         
## -------------------------------------------
##   Variables      Obs    Correlation    Sig.
##  read & write    200       0.60        0 
## -------------------------------------------
## 
##           Paired Samples Test           
##           -------------------           
##       Ho: mean(read - write) = 0        
##       Ha: mean(read - write) < 0        
## 
## ---------------------------------------
##   Variables        t       df     Sig.  
## ---------------------------------------
##  read - write    -0.873    199    0.192 
## ---------------------------------------
# Test all alternatives
paired_ttest(hsb$read, hsb$write, alternative = 'all')
##                          Paired Samples Statistics                          
## ---------------------------------------------------------------------------
## Variables    Obs    Mean     Std. Err.    Std. Dev.    [95% Conf. Interval] 
## ---------------------------------------------------------------------------
##    read       200    52.23      0.72         10.25         50.8      53.66   
##    write      200    52.77      0.67         9.48         51.45      54.09    
## ---------------------------------------------------------------------------
##    diff       200    -0.55      0.63         8.89         -1.79       0.69    
## ---------------------------------------------------------------------------
## 
##          Paired Samples Correlations         
## -------------------------------------------
##   Variables      Obs    Correlation    Sig.
##  read & write    200       0.60        0 
## -------------------------------------------
## 
##                 Ho: mean(read - write) = mean(diff) = 0                  
## 
##    Ha: mean(diff) < 0      Ha: mean(diff) ~= 0       Ha: mean(diff) > 0    
##        t = -0.873               t = -0.873               t = -0.873        
##      P < t = 0.192           P > |t| = 0.384           P > t = 0.808
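A paired t test is a one-sample t test on the within-subject differences, which is why the diff row carries the whole test. The sketch below uses made-up scores (not the hsb data) to show that equivalence with `stats::t.test(..., paired = TRUE)` and computes the correlation reported under Paired Samples Correlations.

```r
# Hypothetical toy data (not from hsb): 8 subjects measured twice
before <- c(72, 65, 80, 58, 90, 77, 68, 84)
after  <- c(75, 70, 79, 64, 94, 80, 70, 88)

d <- before - after                       # paired differences
n <- length(d)
t_manual <- mean(d) / (sd(d) / sqrt(n))   # one-sample t on the differences
r_manual <- cor(before, after)            # correlation between the two measurements

ref <- t.test(before, after, paired = TRUE)
print(c(t = t_manual, df = n - 1, r = r_manual))
```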

Two Independent Sample t Test

An independent samples t-test is used to compare the means of a continuous dependent variable between two unrelated groups. The dependent variable must be approximately normally distributed, and the two groups must contain different cases/subjects, i.e. a subject in one group cannot also be a subject in the other group. It can be used to answer whether:

Example

Using the hsb data, test whether the mean for write is the same for males and females.

hsb2 <- inferr::hsb
hsb2$female <- as.factor(hsb2$female)
ind_ttest(hsb2, 'female', 'write', alternative = 'all')
##                               Group Statistics                                
## -----------------------------------------------------------------------------
##   Group       Obs     Mean     Std. Err.    Std. Dev.    [95% Conf. Interval] 
## -----------------------------------------------------------------------------
##     0          91    50.121      1.080       10.305        47.975     52.267   
##     1         109    54.991      0.779        8.134        53.447     56.535   
## -----------------------------------------------------------------------------
##  combined     200    52.775      0.67         9.479        51.454     54.096   
## -----------------------------------------------------------------------------
##    diff       200    -4.87       1.304        9.231        -7.426     -2.314   
## -----------------------------------------------------------------------------
## 
##                         Independent Samples Test                         
##                       ------------------------                        
## 
##                     Ho: mean(0) - mean(1) = diff = 0                     
## 
##       Ha: diff < 0            Ha: diff ~= 0             Ha: diff > 0       
## 
##                                   Pooled                                   
## ------------------------------------------------------------------------
##        t = -3.7347              t = -3.7347              t = -3.7347       
##      P < t = 0.0001          P > |t| = 0.0002          P > t = 0.9999      
## 
##                               Satterthwaite                                
## ------------------------------------------------------------------------
##        t = -3.6564              t = -3.6564              t = -3.6564       
##      P < t = 0.0002          P > |t| = 0.0003          P > t = 0.9998      
## 
## 
##                 Test for Equality of Variances                  
## ---------------------------------------------------------------
##  Variable      Method     Num DF    Den DF    F Value    P > F  
## ---------------------------------------------------------------
##   write       Folded F      90       108       1.605     0.0188 
## ---------------------------------------------------------------
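The pooled and Satterthwaite rows differ only in how the standard error of the difference is estimated, and the folded F is the larger sample variance divided by the smaller. The sketch below, a hypothetical helper `ind_t_summary` rather than inferr's code, recomputes all three from the group summaries in the table above and cross-checks the formulas against `stats::t.test()` on the built-in mtcars data.

```r
# Hypothetical helper (not part of inferr): two-sample t statistics from
# group summaries (mean, sd, n), comparing group 1 minus group 2.
ind_t_summary <- function(m1, s1, n1, m2, s2, n2) {
  diff <- m1 - m2
  # Pooled: assumes equal population variances, df = n1 + n2 - 2
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
  t_pooled <- diff / sqrt(sp2 * (1 / n1 + 1 / n2))
  # Satterthwaite (Welch): unequal variances, approximate df
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t_welch  <- diff / sqrt(v1 + v2)
  df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  # Folded F: larger sample variance over the smaller one
  f_folded <- max(s1, s2)^2 / min(s1, s2)^2
  c(t_pooled = t_pooled, t_welch = t_welch,
    df_welch = df_welch, f_folded = f_folded)
}

# Summary rows for write by female (group 0, then group 1) from the table above
hsb_res <- ind_t_summary(50.121, 10.305, 91, 54.991, 8.134, 109)
print(round(hsb_res, 4))

# Cross-check against stats::t.test() on built-in data (mpg by am)
g0 <- mtcars$mpg[mtcars$am == 0]
g1 <- mtcars$mpg[mtcars$am == 1]
chk <- ind_t_summary(mean(g0), sd(g0), length(g0), mean(g1), sd(g1), length(g1))
pooled <- t.test(g0, g1, var.equal = TRUE)
welch  <- t.test(g0, g1)   # Welch is the default in t.test()
```

Because the inputs are the rounded summaries, the recomputed statistics agree with the printed ones only to about three decimal places.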

One Sample Test of Proportion

The one sample test of proportion compares the proportion in one group with a specified population proportion.

Examples

Using the hsb data, test whether the proportion of females is 50%.

# Using Variables
prop_test(as.factor(hsb$female), prob = 0.5)
##      Test Statistics      
## -------------------------
## Sample Size           200 
## Exp Prop              0.5 
## Obs Prop            0.545 
## z                  1.2728 
## Pr(|Z| > |z|)      0.2031 
## 
## -----------------------------------------------------------------
## Category    Observed    Expected    % Deviation    Std. Residuals 
## -----------------------------------------------------------------
##    0           91         100          -9.00           -0.90      
##    1          109         100           9.00            0.90      
## -----------------------------------------------------------------

Using Calculator

# Calculator
prop_test(200, prob = 0.5, phat = 0.3)
##      Test Statistics       
## --------------------------
## Sample Size            200 
## Exp Prop               0.5 
## Obs Prop               0.3 
## z                  -5.6569 
## Pr(|Z| > |z|)            0 
## 
## -----------------------------------------------------------------
## Category    Observed    Expected    % Deviation    Std. Residuals 
## -----------------------------------------------------------------
##    0          140         100          40.00            4.00      
##    1           60         100         -40.00           -4.00      
## -----------------------------------------------------------------
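Both outputs above are consistent with the large-sample statistic z = (phat - p0) / sqrt(p0 (1 - p0) / n), and the standardized residuals are (observed - expected) / sqrt(expected). The sketch below, using a hypothetical helper `prop_z` rather than inferr's code, reproduces the printed values and checks that z squared equals the uncorrected chi-square reported by `stats::prop.test()`.

```r
# Hypothetical helper (not part of inferr): large-sample z test for one proportion
prop_z <- function(phat, p0, n) (phat - p0) / sqrt(p0 * (1 - p0) / n)

# Variables example: 109 of 200 students are coded female = 1
z_vars <- prop_z(109 / 200, 0.5, 200)
p_vars <- 2 * pnorm(-abs(z_vars))   # Pr(|Z| > |z|)

# Calculator example: n = 200, phat = 0.3
z_calc <- prop_z(0.3, 0.5, 200)

# Standardized residuals in the category table
std_res <- (c(91, 109) - 100) / sqrt(100)

# The uncorrected chi-square from stats::prop.test() equals z squared
chk <- prop.test(109, 200, p = 0.5, correct = FALSE)
print(round(c(z_vars = z_vars, p_vars = p_vars, z_calc = z_calc), 4))
```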

Two Sample Test of Proportion

Two sample test of proportion performs tests on the equality of proportions using large-sample statistics. It tests that a categorical variable has the same proportion within two groups or that two variables have the same proportion.

Examples

Using Variables

Using the treatment data, test the equality of proportions for the two treatments.

# Using Variables
ts_prop_test(var1 = treatment$treatment1, var2 = treatment$treatment2, alternative = 'all')
##     Test Statistics      
## ------------------------
## Sample Size           50 
## z                  0.403 
## Pr(|Z| > |z|)      0.687 
## Pr(Z < z)          0.656 
## Pr(Z > z)          0.344

Use Grouping Variable

Using the treatment2 data, test whether outcome has the same proportion for males and females.

# Using Grouping Variable
ts_prop_grp(var = treatment2$outcome, group = treatment2$female, alternative = 'all')
##     Test Statistics      
## ------------------------
## Sample Size           91 
## z                  0.351 
## Pr(|Z| > |z|)      0.726 
## Pr(Z < z)          0.637 
## Pr(Z > z)          0.363

Using Calculator

Test whether the same proportion of people from two batches will pass a review exam for a training program. In the first batch of 30 participants, 30% passed the review, whereas in the second batch of 25 participants, 50% passed the review.

# Calculator
ts_prop_calc(n1 = 30, n2 = 25, p1 = 0.3, p2 = 0.5, alternative = 'all')
##      Test Statistics      
## -------------------------
## Sample Size            30 
## z                  -1.514 
## Pr(|Z| > |z|)        0.13 
## Pr(Z < z)           0.065 
## Pr(Z > z)           0.935
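The calculator result above matches the usual pooled two-sample z statistic, where the common proportion under Ho weights each sample proportion by its size. The sketch below uses a hypothetical helper `ts_prop_z` (not inferr's code) to reproduce the calculator output, then checks with integer counts that z squared equals the uncorrected chi-square from `stats::prop.test()`.

```r
# Hypothetical helper (not part of inferr): pooled two-sample z test
ts_prop_z <- function(p1, n1, p2, n2) {
  p_pool <- (p1 * n1 + p2 * n2) / (n1 + n2)   # pooled proportion under Ho
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  (p1 - p2) / se
}

# Calculator example: 30% of 30 vs 50% of 25 passed
z <- ts_prop_z(0.3, 30, 0.5, 25)
p_two   <- 2 * pnorm(-abs(z))   # Pr(|Z| > |z|)
p_lower <- pnorm(z)             # Pr(Z < z)
p_upper <- 1 - p_lower          # Pr(Z > z)
print(round(c(z = z, p_two = p_two, p_lower = p_lower, p_upper = p_upper), 3))

# With integer counts, z^2 equals the uncorrected chi-square of stats::prop.test()
z_int <- ts_prop_z(9 / 30, 30, 12 / 25, 25)
chk   <- prop.test(c(9, 12), c(30, 25), correct = FALSE)
```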

One Sample Variance Test

The one sample variance comparison test compares the standard deviation (or variance) of a sample with a hypothesized value, to determine whether the standard deviation of the population equals that value. It can be used to answer the following questions:

Examples

Using the mtcars data, compare the standard deviation of mpg to a hypothesized value.

# Lower Tail Test
os_vartest(mtcars$mpg, 0.3, alternative = 'less')
##                             One-Sample Statistics                             
## -----------------------------------------------------------------------------
##  Variable    Obs     Mean      Std. Err.    Std. Dev.    [95% Conf. Interval] 
## -----------------------------------------------------------------------------
##    mpg       32     20.0906     1.0654       6.0269        3.8737    10.6526   
## -----------------------------------------------------------------------------
## 
##              Lower Tail Test             
##              ---------------             
##             Ho: sd(mpg) >= 0.3           
##              Ha: sd(mpg) < 0.3            
## 
##       Chi-Square Test for Variance       
## ----------------------------------------
##  Variable        c         DF      Sig       
## ----------------------------------------
##    mpg       12511.436     31     1.0000  
## ----------------------------------------
# Test all alternatives
os_vartest(mtcars$mpg, 0.3, alternative = 'all')
##                             One-Sample Statistics                             
## -----------------------------------------------------------------------------
##  Variable    Obs     Mean      Std. Err.    Std. Dev.    [95% Conf. Interval] 
## -----------------------------------------------------------------------------
##    mpg       32     20.0906     1.0654       6.0269        3.8737    10.6526   
## -----------------------------------------------------------------------------
## 
##                                Ho: sd(mpg) = 0.3                               
## 
##         Ha: sd < 0.3              Ha: sd != 0.3               Ha: sd > 0.3        
##       c = 12511.4359             c = 12511.4359             c = 12511.4359      
##      Pr(C < c) = 1.0000       2 * Pr(C > c) = 0.0000       Pr(C > c) = 0.0000
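The c statistic above is the chi-square statistic (n - 1) s^2 / sigma0^2, where sigma0 = 0.3 is the hypothesized standard deviation and the test has n - 1 degrees of freedom. The sketch below is a hypothetical helper `os_var_chisq`, not inferr's implementation. It recomputes c for mpg in the built-in mtcars data, which agrees with the printed value to within rounding.

```r
# Hypothetical helper (not part of inferr): chi-square test for one variance,
# where sigma0 is the hypothesized standard deviation
os_var_chisq <- function(x, sigma0) {
  df <- length(x) - 1
  c_stat <- df * var(x) / sigma0^2
  c(c = c_stat, df = df,
    p_less    = pchisq(c_stat, df),                      # Ha: sd < sigma0
    p_greater = pchisq(c_stat, df, lower.tail = FALSE))  # Ha: sd > sigma0
}

res <- os_var_chisq(mtcars$mpg, sigma0 = 0.3)
# Two-sided p-value: twice the smaller tail (here the upper tail)
p_two <- 2 * min(res[["p_less"]], res[["p_greater"]])
print(res)
```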

Two Sample Variance Test

The two sample variance comparison test checks the equality of standard deviations (variances). It tests whether the standard deviation of a continuous variable is the same within two groups, or whether the standard deviations of two continuous variables are equal.

Example

Use Grouping Variable

Using the mtcars data, compare the standard deviation in miles per gallon for automatic and manual vehicles.

# Using Grouping Variable
var_test(mtcars$mpg, group_var = mtcars$am, alternative = 'all')
##                Variance Ratio Test                 
## --------------------------------------------------
##   Group      Obs    Mean     Std. Err.    Std. Dev. 
## --------------------------------------------------
##    0        19     17.15      0.88         3.83    
##    1        13     24.39      1.71         6.17    
## --------------------------------------------------
##  combined    32     20.09      1.07         6.03    
## --------------------------------------------------
## 
##                 Variance Ratio Test                 
## --------------------------------------------------
##         F              Num DF           Den DF      
## --------------------------------------------------
##       0.3866             18               12        
## --------------------------------------------------
## 
##        Null & Alternate Hypothesis        
## ----------------------------------------
##           ratio = sd(0) / (1)           
##               Ho: ratio = 1              
## 
##     Ha: ratio < 1        Ha: ratio > 1    
##   Pr(F < f) = 0.0335   Pr(F > f) = 0.9665 
## ----------------------------------------

Using Variables

Using the hsb data, compare the standard deviation of reading and writing scores.

# Using Variables
var_test(hsb$read, hsb$write, alternative = 'all')
##                Variance Ratio Test                 
## --------------------------------------------------
##   Group      Obs    Mean     Std. Err.    Std. Dev. 
## --------------------------------------------------
##   read      200    52.23      0.72         10.25   
##  write      200    52.77      0.67         9.48    
## --------------------------------------------------
##  combined    400    52.5       0.49         9.86    
## --------------------------------------------------
## 
##                 Variance Ratio Test                 
## --------------------------------------------------
##         F              Num DF           Den DF      
## --------------------------------------------------
##       1.1701            199              199        
## --------------------------------------------------
## 
##        Null & Alternate Hypothesis        
## ----------------------------------------
##        ratio = sd(read) / (write)       
##               Ho: ratio = 1              
## 
##     Ha: ratio < 1        Ha: ratio > 1    
##   Pr(F < f) = 0.8656   Pr(F > f) = 0.1344 
## ----------------------------------------
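Both examples report the ratio of the two sample variances, compared against an F distribution with (n1 - 1, n2 - 1) degrees of freedom. The sketch below is a hypothetical helper `var_ratio`, not inferr's code. It recomputes the mtcars grouping example and checks it against `stats::var.test()`, which reports the two-sided p-value as twice the smaller tail.

```r
# Hypothetical helper (not part of inferr): F ratio of two sample variances
var_ratio <- function(x, y) {
  f   <- var(x) / var(y)
  df1 <- length(x) - 1
  df2 <- length(y) - 1
  c(f = f, df1 = df1, df2 = df2,
    p_lower = pf(f, df1, df2),                      # Ha: ratio < 1
    p_upper = pf(f, df1, df2, lower.tail = FALSE))  # Ha: ratio > 1
}

g0 <- mtcars$mpg[mtcars$am == 0]   # automatic
g1 <- mtcars$mpg[mtcars$am == 1]   # manual
res <- var_ratio(g0, g1)
print(round(res, 4))

ref <- var.test(g0, g1)
```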

Binomial Probability Test

A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value.

Examples

Using the hsb data, test whether the proportions of females and males are equal.

# Using variables
binom_test(as.factor(hsb$female), prob = 0.5)
##              Binomial Test              
##  ---------------------------------------
##   Group     N     Obs. Prop    Exp. Prop 
##  ---------------------------------------
##     0       91        0.455        0.500 
##     1      109        0.545        0.500 
##  ---------------------------------------
## 
## 
##                  Test Summary                  
##  ---------------------------------------------
##   Tail              Prob              p-value  
##  ---------------------------------------------
##   Lower    Pr(k <= 109)               0.910518 
##   Upper    Pr(k >= 109)               0.114623 
##   Two      Pr(k <= 91 or k >= 109)    0.229247 
##  ---------------------------------------------

Using Calculator

# calculator
binom_calc(32, 16, prob = 0.5)
##             Binomial Test              
##  --------------------------------------
##   Group    N     Obs. Prop    Exp. Prop 
##  --------------------------------------
##     0      16          0.5        0.500 
##     1      16          0.5        0.500 
##  --------------------------------------
## 
## 
##                  Test Summary                 
##  --------------------------------------------
##   Tail              Prob             p-value  
##  --------------------------------------------
##   Lower    Pr(k <= 16)               0.569975 
##   Upper    Pr(k >= 16)               0.569975 
##   Two      Pr(k <= 15 or k >= 16)           1 
##  --------------------------------------------
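The three tails in the hsb Test Summary can be written directly in terms of the binomial distribution. The sketch below is not inferr's code. It computes them with `pbinom()` for k = 109 successes out of n = 200 and checks the two-tailed value against `stats::binom.test()`. When p = 0.5 the distribution is symmetric, so the two-tailed region is k <= 91 or k >= 109.

```r
n <- 200; k <- 109; p <- 0.5

lower <- pbinom(k, n, p)                          # Pr(k <= 109)
upper <- pbinom(k - 1, n, p, lower.tail = FALSE)  # Pr(k >= 109)
two   <- pbinom(n - k, n, p) + upper              # Pr(k <= 91 or k >= 109)
print(round(c(lower = lower, upper = upper, two = two), 6))

ref <- binom.test(k, n, p = p)
```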

ANOVA

The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups. It tests the null hypothesis that the samples in all groups are drawn from populations with the same mean. It cannot tell you which specific groups differ from each other, only that at least two group means differ, and it applies only to a numeric dependent variable.

Examples

Using the hsb data, test whether the mean of write differs between the three program types.

owanova(hsb, 'write', 'prog')
##                                 ANOVA                                  
## ----------------------------------------------------------------------
##                    Sum of                                             
##                    Squares     DF     Mean Square      F        Sig.  
## ----------------------------------------------------------------------
## Between Groups    3175.698      2      1587.849      21.275    0.0000 
## Within Groups     14703.177    197      74.635                        
## Total             17878.875    199                                    
## ----------------------------------------------------------------------
## 
##                  Report                   
## -----------------------------------------
##  Category      N      Mean     Std. Dev. 
## -----------------------------------------
##     1         45     51.333        9.398 
##     2         105    56.257        7.943 
##     3         50     46.760        9.319 
## -----------------------------------------
## 
## Number of obs = 200       R-squared     = 0.1776 
## Root MSE      = 8.6392    Adj R-squared = 0.1693
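Every quantity in the table and footer follows from the between-group and within-group sums of squares. For example, R-squared is 3175.698 / 17878.875 and Root MSE is the square root of the within-group mean square. The sketch below is a hypothetical helper `oneway_ss`, not inferr's code. It computes these quantities and checks them on the built-in mtcars data (mpg by cyl) against `stats::oneway.test()` and `lm()`.

```r
# Hypothetical helper (not part of inferr): one-way ANOVA table quantities
oneway_ss <- function(y, g) {
  g <- factor(g)
  grand <- mean(y)
  means <- tapply(y, g, mean)
  sizes <- tapply(y, g, length)
  ss_between <- sum(sizes * (means - grand)^2)
  ss_total   <- sum((y - grand)^2)
  ss_within  <- ss_total - ss_between
  df_b <- nlevels(g) - 1
  df_w <- length(y) - nlevels(g)
  f <- (ss_between / df_b) / (ss_within / df_w)
  c(f = f, r2 = ss_between / ss_total,
    adj_r2   = 1 - (ss_within / df_w) / (ss_total / (length(y) - 1)),
    root_mse = sqrt(ss_within / df_w))
}

# R-squared from the hsb table: between SS over total SS
r2_hsb <- 3175.698 / 17878.875

res <- oneway_ss(mtcars$mpg, mtcars$cyl)
fit <- summary(lm(mpg ~ factor(cyl), data = mtcars))
ow  <- oneway.test(mpg ~ factor(cyl), data = mtcars, var.equal = TRUE)
print(round(res, 4))
```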

Chi Square Goodness of Fit Test

A chi-square goodness of fit test compares the observed sample distribution with an expected probability distribution. It tests whether the observed proportions for a categorical variable differ from hypothesized proportions; the proportions expected in the categories may be equal or unequal. It can be applied to any univariate distribution for which you can calculate the cumulative distribution function. Because it is applied to binned data, the value of the chi-square statistic depends on how the data are binned. For the chi-square approximation to be valid, the sample size must be sufficiently large.

Example

Using the hsb data, test whether the observed proportions for race differ significantly from the hypothesized proportions.

# basic example
race <- as.factor(hsb$race)
chisq_gof(race, c(20, 20, 20 , 140))
##     Test Statistics     
## -----------------------
## Chi-Square       5.0286 
## DF                    3 
## Pr > Chi Sq      0.1697 
## Sample Size         200 
## 
##                          Variable: race                           
## -----------------------------------------------------------------
## Category    Observed    Expected    % Deviation    Std. Residuals 
## -----------------------------------------------------------------
##    1           24          20          20.00            0.89      
##    2           11          20         -45.00           -2.01      
##    3           20          20           0.00            0.00      
##    4          145         140           3.57            0.42      
## -----------------------------------------------------------------

Continuity Correction

# using continuity correction
race <- as.factor(hsb$race)
chisq_gof(race, c(20, 20, 20 , 140), correct = TRUE)
##     Test Statistics     
## -----------------------
## Chi-Square       4.3821 
## DF                    3 
## Pr > Chi Sq      0.2231 
## Sample Size         200 
## 
##                          Variable: race                           
## -----------------------------------------------------------------
## Category    Observed    Expected    % Deviation    Std. Residuals 
## -----------------------------------------------------------------
##    1           24          20          17.50            0.78      
##    2           11          20         -47.50           -2.12      
##    3           20          20          -2.50           -0.11      
##    4          145         140           3.21            0.38      
## -----------------------------------------------------------------
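The uncorrected statistic is sum((O - E)^2 / E). The continuity-corrected value printed above, 4.3821, is reproduced by shrinking each |O - E| by 0.5 before squaring. The sketch below is not inferr's code. It recomputes both from the race counts and checks the uncorrected value against `stats::chisq.test()`.

```r
observed <- c(24, 11, 20, 145)   # race categories 1-4
expected <- c(20, 20, 20, 140)   # hypothesized counts

chi_sq <- sum((observed - expected)^2 / expected)
df     <- length(observed) - 1
p_val  <- pchisq(chi_sq, df, lower.tail = FALSE)

# Continuity-corrected version: shrink each |O - E| by 0.5 before squaring
chi_sq_cc <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Standardized residuals shown in the uncorrected table
std_res <- (observed - expected) / sqrt(expected)

ref <- chisq.test(observed, p = expected / sum(expected))
print(round(c(chi_sq = chi_sq, p = p_val, chi_sq_cc = chi_sq_cc), 4))
```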

Chi Square Test of Independence

A chi-square test is used when you want to test if there is a significant relationship between two nominal (categorical) variables.

Examples

Using the hsb data, test if there is a relationship between the type of school attended (schtyp) and students’ gender (female).

chisq_test(as.factor(hsb$female), as.factor(hsb$schtyp))
##                Chi Square Statistics                 
## 
## Statistics                     DF    Value      Prob 
## ----------------------------------------------------
## Chi-Square                     1    0.0470    0.8284
## Likelihood Ratio Chi-Square    1    0.0471    0.8282
## Continuity Adj. Chi-Square     1    0.0005    0.9822
## Mantel-Haenszel Chi-Square     1    0.0468    0.8287
## Phi Coefficient                     0.0153          
## Contingency Coefficient             0.0153          
## Cramer's V                          0.0153          
## ----------------------------------------------------

Using the hsb data, test if there is a relationship between the type of school attended (schtyp) and students’ socioeconomic status (ses).

chisq_test(as.factor(hsb$schtyp), as.factor(hsb$ses))
##                Chi Square Statistics                 
## 
## Statistics                     DF    Value      Prob 
## ----------------------------------------------------
## Chi-Square                     2    6.3342    0.0421
## Likelihood Ratio Chi-Square    2    7.9060    0.0192
## Phi Coefficient                     0.1780          
## Contingency Coefficient             0.1752          
## Cramer's V                          0.1780          
## ----------------------------------------------------
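Both outputs are consistent with the standard definitions built on the Pearson chi-square: Cramér's V = sqrt(chi2 / (n (min(r, c) - 1))) and the contingency coefficient = sqrt(chi2 / (chi2 + n)). For a 2 x 2 table, the phi coefficient equals Cramér's V. The sketch below is a hypothetical helper `assoc`, not inferr's implementation. It uses the built-in mtcars data and checks the chi-square against `stats::chisq.test(correct = FALSE)`.

```r
# Hypothetical helper (not part of inferr): association measures from a
# contingency table, based on the Pearson chi-square
assoc <- function(tbl) {
  n <- sum(tbl)
  expected <- outer(rowSums(tbl), colSums(tbl)) / n
  chi_sq <- sum((tbl - expected)^2 / expected)
  # Likelihood ratio statistic G^2 (empty cells contribute zero)
  lr <- 2 * sum(ifelse(tbl > 0, tbl * log(tbl / expected), 0))
  k  <- min(dim(tbl)) - 1
  c(chi_sq = chi_sq, lr = lr,
    contingency = sqrt(chi_sq / (chi_sq + n)),
    cramers_v   = sqrt(chi_sq / (n * k)))
}

# Built-in data: transmission (am) by engine shape (vs) in mtcars
tbl <- table(mtcars$am, mtcars$vs)
res <- assoc(tbl)
ref <- chisq.test(tbl, correct = FALSE)
print(round(res, 4))

# From the schtyp x ses output: Cramer's V = sqrt(6.3342 / (200 * 1))
v_hsb <- sqrt(6.3342 / (200 * 1))
```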

Levene’s Test

Levene’s test is used to determine whether k samples have equal variances. It is less sensitive to departures from normality than Bartlett’s test, for which it is an alternative. levene_test returns Levene’s robust test statistic and the two statistics proposed by Brown and Forsythe, which replace the mean in Levene’s formula with alternative location estimators: the first replaces the mean with the median, and the second with the 10% trimmed mean.

Examples

Use Grouping Variable

Using the hsb data, test whether the variance in reading score is the same across races.

# Using Grouping Variable
levene_test(hsb$read, group_var = hsb$race)
##            Summary Statistics             
## Levels    Frequency    Mean     Std. Dev  
## -----------------------------------------
##   1          24        46.67      10.24   
##   2          11        51.91      7.66    
##   3          20        46.8       7.12    
##   4          145       53.92      10.28   
## -----------------------------------------
## Total        200       52.23      10.25   
## -----------------------------------------
## 
##                              Test Statistics                              
## -------------------------------------------------------------------------
## Statistic                            Num DF    Den DF         F    Pr > F 
## -------------------------------------------------------------------------
## Brown and Forsythe                        3       196      3.44    0.0179 
## Levene                                    3       196    3.4792     0.017 
## Brown and Forsythe (Trimmed Mean)         3       196    3.3936     0.019 
## -------------------------------------------------------------------------

Using Variables

Using the hsb data, test whether variance is equal for reading, writing and social studies scores.

# Using Variables
levene_test(hsb$read, hsb$write, hsb$socst)
##            Summary Statistics             
## Levels    Frequency    Mean     Std. Dev  
## -----------------------------------------
##   0          200       52.23      10.25   
##   1          200       52.77      9.48    
##   2          200       52.41      10.74   
## -----------------------------------------
## Total        600       52.47      10.15   
## -----------------------------------------
## 
##                              Test Statistics                              
## -------------------------------------------------------------------------
## Statistic                            Num DF    Den DF         F    Pr > F 
## -------------------------------------------------------------------------
## Brown and Forsythe                        2       597    1.1683    0.3116 
## Levene                                    2       597    1.3803    0.2523 
## Brown and Forsythe (Trimmed Mean)         2       597    1.3258    0.2664 
## -------------------------------------------------------------------------

Use Simple Linear Model

Using the hsb data, test whether the variance in reading score is the same for male and female students.

# Using Linear Regression Model
m <- lm(read ~ female, data = hsb)
levene_test(m)
##            Summary Statistics             
## Levels    Frequency    Mean     Std. Dev  
## -----------------------------------------
##   0          91        52.82      10.51   
##   1          109       51.73      10.06   
## -----------------------------------------
## Total        200       52.23      10.25   
## -----------------------------------------
## 
##                              Test Statistics                              
## -------------------------------------------------------------------------
## Statistic                            Num DF    Den DF         F    Pr > F 
## -------------------------------------------------------------------------
## Brown and Forsythe                        1       198    0.4542    0.5011 
## Levene                                    1       198    0.6024    0.4386 
## Brown and Forsythe (Trimmed Mean)         1       198     0.494     0.483 
## -------------------------------------------------------------------------

Using Formula

Using the hsb data, test whether the variance in reading score is the same across school types.

# Using Formula
levene_test(read ~ schtyp, hsb)
##            Summary Statistics             
## Levels    Frequency    Mean     Std. Dev  
## -----------------------------------------
##   1          168       51.85      10.42   
##   2          32        54.25       9.2    
## -----------------------------------------
## Total        200       52.23      10.25   
## -----------------------------------------
## 
##                              Test Statistics                              
## -------------------------------------------------------------------------
## Statistic                            Num DF    Den DF         F    Pr > F 
## -------------------------------------------------------------------------
## Brown and Forsythe                        1       198    0.5643    0.4534 
## Levene                                    1       198    0.6153    0.4337 
## Brown and Forsythe (Trimmed Mean)         1       198    0.5886    0.4439 
## -------------------------------------------------------------------------
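All three rows of the Test Statistics table are one-way ANOVA F statistics computed on absolute deviations from a group center: the mean for Levene, the median for Brown and Forsythe, and a trimmed mean for the third row. The sketch below is a hypothetical helper `levene_f`, not inferr's code. It assumes the trimmed variant trims 10% from each end, as `mean(trim = 0.1)` does, and cross-checks the Levene F against `stats::oneway.test()` on the built-in mtcars data.

```r
# Hypothetical helper (not part of inferr): Levene-type F statistic.
# center = mean gives Levene; median gives Brown-Forsythe.
levene_f <- function(y, g, center = mean) {
  g <- factor(g)
  z <- abs(y - ave(y, g, FUN = center))   # absolute deviations from group center
  anova(lm(z ~ g))[["F value"]][1]        # one-way ANOVA F on the deviations
}

y <- mtcars$mpg
g <- mtcars$cyl
f_levene  <- levene_f(y, g, mean)
f_bf      <- levene_f(y, g, median)
f_trimmed <- levene_f(y, g, function(v) mean(v, trim = 0.1))  # assumed 10% trim per end
print(round(c(levene = f_levene, brown_forsythe = f_bf, trimmed = f_trimmed), 4))

# Same Levene F via stats::oneway.test() with equal variances on the deviations
dev_df <- data.frame(z = abs(y - ave(y, factor(g), FUN = mean)), g = factor(g))
ref <- oneway.test(z ~ g, data = dev_df, var.equal = TRUE)
```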

Cochran’s Q Test

Cochran’s Q test is an extension of the McNemar test for related samples that tests for differences between three or more matched sets of frequencies or proportions. It tests whether the proportions of three or more dichotomous variables are equal in some population, where the outcome variables are measured on the same people or other statistical units.

Example

The exam data set contains scores of 15 students on three exams (exam1, exam2, exam3). Test whether the three exams are equally difficult.

cochran_test(exam)
##    Test Statistics     
## ----------------------
## N                   15 
## Cochran's Q       4.75 
## df                   2 
## p value          0.093 
## ----------------------
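Cochran's Q can be computed from the column totals C_j, row totals R_i and grand total N of an n x k matrix of 0/1 outcomes as Q = (k - 1) (k sum C_j^2 - N^2) / (k N - sum R_i^2), with k - 1 degrees of freedom. The sketch below is a hypothetical helper `cochran_q` applied to made-up pass/fail data, not the exam data set. With k = 2, Q reduces to McNemar's statistic without continuity correction, which gives a check against `stats::mcnemar.test()`.

```r
# Hypothetical helper (not part of inferr): Cochran's Q for an n x k matrix of
# 0/1 outcomes (rows = subjects, columns = related conditions)
cochran_q <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)
  col_tot <- colSums(x)
  row_tot <- rowSums(x)
  total   <- sum(x)
  q  <- (k - 1) * (k * sum(col_tot^2) - total^2) / (k * total - sum(row_tot^2))
  df <- k - 1
  c(q = q, df = df, p = pchisq(q, df, lower.tail = FALSE))
}

# Hypothetical pass (1) / fail (0) results for 8 students on 3 exams
exams <- cbind(e1 = c(1, 1, 0, 1, 1, 0, 1, 1),
               e2 = c(1, 0, 0, 1, 0, 0, 1, 0),
               e3 = c(1, 1, 1, 1, 1, 0, 1, 1))
print(cochran_q(exams))

# With k = 2, Q equals McNemar's statistic without continuity correction
two <- exams[, c("e1", "e2")]
q2  <- cochran_q(two)[["q"]]
mc  <- mcnemar.test(table(factor(two[, 1], levels = 0:1),
                          factor(two[, 2], levels = 0:1)), correct = FALSE)
```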

McNemar Test

The McNemar test is a non-parametric test created by Quinn McNemar and first published in Psychometrika in 1947. It is similar to a paired t test but is applied to a dichotomous dependent variable. It is used to test whether a statistically significant change in proportions has occurred on a dichotomous trait at two time points in the same population. It can be used to answer whether:

Examples

Using the hsb data, test whether the proportions of students in the himath and hiread groups are equal.

himath <- ifelse(hsb$math > 60, 1, 0)
hiread <- ifelse(hsb$read > 60, 1, 0)
mcnemar_test(table(himath, hiread))
##            Controls 
## ---------------------------------
## Cases       0       1       Total 
## ---------------------------------
##   0        135      21        156 
##   1         18      26         44 
## ---------------------------------
## Total      153      47        200 
## ---------------------------------
## 
##        McNemar's Test        
## ----------------------------
## McNemar's chi2        0.2308 
## DF                         1 
## Pr > chi2              0.631 
## Exact Pr >= chi2      0.7493 
## ----------------------------
## 
##        Kappa Coefficient         
## --------------------------------
## Kappa                     0.4454 
## ASE                        0.075 
## 95% Lower Conf Limit      0.2984 
## 95% Upper Conf Limit      0.5923 
## --------------------------------
## 
## Proportion With Factor 
## ----------------------
## cases             0.78 
## controls         0.765 
## ratio           1.0196 
## odds ratio      1.1667 
## ----------------------

Perform the above test using a matrix as input.

mcnemar_test(matrix(c(135, 18, 21, 26), nrow = 2))
##            Controls 
## ---------------------------------
## Cases       0       1       Total 
## ---------------------------------
##   0        135      21        156 
##   1         18      26         44 
## ---------------------------------
## Total      153      47        200 
## ---------------------------------
## 
##        McNemar's Test        
## ----------------------------
## McNemar's chi2        0.2308 
## DF                         1 
## Pr > chi2              0.631 
## Exact Pr >= chi2      0.7493 
## ----------------------------
## 
##        Kappa Coefficient         
## --------------------------------
## Kappa                     0.4454 
## ASE                        0.075 
## 95% Lower Conf Limit      0.2984 
## 95% Upper Conf Limit      0.5923 
## --------------------------------
## 
## Proportion With Factor 
## ----------------------
## cases             0.78 
## controls         0.765 
## ratio           1.0196 
## odds ratio      1.1667 
## ----------------------
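McNemar's chi-square depends only on the two discordant cells, (n12 - n21)^2 / (n12 + n21). Kappa is the observed agreement corrected for chance agreement. The sketch below is not inferr's code. It reproduces the chi-square and kappa above from the same matrix and checks the chi-square against `stats::mcnemar.test(correct = FALSE)`. It assumes one common exact version, a two-sided binomial test on the discordant pairs with p = 0.5; whether inferr computes its exact p-value this way is not confirmed here.

```r
m   <- matrix(c(135, 18, 21, 26), nrow = 2)  # counts from the example above
n12 <- m[1, 2]                               # discordant pairs (0, 1)
n21 <- m[2, 1]                               # discordant pairs (1, 0)

# McNemar chi-square uses only the discordant cells
chi2   <- (n12 - n21)^2 / (n12 + n21)
p_chi2 <- pchisq(chi2, df = 1, lower.tail = FALSE)
# Assumed exact version: two-sided binomial test on the discordant pairs
p_exact <- 2 * pbinom(min(n12, n21), n12 + n21, 0.5)

# Cohen's kappa: observed agreement corrected for chance agreement
n  <- sum(m)
po <- sum(diag(m)) / n
pe <- sum(rowSums(m) * colSums(m)) / n^2
kappa <- (po - pe) / (1 - pe)

print(round(c(chi2 = chi2, p_chi2 = p_chi2, p_exact = p_exact, kappa = kappa), 4))
ref <- mcnemar.test(m, correct = FALSE)
```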

Runs Test for Randomness

The runs test can be used to decide whether a data set comes from a random process. It tests whether the observations in a sequence are serially independent, i.e. whether they occur in random order, by counting how many runs there are above and below a threshold. Here a run is a maximal sequence of consecutive observations on the same side of the threshold, and the number of observations in it is the length of the run. By default, the median is used as the threshold. Too few runs indicate positive serial correlation; too many indicate negative serial correlation.

Examples

We will use runs test to check regression residuals for serial correlation.

# linear regression
reg <- lm(mpg ~ disp, data = mtcars)

# basic example
runs_test(residuals(reg))
## Runs Test
##  Total Cases:  32 
##  Test Value :  -0.9630856 
##  Cases < Test Value:  16 
##  Cases > Test Value:  16 
##  Number of Runs:  11 
##  Expected Runs:  17 
##  Variance (Runs):  7.741935 
##  z Statistic:  -2.156386 
##  p-value:  0.03105355
# drop values equal to threshold
runs_test(residuals(reg), drop = TRUE)
## Runs Test
##  Total Cases:  32 
##  Test Value :  -0.9630856 
##  Cases < Test Value:  16 
##  Cases > Test Value:  16 
##  Number of Runs:  11 
##  Expected Runs:  17 
##  Variance (Runs):  7.741935 
##  z Statistic:  -2.156386 
##  p-value:  0.03105355
# recode data in binary format
runs_test(residuals(reg), split = TRUE)
## Runs Test
##  Total Cases:  32 
##  Test Value :  -0.9630856 
##  Cases < Test Value:  16 
##  Cases > Test Value:  16 
##  Number of Runs:  11 
##  Expected Runs:  17 
##  Variance (Runs):  7.741935 
##  z Statistic:  -2.156386 
##  p-value:  0.03105355
# use mean as threshold
runs_test(residuals(reg), mean = TRUE)
## Runs Test
##  Total Cases:  32 
##  Test Value :  -1.12757e-16 
##  Cases < Test Value:  19 
##  Cases > Test Value:  13 
##  Number of Runs:  11 
##  Expected Runs:  16.4375 
##  Variance (Runs):  7.189642 
##  z Statistic:  -2.027896 
##  p-value:  0.04257089
# threshold to be used for counting runs
runs_test(residuals(reg), threshold = 0)
## Runs Test
##  Total Cases:  32 
##  Test Value :  0 
##  Cases < Test Value:  19 
##  Cases > Test Value:  13 
##  Number of Runs:  11 
##  Expected Runs:  16.4375 
##  Variance (Runs):  7.189642 
##  z Statistic:  -2.027896 
##  p-value:  0.04257089
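The Expected Runs and Variance (Runs) values printed above match the Wald-Wolfowitz formulas for n1 cases below and n2 cases above the threshold: E = 2 n1 n2 / (n1 + n2) + 1 and Var = 2 n1 n2 (2 n1 n2 - n1 - n2) / ((n1 + n2)^2 (n1 + n2 - 1)). The sketch below is a hypothetical helper `runs_z`, not inferr's implementation. It counts runs with `rle()` on the same regression residuals and uses the normal approximation for the p-value.

```r
# Hypothetical helper (not part of inferr): Wald-Wolfowitz runs test about a
# threshold, using the normal approximation
runs_z <- function(x, threshold = median(x)) {
  x <- x[x != threshold]              # drop values equal to the threshold
  above <- x > threshold
  n1 <- sum(!above)                   # cases below the threshold
  n2 <- sum(above)                    # cases above the threshold
  runs <- length(rle(above)$lengths)  # a run = consecutive cases on one side
  mu <- 2 * n1 * n2 / (n1 + n2) + 1
  v  <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
        ((n1 + n2)^2 * (n1 + n2 - 1))
  z  <- (runs - mu) / sqrt(v)
  c(runs = runs, expected = mu, variance = v, z = z, p = 2 * pnorm(-abs(z)))
}

reg <- lm(mpg ~ disp, data = mtcars)
res <- runs_z(residuals(reg))
print(res)
```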

Credits

The examples and the data set used in this vignette are borrowed from the sources listed below: