# The design and structure of geex

#### 2018-09-23

The details below are for those interested in how geex is organized. It is not necessary for using geex.

## The Estimating Function

The design of geex starts with the key to M-estimation, the estimating function:

$\psi(O_i, \theta) .$

geex composes $$\psi$$ with two R functions: the “outer” estFUN and the “inner” psiFUN. In pseudocode, $$\psi(O_i, \theta) =$$:

estFUN <- function(O_i){
psiFUN <- function(theta){
psi(O_i, theta)
}
return(psiFUN)
}

The reason for composing the $$\psi$$ function in this way is that in order to do estimation (finding roots) and inference (computing the empirical sandwich variance estimator), $$\psi$$ needs to be function of $$\theta$$. M-estimation theory gives the following instructions:

• To estimate $$\hat{\theta}$$, we need to find roots of $$G_m = \sum_i \psi(O_i, \theta) = 0$$.
• To estimate the empirical sandwich variance estimator, we need two quantities for each unit: $$A_i = - (\partial \psi(O_i, \theta)/\partial \theta)|_{\theta = \hat{\theta}}$$ and $$B_i = \psi(O_i, \hat{\theta})\psi(O_i, \theta)^{\intercal}$$.

With $$\hat{\theta}$$ in hand, the quantity $$B_i$$ is simple to compute. The computational challenges of M-estimation, then, are finding roots of $$G_m$$ and calculating the derivative $$A_i$$. By composing $$\psi$$ of two functions in geex, one can first do all the manipulations of $$O_i$$ (data) that are independent of $$\theta$$. In a sense, estFUN “fixes” the data so that numerical routines only need deal with $$\theta$$ in psiFUN.

## M-estimation basis

Before describing the mechanics of how geex finding roots of $$G_m$$ and computes derivatives of $$\psi$$, let’s look at the m_estimation_basis S4 object which forms the basis of all computations in geex.

An m_estimation_basis object, at a minimum needs two objects: an estFUN and a data.frame. Let’s use a simple estFUN that estimates the mean and variance of Y1 in the geexex dataset.

library(geex)
library(dplyr)

myee <- function(data){
Y1 <- data$Y1 function(theta){ c(Y1 - theta, (Y1 - theta)^2 - theta) } } Now we can create a basis: mybasis <- new("m_estimation_basis", .estFUN = myee, .data = geexex) And look at what this object contains: slotNames(mybasis) ##  ".data" ".units" ".weights" ".psiFUN_list" ##  ".GFUN" ".control" ".estFUN" ".outer_args" ##  ".inner_args" Two slots are worth examining. First, .psiFUN_list is a list of functions: mybasis@.psiFUN_list[1:2] ##$1
## function (theta)
## {
##     c(Y1 - theta, (Y1 - theta)^2 - theta)
## }
## <environment: 0x7f9a64794a88>
##
## $2 ## function (theta) ## { ## c(Y1 - theta, (Y1 - theta)^2 - theta) ## } ## <bytecode: 0x7f9a646142a8> ## <environment: 0x7f9a64782e08> This object is essentially equivalent to: m <- nrow(geexex) lapply(split(geexex, f = 1:m), function(O_i){ myee(O_i) }) From this list of functions, we can compute $$A_i$$, and by summing across the list, form $$G_m$$. The latter is found in: mybasis@.GFUN ## function (theta) ## { ## psii <- lapply(psi_list, function(psi) { ## do.call(psi, args = append(list(theta = theta), object@.inner_args)) ## }) ## compute_sum_of_list(psii, object@.weights) ## } ## <environment: 0x7f9a64b3d968> ## Finding roots Now that we have $$G_m$$ as a function of theta, we can found its roots using a root-finding algorithm such as rootSolve::multiroot: rootSolve::multiroot( f = mybasis@.GFUN, start = c(0, 0)) ##$root
##   5.044563 10.041239
##
## $f.root ##  -2.131628e-14 4.654055e-13 ## ##$iter
##  4
##
## $estim.precis ##  2.433609e-13 Within geex this is done with the estimate_GFUN_roots function. To illustrate, I first need to update the .control slot in mybasis with starting values for multiroot. mycontrol <- new('geex_control', .root = setup_root_control(start = c(1, 1))) mybasis@.control <- mycontrol roots <- mybasis %>% estimate_GFUN_roots() roots ##$root
##   5.044563 10.041239
##
## $f.root ##  -2.131628e-14 -2.238210e-13 ## ##$iter
##  4
##
## $estim.precis ##  1.225686e-13 Note that is bad form to assign S4 slot with someS4object@aslot <- something, but I do so here because I have not created a generic function for setting the .control slot. ## Computing the Empirical Sandwich Variance Estimator In the last section, we found $$\hat{\theta}$$, which we now use to compute the $$A_i$$ and $$B_i$$ matrices. geex uses the numDeriv::jacobian function to numerically evaluate derivatives. For example, $$A_1 = - (\partial \psi(O_1, \theta)/\partial \theta)|_{\theta = \hat{\theta}}$$ for this example is: -numDeriv::jacobian(func = mybasis@.psiFUN_list[], x = roots$root)
##           [,1] [,2]
## [1,]  1.000000    0
## [2,] -2.752514    1

geex performs this operation for each $$i = 1, \dots, m$$ to yield a list of $$A_i$$ matrices. Then summing across this list yields $$A = \sum_i A_i$$. The estimate_sandwich_matrices function computes the list of $$A_i$$, $$B_i$$ and $$A$$ and $$B$$:

mats <- mybasis %>%
estimate_sandwich_matrices(.theta = roots$root) # Compare to the numDeriv computation above grab_bread_list(mats)[] ## [,1] [,2] ## [1,] 1.000000 0 ## [2,] -2.752514 1 Finally, computing $$\hat{\Sigma} = A^{-1} B (A^{-1})^{\intercal}$$ is accomplished with the compute_sigma function. mats %>% {compute_sigma(A = grab_bread(.), B = grab_meat(.))} ## [,1] [,2] ## [1,] 0.10041239 0.03667969 ## [2,] 0.03667969 2.49219638 ## M-estimation with m_estimate All of the operations described above are wrapped and packaged in the m_estimate function: m_estimate( estFUN = myee, data = geexex, root_control = setup_root_control(start = c(0, 0)) ) ## An object of class "geex" ## Slot "call": ## m_estimate(estFUN = myee, data = geexex, root_control = setup_root_control(start = c(0, ## 0))) ## ## Slot "basis": ## An object of class m_estimation_basis ## psi: ## { ## Y1 <- data$Y1
##     function(theta) {
##         c(Y1 - theta, (Y1 - theta)^2 - theta)
##     }
## }
## Data:
##           Y1        Y2       X1       Y3       W1        Z1 X2         Y4
## 1  3.6683066 2.0281718 4.949316 16.34576 4.823768  8.921782  0 0.09273926
## 2 10.4524548 1.6432966 7.851962 25.68742 7.884845 13.909474  0 1.01672736
## 3  3.1234106 2.8526264 4.729075 16.10831 4.709346  9.014695  0 0.49399039
## 4  8.3715025 2.5133652 2.564395 10.57997 2.786091  6.733378  0 1.24322433
## 5 -0.8319749 3.0182030 4.782347 16.46401 4.811590  9.290492  0 0.69520599
## 6  3.3987763 0.9785209 5.335713 18.32577 5.415370 10.322199  0 0.95220138
##   Y5
## 1  1
## 2  1
## 3  0
## 4  0
## 5  1
## 6  1
## Units:
## Slot "rootFUN_results":
## $root ##  5.044563 10.041239 ## ##$f.root
##  -2.131628e-14  4.654055e-13
##
## $iter ##  4 ## ##$estim.precis
##  2.433609e-13
##
##
## Slot "sandwich_components":
## An object of class sandwich_components
##              [,1] [,2]
## [1,]  1.00000e+02    0
## [2,] -1.65139e-11  100
## B (meat matrix):
##           [,1]       [,2]
## [1,] 1004.1239   366.7969
## [2,]  366.7969 24921.9638
##
## Slot "GFUN":
## function ()
## NULL
## <bytecode: 0x7f9a5e800ba0>
##
## Slot "corrections":
## list()
##
## Slot "estimates":
##   5.044563 10.041239
##
## Slot "vcov":
##            [,1]       [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638