# bife - Binary Choice Models with Fixed Effects

## Introduction

In econometrics, fixed effects binary choice models are important tools for panel data analysis. The package bife provides a new approach suggested by Stammann, Heiss, and McFadden (2016) to estimate logit and probit panel data models of the following form:

$y_{it} = \mathbf{1}\left[\mathbf{x}_{it}\boldsymbol{\beta} + \alpha_{i} + \epsilon_{it} > 0\right] \;,$

where $$i = 1, \dots, N$$ and $$t = 1, \dots, T_i$$ denote different indices. In many applications, $$i$$ represents individuals, firms or other cross-sectional units and $$t$$ represents time in a longitudinal data set. But the setup is also useful for instance if $$i$$ represents ZIP code areas and $$t$$ is an index of individuals. The dependent variable $$y_{it}$$ is binary. We observe regressors collected in the vector $$\mathbf{x}_{it}$$ but we don’t observe the error term $$\epsilon_{it}$$.

We are primarily interested in estimating the parameters $$\boldsymbol{\beta}$$, but the model also includes individual fixed effects $$\alpha_{i}$$. We assume $$E(\epsilon_{it} | \mathbf{X}_{i}, \alpha_{i}) = 0$$ but don’t make any assumptions about the marginal distribution of $$\alpha_{i}$$ or its correlation with the regressors $$\mathbf{x}_{i1},\dots, \mathbf{x}_{iT_i}$$.

The estimator implemented in this package is based on maximum likelihood estimation (ML) of both $$\boldsymbol{\beta}$$ and $$\alpha_1, \dots, \alpha_N$$. It actually is the same estimator as a logistic regression a set of individual dummy variables such as

glm(y ~ X + factor(i), family = binomial())

The main difference is that In contrast to glm(), bife() applies a pseudo-demeaning algorithm proposed by Stammann, Heiss, and McFadden (2016)1. Its computational costs are lower by orders of magnitude if $$N$$ is reasonably large.

It is well known that as $$N\rightarrow\infty$$, the ML estimator is not consistent. This “incidental parameters problem” can be severe if $$T$$ is small. To tackle this problem, we provide an analytical and a jackknife bias correction for the structural parameters $$\boldsymbol{\beta}$$ and the average partial effects (Hahn and Newey 2004). Thus this package is well suited to analyse big micro-data where $$N$$ and/or $$T$$ are large.

This package provides methods to:

• bife() – estimate binary choice models with fixed effects with/-out bias correction
• apeff.bife() – compute average partial effects2 with/-out bias correction

Both methods utilize the RcppArmadillo-package provided by Eddelbuettel and Sanderson (2014).

An alternative to full ML estimation of all parameters is a conditional maximum likelihood estimator which conditions out $$\alpha_1, \dots, \alpha_N$$ and only estimates $$\boldsymbol\beta$$. It is for example available with survival::clogit() and is consistent under the usual regularity conditions. The problem with this estimator is that its computational burden increases dramatically for larger $$T$$ values and that partial effects cannot be consistently estimated since this would require estimates of $$\alpha_1, \dots, \alpha_N$$.

To demonstrate the computational advantage of bife(), we compare it with two popular methods to estimate logit models with fixed effects: glm() and survival::clogit().4 To compare these methods, we utilize the data generating process of Greene (2004) :

$y_{it} = \mathbf{1}\left[w_{it} + \epsilon_{it} > 0\right] \;,$ $w_{it} = \alpha_{i} + x_{it} + d_{it} \;,$ $\alpha_{i} = \sqrt{T} \bar{x_{i}} + a_{i} \;,$ $x_{it} \sim \mathcal{N}[0, 1] \;,$ $d_{it} = \mathbf{1}\left[x_{it} + h_{it} > 0\right] \;,$ $\epsilon_{it} = \log\left[u_{it} / (1 - u_{it})\right] \;,$

where $$x_{it}, h_{it}, a_{i} \sim \mathcal{N}[0, 1]$$, $$u_{it} \sim \mathcal{U}[0, 1]$$, $$i = 1, \dots, N$$, and $$t = 1, \dots, T$$.

To show how the computational burden changes with $$N$$ and $$T$$, we either fix $$N$$ and vary $$T$$ or vice versa. For each of the different combinations we generate 50 datasets and measure the compuational time with the following commands:

time.bife   <- system.time(bife(  y ~ x + d | id, model = "logit", bias.corr = "ana"))[3]
time.clogit <- system.time(clogit(y ~ x + d + strata(id))                          )[3]
time.glm    <- system.time(glm(   y ~ x + d + 0 + factor(id), family = binomial()) )[3]

The following table reports the average computation time in seconds for each method and all combinations of $$N$$ and $$T$$. Note that bife.corr refers to the results with analytical bias correction.

N T bife.corr clogit glm N T bife.corr clogit glm
100 10 0.00880 0.00736 0.13450 100 10 0.00870 0.00746 0.13430
200 10 0.01220 0.01216 0.89906 100 20 0.01174 0.01412 0.24614
300 10 0.01590 0.01710 3.06478 100 30 0.01448 0.02400 0.34784
400 10 0.01964 0.02176 7.53678 100 40 0.01744 0.03638 0.40704
500 10 0.02310 0.02658 14.42108 100 50 0.02026 0.05176 0.43400
600 10 0.02644 0.03152 25.63560 100 60 0.02334 0.07006 0.45406
700 10 0.03046 0.03620 39.55658 100 70 0.02624 0.09234 0.46582
800 10 0.03402 0.04122 58.09904 100 80 0.02924 0.11644 0.51412
900 10 0.03762 0.04626 81.24916 100 90 0.03228 0.14450 0.60294
1000 10 0.04364 0.05102 110.23218 100 100 0.03512 0.17500 0.67712

Visualizing the results illustrates that the computational time of bife() is linear in both dimensions and hence is well suited to analyse big panel data. The left figure depicts that with $$T=10$$, glm() becomes dramatically more computationally intensive as $$N$$ increases. Besides, glm() is always more time consuming than the other two methods regardless which dimension is varied. Additionally, the right figure already indicates that the computation time of survival::clogit() increases more than linearly with $$T$$. In applications with very large $$T$$, survival::clogit() can quickly become computationally infeasible in terms of convergence, computation time or technique.

## Workflow

This section demonstrates two examples of a typical workflow using bife:: with real datasets. The first example uses a balanced panel with many fixed effects (large $$N$$) and the second example a unbalanced panel with large $$T_i$$.

### Example: Hyslop (1999) — Large $$N$$

The first example is inspired by Hyslop (1999) who analysed the labor force participation of married women in a “classic” balanced panel. The sample was obtained from the “Panel Study of Income Dynamics” and contains information about $$N=1461$$ women that were observed over $$T=9$$ years.

To analyse the labor force participation of married women, we specify the following model:

$LFP_{it} = \mathbf{1}\left[\beta_{1} AGE_{it} + \beta_{2} (INCH / 1000)_{it} + \beta_{3} KID1_{it} + \beta_{4} KID2_{it} + \beta_{5} KID3_{it} + \alpha_{i} + \epsilon_{it} > 0\right] \;,$

where $$LFP_{it}$$ indicates the labor force participation of a married woman, $$AGE_{it}$$ refers to the age, $$(INCH / 1000)_{it}$$ is the husbands income in thousand dollars, and the $$KID*_{it}$$ variables refer to the number of kids in a certain age group.

We start with a comparison of different methods to estimate logit models with fixed effects similiar to the section before. The following table reports the structural parameters ($$\boldsymbol{\beta}$$) and the execution time for each method, where bife.corr refers to the results with analytical bias correction.

bife glm bife.corr clogit
AGE 0.0378720 0.0378720 0.0339453 0.0336164
I(INCH / 1000) -0.0086599 -0.0086599 -0.0076302 -0.0074680
KID1 -1.1742997 -1.1742997 -1.0529849 -1.0301399
KID2 -0.5690298 -0.5690298 -0.5091778 -0.5000720
KID3 -0.0115029 -0.0115029 -0.0105624 -0.0102339
Time in sec 0.0350000 340.2960000 0.0370000 0.0770000

There are two things to highlight in this table. First bife(..., bias.corr = "no") and glm(..., family = binomial()) deliver the same structural parameter estimates, but the execution time of glm(..., family = binomial()) is about 10,000 times as long. Second the small $$T$$ leads to incidental parameters bias, but the analytical bias correction (bife(..., bias.corr = "ana")) is able to correct the structural parameters such that we get very competitive results compared to the unbiased alternative survival::clogit().

Next, we show how to estimate the specification above with bife().

mod.logit <- bife(LFP ~ AGE + I(INCH / 1000) + KID1 + KID2 + KID3 | ID, data = psid, bias.corr = "ana")
summary(mod.logit)
## ---------------------------------------------------------------
## Fixed effects logit model
## with analytical bias-correction
##
## Estimated model:
## LFP ~ AGE + I(INCH/1000) + KID1 + KID2 + KID3 | ID
##
## Log-Likelihood= -3045.505
## n= 5976, number of events= 3432
## Demeaning converged after 5 iteration(s)
## Offset converged after 3 iteration(s)
##
## Corrected structural parameter(s):
##
##               Estimate Std. error t-value  Pr(> t)
## AGE           0.033945   0.012990   2.613  0.00899 **
## I(INCH/1000) -0.007630   0.001993  -3.829  0.00013 ***
## KID1         -1.052985   0.096186 -10.947  < 2e-16 ***
## KID2         -0.509178   0.084510  -6.025 1.79e-09 ***
## KID3         -0.010562   0.060413  -0.175  0.86121
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ( 7173 observation(s) deleted due to perfect classification )
## Average individual fixed effects= 0.0122
## ---------------------------------------------------------------

The parameters of binary outcome variables are difficult to interpret quantitatively. In econometrics, partial effects $$\frac{\partial Pr(y_{it}=1)}{\partial x_{itj}}$$ are of more interest. Neither glm() nor survival::clogit() provide a routine to compute partial effects. This package provides the function apeff.bife() to compute average partial effects based on the estimated model provided by bife(). The user simply has to specify which of the variables are discrete and which type of bias correction should be used for the computation of the avarage partial effects. The left column named apeff refers to usual uncorrected average partial effects and the right column named apeff refers to semi-corrected average partial effects following Stammann, Heiss, and McFadden (2016).

apeff.bife(mod.logit, discrete = c("KID1", "KID2", "KID3"), bias.corr = "semi")
##                     apeff apeff.corrected
## AGE           0.006374356     0.005767347
## I(INCH/1000) -0.001457574    -0.001296385
## KID1         -0.203386638    -0.183839399
## KID2         -0.096692724    -0.087283987
## KID3         -0.001933520    -0.001792393

bife() also offers the opportunity to estimate fixed effects probit models by specifiying model = "probit".

mod.probit <- bife(LFP ~ AGE + I(INCH / 1000) + KID1 + KID2 + KID3 | ID,
data = psid, bias.corr = "ana", model = "probit")
summary(mod.probit)
## ---------------------------------------------------------------
## Fixed effects probit model
## with analytical bias-correction
##
## Estimated model:
## LFP ~ AGE + I(INCH/1000) + KID1 + KID2 + KID3 | ID
##
## Log-Likelihood= -3046.317
## n= 5976, number of events= 3432
## Demeaning converged after 4 iteration(s)
## Offset converged after 3 iteration(s)
##
## Corrected structural parameter(s):
##
##               Estimate Std. error t-value  Pr(> t)
## AGE           0.019508   0.007557   2.582  0.00986 **
## I(INCH/1000) -0.004541   0.001154  -3.935 8.41e-05 ***
## KID1         -0.621357   0.054957 -11.306  < 2e-16 ***
## KID2         -0.303336   0.048954  -6.196 6.17e-10 ***
## KID3         -0.005361   0.034907  -0.154  0.87795
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ( 7173 observation(s) deleted due to perfect classification )
## Average individual fixed effects= 0.0306
## ---------------------------------------------------------------

Although the structural parameters are different compared to the logit model due to a different normalization, the average partial effects are similiar:

apeff.bife(mod.probit, discrete = c("KID1", "KID2", "KID3"), bias.corr = "semi")
##                     apeff apeff.corrected
## AGE           0.006179070     0.005637702
## I(INCH/1000) -0.001449039    -0.001312178
## KID1         -0.199812313    -0.184140972
## KID2         -0.096109623    -0.088374005
## KID3         -0.001468476    -0.001547728

### Example: ACS PUMS 2014 — Large T

The second example is based on a sample drawn from the American Community Survey (ACS PUMS 2014) were the panel structure is slightly different in comparison to the “classic” structure used in the section before. Instead of individual fixed effects we consider state fixed effects. $$N$$ can be now considered as the number of groups (states) and $$T_i$$ as the group size of group $$i$$.

In this example we observe a total of 662,775 married women in $$N = 51$$ states. Since each state is of different population size, we end up with a highly unbalanced panel were the largest state consists of $$T_{\max} = 74,752$$ and the smallest of $$T_{\min} = 855$$ married women.

The model can be described as follows: $LFP_{it} = \mathbf{1}\left[\beta_{1} AGEP_{it} + \beta_{2} (PINCP / 1000)_{it} + \beta_{3} FER_{it} + \alpha_{i} + \epsilon_{it} > 0\right] \;,$

where $$LFP_{it}$$ indicates the labor force participation of a married woman, $$AGEP_{it}$$ refers to the age, $$(PINCP / 1000)_{it}$$ is the total persons income in thousand dollars, and $$FER_{it}$$ indicates if a woman gave birth to a child within the past 12 months. In this example $$i$$ refers to one of the states and $$t$$ refers to one of the individuals observed in this state.

As before, we start with a comparison of different methods to estimate logit models with fixed effects.

bife glm bife.corr clogit
AGEP -0.0757220 -0.0757220 -0.0757159 NA
I(PINCP / 1000) 0.0676899 0.0676899 0.0676870 NA
FER -1.0004365 -1.0004365 -1.0003380 NA
Time in sec 1.6580000 18.9230000 1.9330000 NA

There are again two things to highlight. First since the bias of $$\boldsymbol{\hat{\beta}}$$ obtained from bife(..., bias.corr = "no") vanishes with large $$T$$, the bias correction is redundant. Second since the panel structure consists of very large $$T$$, survival::clogit() is not able to handle this dataset:

print(try(clogit(LFP ~ AGEP + I(PINCP / 1000) + FER + strata(ST), data = acs)))
## [1] "Error in fitter(X, Y, strats, offset, init, control, weights = weights,  : \n  (number at risk) * (number tied deaths) is too large\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in fitter(X, Y, strats, offset, init, control, weights = weights,     method = method, row.names(mf)): (number at risk) * (number tied deaths) is too large>

Next, we will show how to analyse panel data with bife(..., bias.corr = "no") following the specification above.

mod.logit <- bife(LFP ~ AGEP + I(PINCP / 1000) + FER | ST, data = acs, bias.corr = "no")
summary(mod.logit)
## ---------------------------------------------------------------
## Fixed effects logit model
## with no bias-correction
##
## Estimated model:
## LFP ~ AGEP + I(PINCP/1000) + FER | ST
##
## Log-Likelihood= -287688.2
## n= 662775, number of events= 398766
## Demeaning converged after 7 iteration(s)
##
## Uncorrected structural parameter(s):
##
##                 Estimate Std. error t-value Pr(> t)
## AGEP          -0.0757220  0.0002542 -297.88  <2e-16 ***
## I(PINCP/1000)  0.0676899  0.0002140  316.25  <2e-16 ***
## FER           -1.0004365  0.0175241  -57.09  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Average individual fixed effects= 3.0803
## ---------------------------------------------------------------
apeff.bife(mod.logit, discrete = "FER")
##                      apeff
## AGEP          -0.010776434
## I(PINCP/1000)  0.009633341
## FER           -0.139900085

Since we estimated a logit model without bias-correction, apeff() delivers only one column with uncorrected average partial effects.

## Notes

For further details on when to use the bias-corrections for structural parameters and average partial effects please consult Stammann, Heiss, and McFadden (2016) and Hahn and Newey (2004).

## References

Chamberlain, Gary. 1980. “Analysis of Covariance with Qualitative Data.” The Review of Economic Studies 47 (1): 225–38.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics and Data Analysis 71: 1054–63.

Greene, William. 2004. “The Behaviour of the Maximum Likelihood Estimator of Limited Dependent Variable Models in the Presence of Fixed Effects.” Econometrics Journal 7 (1): 98–119.

Hahn, Jinyong, and Whitney Newey. 2004. “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models.” Econometrica 72 (4): 1295–1319.

Hyslop, Dean R. 1999. “State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force Participation of Married Women.” Econometrica 67 (6): 1255–94.

Stammann, Amrei, Florian Heiss, and Daniel McFadden. 2016. “Estimating Fixed Effects Logit Models with Large Panel Data.”

1. The proposed pseudo-demeaning algorithm is in spirit of Greene (2004) and Chamberlain (1980).

2. Partial effects are also called marginal effects.

3. A more detailed analyses can be found in Stammann, Heiss, and McFadden (2016)

4. We use the base-package version 3.3.1 for all analyses with glm() and the survival-package version 2.39-5 for all analyses with survival::clogit().