In econometrics, fixed effects binary choice models are important tools for panel data analysis. The package `bife`

provides a new approach suggested by Stammann, Heiss, and McFadden (2016) to estimate logit and probit panel data models of the following form:

\[y_{it} = \mathbf{1}\left[\mathbf{x}_{it}\boldsymbol{\beta} + \alpha_{i} + \epsilon_{it} > 0\right] \;,\]

where \(i = 1, \dots, N\) and \(t = 1, \dots, T_i\) denote different indices. In many applications, \(i\) represents individuals, firms or other cross-sectional units and \(t\) represents time in a longitudinal data set. But the setup is also useful for instance if \(i\) represents ZIP code areas and \(t\) is an index of individuals. The dependent variable \(y_{it}\) is binary. We observe regressors collected in the vector \(\mathbf{x}_{it}\) but we don’t observe the error term \(\epsilon_{it}\).

We are primarily interested in estimating the parameters \(\boldsymbol{\beta}\), but the model also includes individual fixed effects \(\alpha_{i}\). We assume \(E(\epsilon_{it} | \mathbf{X}_{i}, \alpha_{i}) = 0\) but don’t make any assumptions about the marginal distribution of \(\alpha_{i}\) or its correlation with the regressors \(\mathbf{x}_{i1},\dots, \mathbf{x}_{iT_i}\).

The estimator implemented in this package is based on maximum likelihood estimation (ML) of both \(\boldsymbol{\beta}\) and \(\alpha_1, \dots, \alpha_N\). It actually is the same estimator as a logistic regression a set of individual dummy variables such as

`glm(y ~ X + factor(i), family = binomial())`

The main difference is that In contrast to `glm()`

, `bife()`

applies a pseudo-demeaning algorithm proposed by Stammann, Heiss, and McFadden (2016)^{1}. Its computational costs are lower by orders of magnitude if \(N\) is reasonably large.

It is well known that as \(N\rightarrow\infty\), the ML estimator is not consistent. This “incidental parameters problem” can be severe if \(T\) is small. To tackle this problem, we provide an analytical and a jackknife bias correction for the structural parameters \(\boldsymbol{\beta}\) and the average partial effects (Hahn and Newey 2004). Thus this package is well suited to analyse big micro-data where \(N\) and/or \(T\) are large.

This package provides methods to:

`bife()`

– estimate binary choice models with fixed effects with/-out bias correction`apeff.bife()`

– compute average partial effects^{2}with/-out bias correction

Both methods utilize the `RcppArmadillo`

-package provided by Eddelbuettel and Sanderson (2014).

An alternative to full ML estimation of all parameters is a conditional maximum likelihood estimator which conditions out \(\alpha_1, \dots, \alpha_N\) and only estimates \(\boldsymbol\beta\). It is for example available with `survival::clogit()`

and is consistent under the usual regularity conditions. The problem with this estimator is that its computational burden increases dramatically for larger \(T\) values and that partial effects cannot be consistently estimated since this would require estimates of \(\alpha_1, \dots, \alpha_N\).

To demonstrate the computational advantage of `bife()`

, we compare it with two popular methods to estimate logit models with fixed effects: `glm()`

and `survival::clogit()`

.^{4} To compare these methods, we utilize the data generating process of Greene (2004) :

\[y_{it} = \mathbf{1}\left[w_{it} + \epsilon_{it} > 0\right] \;,\] \[w_{it} = \alpha_{i} + x_{it} + d_{it} \;,\] \[\alpha_{i} = \sqrt{T} \bar{x_{i}} + a_{i} \;,\] \[x_{it} \sim \mathcal{N}[0, 1] \;,\] \[d_{it} = \mathbf{1}\left[x_{it} + h_{it} > 0\right] \;,\] \[\epsilon_{it} = \log\left[u_{it} / (1 - u_{it})\right] \;,\]

where \(x_{it}, h_{it}, a_{i} \sim \mathcal{N}[0, 1]\), \(u_{it} \sim \mathcal{U}[0, 1]\), \(i = 1, \dots, N\), and \(t = 1, \dots, T\).

To show how the computational burden changes with \(N\) and \(T\), we either fix \(N\) and vary \(T\) or vice versa. For each of the different combinations we generate 50 datasets and measure the compuational time with the following commands:

```
time.bife <- system.time(bife( y ~ x + d | id, model = "logit", bias.corr = "ana"))[3]
time.clogit <- system.time(clogit(y ~ x + d + strata(id)) )[3]
time.glm <- system.time(glm( y ~ x + d + 0 + factor(id), family = binomial()) )[3]
```

The following table reports the average computation time in seconds for each method and all combinations of \(N\) and \(T\). Note that `bife.corr`

refers to the results with analytical bias correction.

N | T | bife.corr | clogit | glm | N | T | bife.corr | clogit | glm |
---|---|---|---|---|---|---|---|---|---|

100 | 10 | 0.00880 | 0.00736 | 0.13450 | 100 | 10 | 0.00870 | 0.00746 | 0.13430 |

200 | 10 | 0.01220 | 0.01216 | 0.89906 | 100 | 20 | 0.01174 | 0.01412 | 0.24614 |

300 | 10 | 0.01590 | 0.01710 | 3.06478 | 100 | 30 | 0.01448 | 0.02400 | 0.34784 |

400 | 10 | 0.01964 | 0.02176 | 7.53678 | 100 | 40 | 0.01744 | 0.03638 | 0.40704 |

500 | 10 | 0.02310 | 0.02658 | 14.42108 | 100 | 50 | 0.02026 | 0.05176 | 0.43400 |

600 | 10 | 0.02644 | 0.03152 | 25.63560 | 100 | 60 | 0.02334 | 0.07006 | 0.45406 |

700 | 10 | 0.03046 | 0.03620 | 39.55658 | 100 | 70 | 0.02624 | 0.09234 | 0.46582 |

800 | 10 | 0.03402 | 0.04122 | 58.09904 | 100 | 80 | 0.02924 | 0.11644 | 0.51412 |

900 | 10 | 0.03762 | 0.04626 | 81.24916 | 100 | 90 | 0.03228 | 0.14450 | 0.60294 |

1000 | 10 | 0.04364 | 0.05102 | 110.23218 | 100 | 100 | 0.03512 | 0.17500 | 0.67712 |

Visualizing the results illustrates that the computational time of `bife()`

is linear in both dimensions and hence is well suited to analyse big panel data. The left figure depicts that with \(T=10\), `glm()`

becomes dramatically more computationally intensive as \(N\) increases. Besides, `glm()`

is always more time consuming than the other two methods regardless which dimension is varied. Additionally, the right figure already indicates that the computation time of `survival::clogit()`

increases more than linearly with \(T\). In applications with very large \(T\), `survival::clogit()`

can quickly become computationally infeasible in terms of convergence, computation time or technique.

This section demonstrates two examples of a typical workflow using `bife::`

with real datasets. The first example uses a balanced panel with many fixed effects (large \(N\)) and the second example a unbalanced panel with large \(T_i\).

The first example is inspired by Hyslop (1999) who analysed the labor force participation of married women in a “classic” balanced panel. The sample was obtained from the “Panel Study of Income Dynamics” and contains information about \(N=1461\) women that were observed over \(T=9\) years.

To analyse the labor force participation of married women, we specify the following model:

\[LFP_{it} = \mathbf{1}\left[\beta_{1} AGE_{it} + \beta_{2} (INCH / 1000)_{it} + \beta_{3} KID1_{it} + \beta_{4} KID2_{it} + \beta_{5} KID3_{it} + \alpha_{i} + \epsilon_{it} > 0\right] \;,\]

where \(LFP_{it}\) indicates the labor force participation of a married woman, \(AGE_{it}\) refers to the age, \((INCH / 1000)_{it}\) is the husbands income in thousand dollars, and the \(KID*_{it}\) variables refer to the number of kids in a certain age group.

We start with a comparison of different methods to estimate logit models with fixed effects similiar to the section before. The following table reports the structural parameters (\(\boldsymbol{\beta}\)) and the execution time for each method, where **bife.corr** refers to the results with analytical bias correction.

bife | glm | bife.corr | clogit | |
---|---|---|---|---|

AGE | 0.0378720 | 0.0378720 | 0.0339453 | 0.0336164 |

I(INCH / 1000) | -0.0086599 | -0.0086599 | -0.0076302 | -0.0074680 |

KID1 | -1.1742997 | -1.1742997 | -1.0529849 | -1.0301399 |

KID2 | -0.5690298 | -0.5690298 | -0.5091778 | -0.5000720 |

KID3 | -0.0115029 | -0.0115029 | -0.0105624 | -0.0102339 |

Time in sec | 0.0350000 | 340.2960000 | 0.0370000 | 0.0770000 |

There are two things to highlight in this table. First `bife(..., bias.corr = "no")`

and `glm(..., family = binomial())`

deliver the same structural parameter estimates, but the execution time of `glm(..., family = binomial())`

is about 10,000 times as long. Second the small \(T\) leads to incidental parameters bias, but the analytical bias correction (`bife(..., bias.corr = "ana")`

) is able to correct the structural parameters such that we get very competitive results compared to the unbiased alternative `survival::clogit()`

.

Next, we show how to estimate the specification above with `bife()`

.

```
mod.logit <- bife(LFP ~ AGE + I(INCH / 1000) + KID1 + KID2 + KID3 | ID, data = psid, bias.corr = "ana")
summary(mod.logit)
```

```
## ---------------------------------------------------------------
## Fixed effects logit model
## with analytical bias-correction
##
## Estimated model:
## LFP ~ AGE + I(INCH/1000) + KID1 + KID2 + KID3 | ID
##
## Log-Likelihood= -3045.505
## n= 5976, number of events= 3432
## Demeaning converged after 5 iteration(s)
## Offset converged after 3 iteration(s)
##
## Corrected structural parameter(s):
##
## Estimate Std. error t-value Pr(> t)
## AGE 0.033945 0.012990 2.613 0.00899 **
## I(INCH/1000) -0.007630 0.001993 -3.829 0.00013 ***
## KID1 -1.052985 0.096186 -10.947 < 2e-16 ***
## KID2 -0.509178 0.084510 -6.025 1.79e-09 ***
## KID3 -0.010562 0.060413 -0.175 0.86121
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ( 7173 observation(s) deleted due to perfect classification )
## Average individual fixed effects= 0.0122
## ---------------------------------------------------------------
```

The parameters of binary outcome variables are difficult to interpret quantitatively. In econometrics, partial effects \(\frac{\partial Pr(y_{it}=1)}{\partial x_{itj}}\) are of more interest. Neither `glm()`

nor `survival::clogit()`

provide a routine to compute partial effects. This package provides the function `apeff.bife()`

to compute average partial effects based on the estimated model provided by `bife()`

. The user simply has to specify which of the variables are discrete and which type of bias correction should be used for the computation of the avarage partial effects. The left column named `apeff`

refers to usual uncorrected average partial effects and the right column named `apeff`

refers to semi-corrected average partial effects following Stammann, Heiss, and McFadden (2016).

`apeff.bife(mod.logit, discrete = c("KID1", "KID2", "KID3"), bias.corr = "semi")`

```
## apeff apeff.corrected
## AGE 0.006374356 0.005767347
## I(INCH/1000) -0.001457574 -0.001296385
## KID1 -0.203386638 -0.183839399
## KID2 -0.096692724 -0.087283987
## KID3 -0.001933520 -0.001792393
```

`bife()`

also offers the opportunity to estimate fixed effects probit models by specifiying `model = "probit"`

.

```
mod.probit <- bife(LFP ~ AGE + I(INCH / 1000) + KID1 + KID2 + KID3 | ID,
data = psid, bias.corr = "ana", model = "probit")
summary(mod.probit)
```

```
## ---------------------------------------------------------------
## Fixed effects probit model
## with analytical bias-correction
##
## Estimated model:
## LFP ~ AGE + I(INCH/1000) + KID1 + KID2 + KID3 | ID
##
## Log-Likelihood= -3046.317
## n= 5976, number of events= 3432
## Demeaning converged after 4 iteration(s)
## Offset converged after 3 iteration(s)
##
## Corrected structural parameter(s):
##
## Estimate Std. error t-value Pr(> t)
## AGE 0.019508 0.007557 2.582 0.00986 **
## I(INCH/1000) -0.004541 0.001154 -3.935 8.41e-05 ***
## KID1 -0.621357 0.054957 -11.306 < 2e-16 ***
## KID2 -0.303336 0.048954 -6.196 6.17e-10 ***
## KID3 -0.005361 0.034907 -0.154 0.87795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ( 7173 observation(s) deleted due to perfect classification )
## Average individual fixed effects= 0.0306
## ---------------------------------------------------------------
```

Although the structural parameters are different compared to the logit model due to a different normalization, the average partial effects are similiar:

`apeff.bife(mod.probit, discrete = c("KID1", "KID2", "KID3"), bias.corr = "semi")`

```
## apeff apeff.corrected
## AGE 0.006179070 0.005637702
## I(INCH/1000) -0.001449039 -0.001312178
## KID1 -0.199812313 -0.184140972
## KID2 -0.096109623 -0.088374005
## KID3 -0.001468476 -0.001547728
```

The second example is based on a sample drawn from the American Community Survey (ACS PUMS 2014) were the panel structure is slightly different in comparison to the “classic” structure used in the section before. Instead of individual fixed effects we consider state fixed effects. \(N\) can be now considered as the number of groups (states) and \(T_i\) as the group size of group \(i\).

In this example we observe a total of 662,775 married women in \(N = 51\) states. Since each state is of different population size, we end up with a highly unbalanced panel were the largest state consists of \(T_{\max} = 74,752\) and the smallest of \(T_{\min} = 855\) married women.

The model can be described as follows: \[LFP_{it} = \mathbf{1}\left[\beta_{1} AGEP_{it} + \beta_{2} (PINCP / 1000)_{it} + \beta_{3} FER_{it} + \alpha_{i} + \epsilon_{it} > 0\right] \;,\]

where \(LFP_{it}\) indicates the labor force participation of a married woman, \(AGEP_{it}\) refers to the age, \((PINCP / 1000)_{it}\) is the total persons income in thousand dollars, and \(FER_{it}\) indicates if a woman gave birth to a child within the past 12 months. In this example \(i\) refers to one of the states and \(t\) refers to one of the individuals observed in this state.

As before, we start with a comparison of different methods to estimate logit models with fixed effects.

bife | glm | bife.corr | clogit | |
---|---|---|---|---|

AGEP | -0.0757220 | -0.0757220 | -0.0757159 | NA |

I(PINCP / 1000) | 0.0676899 | 0.0676899 | 0.0676870 | NA |

FER | -1.0004365 | -1.0004365 | -1.0003380 | NA |

Time in sec | 1.6580000 | 18.9230000 | 1.9330000 | NA |

There are again two things to highlight. First since the bias of \(\boldsymbol{\hat{\beta}}\) obtained from `bife(..., bias.corr = "no")`

vanishes with large \(T\), the bias correction is redundant. Second since the panel structure consists of very large \(T\), `survival::clogit()`

is not able to handle this dataset:

`print(try(clogit(LFP ~ AGEP + I(PINCP / 1000) + FER + strata(ST), data = acs)))`

```
## [1] "Error in fitter(X, Y, strats, offset, init, control, weights = weights, : \n (number at risk) * (number tied deaths) is too large\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in fitter(X, Y, strats, offset, init, control, weights = weights, method = method, row.names(mf)): (number at risk) * (number tied deaths) is too large>
```

Next, we will show how to analyse panel data with `bife(..., bias.corr = "no")`

following the specification above.

```
mod.logit <- bife(LFP ~ AGEP + I(PINCP / 1000) + FER | ST, data = acs, bias.corr = "no")
summary(mod.logit)
```

```
## ---------------------------------------------------------------
## Fixed effects logit model
## with no bias-correction
##
## Estimated model:
## LFP ~ AGEP + I(PINCP/1000) + FER | ST
##
## Log-Likelihood= -287688.2
## n= 662775, number of events= 398766
## Demeaning converged after 7 iteration(s)
##
## Uncorrected structural parameter(s):
##
## Estimate Std. error t-value Pr(> t)
## AGEP -0.0757220 0.0002542 -297.88 <2e-16 ***
## I(PINCP/1000) 0.0676899 0.0002140 316.25 <2e-16 ***
## FER -1.0004365 0.0175241 -57.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Average individual fixed effects= 3.0803
## ---------------------------------------------------------------
```

`apeff.bife(mod.logit, discrete = "FER")`

```
## apeff
## AGEP -0.010776434
## I(PINCP/1000) 0.009633341
## FER -0.139900085
```

Since we estimated a logit model without bias-correction, `apeff()`

delivers only one column with uncorrected average partial effects.

For further details on when to use the bias-corrections for structural parameters and average partial effects please consult Stammann, Heiss, and McFadden (2016) and Hahn and Newey (2004).

Chamberlain, Gary. 1980. “Analysis of Covariance with Qualitative Data.” *The Review of Economic Studies* 47 (1): 225–38.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” *Computational Statistics and Data Analysis* 71: 1054–63.

Greene, William. 2004. “The Behaviour of the Maximum Likelihood Estimator of Limited Dependent Variable Models in the Presence of Fixed Effects.” *Econometrics Journal* 7 (1): 98–119.

Hahn, Jinyong, and Whitney Newey. 2004. “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models.” *Econometrica* 72 (4): 1295–1319.

Hyslop, Dean R. 1999. “State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force Participation of Married Women.” *Econometrica* 67 (6): 1255–94.

Stammann, Amrei, Florian Heiss, and Daniel McFadden. 2016. “Estimating Fixed Effects Logit Models with Large Panel Data.”

The proposed pseudo-demeaning algorithm is in spirit of Greene (2004) and Chamberlain (1980).↩

Partial effects are also called marginal effects.↩

A more detailed analyses can be found in Stammann, Heiss, and McFadden (2016)↩

We use the

`base`

-package version 3.3.1 for all analyses with`glm()`

and the survival-package version 2.39-5 for all analyses with`survival::clogit()`

.↩