Introduction to convey

Djalma Pessoa

2016-06-23

Library convey

The library convey estimates measures of poverty and income concentration. At least two other libraries already cover this subject: vardpoor and laeken. The main difference is that convey is built on top of the survey library.

Some measures of poverty and income concentration are defined by non-differentiable functions, so Taylor linearization cannot be used to estimate their variances. An alternative is to use influence functions, as described in Deville (1999) and Osier (2009). The library convey implements this methodology to work with survey.design objects and also with svyrep.design objects.

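One example of these measures is the quintile share ratio, the ratio of the total income received by the richest 20% of the population to the total income received by the poorest 20%:

\[ qsr=\frac{\sum_U 1(y_i>q_{.80})\,y_i}{\sum_U 1(y_i\leq q_{.20})\,y_i} \]

where \(q_{.20}\) and \(q_{.80}\) are the 20% and 80% income quantiles.

Taylor linearization cannot be applied to measures of this kind because they depend on quantiles; the Gini coefficient, likewise, is defined as a function of ranks. Their variances can instead be estimated with the approach proposed by Deville (1999), based upon influence functions.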

Influence function

Let \(U\) be a population of size \(N\) and \(M\) be a measure that allocates mass one to each set consisting of a single unit, that is, \(M(i)=M_i= 1\) if \(i\in U\) and \(M(i)=0\) if \(i\notin U\).

Now, a population parameter \(\theta\) can be expressed as a functional of \(M\) \[ \theta=T(M) \]

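Examples of such parameters are:

  • Total: \[ Y=\sum_U y_i=\int y\,dM \]

  • Ratio of two totals: \[ R=\frac{Y}{X}=\frac{\int y\,dM}{\int x\,dM} \]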

To estimate these parameters from the sample, we replace the measure \(M\) by the estimated measure \(\hat{M}\) defined by: \(\hat{M}(i)=\hat{M}_i= w_i\) if \(i\in s\) and \(\hat{M}(i)=0\) if \(i\notin s\).

The estimators of the population parameters can then be expressed as functionals of the measure \(\hat{M}\).
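As a worked illustration, assume the population total is written as \(T(M)=\int y\,dM=\sum_U y_i\), the form used in the influence function examples of this document. Replacing \(M\) by \(\hat{M}\) then gives the usual weighted estimators of a total and of a ratio of two totals:

\[ \hat{Y}=T(\hat{M})=\int y\,d\hat{M}=\sum_s w_i y_i, \qquad \hat{R}=\frac{\int y\,d\hat{M}}{\int x\,d\hat{M}}=\frac{\sum_s w_i y_i}{\sum_s w_i x_i} \]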

The variance estimator

The variance of the estimator \(T(\hat{M})\) can be approximated by:

\[ Var\left[T(\hat{M})\right]\cong var\left[\sum_s w_i z_i\right] \]

The linearized variable \(z\) is given by the derivative of the functional:

\[ z_k=\lim_{t\rightarrow 0}\frac{T(M+t\delta_k)-T(M)}{t}=IT_k(M) \] where \(\delta_k\) is the Dirac measure at \(k\): \(\delta_k(i)=1\) if and only if \(i=k\).

This derivative is called the influence function and was introduced in the field of robust statistics.

Influence functions - Examples

  • Total: \[ \begin{align} IT_k(M)&=\lim_{t\rightarrow 0}\frac{T(M+t\delta_k)-T(M)}{t}\\ &=\lim_{t\rightarrow 0}\frac{\int y\,d(M+t\delta_k)-\int y\,dM}{t}\\ &=\lim_{t\rightarrow 0}\frac{\int y\,d(t\delta_k)}{t}=y_k \end{align} \]

  • Ratio of two totals: \[ \begin{align} IR_k(M)&=I\left(\frac{U}{V}\right)_k(M)=\frac{V(M)\times IU_k(M)-U(M)\times IV_k(M)}{V(M)^2}\\ &=\frac{X y_k-Y x_k}{X^2}=\frac{1}{X}(y_k-Rx_k) \end{align} \]
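The ratio result can be checked numerically. The following is a minimal base R sketch, not code from convey: the toy values of `y` and `x` and the helper names `ratio_T` and `numeric_influence` are made up for illustration. It approximates the limit in the definition by a finite difference with a small `t` and compares it with the closed form \(\frac{1}{X}(y_k-Rx_k)\).

```r
# Toy population (values are made up for illustration only)
y <- c(12, 7, 30, 18, 25)
x <- c(4, 3, 10, 5, 8)

# The measure M puts mass 1 on each population unit
M <- rep(1, length(y))

# Ratio of two totals written as a functional of the measure
ratio_T <- function(m) sum(m * y) / sum(m * x)

# Finite-difference approximation of the derivative in the direction delta_k
numeric_influence <- function(k, t = 1e-6) {
  delta_k <- as.numeric(seq_along(M) == k)
  (ratio_T(M + t * delta_k) - ratio_T(M)) / t
}

R <- ratio_T(M)
X <- sum(x)

analytic  <- (y - R * x) / X                          # closed form
approx_if <- sapply(seq_along(y), numeric_influence)  # numerical limit

# Largest discrepancy between the two versions
max(abs(analytic - approx_if))
```

The two versions should agree up to the finite-difference error, and the analytic values sum to zero over the population because \(\sum_U (y_k-Rx_k)=Y-RX=0\).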

Linearization by influence function - Examples

  • At-risk-of-poverty threshold: \[ arpt = 0.6\times m \] where \(m\) is the median income.

\[ z_k= -\frac{0.6}{f(m)}\times\frac{1}{N}\times\left[I(y_k\leq m)-0.5 \right] \]

  • At-risk-of-poverty rate:

\[ arpr=\frac{\sum_U I(y_i \leq t)}{N} \] \[ z_k=\frac{1}{N}\left[I(y_k\leq t)-arpr\right]-\frac{0.6}{N}\times\frac{f(t)}{f(m)}\left[I(y_k\leq m)-0.5\right] \]

where:

\(N\) - population size;

\(t\) - at-risk-of-poverty threshold;

\(y_k\) - income of person \(k\);

\(m\) - median income;

\(f\) - income density function;
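To make the linearized variable of the at-risk-of-poverty threshold concrete, here is a rough base R sketch on simulated incomes. It is an illustration under stated assumptions, not the estimator used inside convey: the population is simulated, every unit has weight one, and \(f(m)\) is approximated with the default Gaussian kernel estimate from `density()`.

```r
# Simulated incomes for a population of N = 200 (illustrative only)
set.seed(1)
y <- rlnorm(200, meanlog = 10, sdlog = 0.5)
N <- length(y)

m    <- median(y)   # median income
arpt <- 0.6 * m     # at-risk-of-poverty threshold

# Assumption: f(m) is approximated by a kernel density estimate,
# interpolated at the median
dens <- density(y)
f_m  <- approx(dens$x, dens$y, xout = m)$y

# Linearized variable of arpt for every unit k
z <- -(0.6 / f_m) * (1 / N) * ((y <= m) - 0.5)

# z takes only two values: negative at or below the median, positive above
table(sign(z))
```

Because the indicator \(I(y_k\leq m)\) is the only part that varies with \(k\), the linearized variable takes just two values, one for units at or below the median and one for units above it.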

Structure of the library

In the library convey, there are some basic functions that produce the linearized variables of estimates that often enter the definition of measures of concentration and poverty. One example is the quantile, which is linearized by the function svyiqalpha. Another example is the function svyisq, which linearizes the total below a quantile of the variable.

From the linearized variables of these basic estimates it is possible, by using rules of composition valid for influence functions, to derive the influence function of more complex estimates. By definition the influence function is a Gateaux derivative, and the rules of composition valid for Gateaux derivatives also hold for influence functions.

The following property of Gateaux derivatives is used often in the library convey. Let \(g\) be a differentiable function of \(m\) variables. Suppose we want to compute the influence function of the estimator \(g(T_1, T_2,\ldots, T_m)\), knowing the influence functions of the estimators \(T_i, i=1,\ldots, m\). Then the following holds:

\[ I(g(T_1, T_2,\ldots, T_m)) = \sum_{i=1}^m \frac{\partial g}{\partial T_i}I(T_i) \]

In the library convey this rule is implemented by the function contrastinf which uses the R function deriv to compute the formal partial derivatives \(\frac{\partial g}{\partial T_i}\).
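The composition rule can be sketched with the base R function deriv on its own. The code below is an illustration of the rule, not the implementation of contrastinf: it takes \(g(T_1,T_2)=T_1/T_2\) with two totals whose influence functions are \(y_k\) and \(x_k\), and uses made-up toy data.

```r
# Toy population (values are made up for illustration only)
y <- c(12, 7, 30, 18, 25)
x <- c(4, 3, 10, 5, 8)
Y <- sum(y)   # total T1
X <- sum(x)   # total T2

# Symbolic partial derivatives of g(T1, T2) = T1 / T2
g <- deriv(~ T1 / T2, c("T1", "T2"))

# Evaluate g and its gradient at the population totals
val  <- eval(g, list(T1 = Y, T2 = X))
grad <- attr(val, "gradient")   # 1 x 2 matrix with columns "T1" and "T2"

# Composition rule: I(g) = dg/dT1 * I(T1) + dg/dT2 * I(T2),
# where the influence function of a total is the variable itself
z <- grad[1, "T1"] * y + grad[1, "T2"] * x

# Same result as the closed form for a ratio of two totals
max(abs(z - (y - (Y / X) * x) / X))
```

The last line compares the result with the closed form \(\frac{1}{X}(y_k-Rx_k)\); the difference should be zero up to rounding.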

For example, suppose we want to linearize the relative median poverty gap (rmpg), defined as the difference between the at-risk-of-poverty threshold (arpt) and the median of incomes less than the arpt, relative to the arpt:

\[ rmpg= \frac{arpt-medpoor} {arpt} \]

where medpoor is the median of incomes less than arpt.

Suppose we know how to linearize arpt and medpoor, then by applying the function contrastinf with \[ g(T_1,T_2)= \frac{(T_1 - T_2)}{T_1} \] we linearize the rmpg.
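Spelling out the composition rule for this choice of \(g\), with \(T_1=arpt\) and \(T_2=medpoor\), the partial derivatives are

\[ \frac{\partial g}{\partial T_1}=\frac{T_2}{T_1^2}, \qquad \frac{\partial g}{\partial T_2}=-\frac{1}{T_1} \]

so that, writing \(I(arpt)_k\) and \(I(medpoor)_k\) for the influence functions assumed to be known,

\[ I(rmpg)_k=\frac{medpoor}{arpt^2}\,I(arpt)_k-\frac{1}{arpt}\,I(medpoor)_k \]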

Examples of use of the library convey

In the following examples we will use the data set eusilc contained in the libraries vardpoor and laeken.

library(vardpoor)
data(eusilc)

Next, we create an object of class survey.design using the function svydesign of the library survey:

library(survey)
des_eusilc <- svydesign(ids = ~rb030, strata =~db040,  weights = ~rb050, data = eusilc)

Immediately after creating the design object des_eusilc, we run the function convey_prep, which adds an attribute to the survey design storing information based upon the whole sample. This information is needed to work with subset designs.

library(convey)
des_eusilc <- convey_prep( des_eusilc )
## preparing your full survey design to work with R convey package functions
## 
## note that this function must be run on the full survey design object immediately after the svydesign() or svrepdesign() call.
## 

To estimate the at-risk-of-poverty rate we use the function svyarpr:

svyarpr(~eqIncome, design=des_eusilc)
            arpr     SE
eqIncome 0.14444 0.0028

To estimate the at-risk-of-poverty rate for domains defined by the variable db040 we use

svyby(~eqIncome, by = ~db040, design = des_eusilc, FUN = svyarpr, deff = FALSE)
                      db040  eqIncome se.eqIncome
Burgenland       Burgenland 0.1953984 0.017202243
Carinthia         Carinthia 0.1308627 0.010610622
Lower Austria Lower Austria 0.1384362 0.006517660
Salzburg           Salzburg 0.1378734 0.011579280
Styria               Styria 0.1437464 0.007452360
Tyrol                 Tyrol 0.1530819 0.009880430
Upper Austria Upper Austria 0.1088977 0.005928336
Vienna               Vienna 0.1723468 0.007682826
Vorarlberg       Vorarlberg 0.1653731 0.013754670

Using the same data set, we estimate the quintile share ratio:

# for the whole population
svyqsr(~eqIncome, design=des_eusilc, alpha= .20)
          qsr     SE
eqIncome 3.97 0.0426
# for domains
svyby(~eqIncome, by = ~db040, design = des_eusilc,
  FUN = svyqsr, alpha= .20, deff = FALSE)
                      db040 eqIncome se.eqIncome
Burgenland       Burgenland 5.008486  0.32755685
Carinthia         Carinthia 3.562404  0.10909726
Lower Austria Lower Austria 3.824539  0.08783599
Salzburg           Salzburg 3.768393  0.17015086
Styria               Styria 3.464305  0.09364800
Tyrol                 Tyrol 3.586046  0.13629739
Upper Austria Upper Austria 3.668289  0.09310624
Vienna               Vienna 4.654743  0.13135731
Vorarlberg       Vorarlberg 4.366511  0.20532075

These functions can be used as S3 methods for the classes survey.design and svyrep.design.

Let’s create a design object of class svyrep.design and run the function convey_prep on it:

des_eusilc_rep <- as.svrepdesign(des_eusilc, type = "bootstrap")
des_eusilc_rep <- convey_prep(des_eusilc_rep) 
## preparing your full survey design to work with R convey package functions
## 
## note that this function must be run on the full survey design object immediately after the svydesign() or svrepdesign() call.
## 

and then use the function svyarpr:

svyarpr(~eqIncome, design=des_eusilc_rep)
            arpr     SE
eqIncome 0.14444 0.0026
svyby(~eqIncome, by = ~db040, design = des_eusilc_rep, FUN = svyarpr, deff = FALSE)
                      db040  eqIncome se.eqIncome
Burgenland       Burgenland 0.1953984 0.015948955
Carinthia         Carinthia 0.1308627 0.009369766
Lower Austria Lower Austria 0.1384362 0.006378286
Salzburg           Salzburg 0.1378734 0.012678287
Styria               Styria 0.1437464 0.007245318
Tyrol                 Tyrol 0.1530819 0.010223210
Upper Austria Upper Austria 0.1088977 0.005749901
Vienna               Vienna 0.1723468 0.008765321
Vorarlberg       Vorarlberg 0.1653731 0.014346126

The functions of the library convey are called in a similar way to the functions in library survey.

It is also possible to deal with missing values by using the argument na.rm.

# survey.design using a variable with missings
svygini( ~ py010n , design = des_eusilc )
       gini SE
py010n   NA NA
svygini( ~ py010n , design = des_eusilc , na.rm = TRUE )
          gini     SE
py010n 0.64606 0.0036
# svyrep.design using a variable with missings
# svygini( ~ py010n , design = des_eusilc_rep ) raises an error
svygini( ~ py010n , design = des_eusilc_rep , na.rm = TRUE )
          gini     SE
py010n 0.64606 0.0041

FGT indicator

Foster et al. (1984) proposed a family of indicators to measure poverty.

The class of \(FGT\) measures can be defined as

\[ p=\frac{1}{N}\sum_{k\in U}h(y_{k},\theta ), \]

where

\[ h(y_{k},\theta )=\left[ \frac{(\theta -y_{k})}{\theta }\right] ^{\gamma }\delta \left\{ y_{k}\leq \theta \right\} , \]

where: \(\theta\) is the poverty threshold; \(\delta\) the indicator function that assigns value 1 if the condition \(\{y_{k}\leq \theta \}\) is satisfied and 0 otherwise, and \(\gamma\) is a non-negative constant.

When \(\gamma =0\), \(p\) is the proportion of poor people (the headcount ratio); for \(\gamma \geq 1\), the weight given to the poorest people increases with the value of \(\gamma\) (Foster et al., 1984).
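The definition can be turned into a few lines of base R. This is a minimal sketch of the formula only, assuming a hypothetical helper named `fgt` and made-up incomes and weights; it produces point estimates without standard errors and is not the code of the convey function.

```r
# FGT measure for incomes y, weights w, threshold theta and exponent gamma
# (hypothetical helper written for illustration)
fgt <- function(y, w, theta, gamma) {
  poor <- y <= theta                  # indicator delta{y_k <= theta}
  gap  <- pmax(theta - y, 0) / theta  # relative gap, set to 0 for the non-poor
  sum(w * gap^gamma * poor) / sum(w)
}

# Made-up incomes with equal weights
y <- c(4000, 8000, 12000, 20000)
w <- rep(1, length(y))

fgt(y, w, theta = 10000, gamma = 0)  # headcount ratio: 2 of 4 units are poor
fgt(y, w, theta = 10000, gamma = 1)  # poverty gap: (0.6 + 0.2) / 4
fgt(y, w, theta = 10000, gamma = 2)  # squared gap: (0.36 + 0.04) / 4
```

Truncating the gap at zero with `pmax` avoids raising a negative number to a non-integer power for units above the threshold; those units are then removed by the indicator.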

The poverty measure FGT is implemented in the library convey by the function svyfgt. The argument type_thresh of this function defines the type of poverty threshold adopted. There are three possible choices:

  1. abs – fixed and given by the argument abs_thresh
  2. relq – a proportion of a quantile; the proportion is fixed by the argument percent and the quantile by the argument order
  3. relm – a proportion of the mean, fixed by the argument percent

The quantile and the mean involved in the definition of the threshold are estimated for the whole population. When \(\gamma=0\) and \(\theta= .6*MED\) the measure is equal to the indicator arpr computed by the function svyarpr.
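To see why, set \(\gamma=0\) in the definition of \(h\), with the convention that the bracketed term raised to the power zero equals one. Then \(h(y_k,\theta)=\delta\{y_k\leq\theta\}\), and with \(\theta=0.6\times MED\):

\[ p=\frac{1}{N}\sum_{k\in U}\delta\left\{ y_{k}\leq 0.6\times MED\right\} \]

which is the proportion of the population with income at or below 60% of the median, that is, the at-risk-of-poverty rate.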

Next, we give some examples of the function svyfgt to estimate the values of the FGT poverty index.

Consider first a poverty threshold fixed at the value \(10000\). The headcount ratio (FGT0, \(\gamma=0\)) is

svyfgt(~eqIncome, des_eusilc, g=0, abs_thresh=10000)
            fgt0     SE
eqIncome 0.11444 0.0027

The poverty gap index (FGT1, \(\gamma=1\)) for the poverty threshold fixed at the same value is

svyfgt(~eqIncome, des_eusilc, g=1, abs_thresh=10000)
             fgt1     SE
eqIncome 0.032085 0.0011

To estimate the FGT0 with the poverty threshold fixed at \(0.6* MED\) we set the argument type_thresh="relq" and use the default values for percent and order:

svyfgt(~eqIncome, des_eusilc, g=0, type_thresh= "relq")
            fgt0     SE
eqIncome 0.14444 0.0028

that matches the estimate obtained by

svyarpr(~eqIncome, design=des_eusilc, .5, .6)
            arpr     SE
eqIncome 0.14444 0.0028

To estimate the poverty gap (FGT1) with the poverty threshold equal to \(0.6*MEAN\) we use:

svyfgt(~eqIncome, des_eusilc, g=1, type_thresh= "relm")
             fgt1     SE
eqIncome 0.051187 0.0012

References

Berger, Y.G. and C.J. Skinner (to be published) - Variance Estimation for a Low-Income Proportion.

Foster, J., J. Greer and E. Thorbecke (1984) - A Class of Decomposable Poverty Measures. Econometrica, 52, 761-766.

Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Vol. 3, No. 3, pp. 167-195, ISSN 1864-3361.

Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203.