Time-Varying Ideal Points

Robert Kubinec

2018-10-30

Note: To report bugs with the package, please file an issue on the GitHub page.

If you use this package, please cite the following:

Kubinec, Robert. “Generalized Ideal Point Models for Time-Varying and Missing-Data Inference”. Working Paper.

This package implements two kinds of time-varying ideal point models. Because the time-varying component is independent of the specific outcome used, time-varying ideal point models can be fit with any outcome/response supported by the package, including binary, ordinal, count, continuous and positive-continuous data, as well as the latent space model for binary data. This vignette demonstrates both time-varying models, and how to decide between them, using example data from the 114th Senate.

library(idealstan)
library(dplyr)
data('senate114')
knitr::kable(head(select(senate114, 1:7)))
| bioname | born | party_code | rollnumber | date | cast_code | congress |
|:---|---:|:---|---:|:---|:---|---:|
| SESSIONS, Jefferson Beauregard III (Jeff) | 1946 | R | 4 | 2015-01-20 | Yes | 114 |
| SESSIONS, Jefferson Beauregard III (Jeff) | 1946 | R | 5 | 2015-01-20 | Yes | 114 |
| SESSIONS, Jefferson Beauregard III (Jeff) | 1946 | R | 7 | 2015-01-21 | Yes | 114 |
| SESSIONS, Jefferson Beauregard III (Jeff) | 1946 | R | 8 | 2015-01-21 | No | 114 |
| SESSIONS, Jefferson Beauregard III (Jeff) | 1946 | R | 9 | 2015-01-21 | Yes | 114 |
| SESSIONS, Jefferson Beauregard III (Jeff) | 1946 | R | 11 | 2015-01-21 | No | 114 |

Creating a time-varying ideal point model is no different from creating a static model, except that the data must contain a column of dates, preferably in R's date or date-time format. If you need to convert a character vector of dates to R's date format, check out the excellent lubridate package. I will demonstrate some of lubridate's functionality by re-coding the senate114 dates, which are currently recorded by day (year-month-day). Left as is, these dates would lead us to estimate one ideal point per day in the sample, which would be a lot of ideal points, as the following bar chart of votes by date shows:

library(ggplot2)
senate114 %>% 
  distinct(rollnumber,date) %>% 
  ggplot(aes(x=date)) +
  geom_bar() +
  theme_minimal() + 
  ylab('Count of Rollcall Votes') +
  xlab('') +
  ggtitle('Count of Votes by Day in the 114th Senate')

We see that on many individual days there are only a few votes at most. In addition, there are 98 distinct dates in the sample, each of which would become its own time point. We could certainly fit a model to all of these time points, even with little data per day, but for this example I will show how to roll the dates up to the month level.

Using lubridate, we simply set the day of each date to 1, so that every vote is assigned to the first day of its month:

library(lubridate)
day(senate114$date) <- 1
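Setting the day to 1 is the same as flooring each date to the start of its month. The sketch below shows the idea in base R on a small made-up vector of dates (not the actual senate114 dates), so it runs without the package data; lubridate's `floor_date(x, unit = "month")` should give the same result.

```r
# Toy vector of vote dates (illustrative only, not taken from senate114)
vote_dates <- as.Date(c("2015-01-20", "2015-01-21", "2015-01-21",
                        "2015-02-03", "2015-02-26", "2015-03-10"))

# Floor each date to the first day of its month by rewriting the day field
month_dates <- as.Date(format(vote_dates, "%Y-%m-01"))

# Number of distinct time points before and after aggregation
n_days   <- length(unique(vote_dates))
n_months <- length(unique(month_dates))
print(c(days = n_days, months = n_months))
```

Five distinct days collapse into three monthly time points; on the real data, the same operation is what reduces the number of time points.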

We can then plot the aggregated votes:

senate114 %>% 
  distinct(rollnumber,date) %>% 
  ggplot(aes(x=date)) +
  geom_bar() +
  theme_minimal() + 
  ylab('Count of Rollcall Votes') +
  xlab('') +
  ggtitle('Count of Votes by Month in the 114th Senate')

We have now reduced the number of time points to 21. Again, we could certainly model every day in the data, but we reduce the number here for the purposes of illustration. Aggregating dates is most useful when the data are far more granular than the process being modeled, such as dates recorded down to the hour or second.

There are two time-varying models included in the idealstan package, each of which makes different assumptions about how ideal points change over time. Neither model is inherently superior to the other. Ideal points have no natural time process, as they are a latent, unobserved construct, so the question is which time process best matches the social or physical process being studied.

The first kind of model included in idealstan is a random-walk process (also known as a non-stationary or I(1) time series). This simple model of time implies that an ideal point's position in the current time point equals its position in the prior time point plus some random noise. A helpful analogy is a frog hopping around a room: each hop starts wherever the last one landed, and it could head in virtually any direction.
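A minimal simulation of a single random-walk ideal point makes the definition concrete. The noise SD and the number of time points are illustrative values, not idealstan defaults, and this is only the textbook random walk, not the package's full model.

```r
set.seed(1)
n_time <- 21   # e.g., one time point per month, as in the aggregated Senate data
sigma  <- 0.1  # SD of the period-to-period noise (illustrative value)

ideal_rw <- numeric(n_time)  # the ideal point starts at 0
for (i in 2:n_time) {
  # Random walk: current position = previous position + random noise
  ideal_rw[i] <- ideal_rw[i - 1] + rnorm(1, mean = 0, sd = sigma)
}

# Equivalently, the position is the running sum of all past shocks
shocks <- diff(ideal_rw)
all.equal(ideal_rw[-1], cumsum(shocks))   # TRUE
```

Nothing pulls the series back toward its starting value, which is why a random walk can drift arbitrarily far over long periods.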

The advantage of the random-walk model is that it allows ideal points to move in any direction. The downside is that it can assume too much change in the ideal point process. It also provides little information about the time series beyond the variance parameter, which indicates the average rate of change over time (i.e., how bouncy the time series is). Furthermore, random-walk models behave quite differently when covariates are included, because a covariate with a constant effect over time will keep pushing the time series in a single direction.
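To see why a constant covariate effect pushes a random walk in one direction, suppose (as an illustrative assumption, not idealstan's exact parameterization) that the covariate adds a fixed amount `beta` to every step. Because each step builds on the last, the effect accumulates: after T steps the expected shift is T times `beta`.

```r
set.seed(2)
n_time <- 21
beta   <- 0.05   # hypothetical constant covariate effect per time point
sigma  <- 0.1    # SD of the random noise in each step

n_sims <- 2000
final_pos <- replicate(n_sims, {
  steps <- beta + rnorm(n_time - 1, mean = 0, sd = sigma)
  sum(steps)     # position at the last time point, starting from 0
})

mean(final_pos)  # close to (n_time - 1) * beta = 1
```

Over 20 transitions, a per-step effect of 0.05 moves the average ideal point by about 1 unit, which is large relative to the noise in any single step.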

Despite these limitations, this model is still useful, especially in two situations. First, when little is known about the time process or social situation, this model makes the most minimal assumptions about how ideal points change. Second, when the time series covers a relatively long period, it is likely to have some kind of random-walk nature, especially if there is no natural limit on movement. For example, in legislative voting data, ideal points may follow a random-walk pattern over a legislator's entire career spanning decades.

The second model included in idealstan is a stationary time series model (also called an AR(1) or first-order autoregressive time series). A stationary time-series is so called because it must return over time to a long-term average or mean. Change over time is conceived of as shocks that push the time series away from its long-term average. The AR(1) model includes additional parameters that measure how fast a time-series will return to its long-term average. A good empirical example for this model is economic growth over time. There are periods when “growth shocks” occur, such as recessions and boom times. Overall, though, economic growth for a specific country will tend towards some long-term average rate of growth. Economic growth can’t simply move off in any direction, especially in an upward direction, as that would over-heat the economy and result in massive inflation.
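The sketch below simulates a standard AR(1) process, x[t] = mu + rho * (x[t-1] - mu) + noise, started far from its long-term average to mimic a large shock. The values of `mu`, `rho` and `sigma` are illustrative assumptions, and this is the textbook form rather than idealstan's internal parameterization.

```r
set.seed(3)
n_time <- 200
mu     <- 0.5   # long-term average (illustrative)
rho    <- 0.7   # autoregressive coefficient; |rho| < 1 for stationarity
sigma  <- 0.1   # SD of the shocks

x <- numeric(n_time)
x[1] <- mu + 2  # start far from the long-term average (a large shock)
for (i in 2:n_time) {
  # Each period, only a fraction rho of the deviation from mu carries over
  x[i] <- mu + rho * (x[i - 1] - mu) + rnorm(1, mean = 0, sd = sigma)
}

# Once the initial shock has died out, the series fluctuates around mu
# with long-run SD sigma / sqrt(1 - rho^2)
long_run_sd <- sigma / sqrt(1 - rho^2)
mean(x[101:200])
```

Unlike the random walk, the deviation from `mu` shrinks geometrically, so the series keeps returning to its long-term average.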

Returning to the analysis of legislatures, a stationary model might be more appropriate for shorter time spans, such as the two years covered by our Senate data. Over this period, ideal points might experience shocks, such as scandals, but legislators are unlikely to change their fundamental policy positions in so short a time.

In addition, stationary models allow us to fit covariates that have a more meaningful interpretation: the estimates of covariates represent shocks to the ideal points away from their long-term average. We can even measure the time it takes for an ideal point process to return to its long-term average after a shock, such as one captured by a covariate for 9/11.
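In a standard AR(1) process with autoregressive coefficient rho (|rho| < 1), a one-time shock decays geometrically: a fraction rho^k of it remains k periods later. Solving rho^k = 0.5 gives the shock's half-life. This is a generic AR(1) calculation under that assumption; the coefficient values below are illustrative, not estimates from the Senate data.

```r
# Fraction of a one-time shock remaining k periods later in an AR(1) process
shock_remaining <- function(rho, k) rho^k

# Number of periods until half of the shock has dissipated
half_life <- function(rho) log(0.5) / log(rho)

half_life(0.5)   # 1 period
half_life(0.9)   # about 6.58 periods
```

With monthly time points, a coefficient of 0.9 would mean half of a shock is still present after roughly six and a half months.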

To show what these models look like, we will fit each model to the senate114 data in turn. We set use_vb=TRUE to produce variational approximations of the true posterior; these are much faster to fit than the full model but usually involve some distortion. For a finished analysis we would want to use the full sampler (use_vb=FALSE).

Random-Walk Model

To fit the random-walk model, we first create the model data with id_make, passing the name of the column containing each bill/item's date to the time_id option:

senate_data <- id_make(senate114,
                       outcome = 'cast_code',
                       person_id = 'bioname',
                       item_id = 'rollnumber',
                       group_id = 'party_code',
                       time_id = 'date',
                       miss_val = 'Absent')

We then pass this object to the id_estimate function and specify 'random_walk' in the vary_ideal_pts option. We also use model_type=2 to select a binary model (yes/no votes) that adjusts for missing data (legislator absences). We pass the names of two Senators to restrict their ideal points for identification. For the random-walk model, only these Senators' ideal points at the first time point are fixed.

sen_est <- id_estimate(senate_data,
                       model_type = 2,
                       use_vb = TRUE,
                       fixtype = 'vb_partial',
                       vary_ideal_pts = 'random_walk',
                       restrict_ind_high = "WARREN, Elizabeth",
                       restrict_ind_low = "BARRASSO, John A.",
                       seed = 84520)
## Chain 1: Gradient evaluation took 0.04 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 400 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100        -9625.721             1.000            1.000
## Chain 1:    200        -9544.641             0.504            1.000
## Chain 1:    300        -9508.650             0.337            0.008   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior... 
## Chain 1: Gradient evaluation took 0.031 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 310 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100       -10275.545             1.000            1.000
## Chain 1:    200       -10021.283             0.513            1.000
## Chain 1:    300        -9978.185             0.343            0.025
## Chain 1:    400        -9964.488             0.258            0.025
## Chain 1:    500        -9938.185             0.207            0.004   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...

Given the fitted model, we can now plot the ideal points. We turn off the uncertainty intervals (use_ci) because there are a lot of lines, one for each of the 100 Senators:

id_plot_legis_dyn(sen_est, use_ci = FALSE)

This plot does not show anything particularly interesting. Most of the ideal points are not changing over time, except for some of the moderate Democrats, who become slightly more conservative. This small amount of change is not surprising, as the Senate has become highly polarized and Senators are not shifting their policy positions.

However, we can also change the model's parameters to allow more change over time. By default, idealstan restricts the over-time change in ideal points to an SD of no more than .1. Restricting the variance this tightly helps with identification, but it also prevents the ideal points from changing too much, such as switching signs from one time point to the next. We can relax this restriction and see whether we get more variation by setting the restrict_var_high option to an SD of .5:

sen_est <- id_estimate(senate_data,
                       model_type = 2,
                       use_vb = TRUE,
                       restrict_var_high = .5,
                       fixtype = 'vb_partial',
                       vary_ideal_pts = 'random_walk',
                       restrict_ind_high = "WARREN, Elizabeth",
                       restrict_ind_low = "BARRASSO, John A.",
                       seed = 84520)
## Chain 1: Gradient evaluation took 0.046 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 460 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100        -9640.260             1.000            1.000
## Chain 1:    200        -9537.839             0.505            1.000
## Chain 1:    300        -9527.265             0.337            0.011
## Chain 1:    400        -9489.842             0.254            0.011
## Chain 1:    500        -9480.068             0.203            0.004   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior... 
## Chain 1: Gradient evaluation took 0.033 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 330 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100       -10564.689             1.000            1.000
## Chain 1:    200       -10295.668             0.513            1.000
## Chain 1:    300       -10017.282             0.351            0.028
## Chain 1:    400       -10045.711             0.264            0.028
## Chain 1:    500        -9960.317             0.213            0.026
## Chain 1:    600        -9975.976             0.178            0.026
## Chain 1:    700        -9933.233             0.153            0.009   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
id_plot_legis_dyn(sen_est, use_ci = FALSE)

We now see slightly more movement. Republicans as a whole have moved slightly farther away from Democrats. Elizabeth Warren appears to move more than most Senators, but because we constrained her first time point, her movement may partly reflect that restriction rather than a genuine change in position.
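A rough way to see how much room the new cap gives: if each period-to-period change were an independent normal step with SD equal to the cap (treating restrict_var_high as a cap on that SD, as described above, which is an assumption about the exact parameterization), the SD of the total change over k steps would be the step SD times sqrt(k). With 21 monthly time points there are 20 steps.

```r
n_steps  <- 20                                     # transitions between 21 monthly time points
sd_total <- function(step_sd, k) step_sd * sqrt(k) # SD of the sum of k independent steps

sd_total(0.1, n_steps)   # about 0.45
sd_total(0.5, n_steps)   # about 2.24
```

These are upper bounds implied by the cap, not estimates; the fitted variances can be, and in this example are, much smaller.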

We can also look at the variance of the ideal points to see which of the Senators had the highest variance in their ideal points:

id_plot_legis_var(sen_est)
## Joining, by = "id_num"

We can access the actual estimates of the variances by passing the return_data=TRUE option to the plot function:

out_d <- id_plot_legis_var(sen_est, return_data = TRUE)
## Joining, by = "id_num"
knitr::kable(head(out_d$plot_data))
| legis | low_pt | high_pt | median_pt | id_num | person_id | group_id |
|:---|---:|---:|---:|---:|:---|:---|
| time_var_restrict[1] | 0.0079622 | 0.1192108 | 0.0365002 | 1 | ALEXANDER, Lamar | R |
| time_var_restrict[10] | 0.0061290 | 0.2261995 | 0.0596394 | 10 | BROWN, Sherrod | D |
| time_var_restrict[100] | 0.0080143 | 0.2320775 | 0.0668056 | 100 | WARREN, Elizabeth | D |
| time_var_restrict[11] | 0.0046079 | 0.1718374 | 0.0385637 | 11 | BURR, Richard M. | R |
| time_var_restrict[12] | 0.0054070 | 0.1848464 | 0.0387676 | 12 | CANTWELL, Maria E. | D |
| time_var_restrict[13] | 0.0065909 | 0.1585252 | 0.0406844 | 13 | CAPITO, Shelley Moore | R |
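To find the Senators with the most over-time variation, the returned data can be sorted by `median_pt`. The sketch below types in the six rows printed above so it runs on its own; with a fitted model you would sort `out_d$plot_data` directly. Note that this ranks only these six Senators, not all 100.

```r
# The six rows shown above, entered by hand for illustration
var_est <- data.frame(
  person_id = c("ALEXANDER, Lamar", "BROWN, Sherrod", "WARREN, Elizabeth",
                "BURR, Richard M.", "CANTWELL, Maria E.", "CAPITO, Shelley Moore"),
  group_id  = c("R", "D", "D", "R", "D", "R"),
  median_pt = c(0.0365002, 0.0596394, 0.0668056, 0.0385637, 0.0387676, 0.0406844),
  stringsAsFactors = FALSE
)

# Sort from most to least variable over time (by posterior median)
var_est[order(var_est$median_pt, decreasing = TRUE), c("person_id", "median_pt")]
```

Among these six, Warren and Brown have the largest median over-time variances, although the wide low_pt/high_pt intervals in the table suggest the differences are uncertain.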

Stationary Model

We now fit a stationary version of the model by passing 'AR1' to vary_ideal_pts. By default, this model does not impose a hard upper limit on the over-time variance; instead, it places a tight prior on the over-time variance that pulls it toward zero. We can loosen this prior slightly by changing the value of time_sd from 0.1 to 0.2 to allow for more variation:

sen_est <- id_estimate(senate_data,
                       model_type = 2,
                       use_vb = TRUE,
                       time_sd = .2,
                       fixtype = 'vb_partial',
                       vary_ideal_pts = 'AR1',
                       restrict_ind_high = "WARREN, Elizabeth",
                       restrict_ind_low = "BARRASSO, John A.",
                       seed = 84520)
## Chain 1: Gradient evaluation took 0.031 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 310 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100        -9974.227             1.000            1.000
## Chain 1:    200        -9660.058             0.516            1.000
## Chain 1:    300        -9674.684             0.345            0.033
## Chain 1:    400        -9591.990             0.261            0.033
## Chain 1:    500        -9528.500             0.210            0.009   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior... 
## Chain 1: Gradient evaluation took 0.03 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 300 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100      -156447.324             1.000            1.000
## Chain 1:    200       -25999.792             3.009            5.017
## Chain 1:    300       -19281.111             2.122            1.000
## Chain 1:    400       -17723.003             1.613            1.000
## Chain 1:    500       -17289.463             1.296            0.348
## Chain 1:    600       -14948.539             1.106            0.348
## Chain 1:    700       -10080.245             1.017            0.348
## Chain 1:    800        -9652.038             0.895            0.348
## Chain 1:    900        -9521.241             0.797            0.157
## Chain 1:   1000        -9498.464             0.718            0.157
## Chain 1:   1100        -9468.801             0.618            0.088   MAY BE DIVERGING... INSPECT ELBO
## Chain 1:   1200        -9483.751             0.117            0.044
## Chain 1:   1300        -9494.131             0.082            0.025
## Chain 1:   1400        -9472.728             0.073            0.014
## Chain 1:   1500        -9440.893             0.071            0.003   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
id_plot_legis_dyn(sen_est, use_ci = FALSE)

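This plot shows minor perturbations in the ideal points of individual Senators with a high level of relative stability over time. This pattern is consistent with a stationary model fitting the data well: given the short time frame, we would expect Senators' ideal points to be more or less stationary. Note, however, that the second variational run above printed a "MAY BE DIVERGING" warning before its ELBO converged, so these results should be confirmed with the full sampler.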

We can also examine the individual ideal points at each time point using the summary function:

summary(sen_est,pars='ideal_pts') %>% 
  head %>% 
  knitr::kable(.)
| Person | Group | Time_Point | Low Posterior Interval | Posterior Median | High Posterior Interval | Parameter Name |
|:---|:---|:---|---:|---:|---:|:---|
| ALEXANDER, Lamar | R | 2015-01-01 | -0.8894863 | -0.7099715 | -0.5262713 | L_tp1[1,1] |
| BROWN, Sherrod | D | 2015-01-01 | 1.5524635 | 1.9875750 | 2.4368030 | L_tp1[1,10] |
| WARREN, Elizabeth | D | 2015-01-01 | 2.0403390 | 2.0600100 | 2.0786415 | L_tp1[1,100] |
| BURR, Richard M. | R | 2015-01-01 | -1.4556720 | -1.1831550 | -0.9080982 | L_tp1[1,11] |
| CANTWELL, Maria E. | D | 2015-01-01 | 1.4340835 | 1.8196150 | 2.2506335 | L_tp1[1,12] |
| CAPITO, Shelley Moore | R | 2015-01-01 | -2.3066865 | -1.9693400 | -1.6310870 | L_tp1[1,13] |
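One quick check on these estimates is the width of each posterior interval. The sketch below types in the six rows printed above (so it runs without a fitted model) and sorts them by interval width.

```r
# First-period posterior intervals from the table above, entered by hand
ideal_t1 <- data.frame(
  person = c("ALEXANDER, Lamar", "BROWN, Sherrod", "WARREN, Elizabeth",
             "BURR, Richard M.", "CANTWELL, Maria E.", "CAPITO, Shelley Moore"),
  low  = c(-0.8894863, 1.5524635, 2.0403390, -1.4556720, 1.4340835, -2.3066865),
  high = c(-0.5262713, 2.4368030, 2.0786415, -0.9080982, 2.2506335, -1.6310870),
  stringsAsFactors = FALSE
)

# Width of each posterior interval, narrowest first
ideal_t1$width <- ideal_t1$high - ideal_t1$low
ideal_t1[order(ideal_t1$width), c("person", "width")]
```

Warren's interval (about 0.04 wide) is far narrower than the others (roughly 0.36 to 0.88), which is consistent with her ideal point being one of the two restricted for identification.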

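To examine trace plots of the posterior draws, we can use the stan_trace function. Here we look at the first time point for Lamar Alexander, using the value shown in the Parameter Name column of the table above. Because this model was fit with use_vb=TRUE, the draws come from the variational approximation rather than from MCMC chains, so trace plots are mainly useful as a convergence check for models fit with the full sampler: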

stan_trace(sen_est,'L_tp1[1,1]')

Group-level Time-varying Ideal Points

Finally, we can also estimate group-level (i.e., party-level) ideal points instead of individual ones. To do so we specify the use_groups=TRUE option in the id_estimate function and change the restricted parameters to parties:

sen_est <- id_estimate(senate_data,
                       model_type = 2,
                       use_vb = TRUE,
                       time_sd = 0.2,
                       use_groups = TRUE,
                       fixtype = 'vb_partial',
                       vary_ideal_pts = 'AR1',
                       restrict_ind_high = "D",
                       restrict_ind_low = "R",
                       seed = 84520)
## Chain 1: Gradient evaluation took 0.032 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 320 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100       -10107.861             1.000            1.000
## Chain 1:    200        -9863.695             0.512            1.000
## Chain 1:    300        -9841.212             0.342            0.025
## Chain 1:    400        -9815.229             0.257            0.025
## Chain 1:    500        -9813.056             0.206            0.003   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior... 
## Chain 1: Gradient evaluation took 0.029 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 290 seconds.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:    100       -10519.066             1.000            1.000
## Chain 1:    200        -9865.992             0.533            1.000
## Chain 1:    300        -9806.903             0.357            0.066
## Chain 1:    400        -9804.913             0.268            0.066
## Chain 1:    500        -9797.578             0.215            0.006   MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
id_plot_legis_dyn(sen_est) +
  scale_colour_manual(values = c(R = 'red', D = 'blue', I = 'green'),
                      name = "Parties")

We can also overlay a bill/item midpoint to see where the line of indifference in voting lies relative to the party positions:

id_plot_legis_dyn(sen_est, item_plot = '342', text_size_label = 5) +
  scale_colour_manual(values = c(R = 'red', D = 'blue', I = 'green'),
                      name = "Parties") +
  ggtitle('Time-Varying Party-level Ideal Points for the 114th Senate',
          subtitle = 'Midpoint (Line of Indifference to Voting) for 342nd Roll-call Vote as Dotted Line') +
  guides(color = 'none') +
  annotate(geom = 'text',
           x = ymd('2016-01-01'),
           y = -1,
           label = 'Confirmation Vote for Wilhelmina Wright as U.S. District Judge')

As this plot shows, the line of indifference falls in a no-person's zone in the middle of the plot, reflecting the lack of overlap and consensus on legislation in the 114th Senate.
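For intuition about what the plotted midpoint represents, consider a generic two-parameter logit item (an assumption for illustration; idealstan's exact parameterization and sign conventions may differ), where the probability of a Yes vote is logit^-1(gamma * theta - beta). The midpoint is the ideal point theta at which that probability is exactly 0.5, i.e. theta = beta / gamma. The item parameter values below are hypothetical.

```r
# Hypothetical item parameters for one roll call (illustrative values only)
gamma <- 1.5    # discrimination
beta  <- -0.3   # difficulty / intercept

# Probability of a "Yes" vote for a legislator with ideal point theta
p_yes <- function(theta) plogis(gamma * theta - beta)

# Midpoint: the ideal point at which a legislator is indifferent (p = 0.5)
midpoint <- beta / gamma
p_yes(midpoint)   # 0.5
p_yes(c(-1, 1))   # legislators on either side lean in opposite directions
```

When the midpoint sits between the two parties' ideal points, as in the plot, members of one party are predicted to vote Yes and members of the other to vote No.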