Here we use the GridSample package (https://doi.org/10.1186/s12942-017-0098-4) to replicate the sample design of the 2010 Rwanda DHS. Further details are available at: <www.dhsprogram.com/pubs/pdf/FR259/FR259.pdf>.
The 2010 Rwanda DHS sampled 12,540 households from 492 PSUs comprising rural villages and urban neighborhoods (30). The sample was stratified by Rwanda’s 30 districts, urban areas were oversampled by adding 12 PSUs in Kigali’s three districts, and 26 households were sampled from each urban and rural PSU. The average village in Rwanda had 610 occupants according to the sample frame.
Therefore, the corresponding parameters for the gs_sample
algorithm are
cfg_hh_per_stratum <- 416 #
cfg_hh_per_urban <- 26 # households from each urban PSU
cfg_hh_per_rural <- 26 # households from each rural PSU
cfg_pop_per_psu <- 610 # limit PSU size to the average village size
We start with two files, a population raster and a shapefile RWAshp
defining the various strata in the census. Here, the strata shapefile consists of the 30 districts. The population raster can be downloaded from <www.worldpop.org.uk>. For the purposes of this example, we will use the population raster that estimates population in Rwanda for 2010, adjusted using census estimates, where each grid-cell represents the population per pixel. The corresponding raster when downloaded from the WorldPop website is named RWA_ppp_v2b_2010_UNadj.tif
.
First, we read in the relevant datasets, and rasterize the strata layer using a field that contains a unique identifier for each polygon.
library(gridsample)
library(raster)
## Warning: package 'raster' was built under R version 3.4.4
## Loading required package: sp
## Warning: package 'sp' was built under R version 3.4.4
population_raster <- raster(system.file("extdata", "RWA_ppp_2010_adj_v2.tif",
package="gridsample"))
plot(population_raster)
data(RWAshp)
strata_raster <- rasterize(RWAshp,population_raster,field = "STL.2")
plot(strata_raster)
We need a raster that defines urban and rural areas as well. Here, we’ll define urban areas by determining the population cell density associated with the official proportion of people in an urban area. The 2012 Rwanda census found 16% of people to live in urban areas, so we’ll change the most dense cells as urban, until 16% of the population is defined as being in urban areas.
total_pop <- cellStats(population_raster, stat = "sum")
urban_pop_value <- total_pop * .16
pop_df <- data.frame(index = 1:length(population_raster[]),
pop = population_raster[])
pop_df <- pop_df[!is.na(pop_df$pop), ]
pop_df <- pop_df[order(pop_df$pop,decreasing = T), ]
pop_df$cumulative_pop <- cumsum(pop_df$pop)
pop_df$urban <- 0
pop_df$urban[which(pop_df$cumulative_pop <= urban_pop_value)] <- 1
urban_raster <- population_raster >= min(subset(pop_df,urban == 1)$pop)
plot(urban_raster)
gs_sample
Now that we have the population, strata, and urbanization rasters, we are ready to use the gridsample algorithm. We exclude any grid cells with a population less than 0.01 to prevent the algorithm from creating overly large sampling areas (cfg_min_pop_per_cell = 0.01
), also restricting the total PSU size to less than 10km^2 (cfg_max_psu_size = 10
). Finally, because we wanted the sample to be representative of both urban and rural areas, we allowed the algorithm to oversample rural or urban areas by setting cfg_sample_rururb = TRUE
. We save the output shapefile to a temporary directory.
psu_polygons <- gs_sample(
population_raster = population_raster,
strata_raster = strata_raster,
urban_raster = urban_raster,
cfg_desired_cell_size = NA,
cfg_hh_per_stratum = 416,
cfg_hh_per_urban = 26,
cfg_hh_per_rural = 26,
cfg_min_pop_per_cell = 0.01,
cfg_max_psu_size = 10,
cfg_pop_per_psu = 610,
cfg_sample_rururb = TRUE,
cfg_sample_spatial = FALSE,
cfg_sample_spatial_scale = 100,
output_path = tempdir(),
sample_name = "rwanda_psu"
)
## [1] "Beginning gs_sample() Process:"
## [1] "Randomly Stratifying PSU Cell IDs:"
## [1] "Processing stratum ID 6 ..."
## [1] "Processing stratum ID 1 ..."
## [1] "Processing stratum ID 4 ..."
## [1] "Processing stratum ID 2 ..."
## [1] "Processing stratum ID 3 ..."
## [1] "Processing stratum ID 5 ..."
## [1] "Processing stratum ID 7 ..."
## [1] "Processing stratum ID 8 ..."
## [1] "Checking urban and rural domains meet stratum HH sample size"
## [1] "Reassignment for stratum 6 of 1 cells processed"
## [1] "Begin Second Stage Sampling, Growing PSUs if requested:"
##
## PLEASE NOTE: The components "delsgs" and "summary" of the
## object returned by deldir() are now DATA FRAMES rather than
## matrices (as they were prior to release 0.0-18).
## See help("deldir").
##
## PLEASE NOTE: The process that deldir() uses for determining
## duplicated points has changed from that used in version
## 0.0-9 of this package (and previously). See help("deldir").
## [1] "PSUs to process: 128"
## [1] "Active PSUs:"
## [1] 110 83 127 46 61 93 51 109 12 32 59 114 27 116 55 64 3
## [18] 50 58 106 7 54 23 21 96 19 101 104 72 115 20 102 123 8
## [35] 84 75 62 49 63 30 44 25 122 120 36 43 91 68 14 85 22
## [52] 126 13 65 77 67 42 89 119 124 15 10 11 97 39 98 28 78
## [69] 5 18 2 125 6 76 95 38 117 35 87 26 88 69 24 121 56
## [86] 31 74 52 86 112 53 81 45 128 34 107 29 33 111 40 60 113
## [103] 92 80 90 48 57 118 47 17 1 41 105 9 37 99 73 108 82
## [120] 71 70 16 100 94 103 66 4 79
## [1] "Active PSU Population Sums:"
## [1] 1144.6572 1306.7236 3424.1336 1199.3570 1160.2891 3161.4045
## [7] 8484.4252 1232.6077 1032.3489 860.4852 955.7482 415.9963
## [13] 1315.1844 1320.9459 848.6110 1473.1985 1091.2181 979.1264
## [19] 959.1565 1319.7061 1568.2858 2152.9234 1028.0767 1128.2486
## [25] 21556.3594 613.1150 1059.5182 1300.3492 1393.4698 1508.1392
## [31] 131.3457 2043.6701 1704.5570 1028.6012 2095.1550 2064.2653
## [37] 1501.9663 1029.0230 851.2141 2175.0904 10245.2136 1364.6390
## [43] 259.5918 2494.7397 987.5451 47911.0375 1599.3976 17843.5810
## [49] 1658.7823 4947.8855 1956.4188 607.0674 1473.3615 1547.9205
## [55] 650.6719 2224.9154 1006.0417 2138.3901 1964.6719 1180.9378
## [61] 1031.0676 521.2857 405.3794 1292.4436 1313.3385 1093.5068
## [67] 452.1606 4930.7139 1308.3001 1162.5517 962.5251 1120.7299
## [73] 5203.7464 1480.8351 1100.8640 708.2816 4699.4580 2682.2158
## [79] 488.4512 948.5000 1034.0282 1624.3811 1273.5639 947.1859
## [85] 2408.5114 1249.9432 1331.9261 1846.3818 1210.3671 1200.2283
## [91] 901.4919 674.9285 1938.3210 947.4876 885.8436 440.3320
## [97] 1382.3109 918.2573 2271.3246 964.0099 10901.2984 12736.4476
## [103] 598.6255 1366.0017 1109.0397 1389.2465 848.9205 1970.9421
## [109] 903.6887 1095.1059 1100.1390 918.5493 1760.1123 1919.0557
## [115] 783.2866 664.7677 1470.7730 933.0224 718.6760 535.4312
## [121] 4722.8411 3274.3940 506.9392 537.9949 1084.5628 1480.0378
## [127] 1131.8322 22994.9311
## [1] "Active PSU Cell Count:"
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1] "Processing 114 ..."
## [1] "... cells added: 1"
## [1] "... population added: 282.767237246037"
## [1] "... population total: 698.763534963131"
## [1] "Processing 20 ..."
## [1] "... cells added: 5"
## [1] "... population added: 491.487710639834"
## [1] "... population total: 622.833435609937"
## [1] "Processing 122 ..."
## [1] "... cells added: 1"
## [1] "... population added: 631.574843214435"
## [1] "... population total: 891.166687278653"
## [1] "Processing 126 ..."
## [1] "... cells added: 1"
## [1] "... population added: 470.618171453476"
## [1] "... population total: 1077.68557143211"
## [1] "Processing 10 ..."
## [1] "... cells added: 1"
## [1] "... population added: 644.818376779556"
## [1] "... population total: 1166.10408675671"
## [1] "Processing 11 ..."
## [1] "... cells added: 1"
## [1] "... population added: 299.59882748127"
## [1] "... population total: 704.978253483772"
## [1] "Processing 28 ..."
## [1] "... cells added: 1"
## [1] "... population added: 581.901154398918"
## [1] "... population total: 1034.06173014641"
## [1] "Processing 87 ..."
## [1] "... cells added: 1"
## [1] "... population added: 219.939943730831"
## [1] "... population total: 708.391180574894"
## [1] "Processing 107 ..."
## [1] "... cells added: 1"
## [1] "... population added: 565.452972531319"
## [1] "... population total: 1005.78496551514"
## [1] "Processing 92 ..."
## [1] "... cells added: 1"
## [1] "... population added: 890.719133973122"
## [1] "... population total: 1489.34467315674"
## [1] "Processing 71 ..."
## [1] "... cells added: 1"
## [1] "... population added: 550.031454861164"
## [1] "... population total: 1085.46263051033"
## [1] "Processing 100 ..."
## [1] "... cells added: 1"
## [1] "... population added: 574.137712478638"
## [1] "... population total: 1081.07694292068"
## [1] "Processing 94 ..."
## [1] "... cells added: 1"
## [1] "... population added: 854.525423288345"
## [1] "... population total: 1392.52030861378"
## [1] "Converting PSU Cells to Polygons and Summarizing Metrics:"
plot(psu_polygons)