Semi-custom data

Data is generated by combining two aspects: first you specify how data is added to the model, then what data is added. The examples below use sim_gen and sim_gen_cont for the first aspect.

What data is added is specified by generators, such as gen_v_sar in the example below.

This means we can specify a linear mixed model setup with the regressor x, the model error e, a random effect v, and a spatially correlated random effect vSp as follows:

library(saeSim)
setup <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% sim_gen_v() %>% 
  sim_gen(gen_v_sar(name = "vSp")) %>% sim_resp_eq(y = 100 + x + v + vSp + e)
setup
##   idD idU        x      e     v     vSp      y
## 1   1   1 -4.13032 -5.765 1.106 -0.8515  90.36
## 2   1   2 -0.03954 -5.174 1.106 -0.8515  95.04
## 3   1   3 -6.52239  3.600 1.106 -0.8515  97.33
## 4   1   4 -0.76599  2.028 1.106 -0.8515 101.52
## 5   1   5 -1.82341  1.158 1.106 -0.8515  99.59
## 6   1   6  5.55645 10.608 1.106 -0.8515 116.42
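The response equation can be checked by hand against the printed rows. The sketch below copies the values of the first row from the output above into plain R variables (the variable names simply mirror the column names) and recomputes y. It only illustrates the arithmetic of sim_resp_eq and does not call the package.

```r
# Values copied from the first printed row of the setup output.
x   <- -4.13032
e   <- -5.765
v   <-  1.106
vSp <- -0.8515

# Same formula as passed to sim_resp_eq() above.
y <- 100 + x + v + vSp + e
round(y, 2)
## [1] 90.36
```

The result matches the y column of row 1 after rounding to two decimals.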

To get the simulated data as a list:

dataList <- sim(setup, R = 500)
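As a minimal sketch of working with the result, the following assumes that saeSim is installed and that sim returns a plain list with one data frame per replication, each containing the column y defined in the response equation. Only functions already used in this section are called. R = 3 is chosen here just to keep the run short.

```r
library(saeSim)

# Same setup as above, rebuilt so the example is self-contained.
setup <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% sim_gen_v() %>%
  sim_gen(gen_v_sar(name = "vSp")) %>% sim_resp_eq(y = 100 + x + v + vSp + e)

# A small number of replications (assumption: R sets the list length).
dataList <- sim(setup, R = 3)

length(dataList)      # number of simulated data sets
head(dataList[[1]])   # first rows of the first data set

# Mean of the response in each simulated data set.
sapply(dataList, function(dat) mean(dat$y))
```

Because the result is an ordinary list, the usual tools such as lapply and sapply can be applied to each simulated data set.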

Contaminated data

When you are interested in contamination, it is important to know that the contamination is added to the values already in the data. This means that how data is added to the model changes, while the data generators stay the same. If you want a contaminated, spatially correlated error component, you can add the following to the setup object from above:

contSetup <- setup %>% 
  sim_gen_cont(gen_v_sar(sd = 40, name = "vSp"), nCont = 0.05, type = "area", areaVar = "idD", fixed = TRUE)

Note that the generator is the same but with a higher standard deviation. The argument nCont controls how many observations are contaminated. Values < 1 are interpreted as a probability. A single number of 1 or more is interpreted as the number of contaminated units (these can be areas, observations in each area, or observations). A vector with length(nCont) > 1 is interpreted as the number of contaminated observations in each area. Use the following example to see how these things play together:

sim(base_id(3, 4) %>% sim_gen_x() %>% sim_gen_e() %>% 
      sim_gen_ec(mean = 0, sd = 150, name = "eCont", nCont = c(1, 2, 3)))
## [[1]]
## Source: local data frame [12 x 8]
## 
##    idD idU          x         e    eCont   idC idR simName
## 1    1   1  3.0426745 -8.895868    0.000 FALSE   1        
## 2    1   2 -2.1870423  1.312302    0.000 FALSE   1        
## 3    1   3 -6.6599026  1.871466    0.000 FALSE   1        
## 4    1   4 -9.4016207  1.041706    5.916  TRUE   1        
## 5    2   1 -2.2121344 -3.195060    0.000 FALSE   1        
## 6    2   2  6.5940150  3.988802    0.000 FALSE   1        
## 7    2   3 -0.0279630  6.005806 -191.437  TRUE   1        
## 8    2   4  1.5433492 -0.006177   13.658  TRUE   1        
## 9    3   1 10.1495513 -1.645273    0.000 FALSE   1        
## 10   3   2 -6.5109211  0.205836  310.542  TRUE   1        
## 11   3   3 -1.3331190 -4.534455  292.471  TRUE   1        
## 12   3   4 -0.0001271 -8.006533 -265.403  TRUE   1
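In the output above, nCont = c(1, 2, 3) marks the last one, two, and three units of areas 1, 2, and 3 as contaminated (column idC), and eCont is zero everywhere else. The base R sketch below reproduces only that pattern of flags. The helper flag_last_units is hypothetical and not part of saeSim. It shows the pattern seen in this printed example, not how the package selects units internally.

```r
# Hypothetical helper (not part of saeSim): flag the last nCont[i] units
# of the i-th area, mirroring the idC column in the output above.
flag_last_units <- function(idD, nCont) {
  areas <- unique(idD)
  stopifnot(length(nCont) == length(areas))
  flags <- logical(length(idD))
  for (i in seq_along(areas)) {
    rows <- which(idD == areas[i])
    flags[tail(rows, nCont[i])] <- TRUE
  }
  flags
}

# Three areas with four units each, as in base_id(3, 4).
idD <- rep(1:3, each = 4)
idU <- rep(1:4, times = 3)
idC <- flag_last_units(idD, nCont = c(1, 2, 3))

data.frame(idD = idD, idU = idU, idC = idC)

# Number of flagged units per area: 1, 2 and 3.
tapply(idC, idD, sum)
```

Comparing the idC column of this data frame with the one printed above shows the same rows flagged in each area.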