Creating new ss3sim operating and estimation model setups

2017-04-18

In some cases you may wish to adapt your own SS3 model to work with the ss3sim package. This is possible but may be difficult because the functions in ss3sim were developed to work with the existing model setups and a model with a different structure may cause errors in these functions. This stems from the high flexibility of SS3, allowing for more complex model setups than those used while developing ss3sim.

For instance, sample_index() does not currently have the capability to handle more than one season. Given the many options available in SS3, it is extremely difficult to write auxiliary functions that will interact reliably with all combinations of these options. For this reason, we recommend that users strongly consider trying to modify an existing model rather than creating a new one. It is likely that the ss3sim functions will need to be modified in some way in the process, so the user should be familiar with both SS3 and R.

The main purpose of the OM (operating model) is to generate data files that can be read into the EM (estimation model). Thus the user needs to set up the .dat files in the OM such that they conform to the structure needed by the EM. Two key examples are survey data and age/length composition data. For the indices of abundance (CPUE and scientific survey), the OM .dat file determines which years are available to the sampling function sample_index(), so if a year is desired in the EM it needs to be in the OM. In practice, if the survey years will be dynamic (i.e., changing between scenarios), it may be easiest to include all years in the OM; if the survey years will be fixed for all scenarios, set the OM to match the EM exactly.
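The year constraint described above can be checked before a scenario is run. The following is a minimal sketch in Python, written for illustration only and not part of ss3sim (which is an R package); the function name and the example years are hypothetical.

```python
def missing_index_years(om_years, em_years):
    """Return the EM index years that are absent from the OM .dat file."""
    available = set(om_years)
    return sorted(year for year in em_years if year not in available)


# Hypothetical example: the OM provides every year, the EM samples a subset.
om_years = range(1, 101)
em_years = [76, 80, 84, 88, 92, 96, 100]
missing = missing_index_years(om_years, em_years)
if missing:
    print("Add these years to the OM .dat file:", missing)
```

An empty result means every year requested for the EM can be sampled from the OM.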

Similarly, with age/length compositions, the OM .dat file determines which years and bins are available to the sampling functions sample_agecomp() and sample_lcomp(). If dynamic binning is to be used, the user should set up the .dat file so that all desired combinations of bins are possible (see the section in the Introduction vignette on dynamic binning for more details). Specifically, the user must specify small enough OM .ctl bins (no smaller than the population bin specified in the appropriate section in the OM .dat file) so that they can easily be re-binned. Alternatively, if composition data are not to be explored in the simulation, the user can simply set the OM .dat file to match the desired input for the EM .dat file.
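To illustrate why fine OM bins make re-binning easy, here is a small Python sketch that aggregates counts from fine bins into coarser ones. It is not how ss3sim implements dynamic binning; the function name is hypothetical, bins are identified by their lower edges, and the sketch assumes that every coarse edge coincides with a fine edge.

```python
import bisect


def rebin(om_bins, om_counts, em_bins):
    """Aggregate counts from fine OM bins into coarser EM bins.

    Bins are given as ascending lower edges. Assumption of this sketch:
    every EM edge must also be an OM edge so counts never need splitting.
    """
    if len(om_bins) != len(om_counts):
        raise ValueError("om_bins and om_counts must have the same length")
    if not set(em_bins) <= set(om_bins):
        raise ValueError("every EM bin edge must also be an OM bin edge")
    em_counts = [0] * len(em_bins)
    for edge, count in zip(om_bins, om_counts):
        # Index of the last EM lower edge that is <= this OM edge.
        idx = bisect.bisect_right(em_bins, edge) - 1
        # Assumption: anything below the first EM edge goes into the first bin.
        em_counts[max(idx, 0)] += count
    return em_counts


# Hypothetical 2 cm OM bins re-binned to 4 cm EM bins.
fine_edges = [10, 12, 14, 16, 18, 20]
fine_counts = [1, 2, 3, 4, 5, 6]
coarse_counts = rebin(fine_edges, fine_counts, [10, 14, 18])
```

If the OM bins were wider than the bins wanted in the EM, no such aggregation would be possible, which is the reason for starting from small OM bins.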

For those users who choose to create a new ss3sim model setup, we outline the steps to take an existing SS3 model and modify it to work with the ss3sim package. First, we cover setting up an operating model and then we cover setting up an estimation model.

1 Setting up a new operating model

The first step is to run the assessment model to make sure the model runs and estimates parameters as desired. The .par file generated will be used in the OM. We recommend opening a command window inside your OM and EM folder to help test whether the model still runs at many points along the process. After turning parameter estimation off in the starter file (see below), the model can be checked by running ss3_24o_safe -nohess to make sure the input files are read in properly and the model writes the files. Use the .ss_new files produced as a starting point so that the comments match the instructions below.

You will see an error from ADMB stating that it cannot find the data file, because we have specified a different file in the starter file. You can safely ignore this error.

1.1 Forecast file modifications

1.2 Starter file modifications

1.3 Control file modifications

  1. Delete the <modelname>.ctl file and rename control.ss_new to om.ctl. Modify this new file.

  2. Specify all environmental deviates on biological parameters to be unconstrained by bounds by setting #_env/block/dev_adjust_method to 1. If the method is set to 2, parameters adjusted using environmental covariate inputs will be adjusted using a logistic transformation to ensure that the adjusted parameter will stay within the bounds of the base parameter. If it exists and is not already commented out, comment out the second line entitled #_env/block/dev_adjust_method underneath the section which specifies selectivity parameters. If time-varying selectivity parameters are added using the change_tv() function, this line will be modified by the same function.

  3. Turn on recruitment deviations by setting #do_recdev to 1. Using the next two lines, specify that the recruitment deviations begin and end with the start and end years of the model.

  4. Turn on additional advanced options for the recruitment deviations by specifying # (0/1) to read 13 advanced options to 1.

  5. Set #_recdev_early_start to 0 so that the model will use the # first year of main recr_devs.

  6. Set #_lambda for Fcast_rec_like occurring before endyr+1 to 1. This lambda is for the log likelihood of the forecast recruitment deviations that occur before the first year of forecasting. Values larger than one accommodate noisy data at the end of the time series.

  7. Recruitment is log-normally distributed in SS. If inputting normally distributed recruitment deviations, set #_max_bias_adj_in_MPD to -1 so that SS performs the bias correction for you. If inputting bias-corrected normal recruitment deviations, set it to 0. Either method will lead to the same end result.

  8. Use any negative value in line # F ballpark year, to disable the use of a ballpark year to determine fishing mortality levels.

  9. Set # F_Method to 2, which facilitates the use of a vector of instantaneous fishing mortality levels. The max harvest rate in the subsequent line will depend upon the fishing mortality levels in your simulation. Following the max harvest rate, specify a line with three values separated by spaces. The first value is the overall start F value, followed by the phase. The last value is the number of inputs. Set the number of inputs to 1, because the actual fishing mortality trajectory will be specified in the .dat file. Next, specify a single line with six values, separated by spaces, where the values correspond to fleet number, start year, season, fishing mortality level, the standard error of the fishing mortality level, and a negative phase value. E.g., 1 2000 1 0 0.01 -1

  10. Set #_Variance_adjustments_to_input_values to 0. Comment out any lines underneath referring to variance adjustments.

  11. Set # number of changes to make to default Lambdas to 0. Comment out any lines with default lambda changes below.

  12. If needed, change the specification of the .ctl file using the functions available in the ss3sim package, e.g., change_growth, change_sel.
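As an illustration of the F_Method layout described in step 9, the sketch below assembles the four control-file lines as text. It is a Python sketch and not part of ss3sim; the function name, the default start F value and phase, and the trailing comment labels are assumptions made for this example and should be checked against your own control.ss_new file.

```python
def om_f_setup(max_f, fleet=1, start_year=2000, season=1,
               start_f=0, overall_phase=1):
    """Return control-file lines for F_Method 2 with a single detailed input.

    start_f and overall_phase are hypothetical defaults; the text above
    only fixes the number of detailed inputs (1).
    """
    if max_f <= 0:
        raise ValueError("max_f must be positive")
    return [
        "2 # F_Method",
        f"{max_f} # max F or harvest rate",
        f"{start_f} {overall_phase} 1 # start F; phase; number of inputs",
        # fleet, start year, season, F, standard error of F, negative phase
        f"{fleet} {start_year} {season} 0 0.01 -1",
    ]


for line in om_f_setup(max_f=4):
    print(line)
```

The last line reproduces the six-value example given in step 9.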

1.4 Data file modifications

  1. Delete the <modelname>.dat file and rename data.ss_new to om.dat. Modify this new file.

  2. Specify the start and end year for the simulation by modifying #_styr and #_endyr. Years can be specified as a number line (i.e. 1, 2, 3, …) or as actual years (i.e. 1999, 2000, 2001, …).

  3. Specify the names for each fleet in an uncommented line after the line #_N_areas. Names must be separated by a % with no spaces. It is these names which you will use in the plain text case files to specify and change characteristics of each fleet throughout the simulation. E.g. Fishery%Survey1%Survey2

  4. Set the number of mean body weight observations to 0 in the line #_N_meanbodywt. Subsequently, specify 1 under #_DF_for_meanbodywt_T-distribution_like; this is the degrees of freedom for the Student’s t distribution used to evaluate the mean body weight deviations in the following line. The degrees of freedom must be specified even if there are zero mean body weight observations.

  5. Set the length bin method to 1 or 2 in the line labelled # length bin method. Using a value of 1, the bins refer to the data bins (specified later). Using a value of 2 instructs SS to generate the bin widths from a user specified minimum and maximum value. In the following three lines, specify the bin width for population size composition data; the minimum size, or the lower edge of the first bin and size at age zero; and the maximum size, or lower edge of the last bin. The length data bins MUST be wider than the population bin, but the boundaries do not have to align.

  6. Specify #_comp_tail_compression to any negative value to turn off tail compression.

  7. Set #_add_to_comp to a very small number, e.g., 1e-005. This value is added to each composition (age and length) data bin.

  8. Set the length bin range method for the age composition data (used when the conditional age at length data exists) to 1, 2 or 3 in the line #_Lbin_method depending on the data you have or the purpose of the study.
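Two of the data-file conventions above, the %-separated fleet names (step 3) and population bins generated from a width, minimum, and maximum (step 5, method 2), can be sketched as follows. This is illustrative Python rather than ss3sim code; both function names are hypothetical, and requiring the size range to be a whole number of bins is a choice made for this sketch.

```python
def parse_fleet_names(line):
    """Split a fleet-name line such as 'Fishery%Survey1%Survey2'."""
    names = line.strip().split("%")
    if any(name == "" or " " in name for name in names):
        raise ValueError("fleet names must be non-empty and contain no spaces")
    return names


def population_bins(width, min_size, max_size):
    """Lower edges of population length bins from min_size to max_size.

    max_size is treated as the lower edge of the last bin, as in step 5.
    """
    if width <= 0 or max_size <= min_size:
        raise ValueError("need width > 0 and max_size > min_size")
    n_steps = int(round((max_size - min_size) / width))
    if abs(min_size + n_steps * width - max_size) > 1e-9:
        raise ValueError("size range must be a whole number of bins")
    return [min_size + i * width for i in range(n_steps + 1)]


fleets = parse_fleet_names("Fishery%Survey1%Survey2")
edges = population_bins(width=2, min_size=10, max_size=20)
```

The parsed names are the ones that would later be used in the plain text case files.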

2 Setting up a new estimation model

Unlike the OM, the EM must be a valid SS3 model setup that runs to produce maximum likelihood estimates (and possibly standard errors). Thus the OM needs to be adapted to create a new EM.

2.1 Starter file modifications

  1. Change the names of the .dat and .ctl files to your chosen naming scheme.

  2. Specify the model to use parameter values found in the .ctl file, by changing # 0=use init values in control file; 1=use ss3.par to 0.

  3. Turn on parameter estimation by changing # Turn off estimation for parameters entering after this phase to a value larger than the max phase specified in the .ctl file.
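For step 3, one way to pick a suitable value is to take the largest positive phase used in the .ctl file and add one. The Python sketch below assumes the phases have already been collected into a list; the function name is hypothetical and this is not an ss3sim function.

```python
def last_estimation_phase(phases):
    """Return a starter-file value larger than every estimation phase."""
    estimated = [phase for phase in phases if phase > 0]
    if not estimated:
        raise ValueError("no parameter has a positive phase")
    return max(estimated) + 1


# Hypothetical phases read from a .ctl file; negative phases are fixed.
value_for_starter = last_estimation_phase([1, -3, 4, 2, -1])
```

Any larger value would also satisfy the rule stated in step 3.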

2.2 Control file modifications

  1. Set the phases of the parameters to positive or negative values to tell SS to estimate or fix the parameters, respectively.

  2. Set the #_recdev phase to a positive value to estimate yearly recruitment deviations.

  3. If using bias adjustment, set #_recdev_early_phase to a positive value. The years and maximum bias adjustment can initially be entered as approximations; alternatively, use the bias adjustment function within ss3sim to find appropriate values for the base-case EM and enter them in the appropriate lines.

  4. Set # F_Method to 3, which allows the model to use catches to estimate appropriate fishing mortality levels. The max harvest rate in the subsequent line will depend upon the fishing mortality levels in your simulation. An additional line must be inserted after the maximum harvest rate to specify the number of iterations used in the hybrid method (a value from 3 to 7).

  5. If it exists and is not already commented out, comment out the second line entitled #_env/block/dev_adjust_method underneath the section which specifies selectivity parameters. If time-varying selectivity parameters are added using the change_tv() function, this line will be modified by the same function.

  6. Use the functions in the ss3sim package to change the estimation specification in the EM. E.g. change_e
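The sign convention for phases in step 1 can be sketched as a small helper that flips phases according to a list of parameters to estimate. This is illustrative Python, not ss3sim's change_e; the function name and the parameter labels in the example are hypothetical.

```python
def apply_estimation_plan(phases, estimate):
    """Return phases with positive signs for estimated parameters only.

    phases maps a parameter label to its current phase; estimate is the
    set of labels that should be estimated. All others are fixed.
    """
    updated = {}
    for name, phase in phases.items():
        if phase == 0:
            raise ValueError(f"phase for {name} must be non-zero")
        updated[name] = abs(phase) if name in estimate else -abs(phase)
    return updated


# Hypothetical labels and phases for illustration.
current = {"NatM": 3, "L_at_Amin": -2, "SR_LN(R0)": 1}
planned = apply_estimation_plan(current, estimate={"L_at_Amin", "SR_LN(R0)"})
```

The magnitude of each phase is kept so that the order in which parameters enter estimation is unchanged.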

2.3 Data file modifications

You can delete the .dat file from the EM model setup. The data.ss_new files produced when executing the OM contain the expected values of the OM population dynamics. The data to which the EM is fit need to be sampled with observation error from these expected values in order to mimic the random sampling process that occurs with real fisheries data. The ss3sim package provides three functions that carry out the random sampling process and generate .dat files to be used in the EM. See the Introduction vignette for more details.
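The idea of sampling with observation error from expected values can be sketched as follows. This Python example is not the ss3sim sampling code; it assumes lognormal error for an index of abundance and multinomial sampling for composition data, and the function names and input values are hypothetical.

```python
import math
import random


def draw_index(expected, sd_log, rng):
    """Draw index observations with lognormal error around expected values.

    Subtracting sd_log**2 / 2 on the log scale keeps the mean of the
    draws equal to the expected value.
    """
    return [rng.lognormvariate(math.log(x) - sd_log ** 2 / 2, sd_log)
            for x in expected]


def draw_composition(expected_props, n_samples, rng):
    """Draw a multinomial sample of n_samples fish across bins."""
    counts = [0] * len(expected_props)
    bins = range(len(expected_props))
    for b in rng.choices(bins, weights=expected_props, k=n_samples):
        counts[b] += 1
    return counts


rng = random.Random(42)  # fixed seed so an iteration can be reproduced
index_obs = draw_index([100.0, 120.0, 90.0], sd_log=0.2, rng=rng)
comp_obs = draw_composition([0.2, 0.3, 0.5], n_samples=100, rng=rng)
```

Passing an explicit random number generator makes each simulated data set reproducible, which helps when comparing the OM expected values with the sampled EM input.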

2.4 Testing the new estimation model

After completing the above steps, run the model manually for a single iteration. Verify that the data are read in correctly and that the expected values of the population dynamics are written to the .dat files and are sensible. Verify that the EM loads the data properly and that the objective function value (negative log-likelihood) is sensible. If it works correctly, try running deterministic cases on the model (see the Introduction vignette) and further verify that the ss3sim functions that modify the EM (e.g., change_e) act correctly on the model setup. The help files for the functions demonstrate how to use the functions to test models. Note that the OM will not be a valid SS3 model in the sense that ADMB cannot run and produce maximum likelihood estimates of parameters; it is intended to be run for only one iteration to generate the population dynamics using values specified in the input files. It is possible that some of the functions will not work perfectly with the new model setups. In this case, it may be necessary to modify the ss3sim functions to be compatible with the new OM and EM.