Multi-Fidelity MOSMaFS

Martin Binder

Multi-Fidelity

An introduction to multi-fidelity is given in J. E. Fieldsend and R. M. Everson, The Rolling Tide Evolutionary Algorithm: A Multiobjective Optimizer for Noisy Optimization Problems. In its current state, mosmafs supports two ways of performing multi-fidelity optimization: Selected by generation, and selected by dominance. Multi-fidelity by generation is simply performed by changing the fidelity of the objective function after a given number of generations. Multi-fidelity by dominance is performed by evaluating a point with low fidelity first, and then enabling high fidelity if the first evaluation suggests that the point is not dominated by any previous result.

Preparation

This vignette starts where the previous vignette leaves off and expects the following preparation:

devtools::install_github("jakobbossek/ecr2")
library("ecr")
library("magrittr")
library("ggplot2")
library("ParamHelpers")
library("mlr")
library("mlrCPO")
library("mosmafs")
task.whole <- create.hypersphere.data(3, 2000) %>%
  create.classif.task(id = "sphere") %>%
  task.add.permuted.cols(10)
rows.whole <- sample(2000)
task <- subsetTask(task.whole, rows.whole[1:500])
task.hout <- subsetTask(task.whole, rows.whole[501:2000])

lrn <- makeLearner("classif.rpart", maxsurrogate = 0)

ps.simple <- pSS(
  maxdepth: integer[1, 30],
  minsplit: integer[2, 30],
  cp: numeric[0.001, 0.999])

fitness.fun.simple <- makeObjective(lrn, task, ps.simple, cv5,
  holdout.data = task.hout)

ps.objective <- getParamSet(fitness.fun.simple)

mutator.simple <- combine.operators(ps.objective,
  numeric = ecr::setup(mutGauss, sdev = 0.1),
  integer = ecr::setup(mutGaussInt, sdev = 3),
  selector.selection = mutBitflipCHW)

crossover.simple <- combine.operators(ps.objective,
  numeric = recPCrossover,
  integer = recPCrossover,
  selector.selection = recPCrossover)

initials <- sampleValues(ps.objective, 32, discrete.names = TRUE)

Fidelity Argument

An objective function optimized with slickEcr() may have a fidelity argument which should choose the fidelity at which the function is evaluated. It can take any numeric value chosen (at another point) by the user, but it should make sense to take a weighted mean of results by this fidelity:

(obj(x, fidelity = a) * a + obj(x, fidelity = b) * b) / (a + b)

A sensible usage of fidelity is to choose the number of resampling iterations through it.

The makeObjective() function will create a multi-fidelity compatible objective function if its resampling argument is a function, mapping from numeric(1) to a resampling object. The results for different fidelities should usually not be subsets of one another, because the evaluation for different fidelities is sometimes averaged over, which can lead to over-emphasis of some resampling folds.

nRes <- function(n) {
  makeResampleDesc("Subsample", split = 0.9, iters = n)
}

We can use this function to create a multi-fidelity fitness function:

fitness.fun <- makeObjective(lrn, task, ps.simple, nRes, holdout.data = task.hout)

formals(fitness.fun)
#> $args
#> 
#> 
#> $fidelity
#> NULL
#> 
#> $holdout
#> [1] FALSE

Generation-Wise Multi-Fidelity

The slickEcr() function accepts the fidelity argument, which must be a data.frame with two or three columns. For generation-wise multi-fidelity, we give it a data.frame with two columns, with the first column indicating the generation at which a certain fidelity should be used, and the second column containing the fidelity to use. To use fidelity 1 for the first five generations, then fidelity 3, for another five generations, and finally 5 for the last five, the data.frame would be

fidelity <- data.frame(
    c(1, 6, 11),
    c(1, 3, 5))
print(fidelity)
#>   c.1..6..11. c.1..3..5.
#> 1           1          1
#> 2           6          3
#> 3          11          5

This is given to slickEcr():

run.gen.mufi <- slickEcr(
    fitness.fun = fitness.fun,
    lambda = 16,
    population = initials,
    mutator = mutator.simple,
    recombinator = crossover.simple,
    generations = 15,
    fidelity = fidelity)

The plot of resulting pareto-fronts notably has later generation’s pareto fronts seemingly dominated by individuals from earlier generations. This is because in the log-object, the fitness of the first generations was evaluated using the low fidelity of these generations. In later generations, these points were re-evaluated using the larger fidelity.

plot_fronts <- function(run) {
  fronts <- fitnesses(run, function(x) paretoEdges(x, c(1, 1)))
  ggplot(data = fronts, aes(x = perf, y = propfeat, color = ordered(gen))) +
    geom_line() +
    geom_point(data = fronts[fronts$point, ], shape = "x", size = 5) +
    xlim(0, 1) +
    ylim(0, 1) +
    coord_fixed()
}

plot_fronts(run.gen.mufi)

The fidelity that was used for each individuum can be extracted from its "fidelity" attribute. It can be accessed using the attr() method or the popAggregate() utility function. The runtime will scale approximately linearly (with an added constant overhead) in this case. The sum of all fidelity options used for each generation can also be inspected using the log.newinds logging object–it may represent a proxy for the computational ressources that were used. Note how the generations with changing fidelity are present twice: for re-evaluation with new fidelity, and for ordinary evaluation of new individuals.

populations <- getPopulations(run.gen.mufi$log)

ind1.gen1 <- populations[[1]]$population[[1]]
attr(ind1.gen1, "fidelity")
#> [1] 1
attr(ind1.gen1, "runtime")
#>    elapsed 
#> 0.09830184

ind1.gen7 <- populations[[7]]$population[[1]]
attr(ind1.gen7, "fidelity")
#> [1] 2
attr(ind1.gen7, "runtime")
#>   elapsed 
#> 0.1270655

ind1.gen15 <- populations[[15]]$population[[1]]
attr(ind1.gen15, "fidelity")
#> [1] 3
attr(ind1.gen15, "runtime")
#>   elapsed 
#> 0.1710709

getStatistics(run.gen.mufi$log.newinds)
#>    gen runtime.mean runtime.sum fidelity.sum size      population
#> 1    0    0.1124612    3.598758           32   32            init
#> 2    1    0.1036838    1.658940           16   16       offspring
#> 3    2    0.1055486    1.688778           16   16       offspring
#> 4    3    0.1000582    1.600931           16   16       offspring
#> 5    4    0.1059207    1.694731           16   16       offspring
#> 6    5    0.1038469    1.661550           16   16       offspring
#> 7    6    0.1296874    4.149995           64   32 fidelity.reeval
#> 8    6    0.1331626    2.130601           32   16       offspring
#> 9    7    0.1429175    2.286680           32   16       offspring
#> 10   8    0.1431344    2.290150           32   16       offspring
#> 11   9    0.1340155    2.144248           32   16       offspring
#> 12  10    0.1303563    2.085701           32   16       offspring
#> 13  11    0.1781437    5.700597           96   32 fidelity.reeval
#> 14  11    0.1716709    2.746734           48   16       offspring
#> 15  12    0.1737341    2.779745           48   16       offspring
#> 16  13    0.1701880    2.723008           48   16       offspring
#> 17  14    0.1804589    2.887343           48   16       offspring
#> 18  15    0.1765593    2.824949           48   16       offspring

The cumulative fidelity up to each generation can be computed using the collectResult() function. This makes it possible to analyse progress up to a certain point of computational expenditure.

cres <- collectResult(run.gen.mufi)
ggplot(cres, aes(x = cum.fid, y = eval.domHV, color = "Training")) +
  geom_line() + geom_point(size = 1) +
  geom_point(data = cres[cres$fid.reeval, ], shape = "x", size = 5) +
  geom_line(aes(x = cum.fid, y = true.hout.domHV, color = "Holdout")) +
  ylim(0, 1)

The “x” mark the points at which fidelity reevaluation occurred: There a jump downward in hypervolume is possible. Also, the plot shows a large jump in cumulative fidelity before each reevaluation, because the whole population is reevaluated at these points.

Multi-Fidelity by Dominance

The slickEcr() fidelity argument accepts data.frames with three columns in the case that different fidelity should be used for points that lie on the pareto-front than those that, in a first evaluation, are dominated. This can be combined with generation-wise multi-fidelity, but our first example will only have one row. It evaluates simple holdout-resampling for each point, and, if the result seems to be better than previous evaluations with the same number of features, re-does the resampling with ten-times repeated holdout resampling, which results in overall 11 evaluations for this point.

fidelity <- data.frame(1, 1, 10)
print(fidelity)
#>   X1 X1.1 X10
#> 1  1    1  10
run.dom.mufi <- slickEcr(
    fitness.fun = fitness.fun,
    lambda = 16,
    population = initials,
    mutator = mutator.simple,
    recombinator = crossover.simple,
    generations = 15,
    fidelity = fidelity)
plot_fronts(run.dom.mufi)

The log.newinds object will again provide information about the aggregated runtime and fidelity used. Some generations perform more high-fidelity evaluations than others (and consequently have longer runtime).

getStatistics(run.dom.mufi$log.newinds)
#>    gen runtime.mean runtime.sum fidelity.sum size population
#> 1    0   0.13693718    4.381990           64   32       init
#> 2    1   0.11109892    1.777583           17   16  offspring
#> 3    2   0.12187480    1.949997           26   16  offspring
#> 4    3   0.10250596    1.640095           17   16  offspring
#> 5    4   0.10021521    1.603443           17   16  offspring
#> 6    5   0.09373616    1.499779           16   16  offspring
#> 7    6   0.10757998    1.721280           20   16  offspring
#> 8    7   0.10915497    1.746480           16   16  offspring
#> 9    8   0.09913886    1.586222           18   16  offspring
#> 10   9   0.10062406    1.609985           17   16  offspring
#> 11  10   0.10009846    1.601575           16   16  offspring
#> 12  11   0.10105343    1.616855           16   16  offspring
#> 13  12   0.12216897    1.954704           20   16  offspring
#> 14  13   0.10537456    1.685993           17   16  offspring
#> 15  14   0.10251513    1.640242           18   16  offspring
#> 16  15   0.10518360    1.682938           18   16  offspring

Use the popAggregate() function to get collected information about fidelity in individuals in a generation. The following shows the fidelities used in generation 2:

popAggregate(run.dom.mufi$log, "fidelity")[[2]]
#>  [1] 2 2 2 1 2 2 2 2 2 2 2 1 2 2 1 2 2 2 1 1 2 2 2 1 1 1 2 2 2 2 2 1

Note that, the fidelity is 11 for some candidates, as they were promising after one holdout resampling and were additionally reevaluated with ten-times repeated holdout resampling.

Individual evaluation fidelity can be retrieved, again, from a logged individual’s attributes. The following plots the pareto-front of generation 3, and the individuals sampled for generation 4. All individuals sampled with high fidelity lie close to the pareto-front, but some may, after high-fidelity evaluation, have turned out to actually be dominated by the old generation.

gen.i <- 3
logobj <- run.dom.mufi$log.newinds
stat <- getStatistics(logobj)
pop <- popAggregate(logobj, c("fidelity", "fitness"), data.frame = TRUE)
ngen.info <- pop[[which(stat$gen == gen.i + 1)]]
front <- fitnesses(run.dom.mufi, function(x) paretoEdges(x, c(1, 1))) %>%
    subset(gen == gen.i)
new.front <- fitnesses(run.dom.mufi, function(x) paretoEdges(x, c(1, 1))) %>%
    subset(gen == gen.i + 1)


ggplot() +
  geom_line(data = front, aes(x = perf, y = propfeat, linetype = "Gen3 Front")) +
  geom_line(data = new.front, aes(x = perf, y = propfeat, linetype = "Gen4 Front")) +
  geom_point(data = ngen.info,
    aes(x = fitness.perf, y = fitness.propfeat,
      shape = as.factor(fidelity),
      color = as.factor(fidelity),
      size = 3)) +
  scale_size_identity() +
  guides(colour = guide_legend(override.aes = list(size=3))) +
  xlim(0, 1) +
  ylim(0, 1) +
  coord_fixed()

All of the Above

The two approaches can, of course, be combined. It should be noted that the initial generation, as well as each generation after a fidelity-jump, is evaluated with the “high” fidelity column number. The following, for example, evaluates generation 1 with fidelity 6, generation 6 with fidelity 11, and generation 11 with fidelity 15. In between, new points are evaluated with low fidelity first and then re-evaluated on high-fidelity if they seem to not be dominated by the previous generation. Because of this, fidelity-jumps tend to be relatively computationally expensive and should be done sparingly if at all. If unbiased.fidelity is set (the default), only an upward step in total fidelity of pareto set elements (i.e. sum of 2nd and 3rd column if three columns are given, otherwise just value in 2nd column) leads to a re-evaluation of all fidelity. It should be set whenever the expectation value of evaluated fitness does not change with changing fidelity between generations (note that it is strongly recommended that evaluations with the two fidelities in each row have the same or very close expectation values).

fidelity <- data.frame(
    c(1, 3, 6, 11),
    c(1, 2, 3, 5),
    c(5, 4, 8, 10))
print(fidelity)
#>   c.1..3..6..11. c.1..2..3..5. c.5..4..8..10.
#> 1              1             1              5
#> 2              3             2              4
#> 3              6             3              8
#> 4             11             5             10
run.all.mufi <- slickEcr(
    fitness.fun = fitness.fun,
    lambda = 16,
    population = initials,
    mutator = mutator.simple,
    recombinator = crossover.simple,
    generations = 15,
    fidelity = fidelity)
getStatistics(run.all.mufi$log.newinds)
#>    gen runtime.mean runtime.sum fidelity.sum size      population
#> 1    0    0.1208761    3.868035           64   32            init
#> 2    1    0.1159971    1.855953           23   16       offspring
#> 3    2    0.1063414    1.701462           17   16       offspring
#> 4    3    0.1594038    5.100920           96   32 fidelity.reeval
#> 5    3    0.1629898    2.607836           43   16       offspring
#> 6    4    0.1428655    2.285847           34   16       offspring
#> 7    5    0.1795496    2.872793           43   16       offspring
#> 8    6    0.2003337    6.410680          128   32 fidelity.reeval
#> 9    6    0.1804435    2.887095           49   16       offspring
#> 10   7    0.2069931    3.311890           62   16       offspring
#> 11   8    0.1935673    3.097077           49   16       offspring
#> 12   9    0.1926168    3.081868           49   16       offspring
#> 13  10    0.1815628    2.905004           48   16       offspring
#> 14  11    0.2338018    7.481658          160   32 fidelity.reeval
#> 15  11    0.2066598    3.306557           65   16       offspring
#> 16  12    0.2153423    3.445477           67   16       offspring
#> 17  13    0.2243985    3.590376           68   16       offspring
#> 18  14    0.2243817    3.590108           69   16       offspring
#> 19  15    0.1835168    2.936269           64   16       offspring
plot_fronts(run.all.mufi)

Run Continuation

When continuing finished runs using the continueEcr() functionality, it is possible to submit a new fidelity data.frame to change fidelity behaviour. Note that generations continue to count from where previous evaluations left off. Therefore, in the following example, the entry for generation 1 is ignored, the fidelity used for all new offspring is 20 and rises to 25.

fidelity.new <- data.frame(
    c(1, 10, 17, 19),
    c(11, 20, 23, 25))
print(fidelity.new)
#>   c.1..10..17..19. c.11..20..23..25.
#> 1                1                11
#> 2               10                20
#> 3               17                23
#> 4               19                25
run.all.mufi.20 <- continueEcr(run.all.mufi, 5,
  lambda = 8, fidelity = fidelity.new)
getStatistics(run.all.mufi.20$log.newinds)
#>    gen runtime.mean runtime.sum fidelity.sum size      population
#> 1    0    0.1355579    4.337852           64   32            init
#> 2    1    0.1264633    2.023412           23   16       offspring
#> 3    2    0.1024360    1.638975           17   16       offspring
#> 4    3    0.1621682    5.189381           96   32 fidelity.reeval
#> 5    3    0.1796969    2.875150           43   16       offspring
#> 6    4    0.1326706    2.122729           34   16       offspring
#> 7    5    0.1434492    2.295188           43   16       offspring
#> 8    6    0.2037721    6.520706          128   32 fidelity.reeval
#> 9    6    0.1605284    2.568455           49   16       offspring
#> 10   7    0.2164037    3.462460           62   16       offspring
#> 11   8    0.1682461    2.691938           49   16       offspring
#> 12   9    0.1660740    2.657185           49   16       offspring
#> 13  10    0.1647918    2.636669           48   16       offspring
#> 14  11    0.2246896    7.190068          160   32 fidelity.reeval
#> 15  11    0.1933105    3.092968           65   16       offspring
#> 16  12    0.1903386    3.045417           67   16       offspring
#> 17  13    0.2225879    3.561407           68   16       offspring
#> 18  14    0.2064031    3.302449           69   16       offspring
#> 19  15    0.2125515    3.400824           64   16       offspring
#> 20  16    0.7158490   22.907168          640   32 fidelity.reeval
#> 21  16    0.7895922    6.316738          160    8       offspring
#> 22  17    0.8323791   26.636130          736   32 fidelity.reeval
#> 23  17    0.9035392    7.228314          184    8       offspring
#> 24  18    0.8796759    7.037407          184    8       offspring
#> 25  19    0.7904455   25.294254          800   32 fidelity.reeval
#> 26  19    0.8411839    6.729471          200    8       offspring
#> 27  20    0.9458693    7.566954          200    8       offspring
plot_fronts(run.all.mufi.20)