This vignette is an introduction to performing density estimation in **mlr3proba**.

Density estimation is the learning task of finding the unknown distribution from which an i.i.d. data set is generated. We interpret this broadly: the distribution need not be continuous, so it may have a probability mass function rather than a density. The conditional case, where a distribution is predicted conditional on covariates, is known as ‘probabilistic supervised regression’ and will be implemented in **mlr3proba** in the near future. In **mlr3proba**, (unconditional) density estimation is viewed as an unsupervised task, whereas probabilistic supervised regression (or conditional density estimation) is a supervised task.

Unconditional density estimation is an unsupervised method. Hence, `TaskDens` is an unsupervised task which inherits directly from `Task`, unlike `TaskClassif` and `TaskRegr`. However, `TaskDens` still has a `target` and a `$truth` field, defined by:

- `target`: the variable for which to estimate the density
- `truth`: the observed values of the `target` variable. This is *not* the true density, which is always unknown.

```
library(mlr3proba); library(mlr3)
#> Registered S3 methods overwritten by 'mlr3proba':
#> method from
#> as.data.table.PredictionRegr mlr3
#> c.PredictionRegr mlr3
#>
#> Attaching package: 'mlr3'
#> The following object is masked from 'package:mlr3proba':
#>
#> PredictionRegr
task = TaskDens$new(id = "mpg", backend = datasets::mtcars, target = "mpg")
task
#> <TaskDens:mpg> (32 x 11)
#> * Target: mpg
#> * Properties: -
#> * Features (10):
#> - dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
task$truth()[1:10]
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
```

Density learners have `train` and `predict` methods, though, being unsupervised, ‘prediction’ is actually ‘estimation’. Training creates a distr6 object; see the distr6 documentation for full tutorials on how to access the `pdf`, `cdf`, and other important fields and methods. The predict method is simply a wrapper around `self$model$pdf` and, if available, `self$model$cdf`, i.e. it evaluates the pdf/cdf at given points. Note that in prediction the points at which the pdf and cdf are evaluated are determined by the `target` column of the `TaskDens` object used for testing.

```
# create task and learner
task_faithful = TaskDens$new(id = "eruptions", backend = datasets::faithful,
  target = "eruptions")
learner = lrn("dens.kde")
# train/test split
train_set = sample(task_faithful$nrow, 0.8 * task_faithful$nrow)
test_set = setdiff(seq_len(task_faithful$nrow), train_set)
# fitting KDE and model inspection
learner$train(task_faithful, row_ids = train_set)
learner$model
#> Norm_KDE
class(learner$model)
#> [1] "Distribution" "R6"
# make predictions for new data
prediction = learner$predict(task_faithful, row_ids = test_set)
```
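To make the ‘train builds a distribution, predict evaluates it’ idea concrete, the following is a minimal base-R sketch of a Gaussian kernel density estimate. It is not the implementation behind `dens.kde`: the rule-of-thumb bandwidth from `stats::bw.nrd0` and the helper names `kde_pdf` and `kde_cdf` are assumptions made purely for illustration.

```r
# Base-R sketch of a Gaussian KDE: "training" stores the data and a bandwidth,
# "prediction" evaluates the estimated pdf/cdf at the held-out target values.
set.seed(1)
x = datasets::faithful$eruptions
train_idx = sample(length(x), floor(0.8 * length(x)))
x_train = x[train_idx]
x_test = x[-train_idx]

# Bandwidth choice is an assumption; dens.kde may use a different default.
h = stats::bw.nrd0(x_train)

# Estimated pdf: average of normal densities centred at each training point.
kde_pdf = function(q) {
  vapply(q, function(qi) mean(stats::dnorm(qi, mean = x_train, sd = h)), numeric(1))
}
# Estimated cdf: average of the corresponding normal cdfs.
kde_cdf = function(q) {
  vapply(q, function(qi) mean(stats::pnorm(qi, mean = x_train, sd = h)), numeric(1))
}

# Evaluate at the test-set values of the target variable.
pdf_hat = kde_pdf(x_test)
cdf_hat = kde_cdf(x_test)
head(data.frame(truth = x_test, pdf = pdf_hat, cdf = cdf_hat))
```

As in the learner above, the test points are simply the observed target values of the held-out rows; no other features are used.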

Every `PredictionDens` object can estimate:

- `pdf`: probability density function

Some learners can estimate:

- `cdf`: cumulative distribution function

```
prediction
#> <PredictionDens> for 55 observations:
#> row_id truth pdf
#> 3 3.333 0.1094527
#> 8 3.600 0.2057676
#> 11 1.833 0.3015347
#> ---
#> 241 4.150 0.4651316
#> 242 2.350 0.2310265
#> 272 4.467 0.4800183
# `pdf` is evaluated using the `log-loss`
prediction$score()
#> dens.logloss
#> 1.145351
```