`riskyr` User Guide

“Solving a problem simply means representing it so as to make the solution transparent.”

(H.A. Simon)^{1}

What is the probability of a disease or clinical condition given a positive test result? This seems a simple and fairly common question, yet doctors, patients and medical students find it surprisingly difficult to answer.

Decades of research on probabilistic reasoning and risk literacy have shown that people struggle when information is expressed in terms of probabilities, but have no trouble understanding and processing the same information when it is expressed in terms of natural frequencies (see Gigerenzer and Hoffrage, 1995; Gigerenzer et al., 2007; Hoffrage et al., 2015; for overviews).

`riskyr` is a toolbox for rendering risk literacy more transparent by facilitating such changes in representation and offering multiple perspectives on the dynamic interplay between probabilities and frequencies. The main goal of `riskyr` is to provide a long-term boost in risk literacy by fostering competence in understanding statistical information in domains such as health, weather, and finances (Hertwig & Grüne-Yanoff, 2017).

This guide first illustrates a typical problem and then helps you solve it by viewing risk-related information in a variety of ways. It proceeds in three steps:

1. We will first present a typical problem in the probabilistic format that is commonly used in textbooks. This allows us to introduce some key probabilities, but also explains why both this problem and its traditional solution (via Bayes’ formula) remain opaque and are rightfully perceived as difficult.

2. We will then translate the problem into natural frequencies and show how this facilitates its comprehension and solution.

3. Finally, we show how `riskyr` renders the problem more transparent by providing three sets of tools:

   A. A fancy calculator that allows the computation of probabilities and frequencies;

   B. A set of functions that translate between different representational formats;

   C. A variety of visualizations that illustrate relationships between frequencies and probabilities.

A basic motivation for developing `riskyr` was to facilitate our understanding of problems like the following:

Mammography screening

The probability of breast cancer is 1% for a woman at age 40 who participates in routine screening.

If a woman has breast cancer, the probability is 80% that she will get a positive mammography.

If a woman does not have breast cancer, the probability is 9.6% that she will also get a positive mammography.

A woman in this age group had a positive mammography in a routine screening.

What is the probability that she actually has breast cancer?

(Hoffrage et al., 2015, p. 3)

Problems like this tend to appear in texts and tutorials on risk literacy and are ubiquitous in medical diagnostics. They typically provide some risk-related information (i.e., specific probabilities of some clinical condition and likelihoods of some decision or test of detecting its presence or absence) and ask for some other risk-related quantity. In the most basic type of scenario, we are given 3 essential probabilities:

- The *prevalence* of some target population (here: women at age 40) for some condition (breast cancer):

  `prev` = \(p(\mathrm{cancer}) = 1\%\)

- The *sensitivity* of some decision or diagnostic procedure (here: a mammography screening test), which is the conditional probability:

  `sens` = \(p(\mathrm{positive\ test}\ |\ \mathrm{cancer}) = 80\%\)

- The *false alarm rate* of this decision, diagnostic procedure or test, which is the conditional probability:

  `fart` = \(p( \mathrm{positive\ test}\ |\ \mathrm{no\ breast\ cancer} ) = 9.6\%\)

  and can also be expressed by its complement (aka the test’s *specificity*):

  `spec` = 1 – `fart` = \(p( \mathrm{negative\ test}\ |\ \mathrm{no\ cancer} ) = 90.4\%\)

The first challenge in solving this problem is to realize that the probability asked for is *not* the sensitivity `sens` (i.e., the probability of a positive test given cancer), but the *reversed* conditional probability (i.e., the probability of having cancer given a positive test). The clinical term for this quantity is the *positive predictive value* (`PPV`) or the test’s *precision*:

`PPV` = \(p( \mathrm{cancer}\ |\ \mathrm{positive\ test} )\) = ?

How can we compute the *positive predictive value* (`PPV`) from the information provided by the problem? In the following, we sketch three different paths to the solution.

One way to solve problems concerning conditional probabilities is to remember and apply Bayes’ formula (which is why such problems are often called problems of “Bayesian reasoning”):

\[ p(H|D) = \frac{p(H) \cdot p(D|H) } {p(H) \cdot p(D|H) + p(\neg H) \cdot p(D|\neg H) } \]

In our example, we are looking for the probability of breast cancer (\(H\)) given a positive mammography test (\(D\)):

\[ p(\mathrm{cancer}\ |\ \mathrm{positive\ test}) = \frac{p(\mathrm{cancer}) \cdot p(\mathrm{positive\ test}\ |\ \mathrm{cancer}) } {p(\mathrm{cancer}) \cdot p(\mathrm{positive\ test}\ |\ \mathrm{cancer}) + p(\mathrm{no\ cancer}) \cdot p(\mathrm{positive\ test}\ |\ \mathrm{no\ cancer}) } \]

By inserting the probabilities identified above and knowing that the probability for the absence of breast cancer in our target population is the complementary probability of its presence (i.e., \(p(\mathrm{no\ cancer}) = 1 -\) `prev` \(= 99\%\)), we obtain:

\[ p(\mathrm{cancer}\ |\ \mathrm{positive\ test}) = \frac{1\% \cdot 80\% } { 1\% \cdot 80\% + 99\% \cdot 9.6\% } \approx\ 7.8\%\]

Thus, the information above and a few basic mathematical calculations tell us that the likelihood of a woman in our target population with a positive mammography screening test actually having breast cancer (i.e., the `PPV` of this mammography screening test) is slightly below 8%.
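The arithmetic above can be checked in a few lines of base R. The following is a minimal sketch that does not use `riskyr`; the helper name `bayes_ppv` is our own assumption, chosen only for illustration.

```r
# Hypothetical helper (not a riskyr function): Bayes' formula for the PPV.
bayes_ppv <- function(prev, sens, fart) {
  true_pos  <- prev * sens           # p(cancer) * p(positive test | cancer)
  false_pos <- (1 - prev) * fart     # p(no cancer) * p(positive test | no cancer)
  true_pos / (true_pos + false_pos)  # p(cancer | positive test)
}

ppv <- bayes_ppv(prev = .01, sens = .80, fart = .096)
round(ppv, 3)  # slightly below 8%
```

Naming the numerator and the two summands of the denominator separately mirrors the structure of the formula and makes it harder to confuse the sensitivity with the quantity asked for.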

If you fail to find the Bayesian solution easy and straightforward, you are in good company: Even people who have studied and taught statistics find it difficult to think in these terms. Fortunately, researchers have found that a simple change in representation renders the same information much more transparent.

Consider the following problem description:

Mammography screening (`freq`)

10 out of every 1000 women at age 40 who participate in routine screening have breast cancer.

8 out of every 10 women with breast cancer will get a positive mammography.

95 out of every 990 women without breast cancer will also get a positive mammography.

Here is a new representative sample of women at age 40 who got a positive mammography in a routine screening.

How many of these women do you expect to actually have breast cancer?

(Hoffrage et al., 2015, p. 4)

Importantly, this version (freq) of the problem refers to a frequency of \(1000\) individuals of our original target population. It still provides the same probabilities as above, but specifies them in terms of *natural frequencies* (see Gigerenzer & Hoffrage, 1999, and Hoffrage et al., 2002, for clarifications of this concept):

- The *prevalence* of breast cancer in the target population:

  `prev` = \(p(\mathrm{cancer}) = \frac{10}{1000} (= 1\%)\)

- The *sensitivity* of the mammography screening test, which is the conditional probability:

  `sens` = \(p(\mathrm{positive\ test}\ |\ \mathrm{cancer}) = \frac{8}{10} (= 80\%)\)

- The test’s *false alarm rate*, which is the conditional probability:

  `fart` = \(p( \mathrm{positive\ test}\ |\ \mathrm{no\ breast\ cancer} ) = \frac{95}{990} (\approx\ 9.6\%)\)

  and can still be expressed by its complement (the test’s *specificity*):

  `spec` = 1 – `fart` = \(p( \mathrm{negative\ test}\ |\ \mathrm{no\ cancer} ) = \frac{990 - 95}{990} = \frac{895}{990} (\approx\ 90.4\%)\)

Rather than asking us to compute a conditional probability (i.e., the `PPV`), the task now prompts us to imagine a new representative sample of women from our target population and focuses on the women with a positive test result. It then asks for a *frequency*: “How many of these women” do we expect to have cancer?

To provide any answer in terms of frequencies, we need to imagine a specific sample size \(N\). As the problem referred to a population of \(1000\) women, we conveniently pick a sample size of \(N = 1000\) women with identical characteristics (which is suggested by mentioning a “representative” sample) and ask: How many women with a positive test result actually have cancer?^{2}

In this new sample, the frequency of women with cancer and with a positive test result should match the numbers of the original sample. Hence, we can assume that \(10\) out of \(1000\) women have cancer (`prev`) and \(8\) of the \(10\) women with cancer receive a positive test result (`sens`). Importantly, \(95\) out of the \(990\) women without cancer also receive a positive test result (`fart`). Thus, the number of women with a positive test result is \(8 + 95 = 103\), but only \(8\) of them actually have cancer. Of course, the ratio \(\frac{8}{103}\) agrees with our previous probability (of slightly below 7.8%). Incidentally, the reformulation in terms of frequencies protected us from erroneously taking the sensitivity (of `sens` = \(\frac{8}{10} = 80\%\)) as an estimate of the desired frequency. Whereas it is easy to confuse the term \(p( \mathrm{positive\ test}\ |\ \mathrm{cancer} )\) with \(p( \mathrm{cancer}\ |\ \mathrm{positive\ test} )\) when the task is expressed in terms of probabilities, it is clearly unreasonable to assume that about 800 of 1000 women (i.e., 80%) actually have cancer (since the prevalence in the population was specified to be 10 in 1000, i.e., 1%). Thus, reframing the problem in terms of frequencies made us immune to a typical mistake.
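The counting argument can be written down with plain integers. The sketch below uses base R only; all variable names are our own and merely echo the labels used in this guide.

```r
# Natural-frequency reasoning with plain counts (base R, no riskyr needed).
N          <- 1000           # imagined sample of women
cond_true  <- 10             # women with cancer (10 out of 1000)
cond_false <- N - cond_true  # women without cancer
hi <- 8                      # with cancer and a positive test (8 out of 10)
fa <- 95                     # without cancer but a positive test (95 out of 990)

dec_pos  <- hi + fa          # all women with a positive test
ppv_freq <- hi / dec_pos     # share of positive tests that are true positives
c(dec_pos = dec_pos, ppv_freq = round(ppv_freq, 4))
```

No conditional probability needs to be inverted here: the answer is simply the number of hits divided by the number of positive tests.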

`riskyr`

Reframing the probabilistic problem in terms of frequencies made its solution easier. This is neat and probably one of the best tricks in risk literacy education (as advocated by Gigerenzer & Hoffrage, 1995; Gigerenzer 2002; 2014). While it is good to have a way to *cope* with tricky problems, it would be even more desirable to *actually understand* the interplay between probabilities and frequencies in risk-related tasks and domains. This is where `riskyr` comes into play.^{3}

`riskyr` provides a set of basic risk literacy tools in R. As we have seen, the problems humans face when dealing with risk-related information are less of a *computational*, and more of a *representational* nature. As a statistical programming language, R is a pretty powerful computational tool, but for our present purposes it is more important that R is also great for designing and displaying aesthetic and informative visualizations. By applying these qualities to the task of training and instruction in risk literacy, `riskyr` is a toolbox that renders risk literacy education more transparent.

`riskyr` promotes a deeper understanding of risk-related information in three ways:^{4}

1. by organizing *data structures* and *computational functions* in useful ways;

2. by providing *translations* between probabilities and frequencies;

3. by providing transparent *visualizations* that illustrate relationships between variables and representations.

In the following, we show how we could address the above problem by using three types of tools provided by `riskyr`.

`riskyr` provides a set of functions that allows us to calculate various desired outputs (probabilities and frequencies) from given inputs (probabilities and frequencies). For instance, the following function computes the positive predictive value `PPV` from the 3 basic probabilities `prev`, `sens`, and `spec` (with `spec` = 1 – `fart`) that were provided in the original problem:

```
library("riskyr") # loads the package
#> Welcome to riskyr!
#> riskyr.guide() opens user guides.
comp_PPV(prev = .01, sens = .80, spec = (1 - .096))
#> [1] 0.07763975
```

It’s good to know that `riskyr` can apply Bayes’ formula, but so can any other basic calculator, including my brain on a good day with some environmental support in the form of paper and pencil. The R in `riskyr` only begins to make sense when considering functions like the following:

```
# Compute probabilities from 3 essential probabilities: # Input arguments:
p1 <- comp_prob_prob(prev = .01, sens = .80, spec = NA, fart = .096) # prev, sens, NA, fart
p2 <- comp_prob_prob(prev = .01, sens = .80, spec = .904, fart = NA) # prev, sens, spec, NA
p3 <- comp_prob_prob(prev = .01, sens = .80, spec = .904, fart = .096) # prev, sens, spec, fart
# Check equality of outputs:
all.equal(p1, p2)
#> [1] TRUE
all.equal(p2, p3)
#> [1] TRUE
```

The function `comp_prob_prob` computes probabilities from probabilities (hence its name). The probabilities provided need to include a prevalence `prev`, a sensitivity `sens`, and either the specificity `spec` or the false alarm rate `fart` (with `spec` = 1 – `fart`). The code above shows 3 different ways in which 3 of these “essential” probabilities can be provided (and hence the objects `p1`, `p2`, and `p3` are all equal to each other).

The probabilities computed from these “essential” probabilities include the `PPV`, which can be obtained by asking for `p1$PPV` = 0.0776398. But the object computed by `comp_prob_prob` is actually a list of 10 probabilities and can be inspected by printing `p1`:

```
p1
#> $prev
#> [1] 0.01
#>
#> $sens
#> [1] 0.8
#>
#> $mirt
#> [1] 0.2
#>
#> $spec
#> [1] 0.904
#>
#> $fart
#> [1] 0.096
#>
#> $ppod
#> [1] 0.10304
#>
#> $PPV
#> [1] 0.07763975
#>
#> $FDR
#> [1] 0.9223602
#>
#> $NPV
#> [1] 0.9977702
#>
#> $FOR
#> [1] 0.002229754
```

The list of probabilities computed includes the 3 essential probabilities (`prev`, `sens`, and `spec` or `fart`) and the desired probability (`p1$PPV` = 0.0776398), but also many other probabilities that may have been asked for instead. (See the vignette on data formats for details on these probabilities.)
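To see how the derived entries of this list relate to the essential ones, the values can be recomputed by hand. The following base R sketch is our own reconstruction from the printed numbers, not the code that `riskyr` runs internally; the comments give our reading of the abbreviations.

```r
# Essential inputs from the original problem:
prev <- .01
sens <- .80
fart <- .096

# Complements of the two conditional probabilities:
mirt <- 1 - sens  # miss rate
spec <- 1 - fart  # specificity

# Probability of a positive decision (test result):
ppod <- prev * sens + (1 - prev) * fart

# Predictive values and their complements:
PPV <- (prev * sens) / ppod
FDR <- 1 - PPV
NPV <- ((1 - prev) * spec) / (1 - ppod)
FOR <- 1 - NPV

round(c(mirt = mirt, spec = spec, ppod = ppod,
        PPV = PPV, FDR = FDR, NPV = NPV, FOR = FOR), 4)
```

Each derived value is either a complement of another probability or a ratio of joint probabilities, which is why 3 essential probabilities suffice to determine all 10.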

Incidentally, as R does not care whether probabilities are entered as decimal numbers or fractions, we can check whether the 2nd version of our problem (the version reframed in terms of frequencies) yields the same solution:

```
# Compute probabilities from 3 ratios of frequencies (probabilities): # Input arguments:
p4 <- comp_prob_prob(prev = 10/1000, sens = 8/10, spec = NA, fart = 95/990) # prev, sens, NA, fart
p4$PPV
#> [1] 0.0776699
```

This shows that the `PPV` computed in this version is only marginally different (`p4$PPV` = 0.0776699). More importantly, it is identical to the ratio \(\frac{8}{103}\) = 0.0776699.

Another function of `riskyr` is to translate between representational formats. This translation comes in two varieties:

```
# Compute frequencies from probabilities:
f1 <- comp_freq_prob(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000)
f2 <- comp_freq_prob(prev = 10/1000, sens = 8/10, spec = NA, fart = 95/990, N = 1000)
# Check equality of outputs:
all.equal(f1, f2)
#> [1] TRUE
```

By providing our original probabilities to the function `comp_freq_prob` we can compute a list of frequencies from probabilities (hence the name). To compute frequencies for the specific sample size of 1000 individuals, we need to provide `N = 1000` as an additional argument. As before, it does not matter whether the probabilities are supplied as decimal numbers or as ratios (as long as they actually *are* probabilities, i.e., numbers from 0 to 1).

As the ratio `fart` = 95/990 is not exactly equal to `fart` = .096 (but rather 95/990 ≈ 0.09596), the two versions of our problem actually vary by a bit. Here, the results `f1` and `f2` are only identical because the function `comp_freq_prob` rounds to the nearest integers by default. To compute more precise frequencies (that are no longer rounded to integers), use the `round = FALSE` argument:

```
# Compute frequencies from probabilities (without rounding):
f3 <- comp_freq_prob(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000, round = FALSE)
f4 <- comp_freq_prob(prev = 10/1000, sens = 8/10, spec = NA, fart = 95/990, N = 1000, round = FALSE)
# Check equality of outputs:
all.equal(f3, f4) # => shows slight differences in some frequencies:
#> [1] "Component \"dec.pos\": Mean relative difference: 0.0003881988"
#> [2] "Component \"dec.neg\": Mean relative difference: 4.459508e-05"
#> [3] "Component \"dec.cor\": Mean relative difference: 4.429875e-05"
#> [4] "Component \"dec.err\": Mean relative difference: 0.0004122012"
#> [5] "Component \"fa\": Mean relative difference: 0.0004208754"
#> [6] "Component \"cr\": Mean relative difference: 4.469473e-05"
```
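Where the small differences come from can be seen by computing the expected number of false alarms for both versions of `fart` by hand. This is a base R sketch of the arithmetic under our own variable names, not a description of how `comp_freq_prob` works internally.

```r
# Expected false alarms among the 990 women without cancer:
cond_false <- 990
fa_dec   <- cond_false * .096      # fart given as a decimal number
fa_ratio <- cond_false * (95/990)  # fart given as a ratio of frequencies

c(unrounded = fa_dec - fa_ratio,                # a small difference remains
  rounded   = round(fa_dec) - round(fa_ratio))  # it vanishes after rounding
```

The unrounded values differ by about 0.04 individuals. Relative to 95.04, this is roughly 0.00042, which corresponds to the mean relative difference reported for the component `fa` above.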

As before, the function `comp_freq_prob` does not compute only one frequency, but a list of 11 frequencies. Their names and values can be inspected by printing `f1`:

```
f1
#> $N
#> [1] 1000
#>
#> $cond.true
#> [1] 10
#>
#> $cond.false
#> [1] 990
#>
#> $dec.pos
#> [1] 103
#>
#> $dec.neg
#> [1] 897
#>
#> $dec.cor
#> [1] 903
#>
#> $dec.err
#> [1] 97
#>
#> $hi
#> [1] 8
#>
#> $mi
#> [1] 2
#>
#> $fa
#> [1] 95
#>
#> $cr
#> [1] 895
```

In this list, the sample of `N` = \(1000\) women is split into subgroups in 3 different ways (by condition, by decision, and by the correctness of the decision). For instance, the \(10\) women with cancer appear as `cond.true` cases, whereas the \(990\) without cancer are listed as `cond.false` cases. The \(8\) women with cancer and a positive test result appear as *hits* `hi` and the \(95\) women who receive a positive test result without having cancer are listed as *false alarms* `fa`. (See the vignette on data formats for details on all frequencies.)
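The relations among the 11 frequencies become explicit when the list is rebuilt from its last four entries. The sketch below uses base R only; the list name `freq_by_hand` is our own, and the sums are simply read off the printed values of `f1`.

```r
# The 4 essential frequencies (cells of the confusion matrix):
hi <- 8
mi <- 2
fa <- 95
cr <- 895

freq_by_hand <- list(
  N          = hi + mi + fa + cr,  # entire sample
  cond.true  = hi + mi,            # split by condition
  cond.false = fa + cr,
  dec.pos    = hi + fa,            # split by decision
  dec.neg    = mi + cr,
  dec.cor    = hi + cr,            # split by correctness of the decision
  dec.err    = mi + fa,
  hi = hi, mi = mi, fa = fa, cr = cr
)
unlist(freq_by_hand)
```

Every aggregate frequency is the sum of two of the four cells, so the four cells carry all the information in the list.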

A translator between two representational formats should work in both directions. Consequently, `riskyr` also allows us to compute probabilities by providing frequencies:

```
# Compute probabilities from frequencies:
p5 <- comp_prob_freq(hi = 8, mi = 2, fa = 95, cr = 895) # => provide 4 essential frequencies
```

Fortunately, the function `comp_prob_freq` does not require all 11 frequencies that were returned by `comp_freq_prob` and contained in the list of frequencies `f1`. Instead, we only need to provide `comp_prob_freq` with the 4 essential frequencies that were listed as `hi`, `mi`, `fa`, and `cr` in `f1`. The resulting probabilities (saved in `p5`) match our list of probabilities from above (saved in `p4`):

```
# Check equality of outputs:
all.equal(p5, p4)
#> [1] TRUE
```
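Why 4 frequencies are enough can be illustrated by computing some key probabilities as simple ratios. This base R sketch is our own illustration of the definitions, not the implementation of `comp_prob_freq`; the vector name `prob_by_hand` is hypothetical.

```r
# The 4 essential frequencies:
hi <- 8
mi <- 2
fa <- 95
cr <- 895
N  <- hi + mi + fa + cr

prob_by_hand <- c(
  prev = (hi + mi) / N,   # cases with the condition / all cases
  sens = hi / (hi + mi),  # hits / cases with the condition
  spec = cr / (fa + cr),  # correct rejections / cases without the condition
  PPV  = hi / (hi + fa)   # hits / positive decisions
)
round(prob_by_hand, 4)
```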

More generally, when we translate between formats twice — first from probabilities to frequencies and then from the resulting frequencies to probabilities — the original probabilities appear again:

```
# Pick 3 random probability inputs:
rand.p <- runif(n = 3, min = 0, max = 1)
rand.p
#> [1] 0.4077872 0.3356049 0.3711084
# Translation 1: Compute frequencies from probabilities:
freq <- comp_freq_prob(prev = rand.p[1], sens = rand.p[2], spec = rand.p[3], round = FALSE) # without rounding!
# Translation 2: Compute probabilities from frequencies:
prob <- comp_prob_freq(hi = freq$hi, mi = freq$mi, fa = freq$fa, cr = freq$cr)
# Verify that results match original probabilities:
all.equal(prob$prev, rand.p[1])
#> [1] TRUE
all.equal(prob$sens, rand.p[2])
#> [1] TRUE
all.equal(prob$spec, rand.p[3])
#> [1] TRUE
```

Similarly, going full circle from frequencies to probabilities and back returns the original frequencies:

```
# Pick 4 random frequencies:
rand.f <- round(runif(n = 4, min = 0, max = 10^3), 0)
rand.f
#> [1] 500 971 924 499
# sum(rand.f)
# Translation 1: Compute probabilities from frequencies:
prob <- comp_prob_freq(hi = rand.f[1], mi = rand.f[2], fa = rand.f[3], cr = rand.f[4])
# prob
# Translation 2: Compute frequencies from probabilities (for the original population size N):
freq <- comp_freq_prob(prev = prob$prev, sens = prob$sens, spec = prob$spec, N = sum(rand.f), round = FALSE) # without rounding!
# freq
# Verify that results match original frequencies:
all.equal(freq$hi, rand.f[1])
#> [1] TRUE
all.equal(freq$mi, rand.f[2])
#> [1] TRUE
all.equal(freq$fa, rand.f[3])
#> [1] TRUE
all.equal(freq$cr, rand.f[4])
#> [1] TRUE
```

To obtain the same results when translating back and forth between probabilities and frequencies, it is important to switch off rounding when computing frequencies from probabilities with `comp_freq_prob`. Similarly, we need to scale the computed frequencies to the original population size `N` to arrive at the original frequencies.

Inspecting the lists of probabilities and frequencies shows that the two problem formulations cited above are only two possible instances out of an array of many alternative formulations. Essentially, the same scenario can be described in a variety of variables and formats. Gaining deeper insights into the interplay between these variables requires a solid understanding of the underlying concepts and their mathematical definitions. To facilitate the development of such an understanding, `riskyr` recruits the power of visual representations and shows the same scenario from a variety of angles and perspectives. It is mostly this graphical functionality that supports `riskyr`’s claim to be a toolbox for rendering risk literacy more transparent. Thus, in addition to being a fancy calculator and a translator between formats, `riskyr` is mostly a machine that turns risk-related information into pretty pictures.

`riskyr` provides many alternative visualizations that depict the same risk-related scenario in the form of different representations. As each type of graphic has its own properties and perspective (strengths that emphasize or illuminate some particular aspect and weaknesses that hide or obscure others), the different visualizations are somewhat redundant, yet complement and support each other.

Here are some examples that depict the scenario described above:

A straightforward way of plotting an entire population of individuals is provided by an icon array that represents each individual as a symbol which is color-coded:

```
plot_icons(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
icon.types = c(21, 21, 22, 22),
title.lbl = "Mammography screening")
```

Perhaps the most intuitive visualization of the relationships between probability and frequency information in our above scenario is provided by a tree diagram that shows the population and the frequency of subgroups as its nodes and the probabilities as its edges:

```
plot_tree(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
title.lbl = "Mammography screening")
```

Importantly, the `plot_tree` function is called with the same 3 essential probabilities (`prev`, `sens`, and `spec` or `fart`) and 1 frequency (the number of individuals `N` of our sample or population). But in addition to computing risk-related information (e.g., the number of individuals in each of the 4 subgroups at the 2nd level of the tree), the tree diagram visualizes crucial dependencies and relationships between concepts and quantities. For instance, the diagram illustrates that the number of true positives (`hi`) depends on both the condition’s prevalence (`prev`) and the decision’s sensitivity (`sens`), or that the decision’s specificity `spec` can be expressed and computed as the ratio of the number of true negatives (`cr`) to the number of unaffected individuals (`cond.false` cases).
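With the numbers of our example, these two dependencies can be written out as a short worked calculation (using the rounded frequencies listed in `f1`):

\[ \mathrm{hi} = N \cdot \mathrm{prev} \cdot \mathrm{sens} = 1000 \cdot 0.01 \cdot 0.80 = 8 \]

\[ \mathrm{spec} = \frac{\mathrm{cr}}{\mathrm{cond.false}} = \frac{895}{990} \approx 0.904 \]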

For details and additional options of the `plot_tree` function, see the documentation of `?plot_tree`.

An alternative way to split a group of individuals into subgroups depicts the population as a square and dissects it into various rectangles that represent parts of the population. In the following mosaic plot, the relative proportions of rectangle sizes represent the relative frequencies of the corresponding subgroups:

```
plot_mosaic(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
title.lbl = "Mammography screening")
```

The vertical split dissects the population into two subgroups that correspond to the frequency of `cond.true` and `cond.false` cases in the tree diagram above. The `prev` value of 1% yields a slim vertical rectangle on the left.

For details and additional options of the `plot_mosaic` function, see the documentation of `?plot_mosaic`.

Both the tree diagram and the mosaic plot shown above adopted a particular perspective by splitting the population into 2 subgroups by condition (via the default option `by = "cd"`). Rather than emphasizing the difference between `cond.true` and `cond.false` cases, an alternative perspective could ask: How many people are detected as positive vs. negative by the test? By using the option `by = "dc"`, the tree diagram first splits the population into `dec.pos` and `dec.neg` cases:

```
plot_tree(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
by = "dc",
title.lbl = "Mammography screening",
dec.pos.lbl = "positive test",
dec.neg.lbl = "negative test")
```

Similarly, the population area of the mosaic plot can be split horizontally by using the option `vsplit = FALSE`:

```
plot_mosaic(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
vsplit = FALSE,
title.lbl = "Mammography screening")
```

`riskyr` uses a consistent color scheme to represent the same subgroups across different graphs. If this color coding is not sufficient, plotting the tree diagram with the option `area = "hr"` further highlights the correspondence by representing the relative frequencies of subgroups by the proportions of rectangles:

```
plot_tree(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
by = "dc",
area = "hr",
title.lbl = "Mammography screening",
dec.pos.lbl = "positive test",
dec.neg.lbl = "negative test")
```

Incidentally, as both an icon array and a mosaic plot depict probability by area size, both representations can be translated into each other. This is still visible when relaxing the positional constraint of icons in the icon array:

```
plot_icons(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000, block.d = 0.01,
type = "mosaic",
icon.types = c(21, 21, 22, 22),
title.lbl = "Mammography screening")
```

Can you spot cases of hits (true positives) and misses (false negatives)? (Hint: Their frequency is 8 and 2, respectively.)

The following network diagram is a generalization of the tree diagram. It plots all 9 different frequencies (computed by `comp_freq_prob` and `comp_freq_freq` and contained in `freq`) as nodes of a single graph and depicts all 10 probabilities (computed by `comp_prob_prob` and `comp_prob_freq` and contained in `prob`) as edges between these nodes. Thus, the network diagram integrates both perspectives of the above tree diagrams:

```
plot_fnet(prev = .01, sens = .80, spec = NA, fart = .096, N = 1000,
title.lbl = "Mammography screening")
```

In addition to showing the interplay between all key frequencies and probabilities, the network diagram notes accuracy metrics that are based on the confusion matrix, which is depicted by the 4 central nodes in the middle row (`hi`, `mi`, `fa`, and `cr`).
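As one example of a metric based on these 4 cells, overall accuracy is the share of correct decisions. The base R sketch below is our own illustration; which metrics `plot_fnet` actually displays is not specified here, so treat the choice of metric as an assumption.

```r
# The 4 cells of the confusion matrix:
hi <- 8
mi <- 2
fa <- 95
cr <- 895
N  <- hi + mi + fa + cr

# Overall accuracy: correct decisions (hits and correct rejections) / all cases.
acc <- (hi + cr) / N
acc
```

The numerator equals the frequency `dec.cor` from the list `f1`.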

For details and additional options of the `plot_fnet` function, see the documentation of `?plot_fnet`.

Gigerenzer, G. (2002). *Reckoning with risk: Learning to live with uncertainty*. London, UK: Penguin.

Gigerenzer, G. (2014). *Risk savvy: How to make good decisions*. New York, NY: Penguin.

Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L., & Woloshin, S. (2007). Helping doctors and patients make sense of health statistics. *Psychological Science in the Public Interest*, *8*, 53–96.

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. *Psychological Review*, *102*, 684–704.

Gigerenzer, G., & Hoffrage, U. (1999). Overcoming difficulties in Bayesian reasoning: A reply to Lewis and Keren (1999) and Mellers and McGraw (1999). *Psychological Review*, *106*, 425–430.

Hertwig, R., & Grüne-Yanoff, T. (2017). Nudging and boosting: Steering or empowering good decisions. *Perspectives on Psychological Science*, *12*, 973–986.

Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002). Representation facilitates reasoning: What natural frequencies are and what they are not. *Cognition*, *84*, 343–352.

Hoffrage, U., Krauss, S., Martignon, L., & Gigerenzer, G. (2015). Natural frequencies improve Bayesian reasoning in simple and complex inference tasks. *Frontiers in Psychology*, *6*, 1473.

Hoffrage, U., Lindsey, S., Hertwig, R., & Gigerenzer, G. (2000). Communicating statistical information. *Science*, *290*, 2261–2262.

Kurzenhäuser, S., & Hoffrage, U. (2002). Teaching Bayesian reasoning: An evaluation of a classroom tutorial for medical students. *Medical Teacher*, *24*, 516–521.

Kurz-Milcke, E., Gigerenzer, G., & Martignon, L. (2008). Transparency in risk communication. *Annals of the New York Academy of Sciences*, *1128*, 18–28.

Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. *Journal of Experimental Psychology: General*, *130*, 380–400.

Strevens, M. (2013). *Tychomancy: Inferring probability from causal structure*. Cambridge, MA: Harvard University Press.

We appreciate your feedback, comments, or questions.

Please report any `riskyr`-related issues at https://github.com/hneth/riskyr/issues.

For general inquiries, please email us at contact.riskyr@gmail.com.

`riskyr` Vignettes

Nr. | Vignette | Content |
---|---|---|
A. | User guide | Motivation and general instructions |
B. | Data formats | Data formats: Frequencies and probabilities |
C. | Confusion matrix | Confusion matrix and accuracy metrics |
D. | Functional perspectives | Adopting functional perspectives |
E. | Quick start primer | Quick start primer |

^{1} Simon, H.A. (1996). *The Sciences of the Artificial* (3rd ed.). The MIT Press, Cambridge, MA. (p. 132).

^{2} The actual sample size `N` chosen is irrelevant, but the numbers are easier to calculate when `N` is a round number and at least as large as the frequencies mentioned in the problem.

^{3} Full disclosure: As former students of Gerd Gigerenzer, we think that his recommendations are insightful, convincing, and correct. However, while expressing probabilities in terms of natural frequencies promotes a better understanding of risks, it does not automatically lead to a better understanding of conditional probabilities per se. `riskyr` extends beyond mere translations between representational formats by showing the interplay between frequencies and probabilities in a variety of ways.

^{4} The `riskyr` logo (showing three facets of a die) also represents its functionality in a variety of ways:

1. First, each facet provides a frequency (e.g., the number 3), though a die is a paradigmatic example of a device that generates probabilities. (See Strevens, 2013, for inferring probabilistic properties from physical devices.)

2. More importantly, each facet is informative by itself, and often all that is of interest. However, to really understand the mechanism of the risk-generating device, it is crucial to view it from multiple angles. `riskyr` provides alternative perspectives that, when viewed together, render issues of risk literacy more transparent.

3. The three facets can be counted as the three steps of (a) organizing information, (b) translating between representational formats, and (c) visualising relationships between variables.