In the following, we use the conventional likelihood notation;

\[ f(y|\theta), \]

where \(y\) denotes data and \(\theta\) is a model parameter.

*1* readers, *1* modalities and *3* confidence levels.

Confidence Level | Modality ID | Reader ID | Number of Hits | Number of False alarms |
---|---|---|---|---|

3 = definitely present | 1 | 1 | \(H_{3}\) | \(F_{3}\) |

2 = equivocal | 1 | 1 | \(H_{2}\) | \(F_{2}\) |

1 = questionable | 1 | 1 | \(H_{1}\) | \(F_{1}\) |

where, each component \(H_c\) and \(F_c\) are non negative integers. For example, \(H_{3}\) denotes the hit of the \(1^\text{st}\) reader over all images taken by \(1^\text{st}\) modality with reader’s confidence level is \(3^\text{rd}\).

So, in conventional notation we may write

\[y = (H_0,H_1,H_2,H_3;F_1,F_2,F_3 ;N_L,N_I),\]

where we set \(H_\color{red}{0}:=N_L-(H_1+H_2+H_3).\)

For model of *1* readers, *1* modalities and *3* confidence levels.

Define the model by \[ \{H_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{p_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ F_{c} \sim \text{Poisson}(q_c(\theta)N_I),\\ \]

where

\[ p_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu,\sigma)dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \frac{d \log \Phi(z)}{dz}dz. \]

from which, we can calculate the most important characteristic indicating the observer performance ability as

\[ AUC := \Phi (\frac{\mu/\sigma}{\sqrt{(1/\sigma)^2+1}}), \\ \]

Note that model parameter is \(\theta = (\theta_1,\theta_2,\theta_3,...\theta_C;\mu,\sigma)\) which should be estimated and \(\Phi\) denotes the cumulative distribution functions of the canonical Gaussian. Note that \(\theta_{C+1} = \infty\) and \(\theta_{0} = -\infty\).

The definition of the rates \(p,q\) are based on the logic of latent panty theory, so you know what is latent, I am tired because I am a xxxxx!

\[ dz_c := z_{c+1}-z_{c},\\ dz_c, \sigma_{m,r} \sim \text{Uniform}(0,\infty),\\ z_{c} \sim \text{Uniform}( -\infty,100000),\\ A_{m} \sim \text{Uniform}(0,1).\\ \]

```
# > viewdata(d)
# * Number of Lesions: 259
# * Number of Images : 57
# . Confidence.Level False.Positives True.Positives
# ------------------- ----------------- ---------------- ---------------
# Obviouly present 3 1 97
# Relatively obvious 2 14 32
# Subtle 1 74 31
fit_a_model_to(d) # Here, a model is fitted to the data "d"
```

we consider the comparison of imaging modality, such as MRI, CT, PET, …

I implement this because Bayesian is suitable for including individual differences,…. the author think this is a venefit of Bayesian ….

*2* readers, *2* modalities and *3* confidence levels.

Confidence Level | Modality ID | Reader ID | Number of Hits | Number of False alarms |
---|---|---|---|---|

3 = definitely present | 1 | 1 | \(H_{3,1,1}\) | \(F_{3,1,1}\) |

2 = equivocal | 1 | 1 | \(H_{2,1,1}\) | \(F_{2,1,1}\) |

1 = questionable | 1 | 1 | \(H_{1,1,1}\) | \(F_{1,1,1}\) |

3 = definitely present | 1 | 2 | \(H_{3,1,2}\) | \(F_{3,1,2}\) |

2 = equivocal | 1 | 2 | \(H_{2,1,2}\) | \(F_{2,1,2}\) |

1 = questionable | 1 | 2 | \(H_{1,1,2}\) | \(F_{1,1,2}\) |

3 = definitely present | 2 | 1 | \(H_{3,2,1}\) | \(F_{3,2,1}\) |

2 = equivocal | 2 | 1 | \(H_{2,2,1}\) | \(F_{2,2,1}\) |

1 = questionable | 2 | 1 | \(H_{1,2,1}\) | \(F_{1,2,1}\) |

3 = definitely present | 2 | 2 | \(H_{3,2,2}\) | \(F_{3,2,2}\) |

2 = equivocal | 2 | 2 | \(H_{2,2,2}\) | \(F_{2,2,2}\) |

1 = questionable | 2 | 2 | \(H_{1,2,2}\) | \(F_{1,2,2}\) |

where, each component \(H\) and \(F\) are non negative integers. By the multi-index notation, for example, \(H_{3,2,1}\) means the hit of the \(1\)-st reader over all images taken by \(2\)-nd modality with reader’s confidence level is \(3\).

So, in conventional notation we may write

\[y = (H_{c,m,r},F_{c,m,r} ;N_L,N_I).\]

The following model with multinomal is not implemeted yet, cuz the author is tired by MCS disease! Ha,,, above me only sky! This package dose not help me! :’-D

\[ \{H_{c,m,r};c=1,2,\cdots C\} \sim \text{Multinomial}(\{p_{c,m,r}(\theta);c=1,2,\cdots C\},N_L),\\ F_{c,m,r} \sim \text{Poisson}(q_c(\theta)).\\ \]

\[ p_{c,m,r}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu_{m,r},\sigma_{m,r})dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \frac{d \log \Phi(z)}{dz}dz. \]

\[ A_{m,r} := \Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r} \sim \text{Normal} (A_{m},\sigma_{r}^2), \\ \]

where model parameter is \(\theta = (\theta_1,\theta_2,\theta_3,...\theta_C;\mu_{m,r},\sigma_{m,r})\) which should be estimated and \(\Phi\) denotes the cumulative distribution functions of the canonical Gaussian. Note that \(\theta_{C+1} = \infty\).

Someone might consider that this is not a suitable model cuz it dose not includes full heterogenity. However, in MCMC algorithm, such full model leads the author non-convergent issues. So, I am tired. I do not implement this multinomial model yet, but what is the difficulty in Bayesian? The author begin R language in 2~3 years ago with Stan, then first, I implement the full heterogeneity model. But against my effort, it never converges. So, next, the bitch author tried to reduce this individual differences so that the model converges. Then I found this model. So, please do not bother me such questions.

\[ dz_c := z_{c+1}-z_{c},\\ dz_c, \sigma_{m,r} \sim \text{Uniform}(0,\infty),\\ z_{c} \sim \text{Uniform}( -\infty,100000),\\ A_{m} \sim \text{Uniform}(0,1).\\ \]

This is only example, and in this package I implement proper priors. The author thinks the above prior is intuitively the simplest non informative priors without the coordinate free property.

```
fit <- fit_Bayesian_FROC(
ite = 1111,
cha = 1,
summary = TRUE,
Null.Hypothesis = F,
dataList = dd # example data to be fitted a model
)
```

2020 Jun 30, the author implemented FROC model using multinomial distribution, which is the most traditional one.

The author had misunderstood the traditional FROC model,..my model is not wrong, but,,,, the traditional model is simpler so, in this current version, the author implemented the traditional one. Gratias.

As far as I’m concerned, the traditional model and the author’s new model are not equivalent in MCMC sampling. The author compare both model by fitting a new and a classical model to datasets with many zero cells.

The author also compare both models in WAICs, but significant difference is not detected from my point of view.

Poisson rate and multinomial Bernoulli rates should be contained the *regular* interval \([ \epsilon, 1-\epsilon]\) for some fixed small \(\epsilon\), i.e., \(p_c,q_c \in [ \epsilon, 1-\epsilon]\)

Monotonicity \(p_1<p_2<\cdots\) and \(q_1>q_2>\cdots\) also reasonable, where subscripts means rating and a high number indicates a high confidence level.

The author found simultaneous zero hits or false alarms cause a bias in MCMC sampling. If prior is not suitable or non-informative, then such phenomenon occurs and SBC detects it.

So, we have to find the prior to satisfy the monotonicity and the *regular* interval condition.

The following SBC shows our prior is not good, because for some parameter, the rank statistics is not uniformly distributed.

```
stanModel <- stan_model_of_sbc()
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
stanModel = stanModel,
ite = 233,
M = 111, # or 1111 which tooks a lot of times
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)
# or
stanModel <- stan_model_of_sbc()
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
stanModel = stanModel,
ite = 233,
M = 1111,
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)
```

In the following, the author pointed out why frequentist p value is problematic. Of course, under some condition, Bayesian p -value coincides with frequentist p value, so the scheme statistical test is problematic. We shall show the reason in the following simple example.

To tell the truth, I want to use epsilon delta manner, but for mathematics people, I do not use it.

- The next section
*proves*the monotonicity of p value for the most simple statistical test.

I want to publish this proof, but all reviewers are against it. There is a free space to speech here, so please enjoy my logic. I really like this because My heart is in math.

The methods of statistical testing are widely used in medical research. However, there is a well-known problem, which is that a large sample size gives a small p-value. In this section, we will provide an explicit explanation of this phenomenon with respect to simple hypothesis tests.

Consider the following null hypothesis \(H_0\) and its alternative hypothesis \(H_1\); \[\begin{eqnarray*} H_0: \mathbb E[ X_i] &=&m_0, \\ H_1: \mathbb E[ X_i] &>&m_0, \\ \end{eqnarray*}\] where \(\mathbb E[ X_i]\) means the expectation of random samples \(X_i\) from a normal distribution whose variance \(\sigma _0 ^2\) is known. In this situation, the test statistic is given by \[ Z^{\text{test}} := \frac{\overline{X_{n}} -m_0 }{\sqrt{\sigma _0 ^2/n}}, \] where \(\overline{X_{n}} := \sum_{i=1,\cdots,n} X_i /n\) is normally distributed with mean \(m_0\) and standard deviation \(\sigma_0/\sqrt{n}\). Under the null hypothesis, \(Z^{\text{test}}\) is normally distributed with mean \(0\) and a standard deviation \(1\) (standard normal distribution). The null hypothesis is rejected if \(Z^{\text{test}} >z_{2\alpha}\) , where \(z_{2\alpha}\) is a percentile point of the normal distribution, e.g., \(z_{0.025}=1.96 .\)

Suppose that the true distribution of \(X_1, \cdots, X_n\) is a normal distribution with mean \(m_0 + \epsilon\) and variance \(\sigma _0 ^2\), where \(\epsilon\) is an arbitrary fixed positive number. Then \[\begin{eqnarray*} Z^{\text{test}} &=&\frac{\overline{X_{n}} -(m_0+\epsilon -\epsilon) }{\sqrt{\sigma _0 ^2/n}}\\ &=& Z^{\text{Truth}} + \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}}\\ \end{eqnarray*}\] where \(Z^{\text{Truth} }:=(\overline{X_{n}} -(m_0+\epsilon ))/\sqrt{\sigma _0 ^2/n}\).

In the following, we calculate the probability with which we reject the null hypothesis \(H_0\) with confidence level \(\alpha\). \[\begin{eqnarray*} \text{Prob}(Z^{\text{test}} >z_{2\alpha} ) &=&\text{Prob} (Z^{\text{Truth} } + \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}} >z_{2\alpha})\\ &=&\text{Prob} (Z^{\text{Truth} } >z_{2\alpha} - \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}})\\ &=&\text{Prob} (Z^{\text{Truth} } >z_{2\alpha} - \frac{\epsilon}{\sigma _0 }\sqrt{n} )\\ \end{eqnarray*}\] Note that \(\epsilon /\sigma _0\) is called the effect size.

Thus, if \(z_{2\alpha} - \epsilon \sqrt{n} /\sigma _0 < z_{2(1-\beta)}\), i.e., if \(n > ( z_{2\alpha}- z_{2(1-\beta)})^2 \sigma _0 ^2 \epsilon ^{-2}\), then the probability that the null hypothesis is rejected is greater than \(1 - \beta\).

For example, consider the case \(\sigma _0 =1\), \(\alpha =0.05\), and \((1-\beta) =\alpha\), then \(z_{2\alpha}=1.28\) and in this case, for all \(\epsilon>0\), if \(n > 7 \epsilon ^{-2}\) then the probability in which above hypothesis test concludes that the difference of the observed mean from the hypothesized mean is significant is greater than \(0.95\). This means that almost always the p-value is less than 0.05. Thus a large sample size induces a small p-value.

For example,

if \(\epsilon =1\) then by taking a sample size such that \(n > 7\), then almost always the conclusion of the test will be that the observed difference is statistically significant. Similarly,

if \(\epsilon =0.1\) then by taking a sample size such that \(n > 700\), then almost all tests will reach the conclusion that the difference is significant; and

if \(\epsilon =0.01\) then by taking sample size so that \(n > 70000\), then the same problem will arise.

This phenomenon also means that in large samples statistical tests will detect very small differences between populations.

By above consideration we can get the result ``significance difference’’ with respect to any tiny difference \(\epsilon\) by collecting a large enough sample \(n\), and thus we must not use the statistical test.