# BayesMallows: An R Package for Probabilistic Preference Learning with the Mallows Rank Model

The BayesMallows package implements methods for Bayesian preference learning with the Mallows rank model, as originally described in Vitelli et al. (2018), and further developed in Asfaw et al. (2016) and Crispino et al. (2018). This vignette describes the usage of the package, starting from the complete data case, through top-\(k\) rankings and pairwise comparisons, and finally clustering. We refer to the above-mentioned papers, as well as the review by Liu et al. (2019), for a thorough description of the methods. The necessary methods for data preprocessing, tuning of algorithms, and assessment of the posterior distributions are described along the way.
Here is an overview of the most used functions. You can read their documentation and see examples with `?function_name`.
| Function Name | Description |
|---|---|
| `compute_mallows` | Compute the posterior distribution of the Bayesian Mallows model. This is the main function of the package. Returns an object of class `BayesMallows`. |
| `compute_mallows_mixtures` | Compute multiple Mallows models with different number of mixture components. This is a convenience function for determining the number of mixtures to use. |
| `sample_mallows` | Sample from the Mallows model. |
| `plot.BayesMallows` | Quick plots of the posterior densities of parameters of the Mallows model. |
| `assess_convergence` | Study the convergence of the Markov chain, in order to determine burn-in and other algorithm parameters. |
| `plot_elbow` | Create an elbow plot for comparing models with different number of clusters. |
| `plot_top_k` | Plot the top-\(k\) rankings. Particularly relevant when the data is in the form of pairwise comparisons. |
| `assign_cluster` | Compute the cluster assignment of assessors. |
| `compute_consensus` | Compute the CP or MAP consensus ranking of the latent ranks. |
| `compute_posterior_intervals` | Compute Bayesian posterior intervals for the parameters. |
| `generate_initial_ranking` | Generate an initial ranking, for the case of missing data or pairwise comparisons. |
| `generate_transitive_closure` | Generate the transitive closure for a set of pairwise comparisons. |
| `estimate_partition_function` | Estimate the partition function of the Mallows model using either importance sampling or an asymptotic approximation. |
Here is an overview of the example datasets in BayesMallows. You can read their documentation with `?dataset_name`, or search for an example in this vignette.
| Dataset Name | Description |
|---|---|
| `beach_preferences` | Stated pairwise preferences between random subsets of 15 images of beaches, by 60 assessors. |
| `sushi_rankings` | Complete rankings of 10 types of sushi by 5000 assessors. |
| `potato_visual` | Complete rankings of 20 potatoes by weight, based on visual inspection, by 12 assessors. |
| `potato_weighing` | Complete rankings of 20 potatoes by weight, where the assessors were allowed to weigh the potatoes in their hands, by 12 assessors. |
| `potato_true_ranking` | Vector of true weight rankings for the 20 potatoes in the example datasets `potato_visual` and `potato_weighing`. |
Here we give an informal review of the Mallows model (Mallows 1957). The distribution of a ranking \(r \in \mathcal{P}_{n}\) of \(n\) items is modeled as
\[ P(r | \alpha, \rho) = Z_{n}(\alpha)^{-1} \exp\left\{-\frac{\alpha}{n} d(r, \rho)\right\} 1_{\mathcal{P}_{n}}(r), \]
where \(\mathcal{P}_{n}\) is the set of all permutations of \(1, \dots, n\), \(\alpha\) is a scale parameter, \(\rho \in \mathcal{P}_{n}\) is a latent consensus ranking, \(d(\cdot, \cdot)\) is a distance measure, \(1_{S}(\cdot)\) is the indicator function for the set \(S\), and \(Z_{n}(\alpha)\) is the partition function, or normalizing constant.
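To get a feel for the model, one can draw rankings from it with `sample_mallows`. The sketch below assumes the arguments `rho0` (consensus ranking), `alpha0` (scale parameter), and `n_samples`; check `?sample_mallows` for the actual interface.

```r
library(BayesMallows)
set.seed(1)

# Draw 500 rankings of n = 10 items, centered at the identity ranking
# rho = (1, ..., 10) with scale alpha = 5. Larger alpha concentrates the
# samples more tightly around rho.
samples <- sample_mallows(rho0 = 1:10, alpha0 = 5, n_samples = 500)
dim(samples)  # 500 rows, one sampled ranking per row
```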
Given \(N\) observed rankings, \(R_{1}, \dots, R_{N}\), the likelihood of the model is
\[ P(R_{1}, \dots, R_{N} | \alpha, \rho) = Z_{n}(\alpha)^{-N} \exp\left\{-\frac{\alpha}{n} \sum_{j=1}^{N}d(R_{j}, \rho)\right\} \prod_{j=1}^{N} \left\{1_{\mathcal{P}_{n}}(R_{j}) \right\}. \]
The `rankings` argument to `compute_mallows` is assumed to be a matrix of the form \((R_{1}, R_{2}, \dots, R_{N})^{T}\), i.e., each row contains a ranking and each column corresponds to an item.
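As a minimal example with complete data, the `potato_visual` dataset already has this matrix form and can be passed directly to `compute_mallows`. Setting the burn-in via the `burnin` element of the returned object is our assumption about the interface; see `?compute_mallows` and `?plot.BayesMallows`.

```r
library(BayesMallows)

# Each row of potato_visual is one assessor's ranking of the 20 potatoes.
model_fit <- compute_mallows(rankings = potato_visual)

# Discard the first 1000 iterations as burn-in before summarizing
# (the burnin element is an assumption about the returned object).
model_fit$burnin <- 1000
plot(model_fit)  # posterior density of alpha
compute_posterior_intervals(model_fit, parameter = "alpha")
```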
For \(\alpha\) we use an exponential prior, with density \(\pi(\alpha | \lambda) = \lambda \exp(-\lambda \alpha)\). The rate parameter \(\lambda\) can be set by the user with the `lambda` argument to `compute_mallows`. For \(\rho\) we assume a uniform prior distribution on \(\mathcal{P}_{n}\), with density \(\pi(\rho) = 1_{\mathcal{P}_{n}}(\rho)/n!\).
We use a Metropolis-Hastings algorithm to compute the posterior distributions of \(\alpha\) and \(\rho\). New values of \(\alpha\) are proposed from a lognormal distribution \(\log \mathcal{N}(\log(\alpha), \sigma_{\alpha}^{2})\), and new values of \(\rho\) with a leap-and-shift algorithm, described in detail in Vitelli et al. (2018). The standard deviation \(\sigma_{\alpha}\) and the leap size can be set by the user with the arguments `alpha_prop_sd` and `leap_size` to `compute_mallows`.
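In practice, a short test run can be used to tune these parameters before the final fit. The values below are arbitrary illustrations, not recommendations, and the `nmc` argument for the number of iterations is an assumption about the interface.

```r
# Test run with user-specified prior rate and proposal parameters.
test_run <- compute_mallows(
  rankings = potato_visual,
  lambda = 0.1,        # rate of the exponential prior for alpha
  alpha_prop_sd = 0.5, # sigma_alpha in the lognormal proposal for alpha
  leap_size = 2,       # leap size in the leap-and-shift proposal for rho
  nmc = 2000           # number of MCMC iterations (assumed argument name)
)

# Trace plot of alpha, used to determine the burn-in.
assess_convergence(test_run)
```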
If each assessor \(j\) has ranked a subset \(\mathcal{A}_{j}\) of the items, we use data augmentation to fill in the missing ranks. We define the augmented data vectors \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\), and use a uniform prior for each assessor with support \(\mathcal{P}_{n} \setminus R_{j}\), i.e., the set of rankings not already chosen. The Metropolis-Hastings algorithm then alternates between sampling \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\) given the current \(\alpha\) and \(\rho\), and sampling \(\alpha\) and \(\rho\) given the current \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\).
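As a sketch, suppose each assessor in `potato_visual` had only reported their top-10 ranks. Coding the missing ranks as `NA` is our assumption about the expected input format; see `?compute_mallows`.

```r
# Turn the complete rankings into top-10 rankings by removing ranks 11-20.
potato_top10 <- potato_visual
potato_top10[potato_top10 > 10] <- NA  # NA coding is an assumption

# compute_mallows then augments the missing ranks as part of the MCMC.
fit_top10 <- compute_mallows(rankings = potato_top10)
```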
When the assessors have stated a set of pairwise comparisons, rather than rankings, we use the same data augmentation ideas as for partial rankings, but the proposal distribution is slightly more complicated in order to ensure that the proposed ranking is compliant with the ordering implied by the pairwise comparisons. In addition, the transitive closure of the stated ordering has to be computed, in order to find all implied orderings. From a user perspective, no new algorithm parameters need to be considered.
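A sketch of this workflow with the `beach_preferences` dataset, combining the preprocessing functions listed in the overview table; the `preferences` argument to `compute_mallows` is our assumption about the interface.

```r
# Find all orderings implied by the stated pairwise preferences.
beach_tc <- generate_transitive_closure(beach_preferences)

# Generate initial rankings consistent with the transitive closure.
beach_init <- generate_initial_ranking(beach_tc)

# Fit the model; the augmented rankings are proposed so that they remain
# consistent with the stated pairwise comparisons.
beach_fit <- compute_mallows(rankings = beach_init, preferences = beach_tc)
```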
When the assessor pool is heterogeneous, one might assume that there exist several latent consensus rankings, \(\rho_{1}, \dots, \rho_{C}\), one for each cluster of assessors. Letting \(z_{1}, \dots, z_{N} \in \{1, \dots, C\}\) assign each assessor to a cluster, the likelihood of the observed rankings is
\[P(R_{1}, \dots, R_{N} | \{\alpha_{c}, \rho_{c}\}_{c=1,\dots,C}, z_{1},\dots,z_{N}) = \prod_{j=1}^{N}\frac{1_{\mathcal{P}_{n}}(R_{j})}{Z_{n}(\alpha_{z_{j}})}\exp\left\{ -\frac{\alpha_{z_{j}}}{n} d(R_{j}, \rho_{z_{j}})\right\}. \]
For the scale parameters \(\alpha_{1}, \dots, \alpha_{C}\) we assume the same exponential prior as before, all with the same rate parameter \(\lambda\). We assume that the cluster labels are a priori distributed according to \(P(z_{1}, \dots, z_{N} | \tau_{1}, \dots, \tau_{C}) = \prod_{j=1}^{N} \tau_{z_{j}}\), where \(\tau_{c}\) is the a priori probability that an assessor belongs to cluster \(c\). For \(\tau_{1}, \dots, \tau_{C}\) we assume the Dirichlet prior \(\pi(\tau_{1}, \dots, \tau_{C}) = \Gamma(\psi C)\Gamma(\psi)^{-C}\prod_{c=1}^{C}\tau_{c}^{\psi - 1}\), where \(\Gamma(\cdot)\) is the gamma function. The user can control the value of \(\psi\) with the `psi` argument to `compute_mallows`.
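Below is a hypothetical end-to-end sketch: fit mixtures with a varying number of clusters, choose \(C\) from an elbow plot, then refit and assign assessors. The argument names `n_clusters` and `include_wcd` are assumptions based on the function overview above; see `?compute_mallows_mixtures` and `?plot_elbow`.

```r
# Fit mixtures of Mallows models with 1 to 5 clusters.
mixture_fits <- compute_mallows_mixtures(
  n_clusters = 1:5,
  rankings = sushi_rankings,
  include_wcd = TRUE  # keep within-cluster distances for the elbow plot
)

# Elbow plot of within-cluster distances, for choosing the number of clusters.
plot_elbow(mixture_fits, burnin = 1000)

# Refit with the chosen number of clusters and a user-specified psi.
final_fit <- compute_mallows(rankings = sushi_rankings, n_clusters = 3, psi = 10)
final_fit$burnin <- 1000

# Posterior cluster assignment of each assessor.
assign_cluster(final_fit)
```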
Asfaw, D., V. Vitelli, O. Sorensen, E. Arjas, and A. Frigessi. 2016. "Time-Varying Rankings with the Bayesian Mallows Model." Stat 6 (1): 14–30. https://onlinelibrary.wiley.com/doi/abs/10.1002/sta4.132.
Crispino, M., E. Arjas, V. Vitelli, N. Barrett, and A. Frigessi. 2018. "A Bayesian Mallows Approach to Non-Transitive Pair Comparison Data: How Human Are Sounds?" Accepted for publication in Annals of Applied Statistics.
Liu, Q., M. Crispino, I. Scheel, V. Vitelli, and A. Frigessi. 2019. "Model-Based Learning from Preference Data." Annual Review of Statistics and Its Application 6 (1). https://doi.org/10.1146/annurev-statistics-031017-100213.
Mallows, C. L. 1957. "Non-Null Ranking Models. I." Biometrika 44 (1/2): 114–30.
Vitelli, V., O. Sorensen, M. Crispino, E. Arjas, and A. Frigessi. 2018. "Probabilistic Preference Learning with the Mallows Rank Model." Journal of Machine Learning Research 18 (1): 1–49. http://jmlr.org/papers/v18/15-481.html.