The evaluation module is based on the k-fold cross-validation method. A stratified random selection procedure is applied when dividing the rated items of each user into \(k\) folds such that each user is uniformly represented in each fold, i.e. the number of ratings of each user in any fold differs at most by one. For k-fold cross validation each of the \(k\) disjoint fractions of the ratings are used \(k-1\) times for training (i.e. \(\subset R_{train}\)) and once for testing (i.e. \(\subset R_{test}\)). Practically, ratings in \(R_{test}\) are set as missing in the original dataset and predictions/recommendations are compared to \(R_{test}\) to compute the performance measures.

We included many popular performance metrics. These are mean absolute error (MAE), root mean square error(RMSE), Precision, Recall, F1, True and False Positives, True and False Negatives, normalized discounted cumulative gain (NDCG), rank score, area under the ROC curve (AUC) and catalog coverage. RMSE and MAE metrics are computed according to their two variants, user-based vs. global. The user-based variant weights each user uniformly by computing the metric for each user separately and averaging over all users while in the global variant users with larger test sets have more weight.

To evaluate via rrecsys we must start my creating the evaluation model:

e <- evalModel(smallML, folds = 5)

The same model may be used on different algorithms. To evaluate a prediction using this model:

evalPred(e, "ibknn", neigh = 5)

An evaluation on a recommendation:

evalRec(e, "funk", k = 5, lambda = 0.001, goodRating = 3, topN = 3)

The evalPred and evalRec arguments correspond to the arguments of the algorithm that we want to evaulate. evalRec, obviosly, has the topN argument and two additional arguments. The goodRating that defines a threshold to distinguish between a negative rating and a positive rating. The alpha argument is the ranking half-life for the rank score metric.

To compute the AUC:

getAUC(e, "globalAv")