The Rating Matrix

First thing first, to use a RS you should start from a data set, more specifically a rating matrix. In this case rrecsys is built to start process any available matrix as long as the user has a clear idea of the composition of its rating matrix. In a rating matrix all 0 or NULL values will be considered as missing values. The users will be on rows and the items will be on columns.

Currently rrecsys is equipped with the MovieLens Latest dataset. We will use this dataset for demonstration. To load it:

data("mlLatest100k")

To define a Likert scale data set a user would have to specify the domain of the ratings, minimum and maximum of the ratings. The precision of the rating, currently in rrecsys is either half star or full scale (argument halfStar). For example:

ML <- defineData(mlLatest100k, minimum = .5, maximum = 5, halfStar = TRUE)
ML
## Dataset containing 718 users and  8927 items.

A dataset can be as well binarized by:

binML <- defineData(mlLatest100k, binary = TRUE, goodRating = 3)
binML
## Binary dataset containing 718 users and  8927 items.

In this case all the rating in mlLatest100k with a value larger or equal to goodRating are converted to 1 and the resto to 0. This conversion has no scientific relevance, just used for demonstration purposes on OCCF.

The output of defineData is an S4 object (class dataSet). This would be the main input to the dispatcher in rrecsys for training a model.

A dataSet object can be explored with the following methods:

# Number of times an item was rated.
colRatings(ML)
# Number of times a user has rated.
rowRatings(ML)
# Total number of rating in the rating matrix.
numRatings(ML)
# Sparsity.
sparsity(ML)

A dataSet object can be cropped to contain a specific number of ratings:

# Removing users that rated less than 40 items and items that were rated less than 30 times.
subML <- ML[rowRatings(ML)>=40, colRatings(ML)>=30]
sparsity(ML)
## [1] 0.9843619
sparsity(subML)
## [1] 0.8683475