sentometrics: An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction


The sentometrics package is an integrated framework for textual sentiment time series aggregation and prediction. It addresses two intrinsic challenges: the sentiment of a given text can be computed in many different ways, and there are many possible ways to pool sentiment across texts and over time. Standard text mining and time series analysis packages do not provide this additional layer of manipulation. As a final outcome, the package offers an automated way to econometrically model the impact of sentiment in texts on a given variable, by first computing a wide range of textual sentiment time series and then selecting those that are most informative. The package therefore integrates three steps: the fast quantification of sentiment from texts, the aggregation into different sentiment measures, and the optimized prediction based on these measures.
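To make the first two steps concrete, here is a toy sketch in base R only. It does not use the sentometrics API; the lexicon, the texts and the helper `score_text` are all hypothetical and chosen purely for illustration. It scores each text against a small lexicon and then pools the scores into one value per date with equal weights.

```R
# Toy illustration in base R; none of these names come from sentometrics.
texts <- data.frame(
  date = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02")),
  text = c("growth is strong and good",
           "weak demand and bad outlook",
           "good news and strong exports"),
  stringsAsFactors = FALSE
)

# Hypothetical lexicon: word -> polarity score.
lexicon <- c(good = 1, strong = 1, bad = -1, weak = -1)

# Step 1: sentiment of one text = sum of the lexicon scores of its words,
# divided by the number of words (one of many possible definitions).
score_text <- function(txt) {
  words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  sum(lexicon[words], na.rm = TRUE) / length(words)
}
texts$sentiment <- vapply(texts$text, score_text, numeric(1), USE.NAMES = FALSE)

# Step 2: pool across texts into one value per date (equal weights here).
daily <- aggregate(sentiment ~ date, data = texts, FUN = mean)
print(daily)
```

Changing the lexicon, the within-text normalization or the across-text weights each yields a different sentiment time series, which is why the package computes many of them before selecting the most informative ones.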

The package implements the main methodology developed in the paper “Questioning the news about economic growth: Sparse forecasting using thousands of news-based sentiment values” (Ardia, Bluteau and Boudt, 2017). For a brief introduction to the package, see the project page; for an extensive one, see the vignette.


To install the package from CRAN, run:
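A minimal sketch of the CRAN installation, assuming an R session with a configured CRAN mirror:

```R
# Install the released version from CRAN, then load it.
install.packages("sentometrics")
library("sentometrics")
```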


The latest development version of sentometrics is also available. To install this version (which may contain bugs!), execute:
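The original command is not reproduced here. As a hedged sketch, assuming the development version is hosted in a GitHub-style repository and that the remotes package is used to install it, the call would look like the following. The repository path is a placeholder, not the actual location, and must be replaced before running.

```R
# Assumption: installation goes through the remotes package.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Placeholder: replace "<owner>" with the actual repository owner.
repo <- "<owner>/sentometrics"
remotes::install_github(repo)
```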



Please cite sentometrics in publications. Run citation("sentometrics") in R to obtain the reference.


This software package originates from a Google Summer of Code 2017 project.