Processes big text data files efficiently in batches. For this purpose, it offers functions for splitting, parsing and tokenizing text and for creating a vocabulary. It also includes functions for building either a document-term matrix or a term-document matrix and for extracting information from them (term associations, most frequent terms). Lastly, it includes functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and for working with sparse matrices. The source code is written in 'C++11' and exposed to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
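
As a rough illustration of the ideas named above (batching, tokenizing, building a vocabulary, and the difference between a document-term and a term-document matrix), here is a minimal Python sketch. It is not the package's API: every function name below is hypothetical, the tokenizer is deliberately naive, and the sparse matrices are plain dictionaries rather than the 'Matrix' objects the package works with.

```python
from collections import Counter


def read_batches(lines, batch_size):
    """Yield successive lists of at most batch_size lines (hypothetical helper)."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def document_term_matrix(docs, batch_size=2):
    """Return (vocab, rows); rows[i] maps term index -> count for document i."""
    vocab = {}  # term -> column index, assigned in order of first appearance
    rows = []
    for batch in read_batches(docs, batch_size):
        for doc in batch:
            row = {}
            for term, n in Counter(tokenize(doc)).items():
                idx = vocab.setdefault(term, len(vocab))
                row[idx] = n
            rows.append(row)
    return vocab, rows


def transpose(rows):
    """Turn a sparse document-term matrix into a sparse term-document matrix."""
    cols = {}
    for doc_id, row in enumerate(rows):
        for term_id, n in row.items():
            cols.setdefault(term_id, {})[doc_id] = n
    return cols


docs = ["the cat sat", "the dog sat down", "the cat ran"]
vocab, dtm = document_term_matrix(docs)
tdm = transpose(dtm)
```

In the document-term form each row is a document, while in the term-document form each row is a term; `tdm[vocab["cat"]]` lists the documents that contain "cat" together with their counts.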

Version: 1.0.9
Depends: R (≥ 3.2.3), Matrix
Imports: Rcpp (≥ 0.12.5), R6, data.table, utils
LinkingTo: Rcpp, RcppArmadillo (≥ 0.7.5), BH
Suggests: testthat, covr, knitr, rmarkdown
Published: 2018-01-16
Author: Lampros Mouselimis
Maintainer: Lampros Mouselimis <mouselimislampros at gmail.com>
BugReports: https://github.com/mlampros/textTinyR/issues
License: GPL-3
Copyright: inst/COPYRIGHTS (textTinyR copyright details)
URL: https://github.com/mlampros/textTinyR
NeedsCompilation: yes
SystemRequirements: The package requires two components: a C++11 compiler and, on a Unix OS, the boost-locale headers and libraries (boost >= 1.55.0, www.boost.org). Debian/Ubuntu: libboost-locale-dev; Fedora: yum install boost-devel; OSX/brew: detailed installation instructions can be found in the README file.
