Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions allow for full or partial hashing of the terms in the vocabulary.
Version: 0.0.1
Depends: R (≥ 3.4.0)
Imports: Rcpp (≥ 0.12), Matrix, digest (≥ 0.6.8), sparsepp (≥ 0.2.0)
LinkingTo: Rcpp, digest (≥ 0.6.8), sparsepp (≥ 0.2.0)
Suggests: testthat, knitr
Published: 2018-04-13
Author: Vitalie Spinu [aut, cre]
Maintainer: Vitalie Spinu <spinuvit at gmail.com>
BugReports: https://github.com/vspinu/mlvocab/issues
License: GPL-3
URL: https://github.com/vspinu/mlvocab/
NeedsCompilation: yes
SystemRequirements: C++11
Materials: README
CRAN checks: mlvocab results
Reference manual: mlvocab.pdf
Package source: mlvocab_0.0.1.tar.gz
Windows binaries: r-devel: mlvocab_0.0.1.zip, r-release: mlvocab_0.0.1.zip, r-oldrel: mlvocab_0.0.1.zip
OS X binaries: r-release: mlvocab_0.0.1.tgz, r-oldrel: mlvocab_0.0.1.tgz
Please use the canonical form https://CRAN.R-project.org/package=mlvocab to link to this page.