Text corpus data analysis, with full support for Unicode. Functions for reading data from newline-delimited JSON files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies (including n-grams).
Version: | 0.8.0 |
Depends: | R (≥ 2.10) |
Imports: | Matrix |
Suggests: | knitr, testthat |
Published: | 2017-07-19 |
Author: | Patrick O. Perry [aut, cre], Martin Porter and Richard Boulton [ctb, cph] (Snowball), Unicode, Inc. [ctb, cph] (Unicode Character Database) |
Maintainer: | Patrick O. Perry <pperry at stern.nyu.edu> |
BugReports: | https://github.com/patperry/r-corpus/issues |
License: | Apache License (== 2.0) | file LICENSE |
URL: | https://github.com/patperry/r-corpus |
NeedsCompilation: | yes |
Materials: | README, NEWS |
CRAN checks: | corpus results |
Reference manual: | corpus.pdf |
Vignettes: | Chinese text handling, Unicode: Emoji, accents, and other international text |
Package source: | corpus_0.8.0.tar.gz |
Windows binaries: | r-devel: corpus_0.8.0.zip, r-release: corpus_0.8.0.zip, r-oldrel: corpus_0.8.0.zip |
OS X El Capitan binaries: | r-release: corpus_0.8.0.tgz |
OS X Mavericks binaries: | r-oldrel: corpus_0.8.0.tgz |
Old sources: | corpus archive |
Please use the canonical form https://CRAN.R-project.org/package=corpus to link to this page.