Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline uses Stanford's CoreNLP library. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
Version: | 0.24 |
Depends: | dplyr, magrittr, R (≥ 2.10) |
Imports: | readr, rJava, RCurl |
Published: | 2016-11-11 |
Author: | Taylor B. Arnold |
Maintainer: | Taylor B. Arnold <taylor.arnold at acm.org> |
License: | GPL-3 |
NeedsCompilation: | no |
SystemRequirements: | Java (>= 7.0); Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml> (>= 3.5.2) |
Materials: | README |
CRAN checks: | cleanNLP results |
Reference manual: | cleanNLP.pdf |
Package source: | cleanNLP_0.24.tar.gz |
Windows binaries: | r-devel: cleanNLP_0.24.zip, r-release: cleanNLP_0.24.zip, r-oldrel: cleanNLP_0.24.zip |
OS X Mavericks binaries: | r-release: cleanNLP_0.24.tgz, r-oldrel: cleanNLP_0.24.tgz |
Please use the canonical form https://CRAN.R-project.org/package=cleanNLP to link to this page.