Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may choose among three back ends: 'udpipe', which has no external dependencies; a Python back end built on 'spaCy' <https://spacy.io>; or a Java back end built on 'CoreNLP' <http://stanfordnlp.github.io/CoreNLP/>. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics for token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses.
Version: | 2.0.3 |
Depends: | R (≥ 2.10) |
Imports: | dplyr (≥ 0.7.4), Matrix (≥ 1.2), stringi, stats, methods, utils |
Suggests: | udpipe (≥ 0.3), reticulate (≥ 1.4), rJava (≥ 0.9-8), RCurl (≥ 1.95), knitr (≥ 1.15), rmarkdown (≥ 1.4), testthat (≥ 1.0.1), covr (≥ 2.2.2) |
Published: | 2018-01-22 |
Author: | Taylor B. Arnold [aut, cre] |
Maintainer: | Taylor B. Arnold <taylor.arnold at acm.org> |
BugReports: | http://github.com/statsmaths/cleanNLP/issues |
License: | LGPL-2 |
URL: | https://statsmaths.github.io/cleanNLP/ |
NeedsCompilation: | no |
SystemRequirements: | Python (>= 2.7.0); spaCy <https://spacy.io/> (>= 2.0); Java (>= 7.0); Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml> (>= 3.7.0) |
Citation: | cleanNLP citation info |
Materials: | NEWS |
CRAN checks: | cleanNLP results |
Reference manual: | cleanNLP.pdf |
Vignettes: | Exploring the State of the Union Addresses: A Case Study with cleanNLP |
Package source: | cleanNLP_2.0.3.tar.gz |
Windows binaries: | r-devel: cleanNLP_2.0.3.zip, r-release: cleanNLP_2.0.3.zip, r-oldrel: cleanNLP_2.0.3.zip |
OS X binaries: | r-release: cleanNLP_2.0.3.tgz, r-oldrel: cleanNLP_2.0.3.tgz |
Old sources: | cleanNLP archive |
Please use the canonical form https://CRAN.R-project.org/package=cleanNLP to link to this page.