Generic extraction of main text content from HTML files; removes ads, sidebars and headers using the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of web site templates.

Version: 1.2.2
Imports: rJava
Suggests: RCurl
Published: 2014-08-21
Author: See AUTHORS file.
Maintainer: Mario Annau <mario.annau at>
License: Apache License (== 2.0)
NeedsCompilation: no
