HTML tables are a valuable data source, but extracting and recasting these data into a useful format can be tedious. htmltab is a package for extracting structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides two major advantages. First, the function automatically expands row and column spans in the header and body cells. Second, users are given more control over the identification of the header and body rows that will end up in the R table. Additionally, the function preprocesses the table code and removes unneeded parts, which reduces the need for tedious post-processing.
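As an illustration of the span expansion described above, here is a minimal sketch. The inline HTML table and its values are made up for this example, and it assumes that htmltab() accepts a document already parsed with XML::htmlParse() and that the header can be selected by row number through the header argument; check the reference manual for the exact behaviour in your installed version.

```r
library(XML)
library(htmltab)

# Hypothetical table for illustration: "Norway" spans two body rows
html <- '
<table>
  <tr><th>Country</th><th>Year</th><th>Value</th></tr>
  <tr><td rowspan="2">Norway</td><td>2013</td><td>10</td></tr>
  <tr><td>2014</td><td>12</td></tr>
  <tr><td>Sweden</td><td>2013</td><td>9</td></tr>
</table>'

# Parse the string ourselves so that no network access is needed
doc <- htmlParse(html, asText = TRUE)

# which = 1 selects the first table in the document;
# header = 1 treats the first row as the header (assumed usage)
tab <- htmltab(doc = doc, which = 1, header = 1)

print(tab)
```

If the spans are expanded as described, the cell with rowspan="2" should be repeated so that every body row carries its own country value, which is the step that would otherwise need manual post-processing.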
Version: | 0.6.0 |
Depends: | R (≥ 3.0.0) |
Imports: | XML (≥ 3.98.1.3), httr (≥ 1.0.0) |
Suggests: | testthat, knitr, magrittr (≥ 1.5), tidyr |
Published: | 2015-07-22 |
Author: | Christian Rubba [aut, cre] |
Maintainer: | Christian Rubba <christian.rubba at gmail.com> |
BugReports: | https://github.com/crubba/htmltab/issues |
License: | MIT + file LICENSE |
URL: | https://github.com/crubba/htmltab |
NeedsCompilation: | no |
Materials: | README NEWS |
In views: | WebTechnologies |
CRAN checks: | htmltab results |
Reference manual: | htmltab.pdf |
Vignettes: | htmltab case studies |
Package source: | htmltab_0.6.0.tar.gz |
Windows binaries: | r-devel: htmltab_0.6.0.zip, r-release: htmltab_0.6.0.zip, r-oldrel: htmltab_0.6.0.zip |
OS X Snow Leopard binaries: | r-release: not available, r-oldrel: not available |
OS X Mavericks binaries: | r-release: htmltab_0.6.0.tgz |
Old sources: | htmltab archive |