rdomains: Get the category of content hosted by a domain

Install and Load the package

The latest development version of the package will always be on GitHub. To install it from GitHub:

library(devtools)
install_github("soodoku/domain_classifier/rdomains")

To install the package from CRAN, type in:

install.packages("rdomains")

Next, load the package:

library(rdomains)

Shalla

To get the category of content from Shallalist, first download the latest file using:

get_shalla_data()

And then, get the category using:

shalla_cat("http://www.google.com")
##   domain_name shalla_category
## 1  google.com   searchengines

DMOZ

To get the category of content from DMOZ, first download the archived parsed CSV file using:

get_dmoz_data()

And then, get the category using:

dmoz_cat("http://www.google.com")

ML

Get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:

adult_ml1_cat("http://www.google.com")
##   domain_name  category
## 1  google.com 0.3133728

Virustotal

Start by getting an API key from VirusTotal.

Then get the VirusTotal category by running:

virustotal_cat("http://www.google.com")
##                 domain   bitdefender dr_web  alexa        google       websense             trendmicro
## 1 http://www.google.com searchengines  chats google searchengines advertisements search engines portals

Trusted (McAfee)

Get the content category of a domain according to McAfee (Trusted):

trusted_cat("http://www.google.com")
##                    url          status   categorization   reputation
## 2 http://www.google.com Categorized URL - Search Engines Minimal Risk

Alexa Category

To get the category of content from Amazon (Alexa) (which provides it via DMOZ), start by getting credentials from https://aws.amazon.com/. Next, set the environment variables:

Sys.setenv(AWS_ACCESS_KEY_ID = "key_id")
Sys.setenv(AWS_SECRET_ACCESS_KEY = "secret_key")
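
Before calling the API, it can help to confirm that both variables are visible to the current R session. The following is a minimal sketch using only base R; `missing_env_vars` is a hypothetical helper for illustration and is not part of rdomains:

```r
# Hypothetical helper (not part of rdomains): return the names of the
# environment variables in `vars` that are unset or empty in this session.
missing_env_vars <- function(vars) {
  # Sys.getenv() returns "" for unset variables; nzchar() is TRUE for non-empty strings
  vars[!nzchar(Sys.getenv(vars, unset = ""))]
}

# The two variable names used in this section
aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
not_set <- missing_env_vars(aws_vars)

if (length(not_set) > 0) {
  warning("Not set: ", paste(not_set, collapse = ", "))
}
```

If the helper returns an empty character vector, both credentials are set for this session. Variables set with `Sys.setenv` last only for the current session, so they need to be set again after R restarts.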

Then run:

alexa_cat(domain="http://www.google.com")[1,]
##                   Title                                           AbsolutePath
## 1 Search Engines/Google Top/Computers/Internet/Searching/Search_Engines/Google