This tutorial introduces the WaterML R package. It walks through an example of retrieving data from the CUAHSI Hydrologic Information System (HIS) and analyzing it statistically in R.

- Load the required library: WaterML, which provides access to CUAHSI HIS data. The package is available from the CRAN package repository.

```
#import required libraries
library(WaterML)
```

- Find the CUAHSI HIS services from the HIS Central catalogue. The list of available services registered in HIS Central is also published here: http://hiscentral.cuahsi.org/pub_services.aspx. The GetServices() function returns a table with the URL, description, and citation of each service.

```
#get the list of supported CUAHSI HIS services
services <- GetServices()
```

`View(services)`

- Define the CUAHSI HIS service that you are connecting to by giving the URL of that service’s WSDL file. This example uses the service of the Ipswich River Watershed Association, which enlists volunteers to collect data on the health of the Ipswich River and its tributaries in Massachusetts, USA:
`http://hydroportal.cuahsi.org/ipswich/cuahsi_1_1.asmx?WSDL`

We can use the `GetVariables()` and `GetSites()` functions to get the tables of variables and sites on the server.

```
#point to a CUAHSI HIS service and get a list of the variables and sites
server <- "http://hydroportal.cuahsi.org/ipswich/cuahsi_1_1.asmx?WSDL"
variables <- GetVariables(server)
sites <- GetSites(server)
```

- Next we will select one site and find which variables are measured at that site. In this example we choose the site “Fish Brook, Brookview Rd, Boxford” with the full site code “IRWA:FB-BV”. Note that you can learn more about the variables at this site by viewing the siteinfo data table in RStudio.

```
#get full site info for the selected site using the GetSiteInfo method
siteinfo <- GetSiteInfo(server, "IRWA:FB-BV")
```

`View(siteinfo)`

- Now we will get the data values using the GetValues method for two variables at the site: water temperature (full variable code `IRWA:Temp`) and dissolved oxygen (full variable code `IRWA:DO`). In this example we get the values for all available days. Note that we can also use the `startDate` and `endDate` parameters to restrict the time period of interest. To get help on the GetValues function, you can type `?GetValues` in the R console. Note that for this particular site there are 21 temperature and 22 dissolved oxygen observations.

```
#get the temperature and dissolved oxygen values at the site using the GetValues method
Temp <- GetValues(server,siteCode="IRWA:FB-BV",variableCode="IRWA:Temp")
DO <- GetValues(server, siteCode="IRWA:FB-BV",variableCode="IRWA:DO")
```

- Plot the time series of temperature and dissolved oxygen. We use the `plot()` function to create a new plot of water temperature, and the `points()` function to add the dissolved oxygen data points to the existing plot.

```
plot(DataValue~time, data=Temp, col="red")
points(DataValue~time, data=DO, col="blue")
```
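One thing to watch for with this pattern: the axis limits are fixed by the first `plot()` call, so any dissolved oxygen values that fall outside the temperature range would not be drawn. The sketch below shows one possible way to handle that by computing a shared y-range up front and adding a legend. It uses small made-up data frames (`temp_demo` and `do_demo`, both hypothetical names) that only mimic the `time` and `DataValue` columns used above; the values are illustrative and are not data from the Ipswich service.

```r
# Made-up observations; only the column names `time` and `DataValue` follow the tutorial
temp_demo <- data.frame(
  time = as.POSIXct(c("2001-06-01", "2001-07-01", "2001-08-01"), tz = "UTC"),
  DataValue = c(15.0, 21.5, 23.0)
)
do_demo <- data.frame(
  time = as.POSIXct(c("2001-05-01", "2001-06-01", "2001-07-01"), tz = "UTC"),
  DataValue = c(9.1, 7.4, 6.2)
)

# Compute axis ranges that cover both series before drawing anything
x_range <- range(c(temp_demo$time, do_demo$time))
y_range <- range(c(temp_demo$DataValue, do_demo$DataValue), na.rm = TRUE)

# First series creates the plot; second series is added on top of it
plot(DataValue ~ time, data = temp_demo, col = "red",
     xlim = x_range, ylim = y_range, ylab = "Value")
points(DataValue ~ time, data = do_demo, col = "blue")
legend("topright", legend = c("Temp", "DO"), col = c("red", "blue"), pch = 1)
```

The same `xlim` and `ylim` arguments could be added to the tutorial's `plot()` call, with `Temp` and `DO` in place of the demo tables.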

- Create a merged table with columns: Time, DO (Dissolved Oxygen) and Temp (Temperature). We can create this table with the merge function, joining on the time column. Because both input tables have a DataValue column, the merged table automatically names them DataValue.x (from DO, the first table) and DataValue.y (from Temp, the second table). We rename these columns to “DO” and “Temp”.

```
#merge our two tables based on the time column
data <- merge(DO, Temp, by="time")
#rename the column DataValue.x in the merged table to "DO"
names(data)[names(data)=="DataValue.x"] <- "DO"
#rename the column DataValue.y in the merged table to "Temp"
names(data)[names(data)=="DataValue.y"] <- "Temp"
```
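To see what `merge()` does in this step, the following sketch repeats it on two tiny made-up tables (`do_demo` and `temp_demo` are hypothetical names and values, not data from the service). By default `merge()` keeps only rows whose `time` value occurs in both tables, so the merged table can have fewer rows than either input. This is consistent with the regression output later in this tutorial, which reports 17 residual degrees of freedom even though the site has 21 and 22 observations of the two variables.

```r
# Made-up observations; only the column names `time` and `DataValue` follow the tutorial
do_demo <- data.frame(
  time = as.POSIXct(c("2001-05-01", "2001-06-01", "2001-07-01"), tz = "UTC"),
  DataValue = c(9.1, 7.4, 6.2)
)
temp_demo <- data.frame(
  time = as.POSIXct(c("2001-06-01", "2001-07-01", "2001-08-01"), tz = "UTC"),
  DataValue = c(15.0, 21.5, 23.0)
)

# Default merge: only timestamps present in both tables are kept
merged_demo <- merge(do_demo, temp_demo, by = "time")
names(merged_demo)  # "time" "DataValue.x" "DataValue.y"
nrow(merged_demo)   # 2

# all = TRUE keeps unmatched timestamps too and fills the gaps with NA
merged_all <- merge(do_demo, temp_demo, by = "time", all = TRUE)
nrow(merged_all)    # 4
```

For the regression that follows, the default behaviour is the useful one, since only timestamps with both a dissolved oxygen and a temperature value can contribute a data point.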

- Now you can plot the data as a scatter plot of dissolved oxygen concentration versus temperature.

`plot(DO~Temp, data=data)`

- Finally, we can fit a linear regression model to see if there is a relationship between water temperature and dissolved oxygen concentration at this site.

```
# Perform a linear regression on the dissolved oxygen vs. temperature values
model <- lm(DO~Temp, data=data)
```

```
summary(model)
abline(model)
```

The code creates two outputs when run in RStudio. First, it creates a scatter plot of dissolved oxygen concentration versus water temperature with the linear regression line.

Second, it outputs the results from the regression analysis. From these results, there appears to be a significant negative linear relationship between water temperature and dissolved oxygen at this site.

```
#>
#> Call:
#> lm(formula = DO ~ Temp, data = data)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.1075 -1.2097 -0.5861  1.1340  3.1318
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  8.90404    0.75302  11.824 1.26e-09 ***
#> Temp        -0.16965    0.04813  -3.525   0.0026 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.584 on 17 degrees of freedom
#> Multiple R-squared: 0.4223, Adjusted R-squared: 0.3883
#> F-statistic: 12.43 on 1 and 17 DF, p-value: 0.002599
```
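As a rough illustration of how to read these numbers, the sketch below plugs the reported coefficients into the fitted line, DO = intercept + slope × Temp, and checks that the reported R-squared agrees with the F-statistic. The helper `predict_do` and the example temperatures are hypothetical. Units are whatever the service reports for `IRWA:DO` and `IRWA:Temp`, and predictions outside the observed temperature range should be treated with caution.

```r
# Coefficients copied from the regression output above
intercept <- 8.90404
slope <- -0.16965

# Hypothetical helper: predicted dissolved oxygen for a given water temperature
predict_do <- function(temp) intercept + slope * temp

# Predicted values at two example temperatures (chosen for illustration only)
predict_do(c(10, 20))

# With the fitted object from the tutorial, the equivalent call would be:
# predict(model, newdata = data.frame(Temp = c(10, 20)))

# Consistency check: with a single predictor, R-squared = F / (F + residual df)
f_stat <- 12.43
resid_df <- 17
r_squared_check <- f_stat / (f_stat + resid_df)
round(r_squared_check, 2)
```

The negative slope means each one-unit rise in temperature is associated with a drop of about 0.17 units in dissolved oxygen, and the R-squared of roughly 0.42 indicates that temperature accounts for a bit under half of the variation in the paired observations.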

This tutorial shows how you can use the WaterML library to access data from a CUAHSI HIS web service directly within R, without first downloading the data to your local computer. While this was demonstrated for a data service hosted by the Ipswich River Watershed Association, the WaterML R package can be used to access data from any compliant CUAHSI HIS web service, including the 100+ data services listed on the HIS Central website.

For additional information on the tutorial and the WaterML R Package, please refer to:

Jiri Kadlec, Bryn StClair, Daniel P. Ames, Richard A. Gill (2015). WaterML R package for managing ecological experiment data on a CUAHSI HydroServer. Ecological Informatics, 28, 19-28. http://www.sciencedirect.com/science/article/pii/S1574954115000801