A package for sampling weather stations via Wunderground


Wunderscraper helps tap and organize a wealth of real-time weather data from Wunderground. Wunderground’s vast network of weather stations reports in real time, so it must be sampled; it is impossible to collect data from all the stations all the time. Wunderscraper provides flexible spatial and temporal sampling to efficiently build a representation of weather at hyperlocal scales.




Sampling is a method for constructing a representation of a population. At the heart of sampling theory is independence: sampling one unit shouldn’t change the probability of sampling another. Spatial sampling is especially challenging because nearby units are not independent; measurements at one weather station will be correlated with those at nearby stations. One way to preserve spatial independence is to partition space into units that are independent of one another, and draw a sample from each partition.
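As a rough illustration of drawing from each partition, the sketch below bins stations into square latitude/longitude cells and samples from every cell. It uses only the Python standard library. The function name `sample_per_partition`, the `cell_size` parameter, and the station tuples are hypothetical and are not part of Wunderscraper's API. Whether grid cells are independent enough depends on the cell size and the weather phenomenon of interest.

```python
import random
from collections import defaultdict


def sample_per_partition(stations, cell_size, k=1, seed=None):
    """Bin stations into square lat/lon cells and draw up to k from each cell.

    stations: iterable of (station_id, latitude, longitude) tuples (made-up format).
    cell_size: width of a cell in degrees (an assumption for this sketch).
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for station_id, lat, lon in stations:
        # Floor division assigns every station to exactly one cell.
        key = (int(lat // cell_size), int(lon // cell_size))
        cells[key].append(station_id)
    # Sample without replacement inside each cell.
    return {key: rng.sample(ids, min(k, len(ids))) for key, ids in cells.items()}


# Hypothetical station ids and coordinates, for illustration only.
stations = [
    ("A", 40.71, -74.00),
    ("B", 40.73, -73.99),
    ("C", 41.88, -87.63),
    ("D", 41.90, -87.65),
    ("E", 34.05, -118.24),
]
picked = sample_per_partition(stations, cell_size=1.0, k=1, seed=0)
```

Here `picked` maps each occupied cell to one station, so clustered stations such as A and B contribute a single observation rather than two correlated ones.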

Sampling methods offer a couple of basic tools for preserving independence and focusing on a population of interest. Multistage sampling is the primary tool for partitioning a population into independent units. The initial stages draw samples from large units, like regions or states, and later stages draw samples from smaller units nested within the larger ones, e.g., counties or zip codes. Stratified sampling is a tool for ensuring sub-populations receive adequate coverage. It repeats a sampling stage for each sub-population, which is useful for evenly covering sub-populations or for oversampling a particularly small sub-population. See the examples in the next section for more details.
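The two tools can be sketched in a few lines of plain Python. This is a minimal illustration under assumed data shapes, not Wunderscraper's implementation: `multistage_sample`, `stratified_sample`, the nested `population` dictionary, and the `strata` dictionary are all hypothetical names and structures.

```python
import random


def multistage_sample(population, n_states, n_counties, n_stations, seed=None):
    """Three-stage sample: states, then counties within them, then stations.

    population: {state: {county: [station_id, ...]}} (assumed layout).
    """
    rng = random.Random(seed)
    sample = []
    # Stage 1: draw the largest units.
    states = rng.sample(sorted(population), min(n_states, len(population)))
    for state in states:
        counties = population[state]
        # Stage 2: draw smaller units nested within each chosen state.
        chosen = rng.sample(sorted(counties), min(n_counties, len(counties)))
        for county in chosen:
            ids = counties[county]
            # Stage 3: draw stations within each chosen county.
            for station_id in rng.sample(ids, min(n_stations, len(ids))):
                sample.append((state, county, station_id))
    return sample


def stratified_sample(strata, n_per_stratum, seed=None):
    """Repeat the same draw for every stratum so each one is covered."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, min(n_per_stratum, len(units)))
        for name, units in strata.items()
    }


# Made-up identifiers, for illustration only.
population = {
    "NY": {"Kings": ["s1", "s2"], "Queens": ["s3"], "Erie": ["s4", "s5"]},
    "IL": {"Cook": ["s6", "s7"], "Lake": ["s8"]},
}
staged = multistage_sample(population, n_states=1, n_counties=2, n_stations=1, seed=1)

strata = {"urban": ["u1", "u2", "u3", "u4", "u5"], "rural": ["r1", "r2"]}
by_stratum = stratified_sample(strata, n_per_stratum=2, seed=1)
```

Drawing the same number of units from each stratum oversamples the smaller "rural" group relative to its share of the population, which is the coverage behavior described above.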