Abstract
Visualization tools and metrics are often employed to better understand the factors that influence hypertension and cardiovascular disease. However, these tools typically exist in silos within proprietary software, and until now there has been no comprehensive open-source R package that provides the necessary tools for analyzing such data in one place. The `bp` package provides an extensive framework for researchers to analyze both ABPM and non-ABPM blood pressure data in R through a variety of statistical methods and metrics from the literature, as well as various data visualizations, with minimal code. This paper illustrates the main features of the `bp` package by analyzing both a single-subject and a multi-subject dataset.
Despite the tremendous progress in the medical field, cardiovascular disease (CVD) remains the leading cause of death worldwide. Hypertension, specifically, affects over 1.1 billion people annually according to the American Heart Association [9]. This package serves to visualize and quantify various aspects of hypertension in a more digestible format using various metrics proposed in the literature.
Blood pressure data can be analyzed at varying degrees of granularity depending on the reading frequency, the presence of a sleep indicator, and whether or not the temporal structure is accounted for. These factors almost always depend on the type of device used: ABPM monitors are predominantly used for the short term (within 24 hours), while home monitoring devices or office readings are used for measuring variability over the medium and long term (day-to-day, visit-to-visit, etc.) [11]. Unlike continuous heart rate monitors or continuous glucose monitors, there are currently no commercially available continuous blood pressure monitors for the medium to long term, posing a unique challenge for research.
Of primary concern to researchers is the ability to accurately quantify blood pressure variability (BPV). BPV has been shown to be an important factor in predicting cardiovascular events and sudden death, especially during susceptible periods such as the first two hours of waking up [10]. There have been many proposed methods for characterizing this variability; this package seeks to incorporate as many of these metrics as possible.
We introduce the first comprehensive open-source R package, `bp`, that both analyzes and visualizes blood pressure data. To help clinicians make sense of their patients’ data without requiring multiple software platforms for data processing, `bp` requires only a minimal amount of code and offers additional capabilities beyond its traditional proprietary counterparts. At the time of writing, to the best of the authors’ knowledge, no other software package available through the Comprehensive R Archive Network (CRAN) is dedicated to blood pressure analysis.
In this paper, we demonstrate the main functionality of the `bp` package by exploring and analyzing both a single-subject pilot study [8] and a multi-subject study, HYPNOS [11], to illustrate the differences between dataset structures and to explain how to adjust the settings within the R package to accommodate either.
Blood pressure monitoring devices work by measuring the pressure of the artery’s restricted blood flow; for digital devices, the vibrations are translated into electrical signals. Unlike home monitors that only take readings upon the subject’s initiation, ambulatory blood pressure monitoring (ABPM) devices take automatic readings at pre-specified intervals over a 24-hour period or longer.
ABPM allows medical professionals to analyze blood pressure during sleep, which has been shown to be a more accurate predictor of cardiovascular events than daytime blood pressure. ABPM also allows researchers to distinguish true hypertension from “white coat” hypertension in an office or laboratory setting. Because of the burden of assembling the device (and because of the lack of a commercial-grade alternative), ABPM measurements are intended for the short term of 24 hours to a few days.
Home monitoring, on the other hand, offers individuals the ability to record their blood pressure at will, and readings can be tracked easily using mobile apps over the long term of weeks, months, or years. However, because the user has to initiate the recording, readings cannot be taken during sleep.
As the nature of the two devices inhibits certain functionality depending on which device is used, we outline how to effectively analyze data for both types of devices.
According to the American Heart Association, there are currently six blood pressure stages that correspond to the readings from the monitoring devices: Low (Hypotension), Normal, Elevated, Stage 1 Hypertension, Stage 2 Hypertension, and Hypertensive Crisis. The table below outlines the categories according to their definitions. Note that because of the ambiguity between Normal, Elevated, and Stage 1 diastolic blood pressure readings (their thresholds overlap), this package splits the difference and sets default thresholds of 80 - 85 for Elevated DBP and 85 - 90 for Stage 1 Hypertension DBP. These thresholds can be adjusted by the user where applicable.
Blood Pressure Category | Systolic (mmHg) | | Diastolic (mmHg) |
---|---|---|---|
Low (Hypotension) | Less than 100 | and | Less than 60 |
Normal | 100 - 120 | and | 60 - 80 |
Elevated | 120 - 129 | and | 60 - 80 |
Stage 1 Hypertension | 130 - 139 | or | 80 - 89 |
Stage 2 Hypertension | 140 - 180 | or | 90 - 120 |
Hypertensive Crisis | Higher than 180 | and/or | Higher than 120 |
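As a rough illustration of how readings map onto these stages, the sketch below bins systolic values with base R's `cut`. The helper name `classify_sbp` is hypothetical and is not part of `bp`; the break points follow the systolic column of the table, and treating each lower bound as inclusive (so that 130 counts as Stage 1) is an assumption, since the table does not say which side a boundary value falls on. The package's own boundary handling may differ.

```r
# Hypothetical helper (not a bp function): bin systolic readings into stages.
# Assumption: intervals are closed on the left, e.g. [130, 140) is Stage 1.
classify_sbp <- function(sbp) {
  cut(sbp,
      breaks = c(-Inf, 100, 120, 130, 140, 180, Inf),
      labels = c("Low", "Normal", "Elevated", "Stage 1", "Stage 2", "Crisis"),
      right = FALSE)
}

# One illustrative reading per stage
sbp_stage <- classify_sbp(c(95, 110, 125, 135, 150, 190))
as.character(sbp_stage)
```

The same pattern would apply to diastolic readings with a different set of break points.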
`bp` Package
The general workflow of the `bp` package consists of 1) a data processing stage and 2) an analysis stage, ideally in as little as two lines of code. The processing stage formats the user’s supplied input data so that it conforms to what the rest of the `bp` functions expect. The analysis stage uses the processed data to quantify various attributes of the blood pressure relationships or to provide various visualizations. One of the key features of the `bp` package is the `bp_report` function, which generates a report that combines such visualization plots into one easily digestible summary for clinicians or researchers to interpret an individual’s (or multiple individuals’) blood pressure results. We will walk through each of these stages in the subsequent sections.
`process_data` function
Before any analysis can be done, the user-supplied data set must first be processed into the proper format using `process_data` to adhere to the package’s data structure requirements and naming conventions. This function ensures that user-supplied data columns aren’t double counted or missed, since blood pressure data are often inconsistent and come in a wide variety of formats. Although this initial step can be tedious, it saves time in the long run: the resulting processed data can be plugged directly into the analysis functions without any further column specification. It is worth noting that if the user-supplied data set already adheres to the column naming conventions and data types, then the `process_data` function is unnecessary. However, it is still good practice to use this function as a sanity check to verify all available variables.
The basic workflow is to load the user-supplied raw data, process it with the `process_data` function, and save the result to a new dataframe. Note that capitalization does not matter when specifying the columns.
## Load the sample hypnos_data
## In this scenario, the hypnos_data acts as the "user-supplied" data that is to be processed
data("hypnos_data")
## Assign the output of the process_data function to a new dataframe object
hypnos_proc <- process_data(hypnos_data,
                            sbp = 'syst',
                            dbp = 'diast',
                            bp_datetime = 'date.time',
                            id = 'id',
                            visit = 'visit',
                            hr = 'hr',
                            wake = 'wake',
                            pp = 'pp',
                            map = 'map',
                            rpp = 'rpp')
Notice how the column names of the original `hypnos_data` changed in the processed `hypnos_proc`. Notably, `SYST` became `SBP`, `DIAST` became `DBP`, and `DATE.TIME` became `DATE_TIME`.
names(hypnos_data)
#> [1] "NR." "DATE.TIME" "SYST" "MAP" "DIAST" "HR"
#> [7] "PP" "RPP" "WAKE" "ID" "VISIT" "DATE"
names(hypnos_proc)
#> [1] "ID" "DATE" "DATE_TIME" "VISIT" "WAKE"
#> [6] "SBP" "DBP" "MAP" "PP" "HR"
#> [11] "RPP" "NR." "TIME_OF_DAY" "DAY_OF_WEEK" "SBP_CATEGORY"
#> [16] "DBP_CATEGORY"
While the results seem trivial at first glance, let’s see what happens when we use a very different data set with a completely different naming convention: the `bp_jhs` data set. Unlike `hypnos_data`, which has multiple subjects and all of the columns that `process_data` can use, `bp_jhs` is a single-subject data set without many of the multi-subject identifiers such as `ID`, `VISIT`, or `WAKE` (as it is non-ABPM data). Further, there is no `MAP` or `PP` column, but these (as we will see) can be automatically created.
## Load the sample bp_jhs data set
## As before, this is what will be referred to as the "user-supplied" data set
data("bp_jhs")
## Assign the output of the process_data function to a new dataframe object
jhs_proc <- process_data(bp_jhs,
                         sbp = 'sys.mmhg.',
                         dbp = 'dias.mmhg.',
                         bp_datetime = 'datetime',
                         hr = 'PULSE.BPM.')
#> No PP column found. Automatically generated from SBP and DBP columns.
#> No RPP column found. Automatically generated from SBP and HR columns.
#> No MAP column found. Automatically generated from SBP and DBP columns.
#> NOTE: Created DATE column from DATE_TIME column
head(jhs_proc, 5)
#> DATE DATE_TIME SBP DBP MAP PP HR RPP MONTH DAY YEAR
#> 1 2019-08-01 2019-08-01 09:15:54 132 80 97.33333 52 79 10428 8 1 2019
#> 2 2019-07-31 2019-07-31 11:39:59 126 77 93.33333 49 62 7812 7 31 2019
#> 3 2019-07-31 2019-07-31 11:38:07 128 76 93.33333 52 60 7680 7 31 2019
#> 4 2019-07-30 2019-07-30 13:47:46 130 81 97.33333 49 63 8190 7 30 2019
#> 5 2019-07-30 2019-07-30 13:46:15 134 83 100.00000 51 62 8308 7 30 2019
#> DAYOFWK TIME HOUR MEAL_TIME BPDELTA TIME_OF_DAY DAY_OF_WEEK ID
#> 1 Thu 09:15:54 9 Breakfast 52 Morning Thu 1
#> 2 Wed 11:39:59 11 Breakfast 49 Morning Wed 1
#> 3 Wed 11:38:07 11 Breakfast 52 Morning Wed 1
#> 4 Tue 13:47:46 13 Lunch 49 Afternoon Tue 1
#> 5 Tue 13:46:15 13 Lunch 51 Afternoon Tue 1
#> SBP_CATEGORY DBP_CATEGORY
#> 1 Stage 1 Normal
#> 2 Elevated Normal
#> 3 Elevated Normal
#> 4 Elevated Elevated
#> 5 Stage 1 Elevated
After a quick inspection of the original `bp_jhs` data set and the newly processed `jhs_proc` data set, it should be evident that there was a lot going on “under the hood” of the `process_data` function. As we can see from the column names, the awkward `Sys.mmHg.`, `Dias.mmHg.`, `Pulse.bpm.`, and `DateTime` names have been replaced with the more concise `SBP`, `DBP`, `HR`, and `DATE_TIME`, respectively. Additionally, `MAP`, `PP`, `RPP`, `SBP_CATEGORY`, and `DBP_CATEGORY` were all calculated as additional columns that previously did not exist in the data. Finally, if the supplied data has a column in a “date/time” format, the columns `TIME_OF_DAY` and `DAY_OF_WEEK` are also created for convenience.
names(bp_jhs)
#> [1] "DateTime" "Month" "Day" "Year" "DayofWk"
#> [6] "Time" "Hour" "Meal_Time" "Sys.mmHg." "Dias.mmHg."
#> [11] "bpDelta" "Pulse.bpm."
names(jhs_proc)
#> [1] "DATE" "DATE_TIME" "SBP" "DBP" "MAP"
#> [6] "PP" "HR" "RPP" "MONTH" "DAY"
#> [11] "YEAR" "DAYOFWK" "TIME" "HOUR" "MEAL_TIME"
#> [16] "BPDELTA" "TIME_OF_DAY" "DAY_OF_WEEK" "ID" "SBP_CATEGORY"
#> [21] "DBP_CATEGORY"
NOTE: For consistency, `process_data` will coerce all column names to upper-case.
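To make the case-insensitive matching concrete, here is a minimal base R sketch of the idea. The helper `find_col` is hypothetical and only illustrates the behavior described above; it is not how `process_data` is necessarily implemented.

```r
# A subset of the raw bp_jhs column names shown above
raw_names <- c("DateTime", "Sys.mmHg.", "Dias.mmHg.", "Pulse.bpm.")

# Hypothetical lookup: locate a user-supplied name regardless of capitalization
find_col <- function(user_name, col_names) {
  match(toupper(user_name), toupper(col_names))
}

sbp_idx <- find_col("sys.mmhg.", raw_names)
hr_idx  <- find_col("PULSE.BPM.", raw_names)

# Upper-case every name, then apply the standardized names to the matched columns
new_names <- toupper(raw_names)
new_names[sbp_idx] <- "SBP"
new_names[hr_idx]  <- "HR"
new_names
```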
After the data has been processed, we can use the built-in metrics from the literature to characterize blood pressure variability. The following metrics are currently offered through the `bp` package:
Function | Metric Name | Source |
---|---|---|
arv | Average Real Variability | Mena et al (2005) |
bp_mag | Blood Pressure Magnitude | Muntner et al (2011) |
bp_range | Blood Pressure Range | Levitan et al (2013) |
cv | Coefficient of Variation | Muntner et al (2011) |
sv | Successive Variation | Muntner et al (2011) |
dip_calc | Nocturnal Dipping % and Classification | Ohkubo et al (1997) |
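Most of these metrics need only the blood pressure columns, but nocturnal dipping also needs the sleep indicator. A minimal sketch of the idea follows, using the ratio of mean sleep BP to mean awake BP. The helper `dip_sketch` is hypothetical, the coding of `wake` (1 = awake, 0 = asleep) is an assumption, and the 10% cutoff between dipper and non-dipper is a commonly used convention assumed here rather than a value taken from `dip_calc`.

```r
# Hypothetical helper (not the package's dip_calc)
dip_sketch <- function(bp, wake) {
  # Dipping proportion: 1 - (mean sleep BP / mean awake BP)
  dip <- 1 - mean(bp[wake == 0]) / mean(bp[wake == 1])
  # Assumed cutoffs: below 0 is a reverse dipper, below 10% a non-dipper
  status <- if (dip < 0) {
    "reverse dipper"
  } else if (dip < 0.10) {
    "non-dipper"
  } else {
    "dipper"
  }
  list(dip = dip, status = status)
}

# Illustrative readings: three awake, three asleep
sbp  <- c(130, 128, 132, 110, 112, 108)
wake <- c(1, 1, 1, 0, 0, 0)
res  <- dip_sketch(sbp, wake)
res$status
```

Here the awake mean is 130 and the sleep mean is 110, so the dipping proportion is about 0.15.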
Time-Dependent Dispersion Metrics
- `arv` - Average Real Variability
- `sv` - Successive Variation
Time-Independent Dispersion Metrics
- `bp_mag` - Blood Pressure Magnitude (peak and trough)
- `bp_range` - Blood Pressure Range
- `cv` - Coefficient of Variation
Sleep-dependent Metrics
- `dip_calc` - Nocturnal Dipping % and Classification. The dipping percentage is calculated as `1 - (avg sleep BP / avg daytime BP)`. The severity of the dipping percentage indicates the corresponding classification of that individual (dipper, non-dipper, reverse dipper).
Let’s say we are working with the `hypnos_data` and would like to compare the time-dependent nature of the `arv` with the `sv` for each subject.
head(arv(hypnos_proc))
#> # A tibble: 6 x 6
#> # Groups: ID, VISIT [3]
#> ID VISIT WAKE ARV_SBP ARV_DBP N
#> <int> <int> <int> <dbl> <dbl> <int>
#> 1 70417 1 0 7.67 5.5 7
#> 2 70417 1 1 10.2 6 23
#> 3 70417 2 0 17.7 7.57 8
#> 4 70417 2 1 11.2 8.12 17
#> 5 70422 1 0 10.5 4 5
#> 6 70422 1 1 14.9 6.62 17
head(sv(hypnos_proc))
#> # A tibble: 6 x 6
#> # Groups: ID, VISIT [3]
#> ID VISIT WAKE SV_SBP SV_DBP N
#> <int> <int> <int> <dbl> <dbl> <int>
#> 1 70417 1 0 8.49 5.76 7
#> 2 70417 1 1 12.1 7.95 23
#> 3 70417 2 0 18.9 8.22 8
#> 4 70417 2 1 13.0 10.2 17
#> 5 70422 1 0 11.2 5.74 5
#> 6 70422 1 1 19.1 8.80 17
Comparing separate tables vertically can be challenging, so with some help from `dplyr` we can join metrics side by side. Here we join `arv` with `cv`:
head(dplyr::left_join(arv(hypnos_proc), cv(hypnos_proc)))
#> Joining, by = c("ID", "VISIT", "WAKE", "N")
#> # A tibble: 6 x 10
#> # Groups: ID, VISIT [3]
#> ID VISIT WAKE ARV_SBP ARV_DBP N CV_SBP CV_DBP SD_SBP SD_DBP
#> <int> <int> <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 70417 1 0 7.67 5.5 7 4.87 8.65 5.71 4.82
#> 2 70417 1 1 10.2 6 23 6.98 8.74 9.03 5.88
#> 3 70417 2 0 17.7 7.57 8 9.46 12.3 12.9 7.43
#> 4 70417 2 1 11.2 8.12 17 8.44 11.5 11.5 7.56
#> 5 70422 1 0 10.5 4 5 5.21 7.02 7.22 4.09
#> 6 70422 1 1 14.9 6.62 17 9.86 11.5 14.9 7.59
Note that this is possible thanks to the work we did in standardizing column names from the processing step.
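To see what the dispersion metrics used above measure, the sketch below computes them on a short illustrative vector. The function names are hypothetical, and the formulas are the common textbook forms (mean absolute successive difference for ARV, root mean squared successive difference for SV, and standard deviation over mean times 100 for CV); the package's exact normalization is assumed rather than confirmed.

```r
# Hypothetical re-implementations for illustration only
arv_sketch <- function(x) mean(abs(diff(x)))       # average real variability
sv_sketch  <- function(x) sqrt(mean(diff(x)^2))    # successive variation
cv_sketch  <- function(x) sd(x) / mean(x) * 100    # coefficient of variation (%)

# Four consecutive systolic readings (made-up values)
sbp <- c(120, 130, 125, 135)

arv_val <- arv_sketch(sbp)
sv_val  <- sv_sketch(sbp)
cv_val  <- cv_sketch(sbp)
round(c(ARV = arv_val, SV = sv_val, CV = cv_val), 2)
```

Because SV squares the successive differences before averaging, it is never smaller than ARV on the same readings, which agrees with the `arv` and `sv` outputs shown earlier. CV ignores the ordering of the readings entirely, which is why it is grouped with the time-independent metrics.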
Suppose instead we wanted to look at the peaks and troughs of the single-subject data set `bp_jhs`. We would then call the `bp_mag` function on our processed data.
head(bp_mag(jhs_proc))
#> # A tibble: 1 x 6
#> ID Peak_SBP Peak_DBP Trough_SBP Trough_DBP N
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 20.3 15.1 19.7 17.9 222
Here, we notice something different. Because there were no `ID`, `VISIT`, or `WAKE` columns, the `bp_mag` function aggregated everything together. This is technically correct, but say we wanted to glean more information by breaking our data down by `DATE` instead; we would need to pass the optional argument `inc_date = TRUE` to the function.
tail(bp_mag(jhs_proc, inc_date = TRUE))
#> # A tibble: 6 x 7
#> # Groups: ID [1]
#> ID DATE Peak_SBP Peak_DBP Trough_SBP Trough_DBP N
#> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 2019-07-26 9.8 4.80 6.20 5.2 5
#> 2 1 2019-07-28 5.5 3 4.5 3 4
#> 3 1 2019-07-29 7 2.75 9 2.25 4
#> 4 1 2019-07-30 2 1 2 1 2
#> 5 1 2019-07-31 1 0.5 1 0.5 2
#> 6 1 2019-08-01 0 0 0 0 1
Interpretation: While it may not seem obvious at first glance, the blood pressure magnitude (whether a peak or a trough) is calculated as \(peak = max(BP) - mean(BP)\) and \(trough = mean(BP) - min(BP)\), where BP could correspond to either SBP or DBP. If we manually inspect the data from `2019-07-31`, we see that N = 2, and within the `bp_jhs` data set the two measurements are 126 and 128 for SBP and 77 and 76 for DBP. Thus \(\bar{x}_{SBP} = \frac{(126+128)}{2} = 127\) and \(\bar{x}_{DBP} = \frac{(77+76)}{2} = 76.5\), which gives an SBP peak and trough of \(128 - 127 = 127 - 126 = 1\) and a DBP peak and trough of \(77 - 76.5 = 76.5 - 76 = 0.5\), matching the output.
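The same arithmetic can be reproduced in base R for the three most recent dates, using the systolic readings printed in the processed data earlier. This is only a sketch of the peak and trough definitions given above, not the package's implementation.

```r
# Systolic readings from the last three dates of the processed bp_jhs output
readings <- data.frame(
  DATE = c("2019-08-01", "2019-07-31", "2019-07-31", "2019-07-30", "2019-07-30"),
  SBP  = c(132, 126, 128, 130, 134)
)

# peak = max(BP) - mean(BP), trough = mean(BP) - min(BP), computed per date
peak_sbp   <- tapply(readings$SBP, readings$DATE, function(x) max(x) - mean(x))
trough_sbp <- tapply(readings$SBP, readings$DATE, function(x) mean(x) - min(x))
n_readings <- tapply(readings$SBP, readings$DATE, length)

cbind(Peak_SBP = peak_sbp, Trough_SBP = trough_sbp, N = n_readings)
```

The single reading on 2019-08-01 gives a peak and trough of 0, which is why such dates carry no information about within-day variability.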
So far, we have processed the original data and run a couple of metrics to get a clearer picture of the variability. Now let’s visualize it. Though the processed data can easily be incorporated into other visualization packages or code (such as `ggplot2`, or base R graphics as we will demonstrate in the first example below with `bp_mag`), the following visuals are currently included with the `bp` package:
Function | Visual |
---|---|
bp_hist | Blood Pressure Stage Histograms |
bp_scatter | Blood Pressure Stage Scatter Plot (American Heart Association) |
dow_tod_plots | Day of Week / Time of Day chart |
bp_report | Exportable Blood Pressure Report |
Continuing with our previous example using the `bp_jhs` data set, let’s suppose we wanted to explore how the peaks and troughs of systolic blood pressure changed over time. Note that there is a subtle assumption here that we have multiple measurements for a given day; otherwise a single value will be both the peak and the trough. The plot still works regardless.
viz_data <- bp_mag(jhs_proc, inc_date = TRUE)
## Keep only dates with more than one reading and a nonzero peak
viz_sub <- viz_data[which(viz_data$Peak_SBP > 0 & viz_data$N > 1),]
plot(viz_sub$DATE, viz_sub$Peak_SBP, type = 'l', col = "red", xlab = "DATE", ylab = "Magnitude")
lines(viz_sub$DATE, viz_sub$Trough_SBP, col = "darkgreen")
legend("topright", legend = c("Peak", "Trough"), col = c("red","darkgreen"), lty = 1)
From the above time series chart, notice that the values are absolute magnitudes for both peak and trough. So, when peak exceeds trough, as was evident during late May and early June, the interpretation is that blood pressure rose more on average than it fell. In other words, the variability of the blood pressure data is right-skewed toward the high end. In contrast, at the very beginning the variability was more left-skewed, favoring the low end of the spectrum, since the trough magnitude exceeded the peak magnitude. We can verify this by looking at the very first day of measurements, `2019-04-16`, as shown below:
head(viz_data[which(viz_data$Peak_SBP > 0 & viz_data$N > 1),])
#> # A tibble: 6 x 7
#> # Groups: ID [1]
#> ID DATE Peak_SBP Peak_DBP Trough_SBP Trough_DBP N
#> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 2019-04-16 6.25 8 12.8 6 4
#> 2 1 2019-04-20 5.67 3.33 5.33 3.67 3
#> 3 1 2019-04-21 6 0 6 0 2
#> 4 1 2019-04-22 2 0.5 2 0.5 2
#> 5 1 2019-04-26 6.33 1.67 5.67 2.33 3
#> 6 1 2019-04-27 1.5 3 1.5 3 2
Recall that in the processing stage, there were additional columns that were automatically created. We will now visualize two of these, `SBP_CATEGORY` and `DBP_CATEGORY`, through the `bp_hist` and `bp_scatter` functions, and the `TIME_OF_DAY` column through the `dow_tod_plots` function.
The `bp_hist` function returns three histograms of all readings: the total number of readings within each stage, the frequency of SBP readings, and the frequency of DBP readings. Furthermore, it colors the data according to the blood pressure stage each reading falls under.
The above plots show a notably high frequency of Elevated and Stage 1 readings for `SBP`, but the `DBP` readings seem to fare better, falling mostly in the Normal and Elevated stages.
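The stage counts behind such a histogram are simply a tally of the category columns. As a small sketch, the code below tallies the `SBP_CATEGORY` values of the five readings printed in the processed output earlier. The stage labels other than those appearing in that output ("Normal", "Elevated", "Stage 1") are assumed names.

```r
# Assumed ordering and labels of the stages
stages <- c("Low", "Normal", "Elevated", "Stage 1", "Stage 2", "Crisis")

# SBP_CATEGORY of the first five processed bp_jhs readings shown earlier
sbp_category <- factor(c("Stage 1", "Elevated", "Elevated", "Elevated", "Stage 1"),
                       levels = stages)

# Count readings per stage; stages with no readings are kept with a count of 0
stage_counts <- table(sbp_category)
stage_counts
```

Passing `stage_counts` to base R's `barplot` would draw a simple bar chart of the tally.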
Let’s now suppose that we wish to break down our readings by Time of Day and Day of Week. For this, we can use the `dow_tod_plots` function. Because this function is mainly used as a helper function for the `bp_report` function, we need to add a couple of steps using the `gridExtra` package:
bptable_ex <- dow_tod_plots(jhs_proc)
gridExtra::grid.arrange(bptable_ex[[1]], bptable_ex[[2]], bptable_ex[[3]], bptable_ex[[4]], nrow = 2)
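The kind of breakdown these charts summarize can also be sketched in base R. The example below uses the five readings printed in the processed output earlier and computes the mean SBP by `TIME_OF_DAY` and by `DAY_OF_WEEK`, along with a cross-tabulation of reading counts. Which statistics `dow_tod_plots` actually displays is not confirmed here; this only illustrates grouping by the two derived columns.

```r
# First five processed bp_jhs readings with their derived time columns
readings <- data.frame(
  SBP         = c(132, 126, 128, 130, 134),
  TIME_OF_DAY = c("Morning", "Morning", "Morning", "Afternoon", "Afternoon"),
  DAY_OF_WEEK = c("Thu", "Wed", "Wed", "Tue", "Tue")
)

# Mean systolic pressure within each time-of-day and day-of-week group
mean_by_tod <- tapply(readings$SBP, readings$TIME_OF_DAY, mean)
mean_by_dow <- tapply(readings$SBP, readings$DAY_OF_WEEK, mean)

# Number of readings in each day-of-week by time-of-day cell
counts <- table(readings$DAY_OF_WEEK, readings$TIME_OF_DAY)

mean_by_tod
mean_by_dow
counts
```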
As a final step before returning to our other example, let’s compile everything that we have done so far into a more compact and digestible report that visualizes everything simultaneously. To do so, we will rely on the `bp_report` function, which will generate a report in PDF (although other formats such as PNG are available).
Suppose now that we turn our attention back to the `hypnos_data` example where we joined the `ARV` and `CV` metrics together. We would now like to visualize these for all of the subjects to see if we can discern any patterns. From the first scatterplot matrix we see that the between-subject values differ, and from the second scatterplot matrix we see that there is a very stark contrast between `ARV` and `CV` during sleep vs awake.
viz_arv_cv <- dplyr::left_join(arv(hypnos_proc), cv(hypnos_proc))
#> Joining, by = c("ID", "VISIT", "WAKE", "N")
pairs(viz_arv_cv[,4:(ncol(viz_arv_cv)-1)], upper.panel = NULL, col = factor(viz_arv_cv$ID))
As our understanding of cardiovascular disease continues to grow, this package will remain an ongoing project. As such, collaboration is highly encouraged. Corrections to existing metrics, extensions or new method proposals and visualizations, and code optimization are all welcome.
In the short term, the following new features are to be incorporated with the next release of the package:
- [...] `WAKE`), and existing metrics (`dip_calc`, etc.)
- [...] `bp_report` function to include more visuals on another page, and include the ability to break down visuals by individuals (if data includes multiple subjects)
References
[1] Mancia G, Di Rienzo M, Parati G. Ambulatory blood pressure monitoring use in hypertension research and clinical practice. Hypertension. 1993;21(4):510-524.
[2] Levitan E, Kaciroti N, Oparil S, et al. Relationships between metrics of visit-to-visit variability of blood pressure. J Hum Hypertens. 2013;27:589-593. doi: 10.1038/jhh.2013.19
[3] Muntner P, Joyce C, Levitan EB, Holt E, Shimbo D, Webber LS, Oparil S, Re R, Krousel-Wood M. Reproducibility of visit-to-visit variability of blood pressure measured as part of routine clinical care. J Hypertens. 2011;29(12):2332-2338. doi: 10.1097/HJH.0b013e32834cf213
[4] O’Brien E, Sheridan J, O’Malley K. Dippers and non-dippers. Lancet. 1988;2:397.
[5] Ohkubo T, Imai Y, Tsuji I, Nagai K, Watanabe N, Minami N, Kato J, Kikuchi N, Nishiyama A, Aihara A, Sekino M, Satoh H, Hisamichi S. Relation between nocturnal decline in blood pressure and mortality. The Ohasama Study. Am J Hypertens. 1997 Nov;10(11):1201-7. doi: 10.1016/s0895-7061(97)00274-4. PMID: 9397237.
[6] Mena L, Pintos S, Queipo NV, Aizpúrua JA, Maestre G, Sulbarán T. A reliable index for the prognostic significance of blood pressure variability. J Hypertens. 2005 Mar;23(3):505-11. doi: 10.1097/01.hjh.0000160205.81652.5a. PMID: 15716690.
[7] Holt-Lunstad J, Jones BQ, Birmingham W. The influence of close relationships on nocturnal blood pressure dipping. Int J Psychophysiol. 2009 Mar;71(3):211-7. doi: 10.1016/j.ijpsycho.2008.09.008. Epub 2008 Oct 5. PMID: 18930771.
[8] Schwenck J. Riding for Research: A 5,775-mile Cycling Journey Across North America. Harvard Dataverse. https://dataverse.harvard.edu/dataverse/r4r
[9] Webb S. AHA 2019 Heart Disease and Stroke Statistics. American College of Cardiology. https://www.acc.org/latest-in-cardiology/ten-points-to-remember/2019/02/15/14/39/aha-2019-heart-disease-and-stroke-statistics
[10] Bilo G, Grillo A, Guida V, Parati G. Morning blood pressure surge: pathophysiology, clinical relevance and therapeutic aspects. Integr Blood Press Control. 2018;11:47-56. Published 2018 May 24. https://doi.org/10.2147/IBPC.S130277
[11] Gaynanova I, Punjabi N, Crainiceanu C. Modeling continuous glucose monitoring (CGM) data during sleep. Biostatistics. 2020. https://doi.org/10.1093/biostatistics/kxaa023