This vignette provides background information about the Healthy Eating Index and a broad overview of the application of the hei package. An additional vignette walking step-by-step through a typical analysis is provided as well.

Note: this workflow assumes the National Health and Nutrition Examination Survey (NHANES) and a derivative thereof, the Food Patterns Equivalents Database (FPED), as the sources of the data being analyzed. Thus, several of the functions in this package are designed to retrieve these data, which are in the public domain, automatically. If the function, hei(), which calculates HEI scores based on a number of other values, is to be used on data from any other source, it will need to be formatted just as the data retrieved from the FPED would be. As that is not the primary use case for this package, that workflow will not be demonstrated here.


The Healthy Eating Index is a dietary metric that gauges adherence to the Dietary Guidelines for Americans. The USDA uses the index primarily to keep tabs on the diet quality of the US population. By design, the HEI is geared toward use with in-depth dietary surveys such as the National Health and Nutrition Examination Survey (NHANES) conducted biannually by the CDC. The USDA offers guidance to researchers who would be interested in using the HEI in their analyse s of dietary trends and other health factors. Documentation on the National Institute of Health's official website walks through the steps of computing HEI scores from NHANES data with SAS. This package draws inspiration from the associated SAS code. Its intended use is as a tool for R users to perform these kinds of analyses and is best understood as a companion to NHANES. Note: an R package called nhanesA exists, is useful for retrieving NHANES data sets, and is required by this package.

To get started, first we load the hei package and the dplyr package. The latter, in case you are unfamiliar, provides access to a number of useful functions that, while not necessary, greatly simplify some of the data manipulation steps and as such is recommended and will be used throughout these examples. Some of the functions included are filter(), group_by(), and select(), as well as the %>% “pipe.”


NHANES and the get_demo()/get_diet() functions

NHANES is conducted using a complex sampling method, such that each participant, based on certain demographic characteristics, is assigned a weighting variable that, when incorporated appropriately into the analysis allows conclusions to be drawn regarding the overall US population. As part of the extensive survey, a thorough dietary recall is taken, in which participants relate each food and drink item consumed in the previous 24-hour period, along with the amount. A follow-up phone call during which an identical dietary recall is conducted takes place several weeks later. Thus, except in cases where the participant could not be reached for follow-up or their recall was deemed unreliable, there are two days' worth of dietary data associated with each participant. The get_demo() and get_diet() functions each take two arguments, one specifying the survey year (e.g. “2009/2010”) and another specifying the day (either “first”, “second”, or “both”). These functions return data sets with key NHANES demographic and dietary variables, respectively.

diet0910 <- get_diet("2009/2010", day = "both")

demo0910 <- get_demo("2009/2010")

FPED and the get_fped() function

In NHANES itself, each food item that each individual has eaten during the reported period is listed separately. FPED is a database with an entry for each NHANES participant and total amounts of the cup or ounce equivalents of the foods reported broken down into many constituent categories. E.g. if one entry in NHANES contains a slice of cheese pizza, the corresponding entry in FPED would contain, say, 0.25 cup eq., 0.5 cup eq., and 10 oz eq. in the tomato and tomato products, cheese, and refined grains categories, respectively. The amounts in each of these categories is what goes into the HEI score calculation. The hei function, get_fped(), takes a year and a day as its arguments (just like the first two functions) and returns the corresponding FPED data set. If you would like to learn more about the FPED or how it is created, you can do so here:

Note: the FPED data sets available through this package were originally downloaded from the above site and manually exported from SAS as .csv files.

fped0910 <- get_fped("2009/2010", day = "both")

Sealing the deal with the hei() function

The hei() function takes as its arguments three data sets: an FPED data set, a NHANES dietary data set, and an NHANES demographic data set. In other words, hei() simply takes as its arguments the outputs of the first three functions and outputs a single data set containing all the variables necessary to calculate HEI scores, key demographic data as well as an HEI score for each participant.

This data set can then be used in any number of analyses relating to diet and other outcomes. As it retains the unique NHANES identifying variable, it can be joined with any other NHANES data set to gain access to a host of other variables which may be of interest. An example of how to do this is provided in a second vignette.

hei0910 <- hei(fped0910,diet0910,demo0910)
## 1 51624       34 41.08475
## 2 51625        4 55.64421
## 3 51626       16 47.91648
## 4 51627       10 36.25378
## 5 51628       60 31.50794
## 6 51629       26 41.09784