Warning message with perccalc package

Jorge Cimentada

2018-03-30

While the other vignette shows you how to use perccalc appropriately, there are instances where there are simply too few categories to estimate percentiles properly. Imagine estimating a distribution of percentiles from 1 to 100 with only three ordered categories; it sounds far-fetched.

Let’s load our packages.

library(perccalc)
library(dplyr)
library(ggplot2)

For example, take the survey data on smoking habits.

smoking_data <-
  MASS::survey %>% # you will need to install the MASS package
  as_tibble() %>%
  select(Sex, Smoke, Pulse) %>%
  rename(
    gender = Sex,
    smoke = Smoke,
    pulse_rate = Pulse
  )

The final result is this dataset:

## # A tibble: 237 x 3
##    gender smoke pulse_rate
##    <fct>  <fct>      <int>
##  1 Male   Never         35
##  2 Female Never         40
##  3 Female Never         48
##  4 Male   Never         48
##  5 Female Never         50
##  6 Female Regul         50
##  7 Male   Regul         54
##  8 Male   Never         55
##  9 Male   Never         56
## 10 Male   Never         59
## # ... with 227 more rows
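Before estimating anything, it can help to check how many categories the ordered variable actually has and how the observations are spread across them. The following is a minimal sketch in base R that reads directly from MASS::survey; the object names (smoke, n_categories) are chosen here for illustration only.

```r
# Base R only; the data come from the MASS package used above.
smoke <- MASS::survey$Smoke

# Number of distinct categories available to the estimation
n_categories <- nlevels(smoke)
n_categories

# Observations per category (missing values counted separately)
table(smoke, useNA = "ifany")
```

A variable with only a handful of levels, some of them sparsely populated, is a hint that percentile estimates built on it may be fragile.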

Note that there are only four categories in the smoke variable. Let’s try to estimate the percentile difference.

smoking_data <-
  smoking_data %>%
  mutate(smoke = factor(smoke,
                        levels = c("Never", "Occas", "Regul", "Heavy"),
                        ordered = TRUE))

perc_diff(smoking_data, smoke, pulse_rate)
## Warning in perc_diff(smoking_data, smoke, pulse_rate): Too few categories in categorical variable to estimate the
##       variance-covariance matrix and standard errors. Proceeding without
##       estimated standard errors but perhaps you should increase the number
##       of categories
## difference 
##   385.1357
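One rough way to judge an estimate like this is to compare it with the observed spread of the continuous variable. The sketch below assumes the difference is expressed in the same units as pulse_rate; the variable names are illustrative, and the estimate is copied by hand from the output above rather than recomputed.

```r
# Base R only; uses the original Pulse column from MASS::survey.
pulse <- MASS::survey$Pulse

# Observed minimum and maximum, ignoring missing values
observed_range <- range(pulse, na.rm = TRUE)
observed_spread <- diff(observed_range)

# Value printed by perc_diff() above, copied by hand
estimate <- 385.1357

# TRUE means the estimated difference exceeds the whole observed range
estimate > observed_spread
```

If the comparison returns TRUE, the estimate is larger than any difference that could be observed in the data, which is consistent with the warning about too few categories.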

perc_diff returns the estimated coefficient but also warns you that it cannot estimate the standard errors with so few categories. The same happens with perc_dist.

perc_dist(smoking_data, smoke, pulse_rate) %>%
  head()
## Warning in perc_dist(smoking_data, smoke, pulse_rate): Too few categories in categorical variable to estimate the
##       variance-covariance matrix and standard errors. Proceeding without
##       estimated standard errors but perhaps you should increase the number
##       of categories
##   percentile  estimate
## 1          1  24.23446
## 2          2  47.82656
## 3          3  70.78474
## 4          4  93.11743
## 5          5 114.83308
## 6          6 135.94011
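In a script, you may prefer to record this warning rather than let it print to the console. The following is a sketch using base R condition handling; capture_warnings is a hypothetical helper, not part of perccalc, and fake_estimate is a stand-in that mimics a function returning a value while warning, so the example runs without perccalc installed.

```r
# Run an expression, collecting any warnings instead of printing them.
# `capture_warnings` is a hypothetical helper, not part of perccalc.
capture_warnings <- function(expr) {
  warnings_seen <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Store the message, then stop the warning from being printed
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = warnings_seen)
}

# Stand-in for a call that returns an estimate and warns.
# The message text is abbreviated from the warning shown above.
fake_estimate <- function() {
  warning("Too few categories in categorical variable")
  c(difference = 385.1357)
}

res <- capture_warnings(fake_estimate())
res$value
length(res$warnings) > 0  # did the call warn?
```

With perccalc loaded, the same helper could wrap the real call, for example capture_warnings(perc_diff(smoking_data, smoke, pulse_rate)), and the stored messages could then be checked before using the estimate.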