Important Miscellany

2018-11-30

The importance of this miscellany

The features of strex that were deemed the most interesting have been given their own vignettes. However, the package was intended as a miscellany of useful functions, so the functions demonstrated here capture the spirit of the package: small helpers that save time for people who manipulate strings in R.

library(strex)

Could this be numeric?

Sometimes you don’t want to know whether something is numeric, just whether or not it could be. Now you can find out with str_can_be_numeric().

str_can_be_numeric(c("1a", "abc", "5", "2e7", "seven"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
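The idea can be approximated in base R by attempting the conversion and checking for NA. This is only a sketch: could_be_numeric() is a hypothetical helper, not part of strex, and str_can_be_numeric() may treat edge cases (such as NA input or surrounding whitespace) differently.

```r
# Hypothetical helper (not from strex): TRUE where as.numeric() can parse the string.
could_be_numeric <- function(x) {
  # as.numeric() warns and returns NA for unparseable strings, so silence the warning.
  !is.na(suppressWarnings(as.numeric(x)))
}

could_be_numeric(c("1a", "abc", "5", "2e7", "seven"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```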

Currency

To get currencies and amounts mentioned in strings, there are str_get_currency() and str_get_currencies(). str_get_currency() just returns the first seen currency symbol (really just the character before the first number). str_get_currencies() returns all currencies and amounts mentioned in a string. str_get_currencies() only takes arguments of length 1, so to use it with multiple strings, you’ll need to use purrr::map().

string <- c("Alan paid £5", "Joe paid $7")
str_get_currency(string)
#> [1] "£" "$"
string <- c("€1 is $1.17", "£1 is $1.29")
str_get_currency(string)  # only gets the first mentioned
#> [1] "€" "£"
str_get_currencies(string[1])
#> # A tibble: 2 x 2
#>   currency amount
#>   <chr>     <dbl>
#> 1 €          1   
#> 2 $          1.17
purrr::map(string, str_get_currencies)
#> [[1]]
#> # A tibble: 2 x 2
#>   currency amount
#>   <chr>     <dbl>
#> 1 €          1   
#> 2 $          1.17
#> 
#> [[2]]
#> # A tibble: 2 x 2
#>   currency amount
#>   <chr>     <dbl>
#> 1 £          1   
#> 2 $          1.29

Extract a single element of a string

This is a simple wrapper around stringr::str_sub().

string <- "abcdefg"
str_sub(string, 3, 3)
#> [1] "c"
str_elem(string, 3)  # simpler and more expressive
#> [1] "c"
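To illustrate what such a wrapper amounts to, here is a minimal base R sketch. The name elem() is hypothetical and this is not the strex source; unlike stringr::str_sub(), base substr() does not accept negative indices for counting from the end.

```r
# Hypothetical one-character extractor built on base substr():
# the start and end positions are the same index.
elem <- function(string, index) {
  substr(string, index, index)
}

elem("abcdefg", 3)
#> [1] "c"
```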

Extract numbers and non-numeric elements

string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")
str_extract_numbers(string)
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1]  7  8 99
str_extract_non_numerics(string)
#> [[1]]
#> [1] "aa"  "bbb" "ccc"
#> 
#> [[2]]
#> [1] "xyz"      "ayc"      "jzk"      "elephant"
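For comparison, a rough base R equivalent can be built from gregexpr() and regmatches(). This is a sketch under a simplifying assumption: it only recognises runs of digits, so decimals and negative signs are not handled, and it is not how strex implements these functions.

```r
string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")

# Runs of digits, converted from character to numeric.
nums <- lapply(regmatches(string, gregexpr("[0-9]+", string)), as.numeric)

# Runs of anything that is not a digit.
non_nums <- regmatches(string, gregexpr("[^0-9]+", string))

nums[[2]]
#> [1]  7  8 99
non_nums[[1]]
#> [1] "aa"  "bbb" "ccc"
```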

Split a string by its numbers

string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")
str_split_by_nums(string)
#> [[1]]
#> [1] "aa"  "1"   "bbb" "2"   "ccc" "3"  
#> 
#> [[2]]
#> [1] "xyz"      "7"        "ayc"      "8"        "jzk"      "99"      
#> [7] "elephant"
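The same kind of split can be sketched in base R by matching alternating runs of digits and non-digits. As an assumption for illustration, a "number" here means only a run of the digits 0 to 9; this is not the strex implementation.

```r
string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")

# Each match is either a run of digits or a run of non-digits,
# so the pieces come back in their original order.
pieces <- regmatches(string, gregexpr("[0-9]+|[^0-9]+", string))

pieces[[1]]
#> [1] "aa"  "1"   "bbb" "2"   "ccc" "3"
```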

Force a file name to have an extension

We can give files a given extension, leaving them alone if they already have it.

string <- c("spreadsheet1.csv", "spreadsheet2")
str_give_ext(string, "csv")
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"
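The "leave it alone if it already has the extension" logic can be sketched in base R as follows. give_ext() is a hypothetical stand-in written for illustration, not the strex function, and it covers only the default (append) behaviour.

```r
# Hypothetical helper: append ".ext" unless the name already ends with it.
give_ext <- function(x, ext) {
  suffix <- paste0(".", ext)
  ifelse(endsWith(x, suffix), x, paste0(x, suffix))
}

give_ext(c("spreadsheet1.csv", "spreadsheet2"), "csv")
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"
```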

If the file already has an extension, we can append one or replace it.

str_give_ext(string, "xls")  # append
#> [1] "spreadsheet1.csv.xls" "spreadsheet2.xls"
str_give_ext(string, "csv", replace = TRUE)  # replace
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"

Strip away a file extension

string <- c("spreadsheet1.csv", "spreadsheet2")
str_before_last_dot(string)
#> [1] "spreadsheet1" "spreadsheet2"
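A base R regular expression gives the same result for these inputs. This is a sketch, not the strex source: the pattern removes the last dot and everything after it, and leaves strings without a dot untouched.

```r
string <- c("spreadsheet1.csv", "spreadsheet2")

# "\\." is a literal dot; "[^.]*$" is everything after it up to the end of the string.
stripped <- sub("\\.[^.]*$", "", string)
stripped
#> [1] "spreadsheet1" "spreadsheet2"
```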

Remove quoted bits from a string

string <- "I hate having these \"quotes\" in the middle of my strings."
cat(string)
#> I hate having these "quotes" in the middle of my strings.
str_remove_quoted(string)
#> [1] "I hate having these  in the middle of my strings."

Split camel case

I’m not mad on CamelCase, so I often want to deconstruct it.

library(magrittr)
string <- c("CamelVar1", "CamelVar2")
str_split_camel_case(string)
#> [[1]]
#> [1] "Camel" "Var1" 
#> 
#> [[2]]
#> [1] "Camel" "Var2"
string %>% 
  str_split_camel_case() %>% 
  purrr::map_chr(str_c, collapse = "_") %>% 
  str_to_lower()
#> [1] "camel_var1" "camel_var2"
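If the only goal is snake_case, a single base R substitution is an alternative to splitting and re-joining. This sketch assumes a word boundary is a lowercase letter or digit followed by an uppercase letter, which may not match every rule that str_split_camel_case() applies.

```r
string <- c("CamelVar1", "CamelVar2")

# Insert "_" between a lowercase letter or digit and the uppercase letter that follows,
# then lowercase the whole string.
snake <- tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", string))
snake
#> [1] "camel_var1" "camel_var2"
```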

Convert a string to a vector

This is something I did a lot to avoid using regular expressions. Don’t do it for that purpose. Learn regex. https://regexone.com/ is a very good start.

string <- "R is good."
str_to_vec(string)
#>  [1] "R" " " "i" "s" " " "g" "o" "o" "d" "."
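In base R, splitting on the empty string does the same job for a single string. This is offered as a sketch for comparison, not as the strex implementation.

```r
string <- "R is good."

# strsplit() returns a list with one element per input string; take the first.
chars <- strsplit(string, "")[[1]]
chars
#>  [1] "R" " " "i" "s" " " "g" "o" "o" "d" "."
```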

Trim anything, not just whitespace

What if something is needlessly surrounded by parentheses and we want to get rid of them?

string <- "(((Why all the parentheses?)))"
string %>% 
  str_trim_anything("(", side = "left") %>% 
  str_trim_anything(")", side = "r")
#> [1] "Why all the parentheses?"

Note that the pattern argument here isn’t a regular expression, just plain text.
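For contrast, the regular-expression version in base R needs the parentheses escaped, because "(" and ")" are special characters in a regex. This is a sketch for comparison only.

```r
string <- "(((Why all the parentheses?)))"

# "^\\(+" matches one or more "(" at the start; "\\)+$" matches one or more ")" at the end.
trimmed <- gsub("^\\(+|\\)+$", "", string)
trimmed
#> [1] "Why all the parentheses?"
```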

Remove duplicated bits of strings

string <- c("I often write the word *my* twice in a row in my my sentences.")
str_singleize(string, " my")
#> [1] "I often write the word *my* twice in a row in my sentences."
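The same collapse can be sketched with a base R regular expression that replaces any run of consecutive " my" with a single copy. This is an illustration of the idea, not the strex source.

```r
string <- "I often write the word *my* twice in a row in my my sentences."

# "( my)+" matches one or more back-to-back copies of " my".
collapsed <- gsub("( my)+", " my", string)
collapsed
#> [1] "I often write the word *my* twice in a row in my sentences."
```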