5  Tidyverse Introduction

Author: Ryan McShane, Ph.D.
Affiliation: The University of Chicago
Published: Oct. 21st, 2024
Modified: Oct. 25th, 2024

5.0.1 Overview

  1. Ch 1: Data Visualization
  2. Ch 3: Data Transformation
  3. Ch 4: Workflow: Code Style
  4. Ch 5: Data Tidying
  5. Ch 7: Data Import

5.0.2 An aside on pipes: |> and %>%

Work on the magrittr pipe, %>%, commenced on January 1st, 2014, and the operator quickly became important to the tidyverse. It was inspired by F#’s |> pipe operator. It allows us to rewrite a nested call as a left-to-right chain, e.g.,

head(select(cars, speed), 3)
##   speed
## 1     4
## 2     4
## 3     7
cars %>% select(speed) %>% head(3)
##   speed
## 1     4
## 2     4
## 3     7

Or more generally, replace f(x) with x %>% f().

However, the magrittr pipe operator requires the magrittr package (or a package that re-exports %>%, such as dplyr) to be loaded, and often leads to issues with reproducibility. Fortunately, the native pipe operator, |>, was introduced in R 4.1.0 (released on May 18th, 2021). Now we can write the same chain with the native pipe, and this is broadly recommended.

head(select(cars, speed), 3)
##   speed
## 1     4
## 2     4
## 3     7
cars |> select(speed) |> head(3)
##   speed
## 1     4
## 2     4
## 3     7

Or more generally, replace f(x) with x |> f().
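
As a quick check that the nested and piped forms are the same computation, here is a minimal sketch using only base R functions, so no packages need to be loaded. The vector x is an arbitrary example and is not part of the original notes.

```r
# Arbitrary example vector (an assumption for illustration).
x <- c(1, 4, 9)

# Nested form: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped form (requires R >= 4.1.0): read from left to right.
piped <- x |> sqrt() |> mean() |> round(1)

# Both forms evaluate the same calls in the same order.
identical(nested, piped)
```

The piped form tends to pay off as chains get longer, since each step sits on its own line in the order it runs.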

5.0.3 Some nuances to |> and %>%

  • By default, the pipe operator will pass the object to the first argument. E.g., f(x, y) and x |> f(y) are equivalent. Here, base and magrittr pipe operators are equivalent.

  • In many functions, the x, data, or .data argument comes first, as these are the objects we are most concerned with modifying. tidyverse functions are programmed this way as a rule.

  • What if you want to pass an object to another argument? E.g., can you do y |> f(x)? Not quite.

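  • In magrittr, we can write y %>% f(x, .). In base R (4.2.0 and later), we can write y |> f(x, arg = _), where the _ placeholder must be supplied to a named argument (arg stands in for the argument's actual name). This is where the syntax diverges. If you use named functions with named arguments, then you will avoid most of the painful differences.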
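
The difference between piping into the first argument and piping into a later one can be sketched with base R alone. The helper subtract below is hypothetical and defined only for this illustration; the magrittr form appears only as a comment so that the block runs without extra packages.

```r
# Hypothetical helper, defined only for this illustration.
subtract <- function(x, y) x - y

# Default behavior: the piped object becomes the first argument.
first_arg <- 10 |> subtract(3)          # same as subtract(10, 3)

# Base R placeholder (requires R >= 4.2.0): `_` must be a named argument.
second_arg <- 10 |> subtract(3, y = _)  # same as subtract(3, y = 10)

# The magrittr equivalent of the second call would be: 10 %>% subtract(3, .)
c(first_arg = first_arg, second_arg = second_arg)
```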

5.0.4 |> and %>%: passing object to argument after the first arg.

With magrittr's %>%, we can pass an object to several arguments at once by repeating the . placeholder; the syntax and approach are the same at every step:

cars %>% 
  select(speed) %>% 
  `%%`(., 7) %>%
  `==`(., 0) %>%
  if_else(condition = ., 
         true = paste0(., ", div. by 7\n"), 
         false = paste0(., ", not div. by 7\n")
         ) %>%
  head(3) %>% 
  cat(sep = "")
## FALSE, not div. by 7
## FALSE, not div. by 7
## TRUE, div. by 7

With base |>, we can pass an object to only one argument per call. Either we pass it to a named argument with the _ placeholder, we use an anonymous function, or we use lambda notation.

# we want to use data, x, in multiple places, so need to write a function
if_div7 = function(val = 7, x) {
  if_else(condition = x,
    true = paste0(x, ", div. by ", val, "\n"),
    false = paste0(x, ", not div. by ", val, "\n"))
}

cars |>
  select(speed) |>     
  (\(.) . %% 7)() |>            # lambda notation
  (function(x) x == 0)() |>     # anonymous function
  if_div7(val = 7, x = _) |>    # named function
  head(3) |>
  cat(sep = "")
## FALSE, not div. by 7
## FALSE, not div. by 7
## TRUE, div. by 7

5.1 Ch 1: Data Visualization

5.1.1 The Grammar of Graphics

  • The Grammar of Graphics is a book by Leland Wilkinson (of SPSS), published in 1999, with a robust second edition in 2005. Wilkinson co-authored a Java production graphics library in the process of writing the book.
  • In 2005/2006, Hadley Wickham introduced the ggplot package and the paper, An introduction to ggplot: An implementation of the grammar of graphics in R.
  • By 2008, Hadley Wickham had written the reshape and ggplot2 packages as part of his dissertation at Iowa State.
  • Since then, ggplot2 has become a de facto standard among statisticians seeking to produce high-quality and reproducible data visualizations (although not all are converts).
  • ggplot2’s syntax and output have been imitated in several languages, most notably in the Python package plotnine.

5.1.2 Anatomy of a ggplot call

This creates a blank plot:

ggplot()

Specifying data doesn’t update the plot yet, but it does tell ggplot where the data is coming from:

ggplot(data = cars)

aes specifies the aesthetic mappings that will be plotted. Supplying them generally produces default scales and grid lines:

ggplot(data = cars, 
       mapping = aes(x = speed, y = dist))

The geom is the kind of plot layer we would like to add:

ggplot(data = cars, 
       mapping = aes(x = speed, y = dist)) + 
  geom_point()

Whichever layer was listed last is what shows up on top…

ggplot(data = cars, 
       mapping = aes(x = speed, y = dist)) + 
  geom_point() + 
  geom_density_2d()

This layer adds semi-transparent neon green highlighting over the original points:

ggplot(data = cars, 
       mapping = aes(x = speed, y = dist)) + 
  geom_point() + 
  geom_density_2d() +
  geom_point(size = 8, 
             color = "green", 
             alpha = 0.5)

The geom_point and geom_density_2d geoms expected an x and y aesthetic!!!
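
Because each + returns a new plot object, a plot can also be built up and inspected step by step. The sketch below is one way to see that; the object names are made up for illustration, and the layers element it inspects is an implementation detail of ggplot objects rather than something the notes above rely on.

```r
library(ggplot2)

# Start with data and aesthetics only: no layers yet.
base_plot <- ggplot(data = cars, mapping = aes(x = speed, y = dist))

# Each `+` returns a new plot object with one more layer appended.
with_points  <- base_plot + geom_point()
with_density <- with_points + geom_density_2d()

length(base_plot$layers)     # number of layers before adding geoms
length(with_density$layers)  # number of layers after adding two geoms

# Layers are stored in the order they were added; later ones are drawn on top.
class(with_density$layers[[2]]$geom)[1]
```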

We can edit the text with the labs function:

ggplot(data = cars, 
       mapping = aes(x = speed, y = dist)) + 
  geom_point() + 
  geom_density_2d() +
  geom_point(size = 8, 
             color = "green", 
             alpha = 0.5) + 
  labs(x = "Speed", 
       y = "Dist", 
       title = "Title here", 
       subtitle = "Subtitle here", 
       caption = "Caption here", 
       tag = 1) # Useful for numbering plots

A title, subtitle, caption, and tag each take up real estate, which squishes the plot. (We can modify sizes!)

We can also modify the theme (there are a few preset themes; I use theme_bw, but this example uses theme_dark):

ggplot(data = cars, 
       mapping = aes(x = speed, y = dist)) + 
  geom_point(size = 4) + # point size
  geom_line(lwd = 1.25) + # LineWiDth 
  geom_density_2d() +
  geom_point(size = 8, # point size
             color = "green", # point color
             alpha = 0.5) + # transparency
  theme_dark(base_size = 30) # default font size

Please note that this plot is terrible for multiple reasons!

5.1.3 Aesthetic Mapping

Common aes parameters include:

  • x and y (Cartesian dimensions)
  • fill and color (change area and line coloring, respectively)
  • size, lwd, etc (changes some size aspect)
  • shape (changes, e.g., point shape)
  • And many more!

However, we could hypothetically supply virtually anything to the aes function:

aes(nonsense = abc, dillinger = farewell, abc = 3)
## Aesthetic mapping: 
## * `nonsense`  -> `abc`
## * `dillinger` -> `farewell`
## * `abc`       -> 3

So the aes parameter is meaningless unless it matches what the geom is expecting!
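
One way to see what a geom is expecting is to ask its underlying ggproto object. This is a minimal sketch using geom_point as the example; GeomPoint is the object behind geom_point.

```r
library(ggplot2)

# Aesthetics that geom_point cannot draw without.
GeomPoint$required_aes

# All aesthetics that geom_point understands (required plus optional).
GeomPoint$aesthetics()

# "nonsense" is not among them, so a mapping to it has nothing to act on.
"nonsense" %in% GeomPoint$aesthetics()
```

The same pattern should work for other geoms (e.g., GeomBar or GeomBoxplot), which is handy when a plot comes out blank or ggplot2 complains about a missing aesthetic.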

5.1.4 Geom Library: One Numerical Dimension

ggplot(data = cars, 
       mapping = aes(x = speed)) + 
  geom_boxplot()

ggplot(data = cars, 
       mapping = aes(x = speed)) + 
  geom_histogram(binwidth = 2.5)

ggplot(data = cars, 
       mapping = aes(x = speed)) + 
  geom_density(linewidth = 2)

ggplot(data = cars, 
       mapping = aes(x = speed)) + 
  geom_dotplot()

ggplot(data = cars, 
       mapping = aes(x = speed)) + 
  geom_freqpoly(binwidth = 2.5, 
                linewidth = 2)

ggplot(data = cars, 
       mapping = aes(sample = speed)) + 
  # Normal qq plot by default
  geom_qq() 

5.1.5 Geom Library: Categorical Variables

One categorical variable

ggplot(data = diamonds, 
       mapping = aes(x = color)) + 
  geom_bar()

Two categorical variables

ggplot(data = diamonds, 
       mapping = aes(x = color, y = cut)) + 
  geom_count()

One continuous variable, one categorical variable (box plot)

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = cut)) + 
  geom_boxplot()

One continuous variable, one categorical variable (violin plot)

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = cut)) + 
  geom_violin()

5.1.6 Geom Library: Two Numerical Dimensions

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
  geom_point(alpha = 0.1, size = 1) +
  ylim(c(59, 65))

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
  geom_smooth(se = FALSE, lwd = 2) +
  ylim(c(59, 65))

(Default quantiles in quantile regression are c(0.25, 0.5, 0.75))

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
  geom_quantile(lwd = 2) +
  ylim(c(59, 65))

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
  geom_density_2d() +
  ylim(c(59, 65)) + 
  xlim(c(0.1, 1.7))

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
  geom_density_2d_filled() +
  ylim(c(59, 65)) + 
  xlim(c(0.1, 1.7))

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
  geom_bin2d() +
  ylim(c(59, 65)) + 
  xlim(c(0.1, 1.7))

ggplot(data = diamonds, 
       mapping = aes(x = carat, y = depth)) + 
# requires the hexbin package
  geom_hex() +
  ylim(c(59, 65)) + 
  xlim(c(0.1, 1.7))

5.1.7 Geom Library: Three Numerical Dimensions

ggplot(data = faithfuld, 
       mapping = aes(x = waiting, 
                     y = eruptions, 
                     z = density)) + 
  geom_contour(linewidth = 2)

ggplot(data = faithfuld, 
       mapping = aes(x = waiting, 
                     y = eruptions, 
                     z = density)) + 
  geom_contour_filled()

ggplot(data = faithfuld, 
       mapping = aes(x = waiting, 
                     y = eruptions, 
                     fill = density)) + 
  geom_raster()

ggplot(data = faithfuld, 
       mapping = aes(x = waiting, 
                     y = eruptions, 
                     fill = density)) + 
  geom_tile()

5.1.8 Geom Library: Functions (not Data Viz!!)

ggplot() + 
  stat_function(
    fun = dnorm, 
    lwd = 3, 
    color = "blue"
  ) +
  xlim(c(-3, 3))

ggplot() + 
  stat_function(
    fun = dnorm, 
    lwd = 3, 
    color = "navy",
    args = list(mean = 1, sd = 0.5)
  ) +
  xlim(c(-2, 2))

ggplot() + 
  geom_function(
    fun = function(x) x^2 - 3, 
    lwd = 3, 
    color = "orange"
  ) +
  xlim(c(-2, 2))

ggplot() + 
  geom_function(
    fun = function(x) x^2 + 2, 
    lwd = 3, 
    color = "green"
  ) +
  geom_function(
    fun = function(x) 3*x^3, 
    lwd = 3, 
    color = "blue"
  ) +
  stat_function(
    fun = dnorm, 
    lwd = 2, 
    color = "red",
    args = list(mean = 0.5, sd = 0.1)
  ) +
  xlim(c(0, 1)) + 
  ylim(c(0, 4))

5.1.9 fill Aesthetic

fill, color, shape, size (and more) will create a legend automatically! (group splits the data into groups, but does not create a legend.)

ggplot(data = diamonds, 
       mapping = aes(x = color, fill = cut)) +
  geom_bar()

5.1.10 Color Scale

Here, we are using the fill aesthetic, so we need a fill scale.

ggplot(data = diamonds, 
       mapping = aes(x = color, fill = cut)) +
  geom_bar() +
  scale_fill_brewer(palette = "Greens")

5.1.11 Faceting

Faceting requires a categorical variable. It creates miniature subplots for each level of the categorical variable. Whatever we plot beforehand gets broken into subsets in the facet. Importantly, by default the scales are identical for each facet.

ggplot(data = diamonds, 
       mapping = aes(x = color)) +
  geom_bar() +
  facet_wrap(facets = ~ cut)
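
The shared scales are a default rather than a requirement. As a minimal sketch, the scales argument of facet_wrap can free one or both axes; here only the y-axis is freed, so rare cuts are no longer dwarfed by common ones. The object name free_y_plot is made up for illustration.

```r
library(ggplot2)

# Same bar chart as above, but each facet gets its own y-axis range.
free_y_plot <- ggplot(data = diamonds, 
                      mapping = aes(x = color)) +
  geom_bar() +
  facet_wrap(facets = ~ cut, scales = "free_y")

free_y_plot
```

Free scales make within-facet patterns easier to see, at the cost of making bar heights no longer comparable across facets.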

ggplot(data = diamonds, 
       mapping = aes(x = color, 
                     fill = clarity)) +
  geom_bar() +
  facet_wrap(facets = ~ cut)

ggplot(data = diamonds, 
       mapping = aes(x = carat, 
                     y = depth)) + 
  geom_point(alpha = 0.1, size = 1) +
  ylim(c(59, 65)) +
  facet_wrap(facets = ~ cut)

ggplot(data = diamonds, 
       mapping = aes(x = carat, 
                     y = depth, 
                     color = cut)) + 
  geom_point(alpha = 0.3, size = 2)

5.1.12 Saving your ggplot

We often want to use our plot in other places, or save future computation time, etc.

# A creative plot name
the_plot_i_want_to_save = 
  ggplot(diamonds, aes(x = color, fill = cut)) + 
  geom_bar() +
  scale_fill_brewer(palette = "Greens")
ggsave(plot = the_plot_i_want_to_save, 
# Using a relative directory --
# this will save in a folder called "images".
# The folder is next to my .QMD.
       filename = "images/green_barchart.png", 
       units = "in", 
       width = 10,  # 10 inches
       height = 9,  # 9 inches
       dpi = 300)   # dots per inch
knitr::include_graphics(
# Recalling the file I just created.
  path = "images/green_barchart.png")

5.1.13 The ggplot2 cheatsheet

We have much more to cover with ggplot2! The “official” ggplot2 cheatsheet (available as a PDF download) is a good reference.

5.2 Ch 3: Data Transformation

5.2.1 dplyr

  • plyr was first published in ~2008, but not mentioned in Wickham’s dissertation.
  • By 2014, plyr was supplanted by dplyr, and plyr was eventually retired. You can still find plyr on CRAN, but it is no longer recommended (dplyr fixed many speed issues).
  • dplyr introduced the .by argument in January 2023 (version 1.1.0), which simplifies the common group_by -> do something -> ungroup pattern.
  • dplyr is the package used to wrangle data (in data frames) and is centered around the key verbs filter, select, arrange, mutate, and summarize.
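
To make the .by point concrete, here is a minimal sketch comparing the two patterns on the starwars data that ships with dplyr. The column name mean_height and the choice to group by species are assumptions for illustration.

```r
library(dplyr)

# Older pattern: group, compute, then remember to ungroup.
old_style <- starwars |> 
  group_by(species) |> 
  mutate(mean_height = mean(height, na.rm = TRUE)) |> 
  ungroup()

# With .by (dplyr >= 1.1.0): the grouping applies to this one call only.
new_style <- starwars |> 
  mutate(mean_height = mean(height, na.rm = TRUE), .by = species)

# Both approaches compute the same per-species means.
identical(old_style$mean_height, new_style$mean_height)
```

With .by there is no lingering grouping to forget about, which removes a common source of surprising results later in a pipeline.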

Star Wars dataset

library(dplyr)
glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
## $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
## $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
## $ films      <list> <"A New Hope", "The Empire Strikes Back", "Return of the J…
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…

list columns??? 🤮🤮🤮

5.2.2 dplyr Syntax

  • First argument is always a data frame.
  • Subsequent arguments typically describe which columns to operate on, using the variable names (without quotes). This is called NSE – non-standard evaluation.
  • The output is (almost) always a data frame (and a tibble if you started with a tibble).
  • dplyr imports magrittr’s pipe operator (%>%), but it is now suggested that you use the base R pipe operator (|>) unless the base R pipe operator doesn’t solve your problem. The problems we’ll face initially should all be resolvable with |>.
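
As a small illustration of what non-standard evaluation buys us, the sketch below filters the same rows with dplyr and with base R subsetting. The object names are made up for illustration.

```r
library(dplyr)

# dplyr: bare column name, evaluated inside the data frame.
tall_dplyr <- starwars |> 
  filter(height > 200)

# Base R equivalent: the data frame must be named each time,
# and missing heights must be excluded explicitly.
tall_base <- starwars[!is.na(starwars$height) & starwars$height > 200, ]

# Both approaches keep the same characters, in the same order.
identical(tall_dplyr$name, tall_base$name)
```

Note that filter drops rows where the condition is NA, which is why the base R version needs the extra !is.na() check to match.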

5.2.3 Rows

filter

# getting known non-binary characters 
# from the first 7 Star Wars movies
# (droids -> "none")
starwars |>
  filter(!(
      sex %in% c("male", "female", "none") 
        | is.na(sex)
    ))
## # A tibble: 1 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Jabba De…    175  1358 <NA>       green-tan… orange           600 herm… mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

Organizing Rows

  • arrange sorts rows by one or more variables
starwars |> 
  filter(sex == "none") |>
  arrange(height, mass)
## # A tibble: 6 × 14
##   name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
##   <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>   
## 1 R2-D2      96    32 <NA>       white, blue red               33 none  masculi…
## 2 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
## 3 R5-D4      97    32 <NA>       white, red  red               NA none  masculi…
## 4 C-3PO     167    75 <NA>       gold        yellow           112 none  masculi…
## 5 IG-88     200   140 none       metal       red               15 none  masculi…
## 6 BB8        NA    NA none       none        black             NA none  masculi…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
  • distinct gets all unique rows or column combinations
starwars |> 
  distinct(sex, gender) |> 
  arrange(gender, sex)
## # A tibble: 6 × 2
##   sex            gender   
##   <chr>          <chr>    
## 1 female         feminine 
## 2 none           feminine 
## 3 hermaphroditic masculine
## 4 male           masculine
## 5 none           masculine
## 6 <NA>           <NA>
  • count operates like distinct but also provides a count in n
starwars |> 
  count(sex, gender) |> 
  arrange(gender, sex)
## # A tibble: 6 × 3
##   sex            gender        n
##   <chr>          <chr>     <int>
## 1 female         feminine     16
## 2 none           feminine      1
## 3 hermaphroditic masculine     1
## 4 male           masculine    60
## 5 none           masculine     5
## 6 <NA>           <NA>          4

5.2.4 Columns: select

select

starwars |> 
  select(name, height, species)
## # A tibble: 87 × 3
##    name               height species
##    <chr>               <int> <chr>  
##  1 Luke Skywalker        172 Human  
##  2 C-3PO                 167 Droid  
##  3 R2-D2                  96 Droid  
##  4 Darth Vader           202 Human  
##  5 Leia Organa           150 Human  
##  6 Owen Lars             178 Human  
##  7 Beru Whitesun Lars    165 Human  
##  8 R5-D4                  97 Droid  
##  9 Biggs Darklighter     183 Human  
## 10 Obi-Wan Kenobi        182 Human  
## # ℹ 77 more rows
starwars |> 
  select(name:hair_color)
## # A tibble: 87 × 4
##    name               height  mass hair_color   
##    <chr>               <int> <dbl> <chr>        
##  1 Luke Skywalker        172    77 blond        
##  2 C-3PO                 167    75 <NA>         
##  3 R2-D2                  96    32 <NA>         
##  4 Darth Vader           202   136 none         
##  5 Leia Organa           150    49 brown        
##  6 Owen Lars             178   120 brown, grey  
##  7 Beru Whitesun Lars    165    75 brown        
##  8 R5-D4                  97    32 <NA>         
##  9 Biggs Darklighter     183    84 black        
## 10 Obi-Wan Kenobi        182    77 auburn, white
## # ℹ 77 more rows
starwars |> 
  select(!name:hair_color)
## # A tibble: 87 × 10
##    skin_color eye_color birth_year sex   gender homeworld species films vehicles
##    <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
##  1 fair       blue            19   male  mascu… Tatooine  Human   <chr> <chr>   
##  2 gold       yellow         112   none  mascu… Tatooine  Droid   <chr> <chr>   
##  3 white, bl… red             33   none  mascu… Naboo     Droid   <chr> <chr>   
##  4 white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
##  5 light      brown           19   fema… femin… Alderaan  Human   <chr> <chr>   
##  6 light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
##  7 light      blue            47   fema… femin… Tatooine  Human   <chr> <chr>   
##  8 white, red red             NA   none  mascu… Tatooine  Droid   <chr> <chr>   
##  9 light      brown           24   male  mascu… Tatooine  Human   <chr> <chr>   
## 10 fair       blue-gray       57   male  mascu… Stewjon   Human   <chr> <chr>   
## # ℹ 77 more rows
## # ℹ 1 more variable: starships <list>
starwars |> 
  select(where(is.numeric))
## # A tibble: 87 × 3
##    height  mass birth_year
##     <int> <dbl>      <dbl>
##  1    172    77       19  
##  2    167    75      112  
##  3     96    32       33  
##  4    202   136       41.9
##  5    150    49       19  
##  6    178   120       52  
##  7    165    75       47  
##  8     97    32       NA  
##  9    183    84       24  
## 10    182    77       57  
## # ℹ 77 more rows

select helpers

starwars |> 
  select(starts_with("s") & !where(is.list))
## # A tibble: 87 × 3
##    skin_color  sex    species
##    <chr>       <chr>  <chr>  
##  1 fair        male   Human  
##  2 gold        none   Droid  
##  3 white, blue none   Droid  
##  4 white       male   Human  
##  5 light       female Human  
##  6 light       male   Human  
##  7 light       female Human  
##  8 white, red  none   Droid  
##  9 light       male   Human  
## 10 fair        male   Human  
## # ℹ 77 more rows
starwars |> 
  select(ends_with("color"))
## # A tibble: 87 × 3
##    hair_color    skin_color  eye_color
##    <chr>         <chr>       <chr>    
##  1 blond         fair        blue     
##  2 <NA>          gold        yellow   
##  3 <NA>          white, blue red      
##  4 none          white       yellow   
##  5 brown         light       brown    
##  6 brown, grey   light       blue     
##  7 brown         light       blue     
##  8 <NA>          white, red  red      
##  9 black         light       brown    
## 10 auburn, white fair        blue-gray
## # ℹ 77 more rows
starwars |> 
  select(contains("t"))
## # A tibble: 87 × 3
##    height birth_year starships
##     <int>      <dbl> <list>   
##  1    172       19   <chr [2]>
##  2    167      112   <chr [0]>
##  3     96       33   <chr [0]>
##  4    202       41.9 <chr [1]>
##  5    150       19   <chr [0]>
##  6    178       52   <chr [0]>
##  7    165       47   <chr [0]>
##  8     97       NA   <chr [0]>
##  9    183       24   <chr [1]>
## 10    182       57   <chr [5]>
## # ℹ 77 more rows
starwars |> 
  select(num_range("x", 1:3))
## # A tibble: 87 × 0

(No column names with numbers!)
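
Two more helpers that often come up are any_of() and matches(). This is a minimal sketch; the vector wanted is a hypothetical wish list, and "shoe_size" is deliberately not a column in starwars.

```r
library(dplyr)

# Hypothetical wish list; "shoe_size" is not a column in starwars.
wanted <- c("name", "height", "shoe_size")

# any_of() keeps the columns that exist and silently skips the rest.
starwars |> 
  select(any_of(wanted)) |> 
  names()

# matches() selects by regular expression: here, names ending in "_color".
starwars |> 
  select(matches("_color$")) |> 
  names()
```

If a missing column should be an error instead, all_of() is the stricter counterpart to any_of().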

5.2.5 Columns: mutate and defaults

mutate creates a new column, by default at the end of the data set.

starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), ".")) |> 
  select(char_describe) |>
  print(n = 5)
## # A tibble: 87 × 1
##   char_describe                       
##   <chr>                               
## 1 Luke Skywalker is a masculine human.
## 2 C-3PO is a masculine droid.         
## 3 R2-D2 is a masculine droid.         
## 4 Darth Vader is a masculine human.   
## 5 Leia Organa is a feminine human.    
## # ℹ 82 more rows

The .keep argument controls which columns are retained:

  • "all" (default) keeps everything
  • "none" keeps only the grouping columns and the columns generated in mutate
  • "used" keeps the new column(s) and the columns used to generate them
  • "unused" keeps the new column(s) and drops every column used to generate them

starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .keep = "all") |> 
  colnames()
##  [1] "name"          "height"        "mass"          "hair_color"   
##  [5] "skin_color"    "eye_color"     "birth_year"    "sex"          
##  [9] "gender"        "homeworld"     "species"       "films"        
## [13] "vehicles"      "starships"     "char_describe"
starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .keep = "none") |> 
  colnames()
## [1] "char_describe"
starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .keep = "used") |> 
  colnames()
## [1] "name"          "gender"        "species"       "char_describe"
starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .keep = "unused") |> 
  colnames()
##  [1] "height"        "mass"          "hair_color"    "skin_color"   
##  [5] "eye_color"     "birth_year"    "sex"           "homeworld"    
##  [9] "films"         "vehicles"      "starships"     "char_describe"

.before moves the new column to before a named column.

starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .before = name)
## # A tibble: 87 × 15
##    char_describe   name  height  mass hair_color skin_color eye_color birth_year
##    <chr>           <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl>
##  1 Luke Skywalker… Luke…    172    77 blond      fair       blue            19  
##  2 C-3PO is a mas… C-3PO    167    75 <NA>       gold       yellow         112  
##  3 R2-D2 is a mas… R2-D2     96    32 <NA>       white, bl… red             33  
##  4 Darth Vader is… Dart…    202   136 none       white      yellow          41.9
##  5 Leia Organa is… Leia…    150    49 brown      light      brown           19  
##  6 Owen Lars is a… Owen…    178   120 brown, gr… light      blue            52  
##  7 Beru Whitesun … Beru…    165    75 brown      light      blue            47  
##  8 R5-D4 is a mas… R5-D4     97    32 <NA>       white, red red             NA  
##  9 Biggs Darkligh… Bigg…    183    84 black      light      brown           24  
## 10 Obi-Wan Kenobi… Obi-…    182    77 auburn, w… fair       blue-gray       57  
## # ℹ 77 more rows
## # ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

.before also moves the new column to before an indexed column.

starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .before = 2)
## # A tibble: 87 × 15
##    name    char_describe height  mass hair_color skin_color eye_color birth_year
##    <chr>   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl>
##  1 Luke S… Luke Skywalk…    172    77 blond      fair       blue            19  
##  2 C-3PO   C-3PO is a m…    167    75 <NA>       gold       yellow         112  
##  3 R2-D2   R2-D2 is a m…     96    32 <NA>       white, bl… red             33  
##  4 Darth … Darth Vader …    202   136 none       white      yellow          41.9
##  5 Leia O… Leia Organa …    150    49 brown      light      brown           19  
##  6 Owen L… Owen Lars is…    178   120 brown, gr… light      blue            52  
##  7 Beru W… Beru Whitesu…    165    75 brown      light      blue            47  
##  8 R5-D4   R5-D4 is a m…     97    32 <NA>       white, red red             NA  
##  9 Biggs … Biggs Darkli…    183    84 black      light      brown           24  
## 10 Obi-Wa… Obi-Wan Keno…    182    77 auburn, w… fair       blue-gray       57  
## # ℹ 77 more rows
## # ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

.after moves the new column to after a named or indexed column (like .before).

starwars |> 
  mutate(char_describe = paste0(name, " is a ", gender, " ", tolower(species), "."), 
         .after = 1)
## # A tibble: 87 × 15
##    name    char_describe height  mass hair_color skin_color eye_color birth_year
##    <chr>   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl>
##  1 Luke S… Luke Skywalk…    172    77 blond      fair       blue            19  
##  2 C-3PO   C-3PO is a m…    167    75 <NA>       gold       yellow         112  
##  3 R2-D2   R2-D2 is a m…     96    32 <NA>       white, bl… red             33  
##  4 Darth … Darth Vader …    202   136 none       white      yellow          41.9
##  5 Leia O… Leia Organa …    150    49 brown      light      brown           19  
##  6 Owen L… Owen Lars is…    178   120 brown, gr… light      blue            52  
##  7 Beru W… Beru Whitesu…    165    75 brown      light      blue            47  
##  8 R5-D4   R5-D4 is a m…     97    32 <NA>       white, red red             NA  
##  9 Biggs … Biggs Darkli…    183    84 black      light      brown           24  
## 10 Obi-Wa… Obi-Wan Keno…    182    77 auburn, w… fair       blue-gray       57  
## # ℹ 77 more rows
## # ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

5.2.6 Columns: mutate – some common approaches

starwars |> 
# BMI definition: 
# https://en.wikipedia.org/wiki/Body_mass_index
  # Parentheses matter: ^ binds tighter than |>, and |> binds tighter than /,
  # so without them only the denominator would be rounded.
  mutate(BMI = (mass / (height / 100)^2) |> round(1), 
         .before = 2) |> 
  select(name:mass) |> 
  print(n = 5)
## # A tibble: 87 × 4
##   name             BMI height  mass
##   <chr>          <dbl>  <int> <dbl>
## 1 Luke Skywalker  26      172    77
## 2 C-3PO           26.9    167    75
## 3 R2-D2           34.7     96    32
## 4 Darth Vader     33.3    202   136
## 5 Leia Organa     21.8    150    49
## # ℹ 82 more rows
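
When piping an arithmetic expression into round, operator precedence matters: in R, ^ binds more tightly than |>, which in turn binds more tightly than * and /. A minimal sketch with made-up numbers (not from the starwars data) shows the difference that parentheses make.

```r
# Without parentheses, only the 3 is piped into round():
# this is 10 / round(3, 1), i.e. 10 / 3.
unparenthesized <- 10 / 3 |> round(1)

# With parentheses, the whole quotient is rounded.
parenthesized <- (10 / 3) |> round(1)

c(unparenthesized = unparenthesized, parenthesized = parenthesized)
```

Wrapping the full expression in parentheses before the pipe (or calling round(expr, 1) directly) avoids the surprise.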

dplyr::if_else is a vectorized function (as most, if not all, dplyr functions are).

starwars_if_else_output = starwars |> 
  mutate(
    BMI = mass / (height / 100)^2, 
    BMI_category = 
      if_else(
        condition = BMI < 30, 
        true = "Not Overweight",
        false = "Overweight",
        missing = NA
      ), 
    .before = 2
    ) |>
  select(name:mass)
starwars_if_else_output
## # A tibble: 87 × 5
##    name                 BMI BMI_category   height  mass
##    <chr>              <dbl> <chr>           <int> <dbl>
##  1 Luke Skywalker      26.0 Not Overweight    172    77
##  2 C-3PO               26.9 Not Overweight    167    75
##  3 R2-D2               34.7 Overweight         96    32
##  4 Darth Vader         33.3 Overweight        202   136
##  5 Leia Organa         21.8 Not Overweight    150    49
##  6 Owen Lars           37.9 Overweight        178   120
##  7 Beru Whitesun Lars  27.5 Not Overweight    165    75
##  8 R5-D4               34.0 Overweight         97    32
##  9 Biggs Darklighter   25.1 Not Overweight    183    84
## 10 Obi-Wan Kenobi      23.2 Not Overweight    182    77
## # ℹ 77 more rows
# (Good example of code styling with new lines!)
starwars_case_when_output = starwars |> 
  mutate(
    BMI = mass / (height / 100)^2, 
    BMI_category = 
      case_when(
        is.na(BMI) ~ NA, 
        BMI < 18.5 ~ "Underweight", 
        BMI < 25   ~ "Normal", 
        BMI < 30   ~ "Overweight", 
        BMI >= 30  ~ "Obese", 
        .default = "error"
        ),
    .before = 2
    ) |>
  select(name:mass)
starwars_case_when_output
## # A tibble: 87 × 5
##    name                 BMI BMI_category height  mass
##    <chr>              <dbl> <chr>         <int> <dbl>
##  1 Luke Skywalker      26.0 Overweight      172    77
##  2 C-3PO               26.9 Overweight      167    75
##  3 R2-D2               34.7 Obese            96    32
##  4 Darth Vader         33.3 Obese           202   136
##  5 Leia Organa         21.8 Normal          150    49
##  6 Owen Lars           37.9 Obese           178   120
##  7 Beru Whitesun Lars  27.5 Overweight      165    75
##  8 R5-D4               34.0 Obese            97    32
##  9 Biggs Darklighter   25.1 Overweight      183    84
## 10 Obi-Wan Kenobi      23.2 Normal          182    77
## # ℹ 77 more rows
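
The case_when call above relies on conditions being checked in order, with the first match winning. A minimal sketch of why the order matters, using a made-up bmi vector rather than the starwars data:

```r
library(dplyr)

# Made-up values, one per category plus a missing value.
bmi <- c(17, 22, 27, 35, NA)

# Thresholds from smallest to largest: each value lands in the right bin.
in_order <- case_when(
  is.na(bmi) ~ NA,
  bmi < 18.5 ~ "Underweight",
  bmi < 25   ~ "Normal",
  bmi < 30   ~ "Overweight",
  .default = "Obese"
)

# Same conditions, but the widest one first: it swallows the narrower ones.
out_of_order <- case_when(
  is.na(bmi) ~ NA,
  bmi < 30   ~ "Overweight",
  bmi < 25   ~ "Normal",
  bmi < 18.5 ~ "Underweight",
  .default = "Obese"
)

data.frame(bmi, in_order, out_of_order)
```

Handling is.na() first matters as well: a comparison with NA is never TRUE, so a missing value would otherwise fall through to .default.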

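base::ifelse is itself vectorized, but the expression passed to no below (which parses each hair_color string as a one-line CSV and counts its columns) is not, so we need to work on an individual row basis with the rowwise function.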

starwars_ifelse_output = starwars |> 
  rowwise() |> 
  mutate(hair_colors = 
    ifelse(test = 
             is.na(hair_color) | 
               hair_color == "none", 
           yes = 0, 
           no = I(hair_color) |> 
             readr::read_csv(
               show_col_types = FALSE, 
               col_types = "c") |>
             ncol()), 
    .after = name
  )
starwars_ifelse_output
## # A tibble: 87 × 15
## # Rowwise: 
##    name      hair_colors height  mass hair_color skin_color eye_color birth_year
##    <chr>           <dbl>  <int> <dbl> <chr>      <chr>      <chr>          <dbl>
##  1 Luke Sky…           1    172    77 blond      fair       blue            19  
##  2 C-3PO               0    167    75 <NA>       gold       yellow         112  
##  3 R2-D2               0     96    32 <NA>       white, bl… red             33  
##  4 Darth Va…           0    202   136 none       white      yellow          41.9
##  5 Leia Org…           1    150    49 brown      light      brown           19  
##  6 Owen Lars           2    178   120 brown, gr… light      blue            52  
##  7 Beru Whi…           1    165    75 brown      light      blue            47  
##  8 R5-D4               0     97    32 <NA>       white, red red             NA  
##  9 Biggs Da…           1    183    84 black      light      brown           24  
## 10 Obi-Wan …           2    182    77 auburn, w… fair       blue-gray       57  
## # ℹ 77 more rows
## # ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
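
For comparison, the same column can be built without rowwise by counting commas. This is a minimal sketch, assuming the stringr package is installed and that multiple hair colors are always separated by commas (as in the rows printed above); the name starwars_count_output is introduced here for illustration only.

```r
library(dplyr)
library(stringr)

# Vectorized alternative: number of colors = number of commas + 1.
# Assumption: colors are comma-separated, e.g. "brown, grey".
starwars_count_output = starwars |>
  mutate(
    hair_colors = if_else(
      is.na(hair_color) | hair_color == "none",
      0L,
      str_count(hair_color, ",") + 1L
    ),
    .after = name
  )

starwars_count_output |>
  select(name, hair_colors, hair_color) |>
  head(6)
```

Because str_count works on the whole column at once, no row-by-row evaluation is needed here.
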
# Define a function first
add_noise = function(input, n = 1) {
  output = input + rnorm(n = n, sd = 0.1)
  return(output)
}
# Need to use rowwise(): otherwise the single draw (n = 1) is recycled across all rows
starwars |> 
  rowwise() |> 
  mutate(noisy_height = height |> 
           add_noise()) |> 
  select(name, height, noisy_height) |> 
  head(3) |> 
  knitr::kable() |> 
  kableExtra::kable_styling(font_size = 30)
name            height  noisy_height
Luke Skywalker     172     172.11554
C-3PO              167     166.96995
R2-D2               96      96.01193
# Vectorized function
Add_noise = Vectorize(
    FUN = add_noise,
    vectorize.args = "input"
  )
# Don't need to use rowwise()
starwars |> 
  mutate(noisy_height = height |> 
           Add_noise()) |> 
  select(name, height, noisy_height) |> 
  head(3) |> 
  knitr::kable() |> 
  kableExtra::kable_styling(font_size = 30)
name            height  noisy_height
Luke Skywalker     172     172.00506
C-3PO              167     167.09383
R2-D2               96      95.83336
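
A third option, sketched here under the assumption that one independent draw per element is what we want: have the function request as many noise values as it has inputs. The name add_noise_each is hypothetical and not part of the original code.

```r
# Draw one noise value per element of `input`,
# so neither rowwise() nor Vectorize() is needed.
add_noise_each = function(input, sd = 0.1) {
  input + rnorm(n = length(input), sd = sd)
}

set.seed(1)
heights = c(172, 167, 96)
noisy_heights = add_noise_each(heights)
noisy_heights - heights  # three different noise values
```

Inside a pipeline this would read mutate(noisy_height = add_noise_each(height)).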

5.2.7 Columns: rename and relocate

rename

Hadley called, and he didn’t like that I used capitalized letters.

names(starwars_case_when_output)[1:3]
## [1] "name"         "BMI"          "BMI_category"
starwars_case_when_output |> 
  rename(bmi = BMI, 
         bmi_category = BMI_category) |> 
  print(n = 7)
## # A tibble: 87 × 5
##   name                 bmi bmi_category height  mass
##   <chr>              <dbl> <chr>         <int> <dbl>
## 1 Luke Skywalker      26.0 Overweight      172    77
## 2 C-3PO               26.9 Overweight      167    75
## 3 R2-D2               34.7 Obese            96    32
## 4 Darth Vader         33.3 Obese           202   136
## 5 Leia Organa         21.8 Normal          150    49
## 6 Owen Lars           37.9 Obese           178   120
## 7 Beru Whitesun Lars  27.5 Overweight      165    75
## # ℹ 80 more rows

relocate

We should have a good reason for reordering columns in a data frame. There weren’t any glaring single column issues, so here’s an arbitrary example:

starwars |> 
  relocate(homeworld, .after = name) |> 
  print(n = 3)
## # A tibble: 87 × 14
##   name   homeworld height  mass hair_color skin_color eye_color birth_year sex  
##   <chr>  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
## 1 Luke … Tatooine     172    77 blond      fair       blue              19 male 
## 2 C-3PO  Tatooine     167    75 <NA>       gold       yellow           112 none 
## 3 R2-D2  Naboo         96    32 <NA>       white, bl… red               33 none 
## # ℹ 84 more rows
## # ℹ 5 more variables: gender <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
starwars |> 
# organized so that now we have columns ordered by: 
# is.character, is.numeric, is.list
  relocate(where(is.character), .before = 1) |> 
  print(n = 3, width = Inf)
## # A tibble: 87 × 14
##   name           hair_color skin_color  eye_color sex   gender    homeworld
##   <chr>          <chr>      <chr>       <chr>     <chr> <chr>     <chr>    
## 1 Luke Skywalker blond      fair        blue      male  masculine Tatooine 
## 2 C-3PO          <NA>       gold        yellow    none  masculine Tatooine 
## 3 R2-D2          <NA>       white, blue red       none  masculine Naboo    
##   species height  mass birth_year films     vehicles  starships
##   <chr>    <int> <dbl>      <dbl> <list>    <list>    <list>   
## 1 Human      172    77         19 <chr [5]> <chr [2]> <chr [2]>
## 2 Droid      167    75        112 <chr [6]> <chr [0]> <chr [0]>
## 3 Droid       96    32         33 <chr [7]> <chr [0]> <chr [0]>
## # ℹ 84 more rows

5.2.8 Groups: summarize

group_by

group_by adds grouping metadata (the result gains the grouped_df class), but doesn’t actually change the data yet:

starwars |> 
  group_by(homeworld)
## # A tibble: 87 × 14
## # Groups:   homeworld [49]
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
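
One way to confirm that only metadata changed is to inspect the grouped object. The following is a minimal sketch using dplyr's grouping helpers; the name by_homeworld is introduced here for illustration.

```r
library(dplyr)

by_homeworld = starwars |>
  group_by(homeworld)

class(by_homeworld)[1]    # "grouped_df": the class changed
group_vars(by_homeworld)  # the grouping variable(s)
n_groups(by_homeworld)    # number of groups, as in the "Groups:" header
dim(by_homeworld)         # same rows and columns as starwars
```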

summarize

summarize returns one row per group, keeping only the grouping variable(s) and the resulting calculation(s):

starwars |> 
  group_by(species) |> 
  summarize(mean_height = mean(height, na.rm = TRUE), 
            count = n()) |> 
  arrange(desc(mean_height))
## # A tibble: 38 × 3
##    species  mean_height count
##    <chr>          <dbl> <int>
##  1 Quermian        264      1
##  2 Wookiee         231      2
##  3 Kaminoan        221      2
##  4 Kaleesh         216      1
##  5 Gungan          209.     3
##  6 Pau'an          206      1
##  7 Besalisk        198      1
##  8 Cerean          198      1
##  9 Chagrian        196      1
## 10 Nautolan        196      1
## # ℹ 28 more rows

5.2.9 Groups: slice and Multiple Variables

slice_*

slice_head picks the first n or (100 × prop)% of observations per group. Why does this output only have three rows?

starwars |> 
  group_by(species) |> 
  slice_head(prop = 0.1) |> 
  select(name, species)
## # A tibble: 3 × 2
## # Groups:   species [1]
##   name           species
##   <chr>          <chr>  
## 1 Luke Skywalker Human  
## 2 Darth Vader    Human  
## 3 Leia Organa    Human
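
One way to investigate the question above: slice_head rounds prop × (group size) down to a whole number, so small groups can contribute zero rows. The sketch below counts each species and applies that rounding by hand; rows_per_species, group_size, and rows_kept are names introduced here for illustration.

```r
library(dplyr)

rows_per_species = starwars |>
  count(species, name = "group_size") |>
  # slice_head(prop = 0.1) keeps floor(0.1 * group size) rows per group
  mutate(rows_kept = floor(0.1 * group_size)) |>
  arrange(desc(group_size))

rows_per_species |>
  head(3)
```

With prop = 0.1, only groups with at least 10 members keep any rows.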

slice_tail picks the last n or (100 × prop)% of observations per group.

starwars |> 
  group_by(species) |> 
  slice_tail(n = 1) |> 
  select(name, species)
## # A tibble: 38 × 2
## # Groups:   species [38]
##    name                  species  
##    <chr>                 <chr>    
##  1 Ratts Tyerel          Aleena   
##  2 Dexter Jettster       Besalisk 
##  3 Ki-Adi-Mundi          Cerean   
##  4 Mas Amedda            Chagrian 
##  5 Zam Wesell            Clawdite 
##  6 BB8                   Droid    
##  7 Sebulba               Dug      
##  8 Wicket Systri Warrick Ewok     
##  9 Poggle the Lesser     Geonosian
## 10 Rugor Nass            Gungan   
## # ℹ 28 more rows

slice_min picks the smallest n or (100 × prop)% of observations per group.

starwars |> 
  group_by(species) |> 
  slice_min(order_by = height, n = 1) |>
  select(height, name, species) |> 
  arrange(desc(height), name, species)
## # A tibble: 40 × 3
## # Groups:   species [38]
##    height name            species 
##     <int> <chr>           <chr>   
##  1    264 Yarael Poof     Quermian
##  2    228 Chewbacca       Wookiee 
##  3    216 Grievous        Kaleesh 
##  4    213 Taun We         Kaminoan
##  5    206 Tion Medon      Pau'an  
##  6    198 Dexter Jettster Besalisk
##  7    198 Ki-Adi-Mundi    Cerean  
##  8    196 Jar Jar Binks   Gungan  
##  9    196 Kit Fisto       Nautolan
## 10    196 Mas Amedda      Chagrian
## # ℹ 30 more rows
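
The output above has 40 rows for 38 groups because slice_min (like slice_max) keeps every row tied for the extreme value by default. In this version of starwars, two Droids share the minimum height of 96 and two Humans share the minimum height of 150. A minimal sketch of switching that behavior off with the with_ties argument (one_per_species is a name introduced here for illustration):

```r
library(dplyr)

one_per_species = starwars |>
  group_by(species) |>
  # with_ties = FALSE returns exactly n rows per group,
  # keeping the first row among those tied for the minimum
  slice_min(order_by = height, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(height, name, species)

nrow(one_per_species)  # one row per species group
```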

slice_max picks the largest n or (100 × prop)% of observations per group.

starwars |> 
  group_by(species) |> 
  slice_max(order_by = height, n = 1) |>
  select(height, name, species) |> 
  arrange(desc(height), name, species)
## # A tibble: 38 × 3
## # Groups:   species [38]
##    height name            species 
##     <int> <chr>           <chr>   
##  1    264 Yarael Poof     Quermian
##  2    234 Tarfful         Wookiee 
##  3    229 Lama Su         Kaminoan
##  4    224 Roos Tarpals    Gungan  
##  5    216 Grievous        Kaleesh 
##  6    206 Tion Medon      Pau'an  
##  7    202 Darth Vader     Human   
##  8    200 IG-88           Droid   
##  9    198 Dexter Jettster Besalisk
## 10    198 Ki-Adi-Mundi    Cerean  
## # ℹ 28 more rows

slice_sample picks a sample of size n or (100 × prop)% of observations per group.

set.seed(27815)
starwars |> 
  group_by(species) |> 
  slice_sample(prop = 0.2) |>
  select(height, name, species)
## # A tibble: 8 × 3
## # Groups:   species [2]
##   height name           species
##    <int> <chr>          <chr>  
## 1     96 R4-P17         Droid  
## 2     NA Captain Phasma Human  
## 3    150 Leia Organa    Human  
## 4     NA Poe Dameron    Human  
## 5    183 Boba Fett      Human  
## 6    163 Shmi Skywalker Human  
## 7    182 Obi-Wan Kenobi Human  
## 8    188 Mace Windu     Human

Grouping by Multiple Variables

set.seed(27815)
starwars |> 
  filter(!is.na(species) & !is.na(homeworld)) |> 
  group_by(species, homeworld) |> 
  slice_sample(prop = 0.4) |>
# Note that I didn't select "homeworld" or "species"
  select(height, name)
## # A tibble: 7 × 4
## # Groups:   species, homeworld [4]
##   species homeworld height name              
##   <chr>   <chr>      <int> <chr>             
## 1 Gungan  Naboo        196 Jar Jar Binks     
## 2 Human   Alderaan     188 Raymus Antilles   
## 3 Human   Naboo        183 Ric Olié          
## 4 Human   Naboo        165 Dormé             
## 5 Human   Tatooine     183 Biggs Darklighter 
## 6 Human   Tatooine     163 Shmi Skywalker    
## 7 Human   Tatooine     165 Beru Whitesun Lars

5.2.10 Groups: ungroup vs .by

ungroup

ungroup removes the grouping set by group_by, so subsequent verbs (here, select) operate on the result as you may have expected.

set.seed(27815)
starwars |> 
  filter(!is.na(species) & !is.na(homeworld)) |> 
  group_by(species, homeworld) |> 
  slice_sample(prop = 0.4) |>
  ungroup() |> 
# Note that I didn't select "homeworld" or "species"
  select(height, name)
## # A tibble: 7 × 2
##   height name              
##    <int> <chr>             
## 1    196 Jar Jar Binks     
## 2    188 Raymus Antilles   
## 3    183 Ric Olié          
## 4    165 Dormé             
## 5    183 Biggs Darklighter 
## 6    163 Shmi Skywalker    
## 7    165 Beru Whitesun Lars

.by

.by removes the need for group_by and a subsequent ungroup (this syntax was introduced in January 2023 as “experimental”, so you will still see the former paradigm in the wild):

starwars |> 
  summarize(mean_height = mean(height, na.rm = TRUE), 
            count = n(), 
            .by = species) |> 
  arrange(desc(mean_height)) |> 
  print(n = 5)
## # A tibble: 38 × 3
##   species  mean_height count
##   <chr>          <dbl> <int>
## 1 Quermian        264      1
## 2 Wookiee         231      2
## 3 Kaminoan        221      2
## 4 Kaleesh         216      1
## 5 Gungan          209.     3
## # ℹ 33 more rows
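
.by also accepts several columns and always returns an ungrouped result, whereas summarize after group_by on two variables stays grouped by the first. A minimal sketch comparing the two; the object names counts_by and counts_grouped are introduced here for illustration.

```r
library(dplyr)

# Per-operation grouping: the result is an ordinary tibble
counts_by = starwars |>
  summarize(count = n(), .by = c(species, homeworld))

# Persistent grouping: the last grouping variable is dropped,
# so the result is still grouped by species
counts_grouped = starwars |>
  group_by(species, homeworld) |>
  summarize(count = n(), .groups = "drop_last")

is_grouped_df(counts_by)       # FALSE
is_grouped_df(counts_grouped)  # TRUE
```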

5.2.11 Closing Thoughts on Introductory dplyr

  • plyr handled both data frames and lists. It was split into two main parts: dplyr and purrr.
  • dplyr covers data frames, while purrr covers lists. purrr is covered in Ch 26 (one of the last chapters we’ll see).
  • There are more dplyr features and functions.
    • As you reference Ch 3: Data Transformation (or any R4DS chapter) on the web, if there’s a function you want to learn more about, you can just click on it. This will navigate to the pkgdown documentation (usually at tidyverse.org). E.g., filter.
    • There, you can read the function documentation, as well as search for other functions in the search box at the top right.

We will revisit some more functions (pull, across, left_join, etc.) later. We’ve covered roughly 2/3 of the “official” dplyr cheatsheet.

5.3 Ch 4: Workflow: Code Style

5.3.1 More on Code Style

R4DS Ch 4 Style Guide

  • Absolutely follow the rules listed in sections:
    • 4.2: Spaces
    • 4.3: Pipes
    • 4.4: ggplot2
  • Not all of my code is presented in this way for space/presentation reasons – but you have much more space when outputting to .PDF.
  • Do see Section 5.2.6 (the mutate slide), the case_when tab, for a good example of formatting.

Tidyverse Style Guide

  • Ch 4 covers everything relevant here (more will be relevant later)

Google + MLR3 Style Guides

  • Nothing to add here

lintr

This package is used to check for style violations. We have written a linter for you to use with each subsequent assignment. The same linter will be used in the autograder.

5.4 Ch 5: Data Tidying

5.4.1 tidyr

  • reshape was part of Hadley’s dissertation, and handled these problems in 2005-2010.
  • reshape2 superseded reshape in 2010 and was eventually superseded by tidyr in 2014.
  • tidyr helps create tidy data and is the namesake of the tidyverse.
  • Tidying data creates a rigorous data structure which can be relied upon across the tidyverse.
  • The Python package pandas offers functionality comparable to tidyr (and dplyr) but does not natively support parallel computing (etc) – arrow to the rescue! (May cover Ch 22 at end of quarter!)

What makes data tidy?

  • Each variable is a column; each column is a variable.
  • Each observation is a row; each row is an observation.
  • Each value is a cell; each cell is a single value.
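
As a small illustration of these rules, consider a hypothetical quiz table (invented here, not part of the course data) in which the quiz number is stored in the column names rather than in a column of its own. A minimal sketch of tidying it with pivot_longer:

```r
library(tidyr)

# Untidy: the variable "quiz" is spread across two column names
quiz_wide = tibble::tibble(
  student = c("A", "B"),
  quiz_1 = c(8, 6),
  quiz_2 = c(9, 7)
)

# Tidy: one row per student-quiz observation, one column per variable
quiz_long = quiz_wide |>
  pivot_longer(
    cols = starts_with("quiz_"),
    names_to = "quiz",
    names_prefix = "quiz_",
    values_to = "score"
  )
quiz_long
```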

5.4.2 Lengthening Data

library(tidyr)
starwars_pivot_output = starwars |>
  pivot_longer(
    cols = where(is.numeric), 
    names_to = "numeric_column_name", 
    values_to = "value",
    values_drop_na = TRUE
  ) |> 
  select(name, numeric_column_name, value)
starwars_pivot_output
## # A tibble: 183 × 3
##    name           numeric_column_name value
##    <chr>          <chr>               <dbl>
##  1 Luke Skywalker height                172
##  2 Luke Skywalker mass                   77
##  3 Luke Skywalker birth_year             19
##  4 C-3PO          height                167
##  5 C-3PO          mass                   75
##  6 C-3PO          birth_year            112
##  7 R2-D2          height                 96
##  8 R2-D2          mass                   32
##  9 R2-D2          birth_year             33
## 10 Darth Vader    height                202
## # ℹ 173 more rows

Compare the dimensions we expect with those we observe:

(sw_rows = starwars |> nrow())
## [1] 87
(sw_num_cols = starwars |> 
    select(where(is.numeric)) |> 
    ncol())
## [1] 3
(anticipate_rows = sw_rows*sw_num_cols)
## [1] 261
(observed_rows = starwars_pivot_output |> nrow())
## [1] 183
# how many rows we dropped (had missing values!)
anticipate_rows - observed_rows
## [1] 78
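
The number of dropped rows should equal the number of missing values across the numeric columns, since values_drop_na = TRUE removes one row per NA. A quick check, as a sketch (na_cells is a name introduced here for illustration):

```r
library(dplyr)

# Count NA cells among the numeric columns of starwars
na_cells = starwars |>
  select(where(is.numeric)) |>
  is.na() |>
  sum()
na_cells
```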

Why lengthen data?

  • ggplot2 – some geoms require it.
  • working with time series data
  • (and more!)

5.4.3 Widening Data

starwars_wider_output = starwars_pivot_output |>
  pivot_wider(
    names_from = numeric_column_name, 
    values_from = value
  )
starwars_wider_output
## # A tibble: 81 × 4
##    name               height  mass birth_year
##    <chr>               <dbl> <dbl>      <dbl>
##  1 Luke Skywalker        172    77       19  
##  2 C-3PO                 167    75      112  
##  3 R2-D2                  96    32       33  
##  4 Darth Vader           202   136       41.9
##  5 Leia Organa           150    49       19  
##  6 Owen Lars             178   120       52  
##  7 Beru Whitesun Lars    165    75       47  
##  8 R5-D4                  97    32       NA  
##  9 Biggs Darklighter     183    84       24  
## 10 Obi-Wan Kenobi        182    77       57  
## # ℹ 71 more rows

Compare the dimensions we expect with those we observe:

(sw_rows = starwars |> nrow())
## [1] 87
(sw_wider_rows = starwars_wider_output |> nrow())
## [1] 81
# how many rows we lost (characters whose numeric values were all NA)
sw_rows - sw_wider_rows
## [1] 6
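
The six missing rows are consistent with characters whose height, mass, and birth_year are all NA: values_drop_na = TRUE removed every one of their rows during lengthening, so nothing was left to widen. A minimal sketch for listing them (all_na_characters is a name introduced here for illustration):

```r
library(dplyr)

# Characters with no recorded numeric values at all
all_na_characters = starwars |>
  filter(if_all(where(is.numeric), is.na)) |>
  select(name, where(is.numeric))
all_na_characters
```

Omitting values_drop_na = TRUE from the pivot_longer call should keep these characters through the round trip.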

Why widen data?

  • ggplot2 – some geoms require it.
  • working with governmental data
  • and many more!

5.4.4 tibble package to enhance data frames

See the tibble package documentation.

starwars |> tibble::is_tibble()
## [1] TRUE
starwars |> is.data.frame()
## [1] TRUE
class(starwars)
## [1] "tbl_df"     "tbl"        "data.frame"
starwars2 = starwars
class(starwars2) = "data.frame"
starwars2 |> tibble::is_tibble()
## [1] FALSE
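
One practical difference, sketched on a toy data frame invented for this example: single-column subsetting with [ returns a bare vector for a base data frame but stays a tibble for a tibble.

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"))
tb = tibble::as_tibble(df)

is.data.frame(df[, "x"])      # FALSE: dropped to an integer vector
tibble::is_tibble(tb[, "x"])  # TRUE: still a one-column tibble
```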

Printing a tibble

print(starwars, n = 2, width = 50)
## # A tibble: 87 × 14
##   name          height  mass hair_color skin_color
##   <chr>          <int> <dbl> <chr>      <chr>     
## 1 Luke Skywalk…    172    77 blond      fair      
## 2 C-3PO            167    75 <NA>       gold      
## # ℹ 85 more rows
## # ℹ 9 more variables: eye_color <chr>,
## #   birth_year <dbl>, sex <chr>, gender <chr>,
## #   homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

Printing a data.frame

print(starwars2)
##                     name height   mass    hair_color          skin_color
## 1         Luke Skywalker    172   77.0         blond                fair
## 2                  C-3PO    167   75.0          <NA>                gold
## 3                  R2-D2     96   32.0          <NA>         white, blue
## 4            Darth Vader    202  136.0          none               white
## 5            Leia Organa    150   49.0         brown               light
## 6              Owen Lars    178  120.0   brown, grey               light
## 7     Beru Whitesun Lars    165   75.0         brown               light
## 8                  R5-D4     97   32.0          <NA>          white, red
## 9      Biggs Darklighter    183   84.0         black               light
## 10        Obi-Wan Kenobi    182   77.0 auburn, white                fair
## 11      Anakin Skywalker    188   84.0         blond                fair
## 12        Wilhuff Tarkin    180     NA  auburn, grey                fair
## 13             Chewbacca    228  112.0         brown             unknown
## 14              Han Solo    180   80.0         brown                fair
## 15                Greedo    173   74.0          <NA>               green
## 16 Jabba Desilijic Tiure    175 1358.0          <NA>    green-tan, brown
## 17        Wedge Antilles    170   77.0         brown                fair
## 18      Jek Tono Porkins    180  110.0         brown                fair
## 19                  Yoda     66   17.0         white               green
## 20             Palpatine    170   75.0          grey                pale
## 21             Boba Fett    183   78.2         black                fair
## 22                 IG-88    200  140.0          none               metal
## 23                 Bossk    190  113.0          none               green
## 24      Lando Calrissian    177   79.0         black                dark
## 25                 Lobot    175   79.0          none               light
## 26                Ackbar    180   83.0          none        brown mottle
## 27            Mon Mothma    150     NA        auburn                fair
## 28          Arvel Crynyd     NA     NA         brown                fair
## 29 Wicket Systri Warrick     88   20.0         brown               brown
## 30             Nien Nunb    160   68.0          none                grey
## 31          Qui-Gon Jinn    193   89.0         brown                fair
## 32           Nute Gunray    191   90.0          none       mottled green
## 33         Finis Valorum    170     NA         blond                fair
## 34         Padmé Amidala    185   45.0         brown               light
## 35         Jar Jar Binks    196   66.0          none              orange
## 36          Roos Tarpals    224   82.0          none                grey
## 37            Rugor Nass    206     NA          none               green
## 38              Ric Olié    183     NA         brown                fair
## 39                 Watto    137     NA         black          blue, grey
## 40               Sebulba    112   40.0          none           grey, red
## 41         Quarsh Panaka    183     NA         black                dark
## 42        Shmi Skywalker    163     NA         black                fair
## 43            Darth Maul    175   80.0          none                 red
## 44           Bib Fortuna    180     NA          none                pale
## 45           Ayla Secura    178   55.0          none                blue
## 46          Ratts Tyerel     79   15.0          none          grey, blue
## 47              Dud Bolt     94   45.0          none          blue, grey
## 48               Gasgano    122     NA          none         white, blue
## 49        Ben Quadinaros    163   65.0          none grey, green, yellow
## 50            Mace Windu    188   84.0          none                dark
## 51          Ki-Adi-Mundi    198   82.0         white                pale
## 52             Kit Fisto    196   87.0          none               green
## 53             Eeth Koth    171     NA         black               brown
## 54            Adi Gallia    184   50.0          none                dark
## 55           Saesee Tiin    188     NA          none                pale
## 56           Yarael Poof    264     NA          none               white
## 57              Plo Koon    188   80.0          none              orange
## 58            Mas Amedda    196     NA          none                blue
## 59          Gregar Typho    185   85.0         black                dark
## 60                 Cordé    157     NA         brown               light
## 61           Cliegg Lars    183     NA         brown                fair
## 62     Poggle the Lesser    183   80.0          none               green
## 63       Luminara Unduli    170   56.2         black              yellow
## 64         Barriss Offee    166   50.0         black              yellow
## 65                 Dormé    165     NA         brown               light
## 66                 Dooku    193   80.0         white                fair
## 67   Bail Prestor Organa    191     NA         black                 tan
## 68            Jango Fett    183   79.0         black                 tan
## 69            Zam Wesell    168   55.0        blonde fair, green, yellow
## 70       Dexter Jettster    198  102.0          none               brown
## 71               Lama Su    229   88.0          none                grey
## 72               Taun We    213     NA          none                grey
## 73            Jocasta Nu    167     NA         white                fair
## 74                R4-P17     96     NA          none         silver, red
## 75            Wat Tambor    193   48.0          none         green, grey
## 76              San Hill    191     NA          none                grey
## 77              Shaak Ti    178   57.0          none    red, blue, white
## 78              Grievous    216  159.0          none        brown, white
## 79               Tarfful    234  136.0         brown               brown
## 80       Raymus Antilles    188   79.0         brown               light
## 81             Sly Moore    178   48.0          none                pale
## 82            Tion Medon    206   80.0          none                grey
## 83                  Finn     NA     NA         black                dark
## 84                   Rey     NA     NA         brown               light
## 85           Poe Dameron     NA     NA         brown               light
## 86                   BB8     NA     NA          none                none
## 87        Captain Phasma     NA     NA          none                none
##        eye_color birth_year            sex    gender      homeworld
## 1           blue       19.0           male masculine       Tatooine
## 2         yellow      112.0           none masculine       Tatooine
## 3            red       33.0           none masculine          Naboo
## 4         yellow       41.9           male masculine       Tatooine
## 5          brown       19.0         female  feminine       Alderaan
## 6           blue       52.0           male masculine       Tatooine
## 7           blue       47.0         female  feminine       Tatooine
## 8            red         NA           none masculine       Tatooine
## 9          brown       24.0           male masculine       Tatooine
## 10     blue-gray       57.0           male masculine        Stewjon
## 11          blue       41.9           male masculine       Tatooine
## 12          blue       64.0           male masculine         Eriadu
## 13          blue      200.0           male masculine       Kashyyyk
## 14         brown       29.0           male masculine       Corellia
## 15         black       44.0           male masculine          Rodia
## 16        orange      600.0 hermaphroditic masculine      Nal Hutta
## 17         hazel       21.0           male masculine       Corellia
## 18          blue         NA           <NA>      <NA>     Bestine IV
## 19         brown      896.0           male masculine           <NA>
## 20        yellow       82.0           male masculine          Naboo
## 21         brown       31.5           male masculine         Kamino
## 22           red       15.0           none masculine           <NA>
## 23           red       53.0           male masculine      Trandosha
## 24         brown       31.0           male masculine        Socorro
## 25          blue       37.0           male masculine         Bespin
## 26        orange       41.0           male masculine       Mon Cala
## 27          blue       48.0         female  feminine      Chandrila
## 28         brown         NA           male masculine           <NA>
## 29         brown        8.0           male masculine          Endor
## 30         black         NA           male masculine        Sullust
## 31          blue       92.0           male masculine           <NA>
## 32           red         NA           male masculine Cato Neimoidia
## 33          blue       91.0           male masculine      Coruscant
## 34         brown       46.0         female  feminine          Naboo
## 35        orange       52.0           male masculine          Naboo
## 36        orange         NA           male masculine          Naboo
## 37        orange         NA           male masculine          Naboo
## 38          blue         NA           male masculine          Naboo
## 39        yellow         NA           male masculine       Toydaria
## 40        orange         NA           male masculine      Malastare
## 41         brown       62.0           male masculine          Naboo
## 42         brown       72.0         female  feminine       Tatooine
## 43        yellow       54.0           male masculine       Dathomir
## 44          pink         NA           male masculine         Ryloth
## 45         hazel       48.0         female  feminine         Ryloth
## 46       unknown         NA           male masculine    Aleen Minor
## 47        yellow         NA           male masculine        Vulpter
## 48         black         NA           male masculine        Troiken
## 49        orange         NA           male masculine           Tund
## 50         brown       72.0           male masculine     Haruun Kal
## 51        yellow       92.0           male masculine          Cerea
## 52         black         NA           male masculine    Glee Anselm
## 53         brown         NA           male masculine       Iridonia
## 54          blue         NA         female  feminine      Coruscant
## 55        orange         NA           male masculine        Iktotch
## 56        yellow         NA           male masculine        Quermia
## 57         black       22.0           male masculine          Dorin
## 58          blue         NA           male masculine       Champala
## 59         brown         NA           <NA>      <NA>          Naboo
## 60         brown         NA           <NA>      <NA>          Naboo
## 61          blue       82.0           male masculine       Tatooine
## 62        yellow         NA           male masculine       Geonosis
## 63          blue       58.0         female  feminine         Mirial
## 64          blue       40.0         female  feminine         Mirial
## 65         brown         NA         female  feminine          Naboo
## 66         brown      102.0           male masculine        Serenno
## 67         brown       67.0           male masculine       Alderaan
## 68         brown       66.0           male masculine   Concord Dawn
## 69        yellow         NA         female  feminine          Zolan
## 70        yellow         NA           male masculine           Ojom
## 71         black         NA           male masculine         Kamino
## 72         black         NA         female  feminine         Kamino
## 73          blue         NA         female  feminine      Coruscant
## 74     red, blue         NA           none  feminine           <NA>
## 75       unknown         NA           male masculine          Skako
## 76          gold         NA           male masculine     Muunilinst
## 77         black         NA         female  feminine          Shili
## 78 green, yellow         NA           male masculine          Kalee
## 79          blue         NA           male masculine       Kashyyyk
## 80         brown         NA           male masculine       Alderaan
## 81         white         NA           <NA>      <NA>         Umbara
## 82         black         NA           male masculine         Utapau
## 83          dark         NA           male masculine           <NA>
## 84         hazel         NA         female  feminine           <NA>
## 85         brown         NA           male masculine           <NA>
## 86         black         NA           none masculine           <NA>
## 87       unknown         NA         female  feminine           <NA>
##           species
## 1           Human
## 2           Droid
## 3           Droid
## 4           Human
## 5           Human
## 6           Human
## 7           Human
## 8           Droid
## 9           Human
## 10          Human
## 11          Human
## 12          Human
## 13        Wookiee
## 14          Human
## 15         Rodian
## 16           Hutt
## 17          Human
## 18           <NA>
## 19 Yoda's species
## 20          Human
## 21          Human
## 22          Droid
## 23     Trandoshan
## 24          Human
## 25          Human
## 26   Mon Calamari
## 27          Human
## 28          Human
## 29           Ewok
## 30      Sullustan
## 31          Human
## 32      Neimodian
## 33          Human
## 34          Human
## 35         Gungan
## 36         Gungan
## 37         Gungan
## 38          Human
## 39      Toydarian
## 40            Dug
## 41          Human
## 42          Human
## 43         Zabrak
## 44        Twi'lek
## 45        Twi'lek
## 46         Aleena
## 47     Vulptereen
## 48          Xexto
## 49          Toong
## 50          Human
## 51         Cerean
## 52       Nautolan
## 53         Zabrak
## 54     Tholothian
## 55       Iktotchi
## 56       Quermian
## 57        Kel Dor
## 58       Chagrian
## 59           <NA>
## 60           <NA>
## 61          Human
## 62      Geonosian
## 63       Mirialan
## 64       Mirialan
## 65          Human
## 66          Human
## 67          Human
## 68          Human
## 69       Clawdite
## 70       Besalisk
## 71       Kaminoan
## 72       Kaminoan
## 73          Human
## 74          Droid
## 75        Skakoan
## 76           Muun
## 77        Togruta
## 78        Kaleesh
## 79        Wookiee
## 80          Human
## 81           <NA>
## 82         Pau'an
## 83          Human
## 84          Human
## 85          Human
## 86          Droid
## 87          Human
##                                                                                                                                        films
## 1                                            A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens
## 2                     A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 3  A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith, The Force Awakens
## 4                                                               A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith
## 5                                            A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens
## 6                                                                                      A New Hope, Attack of the Clones, Revenge of the Sith
## 7                                                                                      A New Hope, Attack of the Clones, Revenge of the Sith
## 8                                                                                                                                 A New Hope
## 9                                                                                                                                 A New Hope
## 10                    A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 11                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 12                                                                                                           A New Hope, Revenge of the Sith
## 13                                           A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens
## 14                                                                A New Hope, The Empire Strikes Back, Return of the Jedi, The Force Awakens
## 15                                                                                                                                A New Hope
## 16                                                                                        A New Hope, Return of the Jedi, The Phantom Menace
## 17                                                                                   A New Hope, The Empire Strikes Back, Return of the Jedi
## 18                                                                                                                                A New Hope
## 19                                The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 20                                The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 21                                                                         The Empire Strikes Back, Return of the Jedi, Attack of the Clones
## 22                                                                                                                   The Empire Strikes Back
## 23                                                                                                                   The Empire Strikes Back
## 24                                                                                               The Empire Strikes Back, Return of the Jedi
## 25                                                                                                                   The Empire Strikes Back
## 26                                                                                                     Return of the Jedi, The Force Awakens
## 27                                                                                                                        Return of the Jedi
## 28                                                                                                                        Return of the Jedi
## 29                                                                                                                        Return of the Jedi
## 30                                                                                                                        Return of the Jedi
## 31                                                                                                                        The Phantom Menace
## 32                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 33                                                                                                                        The Phantom Menace
## 34                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 35                                                                                                  The Phantom Menace, Attack of the Clones
## 36                                                                                                                        The Phantom Menace
## 37                                                                                                                        The Phantom Menace
## 38                                                                                                                        The Phantom Menace
## 39                                                                                                  The Phantom Menace, Attack of the Clones
## 40                                                                                                                        The Phantom Menace
## 41                                                                                                                        The Phantom Menace
## 42                                                                                                  The Phantom Menace, Attack of the Clones
## 43                                                                                                                        The Phantom Menace
## 44                                                                                                                        Return of the Jedi
## 45                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 46                                                                                                                        The Phantom Menace
## 47                                                                                                                        The Phantom Menace
## 48                                                                                                                        The Phantom Menace
## 49                                                                                                                        The Phantom Menace
## 50                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 51                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 52                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 53                                                                                                   The Phantom Menace, Revenge of the Sith
## 54                                                                                                   The Phantom Menace, Revenge of the Sith
## 55                                                                                                   The Phantom Menace, Revenge of the Sith
## 56                                                                                                                        The Phantom Menace
## 57                                                                             The Phantom Menace, Attack of the Clones, Revenge of the Sith
## 58                                                                                                  The Phantom Menace, Attack of the Clones
## 59                                                                                                                      Attack of the Clones
## 60                                                                                                                      Attack of the Clones
## 61                                                                                                                      Attack of the Clones
## 62                                                                                                 Attack of the Clones, Revenge of the Sith
## 63                                                                                                 Attack of the Clones, Revenge of the Sith
## 64                                                                                                                      Attack of the Clones
## 65                                                                                                                      Attack of the Clones
## 66                                                                                                 Attack of the Clones, Revenge of the Sith
## 67                                                                                                 Attack of the Clones, Revenge of the Sith
## 68                                                                                                                      Attack of the Clones
## 69                                                                                                                      Attack of the Clones
## 70                                                                                                                      Attack of the Clones
## 71                                                                                                                      Attack of the Clones
## 72                                                                                                                      Attack of the Clones
## 73                                                                                                                      Attack of the Clones
## 74                                                                                                 Attack of the Clones, Revenge of the Sith
## 75                                                                                                                      Attack of the Clones
## 76                                                                                                                      Attack of the Clones
## 77                                                                                                 Attack of the Clones, Revenge of the Sith
## 78                                                                                                                       Revenge of the Sith
## 79                                                                                                                       Revenge of the Sith
## 80                                                                                                           A New Hope, Revenge of the Sith
## 81                                                                                                 Attack of the Clones, Revenge of the Sith
## 82                                                                                                                       Revenge of the Sith
## 83                                                                                                                         The Force Awakens
## 84                                                                                                                         The Force Awakens
## 85                                                                                                                         The Force Awakens
## 86                                                                                                                         The Force Awakens
## 87                                                                                                                         The Force Awakens
##                                vehicles
## 1    Snowspeeder, Imperial Speeder Bike
## 2                                      
## 3                                      
## 4                                      
## 5                 Imperial Speeder Bike
## 6                                      
## 7                                      
## 8                                      
## 9                                      
## 10                      Tribubble bongo
## 11 Zephyr-G swoop bike, XJ-6 airspeeder
## 12                                     
## 13                                AT-ST
## 14                                     
## 15                                     
## 16                                     
## 17                          Snowspeeder
## 18                                     
## 19                                     
## 20                                     
## 21                                     
## 22                                     
## 23                                     
## 24                                     
## 25                                     
## 26                                     
## 27                                     
## 28                                     
## 29                                     
## 30                                     
## 31                      Tribubble bongo
## 32                                     
## 33                                     
## 34                                     
## 35                                     
## 36                                     
## 37                                     
## 38                                     
## 39                                     
## 40                                     
## 41                                     
## 42                                     
## 43                         Sith speeder
## 44                                     
## 45                                     
## 46                                     
## 47                                     
## 48                                     
## 49                                     
## 50                                     
## 51                                     
## 52                                     
## 53                                     
## 54                                     
## 55                                     
## 56                                     
## 57                                     
## 58                                     
## 59                                     
## 60                                     
## 61                                     
## 62                                     
## 63                                     
## 64                                     
## 65                                     
## 66                     Flitknot speeder
## 67                                     
## 68                                     
## 69           Koro-2 Exodrive airspeeder
## 70                                     
## 71                                     
## 72                                     
## 73                                     
## 74                                     
## 75                                     
## 76                                     
## 77                                     
## 78          Tsmeu-6 personal wheel bike
## 79                                     
## 80                                     
## 81                                     
## 82                                     
## 83                                     
## 84                                     
## 85                                     
## 86                                     
## 87                                     
##                                                                                                   starships
## 1                                                                                  X-wing, Imperial shuttle
## 2                                                                                                          
## 3                                                                                                          
## 4                                                                                           TIE Advanced x1
## 5                                                                                                          
## 6                                                                                                          
## 7                                                                                                          
## 8                                                                                                          
## 9                                                                                                    X-wing
## 10 Jedi starfighter, Trade Federation cruiser, Naboo star skiff, Jedi Interceptor, Belbullab-22 starfighter
## 11                                                Naboo fighter, Trade Federation cruiser, Jedi Interceptor
## 12                                                                                                         
## 13                                                                      Millennium Falcon, Imperial shuttle
## 14                                                                      Millennium Falcon, Imperial shuttle
## 15                                                                                                         
## 16                                                                                                         
## 17                                                                                                   X-wing
## 18                                                                                                   X-wing
## 19                                                                                                         
## 20                                                                                                         
## 21                                                                                                  Slave 1
## 22                                                                                                         
## 23                                                                                                         
## 24                                                                                        Millennium Falcon
## 25                                                                                                         
## 26                                                                                                         
## 27                                                                                                         
## 28                                                                                                   A-wing
## 29                                                                                                         
## 30                                                                                        Millennium Falcon
## 31                                                                                                         
## 32                                                                                                         
## 33                                                                                                         
## 34                                                     Naboo fighter, H-type Nubian yacht, Naboo star skiff
## 35                                                                                                         
## 36                                                                                                         
## 37                                                                                                         
## 38                                                                                     Naboo Royal Starship
## 39                                                                                                         
## 40                                                                                                         
## 41                                                                                                         
## 42                                                                                                         
## 43                                                                                                 Scimitar
## 44                                                                                                         
## 45                                                                                                         
## 46                                                                                                         
## 47                                                                                                         
## 48                                                                                                         
## 49                                                                                                         
## 50                                                                                                         
## 51                                                                                                         
## 52                                                                                                         
## 53                                                                                                         
## 54                                                                                                         
## 55                                                                                                         
## 56                                                                                                         
## 57                                                                                         Jedi starfighter
## 58                                                                                                         
## 59                                                                                            Naboo fighter
## 60                                                                                                         
## 61                                                                                                         
## 62                                                                                                         
## 63                                                                                                         
## 64                                                                                                         
## 65                                                                                                         
## 66                                                                                                         
## 67                                                                                                         
## 68                                                                                                         
## 69                                                                                                         
## 70                                                                                                         
## 71                                                                                                         
## 72                                                                                                         
## 73                                                                                                         
## 74                                                                                                         
## 75                                                                                                         
## 76                                                                                                         
## 77                                                                                                         
## 78                                                                                 Belbullab-22 starfighter
## 79                                                                                                         
## 80                                                                                                         
## 81                                                                                                         
## 82                                                                                                         
## 83                                                                                                         
## 84                                                                                                         
## 85                                                                                                   X-wing
## 86                                                                                                         
## 87

5.4.5 The tidyr cheatsheet

We have much more to cover with tidyr! Here’s the “official” cheatsheet:

Download PDF file.

5.5 Ch 7: Data Import

5.5.1 Data Import

readr

library(readr)
URL_vince = "https://vincentarelbundock.github.io/Rdatasets/csv/"
URL_vince_data = "pscl/AustralianElectionPolling.csv"
URL = paste0(URL_vince, URL_vince_data)
read_csv(file = URL) |> print(n = 3)
## Rows: 239 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (3): org, source, remark
## dbl  (10): rownames, ALP, Lib, Nat, Green, FamilyFirst, Dems, OneNation, DK,...
## date  (2): startDate, endDate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 239 × 15
##   rownames   ALP   Lib   Nat Green FamilyFirst  Dems OneNation    DK sampleSize
##      <dbl> <dbl> <dbl> <dbl> <dbl>       <dbl> <dbl>     <dbl> <dbl>      <dbl>
## 1        1  39.5  44.5     0   8.5         2       2         1     0      1451.
## 2        2  39    44       0   8.5         1.5     2         1     0      2090 
## 3        3  38    46       0   6           0       0         0     0      1150 
## # ℹ 236 more rows
## # ℹ 5 more variables: org <chr>, startDate <date>, endDate <date>,
## #   source <chr>, remark <chr>

However, sometimes columns will read in as a less efficient vector type (e.g., double instead of integer, character instead of date or datetime, etc.), and you need to specify a column type.

5.5.2 readr Column Specification

By default, read_csv will guess the column type. It usually does a pretty good job, although it sometimes guesses conservatively (e.g., double instead of integer, character instead of factor, etc.). You can specify what individual columns should be, by name.

read1 = read_csv(
  file = URL, 
  col_types = list(
    rownames = col_integer(),
    # as opposed to scales::col_factor()
    org = readr::col_factor(), 
    # readr guesses date and format correctly
    # but if you wanted to specify...
    startDate = col_date(format = "%Y-%m-%d")
    )
  )
read1 |> print(n = 4, width = 48)
## # A tibble: 239 × 15
##   rownames   ALP   Lib   Nat Green FamilyFirst
##      <int> <dbl> <dbl> <dbl> <dbl>       <dbl>
## 1        1  39.5  44.5     0   8.5         2  
## 2        2  39    44       0   8.5         1.5
## 3        3  38    46       0   6           0  
## 4        4  36    46.5     0   9           2.5
## # ℹ 235 more rows
## # ℹ 9 more variables: Dems <dbl>,
## #   OneNation <dbl>, DK <dbl>,
## #   sampleSize <dbl>, org <fct>,
## #   startDate <date>, endDate <date>,
## #   source <chr>, remark <chr>
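
The difference between guessed and specified types can also be checked offline. This is a minimal sketch on hypothetical inline data (the column names id, org, and score are made up for illustration), assuming readr 2.0 or later so that I() marks the string as literal CSV text.

```r
library(readr)

# Hypothetical inline data; I() tells readr this is CSV text, not a file path
csv_text = I("id,org,score\n1,Galaxy,39.5\n2,Newspoll,44\n3,Galaxy,46\n")

# Let readr guess every column type
guessed = read_csv(file = csv_text, show_col_types = FALSE)

# Override two of the guesses by name; score is still guessed
specified = read_csv(
  file = csv_text,
  col_types = list(
    id = col_integer(),
    org = readr::col_factor()
    )
  )

# Compare the resulting column classes side by side
sapply(guessed, class)
sapply(specified, class)
```

Only the columns named in the list are overridden; any column left out keeps the type readr guesses.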

We can also use shortcodes as surrogates for function calls, but we must accept the defaults. Usually, this is not a problem. Factors, dates, times, and datetimes often need special handling, though (sometimes this means storing them as a character type column and modifying with mutate).

read2 = read_csv(
  file = URL, 
  col_types = list(
    rownames = "i",
    org = "f"
    )
  )
read2 |> print(n = 5, width = 75)
## # A tibble: 239 × 15
##   rownames   ALP   Lib   Nat Green FamilyFirst  Dems OneNation    DK
##      <int> <dbl> <dbl> <dbl> <dbl>       <dbl> <dbl>     <dbl> <dbl>
## 1        1  39.5  44.5     0   8.5         2     2           1     0
## 2        2  39    44       0   8.5         1.5   2           1     0
## 3        3  38    46       0   6           0     0           0     0
## 4        4  36    46.5     0   9           2.5   1.5         1     0
## 5        5  33    47       0   8           0     0           0     0
## # ℹ 234 more rows
## # ℹ 6 more variables: sampleSize <dbl>, org <fct>, startDate <date>,
## #   endDate <date>, source <chr>, remark <chr>
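
The note above about storing a date as character and fixing it with mutate can be sketched as follows. The inline data and its day/month/year format are hypothetical, chosen only because that format is not one readr would parse as a date by default; readr 2.0 or later is assumed for I().

```r
library(readr)
library(dplyr)

# Hypothetical data with dates written as day/month/year
csv_text = I("poll,startDate\nA,21/10/2004\nB,05/11/2004\n")

polls = read_csv(
  file = csv_text,
  # read both columns as character so nothing is guessed
  col_types = list(poll = "c", startDate = "c")
  ) |>
  # convert the character column to Date with an explicit format
  mutate(startDate = as.Date(startDate, format = "%d/%m/%Y"))

polls
```

The same conversion could be done at read time with col_date(format = "%d/%m/%Y"); doing it in mutate is convenient when a column needs cleaning before it can be parsed.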

Alternatively, we can give a single string containing one shortcode per column, in column order. However, we are still limited to type defaults. And, we have to give every column a shortcode, which can get quite tedious (and be error-prone) with many columns.

read3 = read_csv(
  file = URL,
  col_types = "idddddddddfDDcc"
  )
read3 |> print(n = 5, width = 60)
## # A tibble: 239 × 15
##   rownames   ALP   Lib   Nat Green FamilyFirst  Dems
##      <int> <dbl> <dbl> <dbl> <dbl>       <dbl> <dbl>
## 1        1  39.5  44.5     0   8.5         2     2  
## 2        2  39    44       0   8.5         1.5   2  
## 3        3  38    46       0   6           0     0  
## 4        4  36    46.5     0   9           2.5   1.5
## 5        5  33    47       0   8           0     0  
## # ℹ 234 more rows
## # ℹ 8 more variables: OneNation <dbl>, DK <dbl>,
## #   sampleSize <dbl>, org <fct>, startDate <date>,
## #   endDate <date>, source <chr>, remark <chr>
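
Two further shortcodes can make the compact string less rigid: "?" asks readr to guess that column, and "_" (or "-") skips it entirely. A minimal sketch on hypothetical inline data, assuming readr 2.0 or later for I():

```r
library(readr)

# Hypothetical four-column data
csv_text = I("id,org,score,remark\n1,Galaxy,39.5,phone\n2,Newspoll,44,online\n")

# i = integer, f = factor, ? = let readr guess, _ = skip the column
compact = read_csv(file = csv_text, col_types = "if?_")

names(compact)
sapply(compact, class)
```

Every column still needs one character in the string, so the count must match the number of columns in the file.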

Here, we can specify a default (e.g., double), and then clarify what other columns should be.

read4 = read_csv(
  file = URL, 
  col_types = list(
    .default = col_double(), 
    rownames = col_integer(),
    org = readr::col_factor(),
    startDate = col_date(),
    endDate = col_date(),
    source = col_character(),
    remark = col_character()
    )
  )
(read1_types = lapply(X = read1, FUN = typeof) |> do.call(c, args = _))
##    rownames         ALP         Lib         Nat       Green FamilyFirst 
##   "integer"    "double"    "double"    "double"    "double"    "double" 
##        Dems   OneNation          DK  sampleSize         org   startDate 
##    "double"    "double"    "double"    "double"   "integer"    "double" 
##     endDate      source      remark 
##    "double" "character" "character"
read4_types = lapply(X = read4, FUN = typeof) |> do.call(c, args = _)
identical(read1_types, read4_types)
## [1] TRUE
print(read4, n = 3, width = 65)
## # A tibble: 239 × 15
##   rownames   ALP   Lib   Nat Green FamilyFirst  Dems OneNation
##      <int> <dbl> <dbl> <dbl> <dbl>       <dbl> <dbl>     <dbl>
## 1        1  39.5  44.5     0   8.5         2       2         1
## 2        2  39    44       0   8.5         1.5     2         1
## 3        3  38    46       0   6           0       0         0
## # ℹ 236 more rows
## # ℹ 7 more variables: DK <dbl>, sampleSize <dbl>, org <fct>,
## #   startDate <date>, endDate <date>, source <chr>, remark <chr>

You could also use spec() to retrieve the column types readr guessed on an initial read and then refine them. For large data, you can limit that initial read to a small number of rows with n_max, which makes type assignment a faster iterative process.

read_csv(file = URL, n_max = 30) |> readr::spec()
## cols(
##   rownames = col_double(),
##   ALP = col_double(),
##   Lib = col_double(),
##   Nat = col_double(),
##   Green = col_double(),
##   FamilyFirst = col_double(),
##   Dems = col_double(),
##   OneNation = col_double(),
##   DK = col_double(),
##   sampleSize = col_double(),
##   org = col_character(),
##   startDate = col_date(format = ""),
##   endDate = col_date(format = ""),
##   source = col_character(),
##   remark = col_character()
## )

Then, paste the result (i.e., cols(...)) after col_types = and modify as desired.

read5 = read_csv(
  file = URL, 
  col_types = cols(
    rownames = col_double(),
    ALP = col_double(),
    Lib = col_double(),
    Nat = col_double(),
    Green = col_double(),
    FamilyFirst = col_double(),
    Dems = col_double(),
    OneNation = col_double(),
    DK = col_double(),
    sampleSize = col_double(),
    org = col_character(),
    startDate = col_date(format = ""),
    endDate = col_date(format = ""),
    source = col_character(),
    remark = col_character()
  )
)
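
As a compact illustration of the whole loop (guess on a few rows, inspect, then refine), here is a minimal sketch on made-up inline data. The column names (id, score, group, when) are hypothetical, and the refinements chosen (integer for id, factor for group) are just examples of the kind of edits one might make after pasting the guessed specification.

```r
library(readr)

# Hypothetical inline data; I() marks the string as literal data.
csv_text = I("id,score,group,when\n1,2.5,a,2024-01-01\n2,3.0,b,2024-01-02\n3,4.5,a,2024-01-03\n")

# Step 1: read only the first rows and look at what readr guessed
guessed = read_csv(file = csv_text, n_max = 2, show_col_types = FALSE) |> spec()
guessed

# Step 2: copy the printed cols(...) and edit the types that should differ
toy = read_csv(
  file = csv_text,
  col_types = cols(
    id = col_integer(),            # edited: guessed as col_double()
    score = col_double(),
    group = col_factor(),          # edited: guessed as col_character()
    when = col_date(format = "")
  )
)

sapply(toy, class)
```

Because the full specification is written out, later reads of the same file do not depend on readr's guessing at all.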

5.5.3 janitor::clean_names

read_csv(file = URL) |> 
  janitor::clean_names() |>
  print(n = 4)
## # A tibble: 239 × 15
##   rownames   alp   lib   nat green family_first  dems one_nation    dk
##      <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl> <dbl>      <dbl> <dbl>
## 1        1  39.5  44.5     0   8.5          2     2            1     0
## 2        2  39    44       0   8.5          1.5   2            1     0
## 3        3  38    46       0   6            0     0            0     0
## 4        4  36    46.5     0   9            2.5   1.5          1     0
## # ℹ 235 more rows
## # ℹ 6 more variables: sample_size <dbl>, org <chr>, start_date <date>,
## #   end_date <date>, source <chr>, remark <chr>

When reading data in from the wild (especially when column names contain non-syntactic characters such as spaces, parentheses, %, &, @, or #), it’s best to call janitor::clean_names() immediately so that all of your downstream code is easier to write and read.
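
To see why this helps, consider a small sketch with deliberately awkward column names. The data frame below is invented for illustration (it is not the polling data), and it assumes the janitor package is installed.

```r
# Hypothetical data with non-syntactic names; check.names = FALSE keeps them as typed
messy = data.frame(
  `Sample Size (n)` = c(100, 250),
  `% Approve` = c(41.5, 39),
  `startDate` = as.Date(c("2024-01-01", "2024-02-01")),
  check.names = FALSE
)

# Before cleaning, columns like this one need backticks everywhere
messy$`Sample Size (n)`

# After cleaning, every name is lower snake_case and safe to type directly
tidy = janitor::clean_names(messy)
names(tidy)
tidy$sample_size_n
```

Cleaning once at import means no later step has to quote or escape a column name.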

5.5.4 The readr cheatsheet

We won’t cover much more of readr; do reference the cheatsheet and package documentation! (Note: we will likely cover readxl and googlesheets4.)
