tidyr::unite()

Function of the Week

unite()

In this document, I will introduce the unite() function and show what it’s for.

The unite() function is part of the tidyr package, which is one of the core packages of the tidyverse. You can install tidyr on its own with install.packages("tidyr") or as part of the whole tidyverse with install.packages("tidyverse"). Once it is installed, load it with library(tidyr) or library(tidyverse).

#load tidyverse up
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#example dataset
library(palmerpenguins)
data(penguins)

#Subsetting to a smaller data frame
small_penguin <- penguins[1:5,]
small_penguin
## # A tibble: 5 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

What is it for?

The unite() function concatenates the values of two or more columns into a single new column.

knitr::include_graphics("image/unite_graphic.png")

Arguments

1. data Specify the data frame

2. col= The name of your new column. You do not have to type “col=”; you can simply give the new column name in quotation marks. The quotation marks are needed when the name contains spaces or punctuation.

3. … The columns you wish to concatenate. List them separated by commas (e.g., x, y for two columns or x, y, z for three). Alternatively, if the columns sit next to each other in the data frame, give only the first and last column separated by a colon (e.g., a:d combines columns a, b, c, and d). If you do not list any columns, unite() combines all of them.

4. sep= The separator to place between values. If no separator is specified, the default is an underscore. To put a space between the values, use sep=" ", with a space between the quotation marks. To join the values with nothing between them, use sep="", with no space between the quotation marks.

5. remove Default value is TRUE. If TRUE, the columns you combined are dropped from the output data frame and the new column takes their place; all other columns are kept. If FALSE, the output data frame includes both the new column and the original individual columns.

6. na.rm Default value is FALSE. If FALSE, missing values are pasted into the new column as the text “NA”. If TRUE, missing values are dropped before the values are united, so neither the “NA” nor its separator appears in the result.
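
To make the remove argument concrete, here is a minimal sketch on a made-up data frame. The names toy, id, first, and second are hypothetical and are not part of the penguins data; the point is only to show which columns survive in each case.

```r
library(tidyr)

# Hypothetical toy data, invented for illustration
toy <- data.frame(
  id     = 1:3,
  first  = c("a", "b", "c"),
  second = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# remove = TRUE (the default): `first` and `second` are dropped,
# but the unrelated `id` column is kept
dropped <- unite(toy, "combined", first, second)
names(dropped)

# remove = FALSE: the input columns stay alongside the new column
kept <- unite(toy, "combined", first, second, remove = FALSE)
names(kept)
```

In both cases the other columns of the data frame are untouched; only the columns named in the call are affected.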

Variations in how to combine columns in your argument statement

#Comma
small_penguin %>%
  unite("Species Island", species, island, remove=FALSE) %>%
  select("Species Island", species, island)
## # A tibble: 5 x 3
##   `Species Island` species island   
##   <chr>            <fct>   <fct>    
## 1 Adelie_Torgersen Adelie  Torgersen
## 2 Adelie_Torgersen Adelie  Torgersen
## 3 Adelie_Torgersen Adelie  Torgersen
## 4 Adelie_Torgersen Adelie  Torgersen
## 5 Adelie_Torgersen Adelie  Torgersen
#Colon
small_penguin %>%
  unite("Species Island", species:island) %>%
  select("Species Island")
## # A tibble: 5 x 1
##   `Species Island`
##   <chr>           
## 1 Adelie_Torgersen
## 2 Adelie_Torgersen
## 3 Adelie_Torgersen
## 4 Adelie_Torgersen
## 5 Adelie_Torgersen
#Multiple
small_penguin %>%
  unite("Species Island Bill Length", species:bill_length_mm, remove=FALSE) %>%
  select("Species Island Bill Length", species, island, bill_length_mm)
## # A tibble: 5 x 4
##   `Species Island Bill Length` species island    bill_length_mm
##   <chr>                        <fct>   <fct>              <dbl>
## 1 Adelie_Torgersen_39.1        Adelie  Torgersen           39.1
## 2 Adelie_Torgersen_39.5        Adelie  Torgersen           39.5
## 3 Adelie_Torgersen_40.3        Adelie  Torgersen           40.3
## 4 Adelie_Torgersen_NA          Adelie  Torgersen           NA  
## 5 Adelie_Torgersen_36.7        Adelie  Torgersen           36.7
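
As a quick check that the comma and colon forms are interchangeable for adjacent columns, the sketch below runs both on a small made-up data frame (toy and its column names are hypothetical). It also tries a selection helper, starts_with(), which should work here because unite() selects columns the same way select() does; treat that part as an optional extra rather than something the examples above rely on.

```r
library(tidyr)

# Hypothetical toy data, invented for illustration
toy <- data.frame(
  id          = c("p1", "p2"),
  bill_length = c("39.1", "39.5"),
  bill_depth  = c("18.7", "17.4"),
  mass        = c("3750", "3800"),
  stringsAsFactors = FALSE
)

# Comma form and colon form pick the same two adjacent columns
by_comma <- unite(toy, "bill", bill_length, bill_depth)
by_colon <- unite(toy, "bill", bill_length:bill_depth)
identical(by_comma$bill, by_colon$bill)

# A selection helper picks columns by a shared name prefix
by_helper <- unite(toy, "bill", starts_with("bill_"))
by_helper$bill
```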

Different options for separating values in your new column

#Default, underscore
small_penguin %>%
  unite("Species_Island", species:island) %>%
  select("Species_Island")
## # A tibble: 5 x 1
##   Species_Island  
##   <chr>           
## 1 Adelie_Torgersen
## 2 Adelie_Torgersen
## 3 Adelie_Torgersen
## 4 Adelie_Torgersen
## 5 Adelie_Torgersen
#Add a space
small_penguin %>%
  unite("Species Island", species:island, sep=" ") %>%
  select("Species Island")
## # A tibble: 5 x 1
##   `Species Island`
##   <chr>           
## 1 Adelie Torgersen
## 2 Adelie Torgersen
## 3 Adelie Torgersen
## 4 Adelie Torgersen
## 5 Adelie Torgersen
#No space
small_penguin %>%
  unite("SpeciesIsland", species:island, sep="") %>%
  select("SpeciesIsland")
## # A tibble: 5 x 1
##   SpeciesIsland  
##   <chr>          
## 1 AdelieTorgersen
## 2 AdelieTorgersen
## 3 AdelieTorgersen
## 4 AdelieTorgersen
## 5 AdelieTorgersen
#Add a comma
small_penguin %>%
  unite("Species, Island", species:island, sep=", ") %>%
  select("Species, Island")
## # A tibble: 5 x 1
##   `Species, Island`
##   <chr>            
## 1 Adelie, Torgersen
## 2 Adelie, Torgersen
## 3 Adelie, Torgersen
## 4 Adelie, Torgersen
## 5 Adelie, Torgersen
#Add a comma and additional text
small_penguin %>%
  unite("Species, and Island", species:island, sep=", and ", remove=FALSE) %>%
  select("Species, and Island", species, island)
## # A tibble: 5 x 3
##   `Species, and Island` species island   
##   <chr>                 <fct>   <fct>    
## 1 Adelie, and Torgersen Adelie  Torgersen
## 2 Adelie, and Torgersen Adelie  Torgersen
## 3 Adelie, and Torgersen Adelie  Torgersen
## 4 Adelie, and Torgersen Adelie  Torgersen
## 5 Adelie, and Torgersen Adelie  Torgersen

Working with missing values

#Remove NA values when combining two columns
small_penguin_2 <- small_penguin %>%
  unite("year_bill_depth", year, bill_depth_mm,
        sep = ", and ", remove = FALSE, na.rm = TRUE) %>%
  select(year_bill_depth, bill_depth_mm, year)
small_penguin_2
## # A tibble: 5 x 3
##   year_bill_depth bill_depth_mm  year
##   <chr>                   <dbl> <int>
## 1 2007, and 18.7           18.7  2007
## 2 2007, and 17.4           17.4  2007
## 3 2007, and 18             18    2007
## 4 2007                     NA    2007
## 5 2007, and 19.3           19.3  2007
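
To compare the two na.rm settings side by side, the following sketch uses a made-up data frame (toy, x, and y are hypothetical names) that covers every combination of present and missing values, including a row where both values are missing.

```r
library(tidyr)

# Hypothetical toy data covering every NA combination
toy <- data.frame(
  x = c("a", "a", NA, NA),
  y = c("b", NA, "b", NA),
  stringsAsFactors = FALSE
)

# na.rm = FALSE (the default): NA is pasted in as the text "NA"
keep_na <- unite(toy, "z", x, y)
keep_na$z   # "a_b" "a_NA" "NA_b" "NA_NA"

# na.rm = TRUE: missing values and their separators are dropped;
# a row with nothing left becomes an empty string
drop_na <- unite(toy, "z", x, y, na.rm = TRUE)
drop_na$z   # "a_b" "a" "b" ""
```

Note that a row with no non-missing values ends up as an empty string rather than NA, which may matter if you later filter on missing values.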

The complement, separate()

Conversely, you can separate information in one column into two or more new columns using the separate() function.
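
A minimal round trip, assuming a made-up data frame (toy is a hypothetical name), shows the two functions undoing each other. Newer tidyr releases mark separate() as superseded in favour of the separate_wider_*() family, but it is still available and matches the tidyr version loaded above.

```r
library(tidyr)

# Hypothetical toy data, invented for illustration
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  stringsAsFactors = FALSE
)

# Combine the two columns into one...
united <- unite(toy, "species_island", species, island, sep = "_")
united$species_island

# ...then split that column back into the original two
split_again <- separate(united, species_island,
                        into = c("species", "island"), sep = "_")
split_again
```

The round trip only works cleanly when the separator does not also appear inside the values themselves.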

Is it helpful?

I think it’s helpful for preparing data and information to go into a report or table. For example, Table 1 of a manuscript often lists the count and the proportion in the same column, in the form ‘## (%)’. Analysis output typically puts these values in two separate columns, so being able to combine them in code before the final output saves a lot of time and essentially eliminates the potential for error that comes with combining them manually. Formatting such as parentheses and commas would otherwise have to be added by hand as well, so being able to add these pieces through the unite() function is immensely helpful.
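
The Table 1 case could be sketched as follows. The counts and percentages are made up for illustration, and table1, group, n, and pct are hypothetical names. Because sep only supplies text between values, the opening parenthesis comes from unite() and the closing “%)” is appended in a separate step with base R's paste0().

```r
library(tidyr)

# Hypothetical summary values, invented for illustration
table1 <- data.frame(
  group = c("treatment", "control"),
  n     = c(12L, 8L),
  pct   = c("60.0", "40.0"),
  stringsAsFactors = FALSE
)

# sep supplies the text between the count and the percentage
table1 <- unite(table1, "n_pct", n, pct, sep = " (")

# unite() adds nothing after the last value, so close the parenthesis here
table1$n_pct <- paste0(table1$n_pct, "%)")
table1$n_pct   # "12 (60.0%)" "8 (40.0%)"
```

Keeping pct as pre-formatted text preserves the trailing zero, which would be lost if the percentage were stored as a number.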