Function of the Week
In this document, I will introduce the unite() function and show what it’s for.
The unite() function is part of the tidyr package, which is one of the core packages of the larger umbrella package, tidyverse. Once it is installed, you can load tidyr on its own with library(tidyr), or load it together with the rest of the core tidyverse packages with library(tidyverse).
#load tidyverse up
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#example dataset
library(palmerpenguins)
#Subsetting to a smaller data frame
small_penguin <- penguins[1:5,]
small_penguin
## # A tibble: 5 x 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <fct> <fct> <dbl> <dbl> <int> <int> <fct>
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 fema…
## 3 Adelie Torge… 40.3 18 195 3250 fema…
## 4 Adelie Torge… NA NA NA NA <NA>
## 5 Adelie Torge… 36.7 19.3 193 3450 fema…
## # … with 1 more variable: year <int>
What is it for?
The unite() function concatenates two or more columns into a single new column.[1]
knitr::include_graphics("image/unite_graphic.png")
1. data Specify the data frame.
2. col= Use ‘col=’ to specify the name of your new column. You do not have to type “col=”; you can simply give the new column name in quotation marks.
3. … List the columns you wish to concatenate, separated by commas (e.g., x, y). To combine more than two columns, list each one (e.g., x, y, z). If the columns are adjacent in the data frame, you only need to give the first and last, joined by a colon instead of a comma (e.g., a:d combines columns a, b, c, and d).
4. sep= Specify the separator to place between values. If no separator is specified, the default is an underscore. To put a space between the values, use sep=" ", with a space between the quotation marks. For no separator at all, use sep="", with nothing between the quotation marks.
5. remove Default value is TRUE. If TRUE, the input columns are dropped from the output data frame and the new united column takes their place. If FALSE, the output data frame keeps both the new column and the original individual columns.
6. na.rm Default value is FALSE. If FALSE, missing values are kept and appear as the text "NA" in the united value. If TRUE, missing values are dropped before the values are united.
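The arguments above can be sketched on a tiny tibble. This is only an illustration: the data frame df and its columns x, y, and z are made up for this example and are not part of the penguins data.

```r
library(tidyr)
library(tibble)

# Made-up data for illustration only (not from the penguins dataset)
df <- tibble(x = c("a", "b"), y = c("c", NA), z = c("e", "f"))

# Defaults: sep = "_", remove = TRUE, na.rm = FALSE
united_default <- unite(df, "xyz", x:z)
united_default$xyz
# "a_c_e" "b_NA_f"

# Custom separator, keep the input columns, drop missing values
united_custom <- unite(df, "xyz", x, y, z, sep = "-", remove = FALSE, na.rm = TRUE)
united_custom$xyz
# "a-c-e" "b-f"
```

With the defaults, the missing value shows up as the text "NA" inside the united string, while na.rm=TRUE drops it along with its separator.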
Variations in how to combine columns in your argument statement
small_penguin %>%
  unite("Species Island", species, island, remove=FALSE) %>%
  select("Species Island", species, island)
## # A tibble: 5 x 3
## `Species Island` species island
## <chr> <fct> <fct>
## 1 Adelie_Torgersen Adelie Torgersen
## 2 Adelie_Torgersen Adelie Torgersen
## 3 Adelie_Torgersen Adelie Torgersen
## 4 Adelie_Torgersen Adelie Torgersen
## 5 Adelie_Torgersen Adelie Torgersen
small_penguin %>%
  unite("Species Island", species:island) %>%
  select("Species Island")
## # A tibble: 5 x 1
## `Species Island`
## <chr>
## 1 Adelie_Torgersen
## 2 Adelie_Torgersen
## 3 Adelie_Torgersen
## 4 Adelie_Torgersen
## 5 Adelie_Torgersen
small_penguin %>%
  unite("Species Island Bill Length", species:bill_length_mm, remove=FALSE) %>%
  select("Species Island Bill Length", species, island, bill_length_mm)
## # A tibble: 5 x 4
## `Species Island Bill Length` species island bill_length_mm
## <chr> <fct> <fct> <dbl>
## 1 Adelie_Torgersen_39.1 Adelie Torgersen 39.1
## 2 Adelie_Torgersen_39.5 Adelie Torgersen 39.5
## 3 Adelie_Torgersen_40.3 Adelie Torgersen 40.3
## 4 Adelie_Torgersen_NA Adelie Torgersen NA
## 5 Adelie_Torgersen_36.7 Adelie Torgersen 36.7
Different options for separating values in your new column
#Default, underscore
small_penguin %>%
  unite("Species_Island", species:island) %>%
  select("Species_Island")
## # A tibble: 5 x 1
## Species_Island
## <chr>
## 1 Adelie_Torgersen
## 2 Adelie_Torgersen
## 3 Adelie_Torgersen
## 4 Adelie_Torgersen
## 5 Adelie_Torgersen
#Add a space
small_penguin %>%
  unite("Species Island", species:island, sep=" ") %>%
  select("Species Island")
## # A tibble: 5 x 1
## `Species Island`
## <chr>
## 1 Adelie Torgersen
## 2 Adelie Torgersen
## 3 Adelie Torgersen
## 4 Adelie Torgersen
## 5 Adelie Torgersen
#No space
small_penguin %>%
  unite("SpeciesIsland", species:island, sep="") %>%
  select("SpeciesIsland")
## # A tibble: 5 x 1
## SpeciesIsland
## <chr>
## 1 AdelieTorgersen
## 2 AdelieTorgersen
## 3 AdelieTorgersen
## 4 AdelieTorgersen
## 5 AdelieTorgersen
#Add a comma
small_penguin %>%
  unite("Species, Island", species:island, sep=", ") %>%
  select("Species, Island")
## # A tibble: 5 x 1
## `Species, Island`
## <chr>
## 1 Adelie, Torgersen
## 2 Adelie, Torgersen
## 3 Adelie, Torgersen
## 4 Adelie, Torgersen
## 5 Adelie, Torgersen
#Add a comma and additional text
small_penguin %>%
  unite("Species, and Island", species:island, sep=", and ", remove=FALSE) %>%
  select("Species, and Island", species, island)
## # A tibble: 5 x 3
## `Species, and Island` species island
## <chr> <fct> <fct>
## 1 Adelie, and Torgersen Adelie Torgersen
## 2 Adelie, and Torgersen Adelie Torgersen
## 3 Adelie, and Torgersen Adelie Torgersen
## 4 Adelie, and Torgersen Adelie Torgersen
## 5 Adelie, and Torgersen Adelie Torgersen
Working with missing values
#remove NA values when combining two values
small_penguin_2 <- unite(small_penguin, "year_bill_depth",
  year, bill_depth_mm, sep = ", and ",
  remove=FALSE, na.rm=TRUE) %>%
  select(year_bill_depth, bill_depth_mm, year)
small_penguin_2
## # A tibble: 5 x 3
## year_bill_depth bill_depth_mm year
## <chr> <dbl> <int>
## 1 2007, and 18.7 18.7 2007
## 2 2007, and 17.4 17.4 2007
## 3 2007, and 18 18 2007
## 4 2007 NA 2007
## 5 2007, and 19.3 19.3 2007
The complement, separate()
Conversely, you can separate information in one column into two or more new columns using the separate() function.
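As a minimal sketch of the reverse operation, the example below splits a united column back into two. The data frame and its values are made up for illustration; only separate() and its into and sep arguments come from tidyr.

```r
library(tidyr)
library(tibble)

# Made-up united column for illustration only
united <- tibble(species_island = c("Adelie_Torgersen", "Gentoo_Biscoe"))

# Split on the underscore into two new columns
split_df <- separate(united, species_island,
                     into = c("species", "island"), sep = "_")
split_df$species
# "Adelie" "Gentoo"
split_df$island
# "Torgersen" "Biscoe"
```

In recent tidyr releases separate() is marked as superseded in favour of functions such as separate_wider_delim(), although it still works.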
Is it helpful?
I think it’s helpful for preparing data to go into a report or table. For example, Table 1 of a manuscript often lists a count and its proportion in the same column, in the form ‘## (%)’. Analysis output typically puts these values in two separate columns, so being able to combine them in code before producing the final table saves a lot of time and greatly reduces the potential for error from doing the combination manually. Formatting such as parentheses and commas would otherwise also have to be added by hand, so being able to add these pieces through the unite() function is immensely helpful.
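One possible way to build such a ‘## (%)’ column is sketched below. The tibble counts, its columns, and its numbers are all hypothetical. Because sep is only placed between values, the closing parenthesis is pasted onto the percentage before uniting. This is one workaround, not the only approach.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical summary counts; the numbers are made up for illustration
counts <- tibble(
  group = c("A", "B"),
  n     = c(12L, 30L),
  pct   = c(28.6, 71.4)
)

table1 <- counts %>%
  mutate(pct = paste0(pct, "%)")) %>%  # add the trailing "%)" first
  unite("n (%)", n, pct, sep = " (")   # sep supplies the opening " ("

table1[["n (%)"]]
# "12 (28.6%)" "30 (71.4%)"
```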
[1] Source for graphic: https://tidyr.tidyverse.org/