forcats::fct_infreq()
Function of the Week: fct_infreq
Stephanie Elliott
2021-02-24
fct_infreq
In this document, I will introduce the fct_infreq function in the forcats package and show what it’s for.
#load the tidyverse (this attaches forcats)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
#example dataset
library(palmerpenguins)
data(penguins)
What is it for?
This function reorders the levels of a factor by how often each level occurs in the data, with the most frequent level first.
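As a minimal sketch of that behaviour on its own, the toy factor below shows the levels before and after the call. The vector colours is made up for illustration and is not part of the penguins data.

```r
library(forcats)

# Hypothetical toy data: "blue" occurs 3 times, "red" twice, "green" once
colours <- factor(c("red", "blue", "blue", "green", "blue", "red"))

# By default, factor() sorts the levels alphabetically
levels(colours)

# fct_infreq() keeps the values but puts the most frequent level first
levels(fct_infreq(colours))
```

Only the level order changes; the values stored in the vector stay the same.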
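#species is already a factor in palmerpenguins; factor() with levels = sets the level order by hand.
#The result is only printed here, not assigned, so penguins itself is unchanged.
penguins %>%
  mutate(species = factor(species, levels = c("Adelie", "Gentoo", "Chinstrap")))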
## # A tibble: 344 x 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torge… 39.1 18.7 181 3750
## 2 Adelie Torge… 39.5 17.4 186 3800
## 3 Adelie Torge… 40.3 18 195 3250
## 4 Adelie Torge… NA NA NA NA
## 5 Adelie Torge… 36.7 19.3 193 3450
## 6 Adelie Torge… 39.3 20.6 190 3650
## 7 Adelie Torge… 38.9 17.8 181 3625
## 8 Adelie Torge… 39.2 19.6 195 4675
## 9 Adelie Torge… 34.1 18.1 193 3475
## 10 Adelie Torge… 42 20.2 190 4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
#confirm that species is a factor
class(penguins$species)
## [1] "factor"
#before reordering, tabyl lists the species in their existing level order (alphabetical here)
tabyl(penguins$species)
## penguins$species n percent
## Adelie 152 0.4418605
## Chinstrap 68 0.1976744
## Gentoo 124 0.3604651
#after fct_infreq, the species are listed from most to least frequent
tabyl(fct_infreq(penguins$species))
## fct_infreq(penguins$species) n percent
## Adelie 152 0.4418605
## Gentoo 124 0.3604651
## Chinstrap 68 0.1976744
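The call above reorders species only for that one table. To keep the ordering for later steps, one option is to save it back with mutate(). This is a sketch, and penguins_ordered is a name made up here.

```r
library(dplyr)
library(forcats)
library(palmerpenguins)

# Store the frequency-ordered factor in a new data frame
# (penguins_ordered is a hypothetical name for this example)
penguins_ordered <- penguins %>%
  mutate(species = fct_infreq(species))

# The levels now run from most to least frequent
levels(penguins_ordered$species)
```

Any table or plot built from penguins_ordered then picks up the frequency order without another call to fct_infreq().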
#you can also use it within ggplot to order the bars
#here's the graph without reordering
ggplot(penguins, aes(x = species)) +
  geom_bar() +
  coord_flip()
#here's the graph with the bars ordered by frequency
ggplot(penguins, aes(x = fct_infreq(species))) +
  geom_bar() +
  coord_flip()
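With coord_flip(), the first level is drawn at the bottom of the plot, so the most frequent species ends up as the lowest bar. If you would rather have it on top, one option, shown here as a sketch, is to wrap the call in fct_rev() from forcats, which reverses the level order.

```r
library(ggplot2)
library(forcats)
library(palmerpenguins)

# fct_rev() reverses the levels produced by fct_infreq(), so the most
# frequent species becomes the last level and is drawn as the top bar
ggplot(penguins, aes(x = fct_rev(fct_infreq(species)))) +
  geom_bar() +
  coord_flip() +
  labs(x = "species")
```

The labs() call only tidies the axis title, which would otherwise show the full fct_rev(fct_infreq(species)) expression.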
Is it helpful?
This is a very practical tool for looking at and presenting your data in a logical order. It is probably most useful when a factor has more than three levels, but a small number of levels made for a clear example. I will surely use this frequently.