forcats::fct_infreq()

Function of the Week: fct_infreq

fct_infreq

In this document, I will introduce the fct_infreq function in the forcats package and show what it’s for.

#load the tidyverse, which includes forcats
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
#example dataset
library(palmerpenguins)
data(penguins)

What is it for?

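This function reorders the levels of a factor by how often each level occurs in the data, with the most frequent level first. The values themselves are not changed, only the order in which the levels are listed.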
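As a minimal sketch of the idea, here is a toy factor that is not part of the penguins data (the `colours` vector is made up for illustration). The last line is a rough base R equivalent, included only to show what "ordering by frequency" means, not as a description of how forcats implements it.

```r
library(forcats)

# Hypothetical toy data, not from the penguins dataset
colours <- factor(c("red", "blue", "blue", "green", "blue", "red"))

# Default level order is alphabetical
levels(colours)

# fct_infreq() puts the most frequent level first
by_freq <- fct_infreq(colours)
levels(by_freq)

# Rough base R equivalent: sort the counts, then reuse the names as levels
base_freq <- factor(colours, levels = names(sort(table(colours), decreasing = TRUE)))
levels(base_freq)
```

Here "blue" occurs three times, "red" twice and "green" once, so both approaches should list the levels in that order.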

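#species is already a factor in palmerpenguins; factor() is used here to set the level order by hand.
#The result is only printed, not assigned, so penguins itself is unchanged.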

penguins %>% 
  mutate(species = factor(species, levels = c("Adelie","Gentoo", "Chinstrap"))) 
## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>            <int>       <int>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
#confirm that species is a factor

class(penguins$species)
## [1] "factor"
#before reordering, tabyl lists the levels in their existing order, which is alphabetical by default

tabyl(penguins$species)
##  penguins$species   n   percent
##            Adelie 152 0.4418605
##         Chinstrap  68 0.1976744
##            Gentoo 124 0.3604651
tabyl(fct_infreq(penguins$species))
##  fct_infreq(penguins$species)   n   percent
##                        Adelie 152 0.4418605
##                        Gentoo 124 0.3604651
##                     Chinstrap  68 0.1976744
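The call above reorders the levels only for that one table. If the frequency order should persist for later steps, one option is to store the result back in a data frame. This is a small sketch, and `penguins_by_freq` is a hypothetical name chosen for the example.

```r
library(dplyr)
library(forcats)
library(palmerpenguins)

# penguins_by_freq is a hypothetical name for the updated copy
penguins_by_freq <- penguins %>%
  mutate(species = fct_infreq(species))

# The original data keeps its alphabetical level order
levels(penguins$species)

# The copy lists the most frequent species first
levels(penguins_by_freq$species)
```

Any table or plot built from `penguins_by_freq` should then pick up the frequency order without calling `fct_infreq()` again.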
#you can also use it within ggplot to order the levels of a factor

#here's the graph without

ggplot(penguins, aes(x = species)) + 
  geom_bar() + 
  coord_flip()

#Here's the graph with it being ordered by frequency
ggplot(penguins, aes(x = fct_infreq(species))) + 
  geom_bar() + 
  coord_flip()
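One thing to keep in mind with `coord_flip()` is that the first level is drawn at the bottom of the flipped axis, so the most frequent species ends up as the lowest bar. If you would rather have it at the top, a possible approach is to wrap the call in `fct_rev()`, also from forcats. The object names `species_rev` and `p` below are only for illustration.

```r
library(ggplot2)
library(forcats)
library(palmerpenguins)

# Reverse the frequency order so the most frequent level comes last
species_rev <- fct_rev(fct_infreq(penguins$species))
levels(species_rev)

# With coord_flip(), the last level is drawn at the top of the axis
p <- ggplot(penguins, aes(x = fct_rev(fct_infreq(species)))) +
  geom_bar() +
  coord_flip()
p
```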

Is it helpful?

This is a very practical tool for looking at and presenting your data in a logical order. It is probably most useful when a factor has more than three levels, but the three penguin species make for a compact example. I will surely use this frequently.