dplyr::slice_max()

Bradley Hopkins

In this document, I will introduce the slice_max() function and show what it’s for.

# Load the tidyverse, which attaches dplyr (the package that provides slice_max())
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.3
## Warning: package 'tibble' was built under R version 4.0.3
## Warning: package 'tidyr' was built under R version 4.0.3
## Warning: package 'readr' was built under R version 4.0.3
## Warning: package 'purrr' was built under R version 4.0.3
## Warning: package 'dplyr' was built under R version 4.0.3
## Warning: package 'stringr' was built under R version 4.0.3
## Warning: package 'forcats' was built under R version 4.0.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

What is it for?

slice_max() belongs to the same family of dplyr functions as slice(), the function we discussed last week. Like slice(), slice_max() makes it easier to inspect a data frame by returning a subset of its rows. What makes slice_max() different is that we name a variable to order by, and it returns the n rows with the largest values of that variable. The difference between slice() and slice_max() is demonstrated below.

mtcars %>% slice(1:5)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
mtcars %>% slice_max(order_by = mpg, n = 5)
##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat X1-9      27.3   4 79.0  66 4.08 1.935 18.90  1  1    4    1

Note that slice() simply returned the first five rows of mtcars by position, whereas slice_max() returned the five rows with the highest values of mpg, sorted in descending order.
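One detail worth knowing is how slice_max() handles ties. The sketch below assumes dplyr 1.0.0 or later, where the with_ties argument defaults to TRUE; the object names top3_ties and top3_strict are only illustrative. Because Honda Civic and Lotus Europa share the same mpg, asking for the top 3 rows can return more than 3.

```r
library(dplyr)

# Default behaviour (with_ties = TRUE): rows tied for the last place are all kept,
# so the result may contain more than n rows.
top3_ties <- mtcars %>% slice_max(order_by = mpg, n = 3)

# with_ties = FALSE forces exactly n rows, dropping the extra tied row.
top3_strict <- mtcars %>% slice_max(order_by = mpg, n = 3, with_ties = FALSE)

nrow(top3_ties)
nrow(top3_strict)
```

If an exact row count matters later in a pipeline, setting with_ties = FALSE is probably the safer choice.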

Is it helpful?

This function is similar to slice(), but fills a slightly different niche. As with slice(), a large part of its appeal is that it works with the pipe, %>%. This makes it slightly easier to pare down and organize a set of data than writing a [row, column] reference against the data frame. It can also be combined with other dplyr verbs to produce more interesting results. In the example below, we first filter mtcars down to 6-cylinder cars and then pipe the result into slice_max() to get the five 6-cylinder cars with the highest mpg:

mtcars %>% filter(cyl == 6) %>% slice_max(order_by = mpg, n = 5)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

This function is useful for inspecting data: it gives users a quick way to see which observations rank highest on a given variable. It can also help with organizing and sorting data when used as part of a longer pipeline of functions.
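As one possible example of such a pipeline, slice_max() can be combined with group_by() to pick the top row within each group. This is a minimal sketch rather than the only way to do it: the column name model and the object name best_per_cyl are made up for illustration, and rownames_to_column() from tibble is used because grouped operations do not keep the row names of mtcars.

```r
library(dplyr)
library(tibble)

best_per_cyl <- mtcars %>%
  rownames_to_column(var = "model") %>%  # keep the car names as a regular column
  group_by(cyl) %>%                      # one group per cylinder count
  slice_max(order_by = mpg, n = 1) %>%   # highest-mpg car within each group
  ungroup() %>%
  select(model, cyl, mpg)

best_per_cyl
```

Each cylinder group contributes its single most fuel-efficient car, so the result has one row per value of cyl.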