tidyr::crossing()
Claire Bradbury
Function of the Week: crossing()
2021-02-24
Introduction
In this document, I will introduce the crossing() function and show what it’s for.
Data set
library(tidyverse)  # loads dplyr (starwars) and tidyr (crossing)
library(knitr)      # provides kable()
data("starwars")
kable(head(starwars))
name | height | mass | hair_color | skin_color | eye_color | birth_year | gender | homeworld | species | films | vehicles | starships |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Luke Skywalker | 172 | 77 | blond | fair | blue | 19.0 | male | Tatooine | Human | c(“Revenge of the Sith”, “Return of the Jedi”, “The Empire Strikes Back”, “A New Hope”, “The Force Awakens”) | c(“Snowspeeder”, “Imperial Speeder Bike”) | c(“X-wing”, “Imperial shuttle”) |
C-3PO | 167 | 75 | NA | gold | yellow | 112.0 | NA | Tatooine | Droid | c(“Attack of the Clones”, “The Phantom Menace”, “Revenge of the Sith”, “Return of the Jedi”, “The Empire Strikes Back”, “A New Hope”) | character(0) | character(0) |
R2-D2 | 96 | 32 | NA | white, blue | red | 33.0 | NA | Naboo | Droid | c(“Attack of the Clones”, “The Phantom Menace”, “Revenge of the Sith”, “Return of the Jedi”, “The Empire Strikes Back”, “A New Hope”, “The Force Awakens”) | character(0) | character(0) |
Darth Vader | 202 | 136 | none | white | yellow | 41.9 | male | Tatooine | Human | c(“Revenge of the Sith”, “Return of the Jedi”, “The Empire Strikes Back”, “A New Hope”) | character(0) | TIE Advanced x1 |
Leia Organa | 150 | 49 | brown | light | brown | 19.0 | female | Alderaan | Human | c(“Revenge of the Sith”, “Return of the Jedi”, “The Empire Strikes Back”, “A New Hope”, “The Force Awakens”) | Imperial Speeder Bike | character(0) |
Owen Lars | 178 | 120 | brown, grey | light | blue | 52.0 | male | Tatooine | Human | c(“Attack of the Clones”, “Revenge of the Sith”, “A New Hope”) | character(0) | character(0) |
For my examples I will be using the starwars data set from the dplyr package. The starwars data set comes from the Star Wars API website, SWAPI, at http://swapi.co/.
What is it for?
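crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs. Given several vectors, it returns every combination of their unique values. Given a single data frame, as in the examples here, it returns the unique combinations of values that actually occur in the table, sorted.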
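To illustrate how crossing() treats vectors and a data frame differently, here is a minimal sketch. The small `chars` data frame and its values are made up for illustration and are not part of the starwars data set.

```r
library(tidyr)

# Vectors: duplicates are dropped and values sorted, then every
# combination of the remaining values is returned.
combos <- crossing(side = c("light", "dark", "light"), rank = c(2, 1))
combos  # 2 unique sides x 2 unique ranks = 4 rows

# A single data frame: only the unique rows that are present, sorted.
# `chars` is a hypothetical toy table.
chars <- data.frame(
  homeworld = c("Tatooine", "Naboo", "Tatooine"),
  species   = c("Human", "Gungan", "Human"),
  stringsAsFactors = FALSE
)
present <- crossing(chars)
present  # 2 rows: the duplicated Tatooine/Human row appears once
```

The second call does not invent a Naboo/Human or Tatooine/Gungan row, because the data frame is treated as a single unit.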
Characters’ Home World and Species
For example, if you wanted to know which species of semi-major characters live on each planet in Star Wars, you could use crossing() on the Star Wars data set.
kable(crossing(starwars[c("homeworld", "species")]))
homeworld | species |
---|---|
Alderaan | Human |
Aleen Minor | Aleena |
Bespin | Human |
Bestine IV | Human |
Cato Neimoidia | Neimodian |
Cerea | Cerean |
Champala | Chagrian |
Chandrila | Human |
Concord Dawn | Human |
Corellia | Human |
Coruscant | Human |
Coruscant | Tholothian |
Dathomir | Zabrak |
Dorin | Kel Dor |
Endor | Ewok |
Eriadu | Human |
Geonosis | Geonosian |
Glee Anselm | Nautolan |
Haruun Kal | Human |
Iktotch | Iktotchi |
Iridonia | Zabrak |
Kalee | Kaleesh |
Kamino | Human |
Kamino | Kaminoan |
Kashyyyk | Wookiee |
Malastare | Dug |
Mirial | Mirialan |
Mon Cala | Mon Calamari |
Muunilinst | Muun |
Naboo | Droid |
Naboo | Gungan |
Naboo | Human |
Naboo | NA |
Nal Hutta | Hutt |
Ojom | Besalisk |
Quermia | Quermian |
Rodia | Rodian |
Ryloth | Twi’lek |
Serenno | Human |
Shili | Togruta |
Skako | Skakoan |
Socorro | Human |
Stewjon | Human |
Sullust | Sullustan |
Tatooine | Droid |
Tatooine | Human |
Toydaria | Toydarian |
Trandosha | Trandoshan |
Troiken | Xexto |
Tund | Toong |
Umbara | NA |
Utapau | Pau’an |
Vulpter | Vulptereen |
Zolan | Clawdite |
NA | Droid |
NA | Human |
NA | Yoda’s species |
NA | NA |
Looking at the table above, if I wanted to know which Star Wars characters have Tatooine as their home world, I can see that both droids and humans call Tatooine home.
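Rather than scanning the full table by eye, the same lookup can be sketched by filtering the result. This assumes the starwars data set that ships with dplyr; the variable name `tatooine` is only for illustration, and the exact rows returned may depend on the installed dplyr version.

```r
library(dplyr)
library(tidyr)

# Unique homeworld/species pairs, restricted to one planet
tatooine <- crossing(starwars[c("homeworld", "species")]) %>%
  filter(homeworld == "Tatooine")

tatooine
```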
Characters’ Species and Home World
Another example is listing the same combinations of characters’ species and home world, this time with species as the first column.
kable(crossing(starwars[c("species", "homeworld")]))
species | homeworld |
---|---|
Aleena | Aleen Minor |
Besalisk | Ojom |
Cerean | Cerea |
Chagrian | Champala |
Clawdite | Zolan |
Droid | Naboo |
Droid | Tatooine |
Droid | NA |
Dug | Malastare |
Ewok | Endor |
Geonosian | Geonosis |
Gungan | Naboo |
Human | Alderaan |
Human | Bespin |
Human | Bestine IV |
Human | Chandrila |
Human | Concord Dawn |
Human | Corellia |
Human | Coruscant |
Human | Eriadu |
Human | Haruun Kal |
Human | Kamino |
Human | Naboo |
Human | Serenno |
Human | Socorro |
Human | Stewjon |
Human | Tatooine |
Human | NA |
Hutt | Nal Hutta |
Iktotchi | Iktotch |
Kaleesh | Kalee |
Kaminoan | Kamino |
Kel Dor | Dorin |
Mirialan | Mirial |
Mon Calamari | Mon Cala |
Muun | Muunilinst |
Nautolan | Glee Anselm |
Neimodian | Cato Neimoidia |
Pau’an | Utapau |
Quermian | Quermia |
Rodian | Rodia |
Skakoan | Skako |
Sullustan | Sullust |
Tholothian | Coruscant |
Togruta | Shili |
Toong | Tund |
Toydarian | Toydaria |
Trandoshan | Trandosha |
Twi’lek | Ryloth |
Vulptereen | Vulpter |
Wookiee | Kashyyyk |
Xexto | Troiken |
Yoda’s species | NA |
Zabrak | Dathomir |
Zabrak | Iridonia |
NA | Naboo |
NA | Umbara |
NA | NA |
Notice that the returned table is ordered alphabetically by the first column, which makes it easy to search. Suppose I am looking for other inhabitants of Tatooine, such as Jawas, Tusken Raiders, and Hutts. The table above shows no entry for Jawas or Tusken Raiders. There is at least one character of the Hutt species, but they call Nal Hutta their home world.
Gender, Skin Color, and Species Representation
If I want to see the representation of characters by gender, skin color, and species, I could use crossing() for preliminary analysis.
kable(crossing(starwars[c("species", "skin_color", "gender")]))
species | skin_color | gender |
---|---|---|
Aleena | grey, blue | male |
Besalisk | brown | male |
Cerean | pale | male |
Chagrian | blue | male |
Clawdite | fair, green, yellow | female |
Droid | gold | NA |
Droid | metal | none |
Droid | none | none |
Droid | white, blue | NA |
Droid | white, red | NA |
Dug | grey, red | male |
Ewok | brown | male |
Geonosian | green | male |
Gungan | green | male |
Gungan | grey | male |
Gungan | orange | male |
Human | dark | male |
Human | fair | female |
Human | fair | male |
Human | light | female |
Human | light | male |
Human | pale | male |
Human | tan | male |
Human | white | male |
Hutt | green-tan, brown | hermaphrodite |
Iktotchi | pale | male |
Kaleesh | brown, white | male |
Kaminoan | grey | female |
Kaminoan | grey | male |
Kel Dor | orange | male |
Mirialan | yellow | female |
Mon Calamari | brown mottle | male |
Muun | grey | male |
Nautolan | green | male |
Neimodian | mottled green | male |
Pau’an | grey | male |
Quermian | white | male |
Rodian | green | male |
Skakoan | green, grey | male |
Sullustan | grey | male |
Tholothian | dark | female |
Togruta | red, blue, white | female |
Toong | grey, green, yellow | male |
Toydarian | blue, grey | male |
Trandoshan | green | male |
Twi’lek | blue | female |
Twi’lek | pale | male |
Vulptereen | blue, grey | male |
Wookiee | brown | male |
Wookiee | unknown | male |
Xexto | white, blue | male |
Yoda’s species | green | male |
Zabrak | brown | male |
Zabrak | red | male |
NA | dark | male |
NA | fair | male |
NA | pale | female |
NA | silver, red | female |
NA | unknown | female |
From the table above, all that can be surmised is that many combinations of species and skin color appear only with a male character. For humans, for example, female characters have only fair or light skin, while male characters have dark, fair, light, pale, tan, and white skin.
Is it helpful?
Benefits of crossing()
I think the function can be useful if you want to see the combinations of a few columns of data, as I did for the Star Wars data set above. In this case, crossing() is most useful for searching for a particular entry or a unique combination.
crossing() can also be used for preliminary analysis, and in conjunction with other methods, to determine which combinations of variables exist in the data set and which are missing from it.
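One way to find missing combinations, sketched here under the assumption that "missing" means a pairing of observed values that never occurs together, is to build every possible pairing with crossing() and then remove the ones present in the data with dplyr's anti_join(). The `chars` table is a hypothetical toy example, not the starwars data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three characters
chars <- data.frame(
  species   = c("Human", "Human", "Droid"),
  homeworld = c("Tatooine", "Naboo", "Tatooine"),
  stringsAsFactors = FALSE
)

# Every species/homeworld pairing that could exist (2 x 2 = 4 rows)
possible <- crossing(species = chars$species, homeworld = chars$homeworld)

# Pairings that never occur in the data
missing <- anti_join(possible, chars, by = c("species", "homeworld"))
missing  # the only absent pairing here is Droid on Naboo
```

Passing the columns as separate vectors is what makes crossing() expand to all pairings; passing the data frame itself would only return the pairings already present.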
Downside of crossing()
However, for checking whether there is enough gender and skin color representation by species in Star Wars, crossing() is not the best tool, because it shows which combinations exist but not how many characters fall into each. For this purpose the group_by() and summarise() functions are preferable.
kable(starwars %>% group_by(species, skin_color, gender) %>% summarise(count = n()))
species | skin_color | gender | count |
---|---|---|---|
Aleena | grey, blue | male | 1 |
Besalisk | brown | male | 1 |
Cerean | pale | male | 1 |
Chagrian | blue | male | 1 |
Clawdite | fair, green, yellow | female | 1 |
Droid | gold | NA | 1 |
Droid | metal | none | 1 |
Droid | none | none | 1 |
Droid | white, blue | NA | 1 |
Droid | white, red | NA | 1 |
Dug | grey, red | male | 1 |
Ewok | brown | male | 1 |
Geonosian | green | male | 1 |
Gungan | green | male | 1 |
Gungan | grey | male | 1 |
Gungan | orange | male | 1 |
Human | dark | male | 4 |
Human | fair | female | 3 |
Human | fair | male | 13 |
Human | light | female | 6 |
Human | light | male | 5 |
Human | pale | male | 1 |
Human | tan | male | 2 |
Human | white | male | 1 |
Hutt | green-tan, brown | hermaphrodite | 1 |
Iktotchi | pale | male | 1 |
Kaleesh | brown, white | male | 1 |
Kaminoan | grey | female | 1 |
Kaminoan | grey | male | 1 |
Kel Dor | orange | male | 1 |
Mirialan | yellow | female | 2 |
Mon Calamari | brown mottle | male | 1 |
Muun | grey | male | 1 |
Nautolan | green | male | 1 |
Neimodian | mottled green | male | 1 |
Pau’an | grey | male | 1 |
Quermian | white | male | 1 |
Rodian | green | male | 1 |
Skakoan | green, grey | male | 1 |
Sullustan | grey | male | 1 |
Tholothian | dark | female | 1 |
Togruta | red, blue, white | female | 1 |
Toong | grey, green, yellow | male | 1 |
Toydarian | blue, grey | male | 1 |
Trandoshan | green | male | 1 |
Twi’lek | blue | female | 1 |
Twi’lek | pale | male | 1 |
Vulptereen | blue, grey | male | 1 |
Wookiee | brown | male | 1 |
Wookiee | unknown | male | 1 |
Xexto | white, blue | male | 1 |
Yoda’s species | green | male | 1 |
Zabrak | brown | male | 1 |
Zabrak | red | male | 1 |
NA | dark | male | 1 |
NA | fair | male | 1 |
NA | pale | female | 1 |
NA | silver, red | female | 1 |
NA | unknown | female | 1 |
Using the group_by() and summarise() functions shows how many characters fall into each combination. Among humans, for example, there are 26 male characters and only 9 female characters, a difference that crossing() alone cannot show. If the goal of analyzing the data is for statistical applications, it would likely be more useful to have numerical values attached to the grouped variables.
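As a possible shorthand, dplyr's count() performs the grouping and counting in one step. The sketch below drops skin color to compare genders within each species; it assumes the starwars data set from dplyr, and the values in the gender column may differ between dplyr versions, so no output is shown.

```r
library(dplyr)

# Number of characters per species and gender, in a column named "count"
by_gender <- starwars %>%
  count(species, gender, name = "count")

by_gender
```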