class: left, middle, inverse, title-slide

# Data Visualization with R and ggplot2

### Jessica Minnier, PhD & Meike Niederhausen, PhD
OCTRI Biostatistics, Epidemiology, Research & Design (BERD) Workshop
###
2020/03/04 & 2020/05/20
slides:
bit.ly/berd_ggplot
pdf:
bit.ly/berd_ggplot_pdf
---
layout: true

<!-- <div class="my-footer"><span>bit.ly/berd_tidy</span></div> -->
<!-- <div class="my-footer"><a href="#visualtoc"> Visual TOC</span></div> -->

---

# Load files for today's workshop

.pull-left[
1. Open slides [bit.ly/berd_ggplot](http://bit.ly/berd_ggplot)
1. Get project folder (detailed instructions: [bit.ly/berd_ggplot_instructions](https://bit.ly/berd_ggplot_instructions))
    + Download zip folder at [bit.ly/berd_ggplot_zip](http://bit.ly/berd_ggplot_zip)
    + UNZIP completely (right click -> "extract all")
    + Open unzipped folder
    + Open (double click) `berd_ggplot_project.Rproj`
    + Inside RStudio 'Files' tab: click on file `00-install.R` and click "Run" to run all lines of code.
1. Open Google doc for asking questions: [bit.ly/berd_doc](https://bit.ly/berd_doc)
]
.pull-right[
<center><img src="img/horst_ggplot2_masterpiece.png" width="100%" height="100%"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center>
]

---

# Learning objectives

.pull-left[
- Understand the basic idea behind the grammar of graphics
- Be able to map data to visual elements
- Be able to customize plots in various ways
- Use ggplot extensions to make even more plots!
]
.pull-right[
<center><img src="img/ggplot-grammar-of-graphics-stack.png" width="100%" height="100%"><a href="https://www.sites.univ-rennes2.fr/mastersigat/Cours/Atelier%20Visualisation%20de%20donn%C3%A9es%20CERGY.pdf"></a></center>
]

---
class: center, middle, inverse

<center><img src="img/ggplotlogo.png" width="40%" height="40%"><a href="https://ggplot2.tidyverse.org/index.html"></a></center>

---

# Grammar of Graphics

- "The Grammar of Graphics" is the theoretical basis for the ggplot2 package.
    + Much like how we construct sentences in any language by using a linguistic grammar (nouns, verbs, etc.), the grammar of graphics allows us to specify the components of a statistical graphic.
In short, the grammar tells us that:

>A statistical graphic is a mapping of data variables to aesthetic attributes of geometric objects.

A graphic has 3 **essential** components:

- data: the dataset containing the variables that we plot
- geom: the type of geometric object we see in our plot (points, lines, bars, etc.)
- aes: aesthetic attributes of the geometric object that we can perceive on a graphic, for example x/y position, color, shape, and size. Each assigned aesthetic attribute can be mapped to a variable in our dataset.

---

# Grammar of ggplot2

<center><img src="img/khealy_ggplot1.png" width="100%" height="100%"><a href="https://github.com/rstudio-conf-2020/dataviz"><br>Kieran Healy</a></center>

---

# ggplot basics

<center><img src="img/ggplot_basics_from_ppt.png" width="90%" height="100%"></center>

---
class: center, middle, inverse

# Tidy Data

<img src="img/horst_tidyverse.jpg" width="70%" height="70%">

[Allison Horst](https://github.com/allisonhorst/stats-illustrations)

---

## ggplot needs tidy data

What are __tidy__ data?

1. Each variable forms a column
2. Each observation forms a row
3. Each value has its own cell

![](img/r4ds_tidy_data.png)

[G. Grolemund & H. Wickham's R for Data Science](https://r4ds.had.co.nz/tidy-data.html)

See BERD workshop [Data Wrangling Part 1](https://jminnier-berd-r-courses.netlify.com/02-data-wrangling-tidyverse/02_data_wrangling_slides_part1.html#14) slides for more info.

---

## Gapminder data

.pull-left[
[Gapminder](https://www.gapminder.org) is a foundation (a "fact tank") that collects reliable global statistics on many different measures, such as average life expectancy, population size, food supply, water source, etc. for individual countries.

<img src="img/gapminder_screenshot.png" width="90%" height="100%">

[Gapminder](https://www.gapminder.org/tools/#$chart-type=bubbles)
]
.pull-right[
* `Gapminder_vars_2011.csv` contains select measures restricted to the year 2011.
* `Gapminder_vars_2011_long.csv` is the same data as in `Gapminder_vars_2011.csv`, but in a _long_ format
    * Instead of individual columns for `CO2emissions`, `ElectricityUsePP`, ... `WaterSourcePrct`,
        * there is a column called `Measures` which contains these variable names and
        * a column called `Values` with the actual values for these measures.
    * This means the dataset contains multiple rows per country to account for each of these measures.
]

---

## A look at the `Gapminder_vars_2011.csv` dataset

```r
library(tidyverse)  # provides read_csv() and glimpse()
gapminder2011 <- read_csv("data/Gapminder_vars_2011.csv")
glimpse(gapminder2011)
```

```
Rows: 195
Columns: 19
$ country                         <chr> "Afghanistan", "Albania", "Algeria", …
$ CO2emissions                    <dbl> 0.412, 1.790, 3.290, 5.870, 1.250, 5.…
$ ElectricityUsePP                <dbl> NA, 2210.0, 1120.0, NA, 207.0, NA, 29…
$ FoodSupplykcPPD                 <dbl> 2110, 3130, 3220, NA, 2410, 2370, 316…
$ IncomePP                        <dbl> 1660, 10200, 13000, 42000, 5910, 1860…
$ LifeExpectancyYrs               <dbl> 56.7, 76.7, 76.7, 82.6, 60.9, 76.9, 7…
$ FemaleLiteracyRate              <dbl> 13.0, 95.7, NA, NA, 58.6, 99.4, 97.9,…
$ population                      <dbl> 2.97e+07, 2.93e+06, 3.68e+07, 8.38e+0…
$ WaterSourcePrct                 <dbl> 52.6, 88.1, 92.6, 100.0, 40.3, 97.0, …
$ WaterSourcePrct_2011_quart      <chr> "Q1", "Q2", "Q2", "Q4", "Q1", "Q3", "…
$ geo                             <chr> "afg", "alb", "dza", "and", "ago", "a…
$ four_regions                    <chr> "asia", "europe", "africa", "europe",…
$ eight_regions                   <chr> "asia_west", "europe_east", "africa_n…
$ six_regions                     <chr> "south_asia", "europe_central_asia", …
$ members_oecd_g77                <chr> "g77", "others", "g77", "others", "g7…
$ latitude                        <dbl> 33.00000, 41.00000, 28.00000, 42.5077…
$ longitude                       <dbl> 66.00000, 20.00000, 3.00000, 1.52109,…
$ world_bank_region               <chr> "South Asia", "Europe & Central Asia"…
$ world_bank_4_income_groups_2017 <chr> "Low income", "Upper middle income", …
```

---
name:visualtoc

# Visual Table of Contents

<a href="#barplot"><img src="figs/barplot_regions_out-1.png"width="150" height="150" title=figs/barplot_regions_out-1.png
alt=figs/barplot_regions_out-1.png></a><a href="#histogram"><img src="figs/hist_LifeExp_out-1.png"width="150" height="150" title=figs/hist_LifeExp_out-1.png alt=figs/hist_LifeExp_out-1.png></a><a href="#density"><img src="figs/density_LifeExp_out-1.png"width="150" height="150" title=figs/density_LifeExp_out-1.png alt=figs/density_LifeExp_out-1.png></a><a href="#ridgeline"><img src="figs/ridges_LifeExp_out-1.png"width="150" height="150" title=figs/ridges_LifeExp_out-1.png alt=figs/ridges_LifeExp_out-1.png></a><a href="#boxplot"><img src="figs/boxplot_LifeExp_out-1.png"width="150" height="150" title=figs/boxplot_LifeExp_out-1.png alt=figs/boxplot_LifeExp_out-1.png></a><a href="#scatterplot"><img src="figs/scatter_FoodvsLifeExp_out-1.png"width="150" height="150" title=figs/scatter_FoodvsLifeExp_out-1.png alt=figs/scatter_FoodvsLifeExp_out-1.png></a><a href="#bubbleplot"><img src="figs/bubble_FemLitvsLifeExp_out-1.png"width="150" height="150" title=figs/bubble_FemLitvsLifeExp_out-1.png alt=figs/bubble_FemLitvsLifeExp_out-1.png></a><a href="#lineplot"><img src="figs/lineplot_YearLifeExp_out-1.png"width="150" height="150" title=figs/lineplot_YearLifeExp_out-1.png alt=figs/lineplot_YearLifeExp_out-1.png></a><a href="#ggmarginal"><img src="figs/margins_FoodvsLifeExp_out-1.png"width="150" height="150" title=figs/margins_FoodvsLifeExp_out-1.png alt=figs/margins_FoodvsLifeExp_out-1.png></a><a href="#correlation"><img src="figs/corrplotmix-1.png"width="150" height="150" title=figs/corrplotmix-1.png alt=figs/corrplotmix-1.png></a><a href="#facetdensity"><img src="figs/facet_density_all_out-1.png"width="150" height="150" title=figs/facet_density_all_out-1.png alt=figs/facet_density_all_out-1.png></a><a href="#facethist"><img src="figs/facet2x_density_all_out-1.png"width="150" height="150" title=figs/facet2x_density_all_out-1.png alt=figs/facet2x_density_all_out-1.png></a><a href="#volcano"><img src="figs/volcanoplot_out-1.png"width="150" height="150" 
title=figs/volcanoplot_out-1.png alt=figs/volcanoplot_out-1.png></a><a href="#heatmap"><img src="figs/heatmap_out-1.png"width="150" height="150" title=figs/heatmap_out-1.png alt=figs/heatmap_out-1.png></a><a href="#sidebyside"><img src="figs/ggpubr_out-1.png"width="150" height="150" title=figs/ggpubr_out-1.png alt=figs/ggpubr_out-1.png></a>

Inspired by [EvaMaeRey (Gina Reynolds)](https://github.com/EvaMaeRey), author of the amazing [`flipbookr`](https://github.com/EvaMaeRey/flipbookr) package.

---
name:barplot

# Barplot

.pull-left[
```r
ggplot(data = gapminder2011,
       aes(x = four_regions,
           fill = eight_regions)) +
  geom_bar() +
  labs(x = "World Regions",
       y = "Number of countries",
       title = "Barplot") +
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = -30, hjust = 0),
    text = element_text(family = "Palatino")) +
  scale_fill_viridis_d(name = "Subregions")
```

See the last section of [ggplot2 Aesthetic specifications](https://ggplot2.tidyverse.org/articles/ggplot2-specs.html) for an explanation of `hjust`.
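To see what `hjust` does to rotated axis labels, here is a minimal sketch on made-up data. The `toy` data frame and the `plot_hjust()` helper are hypothetical and not part of the workshop files. Roughly speaking, `hjust = 0` left-justifies each label at its tick mark, `hjust = 1` right-justifies it, and `0.5` centers it.

```r
library(ggplot2)

# Made-up counts, for illustration only (not the Gapminder data)
toy <- data.frame(
  region = c("africa", "americas", "asia", "europe"),
  n      = c(4, 3, 2, 1)
)

# Hypothetical helper: same bar chart, different hjust for the x-axis labels
plot_hjust <- function(h) {
  ggplot(data = toy, aes(x = region, y = n)) +
    geom_col() +
    theme_bw() +
    theme(axis.text.x = element_text(angle = -30, hjust = h)) +
    labs(title = paste("hjust =", h))
}

p_left   <- plot_hjust(0)    # labels left-justified at the tick mark
p_center <- plot_hjust(0.5)  # labels centered on the tick mark
p_right  <- plot_hjust(1)    # labels right-justified at the tick mark

p_left  # print one plot; swap in p_center or p_right to compare
```

Printing the three plots one after another makes it easier to see why the barplot above pairs `angle = -30` with `hjust = 0`.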
<center><img src="img/hjust_vjust.png" width="50%" height="100%"></center> ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: split-40 count: false .column[.content[ ```r *ggplot(data = gapminder2011) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + * aes(x = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + * geom_bar() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + * aes(fill = eight_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + aes(fill = eight_regions) + * scale_fill_discrete( * name = "Subregions" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + aes(fill = eight_regions) + scale_fill_discrete( name = "Subregions" ) + * labs(x = "World Regions", * y = "Number of countries", * title = "Barplot") ``` ]] 
.column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + aes(fill = eight_regions) + scale_fill_discrete( name = "Subregions" ) + labs(x = "World Regions", y = "Number of countries", title = "Barplot") + * theme_bw() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + aes(fill = eight_regions) + scale_fill_discrete( name = "Subregions" ) + labs(x = "World Regions", y = "Number of countries", title = "Barplot") + theme_bw() + * theme(axis.text.x=element_text( * angle = -30, hjust = 0)) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + aes(fill = eight_regions) + scale_fill_discrete( name = "Subregions" ) + labs(x = "World Regions", y = "Number of countries", title = "Barplot") + theme_bw() + theme(axis.text.x=element_text( angle = -30, hjust = 0)) + * scale_fill_viridis_d(name = "Subregions") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_9_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = four_regions) + geom_bar() + aes(fill = eight_regions) + scale_fill_discrete( name = "Subregions" ) + labs(x = "World Regions", y = "Number of countries", title = "Barplot") + theme_bw() + theme(axis.text.x=element_text( angle = -30, hjust = 
0)) + scale_fill_viridis_d(name = "Subregions") + * theme( * text = element_text(family = "Palatino")) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/barplot_regions_auto_10_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- name:histogram # Histogram .pull-left[ ```r ggplot(data = gapminder2011, aes(x = LifeExpectancyYrs, fill = four_regions) ) + geom_histogram() + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + labs( x = "Life Expectancy (years)", title = "Histogram" ) + ggthemes::theme_economist() + theme(legend.position="bottom") ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: split-40 count: false .column[.content[ ```r *ggplot(data = gapminder2011) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + * aes(x = LifeExpectancyYrs) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + * geom_histogram() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_histogram() + * aes(fill = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) 
+ aes(x = LifeExpectancyYrs) + geom_histogram() + aes(fill = four_regions) + * scale_fill_discrete( * name = "Regions", * labels = c("Africa", "Americas", * "Asia", "Europe") * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_histogram() + aes(fill = four_regions) + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + * labs( * x = "Life Expectancy (years)", * title = "Histogram" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_histogram() + aes(fill = four_regions) + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + labs( x = "Life Expectancy (years)", title = "Histogram" ) + * ggthemes::theme_economist() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_histogram() + aes(fill = four_regions) + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + labs( x = "Life Expectancy (years)", title = "Histogram" ) + ggthemes::theme_economist() + * theme(legend.position="bottom") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/hist_LifeExp_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- # Legend position * "Generic" positions * `legend.position = "left"` * Other options: "top", "right", "bottom", "none" * 
Specified by location
    * `legend.position = c(x,y)`
    * Specify x and y coordinates of position
    * Values should be between 0 and 1
        * __c(0,0)__ corresponds to the __bottom left__
        * __c(1,1)__ corresponds to the __top right__
    * In ggplot2 3.5.0 and later the numeric form is deprecated: use `legend.position = "inside"` together with `legend.position.inside = c(x,y)`

---
name:density

# Density Plot

.pull-left[
```r
ggplot(data = gapminder2011,
       aes(x = LifeExpectancyYrs,
           fill = four_regions)
       ) +
  geom_density(alpha = 0.4) +
  scale_fill_discrete(
    name = "Regions",
    labels = c("Africa", "Americas",
               "Asia", "Europe")
    ) +
  labs(
    x = "Life Expectancy (years)",
    title = "Density Plot"
    ) +
  hrbrthemes::theme_ipsum() +
  theme(legend.position=c(.2,.8))
```
]
.pull-right[
<img src="04_ggplot_slides_files/figure-html/density_LifeExp_out-1.png" width="504" style="display: block; margin: auto;" />
]

---
class: split-40
count: false

.column[.content[
```r
*ggplot(data = gapminder2011)
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_1_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
*  aes(x = LifeExpectancyYrs)
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
  aes(x = LifeExpectancyYrs) +
*  geom_density()
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
  aes(x = LifeExpectancyYrs) +
  geom_density() +
*  aes(fill = four_regions)
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
  aes(x =
LifeExpectancyYrs) + geom_density() + aes(fill = four_regions) + * aes(alpha=.4) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_density() + aes(fill = four_regions) + aes(alpha=.4) + * scale_fill_discrete( * name = "Regions", * labels = c("Africa", "Americas", * "Asia", "Europe") * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_density() + aes(fill = four_regions) + aes(alpha=.4) + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + * hrbrthemes::theme_ipsum() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_density() + aes(fill = four_regions) + aes(alpha=.4) + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + hrbrthemes::theme_ipsum() + * theme(legend.position=c(.2,.8)) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + geom_density() + aes(fill = four_regions) + aes(alpha=.4) + scale_fill_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + hrbrthemes::theme_ipsum() + theme(legend.position=c(.2,.8)) + * labs( * x = "Life 
Expectancy (years)", * title = "Density Plot" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/density_LifeExp_auto_9_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- name:ridgeline # Ridgeline Plot .pull-left[ ```r library(ggridges) ggplot(data = gapminder2011, aes(x = LifeExpectancyYrs, y = four_regions, fill = four_regions) ) + geom_density_ridges(alpha = 0.4) + ggthemes::theme_clean() + theme(legend.position="none") + labs( x = "Life Expectancy (years)", y = "Regions", title = "Ridgeline Density Plot" ) ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: split-40 count: false .column[.content[ ```r *library(ggridges) ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) *ggplot(data = gapminder2011) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + * aes(x = LifeExpectancyYrs) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + * aes(y = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + aes(y = four_regions) + * geom_density_ridges() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_5_output-1.png" width="504" 
style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + aes(y = four_regions) + geom_density_ridges() + * aes(fill = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + aes(y = four_regions) + geom_density_ridges() + aes(fill = four_regions) + * aes(alpha = 0.4) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + aes(y = four_regions) + geom_density_ridges() + aes(fill = four_regions) + aes(alpha = 0.4) + * ggthemes::theme_clean() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + aes(y = four_regions) + geom_density_ridges() + aes(fill = four_regions) + aes(alpha = 0.4) + ggthemes::theme_clean() + * theme(legend.position = "none") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_9_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r library(ggridges) ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + aes(y = four_regions) + geom_density_ridges() + aes(fill = four_regions) + aes(alpha = 0.4) + ggthemes::theme_clean() + theme(legend.position = "none") + * labs( * x = "Life Expectancy (years)", 
*    y = "Regions",
*    title = "Ridgeline Density Plot"
*    )
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/ridges_LifeExp_auto_10_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
name:boxplot

# Boxplot

.pull-left[
```r
ggplot(data = gapminder2011,
       aes(x = LifeExpectancyYrs, # New!
           y = four_regions,
           fill = four_regions)
       ) +
  geom_boxplot(alpha = 0.3) + # add outlier.shape = NA
  # coord_flip() +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  scale_fill_fivethirtyeight() +
  theme(legend.position = "none") +
  geom_jitter(width = .1, alpha = 0.3) +
  geom_violin(colour = "grey", alpha = .2) +
  labs(
    x = "Life Expectancy (years)",
    y = "World Region",
    title = "Boxplot"
    )
```
]
.pull-right[
<img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_out-1.png" width="504" style="display: block; margin: auto;" />
]

---
class: split-40
count: false

.column[.content[
```r
*ggplot(data = gapminder2011)
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_1_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
*  aes(x = LifeExpectancyYrs) # New!
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
  aes(x = LifeExpectancyYrs) + # New!
*  geom_boxplot(alpha=.3)
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011) +
  aes(x = LifeExpectancyYrs) + # New!
geom_boxplot(alpha=.3) + * aes(y = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! geom_boxplot(alpha=.3) + aes(y = four_regions) + * aes(fill = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! geom_boxplot(alpha=.3) + aes(y = four_regions) + aes(fill = four_regions) + * theme_fivethirtyeight() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! geom_boxplot(alpha=.3) + aes(y = four_regions) + aes(fill = four_regions) + theme_fivethirtyeight() + * scale_fill_fivethirtyeight() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! geom_boxplot(alpha=.3) + aes(y = four_regions) + aes(fill = four_regions) + theme_fivethirtyeight() + scale_fill_fivethirtyeight() + * theme(axis.title = element_text()) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! 
geom_boxplot(alpha=.3) + aes(y = four_regions) + aes(fill = four_regions) + theme_fivethirtyeight() + scale_fill_fivethirtyeight() + theme(axis.title = element_text()) + * theme(legend.position = "none") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_9_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! geom_boxplot(alpha=.3) + aes(y = four_regions) + aes(fill = four_regions) + theme_fivethirtyeight() + scale_fill_fivethirtyeight() + theme(axis.title = element_text()) + theme(legend.position = "none") + * geom_jitter( * width = .1, * alpha = 0.3 * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_10_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! geom_boxplot(alpha=.3) + aes(y = four_regions) + aes(fill = four_regions) + theme_fivethirtyeight() + scale_fill_fivethirtyeight() + theme(axis.title = element_text()) + theme(legend.position = "none") + geom_jitter( width = .1, alpha = 0.3 ) + * geom_violin( * colour = "grey", * alpha = .2 * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_11_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011) + aes(x = LifeExpectancyYrs) + # New! 
  geom_boxplot(alpha=.3) +
  aes(y = four_regions) +
  aes(fill = four_regions) +
  theme_fivethirtyeight() +
  scale_fill_fivethirtyeight() +
  theme(axis.title = element_text()) +
  theme(legend.position = "none") +
  geom_jitter(
    width = .1,
    alpha = 0.3
    ) +
  geom_violin(
    colour = "grey",
    alpha = .2
    ) +
*  labs(
*    x = "Life Expectancy (years)",
*    y = "World Region",
*    title = "Boxplot"
*    )
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/boxplot_LifeExp_auto_12_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---

# Exercise

Complete the third section of the `practice_ggplot.Rmd` file: "Bar plot".

---
name:scatterplot

# Scatterplot

.pull-left[
```r
ggplot(data = gapminder2011,
       aes(x = FoodSupplykcPPD,
           y = LifeExpectancyYrs,
           color = four_regions)
       ) +
  geom_point(alpha = 0.4) +
  geom_smooth(se = FALSE) +
  geom_smooth(method = lm) +
  theme_minimal() +
  scale_color_colorblind(
    name = "Regions",
    labels = c("Africa", "Americas",
               "Asia", "Europe")
    ) +
  labs(
    x = "Daily Food Supply Per Person (kc)",
    y = "Life Expectancy (years)",
    title = "Scatterplot"
    )
```
]
.pull-right[
<img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_out-1.png" width="504" style="display: block; margin: auto;" />
]

---
class: split-40
count: false

.column[.content[
```r
*ggplot(data = gapminder2011,
*       aes(x = FoodSupplykcPPD,
*           y = LifeExpectancyYrs)
*       )
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_1_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011,
       aes(x = FoodSupplykcPPD,
           y = LifeExpectancyYrs)
       ) +
*  geom_point(alpha=.4)
```
]]
.column[.content[
<img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" />
]]

---
class: split-40
count: false

.column[.content[
```r
ggplot(data = gapminder2011,
       aes(x = FoodSupplykcPPD,
           y =
LifeExpectancyYrs) ) + geom_point(alpha=.4) + * aes(color = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs) ) + geom_point(alpha=.4) + aes(color = four_regions) + * scale_color_colorblind( * name = "Regions", * labels = c("Africa", "Americas", * "Asia", "Europe") * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs) ) + geom_point(alpha=.4) + aes(color = four_regions) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + * geom_smooth(se = FALSE) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs) ) + geom_point(alpha=.4) + aes(color = four_regions) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + geom_smooth(se = FALSE) + * geom_smooth(method = lm) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs) ) + geom_point(alpha=.4) + aes(color = four_regions) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + geom_smooth(se = 
FALSE) + geom_smooth(method = lm) + * theme_minimal() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs) ) + geom_point(alpha=.4) + aes(color = four_regions) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + geom_smooth(se = FALSE) + geom_smooth(method = lm) + theme_minimal() + * labs( * x = "Daily Food Supply Per Person (kc)", * y = "Life Expectancy (years)", * title = "Scatterplot" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/scatter_FoodvsLifeExp_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- name:bubbleplot # Bubbleplot .pull-left[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions, size = population) ) + geom_point(alpha = 0.4) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + scale_size( name = "Population Size (millions)", breaks = c(1e08,5e08,1e09), labels = c(100,500,1000) ) + hrbrthemes::theme_ipsum() + labs( x = "Daily Food Supply PP (kc)", y = "Life Expectancy (years)", title = "Bubbleplot" ) ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: split-40 count: false .column[.content[ ```r *ggplot(data = gapminder2011, * aes(x = FoodSupplykcPPD, * y = LifeExpectancyYrs, * color = four_regions) * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = 
LifeExpectancyYrs, color = four_regions) ) + * geom_point(alpha=.4) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions) ) + geom_point(alpha=.4) + * aes(size = population) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions) ) + geom_point(alpha=.4) + aes(size = population) + * scale_color_colorblind( * name = "Regions", * labels = c("Africa", "Americas", * "Asia", "Europe") * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions) ) + geom_point(alpha=.4) + aes(size = population) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + * scale_size( * name = "Population Size (millions)", * breaks = c(1e08,5e08,1e09), * labels = c(100,500,1000) * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions) ) + geom_point(alpha=.4) + aes(size = population) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + 
scale_size( name = "Population Size (millions)", breaks = c(1e08,5e08,1e09), labels = c(100,500,1000) ) + * hrbrthemes::theme_ipsum() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions) ) + geom_point(alpha=.4) + aes(size = population) + scale_color_colorblind( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + scale_size( name = "Population Size (millions)", breaks = c(1e08,5e08,1e09), labels = c(100,500,1000) ) + hrbrthemes::theme_ipsum() + * labs( * x = "Daily Food Supply PP (kc)", * y = "Life Expectancy (years)", * title = "Bubbleplot" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/bubble_FemLitvsLifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- # Exercise Complete the fourth section of the `practice_ggplot.Rmd` file: "Bubbleplot" --- name:lineplot # Lineplot .pull-left[ For the Lineplot example we are using the `gapminder::gapminder` dataset since it has longitudinal data across many years for life expectancy. 
```r library(gapminder) ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent, group = country) ) + geom_point(alpha = 0.4) + geom_line(alpha = 0.7) + scale_color_colorblind(name = "Continents") + ggthemes::theme_clean() + labs( x = "Year", y = "Life Expectancy (years)", title = "Lineplot", subtitle = "Time series", caption = "Source: gapminder package" ) ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- ## R package `gapminder` has a dataset called `gapminder` For the Lineplot example we are using the `gapminder::gapminder` dataset since it has longitudinal data across many years for life expectancy. ```r library(gapminder) head(gapminder,15) ``` ``` # A tibble: 15 x 6 country continent year lifeExp pop gdpPercap <fct> <fct> <int> <dbl> <int> <dbl> 1 Afghanistan Asia 1952 28.8 8425333 779. 2 Afghanistan Asia 1957 30.3 9240934 821. 3 Afghanistan Asia 1962 32.0 10267083 853. 4 Afghanistan Asia 1967 34.0 11537966 836. 5 Afghanistan Asia 1972 36.1 13079460 740. 6 Afghanistan Asia 1977 38.4 14880372 786. 7 Afghanistan Asia 1982 39.9 12881816 978. 8 Afghanistan Asia 1987 40.8 13867957 852. 9 Afghanistan Asia 1992 41.7 16317921 649. 10 Afghanistan Asia 1997 41.8 22227415 635. 11 Afghanistan Asia 2002 42.1 25268405 727. 12 Afghanistan Asia 2007 43.8 31889923 975. 13 Albania Europe 1952 55.2 1282697 1601. 14 Albania Europe 1957 59.3 1476505 1942. 15 Albania Europe 1962 64.8 1728137 2313. 
``` --- class: split-40 count: false .column[.content[ ```r *ggplot(data = gapminder, * aes(x = year, * y = lifeExp, * color = continent) * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent) ) + * geom_point(alpha = .4) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent) ) + geom_point(alpha = .4) + * geom_line(alpha = .7) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent) ) + geom_point(alpha = .4) + geom_line(alpha = .7) + * aes(group = country) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent) ) + geom_point(alpha = .4) + geom_line(alpha = .7) + aes(group = country) + * scale_color_colorblind( * name = "Continents" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent) ) + geom_point(alpha = .4) + geom_line(alpha = .7) + aes(group = country) + 
 scale_color_colorblind( name = "Continents" ) + * ggthemes::theme_clean() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent) ) + geom_point(alpha = .4) + geom_line(alpha = .7) + aes(group = country) + scale_color_colorblind( name = "Continents" ) + ggthemes::theme_clean() + * labs( * x = "Year", * y = "Life Expectancy (years)", * title = "Lineplot", * subtitle = "Time series", * caption = "Source: gapminder package" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/lineplot_YearLifeExp_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- name:ggmarginal # `ggExtra::ggMarginal()` https://cran.r-project.org/web/packages/ggExtra/vignettes/ggExtra.html .pull-left-40[ ```r # library(ggExtra) p <- ggplot(data = gapminder2011, aes(x = FoodSupplykcPPD, y = LifeExpectancyYrs, color = four_regions) ) + geom_point(alpha = .4) + scale_color_discrete( name = "Regions", labels = c("Africa", "Americas", "Asia", "Europe") ) + theme(legend.position="bottom") + labs( x = "Daily Food Supply PP (kc)", y = "Life Expectancy (years)", title = "Scatterplot" ) ``` ] .pull-right-60[ ```r ggMarginal(p, type = "density", margins = "both", groupColour = TRUE, groupFill = TRUE ) ``` <img src="04_ggplot_slides_files/figure-html/margins_FoodvsLifeExp_out-1.png" width="720" style="display: block; margin: auto;" /> ] --- class: inverse, middle, center # Correlograms --- name:correlation ## Correlation matrix ```r M <- cor(gapminder2011 %>% select(FoodSupplykcPPD:WaterSourcePrct), use = "complete.obs" # specified since there are missing values ) M ``` ``` FoodSupplykcPPD IncomePP LifeExpectancyYrs FoodSupplykcPPD 1.0000000 0.48221951 0.64233437 IncomePP 0.4822195 1.00000000 0.51567562 LifeExpectancyYrs 0.6423344 
0.51567562 1.00000000 FemaleLiteracyRate 0.4816309 0.44804036 0.64921874 population 0.0768498 -0.07838737 0.05467681 WaterSourcePrct 0.6980454 0.53687914 0.80693858 FemaleLiteracyRate population WaterSourcePrct FoodSupplykcPPD 0.48163092 0.07684980 0.69804539 IncomePP 0.44804036 -0.07838737 0.53687914 LifeExpectancyYrs 0.64921874 0.05467681 0.80693858 FemaleLiteracyRate 1.00000000 -0.05188109 0.74980282 population -0.05188109 1.00000000 0.05559188 WaterSourcePrct 0.74980282 0.05559188 1.00000000 ``` --- ## `corrplot::corrplot()` https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html .pull-left[ ```r library(corrplot) corrplot(M, method = "number") ``` <img src="04_ggplot_slides_files/figure-html/unnamed-chunk-6-1.png" width="504" style="display: block; margin: auto;" /> ] .pull-right[ ```r corrplot(M, method = "ellipse") ``` <img src="04_ggplot_slides_files/figure-html/unnamed-chunk-7-1.png" width="720" style="display: block; margin: auto;" /> ```r corrplot.mixed(M) ``` <img src="04_ggplot_slides_files/figure-html/unnamed-chunk-8-1.png" width="504" style="display: block; margin: auto;" /> ] --- name:ggcorr ## `GGally::ggcorr()` https://ggobi.github.io/ggally/index.html ```r # library(GGally) gapminder2011 %>% select(FoodSupplykcPPD:WaterSourcePrct) %>% # specifying which columns to use ggcorr() ``` <img src="04_ggplot_slides_files/figure-html/unnamed-chunk-9-1.png" width="720" style="display: block; margin: auto;" /> --- name:ggpairs ## `GGally::ggpairs()` https://ggobi.github.io/ggally/index.html ```r # library(GGally) gapminder2011 %>% select(FoodSupplykcPPD:WaterSourcePrct) %>% # specifying which columns to use ggpairs() ``` <img src="04_ggplot_slides_files/figure-html/unnamed-chunk-10-1.png" width="720" style="display: block; margin: auto;" /> --- class: inverse, middle, center # Faceting --- name:facetdensity # Faceted Density Plot .pull-left[ ```r ggplot(data = gapminder2011_long, aes(x = Values, color = four_regions) ) + facet_wrap(~ 
 Measures, scales = "free", ncol = 2 ) + geom_density() + ggthemes::theme_few() + theme(legend.position="top") + labs( x = "", title = "Faceted Density Plots", # Add a figure number! tag = "Fig 1", # note that color is being # specified inside labs! color = "Regions" ) ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- # Wide vs. long data - __Wide__ data has one row per subject, with multiple columns for their repeated measurements - __Long__ data has multiple rows per subject, with one column for the measurement values and another indicating when/where each repeated measurement was taken .pull-left[ wide <img src="img/SBP_wide2.png" width="73%" height="73%"> See BERD workshop [Data Wrangling Part 2](https://jminnier-berd-r-courses.netlify.com/02-data-wrangling-tidyverse/02_data_wrangling_slides_part2.html#26) for slides on how to make wide data long. ] .pull-right[ long <img src="img/SBP_long2.png" width="30%" height="30%"> ] --- # Dataset `Gapminder_vars_2011_long.csv` (1/2) * This is the same 2011 Gapminder data we've been using thus far, but in a __long format__ instead of wide. * Instead of individual columns for `CO2emissions`, `ElectricityUsePP`, ... `WaterSourcePrct`, * there is a column called `Measures` which contains these variable names and * a column called `Values` with the actual values for these measures. * This means the dataset contains multiple rows per country, one for each of these measures. 
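
A minimal sketch of how a wide table could be reshaped into this long layout with `tidyr::pivot_longer()`. The `wide_toy` data frame and its values are made up for illustration; this is not necessarily how `Gapminder_vars_2011_long.csv` was created.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per measure
wide_toy <- data.frame(
  country = c("A", "B"),
  IncomePP = c(1660, 10200),
  LifeExpectancyYrs = c(56.7, 76.7)
)

# Stack the measure columns into a Measures/Values pair of columns
long_toy <- pivot_longer(
  wide_toy,
  cols = c(IncomePP, LifeExpectancyYrs),
  names_to = "Measures",
  values_to = "Values"
)

long_toy  # 2 countries x 2 measures = 4 rows
```

Each country now appears once per measure, which is the shape that `facet_wrap(~ Measures)` and `facet_grid()` rely on.
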
```r gapminder2011_long <- read_csv("data/Gapminder_vars_2011_long.csv") glimpse(gapminder2011_long) ``` ``` Rows: 1,365 Columns: 8 $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist… $ population <dbl> 29700000, 29700000, 29700000, 29700000, 29700000, 29700… $ four_regions <chr> "asia", "asia", "asia", "asia", "asia", "asia", "asia",… $ eight_regions <chr> "asia_west", "asia_west", "asia_west", "asia_west", "as… $ six_regions <chr> "south_asia", "south_asia", "south_asia", "south_asia",… $ WorldRegions <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",… $ Measures <chr> "CO2emissions", "ElectricityUsePP", "FoodSupplykcPPD", … $ Values <dbl> 4.12e-01, NA, 2.11e+03, 1.66e+03, 5.67e+01, 1.30e+01, 5… ``` --- # Dataset `Gapminder_vars_2011_long.csv` (2/2) ```r gapminder2011_long %>% select(country, population, four_regions, Measures, Values) %>% head(15) ``` ``` # A tibble: 15 x 5 country population four_regions Measures Values <chr> <dbl> <chr> <chr> <dbl> 1 Afghanistan 29700000 asia CO2emissions 0.412 2 Afghanistan 29700000 asia ElectricityUsePP NA 3 Afghanistan 29700000 asia FoodSupplykcPPD 2110 4 Afghanistan 29700000 asia IncomePP 1660 5 Afghanistan 29700000 asia LifeExpectancyYrs 56.7 6 Afghanistan 29700000 asia FemaleLiteracyRate 13 7 Afghanistan 29700000 asia WaterSourcePrct 52.6 8 Albania 2930000 europe CO2emissions 1.79 9 Albania 2930000 europe ElectricityUsePP 2210 10 Albania 2930000 europe FoodSupplykcPPD 3130 11 Albania 2930000 europe IncomePP 10200 12 Albania 2930000 europe LifeExpectancyYrs 76.7 13 Albania 2930000 europe FemaleLiteracyRate 95.7 14 Albania 2930000 europe WaterSourcePrct 88.1 15 Algeria 36800000 africa CO2emissions 3.29 ``` --- class: split-40 count: false .column[.content[ ```r *ggplot(data = gapminder2011_long, * aes(x = Values) * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: 
split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + * geom_density() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + geom_density() + * facet_wrap(~ Measures, * scales = "free", * ncol = 2 * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + geom_density() + facet_wrap(~ Measures, scales = "free", ncol = 2 ) + * aes(color = four_regions) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + geom_density() + facet_wrap(~ Measures, scales = "free", ncol = 2 ) + aes(color = four_regions) + * ggthemes::theme_few() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + geom_density() + facet_wrap(~ Measures, scales = "free", ncol = 2 ) + aes(color = four_regions) + ggthemes::theme_few() + * theme( legend.position="top") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + geom_density() + facet_wrap(~ Measures, scales = "free", 
 ncol = 2 ) + aes(color = four_regions) + ggthemes::theme_few() + theme( legend.position="top") + * labs( * x = "", * title = "Faceted Density Plots", * tag = "Fig 1", * color = "Regions" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet_density_all_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- name:facethist # Faceted 2x Histogram .pull-left[ Note that a __long__ dataset is being used in this example! See the earlier "Wide vs. long data" slide for an explanation. ```r ggplot(data = gapminder2011_long) + geom_histogram(aes(x = Values), fill = "darkorange") + facet_grid( four_regions ~ Measures, scales = "free_x" ) + ggthemes::theme_igray() + theme( strip.text.y = element_text(size=10, angle=45, face = "bold"), strip.text.x = element_text(size=6), axis.text.x = element_text(angle=45, hjust=1) ) + labs( x = "", title = "Faceted Histograms" ) ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: split-40 count: false .column[.content[ ```r *ggplot(data = gapminder2011_long, * aes(x = Values) * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + * facet_grid( * four_regions ~ Measures, * scales = "free_x" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + facet_grid( four_regions ~ Measures, scales = "free_x" ) + * geom_histogram(fill = "darkorange") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_auto_3_output-1.png" width="504" 
 style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + facet_grid( four_regions ~ Measures, scales = "free_x" ) + geom_histogram(fill = "darkorange") + * ggthemes::theme_igray() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + facet_grid( four_regions ~ Measures, scales = "free_x" ) + geom_histogram(fill = "darkorange") + ggthemes::theme_igray() + * theme( * strip.text.y = * element_text(size=10, * angle=45, * face = "bold"), * strip.text.x = element_text(size=6), * axis.text.x = element_text(angle=45, * hjust=1) * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = gapminder2011_long, aes(x = Values) ) + facet_grid( four_regions ~ Measures, scales = "free_x" ) + geom_histogram(fill = "darkorange") + ggthemes::theme_igray() + theme( strip.text.y = element_text(size=10, angle=45, face = "bold"), strip.text.x = element_text(size=6), axis.text.x = element_text(angle=45, hjust=1) ) + * labs( * x = "", * title = "Faceted Histograms" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/facet2x_density_all_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class:inverse, center, middle # Gene Expression --- # Pasilla Data ```r glimpse(pasilla_data) ``` ``` Rows: 8,377 Columns: 15 $ gene <chr> "FBgn0000008", "FBgn0000017", "FBgn0000018", "FBgn0000… $ baseMean <dbl> 95.144292, 4352.553569, 418.610484, 6.406200, 989.7202… $ fc <dbl> 1.0015792, 0.8467929, 0.9300151, 1.1573681, 0.9383590,… $ log2FoldChange <dbl> 
0.002276441, -0.239918944, -0.104673912, 0.210847792, … $ lfcSE <dbl> 0.2237287, 0.1263369, 0.1484891, 0.6895876, 0.1477518,… $ stat <dbl> 0.01017501, -1.89904084, -0.70492676, 0.30575928, -0.6… $ pvalue <dbl> 9.918817e-01, 5.755911e-02, 4.808558e-01, 7.597879e-01… $ padj <dbl> 9.972108e-01, 2.880017e-01, 8.268337e-01, 9.435011e-01… $ treated1 <dbl> 7.607917, 11.938311, 9.143372, 6.479135, 10.187606, 6.… $ treated2 <dbl> 7.834912, 12.024557, 9.011505, 6.577240, 10.024965, 6.… $ treated3 <dbl> 7.595052, 12.013565, 8.944883, 6.475226, 10.017728, 6.… $ untreated1 <dbl> 7.567298, 12.045721, 9.315269, 6.565256, 10.450060, 6.… $ untreated2 <dbl> 7.642174, 12.284647, 9.098290, 6.479802, 10.080550, 6.… $ untreated3 <dbl> 7.844603, 12.455939, 8.966546, 6.422196, 10.069559, 6.… $ untreated4 <dbl> 7.669147, 12.077404, 9.066286, 6.395509, 9.996554, 6.4… ``` --- name:volcano # Volcano Plot .pull-left[ ```r # Create subset for labeling pasilla_data_top = pasilla_data %>% filter(-log10(padj) > 10, abs(log2FoldChange) > 2.5) ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + aes(color = padj < 0.05) + ggrepel::geom_text_repel( data = pasilla_data_top, aes(label = gene), color = "black", box.padding = 0.5, min.segment.length = 0) + xlim(c(-7,7)) + geom_vline(xintercept = c(-2.5, 2.5), lty = "dashed", color="grey") + ggthemes::theme_clean() + labs( x = bquote(~Log[2]~ "fold change"), y = bquote(~Log[10]~adjusted~italic(P)), title = "Volcano Plot", subtitle = "Gene Expression of Pasilla Data" ) ``` <img src="04_ggplot_slides_files/figure-html/volcanoplot_nice-1.png" width="504" style="display: block; margin: auto;" /> ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: split-40 count: false .column[.content[ ```r *ggplot(data = pasilla_data, * aes(x = log2FoldChange, * y = log10(padj))) ``` ]] .column[.content[ <img 
src="04_ggplot_slides_files/figure-html/volcanoplot_auto_1_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + * geom_point() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_2_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + * scale_y_reverse() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_3_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + * aes(color = padj < 0.05) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_4_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + aes(color = padj < 0.05) + * ggrepel::geom_text_repel( * data = pasilla_data_top, * aes(label = gene), color = "black", * box.padding = 0.5, * min.segment.length = 0) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_5_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + aes(color = padj < 0.05) + ggrepel::geom_text_repel( data = pasilla_data_top, aes(label = gene), color = "black", box.padding = 0.5, min.segment.length = 0) + * xlim(c(-7,7)) ``` ]] .column[.content[ <img 
src="04_ggplot_slides_files/figure-html/volcanoplot_auto_6_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + aes(color = padj < 0.05) + ggrepel::geom_text_repel( data = pasilla_data_top, aes(label = gene), color = "black", box.padding = 0.5, min.segment.length = 0) + xlim(c(-7,7)) + * geom_vline(xintercept = c(-2.5, 2.5), * lty = "dashed", color="grey") ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_7_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + aes(color = padj < 0.05) + ggrepel::geom_text_repel( data = pasilla_data_top, aes(label = gene), color = "black", box.padding = 0.5, min.segment.length = 0) + xlim(c(-7,7)) + geom_vline(xintercept = c(-2.5, 2.5), lty = "dashed", color="grey") + * ggthemes::theme_clean() ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_8_output-1.png" width="504" style="display: block; margin: auto;" /> ]] --- class: split-40 count: false .column[.content[ ```r ggplot(data = pasilla_data, aes(x = log2FoldChange, y = log10(padj))) + geom_point() + scale_y_reverse() + aes(color = padj < 0.05) + ggrepel::geom_text_repel( data = pasilla_data_top, aes(label = gene), color = "black", box.padding = 0.5, min.segment.length = 0) + xlim(c(-7,7)) + geom_vline(xintercept = c(-2.5, 2.5), lty = "dashed", color="grey") + ggthemes::theme_clean() + * labs( * x = bquote(~Log[2]~ "fold change"), * y = bquote(~Log[10]~adjusted~italic(P)), * title = "Volcano Plot", * subtitle="Gene Expression of Pasilla Data" * ) ``` ]] .column[.content[ <img src="04_ggplot_slides_files/figure-html/volcanoplot_auto_9_output-1.png" 
 width="504" style="display: block; margin: auto;" /> ]] --- name:heatmap # Heatmap with `pheatmap::pheatmap()` It's possible to make heatmaps in ggplot2 with `geom_tile()`, but many other functions built on base R do a better job of clustering and annotating the data. This example uses the `pheatmap` package. .pull-left-60[ We need to create the data: ```r # select expression data pasilla_heat <- pasilla_data %>% select(treated1:untreated4) # subtract off gene-specific means pasilla_heat <- pasilla_heat - rowMeans(pasilla_heat) # calculate standard deviation of each centered gene sd_gene <- apply(pasilla_heat,1,sd) # select top 500 most variable pasilla_heat <- pasilla_heat[order(sd_gene, decreasing = TRUE)[1:500],] # create annotation data pasilla_col <- data.frame( trt = factor(c(rep("trt",3), rep("untrt",4))), id = 1:7, row.names=colnames(pasilla_heat)) ``` ] .pull-right-40[ ```r head(pasilla_heat, n = 3) ``` ``` treated1 treated2 treated3 untreated1 untreated2 untreated3 untreated4 2390 -1.5997691 0.8713581 1.568570 -1.500656 -2.103343 1.338488 1.4253512 521 -1.3218267 0.9954861 1.278523 -1.521073 -1.425689 1.040472 0.9541077 7886 -0.5901012 0.8225366 1.339219 -1.762754 -1.701829 1.155933 0.7369965 ``` ```r pasilla_col ``` ``` trt id treated1 trt 1 treated2 trt 2 treated3 trt 3 untreated1 untrt 4 untreated2 untrt 5 untreated3 untrt 6 untreated4 untrt 7 ``` ] --- # Heatmap with `pheatmap::pheatmap()` .pull-left[ ```r pheatmap::pheatmap( mat = pasilla_heat, show_rownames = FALSE, annotation_col = pasilla_col ) ``` ] .pull-right[ <img src="04_ggplot_slides_files/figure-html/heatmap_out-1.png" width="504" style="display: block; margin: auto;" /> ] --- name:sidebyside # Side by side plot with [`ggpubr`](https://rpkgs.datanovia.com/ggpubr/) .pull-left[ ```r p1 <- ggplot(data = pasilla_data, aes(x = log2FoldChange, y = -log10(padj), color = log10(baseMean))) + geom_point() + geom_vline(xintercept = c(-2.5, 2.5), lty = 2, color="grey") + theme_few() + scale_color_viridis_c() + 
  labs(x = bquote(~Log[2]~ "fold change"),
       y = bquote(~Log[10]~adjusted~italic(P)),
       title = "Volcano Plot")

p2 <- ggplot(data = pasilla_data,
             aes(x = baseMean,
                 y = log2FoldChange,
                 color = log10(baseMean))) +
  geom_point() +
  scale_x_log10() +
  geom_hline(yintercept = 0, color = "red") +
  theme_few() +
  scale_color_viridis_c() +
  labs(y = bquote(~Log[2]~ "fold change"),
       x = bquote(~Log[10]~ "mean expression"),
       title = "MA Plot")
```
]
.pull-right[
```r
ggpubr::ggarrange(p1, p2,
                  labels = "AUTO",
                  common.legend = TRUE,
                  legend = "bottom")
```
<img src="04_ggplot_slides_files/figure-html/ggpubr_out-1.png" width="80%" height="80%" style="display: block; margin: auto;" />

Other options: [cowplot](https://wilkelab.org/cowplot/articles/index.html) and [patchwork](https://github.com/thomasp85/patchwork).
]

---
name:ggplotly

# Interactive `plotly` graphs with `ggplotly()`

.pull-left-40[
```r
# Save ggplot
p1 <- ggplot(
  data = pasilla_data,
  aes(x = log2FoldChange,
      y = -log10(padj),
      color = log10(baseMean),
*     key = gene)
  ) +
  geom_point() +
  geom_vline(
    xintercept = c(-2.5, 2.5),
    lty = 2, color="grey") +
  theme_few() +
  scale_color_viridis_c()
```
```r
*plotly::ggplotly(p1)
```
]
.pull-right-60[
]

---
name:ggsave

# Saving plots

.pull-left[
```r
ggsave(plot = p1,
       filename = "figs/volcanoplot_small.png",
       height = 4, width = 4,
       units = "in", dpi = 100)
```
<center><img src="figs/volcanoplot_small.png" width="60%" height="60%"></center>
]
.pull-right[
```r
ggsave(plot = p1,
       filename = "figs/volcanoplot_large.png",
       height = 10, width = 10,
       units = "in", dpi = 300)
```
<center><img src="figs/volcanoplot_large.png" width="60%" height="60%"></center>
]

---
# Exercise

Complete the fifth section of the `practice_ggplot.Rmd` file: "Histogram".

---
class:center, middle, inverse

# References and Links

---
<center><img src="img/poster_big.png" width="85%" height="80%"><a href="https://www.data-to-viz.com/poster.html"><br>https://www.data-to-viz.com/poster.html</a></center>

---
# Many, many ggplot extensions!

Some examples at the [ggplot2 extensions gallery](https://exts.ggplot2.tidyverse.org/gallery/)

<center><img src="img/ggextensiongallery.png" width="100%" height="100%"><a href="https://exts.ggplot2.tidyverse.org/gallery/"><br>ggplot2 extensions gallery</a></center>

---
# Many, many themes and palettes/scales!
We used themes from [`ggthemes`](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/) and [`hrbrthemes`](https://github.com/hrbrmstr/hrbrthemes) as well as built-in themes, but there are many more:

.pull-left[
<center><img src="img/simpsonstheme.png" width="50%" height="50%"><a href="https://ryo-n7.github.io/2019-05-16-introducing-tvthemes-package/"><br>TV Themes</a></center>
<center><img src="img/ggpomological.png" width="50%" height="50%"><a href="https://www.garrickadenbuie.com/project/ggpomological/"><br>ggpomological</a></center>
]
.pull-right[
<center><img src="img/bbplot.png" width="50%" height="50%"><a href="https://github.com/bbc/bbplot/"><br>bbplot for BBC themes</a></center>
<center><img src="img/ggthemr.png" width="50%" height="50%"><a href="https://www.shanelynn.ie/themes-and-colours-for-r-ggplots-with-ggthemr/"><br>ggthemr</a></center>
]

from ["Themes to improve your ggplot figures" by David Keyes](https://rfortherestofus.com/2019/08/themes-to-improve-your-ggplot-figures/)

---
# R colors and palettes

.pull-left[
<center><img src="img/ggplot2-colour-names.png" width="50%" height="50%"><a href="http://sape.inf.usi.ch/quick-reference/ggplot2/colour"><br>Built-in R colors</a></center>
]
.pull-right[
<center><img src="img/wesanderson.png" width="60%" height="60%"><a href="https://www.datanovia.com/en/blog/ggplot-colors-best-tricks-you-will-love/"><br>wesanderson package</a></center>
]

---
# Changing the order of names (levels) within a categorical variable

* The default order of names within a categorical variable is alphanumeric
    * Such as `africa, americas, asia, europe` for the `four_regions` variable
* Often, though, we want a different order when making plots.

## `factor` level variables

Do this by making the categorical variable a `factor` level variable in R.

* We can change the order of names within a `factor` level variable, and even rename the levels.
* The `forcats` package makes this easy to do. See https://forcats.tidyverse.org/.
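
---
# Reordering factor levels: a short sketch

The points above can be sketched in a few lines. This is a minimal illustration, not workshop code: the `regions` vector is hypothetical toy data standing in for a column like `four_regions`, and all of the object names are made up for this example. It assumes the `forcats` package is installed.

```r
library(forcats)

# Hypothetical toy data standing in for a categorical column
regions <- c("asia", "europe", "africa", "americas", "asia", "africa")

# Default: factor() sorts the levels alphanumerically
regions_default <- factor(regions)
levels(regions_default)

# Base R: set the level order explicitly
regions_manual <- factor(
  regions,
  levels = c("europe", "asia", "americas", "africa"))
levels(regions_manual)

# forcats: move chosen levels to the front, keep the rest in place
regions_relevel <- fct_relevel(regions_default, "europe", "asia")
levels(regions_relevel)

# forcats: rename a level (new name = old name)
regions_renamed <- fct_recode(regions_default, "Americas" = "americas")
levels(regions_renamed)
```

ggplot2 draws discrete axes and legends in level order, so mapping a reordered factor to `x`, `color`, or `fill` should change the order in the plot without changing the underlying data.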
---
# References

- [ggplot cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf)
- [ggplot2 package reference](https://ggplot2.tidyverse.org/reference/)
- [ggplot2: Elegant Graphics for Data Analysis](https://ggplot2-book.org/) by Hadley Wickham
- [*Data Visualization* online textbook by Kieran Healy](https://socviz.co/makeplot.html)
- [R Graphics Cookbook](http://www.cookbook-r.com/Graphs/) by Winston Chang
- [*R for Data Science* online textbook by Hadley Wickham](https://r4ds.had.co.nz/data-visualisation.html)
- [*Introduction to Data Science* online textbook by Rafael A. Irizarry](https://rafalab.github.io/dsbook/ggplot2.html)

Example plots and extensions:

- [R Graph Gallery](https://www.r-graph-gallery.com/)
- [ggplot2 extension gallery](https://exts.ggplot2.tidyverse.org/gallery/)
- [All Your Figure Are Belong To Us](https://yutannihilation.github.io/allYourFigureAreBelongToUs/)
- [from Data to Viz](https://www.data-to-viz.com/): beautiful flowcharts to help you decide on a plot based on the variable type(s); check out their [poster](https://www.data-to-viz.com/poster.html)
- [Top 50 ggplot2 Visualizations - The Master List (With Full R Code)](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html)

OHSU class:

- [CS 631 Data Visualization](https://www.ohsu.edu/school-of-medicine/csee/data-science)

---
# Inspiration for this talk

- [github/flipbookr](https://github.com/EvaMaeRey/flipbookr)
- [Kieran Healy's rstudio::conf2020 data viz materials](https://github.com/rstudio-conf-2020/dataviz)

---
class: left

# Thank you!
## Contact info:

- Jessica Minnier: _minnier@ohsu.edu_
- Meike Niederhausen: _niederha@ohsu.edu_

## This workshop info:

- The code for these slides is on GitHub, with links to other course materials: [jminnier/berd_r_courses](https://github.com/jminnier/berd_r_courses)
- The `.Rmd` file that generated the slides is on [GitHub](https://github.com/jminnier/berd_r_courses/blob/master/04-ggplot/04_ggplot_slides.Rmd) and can be downloaded [here](https://jminnier-berd-r-courses.netlify.com/04-ggplot/04_ggplot_slides.Rmd), though you need to download the whole [R project](https://github.com/jminnier/berd_r_courses/archive/master.zip) to knit the file.
- The project folder of examples can be downloaded at [github.com/jminnier/berd_ggplot_project](https://github.com/jminnier/berd_ggplot_project); the solutions are in the `solns/` folder.