Part 2: Loading Data, data.frames, and ggplot2

Materials from class on Wednesday, January 13, 2021

Class Video

Slides

For Loops / Projects / ggplot2

Open the slides in a separate window: https://sph-r-programming.netlify.com/slides/02-for_loops#1

Function of the Week Assignment

Please refer to the function_assignment project in the RStudio Cloud workspace. We will go over this in class.

Post-Class

Please fill out the following survey and we will discuss the results during the next lecture. All responses will be anonymous.

  • Clearest Point: What was the most clear part of the lecture?
  • Muddiest Point: What was the most unclear part of the lecture to you?
  • Anything Else: Is there something you’d like me to know?

https://ohsu.ca1.qualtrics.com/jfe/form/SV_0rjxy6FgXapnMk5

Muddiest Points

A little bit confused about how you loaded data. I used a different method when using RStudio.

There are multiple routes to loading data in R.

One option is RStudio’s file loading wizard, which you may find a little easier to use. But it’s still worth talking about the different ways loading data can go wrong, because the wizard might not be able to help you with all of them.
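
As a rough illustration of the code route, here is a minimal sketch using base R's read.csv(). The file, its column names, and the patients object are made up for this example, and this is not necessarily the method used in class.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The file and its columns are hypothetical, for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,weight_kg", "1,70.5", "2,NA", "3,82.1"), tmp)

# read.csv() is part of base R; by default the string "NA" is read as missing
patients <- read.csv(tmp)

# Always inspect what was loaded: column types and missing values
# are where loading most often goes wrong
str(patients)
summary(patients$weight_kg)
```

Checking the result with str() right after loading is a cheap way to catch columns that were read with the wrong type.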

It wasn’t unclear but I’d like to learn more about ggplots. Can we customize the formatting of plots we create (like change colors, text size, etc.)?

Stay tuned. We’re covering it in class today!
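
In the meantime, a short preview, offered as a sketch rather than the exact approach from class: colors can be set inside a geom, and text size through theme(). The built-in mtcars data, the color, and the sizes are arbitrary choices for illustration.

```r
library(ggplot2)

# mtcars ships with R, so no extra data package is needed for this sketch
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # a fixed color and point size are set outside aes()
  geom_point(color = "steelblue", size = 3) +
  # labs() changes the title and axis labels
  labs(
    title = "Fuel efficiency by weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  ) +
  # theme() controls non-data formatting such as the base text size
  theme(text = element_text(size = 16))

p
```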

I always had trouble understanding this first comma inside the third bracket, glimpse(namcs[,1:5]), what does it do?

The comma in the brackets can be hard to wrap your head around.

When subsetting data, we use the comma to separate the row numbers from the column numbers.

  • The numbers before the comma refer to the rows.
  • The numbers after the comma refer to the columns.

library(palmerpenguins)
data(penguins)
knitr::kable(penguins[1:10,])

|species |island    | bill_length_mm| bill_depth_mm| flipper_length_mm| body_mass_g|sex    | year|
|:-------|:---------|--------------:|-------------:|-----------------:|-----------:|:------|----:|
|Adelie  |Torgersen |           39.1|          18.7|               181|        3750|male   | 2007|
|Adelie  |Torgersen |           39.5|          17.4|               186|        3800|female | 2007|
|Adelie  |Torgersen |           40.3|          18.0|               195|        3250|female | 2007|
|Adelie  |Torgersen |             NA|            NA|                NA|          NA|NA     | 2007|
|Adelie  |Torgersen |           36.7|          19.3|               193|        3450|female | 2007|
|Adelie  |Torgersen |           39.3|          20.6|               190|        3650|male   | 2007|
|Adelie  |Torgersen |           38.9|          17.8|               181|        3625|female | 2007|
|Adelie  |Torgersen |           39.2|          19.6|               195|        4675|male   | 2007|
|Adelie  |Torgersen |           34.1|          18.1|               193|        3475|NA     | 2007|
|Adelie  |Torgersen |           42.0|          20.2|               190|        4250|NA     | 2007|

For example, if I wanted to refer to the first row and first column of penguins, I could use this:

penguins[1, 1]
## # A tibble: 1 x 1
##   species
##   <fct>  
## 1 Adelie

To refer to the entire first row of penguins, I can remove the second number. Note that the comma remains.

penguins[1, ]
## # A tibble: 1 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_~ body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge~           39.1          18.7              181        3750 male 
## # ... with 1 more variable: year <int>

To refer to the entire first column of penguins, I can remove the first number:

penguins[,1]
## # A tibble: 344 x 1
##    species
##    <fct>  
##  1 Adelie 
##  2 Adelie 
##  3 Adelie 
##  4 Adelie 
##  5 Adelie 
##  6 Adelie 
##  7 Adelie 
##  8 Adelie 
##  9 Adelie 
## 10 Adelie 
## # ... with 334 more rows

And to get a range of rows and columns in penguins, we can put a sequence (such as 5:10) on each side of the comma:

penguins[5:10, 1:5]
## # A tibble: 6 x 5
##   species island    bill_length_mm bill_depth_mm flipper_length_mm
##   <fct>   <fct>              <dbl>         <dbl>             <int>
## 1 Adelie  Torgersen           36.7          19.3               193
## 2 Adelie  Torgersen           39.3          20.6               190
## 3 Adelie  Torgersen           38.9          17.8               181
## 4 Adelie  Torgersen           39.2          19.6               195
## 5 Adelie  Torgersen           34.1          18.1               193
## 6 Adelie  Torgersen           42            20.2               190
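
Coming back to the original question: in namcs[,1:5] nothing is written before the comma, so every row is kept, and 1:5 after the comma keeps only the first five columns. The sketch below shows the same pattern on the built-in mtcars data, which stands in for namcs here (an assumption for illustration), and uses base R's str() in place of glimpse() so that no extra package is needed.

```r
# mtcars ships with R; it stands in for namcs in this sketch
first_five <- mtcars[, 1:5]

# Nothing before the comma, so all rows are kept
nrow(first_five) == nrow(mtcars)

# 1:5 after the comma keeps only the first five columns
names(first_five)

# str() gives a compact overview of the subset, similar in spirit to glimpse()
str(first_five)
```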

For loops - though not bad! I think I just need to practice them more now.

Keep at it!

the function presentation: where do we upload it?

Submit it through Sakai. Please upload both the HTML and the Rmd files when you submit so I can put your presentation up on the website.

Projects in RStudio Desktop

There were some questions about RStudio Desktop and projects, so here is a short video on how to set up projects in RStudio Desktop. We’ll have a session in the future on installing RStudio Desktop on your own machine.

Thanks for Letting Me Know

I appreciate you advocating for this class to be taught before the Biostats series as per some of our requests. I appreciate the pace you are taking and your genuine want to make sure we are learning the material. Thank you very much.

Much appreciated, thank you. I’ve emailed both Rochelle and Jessica and they will be thinking about it.