An Introduction to R

Andrew Singleton

harp training 16/02/2022

What is R?

  • A language developed for statistical analysis
  • Can be used interactively or to run scripts
  • Has a vast array of packages for doing all sorts of things
  • Huge community - used to be toxic, but now extremely welcoming and helpful

Information about R

RStudio

  • RStudio is an IDE designed for R
  • Makes many tasks easier
    • Tab completion
    • Function help popups
    • Project management
    • Inspecting data
    • Code highlighting
    • Code formatting
    • Report writing
    • Package development
    • Website development
  • Other code editors are available (!)

Getting started

Basic math operators

3 + 3
## [1] 6
14 - 6
## [1] 8
56 / 8
## [1] 7
17 * 17
## [1] 289
17 ^ 2
## [1] 289
3 %% 2
## [1] 1

Data types and classes

Main types

  • double
  • integer
  • character
  • logical
  • complex
  • closure (it’s normally a function)

Data types and classes

Main classes

  • numeric
  • factor
  • Date / POSIXct
  • list
  • data.frame
  • matrix
  • array
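As a rough sketch, class() reports the class of an object. The example values are made up for illustration, and the matrix result assumes R 4.0.0 or later, where a matrix is also an array.

```r
class(3.14)
## [1] "numeric"
class(factor(c("low", "high", "low")))
## [1] "factor"
class(as.Date("2022-02-16"))
## [1] "Date"
class(list(1, "a", TRUE))
## [1] "list"
class(data.frame(x = 1:3))
## [1] "data.frame"
class(matrix(1:6, nrow = 2))   # R >= 4.0.0
## [1] "matrix" "array"
```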

Variables

  • R’s assignment operator is <-
  • = works too (but is frowned upon as it has other purposes)
  • Variable names are case sensitive
  • The tide is moving towards snake case for variable names
  • Avoid using . in variable names except at the beginning
  • There are some reserved names - they can be used, but try not to…
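A small illustration of the points above. The variable names wind_speed and Wind_Speed are hypothetical, chosen only to show snake case and case sensitivity.

```r
wind_speed <- 12.5   # preferred assignment with <-
Wind_Speed <- 3      # a different variable: names are case sensitive
wind_speed
## [1] 12.5
Wind_Speed
## [1] 3
wind_speed = 7       # = also assigns at the top level
wind_speed
## [1] 7
mean(x = c(1, 2, 3)) # here = names an argument; no variable x is created
## [1] 2
```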

Vectors and arrays

  • R counts from 1
  • There are no true scalars: a single value is a vector of length 1
  • Vectors are created by concatenation
  • Elements are accessed using [ ]
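For example, with a hypothetical vector called temps:

```r
temps <- c(2.5, 4.1, 3.8, 6.0)  # c() concatenates values into a vector
temps[1]                        # the first element is element 1, not 0
## [1] 2.5
temps[c(2, 4)]                  # index with another vector
## [1] 4.1 6.0
length(5)                       # a single number is a vector of length 1
## [1] 1
5[1]
## [1] 5
```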

Functions

  • Functions are called with arguments in brackets
  • All arguments are named, but they can also be matched by position
  • The ellipsis ... either allows an arbitrary number of inputs or is passed on to other functions
  • Use ?<function_name> to see documentation
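A short sketch of argument matching and the ellipsis. The function shout() is a made-up helper, not part of R; it simply hands everything in ... on to paste().

```r
round(3.14159, digits = 2)      # first argument by position, second by name
## [1] 3.14
round(3.14159, 2)               # both by position
## [1] 3.14
round(digits = 2, x = 3.14159)  # both by name, so order does not matter
## [1] 3.14
sum(1, 2, 3, 4)                 # ... accepts any number of inputs
## [1] 10

# Hypothetical helper: everything in ... is passed on to paste()
shout <- function(...) {
  toupper(paste(...))
}
shout("hello", "harp", sep = "-")
## [1] "HELLO-HARP"
```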

Comparisons and logical operators

  • > >= < <= == !=
  • %in%
  • & | !
3 > 4
## [1] FALSE
5 == 10
## [1] FALSE
3 %in% c(2, 3, 4)
## [1] TRUE
TRUE & TRUE
## [1] TRUE
c(TRUE, FALSE, FALSE) | c(FALSE, FALSE, TRUE)
## [1]  TRUE FALSE  TRUE

Missing Values

  • NA (Not Available)
  • NAs are “contagious”
5 > NA
## [1] NA
12 + NA
## [1] NA
mean(c(3, 4, 5, NA, 7))
## [1] NA
NA == NA
## [1] NA
is.na(NA)
## [1] TRUE
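is.na() is vectorised, so it can be used to locate or count missing values. A small sketch, with obs as a hypothetical vector of observations:

```r
obs <- c(3, 4, 5, NA, 7)
is.na(obs)        # one TRUE/FALSE per element
## [1] FALSE FALSE FALSE  TRUE FALSE
sum(is.na(obs))   # TRUE counts as 1, so this is the number of NAs
## [1] 1
anyNA(obs)        # is there at least one NA?
## [1] TRUE
```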

Packages

  • Install from CRAN using install.packages("<package>")
  • Make available to session using library(<package>)
    • library() is generally better than require() as it stops execution with an error if the package is not installed - require() only warns and returns FALSE
  • Using namespaces, i.e. <package>::<function>, is an option (and a must when developing your own package)
  • Some packages are loaded by default: datasets, utils, grDevices, graphics, stats, methods
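As a sketch of the namespace option, a function can be called as <package>::<function> without attaching the package. The praise call assumes that package may or may not be installed, so it is guarded with requireNamespace().

```r
stats::median(c(1, 3, 10))   # explicit namespace, no library() call needed
## [1] 3
utils::head(letters, 3)
## [1] "a" "b" "c"

# Only call into praise if it is installed; the message it returns is random
if (requireNamespace("praise", quietly = TRUE)) {
  praise::praise()
}
```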

Getting help

  • Use help(package = "<package>") to find out what a package contains
  • Package vignettes are the best source of information, often detailing common workflows
    • vignette(package = "<package>")
    • vignette("<topic>", package = "<package>")
  • ?<function>
  • ??<keyword>

Exercises

  • Find the sum of 74, 387, 96, 208, 7
  • Find the mean of the same numbers
  • Modify sd(c(4, 7, 22, 3, 19, 10, NA, 44, NA, 6)) to make it work
  • Install the “praise” package and give yourself a compliment

Solutions

sum(74, 387, 96, 208, 7)
## [1] 772
mean(c(74, 387, 96, 208, 7))
## [1] 154.4
sd(c(4, 7, 22, 3, 19, 10, NA, 44, NA, 6))
## [1] NA
sd(c(4, 7, 22, 3, 19, 10, NA, 44, NA, 6), na.rm = TRUE)
## [1] 13.8248
install.packages("praise")
library(praise)
praise()
## [1] "You are flawless!"

Up Next

Data frames and lists