BC3203

Intro to R

R as a Calculator

1+1

2^3


2 * 2

3/4

101 %% 10

# This is a comment. It is ignored by R
103 %% 10

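Note that R prints the result of a calculation like this automatically.

Unlike in some other languages (e.g. MATLAB), ending a line with ; does not suppress printing in R. The semicolon only separates several statements written on one line (don’t do this).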

Variables

a <- 1
b <- 1.5

a
b

a + b

a * b

c <- a * b
c

Note that nothing gets printed when you assign a variable. To print the variable, just type its name.
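
As a small sketch of the ways to see a value you have just assigned (the variable name area is only illustrative):

```r
# Assignment on its own prints nothing
area <- 3 * 4

# Typing the name, or calling print(), shows the value
area
print(area)

# Wrapping the assignment in parentheses assigns and prints in one step
(area <- 3 * 4)
```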

Some common variables are built-in

2*pi*10^2

Somewhat confusingly, R has more than one assignment operator. These three all do the same thing here

myname = "Ira"
myname <- "Ira"
"Ira" -> myname

You should always try to use <-, as this is the preferred style. Use of -> is STRONGLY discouraged.

Variable names

a and b are actually terrible variable names for real code.

You should name your variables descriptively, e.g.

first_name <- "Darth"
last_name <- "Vader"

cat(first_name,last_name)

Note the similarity between R and bash here. We just used a function called cat which concatenates its arguments and prints them to the screen.
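
A short sketch of how cat can be adjusted. The sep argument and the "\n" newline character are standard parts of cat; the names are the ones used above.

```r
first_name <- "Darth"
last_name <- "Vader"

# By default cat puts a single space between its arguments.
# It does not add a newline on its own, so we pass "\n" explicitly.
cat(first_name, last_name, "\n")

# sep changes what is placed between the arguments
cat(first_name, last_name, sep = "-")
cat("\n")
```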

Functions

Functions encapsulate (potentially complex) computations into a single named command. All function calls have this form:

somename()

For example, the sqrt function calculates the square root of its argument:

sqrt(4)
1 + 2*sqrt(4)

Some functions have no arguments

date()

Some functions have more than 1 argument. Multiple arguments are separated with commas.

pi
round(pi,2)
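
Arguments can also be given by name, which often makes a call easier to read. A minimal sketch using round, whose arguments are called x and digits:

```r
# Arguments matched by position
round(pi, 2)

# The same call with the arguments named
round(x = pi, digits = 2)

# Arguments that have a default can be left out; digits defaults to 0
round(pi)
```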

Most built-in functions provide feedback when used incorrectly

sqrt(-1)
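
Feedback comes in two main flavours: warnings, where R still returns a value, and errors, where it stops. The sketch below illustrates both. It uses suppressWarnings and tryCatch only so that the example runs through without halting; you would not normally need them at the console.

```r
# sqrt(-1) returns NaN ("not a number") and raises a warning
result <- suppressWarnings(sqrt(-1))
is.nan(result)

# Giving sqrt some text is an error; tryCatch captures the message
# instead of stopping the script
error_message <- tryCatch(
  sqrt("four"),
  error = function(e) conditionMessage(e)
)
error_message
```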

Discovering functions

How do you know which functions are available? builtins() will display all the functions that are built into the base of R. There are over 1300 of them, though. There are also tens of thousands of additional functions available in packages.

Start with simple functions. Try to remember each new function you learn. You will quickly know enough to do many useful things. A good way to help remember is to use the RStudio Cheatsheets.

length(builtins())
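
If you half-remember a name, searching is easier than scrolling through everything. A small sketch using apropos, which looks up object names by pattern:

```r
# All functions whose names contain "sqrt"
apropos("sqrt")

# The pattern is a regular expression, e.g. names starting with "read"
apropos("^read")

# Check whether a particular function is available
exists("toupper")
```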

Getting help on functions

Let’s find out more about the cat function

?cat
help(cat)

Try some other functions

?paste

How would we use this?

full_name <- paste(first_name,last_name)

full_name <- paste(first_name,last_name,sep = "_")
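
One way to see the difference between paste and cat is that paste returns a value you can store and reuse, while cat only prints. A brief sketch, reusing the names from above:

```r
first_name <- "Darth"
last_name <- "Vader"

# The result of paste is an ordinary character value
full_name <- paste(first_name, last_name)
nchar(full_name)

# paste0 is shorthand for paste(..., sep = "")
paste0(first_name, last_name)
```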

RStudio contextual help

Try typing the function round() in RStudio. Notice how it pops up a help box with information on the possible arguments.

Defining functions

The whole point of a programming language is that it can be extended and built upon. The main way this is done in R is through functions. A function allows you to capture code for re-use.

Functions are objects in R just like everything else.

A function is a way to attach a name to some code.

A function has inputs and outputs. Functions can also have “side effects”.

A simple function

square <- function(input){
  input*input
}

square(2)
square(3)

This function has a single input and a single output

Here is a function with two arguments

pow <- function(base,exponent){
  base^exponent
}
pow(2,3)

Inside the function body we can perform as many actions as we like. Unless return() is called explicitly, only the result of the final statement is returned as output.

pow <- function(base,exponent){
  tmp <- base^exponent
  tmp
}
pow(2,3)
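
The sketch below pulls together three ideas: a default argument value, a side effect, and an explicit return(). The function name pow_verbose is made up for this illustration.

```r
# exponent has a default value, so it can be left out when calling
pow_verbose <- function(base, exponent = 2) {
  result <- base^exponent
  # message() is a side effect: it writes to the console
  # but is not part of the function's output
  message("Raising ", base, " to the power ", exponent)
  # return() makes the output explicit
  return(result)
}

pow_verbose(3)      # uses the default exponent of 2
pow_verbose(2, 3)

# The message still appears, but the value goes into the variable
squared <- pow_verbose(4)
squared
```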

Controlling logical flow with if statements

These are used to allow your code to do different things depending on a condition.

Let’s make a function that converts a single base from DNA to RNA

dna2rna <- function(nucleic_acid_base){
  if ( nucleic_acid_base == "T") {
    return("U")
  } else {
    return(nucleic_acid_base)
  }
}

dna2rna("A")
dna2rna("C")
dna2rna("T")
dna2rna("G")

What if we gave it a lower case letter?

dna2rna("t")

Let’s fix our function.

dna2rna <- function(nucleic_acid_base){
  upper_base <- toupper(nucleic_acid_base)
  if ( upper_base == "T") {
    return("U")
  } else {
    return(upper_base)
  }
}

dna2rna("t")
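
An if statement can have more than two branches. As a possible extension (not part of the function above), the sketch below uses else if to reject input that is not a DNA base at all. The name dna2rna_checked is hypothetical.

```r
dna2rna_checked <- function(nucleic_acid_base) {
  upper_base <- toupper(nucleic_acid_base)
  if (upper_base == "T") {
    return("U")
  } else if (upper_base == "A" || upper_base == "C" || upper_base == "G") {
    return(upper_base)
  } else {
    # stop() raises an error with the given message
    stop("Not a DNA base: ", nucleic_acid_base)
  }
}

dna2rna_checked("t")
dna2rna_checked("g")
```

Calling dna2rna_checked("x") would stop with an error rather than quietly returning "X".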

Vectors

A vector is a collection of items. They must all be of the same type

# Character

c("a","b","cc")

# Numeric

vi <- c(1,4,5,2)
class(vi)

# Logical

vl <- c(TRUE,TRUE,FALSE)

vl
class(vl)

The c function is a very important function. It is used to create and concatenate vectors.

a <- c(1,2,3)

b <- c(5,6,7)

c(a,b)

Remember that vectors must contain values that are all the same type. What happens if we try to concatenate vectors of different types?

a_numeric <- c(1,2,3)

a_char <- c("1","2","3")

c(a_numeric,a_char) # Everything converted to character
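
Automatic conversion happens with other types too, and you can convert explicitly when you need a particular type. A short sketch using the standard as.numeric function:

```r
# Logical values become numbers when mixed with numbers
c(1, TRUE, FALSE)

# Convert text to numbers explicitly before doing arithmetic
a_char <- c("1", "2", "3")
as.numeric(a_char)
sum(as.numeric(a_char))

# Text that is not a number becomes NA (R also gives a warning)
suppressWarnings(as.numeric(c("1", "two")))
```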

We can easily create regular sequences of numbers with shorthand expressions like this

2:6

seq(1,10,by=2)
seq(1,10,by=3)

rep(1:2, times = 3)

rep(1:2, each = 3)
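
A few more variations that come up often, sketched here with standard base R functions (the vector bases is just an example):

```r
# Ask for a number of evenly spaced points instead of a step size
seq(0, 1, length.out = 5)

# seq_len(n) counts from 1 to n
seq_len(4)

# seq_along gives the positions of an existing vector
bases <- c("A", "C", "G", "T")
seq_along(bases)
```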

Accessing elements of vectors uses square brackets []

x <- c(1,4,5,2)

x[1]
x[3]

# Multiple 
x[c(1,3)]

# Logical
x[x>2]
x>2

# A negative index returns everything except that element
x[-1]
x[-2]
x[-c(1,4)]

# This is a built-in variable
letters

# Every second letter
letters[seq(1,26,by=2)]

# The %in% operator
letters[letters %in% c("a","z")]
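
Square brackets can also be used to change elements, not just read them. A minimal sketch:

```r
x <- c(1, 4, 5, 2)

# which() gives the positions where a condition is TRUE
which(x > 2)

# Indexing on the left of <- replaces elements in place
x[2] <- 10
x[x > 4] <- 0
x

# Asking for a position past the end gives NA rather than an error
x[10]
```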

Named vectors

abundance <- c(1200,34,6)
kingdom <- c("bacteria", "fungi" , "animal")
names(abundance) <- kingdom
abundance

abundance <- c("bacteria" = 1200,"fungi" = 34,"animal" = 6)

These can be accessed by name

abundance["bacteria"]

# Names are just auxiliary data attached to the vector. We can still do all normal vector things
abundance*abundance

abundance[abundance>10]
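
Because the names travel with the values, they can be used to answer questions such as "which kingdom is most abundant?". A sketch with the same example data:

```r
abundance <- c(bacteria = 1200, fungi = 34, animal = 6)

# names() retrieves the names as a character vector
names(abundance)

# Several elements can be selected by name at once
abundance[c("fungi", "animal")]

# Names stay attached when the vector is sorted
sort(abundance)

# The name of the largest element
names(abundance)[which.max(abundance)]
```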

Lists

Each item within a list can be of a different type. In fact lists can contain pretty much anything

list(c(1,2,3), c("a","b"))
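
List items can be named, and there are a few ways to get them back out. The sketch below uses a made-up list called sample_info to show the difference between single and double square brackets.

```r
sample_info <- list(id = "S1", counts = c(1, 2, 3), passed = TRUE)

# $ and [[ ]] extract the item itself
sample_info$counts
sample_info[["id"]]

# Single brackets return a shorter list, not the item
class(sample_info["counts"])
class(sample_info[["counts"]])

length(sample_info)
```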

Data frames

A data frame is a special type of list where each item has the same length

# This is not allowed
data.frame(c(1,2,3), c("a","b"))
# This is
data.frame(c(1,2,3), c("a","b","c"))

# We can make this look nicer by using names
data.frame(numbers = c(1,2,3),characters = c("a","b","c"))

Make another data frame

count <- c(120, 31, 4)
taxon <- c("bacteria","fungi","animal")
haploid <- c(TRUE,FALSE,FALSE)

df <- data.frame(count,taxon)

# Can access a row like this
df[1,]
df[c(1,3),]

# Can access columns like this
df$count
df$taxon


nrow(df)
ncol(df)

# Can add another column using cbind

df <- cbind(df,haploid)
ncol(df)

# One of the columns is a logical vector
df$haploid

# We can use it to select rows

df[df$haploid,]

df[!df$haploid,]
dim(df)
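
Row selection works with any logical condition, and rows and columns can be selected together. A sketch that rebuilds the same data frame so it can be run on its own (the column name log_count is just an example):

```r
df <- data.frame(
  count = c(120, 31, 4),
  taxon = c("bacteria", "fungi", "animal"),
  haploid = c(TRUE, FALSE, FALSE)
)

# Rows where a condition on a column holds
df[df$count > 10, ]

# Rows and a column together: the taxon of every non-haploid row
df[!df$haploid, "taxon"]

# A new column can also be added with $
df$log_count <- log10(df$count)

# str gives a compact summary of the column types
str(df)
```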

Loops

Just like bash, R has loops. It has both for and while loops.

for (number in 1:4){
  print(number*number)
}
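
For comparison, here is a small while loop. It keeps running for as long as its condition is TRUE, which is useful when you do not know in advance how many repetitions are needed. The variable names are illustrative.

```r
# Keep doubling until the value passes 100
value <- 1
steps <- 0
while (value <= 100) {
  value <- value * 2
  steps <- steps + 1
}
value
steps
```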

Vector operations

We just used a for loop to calculate the square of each number in the sequence 1:4

In R a for loop is often a last resort for doing this kind of thing. Many operations in R are vectorised. This means they work for single values as well as for vectors.

In general, if you can do something in a vectorised way you should, because the code is usually shorter and runs faster.

1:4*1:4

1:4+1:4
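
To make the comparison concrete, the sketch below computes the same squares with a loop and with a vectorised expression, then checks that the results match.

```r
numbers <- 1:4

# Loop version: fill a result vector one element at a time
squares_loop <- numeric(length(numbers))
for (i in seq_along(numbers)) {
  squares_loop[i] <- numbers[i]^2
}

# Vectorised version: one expression, no loop
squares_vec <- numbers^2

identical(squares_loop, squares_vec)

# A single value is recycled across the whole vector
numbers * 10
```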

Basic plotting

# Make some numbers to plot
x <- 1:100*0.1
y <- cos(x)

plot(x,y)
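
plot takes many optional arguments, and a plot can be written to a file instead of the screen. A minimal sketch, assuming you want a PDF in a temporary folder; the file name cosine.pdf is arbitrary.

```r
x <- 1:100 * 0.1
y <- cos(x)

# Open a PDF file to draw into
plot_file <- file.path(tempdir(), "cosine.pdf")
pdf(plot_file)

# type = "l" draws a line; xlab, ylab and main set the labels and title
plot(x, y, type = "l", xlab = "x", ylab = "cos(x)", main = "Cosine curve")

# lines() adds a second curve to the same axes (dashed, via lty = 2)
lines(x, sin(x), lty = 2)

# Close the file so that it is written to disk
dev.off()
```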

Another example

random_sample <- rnorm(1000, mean = 0, sd = 1)
hist(random_sample)
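
Because the sample is random, the histogram looks slightly different each time. If you need the same numbers every time, set a seed first. The seed value 42 below is arbitrary.

```r
# Setting the same seed gives the same "random" numbers
set.seed(42)
sample_one <- rnorm(1000, mean = 0, sd = 1)
set.seed(42)
sample_two <- rnorm(1000, mean = 0, sd = 1)
identical(sample_one, sample_two)

# plot = FALSE returns the bin counts without drawing anything
bins <- hist(sample_one, breaks = 20, plot = FALSE)
sum(bins$counts)
```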

Tips on using R and RStudio

Know the keyboard shortcuts. Some of the most useful are:

  1. Run the current line: Command-Enter (on a Mac)
  2. Type the assignment operator (<-): Option-Minus (on a Mac)

Keep commands in a file

Use the Environment pane to see which variables are currently defined

Use projects to organise code and data into one folder.