1+1
2^3
2 * 2
3/4
101 %% 10
# This is a comment. It is ignored by R
103 %% 10
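Note that R will print the result of a calculation like this automatically.
Unlike in some other languages, ending a line with ; does not suppress printing in R. The semicolon only separates statements written on the same line.
(Don’t use it for this.)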
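As a small illustration of when R prints a value, here is a minimal sketch. It assumes an interactive R session; invisible() is a base R function, and the variable name result is only illustrative.

```r
# A bare expression at the top level is printed automatically
1 + 1

# Assigning the value stores it without printing anything
result <- 1 + 1

# invisible() evaluates the expression but marks the value as "do not print"
invisible(1 + 1)

# Typing the name of the variable prints the stored value
result
```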
a <- 1
b <- 1.5
a
b
a + b
a * b
c <- a * b
c
Note that nothing gets printed when you assign a variable. To print the variable, just type its name.
Some common constants, such as pi, are built in
2*pi*10^2
Somewhat confusingly, R has several assignment operators. These are the three you are most likely to see
myname = "Ira"
myname <- "Ira"
"Ira" -> myname
You should always try to use <- as this is preferred. Use of -> is STRONGLY discouraged.
a and b are actually terrible variable names for real code. You should name your variables descriptively, e.g.
first_name <- "Darth"
last_name <- "Vader"
cat(first_name,last_name)
Note the similarity between R and bash here. We just used a function called cat, which concatenates its arguments and prints them to the screen.
Functions encapsulate (potentially complex) computations into a single named command. All function calls have this form:
somename()
For example, the sqrt function calculates the square root of its argument:
sqrt(4)
1 + 2*sqrt(4)
Some functions have no arguments
date()
Some functions have more than 1 argument. Multiple arguments are separated with commas.
pi
round(pi,2)
Most built-in functions provide feedback when used incorrectly
sqrt(-1)
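The kind of feedback depends on the mistake. A minimal sketch, using only base R functions (the variable names negative_root and error_text are illustrative): a negative number gives a warning and the value NaN, while a non-numeric argument is an error.

```r
# sqrt(-1) returns NaN ("not a number") and issues a warning;
# suppressWarnings() hides the warning so we can inspect the value
negative_root <- suppressWarnings(sqrt(-1))
is.nan(negative_root)

# A non-numeric argument is an error rather than a warning.
# tryCatch() captures the error message instead of stopping the script
error_text <- tryCatch(
  sqrt("four"),
  error = function(e) conditionMessage(e)
)
error_text
```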
How do you know which functions are available? builtins() will display all the functions that are built into the base of R. There are over 1300 of them though. There are also tens of thousands of additional functions available in packages.
Start with simple functions. Try to remember each new function you learn. You will quickly know enough to do many useful things. A good way to help remember is to use the RStudio Cheatsheets.
length(builtins())
Let’s find out more about the cat function
?cat
help(cat)
Try some other functions
?paste
How would we use this?
full_name <- paste(first_name,last_name)
full_name <- paste(first_name,last_name,sep = "_")
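A few more variations on paste may help when reading its help page. This is a minimal sketch using only base R; sep, collapse and paste0() are all described in ?paste.

```r
first_name <- "Darth"
last_name <- "Vader"

# The default separator is a single space
paste(first_name, last_name)             # "Darth Vader"

# sep controls what goes between the arguments
paste(first_name, last_name, sep = "_")  # "Darth_Vader"

# paste0() is shorthand for sep = ""
paste0(first_name, last_name)            # "DarthVader"

# collapse joins the elements of one vector into a single string
paste(c("A", "C", "G", "T"), collapse = "")  # "ACGT"
```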
Try typing the function round() in RStudio. Notice how it pops up a help box with information on the possible arguments.
The whole point of a programming language is that it can be extended and built upon. The main way this is done in R is through functions. A function allows you to capture code for re-use.
Functions are objects in R just like everything else.
A function is a way to attach a name to some code.
A function has inputs and outputs. Functions can also have “side effects”.
A simple function
square <- function(input){
input*input
}
square(2)
square(3)
This function has a single input and a single output
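To make the idea of a side effect concrete, here is a minimal sketch. The function square_verbose is hypothetical and exists only for this illustration: it returns the same output as square, but it also writes a message to the screen, which is a side effect.

```r
# Hypothetical variant of square() that also has a side effect
square_verbose <- function(input) {
  cat("Squaring", input, "\n")  # side effect: text written to the screen
  input * input                 # output: the value that is returned
}

# The message appears when the function runs; the output is stored in result
result <- square_verbose(3)
result
```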
Here is a function with two arguments
pow <- function(base,exponent){
base^exponent
}
pow(2,3)
Inside the function body we can perform as many actions as we like.
Only the result of the final statement is returned as output.
pow <- function(base,exponent){
tmp <- base^exponent
tmp
}
pow(2,3)
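Another sketch of the same rule, using a hypothetical function called hypotenuse (not part of base R). The intermediate variable is computed but never returned, and it does not exist outside the function.

```r
# Hypothetical example: several statements, one output
hypotenuse <- function(side_a, side_b) {
  sum_of_squares <- side_a^2 + side_b^2  # intermediate result, not returned
  sqrt(sum_of_squares)                   # final statement: this is the output
}

hypotenuse(3, 4)  # 5

# Variables created inside the function are not visible outside it
exists("sum_of_squares")
```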
Conditional statements (if and else) allow your code to do different things depending on a condition.
Let’s make a function that converts a single base from DNA to RNA
dna2rna <- function(nucleic_acid_base){
if ( nucleic_acid_base == "T") {
return("U")
} else {
return(nucleic_acid_base)
}
}
dna2rna("A")
dna2rna("C")
dna2rna("T")
dna2rna("G")
What if we gave it a lower case letter?
dna2rna("t")
Let’s fix our function.
dna2rna <- function(nucleic_acid_base){
upper_base <- toupper(nucleic_acid_base)
if ( upper_base == "T") {
return("U")
} else {
return(upper_base)
}
}
dna2rna("t")
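The function above handles one base at a time. As a sketch of one possible way to convert a whole sequence, the hypothetical helper dna2rna_sequence below uses the base R function chartr(), which translates characters within a string. It is an alternative technique, not a replacement for practising if and else.

```r
# Hypothetical helper: convert a whole DNA string to RNA in one call
dna2rna_sequence <- function(dna_sequence) {
  upper_sequence <- toupper(dna_sequence)  # accept lower case input
  chartr("T", "U", upper_sequence)         # replace every T with U
}

dna2rna_sequence("gattaca")  # "GAUUACA"
```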
A vector is a collection of items. The items must all be of the same type.
# Character
c("a","b","cc")
# Numeric
vi <- c(1,4,5,2)
class(vi)
# Logical
vl <- c(TRUE,TRUE,FALSE)
vl
class(vl)
The c function is a very important function. It is used to create and concatenate vectors.
a <- c(1,2,3)
b <- c(5,6,7)
c(a,b)
Remember that vectors must contain values that are all the same type. What happens if we try to concatenate vectors of different types?
a_numeric <- c(1,2,3)
a_char <- c("1","2","3")
c(a_numeric,a_char) # Everything converted to character
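The conversion follows a fixed order: logical values become numbers when mixed with numbers, and everything becomes character when mixed with characters. A minimal sketch in base R (the variable names are illustrative):

```r
# Logical mixed with numeric: TRUE becomes 1 and FALSE becomes 0
mixed_logical <- c(TRUE, FALSE, 2)
class(mixed_logical)   # "numeric"

# Anything mixed with a character becomes character
mixed_char <- c(1, TRUE, "a")
class(mixed_char)      # "character"

# To go the other way, convert explicitly
as.numeric(c("1", "2", "3")) + 1   # 2 3 4
```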
We can easily create regular sequences of numbers with shorthand expressions like this
2:6
seq(1,10,by=2)
seq(1,10,by=3)
rep(1:2, times = 3)
rep(1:2, each = 3)
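A few related base R helpers are sketched below. Note that the : operator produces integers, while seq() with a numeric by argument produces ordinary (double) numbers.

```r
class(2:6)                   # "integer"
class(seq(1, 10, by = 2))    # "numeric"

# Ask for a number of evenly spaced values instead of a step size
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) counts from 1 to n; seq_along(v) counts along a vector
seq_len(4)                   # 1 2 3 4
seq_along(c("a", "b", "c"))  # 1 2 3
```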
x <- c(1,4,5,2)
x[1]
x[3]
# Multiple
x[c(1,3)]
# Logical
x[x>2]
x>2
# Negative accesses everything else
x[-1]
x[-2]
x[-c(1,4)]
# This is a built-in variable
letters
# Every second letter
letters[seq(1,26,by=2)]
# The %in% operator
letters[letters %in% c("a","z")]
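The same indexing syntax also works on the left-hand side of an assignment. A minimal sketch in base R, reusing the vector x from above:

```r
x <- c(1, 4, 5, 2)

# which() gives the positions where a condition is TRUE
which(x > 2)     # 2 3

# Indexing on the left of <- replaces only the selected elements
x[x > 2] <- 0
x                # 1 0 0 2

# Asking for a position past the end returns NA rather than an error
x[10]
```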
abundance <- c(1200,34,6)
kingdom <- c("bacteria", "fungi" , "animal")
names(abundance) <- kingdom
abundance
abundance <- c("bacteria" = 1200,"fungi" = 34,"animal" = 6)
These can be accessed by name
abundance["bacteria"]
# Names are just auxiliary data attached to the vector. We can still do all normal vector things
abundance*abundance
abundance[abundance>10]
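A few more things that can be done with a named vector are sketched below, using only base R functions and the abundance vector from above.

```r
abundance <- c("bacteria" = 1200, "fungi" = 34, "animal" = 6)

# Retrieve the names themselves
names(abundance)                  # "bacteria" "fungi" "animal"

# Select several items by name, in any order
abundance[c("animal", "fungi")]

# Names travel with their values when the vector is reordered
sort(abundance)

# Name of the largest item
names(which.max(abundance))       # "bacteria"

# Drop the names to get the plain values back
unname(abundance)                 # 1200 34 6
```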
Each item within a list can be of a different type. In fact lists can contain pretty much anything
list(c(1,2,3), c("a","b"))
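Items in a list are usually given names and then extracted with $ or [[ ]]. A minimal sketch, where sample_info and its item names are made up for illustration:

```r
# A list holding a numeric vector and a character vector
sample_info <- list(counts = c(1, 2, 3), labels = c("a", "b"))

# $ and [[ ]] extract the item itself
sample_info$counts
sample_info[["labels"]]

# Single brackets return a smaller list, not the item
class(sample_info["counts"])   # "list"

length(sample_info)            # 2
```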
A data frame is a special type of list where each item has the same length
# This is not allowed
data.frame(c(1,2,3), c("a","b"))
# This is
data.frame(c(1,2,3), c("a","b","c"))
# We can make this look nicer by using names
data.frame(numbers = c(1,2,3),characters = c("a","b","c"))
Make another data frame
count <- c(120, 31, 4)
taxon <- c("bacteria","fungi","animal")
haploid <- c(TRUE,FALSE,FALSE)
df <- data.frame(count,taxon)
# Can access a row like this
df[1,]
df[c(1,3),]
# Can access columns like this
df$count
df$taxon
nrow(df)
ncol(df)
# Can add another column using cbind
df <- cbind(df,haploid)
ncol(df)
# One of the columns is a logical vector
df$haploid
# We can use it to select rows
df[df$haploid,]
df[!df$haploid,]
dim(df)
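Two other common data frame operations are adding a computed column with $ and selecting rows with a condition. The sketch below builds a separate data frame called taxa (an illustrative name) so that df above is left unchanged; stringsAsFactors = FALSE is set explicitly so that the taxon column is character in older versions of R as well.

```r
count <- c(120, 31, 4)
taxon <- c("bacteria", "fungi", "animal")
taxa <- data.frame(count, taxon, stringsAsFactors = FALSE)

# Add a new column computed from an existing one
taxa$proportion <- taxa$count / sum(taxa$count)

# Select rows with a condition, and optionally a single column
taxa[taxa$count > 10, ]
taxa[taxa$count > 10, "taxon"]   # "bacteria" "fungi"

# str() gives a compact summary of the columns and their types
str(taxa)
```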
Just like in bash, R also has loops. It has both for and while loops
for (number in 1:4){
print(number*number)
}
We just used a for loop to calculate the square of each number in the sequence 1:4.
In R a for loop is often a last resort for doing this kind of thing.
Many operations in R are vectorised. This means they work for single values as well as for vectors.
In general, if you can do something in a vectorised way you should, because the code is usually shorter and faster.
1:4*1:4
1:4+1:4
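To compare the two styles side by side, here is a minimal sketch that computes the same squares with a loop and with a vectorised expression. The variable names are illustrative.

```r
numbers <- 1:4

# Loop version: create an empty result vector, then fill it one element at a time
squares_loop <- numeric(length(numbers))
for (i in seq_along(numbers)) {
  squares_loop[i] <- numbers[i] * numbers[i]
}

# Vectorised version: one expression operates on the whole vector
squares_vectorised <- numbers * numbers

# Both approaches give the same answer
all(squares_loop == squares_vectorised)   # TRUE
```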
# Make some numbers to plot
x <- 1:100*0.1
y <- cos(x)
plot(x,y)
Another example
random_sample <- rnorm(1000, mean = 0, sd = 1)
hist(random_sample)
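Plots are easier to read with a title and axis labels. The sketch below uses standard arguments of the base plot() and hist() functions; the seed value 42, the number of breaks and the label text are arbitrary choices for illustration.

```r
# Fix the random seed so the same "random" numbers are drawn every time
set.seed(42)
random_sample <- rnorm(1000, mean = 0, sd = 1)

# The sample mean and standard deviation should be close to 0 and 1
mean(random_sample)
sd(random_sample)

# Histogram with more bins, a title and an axis label
hist(random_sample, breaks = 30,
     main = "1000 draws from a standard normal", xlab = "Value")

# Line plot of the cosine curve with labelled axes
x <- 1:100 * 0.1
plot(x, cos(x), type = "l", xlab = "x", ylab = "cos(x)", main = "Cosine curve")
```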
Know the keyboard shortcuts. Some of the most useful are;
Keep commands in a file
Use the environment variable display
Use projects to organise code and data into one folder.