We didn't get to this topic last week…

Functions that help avoid writing for() loops

The apply() family of functions

These functions are called "functionals" because they take a function as an argument and apply that function in a specified way.

  • apply() - function applied to the rows or columns of a dataframe/matrix
  • tapply() - function applied to a vector, grouped by a factor
  • lapply() - function applied to each element of a vector or list
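
To make "takes a function as an argument" concrete, here is a minimal sketch of a hand-written functional. The name callTwice is hypothetical and chosen only for illustration; it is not part of base R.

```r
# callTwice() is a hypothetical helper: it receives a function f and a value x,
# and applies f to x two times in a row.
callTwice <- function(f, x) {
  f(f(x))
}

# Pass sqrt itself (no parentheses) as the argument f
callTwice(sqrt, 16)
## [1] 2
```

The apply() family works the same way: the function is handed over by name, and the functional decides when and how often to call it.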

apply()

3 main arguments:

  • X - name of the dataframe or matrix
  • MARGIN - whether to apply the function over rows (1) or columns (2)
  • FUN - name of function to apply

myDF <- iris[,1:4]
head(myDF)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
## 6          5.4         3.9          1.7         0.4
apply(myDF, MARGIN=1, FUN=sum)
##   [1] 10.2  9.5  9.4  9.4 10.2 11.4  9.7 10.1  8.9  9.6 10.8 10.0  9.3  8.5
##  [15] 11.2 12.0 11.0 10.3 11.5 10.7 10.7 10.7  9.4 10.6 10.3  9.8 10.4 10.4
##  [29] 10.2  9.7  9.7 10.7 10.9 11.3  9.7  9.6 10.5 10.0  8.9 10.2 10.1  8.4
##  [43]  9.1 10.7 11.2  9.5 10.7  9.4 10.7  9.9 16.3 15.6 16.4 13.1 15.4 14.3
##  [57] 15.9 11.6 15.4 13.2 11.5 14.6 13.2 15.1 13.4 15.6 14.6 13.6 14.4 13.1
##  [71] 15.7 14.2 15.2 14.8 14.9 15.4 15.8 16.4 14.9 12.8 12.8 12.6 13.6 15.4
##  [85] 14.4 15.5 16.0 14.3 14.0 13.3 13.7 15.1 13.6 11.6 13.8 14.1 14.1 14.7
##  [99] 11.7 13.9 18.1 15.5 18.1 16.6 17.5 19.3 13.6 18.3 16.8 19.4 16.8 16.3
## [113] 17.4 15.2 16.1 17.2 16.8 20.4 19.5 14.7 18.1 15.3 19.2 15.7 17.8 18.2
## [127] 15.6 15.8 16.9 17.6 18.2 20.1 17.0 15.7 15.7 19.1 17.7 16.8 15.6 17.5
## [141] 17.8 17.4 15.5 18.2 18.2 17.2 15.7 16.7 17.3 15.8
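
For comparison, a short sketch of the column-wise case (MARGIN=2), using the same four iris columns. The variable name colMeansViaApply is made up for this example; the check against colMeans() is only there to confirm the result.

```r
# MARGIN = 2 applies FUN to each column instead of each row
myDF <- iris[, 1:4]
colMeansViaApply <- apply(myDF, MARGIN = 2, FUN = mean)
colMeansViaApply
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333 

# Base R's colMeans() gives the same values
all.equal(colMeansViaApply, colMeans(myDF))
## [1] TRUE
```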

tapply()

3 main arguments:

  • X - name of the vector
  • INDEX - name of factor used to group X
  • FUN - function to apply to the groups of X specified by INDEX

head(chickwts)
##   weight      feed
## 1    179 horsebean
## 2    160 horsebean
## 3    136 horsebean
## 4    227 horsebean
## 5    217 horsebean
## 6    168 horsebean
tapply(chickwts$weight, INDEX = chickwts$feed, FUN=mean)
##    casein horsebean   linseed  meatmeal   soybean sunflower 
##  323.5833  160.2000  218.7500  276.9091  246.4286  328.9167
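
To see what tapply() saves you from writing, here is a sketch of the same group means computed with an explicit for() loop. The names feeds, loopMeans and tapplyMeans are invented for this illustration.

```r
# One slot per feed type, filled in by the loop
feeds <- levels(chickwts$feed)
loopMeans <- numeric(length(feeds))
names(loopMeans) <- feeds

for (f in feeds) {
  # Subset the weights belonging to feed f, then average them
  loopMeans[f] <- mean(chickwts$weight[chickwts$feed == f])
}

# The one-line tapply() call produces the same numbers in the same order
tapplyMeans <- tapply(chickwts$weight, INDEX = chickwts$feed, FUN = mean)
all.equal(unname(loopMeans), as.vector(tapplyMeans))
## [1] TRUE
```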

lapply()

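The function lapply() does the work of a for() loop: it calls a function once for each element and collects the results in a list, so you do not have to write the loop or set up the output yourself.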

lapply() usually takes two arguments:

  • X - a list or vector to loop over
  • FUN - a function to apply to each element of X

Thus, FUN must be able to accept any element of X as an argument.

x <- c("dog","cat", "cucumber")
lapply(x, FUN=nchar)
## [[1]]
## [1] 3
## 
## [[2]]
## [1] 3
## 
## [[3]]
## [1] 8
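
The loop that lapply() replaces can be sketched as follows. The name result is an assumption made for this example.

```r
x <- c("dog", "cat", "cucumber")

# Pre-allocate an empty list with one slot per element of x
result <- vector("list", length(x))

for (i in seq_along(x)) {
  # Store the character count of the i-th word in the i-th slot
  result[[i]] <- nchar(x[i])
}

# The loop and lapply() build the same list
identical(result, lapply(x, FUN = nchar))
## [1] TRUE
```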

Writing your own functions

Functions are the best way to encapsulate code that you want to repeat again and again.

A function accepts arguments and returns a single (and only a single) object.

Variable names defined within a function only exist within the function (not within the global environment).
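
A small sketch of that scoping rule. The names addOne and localResult are hypothetical, and the last line assumes no object called localResult already exists in your workspace.

```r
addOne <- function(number) {
  localResult <- number + 1   # created inside the function
  return(localResult)         # the single object handed back to the caller
}

addOne(2)
## [1] 3

# localResult disappeared when addOne() finished
# (assumes nothing named localResult was defined beforehand)
exists("localResult")
## [1] FALSE
```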

is.it.positive <- function(number){
  if(number > 0) {return("The number is positive")}
  else return("The number is not positive")
}
is.it.positive(1)
## [1] "The number is positive"
is.it.positive(-2)
## [1] "The number is not positive"
is.it.positive(0)
## [1] "The number is not positive"
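
Your own function can be handed to lapply() just like a built-in one. The sketch below repeats the definition so it runs on its own; the vector numbers and the name answers are made up for the example. Because if() tests a single value, the function handles one number at a time, and lapply() calls it once per element.

```r
is.it.positive <- function(number){
  if(number > 0) {return("The number is positive")}
  else return("The number is not positive")
}

numbers <- c(1, -2, 0)
answers <- lapply(numbers, FUN = is.it.positive)
answers
## [[1]]
## [1] "The number is positive"
## 
## [[2]]
## [1] "The number is not positive"
## 
## [[3]]
## [1] "The number is not positive"
```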

Lists

The "l" in lapply() stands for "list", because that's what you get as a result.

Lists can hold any type of data; the elements do not all need to be the same type.

myList <- list(firstOne=1, nextOne="too", lastOne=rnorm(10))
myList
## $firstOne
## [1] 1
## 
## $nextOne
## [1] "too"
## 
## $lastOne
##  [1]  1.19331760 -0.26058574  0.36991882 -1.35527832 -0.24278636
##  [6] -1.16880655  0.28993881  0.43269700  0.08443408  0.80500857

Double brackets [[]] are often used to index a list.

myList[[3]]
##  [1]  1.19331760 -0.26058574  0.36991882 -1.35527832 -0.24278636
##  [6] -1.16880655  0.28993881  0.43269700  0.08443408  0.80500857
myList[3]
## $lastOne
##  [1]  1.19331760 -0.26058574  0.36991882 -1.35527832 -0.24278636
##  [6] -1.16880655  0.28993881  0.43269700  0.08443408  0.80500857

For five marks… what is the difference?

[[]] versus []
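
One way to explore the question is to inspect what each form returns. This is a sketch using a fixed vector in place of rnorm(10) so the results are reproducible; the values in lastOne are arbitrary.

```r
myList <- list(firstOne = 1, nextOne = "too", lastOne = c(0.5, -1.2, 2.0))

# Single brackets return a (shorter) list that still wraps the element
class(myList[3])
## [1] "list"
length(myList[3])
## [1] 1

# Double brackets return the element itself
class(myList[[3]])
## [1] "numeric"
length(myList[[3]])
## [1] 3
```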