Thu 14 Jan 2016 — Sat 21 Sep 2019

Advanced R

Data structures


Vectors have a type, a length and attributes.

Atomic vectors (containing atomic values) are always flat.


Lists can contain things of different types, and are recursive.
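A small sketch of the difference. However deeply c() calls are nested, the result is one flat atomic vector. A list, by contrast, keeps mixed types and can contain other lists.

```r
# Nested c() calls still produce a single flat atomic vector
v <- c(1, c(2, c(3, 4)))
str(v)            # num [1:4] 1 2 3 4

# A list can mix types and hold other lists (it is recursive)
l <- list(1, "a", TRUE, list(2, "b"))
str(l)
is.recursive(l)   # TRUE
is.recursive(v)   # FALSE
```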


Modify the levels of a factor by assigning to levels(): levels(myFactor) <- myNewLevels.

This rearranges the data as well. A factor stores integer codes c(1, 2, 3, ...). Assigning new levels changes the label attached to each code but leaves the codes themselves unchanged, so the values you see change.

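Avoid this with relevel(), which moves one level to the front, or with factor(x, levels = newOrder), which reorders all the levels. Both keep every value the same.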
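The example below contrasts assigning to levels() with reordering. The factor and its values are made up for illustration. The first approach relabels the data. factor(x, levels = ...) and relevel() only change the order of the levels.

```r
f <- factor(c("low", "high", "medium", "low"))
levels(f)        # "high" "low" "medium" (alphabetical by default)
as.integer(f)    # 2 1 3 2

# Assigning to levels() relabels the existing integer codes
g <- f
levels(g) <- c("low", "medium", "high")
as.character(g)  # "medium" "low" "high" "medium": the data changed

# factor(x, levels = ...) reorders the levels but keeps each value
h <- factor(f, levels = c("low", "medium", "high"))
as.character(h)  # "low" "high" "medium" "low"

# relevel() moves one level to the front, also keeping each value
r <- relevel(f, ref = "medium")
levels(r)        # "medium" "high" "low"
```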

Arrays and Matrices

Arrays are nD, matrices are 2D.

  • Length and Names

    Vectors have length and names.

    Matrices have ncol, nrow, colnames, rownames.

    Arrays have dim, dimnames.

  • Combine

    Vectors have c()

    Matrices have rbind() and cbind()

    Arrays have abind() (from the abind package)

  • List-Arrays

    You can set the above attributes (e.g. dim) on a list to make some weird data structures, such as list-matrices.

  • Data frames

    A data frame is a list of equal-length vectors. It shares a lot of properties with a matrix but allows columns of different types.
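A sketch of the structures listed above, using base R only. abind() is left out because it lives in a separate package. A matrix or array is a vector with a dim attribute. Setting dim on a list gives a list-matrix. A data frame is a list of equal-length columns.

```r
# A matrix is a vector with a 2-element dim attribute
m <- matrix(1:6, nrow = 2)
dim(m)              # 2 3
rownames(m) <- c("a", "b")

# An array can have any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))
length(dim(a))      # 3

# Matrices grow with rbind() / cbind()
m2 <- rbind(m, c = 7:9)
dim(m2)             # 3 3

# Setting dim on a list gives a list-matrix (filled column by column)
lmat <- list(1, "a", TRUE, 1:3)
dim(lmat) <- c(2, 2)
lmat[[2, 1]]        # "a"

# A data frame is a list of equal-length vectors
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.list(df)         # TRUE
```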

TODO Subsetting


Functions which I need to learn: str


head, tail, subset

order takes one or more vectors. It returns the permutation (indices) that sorts the first vector, with ties broken by the subsequent vectors.
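A quick illustration of head, tail, subset and order on a small made-up data frame. The output of order() is a vector of row indices. Passing it to [ sorts the rows.

```r
df <- data.frame(name = c("b", "a", "c", "a"), score = c(2, 3, 1, 1))
head(df, 2)                          # first two rows
tail(df, 1)                          # last row
subset(df, score > 1, select = name) # rows where score > 1, name column only

# Sort by name, breaking ties (the two "a" rows) by score
o <- order(df$name, df$score)
o                                    # 4 2 1 3
df[o, ]
```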


assign lets you set a variable in a particular environment. get looks up a variable in an environment. with(data, expr) creates an environment from data, making its names available as variables, then evaluates expr.
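A minimal sketch of the three functions. The environment e and the data frame are illustrative only.

```r
e <- new.env()
assign("x", 10, envir = e)                # bind x in e
get("x", envir = e)                       # 10
exists("x", envir = e, inherits = FALSE)  # TRUE

df <- data.frame(a = 1:3, b = 4:6)
with(df, a + b)                           # 5 7 9: columns used as variables
```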


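a %in% b tests, element by element, whether each element of a occurs in b. match(a, b) returns the position of each element of a in b (its first match, or NA if it is absent).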
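A small example showing that both operators return one result per element of the left-hand vector.

```r
a <- c("x", "y", "z")
b <- c("z", "x", "x")
a %in% b     # TRUE FALSE TRUE
match(a, b)  # 2 NA 1: position of the first match, NA when absent
```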

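all.equal tests near equality (even across arrays etc.). On a mismatch it returns a character description rather than FALSE, so wrap it in isTRUE() inside if(). identical tests exact equality, including attributes, and always returns a single TRUE or FALSE, never a vector.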
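A sketch of the usual floating-point case. It also shows why all.equal() is normally wrapped in isTRUE() when used as a condition.

```r
x <- 0.1 + 0.2
identical(x, 0.3)          # FALSE: the doubles differ in the last bits
all.equal(x, 0.3)          # TRUE: equal within the default tolerance
all.equal(1, 1.1)          # a character string describing the difference
isTRUE(all.equal(1, 1.1))  # FALSE: safe to use inside if()
identical(c(a = 1), 1)     # FALSE: attributes (here names) must match too
```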

Style Guide


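I() prevents conversion of an object: it marks the object "as is" (class AsIs), so functions such as data.frame() or model formulas use it unchanged.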
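Two common uses, as a sketch. The first keeps a list as a single data-frame column. The second stops a formula from interpreting ^ as a formula operator. It uses the built-in cars dataset.

```r
# With I(), the list is stored as one list-column instead of being
# split into separate columns
df <- data.frame(id = 1:2, v = I(list(1:3, "a")))
class(df$v)    # "AsIs"
df$v[[1]]      # 1 2 3

# In a formula, I(speed^2) means arithmetic squaring
fit <- lm(dist ~ speed + I(speed^2), data = cars)
coef(fit)
```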


  • S3 (lax)
  • S4 (strict)
  • RC (Java-style, mutable)


You can assign a class to an object.

You can define methods for that class.

You can create a generic function that dispatches on the class.
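A minimal S3 sketch of those three steps. The class "dog" and the generic speak() are hypothetical names chosen for illustration. UseMethod() picks the method named generic.class, falling back to generic.default.

```r
# Hypothetical class, for illustration only
rex <- structure(list(name = "Rex"), class = "dog")

# Hypothetical generic: dispatches on the class of its first argument
speak <- function(x, ...) UseMethod("speak")

# Methods are ordinary functions named generic.class
speak.dog <- function(x, ...) paste(x$name, "says woof")
speak.default <- function(x, ...) "..."

speak(rex)   # "Rex says woof"
speak(42)    # "..."
```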


Environments bind names to values, much like in a list or data frame. They're all kinds of associative array.

Environments are not copied on assignment.
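A sketch contrasting reference semantics with copy-on-modify. Modifying an environment through a second name is visible through the first. Modifying a copied list is not.

```r
e1 <- new.env()
e1$x <- 1
e2 <- e1      # no copy: both names refer to the same environment
e2$x <- 2
e1$x          # 2

l1 <- list(x = 1)
l2 <- l1      # lists are copied when modified
l2$x <- 2
l1$x          # 1
```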

Functions have:

  • evaluation environment (local variables; created fresh on each call)
  • enclosing environment (environment(f): where f was created)
  • parent frame (calling environment, parent.frame())
## These go up the call stack == sys.parent(1)

## These go up the call stack recursively.
sys.calls(), sys.frames(), sys.parents()

## This bundles the above in a user-friendly way.

## This gets the current evaluation environment

## This gets the enclosing environment
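A sketch that separates the environments listed above using base functions. The functions outer_fn and inner_fn are hypothetical. parent.frame() returns the caller's frame. environment() returns the current evaluation environment, whose parent is the function's enclosing environment.

```r
outer_fn <- function() {
  marker_outer_only <- "set in outer_fn"
  inner_fn()
}

inner_fn <- function() {
  this_call <- sys.call()      # the call that created this frame
  caller    <- parent.frame()  # calling environment: outer_fn's frame
  own       <- environment()   # this call's evaluation environment
  list(call = this_call, caller = caller, own = own)
}

res <- outer_fn()
res$call                                # inner_fn()
get("marker_outer_only", res$caller)    # visible via the call stack
```

Lexical scoping follows the enclosing environment, not the caller. As a result, marker_outer_only is not found when searching from inner_fn's own environment.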

Weird Evaluation

(got as far as plyr::arrange() exercise)

The pryr package provides helpers for this.

Function arguments are passed as promises. substitute captures the expression behind a promise instead of evaluating it.
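A minimal illustration. The hypothetical function show_expr() returns its argument's expression unevaluated. Because the expression is never evaluated, a and b do not need to exist.

```r
show_expr <- function(x) substitute(x)

show_expr(a + b * 2)           # the call a + b * 2, not evaluated
class(show_expr(a + b * 2))    # "call"
deparse(show_expr(a + b * 2))  # "a + b * 2"
```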

quote is simpler than substitute. It doesn't look anything up in the environment; it just returns its argument as an expression. eval and quote are opposites: eval(quote(x)) gives the value of x.

evalq(expr, env) is eval(quote(expr), env). It saves writing quote() when evaluating a literal expression in a chosen environment.

deparse turns an expression into a string. This is often useful in error messages.

When envir is a list or data frame, the enclos argument to eval chooses where names not found in it are looked up. The default is the calling environment.
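A sketch tying quote, eval, evalq, enclos and deparse together. The environment e and the variables are illustrative only.

```r
e <- new.env()
assign("x", 10, envir = e)

expr <- quote(x * 2)   # captured as-is; x is not looked up yet
eval(expr, e)          # 20
evalq(x * 2, e)        # 20, same as eval(quote(x * 2), e)

# With a list as envir, names missing from it (y) are found in enclos
y <- 5
eval(quote(x + y), list(x = 1), enclos = environment())  # 6

deparse(expr)          # "x * 2"
```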

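Non-standard evaluation is convenient at the REPL but less so in scripts, because it makes code less composable. substitute captures the expression as written in the calling scope, so wrapping an NSE function inside another function passes along the wrong expression. This makes functions referentially opaque.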

Assignment is hard to reason about because it uses non-standard evaluation: in x <- value, the left-hand side x is treated as a name, not evaluated.

Always provide an escape-hatch version if you use non-standard evaluation in a function. This allows someone to pass in an already quoted argument.
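One way to structure this, as a sketch. subset2_q is a hypothetical name for the escape hatch. It takes an already quoted condition and does the real work. The NSE wrapper subset2 only captures the caller's expression and forwards it together with the caller's environment.

```r
# Escape hatch: accepts a quoted condition
subset2_q <- function(x, condition, env = parent.frame()) {
  # columns of x first; other names are looked up in env
  r <- eval(condition, x, env)
  x[r, , drop = FALSE]
}

# Interactive version: captures the expression, then delegates
subset2 <- function(x, condition, env = parent.frame()) {
  subset2_q(x, substitute(condition), env)
}

df <- data.frame(a = 1:5, b = 5:1)
threshold <- 4
subset2(df, a >= threshold)   # rows where a is 4 or 5

# Programmatic use with a condition built elsewhere
cond <- quote(b == 1)
subset2_q(df, cond)           # the row where b is 1
```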

TODO Formula Objects

Need to understand ~.
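A starting point for exploring ~, under the assumption that inspecting its result is the useful first step. Evaluating y ~ x + z does not compute anything. It returns the call itself with class "formula" and the environment where it was created.

```r
f <- y ~ x + z
class(f)        # "formula"
length(f)       # 3: the `~` function, the left side, the right side
f[[2]]          # y
f[[3]]          # x + z
all.vars(f)     # "y" "x" "z"
environment(f)  # environment the formula was created in
```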

TODO Expressions

"Name" and "symbol" refer to the same thing, and "call" and "language object" are likewise used interchangeably, which can be confusing.

Helper Functions

all.vars lists the variable names in an expression; all.names lists every name, including function names.


We can test components with is.name (equivalently is.symbol), is.call, and is.pairlist.

Assignments, indexing, operators etc. are all calls.
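A sketch on one quoted expression, illustrating the helpers and predicates above. The expression shows that assignment, indexing and arithmetic are all calls.

```r
e <- quote(total <- y[1] + sqrt(z))
all.vars(e)           # variable names only: total, y, z
all.names(e)          # also function names: <-, +, [, sqrt

is.call(e)            # TRUE: an assignment is a call to `<-`
is.name(e[[1]])       # TRUE: the symbol `<-`
is.call(e[[3]])       # TRUE: y[1] + sqrt(z) is a call to `+`
e[[3]][[2]][[1]]      # `[`: indexing is a call too
is.symbol(quote(y))   # TRUE
```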

Missing name symbol

Create this using quote(expr=).


The formals function lets us get and set the arguments of a function, e.g. by changing the defaults.
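A sketch of reading and setting formals. The functions f and g are hypothetical. alist(x = ) creates the missing-argument symbol (the same one quote(expr = ) gives), which makes x a required argument.

```r
f <- function(x, y = 1) x + y
names(formals(f))   # "x" "y"
formals(f)$y <- 10  # change a default
f(1)                # 11

# Build a function's arguments and body from scratch
g <- function() NULL
formals(g) <- alist(x = , y = 2)
body(g) <- quote(x * y)
g(3)                # 6
```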


The first element of a call is (usually) a symbol naming the function being called.

The other elements are the arguments. You can extract them by name or by position.

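pryr provides standardise_call, which rearranges the arguments into a standard form. Every argument gets its full name, in the order the function's formals are declared rather than alphabetically. It is built on base R's match.call.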
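A base-R sketch of taking a call apart and standardising it with match.call(), the function standardise_call builds on. The function h and its arguments are hypothetical. The call mixes named, positional and partially matched arguments.

```r
h <- function(alpha, beta, gamma) NULL   # hypothetical function
cl <- quote(h(gamma = 3, 1, b = 2))

cl[[1]]      # h: the symbol naming the call
cl$gamma     # 3: an argument by name
cl[[3]]      # 1: an argument by position

# Full names, in the order h declares its formals
match.call(h, cl)   # h(alpha = 1, beta = 2, gamma = 3)
```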


Pairlists are cons cells (linked lists).

They're used for function arguments, e.g. the result of formals().
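A short check that function arguments really are pairlists. A pairlist built by hand can still be indexed by name like a list.

```r
fargs <- formals(function(x, y = 2) NULL)
typeof(fargs)       # "pairlist"
is.pairlist(fargs)  # TRUE

p <- pairlist(a = 1, b = 2)
p$b                 # 2
```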

TODO codetools


TODO Macros in R

Macros don't come with environments/scope. Mostly this is bad, but sometimes it is useful.

For example, combined with assignment, a macro modifies the caller's original object rather than a local copy. This is useful when we want to make several changes to the same object, since R's copy-on-modify semantics generally do not allow actual mutation.
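R has no built-in macros. A common approximation, sketched here with a hypothetical helper inc(), builds an expression with substitute() and evaluates it in the caller's frame with eval.parent(). Because the rewritten assignment runs in the caller's scope, it changes the caller's variable.

```r
# Hypothetical macro-style helper: expands to `var <- var + by`
# and evaluates that assignment in the caller's environment
inc <- function(var, by = 1) {
  eval.parent(substitute(var <- var + by))
}

n <- 1
inc(n)      # n <- n + 1, run in the caller's scope
inc(n, 5)   # n <- n + 5
n           # 7
```

Like a macro, inc() has no scope of its own for the substituted code. The caution from the text above applies: the expression runs wherever it is called from.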


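subset2 <- function(x, condition) {
  # substitute(), not quote(): capture the caller's expression
  condition_call <- substitute(condition)
  # evaluate with x's columns in scope; other names come from the caller
  r <- eval(condition_call, x, parent.frame())
  x[r, , drop = FALSE]
}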