Mon 19 Oct 2015 — Sat 21 Sep 2019

See the extra notes in Advanced R.

as.character(NA) gives a missing character value (the same as NA_character_).

Comments start with #, like this.


R is lexically scoped, but you can also inspect the evaluation frames on the call stack, e.g. with parent.frame() and sys.frames().
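A small sketch of the difference: f() finds x where it was defined, while h() deliberately looks in its caller's frame via parent.frame(). The names f, g, h and k are illustrative only.

```r
# f, g, h and k are illustrative names only
x <- "global"

# f() looks x up where it was *defined* (here, the global environment)
f <- function() x

# g() has its own x, but f() never sees it: that is lexical scoping
g <- function() {
  x <- "local to g"
  f()
}
g()  # "global"

# h() explicitly searches its caller's evaluation frame instead
h <- function() get("x", envir = parent.frame())

k <- function() {
  x <- "local to k"
  h()
}
k()  # "local to k"
```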

Quoting and Evaluation

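eval() does evaluation; quote() and expression() do quoting. D() is something else: it takes a quoted expression and returns its symbolic derivative.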
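A minimal sketch of quoting, evaluating and differentiating. The expression and the value x = 2 are arbitrary choices for illustration.

```r
# The expression and the value x = 2 are arbitrary examples
e <- quote(x^2 + 3 * x)   # quote() captures the call without evaluating it
is.call(e)                # TRUE

eval(e, list(x = 2))      # 10: evaluate it with x supplied by a list

de <- D(e, "x")           # symbolic derivative with respect to x
de                        # an unevaluated call equivalent to 2 * x + 3
eval(de, list(x = 2))     # 7
```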

Global Config

Use options(OPT = val) to set these and getOption("OPT") to read one back.
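For example, with the digits option (significant digits used when printing). options() returns the previous values, which makes restoring them easy; this is one common pattern, not the only one.

```r
old <- options(digits = 3)   # set a new value; the old one comes back in a list
d <- getOption("digits")     # 3
print(pi)                    # [1] 3.14
options(old)                 # restore the previous setting
```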


Syntactic names start with a letter, or with a dot not followed by a digit. They may also include digits, dots and underscores. Alternatively, you can use just about any name if you quote it with backticks.
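A quick illustration with made-up variable names. make.names() is the base function that converts arbitrary strings into syntactic names.

```r
x1 <- 1          # syntactic: starts with a letter
.hidden <- 2     # syntactic: starts with a dot (ls() hides it by default)
`my var!` <- 3   # non-syntactic, but allowed when quoted with backticks
`my var!` + 1    # 4

# Invalid characters become dots, and an "X" is prepended if needed
make.names(c("my var!", "2x", "ok_name"))  # "my.var." "X2x" "ok_name"
```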

Control Flow

if, while and for (x in y) all behave as normal.

switch(EXPR, ...) picks one of its other arguments using the value of EXPR: a number is used as a position in the list of alternatives, and a character string is matched against their names, like a dictionary key.
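A sketch of both modes; describe() is a hypothetical example function. An empty argument such as `cat = ,` falls through to the next non-empty value, and a final unnamed argument acts as the default.

```r
# A number selects the alternative in that position
switch(2, "a", "b", "c")   # "b"

# Hypothetical function: a string is matched against the argument names
describe <- function(animal) {
  switch(animal,
         cat = ,                 # empty value: falls through to the next one
         dog = "pet",
         cow = "farm animal",
         "unknown")              # unnamed last argument: the default
}
describe("cat")   # "pet"
describe("cow")   # "farm animal"
describe("emu")   # "unknown"
```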


We include files using source(). This has an echo parameter.

We load a package using library().
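A self-contained sketch of source(). It first writes a throwaway script to a temporary file (the file contents are made up), so nothing outside the example is needed.

```r
library(stats)   # library() attaches a package (stats is usually attached already)

# Write a small made-up script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("greeting <- 'hello'", "nchar(greeting)"), script)

source(script)               # runs the file; greeting is defined in the global environment
source(script, echo = TRUE)  # also prints each expression and its value as it runs
```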


all.equal() compares things, allowing a tolerance.

x:y creates a sequence (a vector) from x to y in steps of 1; it counts down if y < x.
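A short illustration of why all.equal() helps with floating point, and of x:y. all.equal() returns TRUE or a character description of the difference, so it is usually wrapped in isTRUE() inside conditions.

```r
0.1 + 0.2 == 0.3                    # FALSE: floating point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within the default tolerance (about 1.5e-8)
all.equal(1, 1.1)                   # a string describing the mean relative difference, not FALSE

3:7                    # 3 4 5 6 7 (an integer vector)
5:1                    # 5 4 3 2 1
seq(0, 1, by = 0.25)   # for other step sizes: 0.00 0.25 0.50 0.75 1.00
```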


Use tryCatch() around an expression to handle any errors or warnings it signals.

stop() signals an error; warning() signals a warning.
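A sketch of handling both kinds of condition; safe_log() is a hypothetical function used only for illustration. A handler's return value becomes the value of the whole tryCatch() call, and finally runs either way.

```r
# Hypothetical function wrapping log() with condition handling
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")        # signal an error
      if (any(x < 0)) warning("negative input gives NaN")  # signal a warning
      log(x)
    },
    error = function(e) {
      message("error caught: ", conditionMessage(e))
      NA
    },
    warning = function(w) {
      message("warning caught: ", conditionMessage(w))
      NaN
    },
    finally = message("done")   # runs whether or not a condition was signalled
  )
}

safe_log(exp(1))  # 1
safe_log("a")     # NA, after the "error caught" message
safe_log(-1)      # NaN, after the "warning caught" message
```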


browser() halts and passes control to the user so they can inspect things.

debug(f) flags a function so that every call to it enters the browser; undebug(f) removes the flag.

trace() inserts extra code (such as a call to browser()) into a function; untrace() removes it.


cat() prints a string as-is, without quotes or escaping.
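A small comparison on an arbitrary string containing a tab and quotes: print() shows the escaped, quoted representation, while cat() writes the characters themselves.

```r
s <- "a\tb \"q\""   # an arbitrary string with a tab and embedded quotes
print(s)            # [1] "a\tb \"q\"": quoted, with escapes shown
cat(s, "\n")        # a, a real tab, then b "q"
```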


Factors are R's enumerations.

They may be ordered.
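The enumerations above are R factors. A minimal sketch with made-up size labels: a factor stores integer codes plus a levels vector, and an ordered factor also supports comparisons and max().

```r
# Made-up data; levels fixes the order of the categories
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)      # "small" "medium" "large"
as.integer(sizes)  # 1 3 2 1: the integer codes into levels
table(sizes)       # counts per level

# An ordered factor can be compared between levels
osizes <- factor(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"), ordered = TRUE)
osizes < "large"   # TRUE FALSE TRUE
max(osizes)        # large
```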


typeof() returns an object's internal type, such as "double", "integer" or "list".

Objects can carry attributes: a key-value map read with attributes() and attr().

<- is assignment.
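A short illustration of typeof(), attributes and <-. The "units" attribute is an arbitrary name chosen for the example.

```r
x <- c(a = 1.5, b = 2)   # <- assigns; naming elements sets the names attribute
typeof(x)                # "double"
typeof(1L)               # "integer"
typeof(list())           # "list"

attr(x, "units") <- "cm" # set an arbitrary (made-up) attribute
attributes(x)            # a list with $names and $units
```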


NULL is a singleton. Test for it with is.null().


S3 classes are defined using a class attribute.


Declare the generic's body as UseMethod('methodName', dispatchArgument). If dispatchArgument is not specified, the first argument of the function itself is used instead.

Write myFunction.class <- function(args) classSpecificBody

Potentially you could keep your methods on some other object, but that would be confusing.

NextMethod() calls the method for the next class in the object's class vector, which provides an inheritance mechanism.
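A minimal S3 sketch; speak(), the dog and animal classes and rex are all hypothetical. The class vector c("dog", "animal") is what lets NextMethod() fall through from the dog method to the animal one.

```r
# Hypothetical generic: its body just dispatches on the class of x
speak <- function(x, ...) UseMethod("speak")

# Hypothetical methods, named generic.class
speak.default <- function(x, ...) "..."
speak.animal  <- function(x, ...) "generic animal noise"
speak.dog     <- function(x, ...) {
  # NextMethod() calls the method for the next class in class(x): speak.animal
  paste("Woof, then", NextMethod())
}

rex <- structure(list(name = "Rex"), class = c("dog", "animal"))
speak(rex)   # "Woof, then generic animal noise"
speak(42)    # "...": no numeric method, so speak.default is used
```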


We make symbols using and quote.

They are used as names.
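A small sketch of symbols. price is an arbitrary name, and as.name() is just one base R way of building a symbol from a string.

```r
s1 <- quote(price)          # quoting a bare name gives a symbol
s2 <- as.name("price")      # building a symbol from a string
identical(s1, s2)           # TRUE
typeof(s1)                  # "symbol"

# A symbol evaluates to whatever the name is bound to
eval(s1, list(price = 99))  # 99
```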


R has lazy evaluation. This is mostly hidden away from us.

The value in a promise is empty until the first time we look at it.
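A sketch of promises in action; ignore_arg(), use_twice() and make_adder() are hypothetical helpers. An unused argument is never evaluated, and a used one is evaluated at most once.

```r
# Hypothetical helper: its argument is never used, so the promise is never forced
ignore_arg <- function(x) 10
ignore_arg(stop("never evaluated"))   # 10, and no error

# Hypothetical helper that uses its argument twice
counter <- 0
use_twice <- function(x) x + x            # first use forces the promise; second reuses the value
use_twice({ counter <- counter + 1; 5 })  # 10
counter                                   # 1: the argument expression ran only once

# Hypothetical closure factory: force() evaluates n now rather than on first use
make_adder <- function(n) {
  force(n)
  function(x) x + n
}
make_adder(2)(3)                          # 5
```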


Typing the name of a function in the REPL shows the source code.

Function objects are closures. They contain an argument list, a body and an environment (see formals(), body() and environment()). Their arguments are supplied to them as promises.

The function body could point to all sorts of things, e.g. a symbol or a constant.

There isn't a real difference between operators and functions, although operators need to be quoted with backticks if we want to call them as functions, because their names are non-syntactic. This even applies to indexing: `[` is a function too.
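A small illustration of the three parts of a closure and of calling operators as functions; add() is a made-up example. The comment on environment() assumes the code runs at top level.

```r
add <- function(x, y = 1) x + y   # made-up example function

formals(add)       # the argument list: x (no default) and y = 1
body(add)          # the body, a call: x + y
environment(add)   # where add was created (the global environment at top level)

# Operators are ordinary functions with non-syntactic names
`+`(2, 3)          # 5
v <- c(10, 20, 30)
`[`(v, 2)          # 20, the same as v[2]
sapply(1:3, `-`)   # -1 -2 -3 (unary minus)
```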

Primitive Functions

Test for these with is.primitive(). They call C code directly rather than being ordinary closures (for example, formals(sum) is NULL), so their argument handling can be confusing.
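A quick check, using sum() and mean() as examples of a primitive and an ordinary closure respectively.

```r
is.primitive(sum)    # TRUE: implemented directly in C
is.primitive(mean)   # FALSE: an ordinary R closure
formals(sum)         # NULL: primitives have no R-level argument list
formals(mean)        # closures do: x and ...
```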

Higher Order functions

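Vectorize() turns a function that only works on scalars into one that works element-wise over vectors.

outer() forms every combination of its two inputs (a Cartesian product, or cross join), effectively expanding them into two long vectors. It then applies a function to these and returns the results as an array.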
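A sketch of both; clamp01() and pick_max() are hypothetical scalar-only functions. Because outer() calls FUN once on the expanded vectors rather than once per pair, a scalar-only function has to go through Vectorize() first.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
# (a longer condition is an error in recent versions of R)
clamp01 <- function(x) if (x < 0) 0 else if (x > 1) 1 else x

vclamp <- Vectorize(clamp01)   # loops over its arguments (via mapply)
vclamp(c(-0.5, 0.3, 2))        # 0.0 0.3 1.0

# outer() evaluates FUN over every (x, y) pair and returns a matrix
outer(1:3, 1:2)                # 3 x 2 matrix of products
outer(1:3, 1:2, FUN = function(x, y) x + 10 * y)

# Hypothetical scalar-only function, vectorised before handing it to outer()
pick_max <- function(a, b) if (a > b) a else b
outer(1:3, 1:3, Vectorize(pick_max))
```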

Defining Functions

function (arglist) body

Where body is usually in braces.

We can specify default arguments. These are evaluated in the function's own frame.

We can use ... to gather any number of extra arguments; list(...) collects them into an ordinary list.
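A sketch; rescale(), count_args() and my_paste() are hypothetical. The default n = length(x) works because defaults are evaluated inside the function's own frame, where x is visible.

```r
# Hypothetical: the default for n refers to another argument
rescale <- function(x, n = length(x)) x / n
rescale(c(2, 4))          # 1 2
rescale(c(2, 4), n = 4)   # 0.5 1.0

# Hypothetical: ... collects any number of extra arguments
count_args <- function(...) {
  args <- list(...)       # materialise them as an ordinary list
  length(args)
}
count_args(1, "a", b = TRUE)   # 3

# Hypothetical: ... is often passed straight on to another function
my_paste <- function(...) paste(..., sep = "-")
my_paste("a", "b", "c")        # "a-b-c"
```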


An expression vector, as created by expression(), contains one or more calls, which can be accessed like a list.

These are already parsed, but not evaled.
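A small sketch of an expression vector; the calls and the value a = 5 are arbitrary.

```r
ex <- expression(a + 1, a * 10)   # two parsed but unevaluated calls
length(ex)                        # 2
ex[[2]]                           # a * 10, accessed like a list element
class(ex[[2]])                    # "call"

# Nothing runs until eval(); here a is supplied by a list
sapply(ex, eval, envir = list(a = 5))   # 6 50
```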


Contains a frame (key-value dict) and an enclosure (pointer to parent).

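emptyenv() returns the empty environment, the top-most ancestor: it has no parent.

Environments are not copied when modified. Since many objects can refer to the same environment, modifying it may have effects in many places.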
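A sketch of reference semantics and of the chain of enclosures. The depth computed at the end depends on which packages are attached, so it is not a fixed number.

```r
e1 <- new.env()                  # a fresh, empty frame
assign("count", 0, envir = e1)   # add a binding to it
e2 <- e1                         # no copy: both names refer to the same environment
e2$count <- 5
e1$count                         # 5: the change is visible through e1 too

# Follow the enclosures up from the global environment until reaching emptyenv()
env <- globalenv()
depth <- 0
while (!identical(env, emptyenv())) {
  env <- parent.env(env)
  depth <- depth + 1
}
depth   # number of ancestors; varies with the attached packages
```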


If you do a vector operation, the length of the longest vector is used and the shorter vectors are recycled (repeated) as necessary, with a warning if the longer length is not a multiple of the shorter one. If any vector has length 0, the result has length 0 instead.
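A few examples of the recycling rule:

```r
1:6 + 1:2          # 2 4 4 6 6 8: the shorter vector is recycled
1:6 * 10           # a scalar is a length-1 vector, recycled to length 6
1:5 + 1:2          # still works, but warns that 5 is not a multiple of 2
1:3 + integer(0)   # integer(0): a zero-length operand gives a zero-length result
```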


There's stack() and unstack().

There's also reshape() which is generally powerful.
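A small sketch of stack() and unstack() on a made-up list of scores. Because the two groups have different lengths, unstack() returns a list rather than a data frame.

```r
scores <- list(a = c(1, 2), b = 3)   # made-up data
long <- stack(scores)   # data frame: values, plus ind (a factor of the source names)
long
u <- unstack(long)      # back to a list, since the groups differ in length
u
```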


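Use which() to get the indices where a logical expression is TRUE; those indices can then be used to filter a vector.

We can also use subset(). The difference is that subset() evaluates its condition inside a data frame (so columns can be named directly) and drops rows where the condition is NA.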
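A sketch of the practical difference, using a made-up vector that contains an NA:

```r
x <- c(5, NA, 1, 8)    # made-up data
x > 2                  # TRUE NA FALSE TRUE
x[x > 2]               # 5 NA 8: an NA in the condition gives an NA element
which(x > 2)           # 1 4: positions where the condition is TRUE (NAs dropped)
x[which(x > 2)]        # 5 8

# subset() evaluates the condition inside the data frame and drops NA rows
df <- data.frame(id = 1:4, score = x)
subset(df, score > 2)  # rows with id 1 and 4
```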


Everything is 1-indexed.

[] returns a subset of the same type: a list from a list, a vector from a vector. It can accept other vectors as an argument. [[]] returns a single element. $ works on recursive data structures such as lists and data frames. It doesn't allow numerical indexing, only names (characters/symbols).

We don't have to specify all of our indexes: we can leave one out to mean 'all of these'. e.g. x[, 1] takes every row of the first column of a matrix.

We can also pass in vectors or matrices of indexes.

You can index by integer, logical (TRUE/FALSE), character (checked against the names attribute), or factor (converted to its integer codes).

You can redefine the indexing functions to give your classes special behaviour. Data frames do this.
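A sketch of the main indexing forms on made-up data, including the factor gotcha: a factor index uses its integer code, not its label.

```r
v <- c(a = 10, b = 20, c = 30)   # made-up named vector
v[c(1, 3)]           # a = 10, c = 30: [ returns a subset of the same type
v[c(TRUE, FALSE)]    # logical index, recycled: elements 1 and 3
v["b"]               # by name, keeping the name
v[[2]]               # 20: [[ extracts a single element, without its name

l <- list(num = 1:3, txt = "hi")
class(l[1])          # "list": [ on a list gives a shorter list
class(l[[1]])        # "integer": [[ gives the element itself
l$txt                # "hi": $ indexes by name only

f <- factor("c", levels = c("c", "a"))
v[f]                 # a = 10: the factor's integer code (1) is used, not its label
```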

  • Flattening

    Subsetting a matrix drops any dimension of extent one, so a single row or column comes back as a plain vector. Use drop = FALSE as an extra argument to prevent this.

    R can also have size 0 vectors and matrices in this way.

  • Setting NULLs

    If x is a list and i an index into it, x[i] <- NULL removes that element, while x[i] <- list(NULL) sets it to NULL.
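A sketch of both points on a made-up matrix and list:

```r
m <- matrix(1:6, nrow = 2)   # columns (1, 2), (3, 4), (5, 6)
m[1, ]                       # 1 3 5: a single row drops to a plain vector
m[1, , drop = FALSE]         # a 1 x 3 matrix
dim(m[, 0])                  # 2 0: a zero-column matrix

l <- list(a = 1, b = 2, c = 3)
l["b"] <- NULL               # removes element b
names(l)                     # "a" "c"
l["a"] <- list(NULL)         # keeps element a, now holding NULL
length(l)                    # 2
is.null(l$a)                 # TRUE
```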


Vectors may have a names attribute which can be used as an alternative way to index its contents.

Vector Types

Vectors can hold booleans (logical), integer, floating point (double) and complex numbers, character strings, or raw bytes; lists are vectors too.

Generic Vectors

Lists are vectors which contain arbitrary objects (not restricted to one type).

Arrays and Matrices

These have a fixed size described in their dim attribute.

A matrix is implemented on top of a single vector whose length is the product of the values in dim, stored in column-major order.

The dimnames attribute labels the matrix dimensions.
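A small sketch of a matrix as a plain vector plus dim and dimnames attributes; the row and column labels are made up.

```r
v <- 1:6
dim(v) <- c(2, 3)     # same data, now a 2 x 3 matrix
v                     # filled column by column: the first column is 1 2
attributes(v)         # $dim: 2 3

dimnames(v) <- list(c("r1", "r2"), c("A", "B", "C"))   # made-up labels
v["r2", "C"]          # 6
length(v)             # 6 = 2 * 3, the product of dim
```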

Data Frames

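A data frame is a labelled cases-by-variables table of data: a list of equal-length columns (the variables), with one row per case, where each column can have a different type.

Rows can be sorted with the general-purpose order() function, e.g. df[order(df$x), ].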
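A sketch with made-up data. order() is not specific to data frames: it returns the permutation that sorts its argument, which is then used as a row index.

```r
df <- data.frame(name = c("b", "c", "a"),
                 score = c(2.5, 1.0, 3.5))   # made-up data; columns may differ in type
str(df)                                      # 3 obs. of 2 variables
nrow(df); ncol(df)                           # 3 cases (rows), 2 variables (columns)

order(df$score)                              # 2 1 3: the sorting permutation
df[order(df$score), ]                        # rows ascending by score
df[order(df$score, decreasing = TRUE), ]     # rows descending by score
```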

Getting help

  • help(package=MyPackage)
  • help(function)

? is the help operator.


R packages can include tests and demos.

R and Emacs

Emacs Speaks Statistics

ESS Manual

ESS gives you an interactive shell. It also keeps a transcript.

M-x R runs an R process.

C-c C-l loads an R file using source().

C-c C-e C-d is M-x ess-dump-object-into-edit-buffer.

C-c C-v is M-x ess-display-help-on-object.

Tab first attempts to indent, then does command completion.


sudo aptitude install ess r-recommended

I might want to add this to .emacs:

(setq ess-history-file nil)

Do I need to install auto-complete?


It can write R documentation files. It can also write inline structured comments using Roxygen.


Org-babel will use an ESS buffer.



Use testthat with these instructions.



Split-Apply-Combine for data analysis (plyr on GitHub)

Names are like 'ioply', where i is input and o is output. These can be a for array, d for data frame, l for list, or _ for discarded output.

e.g. ddply goes from data-frame to data-frame.
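For comparison, a base R sketch of the same split-apply-combine idea using split(), lapply() and do.call(rbind, ...). This is a stand-in, not plyr's implementation; with plyr installed, ddply(df, .(group), summarise, total = sum(value)) should produce a similar data frame.

```r
# Made-up data with two groups
df <- data.frame(group = c("x", "y", "x", "y"), value = c(1, 2, 3, 4))

pieces  <- split(df, df$group)                       # split: one data frame per group
results <- lapply(pieces, function(d)                 # apply: summarise each piece
  data.frame(group = d$group[1], total = sum(d$value)))
combined <- do.call(rbind, results)                  # combine: back into one data frame
combined
```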


The .() function quotes variable names (or expressions) so they can be evaluated later, inside each piece of the data.

You can use it with plain variables .(a, b, c), with expressions .(a * b), and with name assignments .(product = a * b).


There is a family of functions m*ply which behaves like mapply.


There is a family of functions r*ply which is used for sampling.


Takes a function which takes multiple arguments. Returns a function which takes a list.
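This description appears to match plyr's splat(). The following is only a base R sketch using do.call(); splat_sketch() is a hypothetical name chosen so it is not mistaken for plyr's own code.

```r
# Hypothetical splat_sketch(): turn f(a, b, ...) into a function of one list
splat_sketch <- function(f) function(args) do.call(f, args)

add3 <- function(x, y, z) x + y + z   # made-up example function
add3_list <- splat_sketch(add3)
add3_list(list(1, 2, 3))               # 6
add3_list(list(z = 3, x = 1, y = 2))   # 6: names are matched to arguments
```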


Takes a list of functions and returns a function that returns a named vector with the result of each of them, so:

each(max, min) returns a function which computes:

c(max = max(x), min = min(x))
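A base R sketch of the same idea; each_sketch() is hypothetical (not plyr's each()) and, to keep it short, assumes the functions are passed by name and each returns a single number.

```r
# Hypothetical each_sketch(): apply several named functions to the same input
each_sketch <- function(...) {
  fs <- list(...)
  # Use the argument expressions (e.g. max, min) as result names
  names(fs) <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  # Assumes every function returns one number
  function(x) vapply(fs, function(f) f(x), numeric(1))
}

summarise_range <- each_sketch(max, min)
summarise_range(c(3, 9, 1))   # max = 9, min = 1
```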


Takes a function of a vector. Returns a function of a data-frame.
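This appears to describe plyr's colwise(). A base R sketch under that assumption; colwise_sketch() is hypothetical and simply applies f to each column with lapply().

```r
# Hypothetical colwise_sketch(): lift a function of one vector to every column
colwise_sketch <- function(f) {
  function(df) as.data.frame(lapply(df, f))
}

col_means <- colwise_sketch(mean)
col_means(data.frame(a = c(1, 3), b = c(10, 20)))   # one row: a = 2, b = 15
```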


Replace errors with a default value.
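This appears to describe plyr's failwith(). A base R sketch using tryCatch(); failwith_sketch() and safe_sqrt() are hypothetical.

```r
# Hypothetical failwith_sketch(): call f, but return default instead of raising an error
failwith_sketch <- function(default, f) {
  function(...) tryCatch(f(...), error = function(e) default)
}

# Hypothetical function that errors on negative input
safe_sqrt <- failwith_sketch(NA, function(x) {
  if (x < 0) stop("negative") else sqrt(x)
})
safe_sqrt(16)   # 4
safe_sqrt(-1)   # NA
```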


A Layered Grammar of Graphics

Differs from the book 'The Grammar of Graphics'.

Layered data.

Hierarchy of defaults.

Position, style etc. are aesthetics. We convert data into aesthetic units by applying a scale and coordinate system.

We may also wish to apply a statistical transformation such as binning or aggregating.


There is a cascading design where everything has defaults, and you can override things at multiple levels.


Faceting is splitting a dataset into subsets and drawing each one in its own panel.


Layers allow us to include, for example, a scatterplot and a smoothed line on the same graphic.

A layer has a geometry, a statistical transformation, a position adjustment, a dataset and a set of aesthetic mappings.

Great, well that's certainly cleared that up.


We use + to put pieces together, starting from a ggplot() call.

We use the aes() function to specify our aesthetics.

We use when referring to variables computed by a statistic.

Histogram Example

A histogram is defined thusly:

library(ggplot2); ggplot(data = diamonds, mapping = aes(price)) + layer(geom = "bar", stat = "bin", position = "stack", mapping = aes(y = ..count..))


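deparse() turns an expression (or any other object) back into a character string of R source code; parse() goes the other way, from text to expressions.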
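A short round trip between code and text; the expression and the string "1 + 2" are arbitrary.

```r
e <- quote(x + y * 2)
deparse(e)                 # "x + y * 2": a call turned back into source text
deparse(c(1.5, 2))         # "c(1.5, 2)"

# parse() goes the other way: text to an expression vector
p <- parse(text = "1 + 2")
eval(p[[1]])               # 3
```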