Mon 19 Oct 2015 — Sat 21 Sep 2019

See all the extra notes in Advanced R.

as.character(NA), also written NA_character_, is a missing character value.

Comments start with #, like this.

Scope

R is lexically scoped, but you can also inspect the dynamic call stack through its evaluation frames (e.g. with parent.frame() and sys.call()).
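A small sketch of both ideas. The names make_counter, caller_name and show_caller_x are hypothetical, chosen for illustration. The closure finds `count` where it was defined (lexical scoping), while parent.frame() reaches into whoever called the function.

```r
# Lexical scoping: the inner function finds `count` in the environment
# where it was defined, not where it is called.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # <<- assigns in the enclosing (defining) environment
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2

# Evaluation frames: parent.frame() is the environment of the caller.
caller_name <- function() {
  x <- "from caller"
  show_caller_x()
}
show_caller_x <- function() get("x", envir = parent.frame())
caller_name()  # "from caller"
```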

Quoting and Evaluation

eval() does evaluation. quote() and expression() do quoting. (D() is something else: it symbolically differentiates an expression.)
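A minimal sketch of quoting and evaluating. The variable names are arbitrary; the D() line only shows what D() actually does.

```r
e <- quote(x * 2)       # a call object, not evaluated yet
class(e)                # "call"
x <- 21
eval(e)                 # 42
eval(e, list(x = 5))    # 10: x is looked up in the list first
ex <- expression(x + 1, x - 1)
length(ex)              # 2
eval(ex[[1]])           # 22
D(quote(x^2), "x")      # 2 * x: symbolic derivative, not evaluation
```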

Global Config

Use options(OPT = val) to set global options, and getOption("OPT") to read one.
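A sketch using the built-in digits option. The pattern of saving the return value of options() and passing it back restores the previous setting.

```r
old <- options(digits = 3)   # options() returns the previous values
getOption("digits")          # 3
print(pi)                    # 3.14
options(old)                 # restore the previous setting
```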

Naming

Syntactic names start with a letter or a dot (a dot not followed by a digit). They may also contain digits, dots and underscores. Alternatively, you can use just about anything as a name if you quote it with backticks.
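A few example names; the variable names themselves are made up. make.names() shows how R converts arbitrary strings into syntactic names.

```r
.hidden <- 1          # syntactic: starts with a dot
var_2.x <- 2          # letters, digits, dots and underscores are allowed
`my var` <- 3         # a non-syntactic name must be backtick-quoted
`my var` + var_2.x    # 5
make.names(c("my var", "2x", "ok"))  # "my.var" "X2x" "ok"
```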

Control Flow

if, while and for (x in y) all behave as normal.

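switch(EXPR, ...) picks one of its remaining arguments. If EXPR evaluates to a number, it is used as a position; if it is a character string, it is matched against the argument names, like a dictionary key. For strings, an unnamed final argument acts as the default.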
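A sketch of both switch() modes; describe() is a hypothetical helper. An empty named argument falls through to the next one.

```r
describe <- function(x) {
  switch(x,
    a = ,              # empty alternative falls through to b
    b = "a or b",
    c = "c",
    "something else"   # unnamed final argument: the default
  )
}
describe("a")   # "a or b"
describe("z")   # "something else"
switch(2, "one", "two", "three")  # numeric EXPR selects by position: "two"
```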

Files

We include files using source(). Its echo parameter prints each expression as it is run.

We load a package using library().

Numbers

all.equal() compares objects for near equality, allowing a numerical tolerance.

x:y creates a range from x to y in steps of 1 (an ordinary vector).
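A short sketch. Note that all.equal() returns a description of the difference rather than FALSE, so wrap it in isTRUE() when you need a boolean.

```r
0.1 + 0.2 == 0.3                   # FALSE: floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal within a tolerance
all.equal(1, 1.1)                  # a character string describing the difference
r <- 2:5
r                                  # 2 3 4 5
is.vector(r)                       # TRUE
```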

Errors

Use tryCatch() around an expression to handle the errors and warnings it signals.

stop() raises an error; warning() raises a warning.
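A sketch with a hypothetical safe_log() helper: each condition class gets its own handler, and finally runs either way.

```r
safe_log <- function(x) {
  tryCatch(
    {
      if (x < 0) stop("negative input")    # raise an error
      if (x == 0) warning("zero input")    # raise a warning
      log(x)
    },
    error = function(e) {
      message("caught error: ", conditionMessage(e))
      NA
    },
    warning = function(w) -Inf,            # a caught warning also aborts the expression
    finally = message("done")
  )
}
safe_log(1)    # 0
safe_log(-1)   # NA
safe_log(0)    # -Inf
```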

Debugging

browser() halts and passes control to the user so they can inspect things.

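debug(f) flags a function so that the browser is entered every time it is called; undebug(f) removes the flag.

trace() and untrace() hook a function: they insert extra code into it and remove it again.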
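A non-interactive sketch of trace(); f is a throwaway example function. debug() and browser() need an interactive session, so they are only mentioned in comments.

```r
f <- function(x) x + 1
trace("f", tracer = quote(cat("entering f with x =", x, "\n")), print = FALSE)
f(1)        # prints the tracer message, then returns 2
untrace("f")
f(1)        # 2, with no tracer message
# Interactive only: debug(f) enters the browser on every call to f
# until undebug(f); browser() can also be placed inside a function body.
```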

Strings

cat() prints a string as raw text, without surrounding quotes or escape sequences (unlike print()).
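A quick comparison of print() and cat() on a string containing a tab and quotes.

```r
s <- "tab\there \"quoted\""
print(s)      # [1] "tab\there \"quoted\""  (quotes and escapes shown)
cat(s, "\n")  # the tab is printed as a real tab, quotes are unescaped
```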

Factors

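These are enumerations: a vector of integer codes plus a levels attribute giving the label for each code.

They may be ordered, in which case comparisons such as < follow the order of the levels.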
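A sketch of unordered and ordered factors; the level names are made up for illustration.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)        # "small" "medium" "large"
as.integer(sizes)    # 1 3 2 1: the underlying integer codes
table(sizes)         # counts per level

graded <- factor(c("low", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
graded < "high"      # TRUE FALSE TRUE: comparisons use the level order
```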

Objects

typeof() returns the internal type.

Objects can carry attributes: a set of name-value pairs, read with attributes() and read or set with attr().

<- is assignment.
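A sketch of typeof() and attributes. The "units" attribute is an arbitrary example name.

```r
x <- c(a = 1, b = 2)
typeof(x)                  # "double"
typeof(1L)                 # "integer"
typeof(mean)               # "closure"
attr(x, "units") <- "cm"   # attach an arbitrary attribute
attributes(x)              # $names "a" "b", then $units "cm"
```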

NULL

NULL is a singleton. Test for it with is.null().
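A few checks that distinguish NULL from empty containers.

```r
is.null(NULL)      # TRUE
is.null(list())    # FALSE: an empty list is not NULL
length(NULL)       # 0
c(1, NULL, 2)      # 1 2: NULL disappears when combined
```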

Classes

S3 classes are defined by setting an object's class attribute.

Methods

Declare the generic's body as UseMethod('methodName', dispatchArgument). If dispatchArgument is not specified, the first argument of the enclosing function is used instead.

Then write methodName.className <- function(args) classSpecificBody for each class.

Potentially you could keep your methods on some other object, but that would be confusing.

NextMethod() provides an inheritance mechanism: it calls the method for the next class in the object's class vector.
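A minimal S3 sketch. The generic speak() and the classes "animal" and "dog" are hypothetical examples.

```r
# The generic: UseMethod() dispatches on the class of `x`
speak <- function(x, ...) UseMethod("speak")

speak.default <- function(x, ...) "..."
speak.animal  <- function(x, ...) paste(x$name, "makes a sound")
speak.dog     <- function(x, ...) {
  # NextMethod() calls the method for the next class in class(x)
  paste(NextMethod(), "- woof")
}

rex <- structure(list(name = "Rex"), class = c("dog", "animal"))
speak(rex)   # "Rex makes a sound - woof"
speak(42)    # "...": no numeric method, so the default is used
```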

Symbols

We make symbols using as.name() (or as.symbol()) and quote().

They are used as names: evaluating a symbol looks up the variable it names.
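A sketch of building symbols and calls by hand; the variable total is an arbitrary example.

```r
s <- as.name("total")       # same as as.symbol("total")
identical(s, quote(total))  # TRUE
total <- 10
eval(s)                     # 10: the symbol evaluates to the variable it names
cl <- as.call(list(as.name("sum"), 1, 2, 3))
cl                          # sum(1, 2, 3)
eval(cl)                    # 6
```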

Promises

R has lazy evaluation. This is mostly hidden away from us.

The value in a promise is empty until the first time we look at it; the expression is then evaluated once and the value is cached.
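A sketch showing laziness and caching; ignore_second, noisy and twice are hypothetical helpers.

```r
ignore_second <- function(a, b) a          # `b` is never forced
ignore_second(1, stop("never raised"))     # 1

noisy <- function() { cat("evaluating\n"); 42 }
twice <- function(v) {
  cat("before use\n")
  v + v      # the promise is forced on first use, then its value is reused
}
twice(noisy())   # prints "before use", then "evaluating" once; returns 84
```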

Functions

Typing the name of a function in the REPL shows the source code.

Function objects are closures. They contain an argument list, a body, and an environment (see formals(), body() and environment()). Their arguments are supplied to them as promises.

The function body can be any expression, e.g. a symbol or a constant.

There isn't a difference between operators and functions, although operators usually need to be quoted with backticks if we want to call them as functions, because their names are non-syntactic. This even applies to indexing: x[i] is the call `[`(x, i).
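A sketch of taking a closure apart and calling operators as ordinary functions; add is an example function.

```r
add <- function(x, y = 2) x + y
formals(add)         # the argument list; y has default 2
body(add)            # x + y
environment(add)     # where add was defined (here the global environment)

# Operators are ordinary functions with non-syntactic names
`+`(1, 2)            # 3
v <- c(10, 20, 30)
`[`(v, 2)            # 20: indexing is a function call too
sapply(1:3, `*`, 10) # 10 20 30
```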

Primitive Functions

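Test for these with is.primitive(). They are implemented directly in C rather than as R closures, so formals() and body() return NULL for them. Some of them (the "special" primitives, such as if and quote) receive their arguments unevaluated rather than by value, which can be confusing.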
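A quick check contrasting a primitive with an ordinary closure.

```r
is.primitive(sum)    # TRUE
is.primitive(mean)   # FALSE: mean is an ordinary R closure
formals(sum)         # NULL: no R-level argument list
body(sum)            # NULL
typeof(`if`)         # "special": arguments are passed unevaluated
```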

Higher Order functions

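Vectorize() takes a function of scalar arguments and returns a vectorised version of it.

outer() forms every combination of the elements of two vectors (a cross join, rather than a full join), producing two long expanded vectors. It then applies a function to these in a single call, so the function must be vectorised, and reshapes the result into a matrix or array.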
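A sketch of both; sign_word is a hypothetical scalar function (its `if` only handles one value at a time).

```r
sign_word <- function(x) if (x >= 0) "non-negative" else "negative"
vsign <- Vectorize(sign_word)
vsign(c(-1, 0, 2))   # "negative" "non-negative" "non-negative"

# outer() expands all (x, y) pairs, calls FUN once on the expanded
# vectors, and reshapes the result into a matrix
outer(1:3, 1:2)                             # 3 x 2 multiplication table
outer(1:2, 1:3, function(x, y) x + 10 * y)  # entry [i, j] is i + 10 * j
```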

Defining Functions

function (arglist) body

Where body is usually in braces.

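We can specify default arguments. These are evaluated lazily, inside the function's own frame, so they can refer to other arguments and local variables.

We can use ... to gather any number of extra arguments; list(...) turns them into an ordinary list.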
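A sketch with hypothetical functions scale_by and count_args. The default for k calls a helper that only exists inside the function's frame, which works because the default is evaluated lazily there.

```r
scale_by <- function(x, k = default_k()) {
  default_k <- function() length(x)   # defined before k is first used
  x * k                               # forcing k evaluates default_k() here
}
scale_by(c(1, 2))        # 2 4: k defaults to length(x)
scale_by(c(1, 2), 10)    # 10 20

count_args <- function(...) {
  args <- list(...)      # collect ... into an ordinary list
  length(args)
}
count_args(1, "a", TRUE) # 3
```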

Expressions

An expression vector contains one or more statements (calls, symbols or constants), which can be accessed like a list.

These are already parsed, but not yet evaluated.
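A sketch using parse(text = ...) to build an expression vector from source text.

```r
ex <- parse(text = "a <- 2; a * 3")  # two statements, parsed but not evaluated
length(ex)        # 2
ex[[2]]           # a * 3 (a call object)
eval(ex)          # 6: evaluates each statement in turn, returns the last value
```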

Environments

Contains a frame (key-value dict) and an enclosure (pointer to parent).

emptyenv() is the top-most parent.

Environments have reference semantics: they are not copied when modified, so a change made through one reference is visible everywhere that environment is used.
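A sketch contrasting an environment (shared) with a list (copied on modify).

```r
e <- new.env()
assign("v", 1, envir = e)
e2 <- e                 # no copy: both names refer to the same environment
e2$v <- 99
get("v", envir = e)     # 99: the change is visible through e

l <- list(v = 1)
l2 <- l
l2$v <- 99
l$v                     # 1: lists are copied on modification

identical(parent.env(baseenv()), emptyenv())  # TRUE: the chain ends at emptyenv()
```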

Vector

If you do an element-wise vector operation, the result has the length of the longest vector; shorter vectors are recycled as necessary (with a warning if the longer length is not a multiple of the shorter). If any operand has length 0, the result has length 0.
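A few recycling cases, including the warning and the zero-length case.

```r
1:6 + 1:2          # 2 4 4 6 6 8: the shorter vector is recycled
1:5 + 1:2          # 2 4 4 6 6, with a warning: 5 is not a multiple of 2
1:3 + integer(0)   # integer(0): a zero-length operand gives a zero-length result
```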

Transforms

There's stack() and unstack().

There's also reshape() which is generally powerful.
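A small sketch of stack() and unstack() on a named list; the names a and b are arbitrary.

```r
wide <- list(a = c(1, 2), b = 3)
long <- stack(wide)   # data frame: `values` holds the data, `ind` the source name
long
unstack(long)         # reverses stack(); a list here, since the groups differ in length
```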

Filtering

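Use which() to filter based on an expression: it returns the positions where a logical vector is TRUE, ignoring NA.

We can also use subset(), which returns the filtered object itself (vector or data frame) and evaluates its condition using the data frame's columns as variables.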
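A sketch comparing which(), plain logical indexing and subset(); the data is made up.

```r
x <- c(5, NA, 12, 3, 20)
which(x > 4)          # 1 3 5: positions where the condition is TRUE
x[which(x > 4)]       # 5 12 20
x[x > 4]              # 5 NA 12 20: a logical index keeps the NA

df <- data.frame(id = 1:4, score = c(10, NA, 30, 40))
subset(df, score > 15, select = id)  # rows with id 3 and 4; columns named directly
```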

Indexing

Everything is 1-indexed.

[] returns a subset of the same type as its input (so a list stays a list), and it accepts vectors of indexes. [[]] returns a single element. $ works on recursive structures such as lists and environments. It doesn't allow numerical indexing, only names (character strings or symbols).

We don't have to specify all of our indexes: we can leave one out to say 'all of these', e.g. x[, 2] selects every row of column 2. (By contrast, x[, 0] selects no columns and gives a matrix with zero columns.)

We can also pass in vectors or matrices of indexes.

You can index by integer, TRUE/FALSE, character (checked against the names attribute), or factor (converted to its integer codes).

You can modify the indexing functions to have special behaviour for your classes. Dataframes do this.

  • Flattening

    Indexing a matrix may automatically drop dimensions, e.g. a single column comes back as a plain vector. Pass drop = FALSE as an extra argument to prevent this.

    R can also have size 0 vectors and matrices in this way.

  • Setting NULLs

    If x is a list and i an index into it, x[i] <- NULL removes the item, while x[i] <- list(NULL) keeps the item but sets its value to NULL.
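A sketch of the indexing rules above on a small list, a named vector and a matrix; all the data is made up.

```r
l <- list(a = 1, b = "two", c = 3)
class(l[1:2])            # "list": [ keeps the container type
l[["b"]]                 # "two": [[ extracts one element
l$b                      # "two": $ indexes by name only

v <- c(x = 1, y = 2, z = 3)
v[c(TRUE, FALSE, TRUE)]  # index by logical
v[c("z", "x")]           # index by name

m <- matrix(1:6, nrow = 2)
m[, 2]                   # 3 4: dropped to a plain vector
m[, 2, drop = FALSE]     # still a 2 x 1 matrix

l["b"] <- NULL           # removes b from the list
l["c"] <- list(NULL)     # keeps c, but its value is now NULL
names(l)                 # "a" "c"
```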

Names

Vectors may have a names attribute which can be used as an alternative way to index its contents.

Vector Types

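Atomic vectors hold values of a single type: logicals (booleans), integers, floating-point (double) or complex numbers, character strings, or raw bytes. Lists are vectors too (see below).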

Generic Vectors

Lists are vectors which contain arbitrary objects (not restricted to one type).
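A sketch of the atomic types, coercion to a common type, and a list holding mixed objects.

```r
sapply(list(TRUE, 1L, 1.5, 1i, "a", as.raw(1)), typeof)
# "logical" "integer" "double" "complex" "character" "raw"
mixed <- c(1L, 2.5, "x")   # atomic vectors coerce to one common type
typeof(mixed)              # "character"
lst <- list(1L, "x", sum)  # a list keeps each element's own type
sapply(lst, typeof)        # "integer" "character" "builtin"
```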

Arrays and Matrices

These have a fixed size described in their dim attribute.

A matrix is implemented on top of a single vector whose length is the product of the values in dim; the elements are stored in column-major order.

The dimnames attribute labels the matrix dimensions.
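A sketch showing that a matrix is just a vector plus dim (and optionally dimnames) attributes; the labels are arbitrary.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("A", "B", "C")))
dim(m)                   # 2 3
attributes(m)            # $dim and $dimnames
v <- 1:6
dim(v) <- c(2, 3)        # setting dim on a plain vector makes it a matrix
identical(unname(m), v)  # TRUE: same data, stored column by column
m["r2", "B"]             # 4
```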

Data Frames

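A data frame is a labelled cases-by-variables table of data: a list of equal-length column vectors (the variables), with one row per case. Unlike a matrix, different columns can have different types.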

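To sort one, index its rows with order(), e.g. df[order(df$col), ]. order() itself is a general function that works on any vector.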
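A sketch of a data frame as a list of columns, sorted with order(); the column names are made up.

```r
df <- data.frame(name = c("b", "a", "c"), score = c(2, 3, 1),
                 stringsAsFactors = FALSE)
is.list(df)              # TRUE: a list of equal-length columns
dim(df)                  # 3 2
df[order(df$score), ]    # rows sorted by score, ascending
df[order(-df$score), ]   # descending
```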

Getting help

  • help(package=MyPackage)
  • help(functionName)

? is the help operator.

Packages

R packages can include tests and demos.

R and Emacs

Emacs Speaks Statistics

ESS Manual

ESS gives you an interactive shell. It also keeps a transcript.

M-x R runs an R process.

C-c C-l loads an R file using source().

C-c C-e C-d is M-x ess-dump-object-into-edit-buffer.

C-c C-v is M-x ess-display-help-on-object.

Tab first attempts to indent, then does command completion.

Installations

sudo aptitude install ess r-recommended

I might want to add this to .emacs:

(setq ess-history-file nil)

Do I need to install auto-complete?

Documentation

It can write R documentation files. It can also write inline structured comments using Roxygen.

Org-Babel

Org-babel will use an ESS buffer.

Docs

Testing

Use the testthat package. Install it with:

install.packages("testthat")

Plyr

See 'The Split-Apply-Combine Strategy for Data Analysis' and plyr on GitHub.

Function names look like 'ioply', where i is the input type and o is the output type. Each can be a for array, d for data frame or l for list; the output can also be _, meaning the output is discarded.

e.g. ddply goes from data-frame to data-frame.

Naming

The . function is used to capture variable names at various points.

You can use it with plain variables .(a, b, c), with expressions .(a * b), and with name assignments .(product = a * b).

m*ply

There is a family of functions m*ply which behaves like mapply.

r*ply

There is a family of functions r*ply which evaluates an expression repeatedly, which is useful for random sampling and simulation.

splat

Takes a function which takes multiple arguments. Returns a function which takes a list.

each

Takes a list of functions and returns a function that applies each of them to its input and returns the results together, so:

each(max, min) returns a function which computes:

c(max = max(x), min = min(x))

colwise

Takes a function of a vector. Returns a function of a data-frame.

failwith

Takes a default value and a function; returns a function that returns the default instead of throwing an error.

ggplot2

A Layered Grammar of Graphics

Differs from Leland Wilkinson's book 'The Grammar of Graphics'.

Layered data.

Hierarchy of defaults.

Position, style etc. are aesthetics. We convert data into aesthetic units by applying a scale and coordinate system.

We may also wish to apply a statistical transformation such as binning or aggregating.

Defaults

There is a cascading design where everything has defaults, and you can override things at multiple levels.

Faceting

Faceting is splitting a dataset into subsets and plotting each subset in its own panel.

Layers

Layers allow us to include, for example, a scatterplot and a smoothed line on the same graphic.

A layer has a geometry, a statistical transformation, a position adjustment, a dataset and a set of aesthetic mappings.

Great, well that's certainly cleared that up.

Syntax

We use + to put pieces together, starting from a ggplot() call.

We use the aes() command to specify our aesthetics.

We use ..property.. when referring to variables computed by a statistic, e.g. ..count.. (newer ggplot2 versions prefer after_stat(count)).

Histogram Example

A histogram is defined thusly:

ggplot(data = diamonds, mapping = aes(price)) + layer(geom = "bar", stat = "bin", position = "identity", mapping = aes(y = ..count..))

Parsing

deparse() turns objects and unevaluated expressions into character strings of R code; parse() goes the other way, from text to expressions.
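A round-trip sketch: deparse a call to text, parse it back, and evaluate it with some example values.

```r
code <- deparse(quote(x + y * 2))  # call -> character string
code                               # "x + y * 2"
deparse(c(a = 1, b = 2))           # "c(a = 1, b = 2)"
back <- parse(text = code)[[1]]    # character string -> call
eval(back, list(x = 1, y = 3))     # 7
```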