See all the extra notes in advanced R.
as.character(NA) means a missing character value.
# like this.
R is lexically scoped, but you can also look at evaluation frames.
Quoting and Evaluation
D() computes the symbolic derivative of an expression; eval() does eval.
expression() does quote.
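A minimal sketch of the quote-then-evaluate round trip in base R. The variable names are illustrative only.

```r
# Capture an expression without evaluating it.
e <- quote(x^2 + 3 * x)

# Evaluate it with x supplied from a list acting as the environment.
value <- eval(e, list(x = 2))      # 2^2 + 3*2 = 10

# D() returns another unevaluated call: the derivative with respect to x.
d <- D(e, "x")
slope <- eval(d, list(x = 2))      # derivative 2*x + 3 at x = 2 is 7

# expression() builds a vector of unevaluated expressions.
ex <- expression(a + b, a * b)
results <- vapply(
  seq_along(ex),
  function(i) eval(ex[[i]], list(a = 2, b = 5)),
  numeric(1)
)
```

Nothing is computed until eval() is called, so the same captured expression can be evaluated against different values.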
options(OPT = val) to set these.
Syntactic names start with a letter or a dot (a leading dot may not be followed by a digit). They may also include numbers and underscores. Alternatively, you can use just about anything if you quote it with backticks.
for (x in y) all behave as normal.
switch(f, listOfExpressions) is a function which will use the output of f as either a list index or a dictionary key.
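A small sketch of both selection modes of switch(). The function names describe_kind and pick are made up for illustration.

```r
# Character selector: matched against argument names, like a dictionary key.
describe_kind <- function(kind) {
  switch(kind,
    mean = "arithmetic mean",
    median = "middle value",
    "unknown"                 # unnamed trailing argument acts as the default
  )
}

# Numeric selector: picks the argument at that position, like a list index.
pick <- function(i) switch(i, "first", "second", "third")

describe_kind("median")
pick(2)
```

With a numeric selector there is no default: an out-of-range index yields NULL.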
We include files using
source(). This has an
We load a package using
all.equal() compares things, allowing a tolerance.
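A short sketch of why the tolerance matters, using the usual floating-point example.

```r
x <- 0.1 + 0.2

exact <- x == 0.3                    # FALSE: binary rounding error
close <- isTRUE(all.equal(x, 0.3))   # TRUE: equal within tolerance
```

all.equal() returns a description of the difference rather than FALSE when the values differ, so wrapping it in isTRUE() is the safe way to use it in a condition.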
x:y can be used to create a range (which is a Vector).
tryCatch() around an expression.
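A minimal sketch of wrapping an expression in tryCatch(). The helper safe_log is hypothetical; it maps both warnings and errors to NA.

```r
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,  # e.g. log(-1) warns about NaNs
    error = function(e) NA_real_     # e.g. log("a") is an error
  )
}

safe_log(1)
safe_log(-1)
safe_log("a")
```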
browser() halts and passes control to the user so they can inspect things.
undebug() do something similar?
untrace() hook a function.
cat() shows a string unescaped.
These are enumerations.
They may be ordered.
typeof returns the type.
Objects have attributes, a key-value map. attributes() returns the whole map as a list, and attr() reads or sets a single entry.
<- is assignment.
NULL is a singleton. Test for it with
Defined using a
Declare our function body as
UseMethod('methodName', dispatchArgument). If dispatchArgument is not specified, we use the first argument of the function itself instead.
myFunction.class <- function(args) classSpecificBody
Potentially you could keep your methods on some other object, but that would be confusing.
NextMethod provides an inheritance mechanism.
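A sketch of S3 dispatch and inheritance as described above. The generic describe_obj and the classes dog and animal are invented for the example.

```r
# The generic dispatches on the class of its first argument.
describe_obj <- function(x) UseMethod("describe_obj")

# Fallback used when no class-specific method is found.
describe_obj.default <- function(x) "an object"

# Each method adds its part, then hands over to the next class in line.
describe_obj.animal <- function(x) paste("an animal;", NextMethod())
describe_obj.dog <- function(x) paste("a dog;", NextMethod())

# The class attribute is a vector, most specific class first.
d <- structure(list(), class = c("dog", "animal"))
described <- describe_obj(d)   # "a dog; an animal; an object"
```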
We make symbols using
They are used as names.
R has lazy evaluation. This is mostly hidden away from us.
The value in a promise is empty until the first time we look at it.
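A sketch that makes the laziness visible. The names f, g and events are illustrative.

```r
# y is never used, so its promise is never forced and stop() never runs.
f <- function(x, y) x * 2
result <- f(21, stop("never evaluated"))

# Record the order of events to see when an argument is actually evaluated.
events <- character()
g <- function(arg) {
  events <<- c(events, "entered g")
  force(arg)                          # first use forces the promise
  events <<- c(events, "leaving g")
  invisible(NULL)
}
g({
  events <- c(events, "argument evaluated")
  1
})
```

The argument block runs only after g has started, which is the promise being forced on first use.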
Typing the name of a function in the REPL shows the source code.
Function objects are closures. They contain an arg list, a body, and an environment. Their arguments are supplied to them as promises.
The function body could point to all sorts of things, e.g. a symbol or a constant.
There isn't a difference between operators and functions, although operators usually need to be quoted if we want to use them as functions due to having nonstandard names. This even applies to indexing.
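A brief sketch of calling operators, including indexing, as ordinary functions by quoting their names with backticks.

```r
sum_a <- 1 + 2
sum_b <- `+`(1, 2)                 # same call, written as a function

x <- c(10, 20, 30)
second <- `[`(x, 2)                # same as x[2]

# Because they are functions, operators can be passed to other functions.
seconds <- sapply(list(1:3, 4:6), `[`, 2)
total <- Reduce(`+`, 1:4)
```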
Test for these with
is.primitive. These use call-by-reference, and so are confusing.
Higher Order functions
Vectorize() works on a scalar function.
outer() does a full join, thus creating two large vectors of all the combinations. It then applies a function to this.
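A sketch of both helpers. The function clamp is a made-up scalar function.

```r
# A scalar-only function: `if` needs a single TRUE/FALSE.
clamp <- function(x, limit) if (x > limit) limit else x

# Vectorize() wraps it so it is applied element by element.
vclamp <- Vectorize(clamp)
clamped <- vclamp(1:5, 3)          # 1 2 3 3 3

# outer() calls the function once on two long vectors holding every pair,
# then shapes the result into a length(x) by length(y) matrix.
products <- outer(1:3, 1:4, "*")
gaps <- outer(1:3, 1:4, function(a, b) abs(a - b))
```

The function given to outer() has to be vectorised itself, since it is called once rather than once per pair.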
function (arglist) body
Where body is usually in braces.
We can specify default arguments. These are evaluated in the function's own frame.
We can use
... to gather multiple arguments into a pairlist.
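A sketch of default arguments and of `...`. The function names are illustrative.

```r
# The default for `offset` refers to another argument. That works because
# defaults are evaluated inside the function's own frame, on first use.
scale_by <- function(x, factor = 2, offset = length(x)) {
  x * factor + offset
}
scaled <- scale_by(c(1, 2, 3))     # 1:3 * 2 + 3 gives 5 7 9

# `...` collects any further arguments; list(...) turns them into a list.
count_args <- function(...) length(list(...))
arg_names <- function(...) names(list(...))

n <- count_args(1, "a", TRUE)
nms <- arg_names(a = 1, b = 2)
```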
Each expression contains one or more statements, which can be accessed as a list.
These are already parsed, but not evaled.
Contains a frame (key-value dict) and an enclosure (pointer to parent).
emptyenv() is the top-most parent.
Environments are re-used. Modifying them may have effects in many places.
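A sketch of the reference behaviour of environments, contrasted with a list, which is copied on modification.

```r
e1 <- new.env()
e2 <- e1                           # both names refer to the same environment
assign("x", 1, envir = e1)
e2$x <- 42                         # visible through e1 as well
seen_in_e1 <- get("x", envir = e1)

l1 <- list(x = 1)
l2 <- l1
l2$x <- 42                         # l1 is unaffected
seen_in_l1 <- l1$x

# Lookup follows the enclosure when the frame itself has no such key.
child <- new.env(parent = e1)
inherited <- get("x", envir = child)
own <- exists("x", envir = child, inherits = FALSE)
```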
If you do a vector operation, the length of the longest vector will be used. The other vectors will be repeated as necessary. If any vector has length 0, this rule is reversed and the result has length 0.
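A short sketch of the recycling rule.

```r
# The shorter vector is repeated: c(10, 20) becomes c(10, 20, 10, 20).
recycled <- c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24

# A single number is just a length-1 vector, recycled the same way.
doubled <- 1:6 * 2

# A zero-length operand wins: the result is empty.
empty <- 1:3 + numeric(0)
```

R warns if the longer length is not a multiple of the shorter one, but still recycles.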
reshape() which is generally powerful.
which() to filter based on an expression.
We can also use
subset(). What's the difference?
Everything is 1-indexed.
[ returns a list. It can accept other vectors as an argument. [[ returns a single element. $ works on recursive cell-based data structures. It doesn't allow numerical indexing - only by characters/symbols.
We don't have to specify all of our indexes - we can leave some out to say 'all of these'. e.g.
x[,1] selects every row of the first column of a matrix.
We can also pass in vectors or matrices of indexes.
You can index by integer, true/false, character (checked against the names attributes), or Factor (converted to int).
You can modify the indexing functions to have special behaviour for your classes. Dataframes do this.
Matrices may be automatically flattened into vectors. You may use
drop = FALSE as an extra argument to prevent this.
R can also have size 0 vectors and matrices: indexing with 0, as in x[0] or x[,0], produces one.
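A sketch pulling the indexing rules together on a small matrix, a named vector and a list. All object names are illustrative.

```r
m <- matrix(1:6, nrow = 2)          # 2 x 3, filled column by column

col_vec <- m[, 2]                   # row index left out: whole column, flattened
col_mat <- m[, 2, drop = FALSE]     # stays a 2 x 1 matrix
no_cols <- m[, 0]                   # a 2 x 0 matrix

v <- c(a = 1, b = 2, c = 3)
by_logical <- v[c(TRUE, FALSE, TRUE)]
by_name <- v["b"]

lst <- list(a = 1, b = "two")
sub_list <- lst["a"]                # still a list
element <- lst[["a"]]               # the element itself
by_dollar <- lst$b
```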
- Setting NULLs
If x is a list (a generic vector) and i an index into it,
x[i] <- NULL removes an item, while
x[i] <- list(NULL) sets it to NULL.
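A sketch of the two assignments on a small list.

```r
x <- list(a = 1, b = 2, c = 3)
x["b"] <- NULL                      # removes the element entirely

y <- list(a = 1, b = 2, c = 3)
y["b"] <- list(NULL)                # keeps the slot, with NULL as its value
```

After this, x has two elements and y still has three.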
Vectors may have a
names attribute which can be used as an alternative way to index its contents.
Vectors can include booleans, lists, and integer, floating and complex numbers.
Lists are vectors which contain arbitrary objects (not restricted to one type).
Arrays and Matrices
These have a fixed size described in their
A matrix will be implemented on top of a single vector of the same length as the product of the values in
dimnames attribute labels the matrix dimensions.
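A sketch showing that a matrix is a plain vector plus attributes. The row and column labels are made up.

```r
v <- 1:6
dim(v) <- c(2, 3)                   # the same six values, now a 2 x 3 matrix

dimnames(v) <- list(c("r1", "r2"), c("a", "b", "c"))

cell <- v["r2", "b"]                # column-major fill puts 4 here
same_length <- length(v) == prod(dim(v))
```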
This is a general thing which is a labelled cases-by-variables matrix of data. Whatever that means?
They have an
? is the help operator.
R packages can include tests and demos.
R and Emacs
Emacs Speaks Statistics
ESS gives you an interactive shell. It also keeps a transcript.
M-x R runs an R process.
C-c C-l loads an R file using
C-c C-e C-d is
C-c C-v is
Tab first attempts to indent, then does command completion.
sudo aptitude install ess r-recommended
I might want to add this to .emacs:
(setq ess-history-file nil)
Do I need to install auto-complete?
It can write R documentation files. It can also write inline structured comments using Roxygen.
Org-babel will use an ESS buffer.
R Internals is probably mostly not that useful.
Names are like 'ioply', where i is input and o is output. These can be a for array, d for data frame, l for list, or _ for discarded output.
ddply goes from data-frame to data-frame.
. function is used to capture variable names at various points.
You can use it with plain variables
.(a, b, c), with functions
.(a * b), and with name assignments
.(product = a * b).
There is a family of functions
m*ply which behaves like mapply.
There is a family of functions
r*ply which is used for sampling.
Takes a function which takes multiple arguments. Returns a function which takes a list.
Takes a list of functions, returns a function that returns a named vector with the result of each of its inputs, so:
each(max, min) returns a function which makes:
c(max = max(x), min = min(x))
Takes a function of a vector. Returns a function of a data-frame.
Replace errors with a default value.
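The function operators described above can be imitated in base R. This is only a sketch of the ideas, not plyr's implementation; the names my_splat, my_each and my_failwith are hypothetical, and my_each assumes its arguments are named and return a single number.

```r
# Takes a function of several arguments, returns a function of one list.
my_splat <- function(f) function(args) do.call(f, args)

# Takes named functions, returns a function giving a named vector of results.
my_each <- function(...) {
  fs <- list(...)
  function(x) vapply(fs, function(f) f(x), numeric(1))
}

# Wraps f so that an error yields `default` instead of stopping.
my_failwith <- function(default, f) {
  function(...) tryCatch(f(...), error = function(e) default)
}

add <- my_splat(function(a, b) a + b)
range_of <- my_each(max = max, min = min)
safe_fail <- my_failwith(NA, function(x) stop("boom"))
```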
Differs from 'The Grammar of Graphics' book.
Hierarchy of defaults.
Position, style etc. are aesthetics. We convert data into aesthetic units by applying a scale and coordinate system.
We may also wish to apply a statistical transformation such as binning or aggregating.
There is a cascading design where everything has defaults, and you can override things at multiple levels.
Faceting is splitting a dataset into subsets.
Layers allow us to include, for example, a scatterplot and a smoothed line on the same graphic.
A layer has a geometry, a statistical transformation, a position adjustment, a dataset and a set of aesthetic mappings.
Great, well that's certainly cleared that up.
+ to put pieces together, starting from a
We use the
aes() command to specify our aesthetics.
..property.. when referring to variables computed by a statistic.
A histogram is defined thusly:
ggplot(data = diamonds, mapping = aes(price)) + layer(geom = "bar", stat = "bin", position = "stack", mapping = aes(y = ..count..))
deparse lets us turn unevaluated expressions into character strings.
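A sketch of deparse(), alone and combined with substitute() to recover the text of an argument. The function label_of is hypothetical.

```r
e <- quote(a * (b + 1))
as_text <- deparse(e)               # "a * (b + 1)"

# substitute() captures the argument unevaluated; deparse() makes it a string.
label_of <- function(x) deparse(substitute(x))
label <- label_of(height + width)   # the variables need not exist
```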