See all the extra notes in Advanced R.
as.character(NA) means a missing character value.
Comments start with # and run to the end of the line.
Scope
R is lexically scoped, but you can also look at evaluation frames.
Quoting and Evaluation
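eval()
evaluates an expression.
quote() and expression()
return their arguments unevaluated.
D()
takes an unevaluated expression and returns its symbolic derivative, also unevaluated.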
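A small sketch of how quoting, evaluating and D() fit together, using only base R. The variable names are made up for illustration.

```r
# quote() captures code without running it
e <- quote(x^2 + 1)

# eval() runs it, here with x supplied from a list
value <- eval(e, list(x = 3))

# D() differentiates the unevaluated expression and hands back another
# unevaluated expression, which still has to be eval()'d to get a number
slope_expr <- D(e, "x")
slope <- eval(slope_expr, list(x = 3))
```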
Global Config
Use options(OPT = val)
to set these.
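For example, a minimal sketch that changes one option temporarily and then puts it back. The choice of the digits option is just for illustration.

```r
old <- options(digits = 4)      # options() returns the previous values
during <- getOption("digits")   # read a single option back
options(old)                    # restore whatever was set before
```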
Naming
Syntactic names start with a letter, or with a dot that is not followed by a number. They may also include numbers and underscores. Alternatively, you can use just about anything if you quote it in backticks.
Control Flow
if
, while
and for (x in y)
all behave as normal.
switch(f, listOfExpressions)
is a function which will use the output of f as either a list index or a dictionary key.
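A short sketch of both uses of switch(). The function describe and its cases are hypothetical.

```r
# Character selector: matched against the argument names,
# with a trailing unnamed argument acting as the default
describe <- function(kind) {
  switch(kind,
         circle = "round",
         square = "four sides",
         "unknown")
}

# Numeric selector: picks an argument by position
second <- switch(2, "a", "b", "c")
```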
Files
We include files using source()
. This has an echo
parameter.
We load a package using library()
.
Numbers
all.equal()
compares things, allowing a tolerance.
x:y
can be used to create a range (which is a Vector).
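A quick illustration of why the tolerance matters, plus a range. Note that all.equal() returns a description of the difference rather than FALSE when things differ, so it is usually wrapped in isTRUE().

```r
x <- 0.1 + 0.2
exact <- x == 0.3                      # floating point makes this FALSE
close <- isTRUE(all.equal(x, 0.3))     # TRUE within the default tolerance

r <- 2:5                               # an integer vector 2, 3, 4, 5
```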
Errors
Use tryCatch()
around an expression.
stop()
signals an error.
warning()
signals a warning.
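A minimal sketch of handling both kinds of condition. The function safe_log is hypothetical and simply returns NA when anything goes wrong.

```r
safe_log <- function(x) {
  tryCatch({
    if (!is.numeric(x)) stop("x must be numeric")
    log(x)                      # log() of a negative number raises a warning
  },
  warning = function(w) NA_real_,
  error = function(e) {
    message("caught: ", conditionMessage(e))
    NA_real_
  })
}
```

The first handler whose class matches the condition wins, so the warning from log(-1) never reaches the error handler.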
Debugging
browser()
halts and passes control to the user so they can inspect things.
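debug()
flags a function so that every call to it steps through its body in the browser.
undebug()
removes that flag.
trace()
and untrace()
insert and remove extra code (such as a call to browser()) in a function without editing its source.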
Strings
cat()
shows a string unescaped.
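To see the difference from print(), here is a small sketch that captures what each one writes. capture.output() is only used so the two results can be compared.

```r
s <- "one\ttwo"

printed <- capture.output(print(s))  # print() keeps the escape sequence visible
shown <- capture.output(cat(s))      # cat() emits a real tab character
```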
Factors
These are enumerations.
They may be ordered.
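A short sketch of an ordered factor. The level names are invented for illustration.

```r
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

codes <- as.integer(sizes)        # the underlying integer codes
smaller <- sizes[1] < sizes[2]    # comparisons follow the level order
```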
Objects
typeof()
returns the type.
Objects can carry attributes, a key-value map accessed with attributes()
and attr()
.
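For instance, a sketch that tags a vector with an arbitrary attribute. The attribute name "units" is made up.

```r
x <- 1:3
attr(x, "units") <- "cm"   # set a single attribute by name

kind <- typeof(x)          # the storage type is unaffected by attributes
all_attrs <- attributes(x) # the whole map, as a named list
```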
<-
is assignment.
NULL
Null is a singleton. Test for it with is.null()
.
Classes
Defined using a class
attribute.
Methods
Declare our function body as UseMethod('methodName', dispatchArgument)
. If dispatchArgument is not specified, we use the first argument of the function itself instead.
Write myFunction.class <- function(args) classSpecificBody
Potentially you could keep your methods on some other object, but that would be confusing.
NextMethod
provides an inheritance mechanism.
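The pieces above can be sketched as follows. The generic speak and the classes animal and dog are hypothetical.

```r
# The generic: its whole body is a call to UseMethod()
speak <- function(x, ...) UseMethod("speak")

# Methods are plain functions named generic.class
speak.animal <- function(x, ...) "generic animal noise"
speak.dog <- function(x, ...) {
  # NextMethod() moves on to the next class in the class attribute
  paste("Woof, then", NextMethod())
}

# The class attribute lists classes from most to least specific
rex <- structure(list(name = "Rex"), class = c("dog", "animal"))
cow <- structure(list(name = "Daisy"), class = "animal")
```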
Symbols
We make symbols using as.name
and quote
.
They are used as names.
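A tiny sketch showing that both routes give the same symbol, and that evaluating a symbol looks up the variable of that name. The variable total is just an example.

```r
s <- as.name("total")
same <- identical(s, quote(total))   # both produce the symbol `total`

total <- 42
looked_up <- eval(s)                 # evaluating a symbol fetches its value
```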
Promises
R has lazy evaluation. This is mostly hidden away from us.
The value in a promise is empty until the first time we look at it.
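One way to see the laziness, as a sketch: pass an argument that would fail if it were ever evaluated. The function pick is hypothetical.

```r
pick <- function(use_first, a, b) if (use_first) a else b

# b is never looked at, so its stop() call never runs
result <- pick(TRUE, "cheap", stop("never evaluated"))

# Here b is needed, so the promise is forced and the error surfaces
outcome <- tryCatch(pick(FALSE, "cheap", stop("boom")),
                    error = function(e) "failed")
```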
Functions
Typing the name of a function in the REPL shows the source code.
Function objects are closures. They contain an arg list, a body, and an environment. Their arguments are supplied to them as promises.
The function body could point to all sorts of things, e.g. a symbol or a constant.
There isn't a difference between operators and functions, although operators usually need to be quoted if we want to use them as functions due to having nonstandard names. This even applies to indexing.
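A brief sketch of both points: pulling a closure apart, and calling operators (including indexing) as ordinary functions by quoting them in backticks. The function add is made up.

```r
add <- function(a, b) a + b
parts <- list(args = formals(add),      # the arg list
              body = body(add),         # the body, an unevaluated call
              env  = environment(add))  # where the function was defined

three <- `+`(1, 2)                      # same as 1 + 2

x <- c(10, 20, 30)
twenty <- `[`(x, 2)                     # same as x[2]

# Handy with higher-order functions: take the 2nd element of each vector
seconds <- sapply(list(1:3, 4:6), `[`, 2)
```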
Primitive Functions
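Test for these with is.primitive
. These are implemented in C and have no R-level arg list, body or environment. Builtins get their arguments already evaluated (call-by-value), while specials receive them unevaluated, which can be confusing.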
Higher Order functions
Vectorize()
works on a scalar function.
outer()
does a full join, creating two long vectors that together cover every combination of its inputs. It then applies a vectorised function to these and shapes the result into an array.
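A sketch combining the two. The scalar function larger is hypothetical and deliberately not vectorised, since if only looks at a single value.

```r
larger <- function(a, b) if (a > b) a else b   # scalars only

vlarger <- Vectorize(larger)                   # now works element by element
pairwise <- vlarger(1:3, c(2, 2, 2))

# outer() expands 1:3 and 1:2 into all six combinations, calls the
# function once on the two long vectors, and reshapes to 3 x 2
grid <- outer(1:3, 1:2, vlarger)
```

Passing larger itself to outer() would not work here, because outer() expects a function that accepts whole vectors.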
Defining Functions
function (arglist) body
Where body is usually in braces.
We can specify default arguments. These are evaluated in the function's own frame.
We can use ...
to gather multiple arguments into a pairlist.
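A small sketch of both features. The function scale_by is made up; the point is that the default for by can refer to a variable that only exists inside the function, because defaults are evaluated lazily in the function's own frame.

```r
scale_by <- function(x, by = 2 * base, ...) {
  base <- 5                 # local variable, visible to the default of `by`
  extras <- list(...)       # turn the extra arguments into an ordinary list
  list(result = x * by, n_extras = length(extras))
}

with_default <- scale_by(1)
with_extras <- scale_by(1, 3, "a", "b")
```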
Expressions
Each expression contains one or more statements, which can be accessed as a list.
These are already parsed, but not evaled.
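As a sketch, an expression vector with two statements, inspected and then evaluated in a scratch environment so nothing leaks into the workspace.

```r
ex <- expression(a <- 2, a * 10)

n_statements <- length(ex)   # the statements behave like list elements
second <- ex[[2]]            # an unevaluated call: a * 10

scratch <- new.env()
last_value <- eval(ex, scratch)   # runs each statement, returns the last value
```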
Environments
Contains a frame (key-value dict) and an enclosure (pointer to parent).
emptyenv()
is the top-most parent.
Environments are re-used. Modifying them may have effects in many places.
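A sketch of both points: environments are passed by reference, and following the enclosures eventually reaches emptyenv(). The names counter and bump are hypothetical.

```r
counter <- new.env()
counter$n <- 0

bump <- function(env) {
  env$n <- env$n + 1     # changes the caller's environment, no copy is made
  invisible(env)
}
bump(counter)
bump(counter)

# Walk the enclosure chain from the global environment up to emptyenv()
steps <- 0
e <- globalenv()
while (!identical(e, emptyenv())) {
  e <- parent.env(e)
  steps <- steps + 1
}
```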
Vector
If you do a vector operation, the size of the longest vector will be used. The other vectors will be repeated as necessary. If any vector is length 0, the result has length 0 instead.
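The recycling rule in a couple of lines, as a quick sketch.

```r
recycled <- 1:6 + c(10, 20)     # c(10, 20) is repeated three times
scalar <- 1:3 * 2               # a length-1 vector is the common case
empty <- 1:6 + numeric(0)       # a zero-length operand gives a zero-length result
```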
Transforms
There's stack()
and unstack()
.
There's also reshape()
which is generally powerful.
Filtering
Use which()
to get the indices at which a logical expression is TRUE.
We can also use subset()
, which returns the matching elements or rows themselves and treats NA as no match.
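A sketch comparing the options on a vector with a missing value, which is where they differ most.

```r
x <- c(5, NA, 12, 8)

idx <- which(x > 6)          # positions where the test is TRUE; NA is skipped
via_which <- x[idx]
via_logical <- x[x > 6]      # plain logical indexing keeps an NA placeholder
via_subset <- subset(x, x > 6)
```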
Indexing
Everything is 1-indexed.
[] returns an object of the same kind as the one indexed (a sub-vector or sub-list). It can accept other vectors as an argument. [[]] returns a single element. $ works on recursive cell-based data structures. It doesn't allow numerical indexing - only by characters/symbols.
We don't have to specify all of our indexes - we can leave some out to say 'all of these'. e.g. x[, 1]
selects every row of the first column (add drop = FALSE to keep it as a single-column matrix).
We can also pass in vectors or matrices of indexes.
You can index by integer, true/false, character (checked against the names attribute), or Factor (converted to int).
You can modify the indexing functions to have special behaviour for your classes. Dataframes do this.
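A short sketch of the three indexing operators on a list, and of the different index types on a named vector. The data is invented.

```r
lst <- list(a = 1, b = "two", c = 3:5)

sub <- lst["a"]      # single brackets: still a list, of length 1
one <- lst[["a"]]    # double brackets: the element itself
same <- lst$a        # $ is shorthand for [[ ]] with a name

v <- c(first = 10, second = 20, third = 30)
by_pos <- v[c(1, 3)]                   # integer positions
by_lgl <- v[c(TRUE, FALSE, TRUE)]      # logical mask
by_name <- v[c("first", "third")]      # matched against names(v)
```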
- Flattening
Matrices may be automatically flattened into vectors. You may use
drop = FALSE
as an extra argument to prevent this. R can also have size 0 vectors and matrices; indexing with 0, as in x[, 0], produces them.
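A sketch of what happens to the dimensions in each case, using a small made-up matrix.

```r
m <- matrix(1:6, nrow = 2)

flat <- m[, 1]                  # dropped to a plain vector
kept <- m[, 1, drop = FALSE]    # stays a 2 x 1 matrix
none <- m[, 0]                  # a matrix with zero columns
```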
- Setting NULLs
If x is a list and i an index into it,
x[i] <- NULL
removes an item, while
x[i] <- list(NULL)
sets it to NULL.
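A quick sketch of the two assignments, assuming x is a list (on an atomic vector, assigning NULL is an error).

```r
x <- list(a = 1, b = 2, c = 3)

x["b"] <- NULL          # removes the element entirely
x["a"] <- list(NULL)    # keeps the slot, but its value is now NULL

remaining <- names(x)
```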
Names
Vectors may have a names
attribute which can be used as an alternative way to index its contents.
Vector Types
Atomic vectors hold booleans, integer, floating and complex numbers, character strings or raw bytes. Lists count as vectors too.
Generic Vectors
Lists are vectors which contain arbitrary objects (not restricted to one type).
Arrays and Matrices
These have a fixed size described in their dim
attribute.
A matrix will be implemented on top of a single vector of the same length as the product of the values in dim
.
The dimnames
attribute labels the matrix dimensions.
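To make the "matrix on top of a vector" idea concrete, a sketch that turns a plain vector into a labelled matrix just by setting attributes. The labels are made up.

```r
v <- 1:6
dim(v) <- c(2, 3)       # now a 2 x 3 matrix, filled column by column
dimnames(v) <- list(c("r1", "r2"), c("a", "b", "c"))

cell <- v["r2", "b"]    # index by the labels
```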
Data Frames
A data frame is a labelled cases-by-variables table: each row is a case (an observation) and each column is a variable. Columns may have different types but all have the same length.
Sort their rows by indexing with order()
.
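A minimal sketch of building a data frame and sorting it with order(). The column names and values are invented.

```r
df <- data.frame(name = c("b", "a", "c"),
                 score = c(2, 3, 1),
                 stringsAsFactors = FALSE)

# order() returns the row permutation that sorts the column
ascending <- df[order(df$score), ]
descending <- df[order(-df$score), ]
```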
Getting help
help(package=MyPackage)
help(function)
?
is the help operator.
Packages
R packages can include tests and demos.
R and Emacs
Emacs Speaks Statistics
ESS gives you an interactive shell. It also keeps a transcript.
M-x R
runs an R process.
C-c C-l
loads an R file using source()
.
C-c C-e C-d
is M-x ess-dump-object-into-edit-buffer
.
C-c C-v
is M-x ess-display-help-on-object
.
Tab first attempts to indent, then does command completion.
Installations
sudo aptitude install ess r-recommended
I might want to add this to .emacs:
(setq ess-history-file nil)
Do I need to install auto-complete?
Documentation
ESS can write R documentation files. It can also write inline structured comments using Roxygen.
Org-Babel
Org-babel will use an ESS buffer.
Docs
R Language Definition (I got to https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Control-structures).
R Internals is probably mostly not that useful.
Testing
Use testthat with these instructions.
install.packages("testthat")
Plyr
Split-Apply-Combine for data analysis. See plyr on GitHub.
Names are like 'ioply', where i is input and o is output. These can be a for array, d for data frame, l for list, or _ for discarded output.
e.g. ddply
goes from data-frame to data-frame.
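To make the split-apply-combine idea concrete without needing plyr installed, here is a rough base R sketch of the same three steps going from data frame to data frame. It illustrates the pattern only and is not how plyr itself is implemented; the data and column names are made up.

```r
df <- data.frame(group = c("a", "b", "a", "b"),
                 value = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Split: one data frame per group
pieces <- split(df, df$group)

# Apply: summarise each piece into a one-row data frame
summaries <- lapply(pieces, function(p) {
  data.frame(group = p$group[1],
             mean_value = mean(p$value),
             stringsAsFactors = FALSE)
})

# Combine: stack the rows back into a single data frame
result <- do.call(rbind, summaries)
```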
Naming
The .
function is used to capture variable names at various points.
You can use it with plain variables .(a, b, c)
, with functions .(a * b)
, and with name assignments .(product = a * b)
.
m*ply
There is a family of functions m*ply
which behaves like mapply.
r*ply
There is a family of functions r*ply
which evaluates an expression a fixed number of times, much like replicate(). This is handy for sampling and simulation.
splat
Takes a function which takes multiple arguments. Returns a function which takes a list.
each
Takes several functions and returns a single function that applies each of them to its input and collects the results in a named vector, so:
each(max, min)
returns a function which makes:
c(max = max(x), min = min(x))
colwise
Takes a function of a vector. Returns a function of a data-frame.
failwith
Replace errors with a default value.
ggplot2
Differs from the book 'The Grammar of Graphics'.
Layered data.
Hierarchy of defaults.
Position, style etc. are aesthetics. We convert data into aesthetic units by applying a scale and coordinate system.
We may also wish to apply a statistical transformation such as binning or aggregating.
Defaults
There is a cascading design where everything has defaults, and you can override things at multiple levels.
Faceting
Faceting is splitting a dataset into subsets.
Layers
Layers allow us to include, for example, a scatterplot and a smoothed line on the same graphic.
A layer has a geometry, a statistical transformation, a position adjustment, a dataset and a set of aesthetic mappings.
Great, well that's certainly cleared that up.
Syntax
We use +
to put pieces together, starting from a ggplot()
call.
We use the aes()
command to specify our aesthetics.
We use ..property..
when referring to variables computed by a statistic.
Histogram Example
A histogram is defined thusly:
ggplot(data = diamonds, mapping = aes(price)) + layer(geom = "bar", stat = "bin", position = "stack", mapping = aes(y = ..count..))
Parsing
deparse()
turns unevaluated expressions into character strings. parse()
goes the other way, from text to expressions.
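A small round trip between code and text, as a sketch. The expression is arbitrary.

```r
e <- quote(a + b * 2)

as_text <- deparse(e)                    # code to character string

parsed <- parse(text = "a + b * 2")[[1]] # text back to an unevaluated call
value <- eval(parsed, list(a = 1, b = 2))
```

parse() returns an expression vector, which is why the first element is extracted with [[1]] before evaluating.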