Advanced R
Data structures
Vectors
Vectors have a type, a length, and attributes.
Atomic vectors (containing atomic values) are always flat.
Lists
Lists can contain things of different types, and are recursive.
Factors
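Modify the levels of a factor by assigning to levels: levels(myFactor) <- myNewLevels.
This changes what the data means as well, because a factor stores its data as integer codes c(1, 2, 3, ...) that index into the levels. The codes are unchanged, so each element picks up whatever new label sits at its old position.
To change the order of the levels without relabelling the data, use relevel, which moves one level to the front and remaps the codes to match.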
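A small sketch of the difference, using a made-up factor (the values and variable names are just for illustration):

```r
f <- factor(c("low", "high", "mid", "low"))
levels(f)      # "high" "low" "mid" (sorted by default)
as.integer(f)  # 2 1 3 2 (the underlying codes)

# Assigning to levels() keeps the codes and swaps the labels,
# so the data now reads differently.
relabelled <- f
levels(relabelled) <- c("low", "mid", "high")
as.character(relabelled)  # "mid" "low" "high" "mid"

# relevel() moves one level to the front and remaps the codes,
# so the data still reads the same.
releveled <- relevel(f, ref = "low")
levels(releveled)         # "low" "high" "mid"
as.character(releveled)   # "low" "high" "mid" "low"
```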
Arrays and Matrices
Arrays are nD, matrices are 2D.
- Length and Names
Vectors have length and names.
Matrices have ncol, nrow, colnames, rownames.
Arrays have dim, dimnames.
- Combine
Vectors have c()
Matrices have rbind() and cbind()
Arrays have abind() (from the abind package, not base R)
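A quick sketch of these properties in base R only (abind() is left out because it needs the abind package; the example values are arbitrary):

```r
# Vectors: length, names, c()
v <- c(a = 1, b = 2)
v <- c(v, c = 3)
length(v)  # 3
names(v)   # "a" "b" "c"

# Matrices: nrow, ncol, rbind(), cbind()
m <- rbind(c(1, 2), c(3, 4))
m <- cbind(m, c(5, 6))
colnames(m) <- c("x", "y", "z")
nrow(m)  # 2
ncol(m)  # 3

# Arrays: dim, dimnames
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)  # 2 3 4
```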
- List-Arrays
You can set the above properties on a list to make some weird data structures.
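For instance, giving a list a dim attribute produces a matrix whose cells can hold arbitrary objects. A minimal sketch with made-up contents:

```r
cells <- list(1L, "a", TRUE, c(2.5, 3.5))
dim(cells) <- c(2, 2)  # filled column by column, like any matrix

is.list(cells)    # TRUE
is.matrix(cells)  # TRUE
cells[[2, 1]]     # "a"
cells[[2, 2]]     # the numeric vector c(2.5, 3.5)
```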
- Data frames
A data frame is a list of equal-length vectors. Data frames share a lot of properties with matrices, but allow each column to have a different type.
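A short sketch showing both sides of this (list-like and matrix-like), with invented column names:

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)        # TRUE: a list with one element per column
length(df)         # 2 (columns, as for a list)
dim(df)            # 3 2 (rows and columns, as for a matrix)
sapply(df, class)  # id is "integer", name is "character"
```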
TODO Subsetting
Vocab
Functions which I need to learn:
str
Indexing
head, tail, subset
order takes a number of vectors. It returns the permutation of indices that sorts the first vector, with ties broken by the subsequent vectors.
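A tiny worked example of the tie-breaking, with made-up data:

```r
grade <- c("b", "a", "b", "a")
score <- c(10, 30, 5, 20)

# Sort by grade first; within each grade, sort by score.
idx <- order(grade, score)
idx         # 4 2 3 1
grade[idx]  # "a" "a" "b" "b"
score[idx]  # 20 30 5 10
```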
Variables
assign lets you set a variable (in a particular environment).
get looks up a variable in an environment.
with(data, f) creates an environment from data, making its names available as variables, then evaluates the expression f in it.
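A minimal sketch of all three, using a throwaway environment and data frame:

```r
e <- new.env()
assign("x", 10, envir = e)   # same effect as e$x <- 10
x_value <- get("x", envir = e)
x_value                      # 10

df <- data.frame(a = 1:3, b = c(10, 20, 30))
total <- with(df, sum(a * b))  # a and b are looked up inside df
total                          # 140
```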
Comparison
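a %in% b tests, element by element, whether each value of a occurs in b.
match(a, b) returns, for each element of a, the position of its first match in b (NA if there is none).
all.equal tests near equality (even across arrays etc.). It returns TRUE or a description of the differences, so wrap it in isTRUE() when using it in a condition.
identical tests exact equality and returns just a single TRUE or FALSE, no vectors involved.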
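A side-by-side sketch of the four, on small example vectors:

```r
a <- c("x", "q", "z")
b <- c("z", "y", "x")

a %in% b     # TRUE FALSE TRUE
match(a, b)  # 3 NA 1

# Floating point: nearly equal, but not bit-for-bit identical.
near  <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE
exact <- identical(0.1 + 0.2, 0.3)          # FALSE
```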
Style Guide
TODO Functions
Objects
http://adv-r.had.co.nz/OO-essentials.html
I() marks an object to be treated "as is", which prevents it being converted (e.g. by data.frame()).
There's:
- S3 (lax)
- S4 (strict)
- RC (Java-style, mutable)
S3
You can assign a class to an object.
You can define methods for a class. Methods belong to the generic function, not to the object.
You can create a generic that dispatches on class, using UseMethod.
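A minimal S3 sketch covering all three steps. The generic describe and the class circle are hypothetical names invented for this example:

```r
# A generic: dispatches on the class of its first argument.
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions named generic.class.
describe.default <- function(x, ...) "something"
describe.circle  <- function(x, ...) paste("circle of radius", x$r)

# Assign a class to an object.
shape <- structure(list(r = 2), class = "circle")

describe(shape)  # "circle of radius 2"
describe(1:3)    # "something"
```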
Environments
http://adv-r.had.co.nz/Environments.html
Environments bind names to values, much like in a list or data frame. They're all kinds of associative array.
Environments are not copied on assignment.
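A small sketch contrasting this with a list, which does behave as if copied:

```r
e1 <- new.env()
e1$count <- 1
e2 <- e1          # both names now point at the same environment
e2$count <- 99
e1$count          # 99

l1 <- list(count = 1)
l2 <- l1          # lists have copy semantics
l2$count <- 99
l1$count          # 1
```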
Functions have:
- evaluation environment (temp variables)
- enclosing environment (environment(f), where the function was defined)
- parent frame (calling environment)
## These look up the call stack: the current call, and the frame number of the caller.
sys.call(0)
sys.parent(1)
## These return the whole call stack.
sys.calls(); sys.frames(); sys.parents()
## This bundles the above in a user-friendly way.
sys.status()
## This gets the current evaluation environment (when called inside a function)
environment()
## This gets the enclosing environment
parent.env(environment())
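To see the three environments side by side, here is a sketch with hypothetical functions make_reporter and caller_fn, invented for this example:

```r
make_reporter <- function() {
  function() {
    evaluation <- environment()            # fresh on every call
    enclosing  <- parent.env(evaluation)   # where this function was defined
    caller     <- parent.frame()           # where this function was called from
    list(evaluation = evaluation, enclosing = enclosing, caller = caller)
  }
}

reporter  <- make_reporter()
caller_fn <- function() reporter()
envs      <- caller_fn()

identical(envs$enclosing, environment(reporter))  # TRUE
identical(envs$evaluation, envs$enclosing)        # FALSE
```

The caller environment here is the evaluation environment of caller_fn, which is unrelated to where reporter was defined.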
TODO Debugging
Weird Evaluation
http://adv-r.had.co.nz/Computing-on-the-language.html
(got as far as plyr::arrange() exercise)
There's the pryr library to help with this.
Function arguments are promises. substitute captures the expression behind a promise instead of evaluating it.
quote is simpler than substitute. It doesn't go looking for the name of the thing in the environment, it just gets it as is. eval and quote are opposites.
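evalq is quote wrapped in eval: evalq(x, env) is eval(quote(x), env). It's useful for evaluating a literal expression in another environment without quoting it yourself.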
deparse takes an expression and makes a string. It's often useful when building error messages.
You can pass the enclos argument to eval to choose where lookup continues when envir is a list or data frame and a name is not found in it.
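The pieces above fit together roughly like this. A minimal sketch; show_arg, show_name and the environment e are names invented for the example:

```r
# substitute() sees the expression the caller typed; quote() only sees `x`.
show_arg  <- function(x) deparse(substitute(x))
show_name <- function(x) deparse(quote(x))
show_arg(a + b)   # "a + b"
show_name(a + b)  # "x"

# eval() undoes quote().
three <- eval(quote(1 + 2))  # 3

# evalq() quotes its first argument for you.
e <- new.env()
e$v <- 5
e$z <- 100
ten <- evalq(v * 2, e)       # 10

# With a list as envir, names not found there are looked up in enclos.
looked_up <- eval(quote(a + z), list(a = 1), enclos = e)  # 101
```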
Non-standard evaluation is good in the REPL, less so in a script? It makes things less composable? This is because substitute depends on what the caller literally typed, not on the value passed. It makes functions referentially opaque.
Assignment is hard to reason about because it uses non-standard evaluation.
Always provide an escape hatch version if you're using non-standard evaluation in a function. This allows someone to pass in an already quoted argument.
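A sketch of the escape-hatch pattern. The names filter_rows and filter_rows_q are hypothetical: the _q version does the work on a quoted expression, and the non-standard version is a thin wrapper around it.

```r
# Escape hatch: takes an already quoted condition.
filter_rows_q <- function(df, condition, env = parent.frame()) {
  keep <- eval(condition, df, env)
  df[keep, , drop = FALSE]
}

# Convenience wrapper: captures what the caller typed, then delegates.
filter_rows <- function(df, condition) {
  filter_rows_q(df, substitute(condition), parent.frame())
}

df <- data.frame(a = 1:5, b = 5:1)
threshold <- 3

interactive_result <- filter_rows(df, a > threshold)  # rows where a is 4 or 5

# Programmatic use: build the condition elsewhere, pass it in quoted.
cond <- quote(b >= 4)
programmatic_result <- filter_rows_q(df, cond)        # rows where b is 5 or 4
```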
DONE Non-standard Eval Rules
TODO Formula Objects
Need to understand ~.
TODO Expressions
http://adv-r.had.co.nz/Expressions.html
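Names and symbols are two words for the same thing. Call and language objects are sometimes confused: "language object" is the broader term and covers names, calls and expressions.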
Helper Functions
all.vars
all.names
Tests
We can use is.call, is.symbol or is.name, and is.pairlist.
Assignments, indexes, operators etc. are types of call.
Missing name symbol
Create this using quote(expr=).
Formals
The formals function lets us get and set the arguments of a function, e.g. by changing the defaults.
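For example, changing a default in place (add is a toy function made up for this sketch):

```r
add <- function(x, y = 1) x + y
names(formals(add))  # "x" "y"
before <- add(1)     # 2

formals(add)$y <- 10 # replace the default for y
after <- add(1)      # 11
```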
Calls
The first element is the function being called, usually a symbol which names it.
The other parts are arguments. You can get them out by name or position.
Pryr provides standardise_call to rewrite a call in a standard form: it uses match.call to match every argument to its full formal name, in the order of the function's formals (not alphabetically).
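A sketch of picking a call apart and standardising it. It uses base match.call directly rather than pryr, and f is a dummy function invented for the example:

```r
cl <- quote(mean(x = c(1, 2, 3), na.rm = TRUE))
cl[[1]]    # the symbol `mean`
cl[[2]]    # the call c(1, 2, 3), fetched by position
cl$na.rm   # TRUE, fetched by name

# Calls can be modified like lists, then evaluated.
cl$na.rm <- FALSE
result <- eval(cl)  # 2

# Standardising: partial and positional arguments become full names,
# ordered as in the function definition.
f <- function(first, second) NULL
std <- match.call(f, quote(f(sec = 2, 1)))
deparse(std)  # "f(first = 1, second = 2)"
```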
Pairlists
These are cons cells.
They're used for function arguments.
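A brief sketch showing where pairlists turn up (the function here is a throwaway):

```r
args <- formals(function(x, y = 2) NULL)
is.pairlist(args)  # TRUE
names(args)        # "x" "y"
args$y             # 2

# They can also be built directly and mostly behave like lists.
pl <- pairlist(a = 1, b = 2)
pl$b               # 2
as_list <- as.list(pl)
is.pairlist(as_list)  # FALSE: now an ordinary list
```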
TODO codetools
findGlobals
TODO Macros in R
https://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf#page=10
Macros don't come with environments/scope. Mostly this is bad, but sometimes it is useful.
For example, in combination with assignments, they tend to modify the original version of a thing. This is useful when we want to do multiple changes to the same object (since R generally does not allow actual mutation).
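A rough macro-like sketch, not the defmacro approach from the linked article: it captures the argument expression and evaluates an assignment built from it in the caller's frame. The helper name set_na_zero is made up for this example.

```r
set_na_zero <- function(target) {
  target_expr <- substitute(target)
  # Build the call `target[is.na(target)] <- 0` and run it where we were called,
  # so the caller's own variable is modified.
  eval.parent(bquote(.(target_expr)[is.na(.(target_expr))] <- 0))
  invisible(NULL)
}

v <- c(1, NA, 3)
set_na_zero(v)
v  # 1 0 3
```

An ordinary function would have to return the changed vector and rely on the caller to reassign it.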
TODO DSLs
http://adv-r.had.co.nz/dsl.html
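subset2 <- function(x, condition) {
  ## substitute() captures the expression the caller typed;
  ## quote(condition) would only give the symbol `condition`.
  conditioncall <- substitute(condition)
  ## Names not found in x are looked up where subset2 was called.
  r <- eval(conditioncall, x, parent.frame())
  x[r, , drop = FALSE]
}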