This is a fairly disorganized bunch of stuff to help me nail down what things mean very broadly, without spending the time to really understand them (yet).

## Kernel Methods

Kernel methods are built around a kernel function: some measure of similarity or nearness between pairs of points. (Similarity doesn't necessarily have to be defined as a distance, but it is common and often convenient.)

Usually this is nearness to the training data, or to something calculated from the training data.
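
A minimal sketch of that idea, assuming a Gaussian (RBF) kernel as the similarity and a kernel-weighted average of training labels as the prediction (Nadaraya-Watson smoothing); the toy data and helper names here are mine:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Similarity: 1.0 for identical points, falling toward 0 as they move apart."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Toy training data; predictions are driven by nearness to these points.
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 4.0])

def predict(x_new, gamma=1.0):
    """Kernel-weighted average of training labels."""
    weights = np.array([rbf_kernel(x_new, x, gamma) for x in X_train])
    return weights @ y_train / weights.sum()

print(predict(np.array([0.9])))  # dominated by the nearest point (x=1, y=1)
```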

### Kernel Trick

This is inner-product related: many learning algorithms only ever touch the data through inner products $\langle x, y \rangle$.

If we swap every inner product for a kernel $k(x, y) = \langle \varphi(x), \varphi(y) \rangle$, we are implicitly mapping our feature space into some other (usually higher-dimensional) space and minimizing our objective there, without ever computing the map $\varphi$ explicitly.
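
To see the trick in one concrete case: a minimal sketch, assuming the homogeneous degree-2 polynomial kernel $k(x, y) = (x \cdot y)^2$, whose explicit feature map for 2D inputs is $\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The kernel gives the same number as the inner product in the mapped 3D space, without ever building that space:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: 2D input -> 3D feature space."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, y):
    """Same inner product, computed entirely in the original 2D space."""
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(phi(x) @ phi(y))    # 16.0 -- inner product in the mapped space
print(poly_kernel(x, y))  # 16.0 -- identical, no mapping needed
```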

## Regularization

Add extra information to the objective to reduce overfitting.

Usually this is a penalty on model complexity, added to the loss (e.g. the norm of the weight vector).
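
A minimal sketch of the most common flavor, ridge regression: least squares plus an L2 penalty on the weights, which has a closed-form solution (the synthetic data is just for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 via the closed form
    w = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=20)

print(ridge_fit(X, y, lam=0.0))   # plain least squares
print(ridge_fit(X, y, lam=10.0))  # larger penalty shrinks all weights toward 0
```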

## Norms

Norms are length/distance measurements.

They operate on vectors.

The zero vector has length 0; every other vector must have a positive length. (A norm also has to scale linearly with the vector and satisfy the triangle inequality.)

Semi-norms relax the first requirement: nonzero vectors are allowed to have length 0 too.
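
A minimal sketch of a semi-norm that isn't a norm (the function name is mine): measuring only the first coordinate of a 2D vector sends every vector of the form (0, t) to length 0.

```python
import numpy as np

def seminorm_first_coord(x):
    """|x[0]|: scales correctly and satisfies the triangle inequality,
    but assigns length 0 to nonzero vectors like (0, 5)."""
    return abs(x[0])

print(seminorm_first_coord(np.array([0.0, 5.0])))  # 0.0, yet the vector isn't zero
print(seminorm_first_coord(np.array([3.0, 5.0])))  # 3.0
```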

### p-norm

Let p be a real number with p ≥ 1. (It is usually an integer in practice, but it doesn't have to be.)

$$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$$

That is: take the absolute value of each component, raise it to the p, sum, then take the p-th root. (The absolute values matter: without them, odd p would let negative components cancel positive ones.)

p of 1 is the 'taxicab' norm, p of 2 is the Euclidean norm, and so on; the limit p → ∞ gives the max norm (the largest |component|).
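
A direct translation of the formula into a minimal sketch, checked against numpy's built-in:

```python
import numpy as np

def p_norm(x, p):
    """(sum over components of |component|^p) ^ (1/p)"""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
print(p_norm(x, 1))  # 7.0 -- taxicab
print(p_norm(x, 2))  # 5.0 -- Euclidean
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2))  # 7.0 5.0 -- numpy agrees
print(np.linalg.norm(x, np.inf))  # 4.0 -- the max norm, p -> infinity
```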

#### Lp space

When we talk about L1, L2 and so on in machine learning, we are usually just naming the p-norm with that p (e.g. 'L2 regularization' means an L2-norm penalty). The Lp space proper is the space of everything whose p-norm is finite.
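
Written out for sequences (the function-space version replaces the sum with an integral), a standard definition:

$$\ell^p = \left\{ (x_1, x_2, \ldots) : \left( \sum_i |x_i|^p \right)^{1/p} < \infty \right\}$$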

Great.