This is a fairly disorganized bunch of stuff to help me nail down what things mean very broadly, without spending the time to really understand them (yet).
Kernel methods have some concept of similarity or nearness. (Similarity doesn't necessarily have to be defined as a distance, but it is common and often convenient.)
Usually this is nearness to the training data, or to something calculated from the training data.
This is somehow inner-product related.
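To make the inner-product connection concrete, here's a toy sketch (my own example, not from any particular library): the linear kernel is literally an inner product, and the RBF kernel turns a distance into a similarity score.

```python
import math

def linear_kernel(x, y):
    # Plain inner product: large when x and y point the same way.
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    # Similarity defined via distance: equals 1 when x == y,
    # decays toward 0 as the points move apart.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(linear_kernel([1, 2], [3, 4]))  # 11
print(rbf_kernel([1, 2], [1, 2]))     # 1.0
```

So "similarity" here can mean the inner product itself, or something built on top of a distance.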
We can map our feature space into some other (often higher-dimensional) space, then do the optimization in that space. Because the method only ever needs inner products, the mapping never has to be computed explicitly.
Regularization: add extra information to the problem to help prevent overfitting.
Usually this is a penalty on model complexity.
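A minimal sketch of what "penalty on complexity" looks like, assuming a 1-D linear model and an L2 (ridge-style) penalty, which is just one common choice:

```python
def mse(w, xs, ys):
    # Plain squared-error loss for a 1-D linear model y ≈ w * x.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def ridge_loss(w, xs, ys, lam=0.1):
    # Same loss plus an L2 penalty on the weight: larger |w|
    # costs more, nudging the fit toward simpler models.
    return mse(w, xs, ys) + lam * w ** 2
```

Minimizing `ridge_loss` instead of `mse` trades a little training error for a smaller (simpler) weight.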
Norms are length/distance measurements.
They operate on vectors.
The zero vector should be length 0, other vectors should not be.
Semi-norms can have many length 0 vectors.
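A tiny made-up example of a semi-norm on R^2, measuring only the first coordinate, so any vector of the form (0, y) gets length 0:

```python
def semi_norm(v):
    # Ignores every coordinate except the first, so it fails the
    # "only the zero vector has length 0" requirement of a norm.
    return abs(v[0])

print(semi_norm((0.0, 5.0)))  # 0.0, even though the vector isn't zero
```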
Let p be a real number with p >= 1.
thing = Sum over all components of a vector:
- (|component| ^ p)
thing ^ (1/p)
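The recipe above, written out as a quick sketch:

```python
def p_norm(v, p):
    # Sum |component|^p over the vector, then take the p-th root.
    return sum(abs(x) ** p for x in v) ** (1 / p)

print(p_norm([3, 4], 1))  # 7.0 (taxicab)
print(p_norm([3, 4], 2))  # 5.0 (Euclidean)
```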
p of 1 is 'taxicab norm', p of 2 is Euclidean, and so on.
When we talk about L1, L2, and so on, we mean the p-norm (or a space equipped with it) where p is that number.