Zum Inhalt springen
# General Learning Theory

#### In a general learning setting, we have an Input-Space \(\mathcal{X}\) and an Output-Space \(\mathcal{Y}\). For example could the Input \(\mathcal{X}\) be the space of all pixel-values of an Image and the Output \(\mathcal{Y}\) the set of all classes, which could be seen in the picture. The goal is to find a mapping \(h: \mathcal{X} \rightarrow \mathcal{Y}\), which in the best case represents the true mapping from every possible \(x \in \mathcal{X}\) to the correct \(y\). This could than be seen as learning the real-world relationship between both variables.

In general an \((x,y)\) Tupel will come from a probability Distribution \(P(x,y) \) over \(\mathcal{X}x\mathcal{Y}\), so some values \(x\) will occur with different outcomes \(y\). Therefore to find a mapping \(h\) to map from \(x\) to always the one correct \(y\) is impossible. Instead we would like to find a function \(h\) which minimizes a loss-function \(l(h(x),y)\) over the probability Distribution \(P(x,y) \). Formally this is done by minimizing the expected value of the loss-function (dependend of h)

#### \(E(h)=\int l(h(x),y) dP(x,y)\\ \)

#### over all possible mappings \(h\) from a a-priori designed Hypotheses-Space \(\mathcal{H}\).

The real learning part comes from the problem, that one does not know the true probability Distribution \(P(x,y) \). We could only sample a Training-Set \((x_{i},y_{i}), i=1,\dots,n\) from \(P(x,y) \), which allows us to calculate the training-error

#### \(E_{emp}(h)= \frac{1}{n}\sum_{i=1}^{n}l(h(x_{i}),y_{i})\\ \)

####
which is an estimate of \(E(h)\). So instead of minimizing \(E(h)\), we could minimize \(E_{emp}(h)\) over \(h\). If n is large enough, this will likely provide us with a good estimate of \(E(h)\), and also with a mapping \(h\), which should have a low value for \(E(h)\).

The quality of this generalization from the Training-Set is mainly influenced by the number of samples \(n\), the Hypotheses Space \(\mathcal{H}\), which gives us an a-priori bias for the mapping \(h\) (inductive bias) and the choise of the loss-function \(l\). Some of those problems will be adressed in the next blog posts in more detail.

In general an \((x,y)\) Tupel will come from a probability Distribution \(P(x,y) \) over \(\mathcal{X}x\mathcal{Y}\), so some values \(x\) will occur with different outcomes \(y\). Therefore to find a mapping \(h\) to map from \(x\) to always the one correct \(y\) is impossible. Instead we would like to find a function \(h\) which minimizes a loss-function \(l(h(x),y)\) over the probability Distribution \(P(x,y) \). Formally this is done by minimizing the expected value of the loss-function (dependend of h)

The real learning part comes from the problem, that one does not know the true probability Distribution \(P(x,y) \). We could only sample a Training-Set \((x_{i},y_{i}), i=1,\dots,n\) from \(P(x,y) \), which allows us to calculate the training-error

The quality of this generalization from the Training-Set is mainly influenced by the number of samples \(n\), the Hypotheses Space \(\mathcal{H}\), which gives us an a-priori bias for the mapping \(h\) (inductive bias) and the choise of the loss-function \(l\). Some of those problems will be adressed in the next blog posts in more detail.