20 Basis Expansion

In this chapter we describe function basis expansion for regression models.

20.1 Basis Functions

So far we have discussed regression models in their most basic form: linear regression models. Let's take a step back and recall the general regression framework, in which the main idea is to find a function \(f()\) that takes in one or more input features and returns a response value. In addition, we suppose that there is a noise or error term reflecting the imperfect nature of our model:

\[ y = f(\mathbf{x}) + \epsilon \]

Assume that we have a data set consisting of \(n\) data points: \(\mathcal{D} = \{ (\mathbf{x_1}, y_1), \dots, (\mathbf{x_n}, y_n) \}\), where \(\mathbf{x_i} \in \mathcal{X}\) and \(y_i \in \mathbb{R}\). In regression we assume an unknown target function \(f: \mathcal{X} \to \mathbb{R}\). In the simplest case, when there is only one input feature, \(\mathcal{X} = \mathbb{R}\). In the multidimensional case, with \(p\) predictors, \(\mathcal{X} = \mathbb{R}^p\).

Perhaps the simplest way to extend linear models is to look for \(f\) in a finite dimensional space of functions spanned by a given basis. In other words, we specify a set of functions \(\phi_0, \phi_1, \dots, \phi_m\) from \(\mathcal{X}\) to \(\mathbb{R}\), and estimate \(f\) in the form of a linear combination as:

\[ \widehat{f}(x) = \sum_{q=0}^{m} b_q \ \phi_q(x) \]

In this way, performing the regression then reduces to finding the parameters \(b_0, b_1, \dots, b_m\).
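To make this concrete, here is a minimal sketch (assuming NumPy; the data and the particular basis are illustrative, not prescribed by the text) of estimating the coefficients \(b_0, \dots, b_m\) by least squares. We evaluate each basis function \(\phi_q\) at every data point to build a design matrix, and solve the resulting linear system:

```python
import numpy as np

# Illustrative basis: phi_0(x) = 1, phi_1(x) = x, phi_2(x) = x^2
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

def design_matrix(x, basis):
    """Evaluate each basis function phi_q at every data point (n x (m+1))."""
    return np.column_stack([phi(x) for phi in basis])

# Simulated data from a known quadratic, plus noise
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x - x**2 + 0.1 * rng.standard_normal(50)

Phi = design_matrix(x, basis)                # design matrix
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares coefficients b_q
y_hat = Phi @ b                              # fitted values
```

Any choice of basis functions plugs into the same two steps: build the design matrix, then solve the least-squares problem for the coefficients.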

20.2 Linear Regression

In the one-dimensional case, we can use \(\phi_0(x) = 1\) and \(\phi_1(x) = x\). This gives the simple linear regression model:

\[ \widehat{f}(x) = b_0 \phi_0(x) + b_1 \phi_1(x) = b_0 + b_1 x \]

In the multidimensional case, we can take \(\phi_1(\mathbf{x}) = [\mathbf{x}]_1\), \(\phi_2(\mathbf{x}) = [\mathbf{x}]_2\), all the way to \(\phi_p(\mathbf{x}) = [\mathbf{x}]_p\). Here \([\mathbf{x}]_j\) denotes the \(j\)-th element of the input vector \(\mathbf{x} \in \mathcal{X}\).

\[\begin{align*} \widehat{f}(\mathbf{x_i}) &= b_0 \phi_0(\mathbf{x_i}) + b_1 \phi_1(\mathbf{x_i}) + \dots + b_p \phi_p(\mathbf{x_i}) \\ &= b_0 + b_1 [\mathbf{x_i}]_1 + \dots + b_p [\mathbf{x_i}]_p \\ &= b_0 + b_1 x_{i1} + \dots + b_p x_{ip} \end{align*}\]

Notice that the constant term \(\phi_0(x) = 1\) will be common to all the function classes covered in the next sections.
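As a quick sketch (assuming NumPy; the data and true coefficients are illustrative), choosing \(\phi_0 = 1\) and \(\phi_j(\mathbf{x}) = [\mathbf{x}]_j\) makes basis-expansion regression coincide with ordinary multiple linear regression:

```python
import numpy as np

# Simulated data with known coefficients b_0 = 2, b = (1, -0.5, 0.3)
rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.standard_normal((n, p))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + 0.05 * rng.standard_normal(n)

# Design matrix with columns phi_0 = 1, phi_1, ..., phi_p
Phi = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # b_0, b_1, ..., b_p
```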

20.3 Polynomial Regression

Another possible choice of basis (in the one-dimensional case) is to choose \(\phi_q(x) = x^q\) for \(q = 1, 2, \dots, m\). This allows us to fit \(f\) from the class of polynomial functions of degree at most \(m\):

\[\begin{align*} \widehat{f}(x) &= \sum_{q=0}^{m} b_q \phi_q(x) \\ &= b_0 \phi_0(x) + b_1 \phi_1(x) + b_2 \phi_2(x) + \dots + b_m \phi_m(x) \\ &= b_0 + b_1 x + b_2 x^2 + \dots + b_m x^m \end{align*}\]
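With the polynomial basis \(\phi_q(x) = x^q\), the design matrix is a Vandermonde matrix. A brief sketch (assuming NumPy; the degree and data are illustrative):

```python
import numpy as np

# Fit a degree-m polynomial by least squares
m = 3
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = 1 - 2 * x + 3 * x**3 + 0.01 * rng.standard_normal(40)

# Columns: x^0, x^1, ..., x^m
Phi = np.vander(x, N=m + 1, increasing=True)
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ b
```

In practice, raw powers of \(x\) become numerically ill-conditioned as \(m\) grows, which is one motivation for orthogonal polynomial bases.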

What about the multidimensional case? Let’s consider an example with \(p=2\) input features \(X_1\) and \(X_2\), and a polynomial of degree \(m=2\).

\[ \widehat{f}(X_1, X_2) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + b_4 X_1^2 + b_5 X_2^2 \]

As you can tell, the notation, with the \(j\)-index for the features and the \(q\)-index for the basis functions, becomes a challenge. One possibility for a more systematic notation, although it also becomes more cluttered, is to use a multi-index \(q = (q_1, q_2)\) with \(q_1 + q_2 \leq m\). Here's how the above equation would be written with the modified index notation:

\[ \widehat{f}(X_1, X_2) = b_0 + b_{(1,0)} X_1 + b_{(0,1)} X_2 + b_{(1,1)} X_1 X_2 + b_{(2,0)} X_1^2 + b_{(0,2)} X_2^2 \]

Taking into account the notation \([\mathbf{x}]_j\) denoting the \(j\)-th element of the input vector \(\mathbf{x} \in \mathcal{X}\), the above equation becomes:

\[ \widehat{f}(\mathbf{x}) = b_0 + b_{(1,0)} [\mathbf{x}]_1 + b_{(0,1)} [\mathbf{x}]_2 + b_{(1,1)} [\mathbf{x}]_1 [\mathbf{x}]_2 + b_{(2,0)} [\mathbf{x}]_1^2 + b_{(0,2)} [\mathbf{x}]_2^2 \]

With this modified notation \(q = (q_1, q_2)\), the model can be compactly expressed as:

\[ \widehat{f}(\mathbf{x}) = \sum_{q_1 + q_2 \leq m} b_{q} \ \phi_q (\mathbf{x}) \]

Again, this is not the most beautiful notation, but it makes sense.
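The multi-index notation also suggests a direct way to generate the polynomial basis programmatically. The sketch below (assuming NumPy; the function names are illustrative) enumerates all multi-indices \((q_1, \dots, q_p)\) with \(q_1 + \dots + q_p \leq m\) and builds the corresponding features \(\phi_q(\mathbf{x}) = [\mathbf{x}]_1^{q_1} \cdots [\mathbf{x}]_p^{q_p}\):

```python
from itertools import product
import numpy as np

def multi_indices(p, m):
    """All multi-indices (q_1, ..., q_p) with q_1 + ... + q_p <= m."""
    return [q for q in product(range(m + 1), repeat=p) if sum(q) <= m]

def poly_features(X, m):
    """Design matrix with columns phi_q(x) = prod_j x_j^(q_j)."""
    qs = multi_indices(X.shape[1], m)
    cols = [np.prod(X ** np.array(q), axis=1) for q in qs]
    return np.column_stack(cols), qs

# For p = 2 and m = 2 there are 6 basis functions, matching the
# six terms b_0, b_(1,0), b_(0,1), b_(1,1), b_(2,0), b_(0,2) above.
X = np.array([[2.0, 3.0]])
Phi, qs = poly_features(X, m=2)
```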

20.4 Gaussian RBFs

Another possible choice of basis is the family of Gaussian Radial Basis Functions (RBFs):

\[ \phi_z(x) = \exp \left\{ -\frac{\| x - z\|^2}{2 \sigma^2} \right\} \]

where \(z\) is a given center and \(\sigma\) is a pre-set scale parameter.
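A minimal sketch of RBF regression (assuming NumPy; the centers, scale, and data are illustrative choices, not prescribed by the text): place one Gaussian bump \(\phi_{z_k}\) at each center \(z_k\), build the design matrix, and fit the coefficients by least squares as before.

```python
import numpy as np

def rbf_features(x, centers, sigma):
    """phi_z(x) = exp(-||x - z||^2 / (2 sigma^2)) for each center z."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

# Simulated data: a sine curve plus noise
rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 80)
y = np.sin(x) + 0.05 * rng.standard_normal(80)

# 10 evenly spaced centers, plus the constant basis function phi_0 = 1
centers = np.linspace(-3, 3, 10)
Phi = np.column_stack([np.ones_like(x), rbf_features(x, centers, 1.0)])
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ b
```

The number of centers and the scale \(\sigma\) control the flexibility of the fit; in practice they are chosen by cross-validation or similar criteria.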