# 20 Basis Expansion

In this chapter we describe function basis expansion for regression models.

## 20.1 Basis Functions

So far we have talked about regression models in their most basic form: linear regression models. So let’s take a step back and recall the general regression framework, in which the main idea is to find a function $$f()$$ that takes in one or more input features and returns a response value. In addition, we also suppose that there is a noise or error term $$\epsilon$$ to account for the imperfect nature of our model:

$y_i = f(\mathbf{x_i}) + \epsilon_i$

Assume that we have a data set consisting of $$n$$ data points: $$\mathcal{D} = \{ (\mathbf{x_1}, y_1), \dots, (\mathbf{x_n}, y_n) \}$$, where $$\mathbf{x_i} \in \mathcal{X}$$ and $$y_i \in \mathbb{R}$$. In regression we assume an unknown target function $$f: \mathcal{X} \to \mathbb{R}$$. In the simplest case, when there is only one input feature, $$\mathcal{X} = \mathbb{R}$$. In the multidimensional case with $$p$$ predictors, $$\mathcal{X} = \mathbb{R}^p$$.

Perhaps the simplest way to extend linear models is to look for $$f$$ in a finite-dimensional space of functions spanned by a given basis. In other words, we specify a set of functions $$\phi_0, \phi_1, \dots, \phi_m$$ from $$\mathcal{X}$$ to $$\mathbb{R}$$, and estimate $$f$$ as a linear combination:

$\widehat{f}(x) = \sum_{q=0}^{m} b_q \ \phi_q(x)$

In this way, performing the regression reduces to finding the parameters $$b_0, b_1, \dots, b_m$$.
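As a sketch of how this works in practice, the code below (NumPy, with an invented toy data set) builds the design matrix $$\Phi$$ with entries $$\Phi_{iq} = \phi_q(x_i)$$ and finds the parameters by least squares. The basis used here is the simple linear one, $$\phi_0(x) = 1$$ and $$\phi_1(x) = x$$; any other basis from the sections below would slot into the same code.

```python
import numpy as np

# Basis functions: phi_0(x) = 1, phi_1(x) = x (simple linear regression).
basis = [lambda x: np.ones_like(x), lambda x: x]

# Toy data (invented for illustration): y is roughly 2 + 3x plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2 + 3 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix Phi with Phi[i, q] = phi_q(x_i).
Phi = np.column_stack([phi(x) for phi in basis])

# Least-squares estimate of the parameters b_0, ..., b_m.
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def f_hat(x_new):
    """Evaluate the fitted function as a linear combination of the basis."""
    return sum(b_q * phi(x_new) for b_q, phi in zip(b, basis))
```

Swapping in a different list of basis functions changes the model class without changing the fitting code, which is the whole appeal of basis expansion.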

## 20.2 Linear Regression

In the one-dimensional case, we can use $$\phi_0(x) = 1$$ and $$\phi_1(x) = x$$. This gives the simple linear regression model:

$\widehat{f}(x) = b_0 \phi_0(x) + b_1 \phi_1(x) = b_0 + b_1 x$

In the multidimensional case, we can take $$\phi_1(\mathbf{x}) = [\mathbf{x}]_1$$, $$\phi_2(\mathbf{x}) = [\mathbf{x}]_2$$, all the way to $$\phi_p(\mathbf{x}) = [\mathbf{x}]_p$$. Here $$[\mathbf{x}]_j$$ denotes the $$j$$-th element of the input vector $$\mathbf{x} \in \mathcal{X}$$.

\begin{align*} \widehat{f}(\mathbf{x_i}) &= b_0 \phi_0(\mathbf{x_i}) + b_1 \phi_1(\mathbf{x_i}) + \dots + b_p \phi_p(\mathbf{x_i}) \\ &= b_0 + b_1 [\mathbf{x_i}]_1 + \dots + b_p [\mathbf{x_i}]_p \\ &= b_0 + b_1 x_{i1} + \dots + b_p x_{ip} \end{align*}

Notice that the constant term $$\phi_0(x) = 1$$ will be common to all the function classes covered in the next sections.

## 20.3 Polynomial Regression

Another possible choice of basis (in the one-dimensional case) is to choose $$\phi_q(x) = x^q$$ for $$q = 1, 2, \dots, m$$. This allows us to fit $$f$$ from the class of polynomial functions of degree at most $$m$$:

\begin{align*} \widehat{f}(x) &= \sum_{q=0}^{m} b_q \phi_q(x) \\ &= b_0 \phi_0(x) + b_1 \phi_1(x) + b_2 \phi_2(x) + \dots + b_m \phi_m(x) \\ &= b_0 + b_1 x + b_2 x^2 + \dots + b_m x^m \end{align*}
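A minimal sketch of this fit, using a Vandermonde design matrix whose columns are $$x^0, x^1, \dots, x^m$$; the toy data and the true coefficients are invented for illustration.

```python
import numpy as np

# Toy data from a known cubic (coefficients invented for illustration).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
y = 1 - 2 * x + 0.5 * x**3 + 0.05 * rng.standard_normal(x.size)

# Vandermonde design matrix: columns are x^0, x^1, ..., x^m.
m = 3
Phi = np.vander(x, m + 1, increasing=True)

# Least-squares estimate of (b_0, b_1, b_2, b_3); here roughly (1, -2, 0, 0.5).
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```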

What about the multidimensional case? Let’s consider an example with $$p=2$$ input features $$X_1$$ and $$X_2$$, and a polynomial of degree $$m=2$$.

$\widehat{f}(X_1, X_2) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + b_4 X_1^2 + b_5 X_2^2$

As you can tell, the notation with the $$j$$-index for the features and the $$q$$-index for the basis functions becomes challenging. One possibility for a more systematic notation, although it also becomes more cluttered, is to use a multi-index $$q = (q_1, q_2)$$ with $$q_1 + q_2 \leq m$$. Here’s how the above equation would be written with this multi-index notation:

$\widehat{f}(X_1, X_2) = b_0 + b_{(1,0)} X_1 + b_{(0,1)} X_2 + b_{(1,1)} X_1 X_2 + b_{(2,0)} X_1^2 + b_{(0,2)} X_2^2$

Taking into account the notation $$[\mathbf{x}]_j$$ denoting the $$j$$-th element of the input vector $$\mathbf{x} \in \mathcal{X}$$, the above equation becomes:

$\widehat{f}(\mathbf{x}) = b_0 + b_{(1,0)} [\mathbf{x}]_1 + b_{(0,1)} [\mathbf{x}]_2 + b_{(1,1)} [\mathbf{x}]_1 [\mathbf{x}]_2 + b_{(2,0)} [\mathbf{x}]_1^2 + b_{(0,2)} [\mathbf{x}]_2^2$

With this multi-index notation $$q = (q_1, q_2)$$, where $$\phi_q(\mathbf{x}) = [\mathbf{x}]_1^{q_1} \, [\mathbf{x}]_2^{q_2}$$, the model can be compactly expressed as:

$\widehat{f}(\mathbf{x}) = \sum_{q_1 + q_2 \leq m} b_{q} \ \phi_q (\mathbf{x})$

Again, this is not the most beautiful notation, but it makes sense.
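The degree-2 bivariate model above can be sketched as follows. The six columns of the design matrix correspond to the multi-indices $$(0,0), (1,0), (0,1), (1,1), (2,0), (0,2)$$; the generating coefficients are invented for illustration.

```python
import numpy as np

# Toy data from a known degree-2 surface in two features X1, X2
# (generating coefficients invented for illustration).
rng = np.random.default_rng(2)
X1 = rng.uniform(-1, 1, 200)
X2 = rng.uniform(-1, 1, 200)
y = (1 + 2 * X1 - X2 + 0.5 * X1 * X2 + 3 * X1**2 - 2 * X2**2
     + 0.05 * rng.standard_normal(200))

# Columns ordered by multi-index: (0,0), (1,0), (0,1), (1,1), (2,0), (0,2).
Phi = np.column_stack([np.ones_like(X1), X1, X2, X1 * X2, X1**2, X2**2])
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```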

## 20.4 Gaussian RBFs

Another possible choice of basis functions is the family of Gaussian radial basis functions (RBFs). Each basis function is associated with a center $$z$$:

$\phi_z(x) = \exp \left\{ \frac{-\| x - z\|^2}{2 \sigma^2} \right\}$

where $$\sigma$$ is a pre-set scale parameter. The centers $$z$$ are typically chosen from the observed inputs or spread evenly over the input domain.
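A minimal sketch of regression with Gaussian RBF features follows, assuming centers spread evenly over the input range and a hand-picked $$\sigma$$; both choices, and the toy sine data, are invented for illustration rather than prescribed here.

```python
import numpy as np

def rbf_features(x, centers, sigma):
    # Column j holds phi_{z_j}(x_i) = exp(-(x_i - z_j)^2 / (2 sigma^2)).
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma**2))

# Toy data: a sine curve with a little noise (invented for illustration).
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size)

centers = np.linspace(0.0, 1.0, 10)  # centers z spread over the input range
sigma = 0.1                          # pre-set scale parameter

# Constant term phi_0 = 1 plus one Gaussian bump per center.
Phi = np.column_stack([np.ones_like(x), rbf_features(x, centers, sigma)])
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ b
```

Because each bump is local, the fitted curve can follow the wiggles of the data; a larger $$\sigma$$ smooths the fit, while a smaller one lets it chase the noise.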