It's a common point of confusion: people often conflate polynomial regression with non-linear regression.

The "linear" in linear regression means that the learnt hypothesis is a linear function of the (learnable) parameters of the model. It does **not** mean a linear function of the input/independent variables.

For example, both:

y = θ_{0} + θ_{1}x    (1)

y = θ_{0} + θ_{1}x + θ_{2}x^{2}    (2)

…correspond to a linear regression hypothesis, because in both cases y is a linear function of the parameters θ.

Equation (2) is sometimes called polynomial (linear) regression, because the linear model takes input variables raised to polynomial degrees (that is, powers greater than 1).
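A minimal sketch (with made-up data, assuming NumPy) of why equation (2) is still linear regression: once we treat x and x² as two separate features, the parameters can be found by ordinary linear least squares.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
x = np.linspace(-2, 2, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix [1, x, x^2]: the hypothesis is linear in theta,
# even though it is quadratic in x.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares recovers theta in closed form -- possible
# only because the model is linear in its parameters.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [1, 2, 3]
```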

If you were to design a hypothesis like this:

y = θ_{0} + θ_{1}x + θ_{1}^{2}x^{2}

Even this cannot really be called non-linear regression, since you can reparameterise it: replace θ_{1}^{2} with a fresh parameter θ_{2}, which would be learnt to a value close to θ_{1}^{2}. In other words, polynomial modelling is not the same as non-linear modelling.

Non-linearity is introduced into a model only when the hypothesis involves non-linear functions or transformations of the parameters themselves.
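As a concrete instance (a sketch with an assumed exponential model, not one from the post above), take y = θ_{0}·exp(θ_{1}x). No renaming of parameters turns this into a linear function of θ, so it must be fitted iteratively, e.g. with SciPy's `curve_fit`:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical non-linear hypothesis: y = theta0 * exp(theta1 * x).
# No reparameterisation makes this linear in the parameters,
# so this is genuinely non-linear regression.
def model(x, theta0, theta1):
    return theta0 * np.exp(theta1 * x)

x = np.linspace(0.0, 1.0, 40)
y = model(x, 2.0, -1.5)  # noise-free data from known parameters

# Iterative least-squares fit starting from an initial guess p0.
theta, _ = curve_fit(model, x, y, p0=[1.0, 0.0])
print(theta)  # close to [2.0, -1.5]
```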

Training a supervised neural network is an example of non-linear regression. Why?

Consider an NN with one hidden layer followed by an output layer:

y = f^{out}(f^{h1}(X))

where the hidden layer is a non-linear mapping from X, which could be defined as:

f^{h1}(X) = g(WX+b)

(Non-linearity is introduced by the activation function g(z).)
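The forward pass above can be sketched in NumPy (sizes and the tanh activation are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 input features, 4 hidden units, 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden-layer parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output-layer parameters

def g(z):
    # Non-linear activation; tanh chosen for illustration.
    return np.tanh(z)

def f_h1(x):
    # Hidden layer: affine map W1 x + b1, passed through g.
    return g(W1 @ x + b1)

def f_out(h):
    # Output layer: plain affine map (identity output activation).
    return W2 @ h + b2

x = rng.normal(size=3)
y = f_out(f_h1(x))  # y = f_out(f_h1(x)) -- non-linear in W1 and b1
```

Because g sits between the parameters W1, b1 and the output, y is a non-linear function of those parameters, which is exactly what makes this non-linear regression.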

This is just one easy example of non-linear regression.

In data science, depending on the application, people choose domain-specific non-linear hypothesis functions to fit their models.