You often want to find parameters that minimise some cost function. One approach could be using gradient descent. In this article, we will be talking about another approach called the normal equation.
Prerequisites
- Linear algebra basics
- Vector basics
Getting started
Wow! One line. Also by using this method, you don't need to do feature scaling.
Let's verify dimensions
Octave implementation
pinv(X'*X)*X'*y
How it works? (optional)
Please feel free to skip this section. This is quite maths heavy. Here is the outline of the mathematical proof. For brevity, we omit the full proof and only provide an outline.
Cautions
- Some of you may have noticed what if is non invertible (singular) -
this is the reason we use
pinv
as it always gives us a value for - Need to compute which is slow if is very large since most implementations have time complexity ( is the dimensions)