You often want to find the parameters that minimise some cost function. One approach is gradient descent. This article covers another approach, the normal equation, which computes the parameters directly instead of iterating towards them.

## Prerequisites

- Linear algebra basics
- Vector basics

## Getting started

$$
\theta = (X^T X)^{-1} X^T y
$$

Wow! One line. Also, with this method you don't need to do feature scaling.

### Let's verify dimensions
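
Here is a quick check, assuming $m$ training examples and $n$ features, with a leading column of ones in $X$ for the intercept. These symbols are assumptions for illustration, since the text does not name them.

$$
\underbrace{\theta}_{(n+1)\times 1}
= \big(\underbrace{X^T}_{(n+1)\times m}\,\underbrace{X}_{m\times(n+1)}\big)^{-1}\,
\underbrace{X^T}_{(n+1)\times m}\,\underbrace{y}_{m\times 1}
$$

$X^T X$ is $(n+1)\times(n+1)$, and so is its inverse. Multiplying by $X^T$ gives an $(n+1)\times m$ matrix, and multiplying that by $y$ gives an $(n+1)\times 1$ vector: one parameter per column of $X$.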

## Octave implementation

`pinv(X'*X)*X'*y`
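
To see the one-liner in context, here is a minimal sketch on a tiny made-up dataset. The data and the variable names `x`, `X`, and `theta` are assumptions for illustration only.

```octave
% Hypothetical toy data that follows y = 1 + 2*x exactly
x = [0; 1; 2; 3];
y = [1; 3; 5; 7];

% Design matrix: a leading column of ones for the intercept term
X = [ones(length(x), 1), x];

% Normal equation
theta = pinv(X' * X) * X' * y;

% theta(1) is the intercept, theta(2) is the slope
disp(theta);
```

With this data the result should be close to an intercept of 1 and a slope of 2, up to floating-point error.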

## How does it work? (optional)

Feel free to skip this section, as it is quite maths heavy. For brevity, we omit the full proof and only provide an outline.
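
A standard version of the outline can be sketched as follows, assuming the usual squared-error cost for linear regression. The symbols $J$ and $m$ (the number of training examples) are assumptions here, not taken from the text.

$$
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^T (X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m}\,X^T (X\theta - y) = 0 \\
X^T X\,\theta &= X^T y \\
\theta &= (X^T X)^{-1} X^T y
\end{aligned}
$$

The last step assumes that $X^T X$ is invertible.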

## Cautions

- Some of you may have wondered what happens if $X^T X$ is non-invertible (singular). This is the reason we use `pinv`, as it always gives us a value for $\theta$.
- We need to compute $(X^T X)^{-1}$, which is slow if $n$ is very large, since most implementations have time complexity $O(n^3)$ ($n$ is the dimension of $X^T X$, which grows with the number of features).
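
The singular case can be illustrated with a small sketch. It assumes a made-up dataset in which one feature is an exact copy of another, which makes `X' * X` singular. All names and values are hypothetical.

```octave
% Hypothetical toy data: the third column of X duplicates the second,
% so X' * X is singular and has no ordinary inverse
x1 = [1; 2; 3; 4];
X = [ones(4, 1), x1, x1];
y = [3; 5; 7; 9];   % follows y = 1 + 2*x1

% pinv still returns a value for theta
theta = pinv(X' * X) * X' * y;

% theta is only one of many valid solutions, but it still fits the data
predictions = X * theta;
disp(theta);
disp(predictions);
```

The predictions should reproduce `y` closely, even though the individual weights on the two duplicated columns are not uniquely determined.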