Linear Regression aims to find the line that best fits the data.
The best-fit line is the one for which the total prediction error, over all data points, is as small as possible.
The error of a point is the vertical distance between that point and the regression line, i.e. the difference between the actual and the predicted value.
Concepts: Prediction Equation, Cost Function (MSE), Normal Equation, error
$$ \hat{y}(\theta, x) = \theta_0 + \theta_1 x_1 + ... + \theta_p x_p $$
ŷ: the predicted value.
p: the number of features.
θ_i: the ith model parameter; θ_0 is the bias (intercept) term, and θ_1, ..., θ_p are the feature weights.
x_i: the ith feature value.
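A quick sketch of this equation in NumPy; the parameter and feature values below are made up for illustration:

```python
import numpy as np

theta = np.array([4.0, 3.0, -1.5])  # theta_0 (bias), theta_1, theta_2 (made-up values)
x = np.array([2.0, 0.5])            # one instance with p = 2 features (made-up values)

# y_hat = theta_0 + theta_1 * x_1 + ... + theta_p * x_p
y_hat = theta[0] + theta[1:] @ x
print(y_hat)  # 4.0 + 3.0*2.0 + (-1.5)*0.5 = 9.25
```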
$$ \hat{y}(\theta, x) = h_\theta(x) = \theta^T x $$
The same equation in a concise, vectorized form.
h_θ: the hypothesis function, parameterized by the model parameters θ.
θ: the parameter vector (θ_0, θ_1, ..., θ_p), where θ_0 is the intercept and θ_1, ..., θ_p are the coefficients; x is the feature vector (x_0, x_1, ..., x_p) with x_0 = 1, so the bias is included in the dot product.
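The vectorized form in NumPy, assuming a constant x_0 = 1 is prepended to each instance; the data matrix is made up:

```python
import numpy as np

theta = np.array([4.0, 3.0, -1.5])      # (theta_0, theta_1, theta_2), made-up values
X = np.array([[2.0, 0.5],
              [1.0, 1.0]])              # m = 2 instances, p = 2 features, made-up

X_b = np.c_[np.ones(len(X)), X]         # prepend x_0 = 1 to every instance
y_hat = X_b @ theta                     # theta^T x for every row at once
print(y_hat)                            # [9.25 5.5 ]
```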
RMSE: Root Mean Square Error, the square root of MSE; it measures the typical distance between the model's predictions and the actual values.
MSE: Mean Squared Error, a common performance measure for regression.
$$ \mathrm{MSE}(X, h_\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 $$
m: the number of training instances.
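A direct translation of the MSE formula, on a made-up toy dataset:

```python
import numpy as np

theta = np.array([4.0, 3.0, -1.5])       # made-up parameters
X_b = np.array([[1.0, 2.0, 0.5],
                [1.0, 1.0, 1.0]])        # m = 2 instances, bias column prepended
y = np.array([9.0, 6.0])                 # made-up targets

errors = X_b @ theta - y                 # theta^T x^(i) - y^(i) for each instance
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)                      # RMSE = sqrt(MSE)
print(mse, rmse)                         # 0.15625 0.3952...
```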
Conclusion: to train a Linear Regression model, you need to find the value of θ that minimizes the MSE.
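The Normal Equation from the concept list gives this minimizing θ directly in closed form:

$$ \hat{\theta} = (X^T X)^{-1} X^T y $$

A sketch on synthetic data; the generating line y = 4 + 3x and the Gaussian noise are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                      # made-up feature values in [0, 2)
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)    # noisy line y = 4 + 3x (assumed)

X_b = np.c_[np.ones(len(X)), X]                   # prepend x_0 = 1
# Closed-form Normal Equation; np.linalg.lstsq would be the numerically safer choice.
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_hat)                                  # approximately [4, 3]
```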
It's often the case that a learning algorithm optimizes a different function than the performance measure used to evaluate the final model, either because it is easier to compute or because we want to constrain the model (e.g., with a regularization term).
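For example, Ridge Regression trains on MSE plus an L2 penalty on the weights (to constrain the model), but the fitted model is still evaluated with plain MSE. A minimal sketch, assuming scikit-learn is available and an arbitrary alpha:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))                      # made-up data
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

# Training minimizes MSE + alpha * ||w||^2 (the constrained surrogate objective)...
model = Ridge(alpha=1.0).fit(X, y)

# ...while the final model is still judged by the plain performance measure.
print(mean_squared_error(y, model.predict(X)))
```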