Combining bias and standard error
Standard error can be used to compare two unbiased estimators: the one with the lower standard error is better. But how should we decide which of two biased estimators performs better?
The concepts of bias and standard error can be combined into a single value called the estimator's mean squared error.
Definition
The mean squared error of an estimator \(\hat{\theta}\) of a parameter \(\theta\) is
\[ \MSE(\hat{\theta}) = E\left[ (\hat{\theta} - \theta)^2 \right] \]
Its relationship to our earlier definitions of bias and standard error is given by the following result.
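Before turning to that result, here is a minimal sketch of how the definition can be approximated directly by simulation: generate many datasets, compute \(\hat{\theta}\) on each, and average the squared errors. The distribution, sample size, and estimator below are illustrative assumptions, not taken from the text.

```python
# Monte Carlo approximation of MSE(theta_hat) = E[(theta_hat - theta)^2].
# Assumed setup: data are Normal(0, 2^2), so the true variance theta = 4,
# and theta_hat is the divide-by-n ("plug-in") variance estimator.
import numpy as np

rng = np.random.default_rng(0)
theta = 4.0        # true population variance (known here because we chose it)
n = 10             # observations per simulated dataset
reps = 100_000     # number of simulated datasets

# Compute the estimator on each simulated dataset.
estimates = np.array([
    np.var(rng.normal(0.0, 2.0, size=n))   # np.var defaults to ddof=0, i.e. divide by n
    for _ in range(reps)
])

# The average squared error approximates the expectation in the definition.
mse_hat = np.mean((estimates - theta) ** 2)
print(f"Monte Carlo estimate of MSE: {mse_hat:.3f}")
```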
Mean squared error
The mean squared error of an estimator \(\hat{\theta}\) of \(\theta\) is
\[ \MSE(\hat{\theta}) = \Var(\hat{\theta}) + \Bias(\hat{\theta})^2 \]
Remembering that \(E[\hat{\theta}]\) need not be equal to \(\theta\),
\[ \begin{align} \MSE(\hat{\theta}) & = E\left[ (\hat{\theta} - \theta)^2 \right] \\ & = E\left[ \left((\hat{\theta} - E[\hat{\theta}]) - (\theta - E[\hat{\theta}]) \right)^2 \right] \\ & = E\left[ (\hat{\theta} - E[\hat{\theta}])^2 + (\theta - E[\hat{\theta}])^2 - 2(\hat{\theta} - E[\hat{\theta}])(\theta - E[\hat{\theta}]) \right] \end{align} \]
Since \((\theta - E[\hat{\theta}])\) is a constant, and since \(E\left[\hat{\theta} - E[\hat{\theta}]\right] = E[\hat{\theta}] - E[\hat{\theta}] = 0\),
\[ \begin{align} \MSE(\hat{\theta}) & = E\left[ (\hat{\theta} - E[\hat{\theta}])^2\right] + (\theta - E[\hat{\theta}])^2 - 2E\left[(\hat{\theta} - E[\hat{\theta}])\right] \times(\theta - E[\hat{\theta}]) \\ & = E\left[ (\hat{\theta} - E[\hat{\theta}])^2\right] + (\theta - E[\hat{\theta}])^2 \\ & = \Var(\hat{\theta}) + \Bias(\hat{\theta})^2 \end{align} \]
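As a numerical check of this decomposition (the normal model, sample size, and the particular pair of variance estimators below are illustrative assumptions, not part of the text), the sketch estimates \(\Var(\hat{\theta})\), \(\Bias(\hat{\theta})\), and \(\MSE(\hat{\theta})\) by simulation for the divide-by-\(n\) and divide-by-\((n-1)\) variance estimators, and confirms that the directly computed MSE matches \(\Var(\hat{\theta}) + \Bias(\hat{\theta})^2\).

```python
# Numerical check of MSE(theta_hat) = Var(theta_hat) + Bias(theta_hat)^2,
# and an MSE comparison of two variance estimators.
# Assumed setup: Normal(0, 2^2) data, so the true variance theta = 4.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, 2.0, size=(reps, n))   # reps simulated datasets of size n

for name, ddof in [("divide-by-n (biased)", 0), ("divide-by-(n-1) (unbiased)", 1)]:
    est = samples.var(axis=1, ddof=ddof)          # theta_hat on each dataset
    mse = np.mean((est - theta) ** 2)             # MSE computed from its definition
    var = est.var()                               # Var(theta_hat)
    bias = est.mean() - theta                     # Bias(theta_hat) = E[theta_hat] - theta
    print(f"{name:28s} MSE = {mse:.3f}   Var + Bias^2 = {var + bias**2:.3f}")
```

With these assumed settings the two columns agree for each estimator, and for normal data the biased divide-by-\(n\) estimator actually achieves the lower MSE, which is exactly the kind of comparison between a biased and an unbiased estimator that the decomposition makes possible.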