Bayesian inference

20 October 2012

In Bayesian inference the objective is to learn the distribution of a parameter of interest based on the combination of prior information and observed information (data). These two forms of knowledge are combined using Bayes' theorem.
    A well-known example is the (conjugate) prior for a proportion $\theta$ with prior distribution the beta distribution with density
$$f(\theta; \alpha, \beta)=\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}, \quad \theta\in [0,1], \quad \alpha,\beta >0,$$
where $B(\alpha,\beta)$ is the beta function $\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ and $\Gamma$ is the gamma function $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}dx$. The data $x$ follow a binomial distribution $\text{Bin}(n,\theta)$ with density
$$f(x;\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}, \quad x\in \{0,1,2,\ldots,n\}.$$
By Bayes' theorem the posterior $\pi(\theta | x)$ is then
$$ \pi(\theta | x) = \frac{1}{Z}f(x;\theta)\pi(\theta)\propto \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}, $$
where $Z=\int_0^1 f(x;\theta)\pi(\theta)d\theta$ is the normalising constant. This, of course, is again a beta distribution, $\text{Beta}(\alpha+x,\beta+n-x)$: the beta prior is conjugate to the binomial likelihood.
(See Young and Smith (2005, p. 23) for a good explanation of this.)
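The conjugate update above can be sketched in a few lines; a minimal illustration (the helper name and the numbers are mine, not from the text):

```python
# Conjugate beta-binomial update: prior Beta(alpha, beta), data x successes
# out of n trials; the posterior is Beta(alpha + x, beta + n - x).
def posterior_params(alpha, beta, x, n):
    """Return the parameters of the beta posterior."""
    return alpha + x, beta + n - x

# Example: a uniform prior Beta(1, 1) and x = 7 successes in n = 10 trials.
a_post, b_post = posterior_params(1.0, 1.0, x=7, n=10)

# Posterior mean (alpha + x) / (alpha + beta + n), here 8 / 12.
post_mean = a_post / (a_post + b_post)
print(a_post, b_post, post_mean)
```

Note that no integration is needed: conjugacy reduces the whole update to adding counts to the prior parameters.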

A point of discussion is that there are contexts where a single true value $\theta_0$ seems correct, yet a Bayesian demands a probability distribution over $\theta$. Can you then pick the degenerate prior $\pi(\theta)=\delta(\theta-\theta_0)$, a point mass at $\theta_0$?

Using decision theory it is easy to see why the posterior mean is a natural estimator. The Bayes risk of a decision rule $d(x)$, a function of the data, is the loss $L(\theta,d(x))$ averaged over both the parameter and the sampling distribution, that is
$$r(d) = E_x E_{\theta | x}\{L(\theta,d(x))\} = \int_{\mathcal{X}}\left\{\int_\Theta L(\theta,d(x))\pi(\theta |x)d\theta\right\} f(x)dx.$$
Since the minimisation is over $d(x)$ for each $x$ separately, only the inner expectation over $\theta$ needs to be considered. Suppose that the loss function is the squared loss $L(\theta,d(x))=(\theta-d(x))^2$. Then we have
$$\frac{\partial}{\partial d}\int_\Theta (\theta - d(x))^2 \pi(\theta |x)d\theta=-2\int_\Theta (\theta-d(x))\pi(\theta |x)d\theta.$$
Setting this derivative to zero and using $\int_\Theta \pi(\theta |x)d\theta=1$ leads to
$$d(x) = \int_\Theta \theta\pi(\theta |x)d\theta,$$
which is the mean of the posterior distribution.
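This can be checked numerically: approximate the posterior expected squared loss on a grid and confirm that its minimiser coincides with the analytic posterior mean. A small sketch (the posterior Beta(8, 4) and the grid sizes are my choices for illustration):

```python
import math

def beta_pdf(theta, a, b):
    """Density of the Beta(a, b) distribution, via the log-gamma function."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta) - log_B)

def expected_sq_loss(d, a, b, grid=2000):
    """Approximate E[(theta - d)^2 | x] with a midpoint rule on (0, 1)."""
    h = 1.0 / grid
    return sum(beta_pdf((i + 0.5) * h, a, b) * ((i + 0.5) * h - d) ** 2
               for i in range(grid)) * h

# Posterior Beta(8, 4), e.g. from a Beta(1, 1) prior with x = 7, n = 10.
a, b = 8.0, 4.0
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda d: expected_sq_loss(d, a, b))
print(best, a / (a + b))  # the minimiser matches the posterior mean 8/12
                          # up to the 0.01 grid resolution
```

At the posterior mean the expected squared loss equals the posterior variance $ab/\{(a+b)^2(a+b+1)\}$, which the grid approximation also reproduces.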

References

Young, G. A. and Smith, R. L. (2005). Essentials of Statistical Inference. Cambridge: Cambridge University Press.

See also: index, Lindley's paradox, Likelihood principle

Lourens Waldorp