In the last post, I introduced Bayes' rule and explained how the posterior is proportional to the product of the likelihood and the prior. To set up the Bayesian model, we saw that we needed to specify these two quantities. In this post, I want to zoom into each of those quantities and build some intuition for what they really encode, and how the choice of one can influence the choice of the other.
Your Bias, Made Explicit
The prior, as we saw previously, encodes our knowledge about the parameter before we see any data. But I want to reframe this slightly: the prior is your bias. It is your bias in the sense that you are entering into this modeling problem with beliefs, assumptions, and expectations. Every scientist has them. The Bayesian framework simply asks you to write them down.
In our medical test example, the prior was our knowledge that the disease affects 1 in 10,000 people. That's a strong belief: before we even run the test, we already think it's very unlikely that any given person has the disease. And that belief had an enormous effect on the answer: it is the reason the posterior probability was only about 1%, even with a 99% accurate test.
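As a quick sanity check, that 1% figure can be reproduced directly from Bayes' rule. A minimal sketch in Python, using the numbers from the example:

```python
prior = 1 / 10_000          # P(disease): 1 in 10,000 prevalence
sensitivity = 0.99          # P(positive | disease): the test's accuracy
false_positive = 0.01       # P(positive | no disease)

# Total probability of a positive result, from the law of total probability
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' rule: P(disease | positive)
posterior = sensitivity * prior / p_positive
print(f"{posterior:.4f}")   # prints 0.0098, i.e. about 1%
```

Even with a 99% accurate test, the tiny prior drags the posterior down to about 1%, which is exactly the surprise from the previous post.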
This raises a natural question: what if our prior belief is wrong? And more generally, how much should our prior matter relative to the data?
The Conviction of Priors
Not all priors are created equal. They exist on a spectrum of informativeness, which measures how much conviction they carry.
An uninformative prior is the equivalent of saying "I have no idea." It assigns roughly equal probability to all parameter values, letting the data do all the talking. At the other end of the spectrum, a strongly informative prior is a sharp, confident belief - it concentrates most of the probability mass around a narrow range of values. Think of it as a very strong opinion.
The key intuition is this: a strong prior is hard to move. If your prior is very informative, you are essentially telling the model "I'm fairly certain the answer is around here." The data will still pull the posterior toward what it suggests, but it will take a lot of data to overcome that initial conviction. And if your strong prior happens to be wrong, that is a problem: your estimates will be biased (in the bad sense), and you will need far more data than you might expect to arrive at a good answer.
This is why, in practice, we typically default to weakly informative priors. A weakly informative prior says "I have some general sense of what's reasonable, but I'm not attached to it." It rules out clearly nonsensical values (for example, a negative height, or a probability greater than 1) without putting too much weight on any particular region. This gives the model enough structure to be realistic, while remaining loose enough to let the data speak. It is a practical sweet spot, and generally a good default for most problems.
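To make "a strong prior is hard to move" concrete, here is a small sketch. The setup is hypothetical (a coin whose bias we estimate from 8 heads in 10 flips; the Beta prior shapes are illustrative choices), and the posterior is computed by brute force on a grid, so no special machinery is needed:

```python
# Grid of candidate values for the coin's bias p
grid = [i / 1000 for i in range(1, 1000)]

def posterior_mean(prior_alpha, prior_beta, heads=8, tails=2):
    # Unnormalized posterior at each grid point:
    # Beta prior density (up to a constant) times the Binomial likelihood
    weights = [
        p ** (prior_alpha - 1) * (1 - p) ** (prior_beta - 1)  # prior
        * p ** heads * (1 - p) ** tails                       # likelihood
        for p in grid
    ]
    total = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / total

weak = posterior_mean(2, 2)      # weak prior: posterior mean lands near 0.71
strong = posterior_mean(50, 50)  # strong prior: barely budges from 0.5
```

The data suggest a bias of 0.8, and the weak Beta(2, 2) prior lets the posterior move most of the way there. The strongly informative Beta(50, 50) prior, which insists the coin is fair, keeps the posterior pinned near 0.5; it would take far more than 10 flips to overcome that conviction.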
The Data-Generating Story
Now let's turn to the other half of the model: the likelihood. Where the prior encodes what you believe about the parameter, the likelihood encodes your assumptions about how the data was generated.
More precisely, the likelihood answers the question: "If the parameter were this value, how probable is the data I actually observed?" It is a model of the process that connects your parameters to your observations.
In our medical test example, the likelihood was the test's accuracy. It told us: given that a person truly has the disease, the probability of a positive result is 99%. And given that a person does not have the disease, the probability of a (false) positive is 1%. That's the data-generating story — the mechanism by which the true health status (the parameter) produced the test result (the data).
Choosing the likelihood is, in many ways, the scientific part of the modeling process. It's where your domain knowledge lives. A physicist studying radioactive decay might choose a Poisson likelihood; a biologist counting successes in a fixed number of trials might choose a Binomial. The choice should be driven by what you understand (or assume) about the process that generated your data.
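One way to build intuition is to treat the likelihood literally as a function of the parameter: fix the data you actually observed and ask how probable it is under different candidate values. A minimal sketch with a Binomial likelihood (the counts here, 7 successes in 10 trials, are made up for illustration):

```python
from math import comb

def binomial_likelihood(p, successes=7, trials=10):
    # P(data | p) under a Binomial model:
    # "how probable are 7 successes in 10 trials if the rate is p?"
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p = {p}: likelihood = {binomial_likelihood(p):.4f}")
```

The likelihood peaks near p = 0.7, the candidate value that best explains the observed data, and falls off for values that make the observation implausible. That is exactly the signal Bayes' rule combines with the prior.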
Once you've specified both the prior and the likelihood, you have a complete Bayesian model. In principle, the posterior follows directly from Bayes' Rule. But before we get to computing that posterior, there's one more idea worth discussing - a trick that can make the whole problem much easier.
When the Prior and Likelihood Play Nice
Suppose you have settled on a likelihood that makes scientific sense for your problem. Now, you're choosing a prior. You have a few reasonable candidates and they all encode your beliefs well enough. Is there a reason to prefer one over another?
This is where conjugacy comes in. Some prior–likelihood pairs have a special mathematical property: when you combine them through Bayes' Rule, the posterior ends up in the same distributional family as the prior. This is called a conjugate pair, and the prior is called a conjugate prior for that likelihood.
Why does this matter? Because when the posterior is in a known family, you don't need any heavy computational machinery to work with it. You can write it down directly, read off its parameters, and sample from it immediately. No approximation, no iteration, just a clean, closed-form answer.
Think of it as a look-ahead strategy. When you're deciding on a prior, you can check: "Does this prior happen to be conjugate with my likelihood?" If it does and if it still encodes reasonable beliefs, you've made your life significantly easier!
A classic example is the Beta–Binomial pair. If your likelihood is Binomial (you're modeling the number of successes in a fixed number of trials) and you choose a Beta distribution as your prior, the posterior is also a Beta distribution, just with updated parameters. The math works out neatly, and you get your answer directly.
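Sketched in code (with illustrative numbers), the Beta–Binomial update really is just arithmetic on the prior's parameters:

```python
def beta_binomial_update(alpha, beta, successes, failures):
    # Conjugate update: Beta(alpha, beta) prior + Binomial data
    # yields a Beta(alpha + successes, beta + failures) posterior.
    return alpha + successes, beta + failures

# Start from a weakly informative Beta(2, 2) prior and observe
# 8 successes in 10 trials:
post_alpha, post_beta = beta_binomial_update(2, 2, 8, 2)
# Posterior is Beta(10, 4), with mean 10 / 14, roughly 0.71
```

No grids, no iteration: the posterior is a known distribution whose parameters you can read off directly, which is precisely why conjugate pairs are so convenient when they fit your problem.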
Of course, most real-world problems are not this clean. The models we care about tend to involve many parameters, complex dependencies, and likelihoods for which no conjugate prior exists (or for which the conjugate prior doesn't encode reasonable beliefs). In these cases, we can't write down the posterior in closed form. We need computational strategies to estimate it.
That is exactly where we're headed next.
See you in the next post!