Why an equal-weight average in change point detection?

Hello, I am reading 8.2 Change point models | Stan User’s Guide (mc-stan.org) and am confused by this part: “Posterior distribution of the discrete change point”. There it says

   p(s \mid D) \propto q(s \mid D) = \frac{1}{M}\sum_{m=1}^{M} \exp(\mathrm{lp}[m, s]).

But why does every draw get an equal weight of 1/M? Why not this:

  p(s \mid D) = \sum_{e,\,l} p(s \mid e, l, D)\, p(e, l \mid D), summed over all parameters e and l?

Thanks,
Hongbo

This is right, but marginalizing over e and l yields the first expression in your post.
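Spelled out (with e^{(m)}, l^{(m)} denoting the m-th posterior draw of the parameters), the marginalization is

p(s \mid D) = \int p(s \mid e, l, D)\, p(e, l \mid D)\, de\, dl \approx \frac{1}{M}\sum_{m=1}^{M} p(s \mid e^{(m)}, l^{(m)}, D),

where the 1/M comes from the draws being (approximately) samples from p(e, l \mid D).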

This is how inference from MCMC samples works in general. For example, if we want to estimate the mean of a posterior distribution from a set of MCMC samples, we take the mean of the samples with equal weighting.
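As a minimal toy sketch of what “equal weighting” means for a posterior mean (plain NumPy, with simulated values standing in for real MCMC draws):

```python
import numpy as np

# Simulated stand-in for M posterior draws of a scalar parameter;
# in practice these would be extracted from a fitted Stan model.
rng = np.random.default_rng(1)
draws = rng.normal(loc=2.0, scale=0.5, size=4000)

# Posterior mean estimate = equal-weight (1/M) average of the draws.
posterior_mean = draws.sum() / draws.size   # same as draws.mean()
print(posterior_mean)
```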

Thanks! The problem might be that p(e, l | D) as computed by MCMC is not a true distribution, as it doesn’t sum to 1, so p(e, l | D) cannot be used for the integration directly. But I still don’t quite understand why 1/M is used. Maybe it’s an approximation?

1/M is how you take an average across draws

Edit: an important point here is that if we have a bunch of iteration-wise probabilities of a single binary outcome (i.e. a posterior distribution for that probability), then we can summarize them into a single posterior probability of the binary outcome by taking the (arithmetic) mean across the draws.
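A small sketch of that point, with simulated values standing in for real draws of an iteration-wise event probability:

```python
import numpy as np

# Simulated stand-in for M posterior draws of Pr(Y = 1 | parameters),
# i.e. an iteration-wise probability of a single binary outcome.
rng = np.random.default_rng(2)
prob_draws = rng.beta(2.0, 5.0, size=4000)

# Pr(Y = 1 | D) = E[Pr(Y = 1 | parameters) | D], estimated by the
# arithmetic mean of the per-draw probabilities.
p_event = prob_draws.mean()
print(p_event)
```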

Thanks!

Found that the Stan User’s Guide has a later section explaining the math behind this: 8.5 The mathematics of recovering marginalized parameters | Stan User’s Guide (mc-stan.org).

For this specific Change Point Detection model in 8.2 Change point models | Stan User’s Guide (mc-stan.org), based on Section 8.5, I think we can reason as follows:

\Pr[S = s\mid D] = \mathbb{E}[I(S = s) \mid D] where I(S = s) is the indicator function, which equals 1 when S = s and 0 otherwise

= \mathbb{E}[\mathbb{E}[I(S = s) \mid D, e, l] \mid D] by the law of iterated expectations (the outer expectation is over p(e, l \mid D))

= \mathbb{E}[\Pr(S = s \mid D, e, l) \mid D]

= \mathbb{E}\Big[\frac{\Pr(S = s, D \mid e, l)}{\Pr(D \mid e, l)} \,\Big|\, D\Big]

\approx \frac{1}{M}\sum_{m=1}^{M}\frac{\Pr(S = s, D \mid e^{(m)}, l^{(m)})}{\Pr(D \mid e^{(m)}, l^{(m)})}

Here M is the number of draws of the parameters (e, l) produced by MCMC, and (e^{(m)}, l^{(m)}) is the m-th draw. These draws, however, are not independent, because MCMC uses a Markov chain to move from one draw to the next. To use the above estimate, the assumption is that M is big enough that the law of large numbers (in its MCMC/ergodic form) ensures the average is close to the true expectation.
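To make the last line concrete, here is a minimal sketch of how one might compute it from draws of the lp quantity the model exposes; the array shapes and the simulated lp values below are assumptions for illustration, not the guide’s code:

```python
import numpy as np

# Hypothetical lp: an (M, T) array whose entry lp[m, s] is a draw of
# log Pr(S = s, D | e^(m), l^(m)); here random numbers stand in for
# values extracted from a real fit.
rng = np.random.default_rng(0)
M, T = 4000, 50
lp = rng.normal(size=(M, T))

# Per draw: Pr(S = s | D, e^(m), l^(m)) = exp(lp[m, s]) / sum_s' exp(lp[m, s']),
# computed stably via log-sum-exp; the denominator plays the role of Pr(D | e^(m), l^(m)).
log_norm = np.logaddexp.reduce(lp, axis=1, keepdims=True)
per_draw_probs = np.exp(lp - log_norm)        # each row sums to 1

# Equal-weight (1/M) Monte Carlo average over draws gives the estimate of p(s | D).
p_s_given_D = per_draw_probs.mean(axis=0)     # length-T vector, sums to 1
print(p_s_given_D[:5])
```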

Hopefully this understanding is correct and helps clear things up.