Estimating expectations of marginalised discrete variable

jandraor · October 15, 2024, 2:57pm

Hi,

I’m following Stan’s manual (Latent Discrete Parameters) to estimate a discrete parameter. All chains converge and the method seems to recover the ground truth. At least visually, but I don’t know how to summarise it with a probability statement. Let’s assume that the parameter of interest is \tau, which can take values between 360 and 4920. I have evaluated the likelihood at discrete points of this interval in increments of 120.

This is the Stan code that generates the probabilities and draws. According to Stan’s manual, sim_cp should only be used for visual inspection, but for precise calculations (expectations), I should use prob_cp. I would like a statement that says “There is a 95% chance that the change point (cp) is between a & b”. Any pointers as to how to do that will be highly appreciated. I got lost with the notation in the section “estimating expectations”.

generated quantities {
  array[n_i_y] int sim_cp;
  array[n_i_y] vector[n_cp] prob_cp;

  for(i in 1:n_i_y) {
    sim_cp[i] = cp[categorical_logit_rng(lp[i, ])];
    prob_cp[i] = softmax(lp[i]);
  }
}

In R, for a given individual i, I have a data frame with three columns (sample.csv (1.3 KB)), where j is the index of the change point, prob_cp the probability, and .iter the iteration.

Thanks.

Bob_Carpenter · October 21, 2024, 7:21pm

You also want to add randomness here if you are doing posterior predictive inference.

In general, suppose you have an unknown theta in Stan. You can define an indicator as follows that assigns a value 1 if \theta \in (a, b) and value 0 otherwise (Stan’s comparison functions return either 0 for false or 1 for true).

generated quantities {
  int<lower=0, upper=1> Pr_btw = a < theta && theta < b;
}

The posterior mean of this quantity is the probability:

\Pr[\theta \in (a, b) \mid y] = \int_\Theta \textrm{I}_{(a, b)}\!(\theta) \cdot p(\theta | y) \, \textrm{d} \theta \approx \frac{1}{M} \sum_{m=1}^M \textrm{I}_{(a, b)}(\theta^{(m)}),

where \theta^{(m)} \sim p(\theta \mid y) are the posterior draws and \textrm{I}_A(u) = 1 if u \in A and 0 otherwise (i.e., it’s an indicator function for the interval).

Topic		Replies	Views
Discrete change point unnormalized probability estimate Modeling	2	649	April 25, 2018
Estimation, not marginalisation, of discrete parameters: a solution? General discrete-parameters	3	725	May 18, 2022
Discrete parameter in Stan Developers	14	5624	February 7, 2019
Posterior Predictive Checks After Sampling Modeling	3	820	October 23, 2022
Latent variable model - marginalising out discrete parameter Modeling	6	636	August 23, 2020

Estimating expectations of marginalised discrete variable

Related topics