Posterior predictive check with custom pdf (censored shifted Wald)


#1

Dear Stan community,

I’m new and currently trying to fit a right-censored shifted-Wald distribution to reaction time data using rstan. Using the Wiener distribution makes less sense due to the way the experiment was designed (it’s a go/no go task, so there is only one decision boundary and not two). Mainly, I would like to evaluate the effect of experimental conditions A, B and their interaction on the parameters of this distribution.
I managed to produce the following code with the help of the available literature. It may not be optimal, but the chains converge so far.

It seems like I should use the log pdf of the shifted-Wald, can someone explain whether this is the appropriate choice and why?

I would like to check the quality of the model fit by means of posterior predictive checking, but I’m at a loss as to what I should do within the generated quantities block given my model. Would you mind pointing me to the appropriate documentation if I missed it, or providing me with some help?

Thank you in advance.

DRR

functions {
// shifted Wald pdf (see Miller 2018)
  real shifted_Wald_lpdf(real x, real gamma, real alpha, real theta){
    real tmp1;
    real tmp2;
    tmp1 = alpha / (sqrt(2 * pi() * (pow((x - theta), 3))));
    tmp2 = exp(-1 * (pow((alpha - gamma * (x-theta)),2)/(2*(x-theta))));
    return log(tmp1*tmp2); // unsure about why I should use the log.
  }
  real swald_lccdf(real x, real gamma, real alpha, real theta){
    real tmp1;
    real tmp2;
    real tmp3;
    // compute the log of the complementary CDF of the shifted wald (see Miller 2018)
    tmp1 = Phi((gamma*(x - theta) - alpha)/sqrt(x-theta));
    tmp2 = exp(2*alpha*gamma);
    tmp3 = Phi((gamma*(x - theta) + alpha)/sqrt(x-theta));
    return log(1 - tmp1-(tmp2*tmp3));
  }
}
data{
  int<lower=0> N_obs;                           // number of observed RTs
  int<lower=1> J;                                    // n of participants
  vector<lower=0>[J] N_cens;                // number of right-censored trials
  vector<lower=0>[N_obs] rt_obs;          // RT data
  vector<lower=0>[J] U;                          // censoring point (eventually this will be fixed for all participants when the actual censoring point is retrieved)
  int<lower=1, upper=2> soa[N_obs];     // experimental manipulation A (2 levels)
  int<lower=0,upper=1> cg[N_obs];        // experimental manipulation B (2 levels)
  int<lower=1,upper=J> id[N_obs];         // identifier
}
parameters {
  // fixed-effects parameters
  vector[12] b;
}
transformed parameters {
}

model{
    // parameters
  real gamma;             // drift
  real alpha;                // boundary
  real theta;                 // ndt
  
  //priors
  b ~ normal(0, 100);        // diffuse prior
  

  for(i in 1 : N_obs) {
  gamma = b[1] + (b[2])*cg[i] + (b[3] + (b[4])*cg[i])*soa[i];
  alpha = b[5] + (b[6])*cg[i] + (b[7] + (b[8])*cg[i])*soa[i];
  theta = b[9] + (b[10])*cg[i] + (b[11] + (b[12])*cg[i])*soa[i];
  
  target += shifted_Wald_lpdf(rt_obs[i]|gamma,alpha,theta) + N_cens[id[i]] * swald_lccdf(U[id[i]]|gamma,alpha,theta);
  }
}

generated quantities {
  vector[N_obs] log_lik;
  real gamma; 
  real alpha;
  real theta;
    for (i in 1:N_obs){
      gamma = b[1] + (b[2])*cg[i] + (b[3] + (b[4])*cg[i])*soa[i];
      alpha = b[5] + (b[6])*cg[i] + (b[7] + (b[8])*cg[i])*soa[i];
      theta = b[9] + (b[10])*cg[i] + (b[11] + (b[12])*cg[i])*soa[i];
  
    log_lik[i] = shifted_Wald_lpdf(rt_obs[i]|gamma,alpha,theta) + N_cens[id[i]] * swald_lccdf(U[id[i]]|gamma,alpha,theta);
  }
}


When adding random effects to a model, Log probability evaluates to log(0), i.e. negative infinity
#2

Here’s a post that discusses in general how to get a posterior predictive distribution: How to get samples from the posterior predictive distribtion using stan

It seems to me that the missing piece of the puzzle is a function that can return a random sample of a Wald distribution. I’m not sure how you would do that within Stan itself, but you can use R, Python, or whatever language you prefer to post-process the MCMC samples that you’ve obtained from Stan in order to generate a posterior predictive distribution. It looks like there are functions available for both R and Python that can generate samples from a Wald distribution.


#3

I’m by no means an expert on the Wald distribution. But Google tells me it also likes to go by the name of Inverse Gaussian.

Wikipedia lists an algorithm for generating a random variate from an inverse Gaussian. It seems straight-forward enough to do. Except in your case you need some sort of shifting or censoring, so I’m not sure. Maybe you could generate the variate and reject those below the threshold or something.

Another way of doing it would be to write an inverse CDF function using the algebraic solver [going to go out on a limb here and claim there is no closed-form inverse CDF for this distribution] and then apply this cool trick by @lcomm.


#4

Hi,

Thank you for your response.

I saw the algorithm, but I don’t know how I can adapt it to a shifted distribution, so I came here looking for other suggestions!

I like the idea of using the cool trick with the algebraic solver, but it’s a bit beyond my knowledge and ability. I did the following, I’m far from sure this is the right thing. Can you tell me if I’m going in the right direction?

For now the model crashes dramatically with my Rstudio session at some random point during sampling, so I’m assuming I did something wrong.

Most of the time, the output to the Rstudio console is :

Chain 1: Exception: Exception: Exception: Phi: x is nan but must not be nan! in model […] at line 25, 44, 150

Here is my updated stan script.

functions {
  real shifted_Wald_lpdf(real x, real gamma, real alpha, real theta){
    real tmp1;
    real tmp2;
    tmp1 = alpha / (sqrt(2 * pi() * (pow((x - theta), 3))));
    tmp2 = exp(-1 * (pow(alpha - gamma*x + gamma*theta,2)/(2*(x-theta))));
    return log(tmp1*tmp2);
  }
    // log of the complementary CDF 
  real swald_lccdf(real x, real gamma, real alpha, real theta){
    real tmp1;
    real tmp2;
    real tmp3;
    tmp1 = Phi((gamma*x - gamma*theta - alpha)/sqrt(x-theta));
    tmp2 = exp(2*alpha*gamma);
    tmp3 = Phi((gamma*x - gamma*theta + alpha)/sqrt(x-theta));
    return log(1 - (tmp1+(tmp2*tmp3)));
  }
  // shifted Wald cdf
  real swald_cdf(real x, real gamma, real alpha, real theta){
    real tmp1;
    real tmp2;
    real tmp3;
    tmp1 = Phi((gamma*x - gamma*theta - alpha)/sqrt(x-theta));
    tmp2 = exp(2*alpha*gamma);
    tmp3 = Phi((gamma*x - gamma*theta + alpha)/sqrt(x-theta));
    return tmp1+(tmp2*tmp3);
  }  
  // algebraic solver for inverse cdf
 vector algebra_system(vector y, vector params,
                              real[] x_r, int[] x_i){
    real gamma;
    real alpha;
    real theta;
    real est;
    real u;
    vector[1] ret;
    gamma = params[1];
    alpha = params[2];
    theta = params[3];
    u = params[4];
    est = y[1];
    ret[1] = swald_cdf(y[1],gamma,alpha,theta) - u;
    return ret;
  }
}
data{
  int<lower=0> N_obs; // number of observed rts
  int<lower=1> J;                     // n of subjects
  vector<lower=0>[J] N_cens; // number of right-censored trials
  vector<lower=0>[N_obs] rt_obs; // data
  vector<lower=0>[J] U; // censoring point
  int<lower=1, upper=2> soa[N_obs];                         // condition A
  int<lower=0,upper=1> cg[N_obs];      // condition B
  int<lower=1,upper=J> id[N_obs];         // subj identifier
}

parameters {
  // fixed-effects parameters
  vector[12] b;
}

model{
    // model parameters
  real gamma;                // drift
  real alpha;                // boundary
  real theta;                // ndt
  
  //priors
  b ~ normal(0, 100);        // diffuse prior for beta coefficients
  

  for(i in 1 : N_obs) {
  gamma = b[1] + (b[2])*cg[i] + (b[3] + (b[4])*cg[i])*soa[i];
  alpha = b[5] + (b[6])*cg[i] + (b[7] + (b[8])*cg[i])*soa[i];
  theta = b[9] + (b[10])*cg[i] + (b[11] + (b[12])*cg[i])*soa[i];
  
  target += shifted_Wald_lpdf(rt_obs[i]|gamma,alpha,theta) + N_cens[id[i]] * swald_lccdf(U[id[i]]|gamma,alpha,theta);
  }
}

generated quantities {
  vector[N_obs] log_lik;
  vector[N_obs] p_RT;
  vector[1] sample;
  real gamma; // fitted parameters
  real alpha;
  real theta;
  real p;
  real u;
  int x_i[0];
  real x_r[0];
  vector[1] y_guess;
  row_vector[4] solver_params;
    for (i in 1:N_obs){
      gamma = b[1] + (b[2])*cg[i] + (b[3] + (b[4])*cg[i])*soa[i];
      alpha = b[5] + (b[6])*cg[i] + (b[7] + (b[8])*cg[i])*soa[i];
      theta = b[9] + (b[10])*cg[i] + (b[11] + (b[12])*cg[i])*soa[i];
  
    log_lik[i] = shifted_Wald_lpdf(rt_obs[i]|gamma,alpha,theta) + N_cens[id[i]] * swald_lccdf(U[id[i]]|gamma,alpha,theta);
  //generate samples
     p = swald_cdf(U[id[i]], gamma, alpha, theta);  // cdf for bounds
     u = uniform_rng(0, p);
    y_guess = rep_vector(0.5, 1);
     solver_params = [gamma,alpha,theta, u];
    sample = algebra_solver(algebra_system, 
                        y_guess,
                        solver_params',
                        x_r,
                        x_i);
    p_RT [i] = sample[1];
  }
}




#5

A few things:

  • I’m not sure, but judging from SciPy’s page on invgauss, I think that in tmp2 of shifted_Wald_lpdf, 2*(x-theta) is supposed to be followed by ^2.

  • This is a nitpick, but for the sake of readability, you should probably use the “^” operator rather than the pow function.

  • This is another nitpick, but when writing your shifted_Wald_lpdf function, you may want to analytically find the log of the shifted Wald distribution and code that. This way, you aren’t taking the exponential of something, only to undo that exponential when you apply the logarithm. That may also head off possible numerical overflow issues at the pass.

  • I highly recommend estimating the posterior predictive distribution outside of your Stan run. That is, run your Stan model, save the resulting MCMC samples to a file, and then pass those saved samples into a script that calculates the posterior predictive distribution. This way, if you initially get your script wrong, you can go back and modify it without having to rerun your Stan model again. Also, you can pick a scripting language in which a function to sample from a shifted Wald distribution has already been written (like R or Python).


#6

I made the nitpick changes. Your comment on not having to re-run the Stan model made a lot of sense, I hadn’t really thought about that. I’ll calculate my posterior predictive distribution as you suggested.

Thank you both for the help and advice!