Generated quantities from custom probability function

Hi,

I am trying to fit a group of one-and-zero inflated latent variable models. I wrote the syntax to fit the beta family beta.stan (4.2 KB)

I tested it with simulated data, and it works great. If you notice, I generate model-based predicted responses based on the same model under the generated quantities block to do some posterior predictive checks.

My questions come when I try to do the same thing with the Simplex family. simplex.stan (3.8 KB)

There is a custom simplex_lpdf function defined at the beginning. I tested this syntax with simulated data, and it also works just fine. The only missing piece is the model-based predicted responses in the generated quantities block. I am just stuck on how to generate data from the custom probability distribution. I have never done that before.

Should I write a simplex_rng function, and how would that be? Or, is it something I can build on top of the custom simplex_lpdf function, and how?

I searched the manual and the web but can’t find any examples.

Thank you for the help.

I can do this outside of Stan using R (via simplexreg package) after I extract the posterior draws of model parameters, but I am just wondering if I can do it simpler inside the Stan model syntax like the beta model.

  # Extract the posterior samples 

  posteriors <- extract(stanfit_simplex)

  # This was run with 4 chains and 750 sampling iterations on each chain 
  # (after 250 warmup iterations)
  # So, there are 3000 draws for each parameter 

  # Posterior samples for item parameters (3000 x 10)
  # Each column represents the item, each row represents a draw

  beta_posterior   <- posteriors$beta
  alpha_posterior  <- posteriors$alpha
  phi_posterior    <- posteriors$phi
  gamma0_posterior <- posteriors$gamma0
  gamma1_posterior <- posteriors$gamma1
  
  # Posterior samples for person parameters (3000 x 250)
  # Each column represents the item, and each row represents a draw
  
  theta_posterior <- posteriors$theta

  # Vectorized item and person indices
  
  p   <- test$numeric_id        
  i   <- test$item              
  
  # Matrix to save simulated values
  
  Y <- matrix(nrow=3000,ncol=nrow(test))
  
  # Progress bar
  
  pb <- txtProgressBar(min = 0, max = 3000, style = 3)
  
  # Iterate over 3000 draws
  
  for(iter in 1:3000){
    
    # Probability of 0s and 1s (vectorized)
    # given the person and item parameters at the corresponding draw
    
    q0 <- 1/(1 + exp(-(gamma0_posterior[iter,i]-alpha_posterior[iter,i]*theta_posterior[iter,p])))
    q1 <- 1/(1 + exp(-(gamma1_posterior[iter,i]-alpha_posterior[iter,i]*theta_posterior[iter,p])))
 
    # simplex distribution parameters in case the response is neither 0 nor 1
    # (vectorized)
    
    mu_  <- 1/(1+exp(-(beta_posterior[iter,i]+alpha_posterior[iter,i]*theta_posterior[iter,p])))
    sig_ <- sqrt(phi_posterior[iter,i])

    # Simulate the continuous response (vectrorized) 
   
    y_   <- rsimplex(n = length(mu_),mu = mu_, sig = sig_)
    
    # Sample 0,1, or 2
    # If it is 2 replace it with the continous response
    
    # Some preparation for the inputs required by the map function
    
    prob    <- data.frame(p0 = q0, 
                          p1 = 1-q1, 
                          p2 = q1-q0,
                          y = y_)
    
    prob$id <- 1:nrow(prob)
    by_id   <- split(prob,prob$id)
    
    # map is iterating over all observations (much faster then for loop)
    
    C <- map(.x = by_id,
             .f = function(x){
               tmp <- sample(0:2, 1, prob = c(x$p0,x$p1,x$p2))
               tmp <- ifelse(tmp==2,x$y,tmp)
             })
    
    # Save the simulated values at the correspond draw
    Y[iter,] <- unlist(C)
    
    setTxtProgressBar(pb, iter)
  }
1 Like

I can’t seem to follow your links to see what your simplex density looks like so I’m not sure if there’s a direct way to code the RNG.
But, as was recently pointed out to me, you can always use rejection sampling to turn a given rng into one for your desired distribution. How efficient that will be depends on how expensive the densities (known rng and your distribution) are to compute and how tightly you can surround your density with a multiple of the density for which you know the rng.

1 Like

Thank you for referring to the other post. I will check that.

For reference, this is how it is defined at the top of the simplex.stan file:

functions {
  real simplex_lpdf(real y, real mu, real sig) {
    real x=-.5*(log(2)+log(pi())+log(sig)+3*log(y)+3*log(1-y)) - 1*(y-mu)^2/(2*sig*y*(1-y)*mu^2*(1-mu)^2);
    return x;
  }
}
f\left(X_{pi}\middle|\theta_{pi},\alpha_i,\beta_i,\delta_i\right)=\frac{1}{\sqrt{2\pi\delta_i\left[X_{pi}\left(1-X_{pi}\right)\right]^3}}exp\left(-\frac{\left(X_{pi}-\mu_{pi}\right)^2}{2\delta_iX_{pi}\left(1-X_{pi}\right)\mu_{pi}^2\left(1-\mu_{pi}\right)^2}\right)
\mu_{pi}=\frac{1}{1+e^{-\left(\alpha_i\theta_p+\beta_i\right)}}