Hi,
I am trying to fit a group of one-and-zero inflated latent variable models. I wrote the syntax to fit the beta family beta.stan (4.2 KB)
I tested it with simulated data, and it works great. If you notice, I generate model-based predicted responses based on the same model under the generated quantities block to do some posterior predictive checks.
My questions come when I try to do the same thing with the Simplex family. simplex.stan (3.8 KB)
There is a custom simplex_lpdf function defined at the beginning. I tested this syntax with simulated data, and it also works just fine. The only missing piece is the model-based predicted responses in the generated quantities block. I am just stuck on how to generate data from the custom probability distribution. I have never done that before.
Should I write a simplex_rng function, and how would that be? Or, is it something I can build on top of the custom simplex_lpdf function, and how?
I searched the manual and the web but can’t find any examples.
Thank you for the help.
I can do this outside of Stan using R (via simplexreg
package) after I extract the posterior draws of model parameters, but I am just wondering if I can do it simpler inside the Stan model syntax like the beta model.
# Extract the posterior samples
posteriors <- extract(stanfit_simplex)
# This was run with 4 chains and 750 sampling iterations on each chain
# (after 250 warmup iterations)
# So, there are 3000 draws for each parameter
# Posterior samples for item parameters (3000 x 10)
# Each column represents the item, each row represents a draw
beta_posterior <- posteriors$beta
alpha_posterior <- posteriors$alpha
phi_posterior <- posteriors$phi
gamma0_posterior <- posteriors$gamma0
gamma1_posterior <- posteriors$gamma1
# Posterior samples for person parameters (3000 x 250)
# Each column represents the item, and each row represents a draw
theta_posterior <- posteriors$theta
# Vectorized item and person indices
p <- test$numeric_id
i <- test$item
# Matrix to save simulated values
Y <- matrix(nrow=3000,ncol=nrow(test))
# Progress bar
pb <- txtProgressBar(min = 0, max = 3000, style = 3)
# Iterate over 3000 draws
for(iter in 1:3000){
# Probability of 0s and 1s (vectorized)
# given the person and item parameters at the corresponding draw
q0 <- 1/(1 + exp(-(gamma0_posterior[iter,i]-alpha_posterior[iter,i]*theta_posterior[iter,p])))
q1 <- 1/(1 + exp(-(gamma1_posterior[iter,i]-alpha_posterior[iter,i]*theta_posterior[iter,p])))
# simplex distribution parameters in case the response is neither 0 nor 1
# (vectorized)
mu_ <- 1/(1+exp(-(beta_posterior[iter,i]+alpha_posterior[iter,i]*theta_posterior[iter,p])))
sig_ <- sqrt(phi_posterior[iter,i])
# Simulate the continuous response (vectrorized)
y_ <- rsimplex(n = length(mu_),mu = mu_, sig = sig_)
# Sample 0,1, or 2
# If it is 2 replace it with the continous response
# Some preparation for the inputs required by the map function
prob <- data.frame(p0 = q0,
p1 = 1-q1,
p2 = q1-q0,
y = y_)
prob$id <- 1:nrow(prob)
by_id <- split(prob,prob$id)
# map is iterating over all observations (much faster then for loop)
C <- map(.x = by_id,
.f = function(x){
tmp <- sample(0:2, 1, prob = c(x$p0,x$p1,x$p2))
tmp <- ifelse(tmp==2,x$y,tmp)
})
# Save the simulated values at the correspond draw
Y[iter,] <- unlist(C)
setTxtProgressBar(pb, iter)
}
1 Like
I can’t seem to follow your links to see what your simplex density looks like so I’m not sure if there’s a direct way to code the RNG.
But, as was recently pointed out to me, you can always use rejection sampling to turn a given rng into one for your desired distribution. How efficient that will be depends on how expensive the densities (known rng and your distribution) are to compute and how tightly you can surround your density with a multiple of the density for which you know the rng.
1 Like
Thank you for referring to the other post. I will check that.
For reference, this is how it is defined at the top of the simplex.stan file:
functions {
real simplex_lpdf(real y, real mu, real sig) {
real x=-.5*(log(2)+log(pi())+log(sig)+3*log(y)+3*log(1-y)) - 1*(y-mu)^2/(2*sig*y*(1-y)*mu^2*(1-mu)^2);
return x;
}
}
f\left(X_{pi}\middle|\theta_{pi},\alpha_i,\beta_i,\delta_i\right)=\frac{1}{\sqrt{2\pi\delta_i\left[X_{pi}\left(1-X_{pi}\right)\right]^3}}exp\left(-\frac{\left(X_{pi}-\mu_{pi}\right)^2}{2\delta_iX_{pi}\left(1-X_{pi}\right)\mu_{pi}^2\left(1-\mu_{pi}\right)^2}\right)
\mu_{pi}=\frac{1}{1+e^{-\left(\alpha_i\theta_p+\beta_i\right)}}