How to return a value at a specific percentile from a skew-normal estimation in Stan?

Hello,

I am doing a simple estimation using a skew-normal distribution (e.g., y ~ skew_normal(xi, omega, alpha)). I wanted to ask how to return a distribution of values for a specific percentile (e.g., wanting to get the 50th percentile for length of stay)?

For example, if using the sn package in R, I would run the command below and find the 50th percentile is about 30 days. Is there something I can do like that in the transformed parameters block? Any suggestions would be greatly appreciated.

qsn(0.5, xi=-0.9260, omega=46.99, alpha=28.16)

Thank you,
Steve

You need an inverse cdf (aka quantile function) for that and we don’t supply those in Stan because we couldn’t find good implementations in general with derivatives.

We do have cdf functions and root finders, which you can use to compute inverse cdfs, but that’s probably going to be stable.

What do you need the quantiles for in the model? It’s unusual to see them show up in a model definition. If they were more widely used, we’d probably work harder to get them into Stan.

1 Like

Do you actually need these values in the model, or are you just trying to obtain them after fitting the model? In the former case I don’t really have anything to add to what @Bob_Carpenter said.

However, in the latter case then you can still use the qsn() function in R, you would just give it the posterior draws of the model parameters. Here’s an example of how to do that using a simulated example, which will result in obtaining the posterior distribution of the median (50th percentile):

library(cmdstanr)    
library(sn)       
library(bayesplot)

set.seed(123)
N <- 100
true_xi    <- 0  
true_omega <- 1  
true_alpha <- 1
y <- sn::rsn(n = N, xi = true_xi, omega = true_omega, alpha = true_alpha)
data_list <- list(N = N, y = y)

stan_file <- write_stan_file("
data {
  int<lower=0> N;     
  vector[N] y;       
}
parameters {
  real xi;              
  real<lower=0> omega;  
  real alpha; 
}
model {
  xi ~ normal(0, 5);
  omega ~ normal(0, 5);
  alpha ~ normal(0, 5);
  y ~ skew_normal(xi, omega, alpha);
}
")
mod <- cmdstan_model(stan_file)
fit <- mod$sample(data = data_list)
posterior_draws <- fit$draws(format = "df")

# Use sn::qsn() on each of the posterior draws
median_draws <- mapply(function(xi, omega, alpha) {
  sn::qsn(0.5, xi = xi, omega = omega, alpha = alpha)
}, posterior_draws$xi, posterior_draws$omega, posterior_draws$alpha)

# Get summary statistics and plot the draws
summary(median_draws)
mcmc_hist(cbind("Median" = median_draws))            
1 Like

Thanks, Bob! I appreciate your message and the info you provided. I am mainly interested in the 50th percentile. I have formulas for the mean and mode but I read there isn’t a closed form calculation in the Azzalini book. I use similar analyses in a large health care system with many hospitals for target setting of various outcomes and to help with “net present value” calculations for the Finance department. Your skew_normal() in Stan has been extremely helpful! Knowing certain percentiles would be a nice-to-have but really isn’t necessary. I have written some code in R for my output. I have a shiny app that produces diagnostics, posterior predictive checks, etc, so I just wanted to check in case I missed something. I would bet my case would be uncommon and definitely not worth implementing any changes. Plus, I saw Jonah’s response and his code will be of great help.
Thanks again,
Steve

Thanks, Jonah!
I really appreciate your code and time looking into this. You are correct, I would only need to obtain the percentiles after fitting the model. I have already copied your code and I think that provides all of the info that I was looking for.
Thanks again,
Steve

1 Like