Some Guidance on summarising results from the stan model

[edit: escaped code and auto-indented]

Hello there, I have an output from a stan model and saved it as text. My struggle now is summarising it into statistics that i should include in the results section of my paper.

some guide

here is the code that generated the results

data {
  int<lower=1> Nobs;         // number of observations 
  int<lower=1> N1;         // number of students 
  int<lower=1> N2;         // number of course units
  int<lower=1> J;         // number of fixed effects  
  int y[Nobs];              // resits
  matrix[Nobs,J] X; //the model matrix
  int male[Nobs]; // sex of student 
  int bbs[Nobs]; // BBS student 
  int bps[Nobs]; // BPS student 
  int bqe[Nobs]; // BQE student 
  int sas[Nobs]; // SAS student 
  int<lower=1, upper=N1> id[Nobs]; //  student 
  int<lower=1, upper=N2> course[Nobs]; // course unit

}

parameters {
  vector[J] delta;  
  vector<lower=0>[J] tau;  
  vector<lower=0, upper=1>[N2] gamma;  // geometric paratemeter
  vector[N2] theta1;      // scale parameter
  real<lower=0> sigma1;      // scale parameter
  vector[J] beta[N2];          // fixed effects parameters
  real b[N1];          // student random effects
}
transformed parameters {
  vector<lower=0>[N2] mu ;      // scale parameter
  vector<lower=0>[N2] theta ;      // scale parameter
  vector<lower=0>[Nobs] mu1[N2];
  vector<lower=0>[N2] betamu;      // mean parameter
  vector<lower=0>[N2] betasigm;      // variance parameter

  for (k in 1:N2){
    for (i in 1:Nobs){
      mu1[k,i] = inv_logit(X[i]*beta[course[i]] + b[id[i]]);
    }
  }
   
  for (n in 1:N2){
    mu[n] = mean(mu1[n]);
    theta[n] = exp(theta1[n]);
    betamu[n] =(1-theta[n])/(mu[n]-theta[n]);
    betasigm[n] = (mu[n]*(1-mu[n])*(1-theta[n]))/pow((mu[n] -theta[n]),2)*(mu[n] -(2*theta[n]));
  }

}
 	
model  {
  delta ~ normal(0,5); //weakly informative priors on the regression coefficients
  tau ~ cauchy(0,2.5); //weakly informative priors
  sigma1 ~ gamma(2,0.1); //weakly informative priors
  b ~ normal(0.0, sigma1); //student random effects 

  for(j in 1:N2){
    beta[j] ~ normal(delta,tau); //hyperprior for group-level regression coefficients   
    theta1[j] ~ lognormal(0,1); //hyperprior theta1
    gamma[j] ~ beta((mu[j])/theta[j], (1 - (mu[j]))/theta[j]); //prior gamma
  }
       
  for (k in 1:Nobs){
    target += neg_binomial_lpmf(y[k] | 1, gamma[course[k]]/(1-gamma[course[k]])); // log-likelihood
  }
     
}

generated quantities {
  int y_rep[Nobs];
  vector[Nobs] log_lik;
  
  for (i in 1:Nobs) {
    y_rep[i] = neg_binomial_rng(1, gamma[course[i]]/(1-gamma[course[i]]));
    log_lik[i] = neg_binomial_lpmf(y[i] |1, gamma[course[i]]/(1-gamma[course[i]]));
     
  }
    
}

Hey Susan,

what stan version are you using and which interface ? with rstan, you can use the
extract() or the get_posterior_mean() to inspect individual parameters.
You have to work with the stan object in your IDE. Here is an example, how to extract estimated parameters:

# Fit-The-Shit----

          fit <- stan(file = "YOUR_MODEL",data=stan.dat,
                         warmup = 1500, iter = 3000,
                         chains = 4,refresh = 100)

          # Extract Posterior Parameters from fit object---- 
          
          post_samples <- rstan::extract(fit, pars=c("tau","gamma","b"), inc_warmup = F)
          
          # Hyper Parameter
          hyper_means <- as.vector(get_posterior_mean(fit, par="hyper_pars")[,5])

If you use extract, you have to calculate the colmeans afterwards. If you use get_posterior_mean you geht the means auf all chains. That’s why you have to specify [,5] here, to get the mean of all chains.

cheers

jan

Which Stan interface did you use to generate the fit? Each of them has their own way of accessing the draws and summary statistics.

What you need to report in a paper depends your field and publication venue. If you let us know what you want to get out in the way of statistics, we can let you know how to do it with your specific interface.

[edit: escaped R call]

thanks, @Bob_Carpenter, and @jangoe these will surely help.

I am using stan from the r interface. My project is estimating three main things, but if I manage to extract information for one, the others will be easy to generate. Attached is a simple display of how I want to present my results.

below is where I fit my model from

fit <- stan( file = 'modeling resits_beta-geom_05-31-22v1.stan',
             data = stan.data,
             init=init2,
             seed = 12345, #1973597442, #this seed works when A=20, B=25
             control = list(
               # adapt_gamma: default=0.05
               adapt_gamma = 0.05,
               # adapt_kappa: default=0.8
               adapt_kappa = 0.8,
               # adapt_t0: default=10
               adapt_t0 = 10,
               # adapt_init_buffer: default=75
               adapt_init_buffer = 75,
               # adapt_term_buffer: default=50
               adapt_term_buffer = 50,
               # adapt_window: default=25
               adapt_window = 25,
               # adapt_delta: target acceptance probability; default=0.8; max=0.9999999999999999444222
               adapt_delta = 0.9999999999999999444222,
               # max_treedepth: should increase if adapt_delta is high; default=10.
               max_treedepth = 10),
             #memory.limit(size = 100),
             iter = 1000, warmup=500, chains =2, cores=2)

Summary of Results.pdf (55.9 KB)

Ok, from your code I can see that you’re using RStan rather than cmdstanr. (I’d suggest moving to cmdstanr as it’s up to date with the current release of Stan and easier to install.)

First, you get posterior credible intervals, not confidence intervals using Bayesian inference. So I’d suggest checking that. The column you label “mean” I would label as “estimated value” (that is the posterior mean because we’r using a Bayesian estimator).

I’m not sure how the course titles relate to the parameters in your model. But let’s say alpha is the parameter (or transformed parameter or generated quantity) corresponding to microecon. To get the 95% posterior intervals in RStan, I think you can do this:

alpha_draws <- extract(fit, "alpha")
alpha_posterior_interval_95 <- quantile(c(0.025, 0.975), alpha_draws)

Here’s the doc for the extract function: