Hello there, I have an output from a stan model and saved it as text. My struggle now is summarising it into statistics that i should include in the results section of my paper.
some guide
here is the code that generated the results
data {
int<lower=1> Nobs; // number of observations
int<lower=1> N1; // number of students
int<lower=1> N2; // number of course units
int<lower=1> J; // number of fixed effects
int y[Nobs]; // resits
matrix[Nobs,J] X; //the model matrix
int male[Nobs]; // sex of student
int bbs[Nobs]; // BBS student
int bps[Nobs]; // BPS student
int bqe[Nobs]; // BQE student
int sas[Nobs]; // SAS student
int<lower=1, upper=N1> id[Nobs]; // student
int<lower=1, upper=N2> course[Nobs]; // course unit
}
parameters {
vector[J] delta;
vector<lower=0>[J] tau;
vector<lower=0, upper=1>[N2] gamma; // geometric paratemeter
vector[N2] theta1; // scale parameter
real<lower=0> sigma1; // scale parameter
vector[J] beta[N2]; // fixed effects parameters
real b[N1]; // student random effects
}
transformed parameters {
vector<lower=0>[N2] mu ; // scale parameter
vector<lower=0>[N2] theta ; // scale parameter
vector<lower=0>[Nobs] mu1[N2];
vector<lower=0>[N2] betamu; // mean parameter
vector<lower=0>[N2] betasigm; // variance parameter
for (k in 1:N2){
for (i in 1:Nobs){
mu1[k,i] = inv_logit(X[i]*beta[course[i]] + b[id[i]]);
}
}
for (n in 1:N2){
mu[n] = mean(mu1[n]);
theta[n] = exp(theta1[n]);
betamu[n] =(1-theta[n])/(mu[n]-theta[n]);
betasigm[n] = (mu[n]*(1-mu[n])*(1-theta[n]))/pow((mu[n] -theta[n]),2)*(mu[n] -(2*theta[n]));
}
}
model {
delta ~ normal(0,5); //weakly informative priors on the regression coefficients
tau ~ cauchy(0,2.5); //weakly informative priors
sigma1 ~ gamma(2,0.1); //weakly informative priors
b ~ normal(0.0, sigma1); //student random effects
for(j in 1:N2){
beta[j] ~ normal(delta,tau); //hyperprior for group-level regression coefficients
theta1[j] ~ lognormal(0,1); //hyperprior theta1
gamma[j] ~ beta((mu[j])/theta[j], (1 - (mu[j]))/theta[j]); //prior gamma
}
for (k in 1:Nobs){
target += neg_binomial_lpmf(y[k] | 1, gamma[course[k]]/(1-gamma[course[k]])); // log-likelihood
}
}
generated quantities {
int y_rep[Nobs];
vector[Nobs] log_lik;
for (i in 1:Nobs) {
y_rep[i] = neg_binomial_rng(1, gamma[course[i]]/(1-gamma[course[i]]));
log_lik[i] = neg_binomial_lpmf(y[i] |1, gamma[course[i]]/(1-gamma[course[i]]));
}
}
what stan version are you using and which interface ? with rstan, you can use the extract() or the get_posterior_mean() to inspect individual parameters.
You have to work with the stan object in your IDE. Here is an example, how to extract estimated parameters:
# Fit-The-Shit----
fit <- stan(file = "YOUR_MODEL",data=stan.dat,
warmup = 1500, iter = 3000,
chains = 4,refresh = 100)
# Extract Posterior Parameters from fit object----
post_samples <- rstan::extract(fit, pars=c("tau","gamma","b"), inc_warmup = F)
# Hyper Parameter
hyper_means <- as.vector(get_posterior_mean(fit, par="hyper_pars")[,5])
If you use extract, you have to calculate the colmeans afterwards. If you use get_posterior_mean you geht the means auf all chains. That’s why you have to specify [,5] here, to get the mean of all chains.
Which Stan interface did you use to generate the fit? Each of them has their own way of accessing the draws and summary statistics.
What you need to report in a paper depends your field and publication venue. If you let us know what you want to get out in the way of statistics, we can let you know how to do it with your specific interface.
I am using stan from the r interface. My project is estimating three main things, but if I manage to extract information for one, the others will be easy to generate. Attached is a simple display of how I want to present my results.
Ok, from your code I can see that you’re using RStan rather than cmdstanr. (I’d suggest moving to cmdstanr as it’s up to date with the current release of Stan and easier to install.)
First, you get posterior credible intervals, not confidence intervals using Bayesian inference. So I’d suggest checking that. The column you label “mean” I would label as “estimated value” (that is the posterior mean because we’r using a Bayesian estimator).
I’m not sure how the course titles relate to the parameters in your model. But let’s say alpha is the parameter (or transformed parameter or generated quantity) corresponding to microecon. To get the 95% posterior intervals in RStan, I think you can do this: