I am trying to subset specific columns from a matrix, but some of them are non-sequential. I have looked through the Stan Reference Manual and don’t see an example. I think it should be an easy fix, but by now I have spent too long on a seemingly trivial operation. Any advice would be appreciated.

Stan code:

data {
  int N;                    // number of observations
  vector[N] y;              // response variable
  int K;                    // number of predictors
  matrix[N, K] data_matrix; // matrix of predictors
}
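One thing I tried for non-contiguous columns is multiple indexing with an integer array (the column numbers below are made up for illustration; a lone `:` stands for the full range):

```stan
transformed data {
  int idx[3] = {1, 3, 5};                   // hypothetical non-sequential columns
  matrix[N, 3] X_sub = data_matrix[:, idx]; // all rows; columns 1, 3, and 5
}
```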

I have also tried to subset the columns separately and then use append_col, but that is not working, even though the Stan Functions Reference (section 5.10; https://mc-stan.org/docs/2_24/functions-reference/matrix-concatenation.html) says you should be able to use append_col on a vector and a matrix. Not sure what I am doing wrong here.
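For the record, append_col does accept a vector and a matrix and returns a matrix, so something along these lines should compile (the column numbers here are invented for illustration):

```stan
transformed data {
  // hypothetical: prepend column 2 (a vector) to columns 4-5 (a matrix)
  matrix[N, 3] Z = append_col(col(data_matrix, 2), data_matrix[:, 4:5]);
}
```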

May I also ask how this would work in a hierarchical model? If I proceed as below, the result is a beta_int and a beta_slope posterior for each observation (and the posteriors for the observations in the same group have the exact same posterior of course). I get why this is happening, however, I only want to extract the unique posteriors, so one slope and one intercept for each group rather than for each observation. This is similar to the earlier example (but here of course I am estimating the population-level coefficients too). Any help is appreciated!

data {
  int<lower=1> N;             // number of observations
  int<lower=1> J;             // number of groups
  int<lower=1> K;             // number of columns in the model matrix
  int<lower=1,upper=J> id[N]; // group index for each observation
  matrix[N, K] X;             // model matrix
  vector[N] y;                // response variable
}
parameters {
  vector[K] gamma;        // population-level regression coefficients
  vector<lower=0>[K] tau; // standard deviations of the group-level coefficients
  vector[K] beta[J];      // array of group-level regression coefficient vectors
  real<lower=0> sigma;    // standard deviation of the individual observations
}
transformed parameters {
  vector[N] mu; // linear predictor
  vector[N] beta_ints;
  vector[N] beta_slopes;
  for (n in 1:N) {
    // compute the linear predictor using the relevant group-level coefficients
    mu[n] = X[n] * beta[id[n]];
    beta_ints[n] = beta[id[n], 1];
    beta_slopes[n] = beta[id[n], 2];
  }
}
model {
  gamma ~ student_t(3, 0, 10); // prior on population-level coefficients
  tau ~ cauchy(0, 10);         // prior on group-level coefficient sds
  sigma ~ gamma(2, 0.1);       // prior on observation-level sd
  for (j in 1:J) {
    beta[j] ~ normal(gamma, tau); // group-level coefficients centered on gamma
  }
  y ~ normal(mu, sigma);
}
generated quantities {
  real log_lik;
  log_lik = normal_lpdf(y | mu, sigma);
}

I’m not sure if I understand. In that model beta_ints and beta_slopes are just (per-observation) copies of (per-group) beta. You can drop those and instead analyze the betas. In particular, the group intercepts are beta[1:J,1].
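For example, moving the extraction to generated quantities and indexing by group gives one value per group rather than per observation (the names beta_int and beta_slope are just illustrative):

```stan
generated quantities {
  vector[J] beta_int;
  vector[J] beta_slope;
  for (j in 1:J) {
    beta_int[j] = beta[j, 1];   // one intercept per group
    beta_slope[j] = beta[j, 2]; // one slope per group
  }
}
```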

Sorry, I just saw this (and stuck with the non-hierarchical model). Interestingly, this messes with loo(). I didn’t even think to check this, but when I went to compare models, the models where I extract the coefficients for use in another step produced crazy looic values and loo diagnostics that are all “very bad”. Any ideas as to why this may be? Thanks for all your help!

Thanks @nhuurre! However, the bad loo values come from the same model that otherwise gives “good” loo values; the only difference is the extra code to extract those coefficients.

In other words, when I do not attempt to extract the coefficients from the model, I get fine loo values. When I do use the extra code to extract the coefficients in the same model, with otherwise the same code, I get “very bad” loo values. Since loo is estimated from the log-likelihood, I have no idea why this would occur, as the log-likelihood isn’t changing. The only change is extracting and manipulating the coefficient values in the transformed parameters block…

This definitely sounds strange. I agree that this shouldn’t affect loo. @bigmanj Are you able to share the data you’re using to fit the model and the latest version of your Stan program that leads to this problem with loo? Or if your data is private could you share some simulated data so I could try fitting your model and see if I can figure out why you’re seeing this strange behavior? Hopefully I’ll be able to figure out what’s going on here.
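One thing worth double-checking in the meantime (just a guess, since I haven’t seen the latest version of the program): loo() expects a pointwise log-likelihood, i.e. a vector[N] log_lik with one entry per observation, rather than a single summed real. A sketch of the usual generated quantities form:

```stan
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu[n], sigma); // one term per observation
  }
}
```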