# Subset multiple columns of a vector in transformed parameters block

Hi all,

I am trying to subset specific elements from a vector, but some are non-sequential. I have looked through the Stan Reference Manual and don’t see an example. I think it should be an easy fix, but by now I have spent too long on a seemingly trivial operation. Any advice would be appreciated.

Stan code:

```stan
data {
  int N;                     // number of observations
  vector[N] y;               // response variable
  int K;                     // number of predictors
  matrix[N, K] data_matrix;  // matrix of predictors
}

parameters {
  vector[K] beta;
  real<lower=0> sigma;
}

transformed parameters {
  vector[N] mu_y;
  vector[N] beta_ints;

  mu_y = data_matrix * beta;
  beta_ints = beta[1, 3:39]; // trying to extract the 1st and the 3rd through 39th coefficients
}

model {
  beta ~ student_t(3, 0, 10);
  sigma ~ cauchy(0, 10);

  y ~ normal(mu_y, sigma);
}

generated quantities {
  real log_lik;

  log_lik = normal_lpdf(y | mu_y, sigma);
}
```

I have also tried subsetting them separately and then using `append_col`, but that is not working (even though the Stan Functions Reference (section 5.10; https://mc-stan.org/docs/2_24/functions-reference/matrix-concatenation.html) says you should be able to use `append_col` on a vector and a matrix…). Not sure what I am doing wrong here.

e.g.,

```stan
transformed parameters {
  vector[N] mu_y;
  real beta_int;
  vector[N] beta_int_vector;
  vector[N] beta_ints;
  vector[N] beta_ints_all;

  mu_y = data_matrix * beta;
  beta_int = beta[1];
  beta_int_vector = to_vector([beta_int]);
  beta_ints = beta[3:39];
  beta_ints_all = append_col(beta_int_vector, beta_ints);
}
```

Thanks!
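For what it’s worth, I think the snag with `append_col` is that Stan’s `vector` type is a column vector, so `append_col` applied to two vectors places them side by side as an N x 2 matrix rather than producing one longer vector. `append_row` is the function that stacks vectors end-to-end, and it also accepts a scalar as its first argument. A sketch of that variant:

```stan
transformed parameters {
  vector[38] beta_ints_all;

  // append_row stacks column vectors end-to-end;
  // append_col would instead put them side by side as a matrix
  beta_ints_all = append_row(beta[1], beta[3:39]);
}
```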

```stan
transformed parameters {
  vector[38] beta_ints;

  beta_ints[1] = beta[1];
  beta_ints[2:38] = beta[3:39];
}
```
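If the pattern of elements ever gets more irregular, another option is Stan’s multiple indexing: indexing a vector with an array of integers returns the corresponding elements in order. A sketch (the index array here is just one way to build the 1, 3–39 pattern):

```stan
transformed data {
  int idx[38];
  idx[1] = 1;                      // keep the first coefficient
  for (i in 2:38) idx[i] = i + 1;  // then coefficients 3 through 39
}

transformed parameters {
  vector[38] beta_ints = beta[idx];  // multiple indexing picks out the listed elements
}
```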

Thank you! That worked perfectly.

May I also ask how this would work in a hierarchical model? If I proceed as below, the result is a beta_int and a beta_slope posterior for each observation (and observations in the same group of course share the exact same posterior). I understand why this is happening, but I only want to extract the unique posteriors: one slope and one intercept for each group rather than for each observation. This is similar to the earlier example (except here I am also estimating the population-level coefficients). Any help is appreciated!

```stan
data {
  int<lower=1> N;              // number of observations
  int<lower=1> J;              // number of groups
  int<lower=1> K;              // number of columns in the model matrix
  int<lower=1,upper=J> id[N];  // vector of group indices
  matrix[N,K] X;               // the model matrix
  vector[N] y;                 // the response variable
}

parameters {
  vector[K] gamma;         // population-level regression coefficients
  vector<lower=0>[K] tau;  // the standard deviation of the regression coefficients

  vector[K] beta[J];       // matrix of group-level regression coefficients
  real<lower=0> sigma;     // standard deviation of the individual observations
}

transformed parameters {
  vector[N] mu;  // linear predictor
  vector[N] beta_ints;
  vector[N] beta_slopes;

  for (n in 1:N) {
    // compute the linear predictor using the relevant group-level regression coefficients
    mu[n] = X[n] * beta[id[n]];
    beta_ints[n] = beta[id[n], 1];
    beta_slopes[n] = beta[id[n], 2];
  }
}

model {
  gamma ~ student_t(3, 0, 10);  // prior on pop-level coefficients
  tau ~ cauchy(0, 10);          // prior on pop-level sd
  sigma ~ gamma(2, 0.1);        // prior on group-level sd

  for (j in 1:J) {
    beta[j] ~ normal(gamma, tau);  // fill the matrix of group-level regression coefficients
  }

  y ~ normal(mu, sigma);
}

generated quantities {
  real log_lik;

  log_lik = normal_lpdf(y | mu, sigma);
}
```

I’m not sure if I understand. In that model `beta_ints` and `beta_slopes` are just (per-observation) copies of (per-group) `beta`. You can drop those and instead analyze the `beta`s. In particular, the group intercepts are `beta[1:J,1]`.
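To make that concrete: if you do want named copies, they should be sized by group (`J`) rather than by observation (`N`). One way to sketch this, e.g. in `generated quantities`:

```stan
generated quantities {
  vector[J] beta_ints;    // one intercept per group
  vector[J] beta_slopes;  // one slope per group

  for (j in 1:J) {
    beta_ints[j] = beta[j, 1];
    beta_slopes[j] = beta[j, 2];
  }
}
```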

Sorry, I just saw this (and stuck with the non-hierarchical model). Interestingly, this messes with loo(). I didn’t even think to check this, but when I went to compare models, the models from which I extract the coefficients (to be used in another step) produced crazy looic values and loo diagnostics that are all “very bad”. Any ideas as to why this may be? Thanks for all your help!

Bad loo values might mean the model is very flexible. For example, maybe there are too few observations per group.

Thanks @nhuurre! However, the bad loo values come from the same model that otherwise gives “good” loo values; the only difference is the extra code to extract those coefficients.

In other words, when I do not attempt to extract the coefficients from the model, I get fine loo values. When I do use the extra code to extract the coefficients in the same model, with otherwise the same code, I get “very bad” loo values. Since loo is estimated from the log lik, I have no idea why this would occur, as the log lik isn’t changing. The only change is extracting and manipulating the coefficient values in the transformed parameters block…

Any ideas?

Thanks!
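One thing that may be worth double-checking as an aside (this may or may not be related to the behavior above): the loo package expects a pointwise log-likelihood, i.e. a `vector[N] log_lik` with one entry per observation, whereas the programs in this thread store a single summed `real log_lik`. A sketch of the pointwise version for the non-hierarchical model:

```stan
generated quantities {
  vector[N] log_lik;  // pointwise log-likelihood, one entry per observation

  for (n in 1:N)
    log_lik[n] = normal_lpdf(y[n] | mu_y[n], sigma);
}
```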

No idea. Maybe @jonah knows what’s going on here.

This definitely sounds strange. I agree that this shouldn’t affect loo. @bigmanj Are you able to share the data you’re using to fit the model and the latest version of your Stan program that leads to this problem with loo? Or if your data is private could you share some simulated data so I could try fitting your model and see if I can figure out why you’re seeing this strange behavior? Hopefully I’ll be able to figure out what’s going on here.

Thank you @jonah ! I am working on sending you the models, model call, and data now. Can I email them?

Thanks again.

No problem. There should be a button that lets you upload files when responding to a post, but if that doesn’t work for you then yeah email is fine.

Awesome, thanks! I just saw the upload button, so will try that. Thanks again.
