Reading data into arrays of vectors

Hello:

This is just a simple how to question. On page 138 -139 of the STAN reference manual discussing hierarchical logistic regression the code for the data specifies the data (x) as an array of row_vectors and the code for the hierarchical regression parameters (beta) species them as an array of vectors. The code is given below. From R, how what would the list for reading in the array of row_vectors x data look like? And if I wanted to read in initial values for the vectors of beta, what would that list look like? For the purpose of illustration, let’s assume N = 25, D = 3, and L = 5.

I know that this seems like a simple question but I have done a fair amount of searching and cannot find any examples.

Thanks for your help.

Steve

data {
int<lower=1> D;
int<lower=0> N;
int<lower=1> L;
int<lower=0,upper=1> y[N];
int<lower=1,upper=L> ll[N];
row_vector[D] x[N];
}
parameters {
real mu[D];
real<lower=0> sigma[D];
vector[D] beta[L];
}
model {
for (d in 1:D) {
mu[d] ~ normal(0, 100);
for (l in 1:L)
beta[l,d] ~ normal(mu[d], sigma[d]);
}
for (n in 1:N)
y[n] ~ bernoulli(inv_logit(x[n] * beta[ll[n]]));
}

1 Like

In terms of providing the data to Stan, they’re no different. To initialize Stan objects, just think about building an N-dimensional array with indices in the same order as the data structure. The row vector vs. column vector doesn’t really matter here.

row_vector[D] x[N] is a list of length N where each element is a row vector of length D. So first index is N, second is D.

vector[D] x[N] is a list of length N where each element is a vector of length D. Again, first index is N, second is D.

You’d feed in either as an NxD array, if that makes sense (something like array(0, dim = c(N, D)) in R).

matrix[D, L] x[N] would be array(0, dim = c(N, D, L)) (list of N matrices with D columns and L rows).

That make sense?

Ok. So let’s say I have a N x D matrix XMAT with the data for x. I would do XARRAY <- as.array(XMAT). Then specify something like the mllogreg_dat = list(D = 3, N = 25, L = 5, x = XARRAY, ll = ldat, y = ydat) and then call stan with data = mllogreg_dat. Is that right?

Then would the same thing work for reading in the intial values.? For example lets say I have BETAMAT as the LxD matrix for the initial values. Each row of the matrix is the vector for beta[l].
Then if I do BETAARRAY = as.array(BETAMAT) and specify init.values = list(beta = BETAARRAY, …) and call stan with init=list(init.values) this will place each row of BETAARRAY (originally BETAMAT) as each of the L elements of beta in the STAN program.

Am I understanding this right?

Sorry for being so pedantic

XMAT could stay as a matrix there, but I don’t see any obvious problems.

For the initial conditions, you have to supply initial conditions for each chain. The input format should be similar, but if there are 4 chains, you’ll need to do something like:

list(list_of_initial_conditions_for_chain_1,
  list_of_initial_conditions_for_chain_2,
  list_of_initial_conditions_for_chain_3,
  list_of_initial_conditions_for_chain_4)

Again, LxD is the right shape and I think a matrix would work fine (no need to cast). The types are a little handwavy at the interface I think.

Btw usually Stan does a pretty good job of picking random initial conditions. If you can get away without specifying them, do that.

It all seems to work.

Thanks so much for your help!

Steve

1 Like