Hello,
I am coding up some Bayesian neural networks in Stan. Below I provide the simplest model (no hidden layers) to make my question easier to state.
I am wondering whether my code can be made more efficient. I have been going through the Stan docs and it seems there are some things I might be able to do, but in practice the model does not compile. My questions are below the code.
data {
  // It is useful to set lower and upper bounds on the variables; this improves readability.
  int<lower=0> N; // number of observations (draws from the model)
  int<lower=2> C; // number of classes; for simplicity, the input is assumed to have the same dimensionality
  matrix[N, C] x; // iid observations: rows are samples, columns are features
  array[N] int<lower=1, upper=C> y; // iid observations: class label of each sample
  // definition of the prior p(w), e.g. p(w | 0, I); passed as data
  real mu_w;
  real<lower=0> std_w;
}
parameters {
  matrix[C, C] W; // the parameter we want to infer
}
model {
  // prior over the weights p(w)
  to_vector(W) ~ normal(mu_w, std_w);
  // likelihood p(t | NNet_w(x))
  matrix[N, C] p = x * W; // logits of the categorical likelihood
  for (n in 1:N)
    y[n] ~ categorical_logit(p[n]');
}
- First question: I was wondering whether there is a way to vectorize the likelihood evaluation. From the Stan docs it seems we can (12.5 Categorical Distribution | Stan Functions Reference), but from the Stan examples it seems we can't (1.6 Multi-Logit Regression | Stan User's Guide). I was thinking of doing something like this:
// prior over the weights p(w)
to_vector(W) ~ normal(mu_w,std_w);
// likelihood p(t|NNet_w(x))
matrix[N,C] p = x*W; // get parameters from Categorical Likelihood
// reshape to work with batched categorical
array[N] vector[C] p_vec;
for (n in 1:N)
p_vec[n] = p[n]';
y ~ categorical_logit(p_vec);
But this code does not compile, which is strange because, from what I understand, categorical_logit can work with int arrays y and arrays of vectors p_vec, as stated in section 12.5.4 of 12.5 Categorical Distribution | Stan Functions Reference. I am not sure whether I am interpreting the documentation correctly. In any case, which of the two options I have shown is more efficient? I am not sure, since the latter requires copying elements in memory, but I have also understood from the documentation that evaluating log probabilities with vectorized operations is much faster.
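For comparison, here is a minimal sketch of one vectorized alternative that may apply to this no-hidden-layer case. It assumes a Stan release that provides categorical_logit_glm and the array[N] declaration syntax. The zero intercept alpha is an assumption introduced only so that the logits match x * W; it is not part of the model above.

```stan
data {
  int<lower=0> N;                     // number of observations
  int<lower=2> C;                     // number of classes (= number of features here)
  matrix[N, C] x;                     // one observation per row
  array[N] int<lower=1, upper=C> y;   // class labels
  real mu_w;                          // prior mean of the weights
  real<lower=0> std_w;                // prior standard deviation of the weights
}
transformed data {
  // Assumption: no intercept, so the logits reduce to x * W.
  vector[C] alpha = rep_vector(0, C);
}
parameters {
  matrix[C, C] W;
}
model {
  // prior over the weights p(w)
  to_vector(W) ~ normal(mu_w, std_w);
  // Single call over all N rows; row n uses the logits alpha + (x[n] * W)'.
  y ~ categorical_logit_glm(x, alpha, W);
}
```

This only covers a linear map from inputs to logits. With hidden layers the logits are no longer x times a single parameter matrix, so the per-row loop would presumably still be needed there.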
Following my previous observation, I have also tried the following code, which seems to compile and sample:
// prior over the weights p(w)
to_vector(W) ~ normal(mu_w,std_w);
// likelihood p(t|NNet_w(x))
matrix[N,C] p = x*W; // get parameters from Categorical Likelihood
y ~ categorical_logit(to_vector(p));
but I am not sure why this is working. From what I understand, to_vector(p) will create a column vector with N*C elements. How does that fit with the fact that the vector y has shape N? Does Stan automatically split the vector p into vectors matching the number of elements in y?
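A minimal sketch of what that statement appears to compute, assuming the documented signature categorical_logit(ints y | vector beta) is the one being matched: the whole of to_vector(p) would be treated as the logits of a single categorical distribution with N*C outcomes, shared by every y[n], rather than being split into N vectors of length C. The variable name log_prob is introduced here only for illustration.

```stan
data {
  int<lower=0> N;                     // number of observations
  int<lower=2> C;                     // number of classes
  matrix[N, C] x;                     // one observation per row
  array[N] int<lower=1, upper=C> y;   // class labels
  real mu_w;
  real<lower=0> std_w;
}
parameters {
  matrix[C, C] W;
}
model {
  matrix[N, C] p = x * W;
  // to_vector(p) stacks the columns of p into one vector of length N * C;
  // log_softmax normalizes over all N * C entries at once.
  vector[N * C] log_prob = log_softmax(to_vector(p));
  to_vector(W) ~ normal(mu_w, std_w);
  // Every label indexes into the same N * C log-probabilities.
  for (n in 1:N)
    target += log_prob[y[n]];
}
```

If this reading is right, the statement compiles and samples but is not the intended model: the per-row loop adds log_softmax(p[n]')[y[n]] for each n, normalizing over the C classes of row n only.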
- Second question: To place a prior that goes beyond the standard normal, let's say one with some correlation, and since there is no matrix normal distribution implemented yet in Stan, I guess the only way to do that would be through reshaping? For example:
parameters {
matrix[C,C] W; // the parameter we want to infer
}
model {
// prior over the weights p(w)
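// assumes mu_w is a vector[C*C] and L_w is the Cholesky factor of a (C*C) x (C*C) covariance matrix
to_vector(W) ~ multi_normal_cholesky(mu_w, L_w); // to_vector is column-major; row-major vs. column-major ordering depends on how you want to establish the correlations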
// likelihood p(t|NNet_w(x))
matrix[N,C] p = x*W; // get parameters from Categorical Likelihood