Hey guys! I’m new here, someone from stackoverflow recommended me to ask for help on this forum. If i’m doing anything wrong with this post please let me know! I’ve read the category description and tried to find a similar problem on this forum but i haven’t had any luck so far.
I want to learn how to fit a (bayesian) neural network model in stan with R. I’m following the example on this website: Neural Networks in Stan: Or how I was utterly surprised that it worked at all. | Stephen R. Martin, PhD
My first step was to literally copy and paste the code that was needed to fit a categorical model, and run it. It immediately failed. And i can not figure out why it is failing.
When i run the model, it gives me an output. But then when i want to read the posterior sample to get the predictions for my validation set, i get this error message:
Warning message:
In matrix(., N_test, 3, byrow = TRUE) :
data length [151] is not a sub-multiple or multiple of the number of rows [50]
The R code:
library(rstan)
library(magrittr)
sm <- stan_model("./stan_model.stan")
fit_nn_cat <- function(x_train, y_train, x_test, y_test, H, n_H, method = "optimizing", ...) {
stan_data <- list(
N = nrow(x_train),
P = ncol(x_train),
x = x_train,
labels = y_train,
H = H,
n_H = n_H,
N_test = length(y_test)
)
if(method == "optimizing") {
optOut <- optimizing(sm, data = stan_data)
test_char <- paste0("output_test[",1:length(y_test), ",",rep(1:max(y_train), each = length(y_test)),"]")
y_test_pred <- matrix(optOut$par[test_char], stan_data$N_test, max(y_train))
y_test_cat <- apply(y_test_pred, 1, which.max)
out <- list(y_test_pred = y_test_pred,
y_test_cat = y_test_cat,
conf = table(y_test_cat, y_test),
fit = optOut)
return(out)
} else if(method == "sampling") {
out <- sampling(sm, data = stan_data, pars = "output_test", ...)
return(out)
}
}
data(iris)
x <- iris[,1:4]
y <- as.numeric(as.factor((iris[,"Species"])))
N_test <- 50
test_indices <- sample(1:nrow(x), N_test)
x_train <- x[-test_indices,]
y_train <- y[-test_indices]
x_test <- x[test_indices,]
y_test <- y[test_indices]
fit_nuts <- fit_nn_cat(x_train, y_train, x_test, y_test, 2, 50, method = "sampling", cores = 4, iter = 1000)
cat_nuts <- summary(fit_nuts)$summary[,"mean"] %>%
matrix(N_test, 3, byrow = TRUE) %>%
apply(1, which.max)
table(cat_nuts, y_test)
The second-to-last line is where the error happens.
The STAN code:
functions {
vector[] nn_predict(matrix x, matrix d_t_h, matrix[] h_t_h, matrix h_t_d, row_vector[] hidden_bias, row_vector y_bias) {
int N = rows(x);
int n_H = cols(d_t_h);
int H = size(hidden_bias);
int num_labels = cols(y_bias) + 1;
matrix[N, n_H] hidden_layers[H];
vector[num_labels] output_layer_logit[N];
vector[N] ones = rep_vector(1., N);
hidden_layers[1] = inv_logit(x * d_t_h + ones * hidden_bias[1]);
for(h in 2:H) {
hidden_layers[h] = inv_logit(hidden_layers[h-1] * h_t_h[h - 1] + ones * hidden_bias[h]);
}
for(n in 1:N) {
output_layer_logit[n, 1] = 0.0;
output_layer_logit[n, 2:num_labels] = (hidden_layers[H, n] * h_t_d + y_bias)';
}
return(output_layer_logit);
}
}
data {
int N; // Number of training samples
int P; // Number of predictors (features)
matrix[N, P] x; // Feature data
int labels[N]; // Outcome labels
int H; // Number of hidden layers
int n_H; // Number of nodes per layer (All get the same)
int N_test; // Number of test samples
matrix[N_test, P] x_test; // Test predictors
}
transformed data {
int num_labels = max(labels); // How many labels are there
}
parameters {
matrix[P, n_H] data_to_hidden_weights; // Data -> Hidden 1
matrix[n_H, n_H] hidden_to_hidden_weights[H - 1]; // Hidden[t] -> Hidden[t+1]
matrix[n_H, num_labels - 1] hidden_to_data_weights; // Hidden[T] -> Labels. Base class gets 0.
// ordered[n_H] hidden_bias[H]; // Use ordered if using NUTS
row_vector[n_H] hidden_bias[H]; // Hidden layer biases
row_vector[num_labels - 1] labels_bias; // Labels biases. Base class gets 0.
}
transformed parameters {
vector[num_labels] output_layer_logit[N]; // Predicted output layer logits
output_layer_logit = nn_predict(x,
data_to_hidden_weights,
hidden_to_hidden_weights,
hidden_to_data_weights,
hidden_bias,
labels_bias);
}
model {
// Priors
to_vector(data_to_hidden_weights) ~ std_normal();
for(h in 1:(H-1)) {
to_vector(hidden_to_hidden_weights[h]) ~ std_normal();
}
to_vector(hidden_to_data_weights) ~ std_normal();
for(h in 1:H) {
to_vector(hidden_bias[h]) ~ std_normal();
}
labels_bias ~ std_normal();
for(n in 1:N) { // Likelihood
labels[n] ~ categorical_logit(output_layer_logit[n]);
}
}
generated quantities {
vector[num_labels] output_layer_logit_test[N_test] = nn_predict(x_test,
data_to_hidden_weights,
hidden_to_hidden_weights,
hidden_to_data_weights,
hidden_bias,
labels_bias);
matrix[N_test, num_labels] output_test;
for(n in 1:N_test) {
output_test[n] = softmax(output_layer_logit_test[n])';
}
}
At first i thought something was wrong with the parameter initialization. But i’m new to STAN and i probably don’t understand fully what is happening. In the for-loop at the end, i’d expect the output_test variable to get N_test
columns. And N_test
= 50. Yet in the returned model, output_test has 151 columns.
What is going wrong here?