Neural network fitting but result gives wrong number of columns

Hey guys! I’m new here, someone from stackoverflow recommended me to ask for help on this forum. If i’m doing anything wrong with this post please let me know! I’ve read the category description and tried to find a similar problem on this forum but i haven’t had any luck so far.

I want to learn how to fit a (bayesian) neural network model in stan with R. I’m following the example on this website: Neural Networks in Stan: Or how I was utterly surprised that it worked at all. | Stephen R. Martin, PhD

My first step was to literally copy and paste the code that was needed to fit a categorical model, and run it. It immediately failed. And i can not figure out why it is failing.

When i run the model, it gives me an output. But then when i want to read the posterior sample to get the predictions for my validation set, i get this error message:

Warning message:
In matrix(., N_test, 3, byrow = TRUE) :
  data length [151] is not a sub-multiple or multiple of the number of rows [50]

The R code:

library(rstan)
library(magrittr)

sm <- stan_model("./stan_model.stan")

fit_nn_cat <- function(x_train, y_train, x_test, y_test, H, n_H, method = "optimizing", ...) {
  stan_data <- list(
    N = nrow(x_train),
    P = ncol(x_train),
    x = x_train,
    labels = y_train,
    H = H,
    n_H = n_H,
    N_test = length(y_test)
  )
  if(method == "optimizing") {
    optOut <- optimizing(sm, data = stan_data)
    test_char <- paste0("output_test[",1:length(y_test), ",",rep(1:max(y_train), each = length(y_test)),"]") 
    y_test_pred <- matrix(optOut$par[test_char], stan_data$N_test, max(y_train))
    y_test_cat <- apply(y_test_pred, 1, which.max)
    out <- list(y_test_pred = y_test_pred,
                y_test_cat = y_test_cat,
                conf = table(y_test_cat, y_test),
                fit = optOut)
    return(out)
  } else if(method == "sampling") {
    out <- sampling(sm, data = stan_data, pars = "output_test", ...)
    return(out)
  } 
}

data(iris)
x <- iris[,1:4]
y <- as.numeric(as.factor((iris[,"Species"])))

N_test <- 50
test_indices <- sample(1:nrow(x), N_test)
x_train <- x[-test_indices,]
y_train <- y[-test_indices]
x_test <- x[test_indices,]
y_test <- y[test_indices]

fit_nuts <- fit_nn_cat(x_train, y_train, x_test, y_test, 2, 50, method = "sampling", cores = 4, iter = 1000)

cat_nuts <- summary(fit_nuts)$summary[,"mean"] %>%
  matrix(N_test, 3, byrow = TRUE) %>%
  apply(1, which.max)
table(cat_nuts, y_test)

The second-to-last line is where the error happens.

The STAN code:

functions {
  vector[] nn_predict(matrix x, matrix d_t_h, matrix[] h_t_h, matrix h_t_d, row_vector[] hidden_bias, row_vector y_bias) {
    int N = rows(x);
    int n_H = cols(d_t_h);
    int H = size(hidden_bias);
    int num_labels = cols(y_bias) + 1;
    matrix[N, n_H] hidden_layers[H];
    vector[num_labels] output_layer_logit[N];
    vector[N] ones = rep_vector(1., N);

    hidden_layers[1] = inv_logit(x * d_t_h + ones * hidden_bias[1]);
    for(h in 2:H) {
      hidden_layers[h] = inv_logit(hidden_layers[h-1] * h_t_h[h - 1] + ones * hidden_bias[h]);
    }
    for(n in 1:N) {
      output_layer_logit[n, 1] = 0.0;
      output_layer_logit[n, 2:num_labels] = (hidden_layers[H, n] * h_t_d + y_bias)';
    }
    return(output_layer_logit);
  }
}

data {
  int N; // Number of training samples
  int P; // Number of predictors (features)
  matrix[N, P] x; // Feature data
  int labels[N]; // Outcome labels
  int H; // Number of hidden layers
  int n_H; // Number of nodes per layer (All get the same)

  int N_test; // Number of test samples
  matrix[N_test, P] x_test; // Test predictors
}

transformed data {
  int num_labels = max(labels); // How many labels are there
}

parameters {
  matrix[P, n_H] data_to_hidden_weights; // Data -> Hidden 1
  matrix[n_H, n_H] hidden_to_hidden_weights[H - 1]; // Hidden[t] -> Hidden[t+1]
  matrix[n_H, num_labels - 1] hidden_to_data_weights; // Hidden[T] -> Labels. Base class gets 0.
  // ordered[n_H] hidden_bias[H]; // Use ordered if using NUTS
  row_vector[n_H] hidden_bias[H]; // Hidden layer biases
  row_vector[num_labels - 1] labels_bias; // Labels biases. Base class gets 0.
}

transformed parameters {
  vector[num_labels] output_layer_logit[N]; // Predicted output layer logits

  output_layer_logit = nn_predict(x,
                                  data_to_hidden_weights,
                                  hidden_to_hidden_weights,
                                  hidden_to_data_weights,
                                  hidden_bias,
                                  labels_bias);

}

model {
  // Priors
  to_vector(data_to_hidden_weights) ~ std_normal();

  for(h in 1:(H-1)) {
    to_vector(hidden_to_hidden_weights[h]) ~ std_normal();
  }

  to_vector(hidden_to_data_weights) ~ std_normal();

  for(h in 1:H) {
    to_vector(hidden_bias[h]) ~ std_normal();
  }
  labels_bias ~ std_normal();

  for(n in 1:N) { // Likelihood
    labels[n] ~ categorical_logit(output_layer_logit[n]);
  }
}

generated quantities {
  vector[num_labels] output_layer_logit_test[N_test] = nn_predict(x_test,
							   data_to_hidden_weights,
							   hidden_to_hidden_weights,
							   hidden_to_data_weights,
							   hidden_bias,
							   labels_bias);
  matrix[N_test, num_labels] output_test;
  for(n in 1:N_test) {
    output_test[n] = softmax(output_layer_logit_test[n])';
  }
}

At first i thought something was wrong with the parameter initialization. But i’m new to STAN and i probably don’t understand fully what is happening. In the for-loop at the end, i’d expect the output_test variable to get N_test columns. And N_test = 50. Yet in the returned model, output_test has 151 columns.

What is going wrong here?

1 Like

I can’t re-run the code right now; but could you run:

ncol(as.matrix(fit_nuts, pars = "output_test"))

I did, the result is 150. What do i do wrong in the code that defines cat_nuts then?

Bizarre.

I’m not sure. Could you just run the ‘piped’ stuff one at a time?

Actually, try this…

cat_nuts <- summary(fit_nuts, pars = "output_test")$summary[,"mean"] %>%
  matrix(N_test, 3, byrow = TRUE) %>%
  apply(1, which.max)
table(cat_nuts, y_test)

If that works, could you also run summary(fit_nuts)$summary and tell me which of those is /not/ output_test[i,j] ? It may be lp__ or something; I’m not sure what that ‘other’ variable is.

This was right on the mark. It ran! Thank you!!

I ran the summary and the last row is named lp__, like you said. All the other rows are output_test[i,j].