Neural network fitting but result gives wrong number of columns

Enzo_Lima · July 27, 2021, 11:33am

Hey guys! I’m new here, someone from stackoverflow recommended me to ask for help on this forum. If i’m doing anything wrong with this post please let me know! I’ve read the category description and tried to find a similar problem on this forum but i haven’t had any luck so far.

I want to learn how to fit a (bayesian) neural network model in stan with R. I’m following the example on this website: Neural Networks in Stan: Or how I was utterly surprised that it worked at all. | Stephen R. Martin, PhD

My first step was to literally copy and paste the code that was needed to fit a categorical model, and run it. It immediately failed. And i can not figure out why it is failing.

When i run the model, it gives me an output. But then when i want to read the posterior sample to get the predictions for my validation set, i get this error message:

Warning message:
In matrix(., N_test, 3, byrow = TRUE) :
  data length [151] is not a sub-multiple or multiple of the number of rows [50]

The R code:

library(rstan)
library(magrittr)

sm <- stan_model("./stan_model.stan")

fit_nn_cat <- function(x_train, y_train, x_test, y_test, H, n_H, method = "optimizing", ...) {
  stan_data <- list(
    N = nrow(x_train),
    P = ncol(x_train),
    x = x_train,
    labels = y_train,
    H = H,
    n_H = n_H,
    N_test = length(y_test)
  )
  if(method == "optimizing") {
    optOut <- optimizing(sm, data = stan_data)
    test_char <- paste0("output_test[",1:length(y_test), ",",rep(1:max(y_train), each = length(y_test)),"]") 
    y_test_pred <- matrix(optOut$par[test_char], stan_data$N_test, max(y_train))
    y_test_cat <- apply(y_test_pred, 1, which.max)
    out <- list(y_test_pred = y_test_pred,
                y_test_cat = y_test_cat,
                conf = table(y_test_cat, y_test),
                fit = optOut)
    return(out)
  } else if(method == "sampling") {
    out <- sampling(sm, data = stan_data, pars = "output_test", ...)
    return(out)
  } 
}

data(iris)
x <- iris[,1:4]
y <- as.numeric(as.factor((iris[,"Species"])))

N_test <- 50
test_indices <- sample(1:nrow(x), N_test)
x_train <- x[-test_indices,]
y_train <- y[-test_indices]
x_test <- x[test_indices,]
y_test <- y[test_indices]

fit_nuts <- fit_nn_cat(x_train, y_train, x_test, y_test, 2, 50, method = "sampling", cores = 4, iter = 1000)

cat_nuts <- summary(fit_nuts)$summary[,"mean"] %>%
  matrix(N_test, 3, byrow = TRUE) %>%
  apply(1, which.max)
table(cat_nuts, y_test)

The second-to-last line is where the error happens.

The STAN code:

functions {
  vector[] nn_predict(matrix x, matrix d_t_h, matrix[] h_t_h, matrix h_t_d, row_vector[] hidden_bias, row_vector y_bias) {
    int N = rows(x);
    int n_H = cols(d_t_h);
    int H = size(hidden_bias);
    int num_labels = cols(y_bias) + 1;
    matrix[N, n_H] hidden_layers[H];
    vector[num_labels] output_layer_logit[N];
    vector[N] ones = rep_vector(1., N);

    hidden_layers[1] = inv_logit(x * d_t_h + ones * hidden_bias[1]);
    for(h in 2:H) {
      hidden_layers[h] = inv_logit(hidden_layers[h-1] * h_t_h[h - 1] + ones * hidden_bias[h]);
    }
    for(n in 1:N) {
      output_layer_logit[n, 1] = 0.0;
      output_layer_logit[n, 2:num_labels] = (hidden_layers[H, n] * h_t_d + y_bias)';
    }
    return(output_layer_logit);
  }
}

data {
  int N; // Number of training samples
  int P; // Number of predictors (features)
  matrix[N, P] x; // Feature data
  int labels[N]; // Outcome labels
  int H; // Number of hidden layers
  int n_H; // Number of nodes per layer (All get the same)

  int N_test; // Number of test samples
  matrix[N_test, P] x_test; // Test predictors
}

transformed data {
  int num_labels = max(labels); // How many labels are there
}

parameters {
  matrix[P, n_H] data_to_hidden_weights; // Data -> Hidden 1
  matrix[n_H, n_H] hidden_to_hidden_weights[H - 1]; // Hidden[t] -> Hidden[t+1]
  matrix[n_H, num_labels - 1] hidden_to_data_weights; // Hidden[T] -> Labels. Base class gets 0.
  // ordered[n_H] hidden_bias[H]; // Use ordered if using NUTS
  row_vector[n_H] hidden_bias[H]; // Hidden layer biases
  row_vector[num_labels - 1] labels_bias; // Labels biases. Base class gets 0.
}

transformed parameters {
  vector[num_labels] output_layer_logit[N]; // Predicted output layer logits

  output_layer_logit = nn_predict(x,
                                  data_to_hidden_weights,
                                  hidden_to_hidden_weights,
                                  hidden_to_data_weights,
                                  hidden_bias,
                                  labels_bias);

}

model {
  // Priors
  to_vector(data_to_hidden_weights) ~ std_normal();

  for(h in 1:(H-1)) {
    to_vector(hidden_to_hidden_weights[h]) ~ std_normal();
  }

  to_vector(hidden_to_data_weights) ~ std_normal();

  for(h in 1:H) {
    to_vector(hidden_bias[h]) ~ std_normal();
  }
  labels_bias ~ std_normal();

  for(n in 1:N) { // Likelihood
    labels[n] ~ categorical_logit(output_layer_logit[n]);
  }
}

generated quantities {
  vector[num_labels] output_layer_logit_test[N_test] = nn_predict(x_test,
							   data_to_hidden_weights,
							   hidden_to_hidden_weights,
							   hidden_to_data_weights,
							   hidden_bias,
							   labels_bias);
  matrix[N_test, num_labels] output_test;
  for(n in 1:N_test) {
    output_test[n] = softmax(output_layer_logit_test[n])';
  }
}

At first i thought something was wrong with the parameter initialization. But i’m new to STAN and i probably don’t understand fully what is happening. In the for-loop at the end, i’d expect the output_test variable to get N_test columns. And N_test = 50. Yet in the returned model, output_test has 151 columns.

What is going wrong here?

Stephen_Martin · July 27, 2021, 5:17pm

I can’t re-run the code right now; but could you run:

ncol(as.matrix(fit_nuts, pars = "output_test"))

Enzo_Lima · July 28, 2021, 11:12am

I did, the result is 150. What do i do wrong in the code that defines cat_nuts then?

Stephen_Martin · July 28, 2021, 4:46pm

Bizarre.

I’m not sure. Could you just run the ‘piped’ stuff one at a time?

Actually, try this…

cat_nuts <- summary(fit_nuts, pars = "output_test")$summary[,"mean"] %>%
  matrix(N_test, 3, byrow = TRUE) %>%
  apply(1, which.max)
table(cat_nuts, y_test)

If that works, could you also run summary(fit_nuts)$summary and tell me which of those is /not/ output_test[i,j] ? It may be lp__ or something; I’m not sure what that ‘other’ variable is.

Enzo_Lima · July 28, 2021, 5:25pm

This was right on the mark. It ran! Thank you!!

I ran the summary and the last row is named lp__, like you said. All the other rows are output_test[i,j].

Topic		Replies	Views
Fitting a custom function in rstan Modeling rstan , fitting-issues	2	280	June 18, 2023
Time series model fitting issue Modeling fitting-issues	8	402	April 2, 2020
Model Problems and Syntax Error Modeling rstan , fitting-issues	1	336	August 29, 2021
Mismatch in dimensionality - Stan is fixing the dimensionality to a number (why?) RStan	4	453	June 7, 2019
Rstan - Spatial Error Model with Multiple Predictors Modeling specification , performance	2	611	May 31, 2019

Neural network fitting but result gives wrong number of columns

Related Topics