K Fold Cross Validation with Logistic Regression Model

mdanb · April 11, 2022, 3:44am

I’m following along with this vignette to perform K-fold cross validation for a logistic regression model. I’ve included my model code at the end of this post. At one point, the author does this:

  fit <- sampling(stanmodel, data = data_train, seed = seed, refresh = 0)
  gen_test <- gqs(stanmodel, draws = as.matrix(fit), data= data_test)

I do something similar, except I use stan instead of sampling:

  fit <- stan(model_code = logisticRegressionModel, data = trainList,
              iter = 10000)
  gen_test <- gqs(logisticRegressionModel, draws = as.matrix(fit), data = data_test)

When I do this, I get:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘gqs’ for signature ‘"character"’

I don’t quite understand what’s going wrong. Essentially, I’m trying to compute the accuracy defined in the generated quantities block on the test set, using the model I fit on the training set.
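
For the K-fold part, the per-fold train and test lists could be built along these lines. This is only a sketch in base R: the data are simulated stand-ins, `n_folds`, `fold_id`, and `make_stan_data` are hypothetical names, and the list fields (`N`, `K`, `X`, `y`) are assumed to match the model's data block.

```r
set.seed(1)

# Simulated stand-in data (assumption: not the data from this post).
N <- 20
K <- 3
X <- matrix(rnorm(N * K), nrow = N, ncol = K)
y <- rbinom(N, size = 1, prob = 0.5)

n_folds <- 5
# Randomly assign each row to one of n_folds folds of (near-)equal size.
fold_id <- sample(rep(seq_len(n_folds), length.out = N))

# Hypothetical helper: build a Stan data list from a subset of rows.
make_stan_data <- function(rows) {
  list(N = length(rows), K = ncol(X), X = X[rows, , drop = FALSE], y = y[rows])
}

# One train/test pair per fold; fold k is held out as the test set.
folds <- lapply(seq_len(n_folds), function(k) {
  list(train = make_stan_data(which(fold_id != k)),
       test  = make_stan_data(which(fold_id == k)))
})
```

Each `folds[[k]]$train` would then play the role of `trainList`, and `folds[[k]]$test` the role of the held-out data passed to `gqs`.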

  logisticRegressionModel <- "
  data {
    int<lower=0> N;   // number of data items
    int<lower=0> K;   // number of predictors
    matrix[N, K] X;   // predictor matrix
    int y[N];      // outcome vector
  }
  parameters {
    real alpha;       // intercept
    vector[K] gamma;
    vector<lower=0>[K] tau;
    vector[K] beta;   // coefficients for predictors
  }
  model {
    // Priors:
    gamma ~ normal(0, 5);
    tau ~ cauchy(0, 2.5);
    alpha ~ normal(gamma, tau);
    beta ~ normal(gamma, tau);
    // Likelihood
    y ~ bernoulli_logit(alpha + X * beta);
  }
  generated quantities {
    vector[N] y_preds;
    real correct = 0;
    real accuracy;
    for (n in 1:N) {
      y_preds[n] = bernoulli_logit_rng(alpha + X[n] * beta);
      correct += logical_eq(y_preds[n], y[n]);
    }
    accuracy = correct / N;
  }
  "

jack_monroe · April 11, 2022, 4:15am

You might have better luck saving your model to a .stan file and calling it that way.

mdanb · April 11, 2022, 3:11pm

For some reason I can’t find documentation on how to do this. Could you point me to it?
EDIT: Actually, I was able to find it here. I’ll try it out and let you know what happens.

EDIT 2:
It seems like it works now when I do:

  fit <- stan(file = "logisticRegressionModel.stan", data = trainList,
              iter = 10000)
  gen_test <- gqs(fit@stanmodel, draws = as.matrix(fit),
                  data = valList)

Thank you! I’m still curious as to why this happens, though.
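
Once the `gqs` call runs, one way to get at the held-out accuracy is to pull the `accuracy` column out of `as.matrix(gen_test)`. A rough sketch, assuming the draws come back as a draws-by-quantities matrix with a column named `accuracy`; the matrix below is simulated so the snippet runs without rstan, and `draws`, `acc`, and `summary_acc` are illustrative names.

```r
set.seed(2)

# Stand-in for as.matrix(gen_test): rows are posterior draws, columns are
# generated quantities (simulated here; not output from the model above).
draws <- cbind(accuracy = rbeta(1000, 8, 2),
               "y_preds[1]" = rbinom(1000, size = 1, prob = 0.7))

# Per-draw test accuracy, then a posterior mean and a 90% interval.
acc <- draws[, "accuracy"]
summary_acc <- c(mean = mean(acc), quantile(acc, probs = c(0.05, 0.95)))
```

With real output, `draws <- as.matrix(gen_test)` would replace the simulated matrix, and averaging the per-fold means would give an overall K-fold estimate.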

Bob_Carpenter · April 11, 2022, 8:22pm

There shouldn’t be any difference between passing the model code and passing a file containing the code. I’m guessing there may be some stray characters somewhere that made them different. If you are sure you have the exact same text both ways, that’s a bug in RStan and we’d really appreciate it if you could file a bug report. Thanks!
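
A possible explanation, comparing the two snippets earlier in the thread: the failing call passed the model code string itself as the first argument to `gqs`, while the working call passed `fit@stanmodel`. `gqs` is an S4 generic with a method for compiled `stanmodel` objects, so a character string has nothing to dispatch on, which matches the error text. The toy example below reproduces the same kind of error without rstan; `toy_model` and `toy_gqs` are made-up stand-ins, not rstan names.

```r
library(methods)

# Toy stand-in for a compiled model class (hypothetical, illustration only).
setClass("toy_model", representation(code = "character"))

# Generic with a method only for "toy_model", mirroring a generic that is
# defined for compiled model objects but not for raw code strings.
setGeneric("toy_gqs", function(object, ...) standardGeneric("toy_gqs"))
setMethod("toy_gqs", "toy_model", function(object, ...) "dispatched on toy_model")

model_string <- "data { int N; }"
compiled <- new("toy_model", code = model_string)

# Works: the argument has a class the generic knows about.
ok <- toy_gqs(compiled)

# Fails: no method for "character"; capture the message instead of stopping.
err <- tryCatch(toy_gqs(model_string), error = function(e) conditionMessage(e))
```

If that is the cause, the `model_code` version should also work as long as `gqs` receives a compiled model, for example `fit@stanmodel` or the object returned by `stan_model(model_code = logisticRegressionModel)`.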

avehtari · April 12, 2022, 7:57am
