Mismatch in number of dimensions declared and found

I’m trying to pass a matrix as data to a stan model (via cmdstanr), but somehow the data gets read as a vector,
generating the following error message:

mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=k; dims declared=(3,9); dims found=(27)

The reproducible example is below. I’ve seen several posts with the same error message, where e.g. a vector was passed as a 1-column matrix, but none of the solutions there (specify as.vector, which in my case’d become as.matrix()) seems to work.

Some context: the model is based on the memory exponential decay example in Lee&Wagenmakers Bayesian Cognitive Modeling. There is a slightly different implementation of the same model in stan by Smira here, which gives the exact same issue: https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/CaseStudies/MemoryRetention/Retention_1_Stan.R

library(cmdstanr)

t     <- c(1, 2, 4, 7, 12, 21, 35, 59, 99, 200)
nt    <- length(t)
slist <- 1:4
ns    <- length(slist)

k1 <- matrix(c(18, 18, 16, 13, 9, 6, 4, 4, 4, NA,
               17, 13,  9,  6, 4, 4, 4, 4, 4, NA,
               14, 10,  6,  4, 4, 4, 4, 4, 4, NA,
               NA, NA, NA, NA,NA,NA,NA,NA,NA, NA), nrow=ns, ncol=nt, byrow=T)

k <- k1[1:(ns - 1), 1:(nt - 1)]   # Excluding NAs (for Stan solution)

n <- 18

data <- list(k=k, n=n, t=t, ns=ns, nt=nt) # To be passed on to Stan

myinits <- list(
  list(alpha=.5, beta=.1))


stan_file <- write_stan_file("
// Retention With No Individual Differences

data {
  int<lower=0> ns; // n of participants
  int<lower=0> nt; // n of temporal points
  int<lower=0> n; // n of stimuli to remember
  int<lower=0> k[ns - 1,nt - 1]; // successes
  int<lower=0> t[nt]; // time from training
}

// The parameters accepted by the model.
parameters {
  real<lower=0,upper=1> alpha;
  real<lower=0,upper=1> beta;
}

transformed parameters {
  
  matrix<lower=0,upper=1>[ns,nt] theta;
  
  // Retention Rate At Each Lag For Each Subject Decays Exponentially
  for (i in 1:ns) {
    for (j in 1:nt) {
      theta[i,j] = fmin(1, exp(-alpha * t[j]) + beta);
    }
  }
}

// The model to be estimated. 
model {
  // Priors
  alpha ~ beta(1, 1);  // can be removed
  beta ~ beta(1, 1);  // can be removed
  
  // Observed Data
  for (i in 1:(ns - 1))
    for (j in 1:(nt - 1))
      k[i,j] ~ binomial(n, theta[i,j]);
}

generated quantities {
  int<lower=0,upper=n> predk[ns,nt];
  
  // Predicted Data
  for (i in 1:ns)
    for (j in 1:nt)
      predk[i,j] = binomial_rng(n , theta[i,j]);
}")

mod <- cmdstan_model(stan_file, cpp_options = list(stan_threads = TRUE), pedantic = TRUE)

samples <- mod$sample(
  data = data,
  seed = 123,
  chains = 2,
  parallel_chains = 2,
  threads_per_chain = 2,
  iter_warmup = 2000,
  iter_sampling = 2000,
  refresh = 500,
  #init = myinits,
  max_treedepth = 20,
  adapt_delta = 0.99,
)

****

Hey @fusaroli,

thanks for the report.

Here is a simplified version of where we could see this issue:

library(cmdstanr)

t     <- c(1, 2, 4, 7, 12, 21, 35, 59, 99, 200)
nt    <- length(t)
slist <- 1:4
ns    <- length(slist)

k1 <- matrix(c(18, 18, 16, 13, 9, 6, 4, 4, 4, NA,
               17, 13,  9,  6, 4, 4, 4, 4, 4, NA,
               14, 10,  6,  4, 4, 4, 4, 4, 4, NA,
               NA, NA, NA, NA,NA,NA,NA,NA,NA, NA), nrow=ns, ncol=nt, byrow=T)
k <- k1[1:(ns - 1), 1:(nt - 1)]   # Excluding NAs (for Stan solution)

n <- 18

data <- list(k=k, n=n, t=t, ns=ns, nt=nt) # To be passed on to Stan

f <- tempfile(fileext = ".json")
write_stan_json(data, f)

print(f)

Then go and check the printed file. I am seeing this:

{
  "k": [
    [18, 18, 16, 13, 9, 6, 4, 4, 4],
    [17, 13, 9, 6, 4, 4, 4, 4, 4],
    [14, 10, 6, 4, 4, 4, 4, 4, 4]
  ],
  "n": 18,
  "t": [1, 2, 4, 7, 12, 21, 35, 59, 99, 200],
  "ns": 4,
  "nt": 10
}

which to me seems like its fine. Can your run the above script and paste the output.

1 Like

thank you.
Here is what I get:

{
  "k": [
    [18, 18, 16, 13, 9, 6, 4, 4, 4],
    [17, 13, 9, 6, 4, 4, 4, 4, 4],
    [14, 10, 6, 4, 4, 4, 4, 4, 4]
  ],
  "n": 18,
  "t": [1, 2, 4, 7, 12, 21, 35, 59, 99, 200],
  "ns": 4,
  "nt": 10
}

so this seems to work

Ok, nevermind me, its a bug with cmdstanr. We try to pre-process the data based on the info in the Stan model (dimensions). This was added because R does not differentiate between an array and a scalar really. Seems that added a bug for this case. Not immediately sure how, but I guess thats how bugs work :)

For your immediate help run the model like this:

data_file <- tempfile(fileext = ".json")
write_stan_json(data, data_file)
samples <- mod$sample(
  data = data_file,
  seed = 123,
  chains = 2,
  parallel_chains = 2,
  threads_per_chain = 2,
  iter_warmup = 2000,
  iter_sampling = 2000,
  refresh = 500,
  #init = myinits,
  max_treedepth = 20,
  adapt_delta = 0.99,
)

I will get to fixing this ASAP.

1 Like

much obliged! thanks!

Just a quick follow up that this issue has been resolved in cmdstanr. If you encounter it, please upgrade to the latest version of cmdstanr:

remotes::install_github("stan-dev/cmdstanr")

Thanks again for reporting!