Strange error requiring generated quantities

I’ll try.

OK, it compiles with cmdstanr, so why won’t it with rstan? Any clues?

My bad… it does not compile even with cmdstan. I forgot to turn on multithreading when compiling earlier.

Any suggestions on what’s going wrong?

If I copy-paste that model to a Stan file and then try to compile it with cmdstanr, I get this error:

Ill-typed arguments supplied to function 'reduce_sum'. Expected arguments:
(array[] int, int, int, matrix, matrix, vector, matrix, int, array[] int) => real, array[] int, int, matrix, matrix, vector, matrix, int, array[] int

Instead supplied arguments of incompatible type: (array[] int, int, int, matrix, matrix, vector, matrix, int, array[] int) => real, array[] int, int, matrix, matrix, vector, array[] vector, int, array[] int
make: *** [/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/Rtmp5yrh3D/model-372f58bfb7ab.hpp] Error 1

That’s because partial_sum expects rbeta as a matrix but you pass rbeta as an array of vectors. Maybe that’s the whole problem? The error message you’re seeing certainly isn’t a helpful one. Maybe a later version of rstan or cmdstan would fix that?

Wait, if you were able to compile this without turning on multithreading, it strongly suggests to me that the Stan program that you included above is not identical to the one you’re actually using. No way should this one compile with or without multithreading, due to the mis-typed argument in partial_sum.

@jscocolar you are right. I got that error, then I converted rbeta to a matrix, and it compiled in the single-threaded version.

This is the code that compiled:

functions {
  real partial_sum(int[] y_slice,
                   int start, 
                   int end,
                   matrix x1,
                   matrix x2,
                   vector beta,
                   matrix rbeta,
                   int J, 
                   int[] ll) {

    return bernoulli_logit_lpmf(y_slice[start:end] |x1[start:end,:] * beta + (x2[start:end,:] .* rbeta[ll[start:end]]) * rep_vector(1,J));
  }
}
data {
  int<lower=0> N;//Number of observations
  int<lower=1> J;//Number of predictors with random slope
  int<lower=1> K;//Number of predictors with non-random slope
  int<lower=1> L;//Number of customers/groups
  int<lower=0,upper=1> y[N];//Binary response variable
  int<lower=1,upper=L> ll[N];//Number of observations in groups
  matrix[N,K] x1;
  matrix[N,J] x2;
}
parameters {
  row_vector[J] rbeta_mu; //mean of distribution of beta parameters
  row_vector<lower=0>[J] rbeta_sigma; //variance of distribution of beta parameters
  row_vector[J] beta_raw[L]; //group-specific parameters beta
  vector[K] beta;
}
transformed parameters {
  matrix[L,J] rbeta;
  for (l in 1:L)
    rbeta[l] = rbeta_mu + rbeta_sigma .* beta_raw[l]; // coefficients on x
}
model {
  rbeta_mu ~ normal(0,10);
  rbeta_sigma ~ cauchy(0,5);
  beta~normal(0,5);
  
  for (l in 1:L){
    beta_raw[l] ~ normal(0,1);
  }

  target += reduce_sum(partial_sum,y,1,x1,x2,beta,rbeta,J,ll);
}


But this code does not compile in multithreaded mode in cmdstan, or in rstan.

Thanks for providing the updated model! This compiles just fine for me in single- or multithreaded mode with cmdstan 2.27.0

cmdstanr::cmdstan_model("/Users/JacobSocolar/Desktop/testmod.stan")
cmdstanr::cmdstan_model("/Users/JacobSocolar/Desktop/testmod.stan", 
                           cpp_options = list(stan_threads = TRUE))

Interestingly, I was able to reproduce your original error with rstan 2.26. Updating to 2.27 eliminated the issue. I cannot be certain whether the issue was a bug in 2.26 or whether it was a problem with my rstan installation that was fixed by re-installing.

So either there is a bug in earlier versions that is fixed in 2.27 or there’s a problem (apparently a common one) with your rstan installation. In either case, updating to the latest Stan seems to be a fix. To update to the latest rstan, do

To update to the latest Cmdstan, do

cmdstanr::install_cmdstan()

I am on the latest version of rstan and cmdstan. Yet it does not compile for me.

This is what happens when I try to compile it:

I have already uninstalled R, RStudio, and cmdstan and have done a fresh install all over.

I am using the experimental version of rstan and have installed the latest cmdstan as well.

So if it compiled for you, can you tell me what changes you made, if any?

If I try to sample from this file, I get these errors:

I have no issues compiling your model copy-pasted from your discourse post. I don’t have access to your data, so I cannot attempt to sample the compiled model. In a very cursory glance at the images of the output from the compiler that you posted above I didn’t see an error message, so maybe compiling is working fine on your end as well.

Perhaps there’s a problem in your data. One useful check might be: can you compile and sample the example reduce_sum model here:
https://mc-stan.org/users/documentation/case-studies/reduce_sum_tutorial.html

If you’re able to share your data, I’d be happy to check whether or not I can sample from your model+data.

I was able to run the example in the link you shared without issues. So I am assuming it has to do with the data.

Here is a generated data set based on my actual data. I cannot share my actual data, but this is the closest thing.

The variables X1 and X2 are fixed effects (x1 in the Stan model), and the rest, X3:X1, are random effects (x2 in the Stan model).

customer_no would be ll in stan.
sample_data.csv (69.7 KB)
All your help is greatly appreciated.
test_reduce_sum.R (752 Bytes)

After some experimenting, I have realised the problem has to do with the random effects.
I was able to compile and sample from the model with all effects fixed, but when I introduced even a single random effect, such as a random intercept only, I got the same errors as before:

I can’t run this code. It contains

y=data$bought[1:500]

but there is no column bought in data.

Edit, if I replace $bought with $y I can run and reproduce your error

Edit 2: Never mind, I still can’t reproduce. I get errors because y, L, and ll are all the wrong size in the data. If I replace the associated [1:500]s with [1:1000]s, I progress to a new set of errors that still aren’t your error. These new errors are probably related to the fact that in your partial_sum function, you are indexing into rbeta, a matrix, using a single index instead of two indices.

My sense here, and this is consistent with all of the various error messages that you’ve reported, is that your problems all have to do with your indexing. I suggest you try to go over the indexing with a fine-toothed comb using synthetic, shareable data. If you can’t get it working, then if you could re-post exactly the data file, Stan file, and R script that reproduce the error, with no changes from the version that you are running in order to see the error, then we can take a look and try to troubleshoot the issue.

I agree it is an indexing issue, but if that is the issue, the random-intercept-only model should work, and it doesn’t.

Here are the Stan code, exact data, and R code I am using to check.
reduce_sum_intercetp.R (3.2 KB)
reduce_sum_logit_int.stan (1.2 KB)

sample_data.csv (67.8 KB)

This is the error I get:

Exact stan code I used:

functions {
  real partial_sum(int[] y_slice,
                   int start, int end,
                   matrix x,
                   vector beta,
                   vector rbeta,
                   int[] ll) {
    return bernoulli_logit_lpmf(y_slice[start:end] |(x[start:end,] * beta) + rbeta[ll[start:end]]);
  }
}
data {
  int<lower=0> N;//Number of observations
  int<lower=1> K;//Number of predictors with non-random slope
  int<lower=1> L;//Number of customers/groups
  int<lower=0,upper=1> y[N];//Binary response variable
  int<lower=1,upper=L> ll[N];//Number of observations in groups
  matrix[N,K] x;
}
parameters {
  real rbeta_mu; //mean of distribution of beta parameters
  real<lower=0> rbeta_sigma; //variance of distribution of beta parameters
  vector[L] beta_raw; //group-specific parameters beta
  vector[K] beta;
}
transformed parameters {
  vector[L] rbeta;
  for (l in 1:L)
    rbeta[l] = rbeta_mu + rbeta_sigma * beta_raw[l]; // coefficients on x
}
model {
  rbeta_mu ~ normal(0,10);
  rbeta_sigma ~ cauchy(0,5);
  beta~normal(0,5);

  for (l in 1:L)
    beta_raw[l] ~ normal(0,1);

  target += reduce_sum(partial_sum,y,1,x,beta,rbeta,ll);
}

And a snapshot of the model data:

This code fails right out of the gate. You are trying to access column names that do not exist. Moreover, the file names for both the .csv file and the .stan file in the code are different from the file names that you’ve uploaded here, making clear that you haven’t run this exact code on either the stan file or the data that you sent over.

Happy to take a look at a good reproducible example when you get a chance :)

Oh sorry, I uploaded the wrong file. My bad.

Here are the right files.

reduce_sum_test.R (908 Bytes)
sample_data.csv (67.8 KB)
reduce_sum_logit_int.stan (1.2 KB)

Your error was at line 8. You had y_slice[start:end] but instead you need y_slice. reduce_sum automatically slices its first argument.
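
For reference, here is a minimal sketch of what the corrected functions block might look like, assuming the rest of reduce_sum_logit_int.stan stays as posted. reduce_sum passes partial_sum a y_slice that is already y[start:end], while start and end remain positions in the full array. Re-slicing with y_slice[start:end] therefore indexes past the end of the slice whenever start > 1. The int[] syntax matches the Stan version used in this thread; newer releases write array[] int instead.

```stan
functions {
  real partial_sum(int[] y_slice,
                   int start, int end,
                   matrix x,
                   vector beta,
                   vector rbeta,
                   int[] ll) {
    // y_slice is already the sub-array y[start:end], so use it as-is.
    // Only the shared arguments (x and ll) are passed whole and still
    // need to be subset with start:end.
    return bernoulli_logit_lpmf(y_slice | x[start:end, ] * beta
                                          + rbeta[ll[start:end]]);
  }
}
```

The reduce_sum call in the model block does not need to change; only the first argument is sliced automatically, and everything after the grainsize is handed to partial_sum unchanged.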

Thanks for the reprex, it makes troubleshooting these minor bugs much easier!

Thank you so much for your patience and help in this matter. I can’t believe something that small was causing so much trouble.
