Reduce_sum error

jsocolar · May 12, 2020, 10:03pm

Hi all,
I am trying to fit a model using reduce_sum via cmdstanr, and I’m getting behavior that I don’t understand. If I compile the model via cmdstan_model("filepath", threads = F), everything runs as I’d expect. However, if I compile the model with threads = T things get a bit weird. Irrespective how many threads I set with set_num_threads, the model runs in parallel as long as I use grainsize of at least one fourth the size of my dataset. Otherwise, sampling fails immediately with:

Chain 1 Exception: Exception: bernoulli_logit_lpmf: Random variable has dimension = 0, expecting dimension = 4; a function was called with arguments of different scalar, array, vector, or matrix types, and they were not consistently sized; all arguments must be scalars or multidimensional values of the same shape. (in ‘/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpMutOFK/model-6fba47b8ffe9.stan’, line 27, column 16 to line 28, column 77) (in ‘/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpMutOFK/model-6fba47b8ffe9.stan’, line 37, column 8 to column 23)

If I use grainsize = 1, sampling fails without an informative error message. Again, this happens irrespective of how many threads I’ve set.

I’ve built a relatively lightweight reproducible example (boiled down from a more complex example, so the code might not be as elegant as it otherwise might be). Stan code is below, followed by R code for data simulation and analysis.

This is cmdstan 2.23.0 on Mojave. A colleague confirmed that the same issues also arose on a Linux cluster, but has not yet run the specific reproducible examples provided below.

Really appreciate any help here–presumably I’m missing something fundamental.
Jacob

Stan code

functions{
    real partial_sum(int[,] det_slice, 
                     int start, int end, 
                     vector b0, 
                     vector d0, 
                     vector d5,
                     row_vector[] vis_cov1,
                     int[] id_sp, 
                     int[] Q) {
        // indexing vars
        int len = end - start + 1;
        int r0 = start - 1;
        
        vector[len] lp;
        vector[len] logit_psi;
        row_vector[4] logit_theta[len];
        for (r in 1:len) {
            // calculate psi & theta
            logit_psi[r] = b0[id_sp[r0+r]];
            logit_theta[r] = d0[id_sp[r0+r]] + d5[id_sp[r0+r]]*vis_cov1[r0+r];
            // likelihood
            if (Q[r0 + r] == 1) 
                lp[r] = log_inv_logit(logit_psi[r]) +
                    bernoulli_logit_lpmf(det_slice[r0 + r] | logit_theta[r]);
            else lp[r] = log_sum_exp(
                log_inv_logit(logit_psi[r]) +
                    log1m_inv_logit(logit_theta[r, 1]) +
                    log1m_inv_logit(logit_theta[r, 2]) +
                    log1m_inv_logit(logit_theta[r, 3]) +
                    log1m_inv_logit(logit_theta[r, 4]),
                log1m_inv_logit(logit_psi[r]));
        } 
        return sum(lp);
    }
}
data {
    // dimensions
    int<lower=1> n_visit; //fixed number of visits
    int<lower=1> n_species; //number of species
    int<lower=1> n_tot; // nrows in df
    
    // indexing variables
    int<lower=1> id_sp[n_tot]; 
    int<lower=0, upper=1> Q[n_tot]; // species:cluster
    
    // data & covariates
    row_vector[n_visit] vis_cov1[n_tot]; // visit covariate 1
    int<lower=0, upper=1> det_data[n_tot, n_visit]; // detection history
    
    // grainsize for reduce_sum 
    int<lower=1> grainsize;
}
parameters {
    real mu_b0;
    real<lower=0> sigma_b0;
    vector[n_species] b0_raw;
    
    real mu_d0;
    real<lower=0> sigma_d0;
    vector[n_species] d0_raw;
    
    real mu_d5;
    real<lower=0> sigma_d5;
    vector[n_species] d5_raw;
}
transformed parameters{
    // occupancy
    vector[n_species] b0 = mu_b0 + b0_raw * sigma_b0;
    
    // detection
    vector[n_species] d0 = mu_d0 + d0_raw * sigma_d0;
    vector[n_species] d5 = mu_d5 + d5_raw * sigma_d5;
}
model {
    //Likelihood
    target += reduce_sum(partial_sum, det_data, grainsize, b0, d0, d5, vis_cov1, id_sp, Q);

    // Hyper-priors:
    mu_b0 ~ normal(0,10);
    
    mu_d0 ~ normal(0,10);
    mu_d5 ~ normal(0,10);
    
    sigma_b0 ~ normal(0,10);
    
    sigma_d0 ~ normal(0,10);
    sigma_d5 ~ normal(0,10);
    
    //Random Effects
    b0_raw ~ normal(0, 1);
    
    d0_raw ~ normal(0, 1);
    d5_raw ~ normal(0, 1);
}

R code

library("cmdstanr"); library("dplyr"); library("posterior")
num_chains <- 1

set.seed(101)
# Define data size ----
n_point <- 20
n_species <- 10
n_visit <- 4

# Define covariates
vis_cov1 <- matrix(data = NA, nrow = n_point, ncol = max(n_visit))
for(i in 1:n_point){
  vis_cov1[i, ] <- runif(n_visit) - .5
}

# Define parameters ----
# Hyperparameters
occ.hyper <- list(b0 = c(0, .5))
b0 <- rnorm(n_species, occ.hyper$b0[1], occ.hyper$b0[2])

det.hyper <- list(d0 = c(-2, .5), d1 = c(0, 1))
d0 <- rnorm(n_species, det.hyper$d0[1], det.hyper$d0[2])
d1 <- rnorm(n_species, det.hyper$d1[1], det.hyper$d1[2])

# Simulate parameters from hyperparameters
logit.occ <- psi <- matrix(NA, nrow = n_point, ncol = n_species)
for(i in 1:n_point){
  for(k in 1:n_species){
    logit.occ[i, k] <- b0[k]
    psi[i, k] <- boot::inv.logit(logit.occ[i, k])
  }
}

logit.det <- theta <- array(NA, dim = c(n_point, max(n_visit), n_species))
for(i in 1:n_point){
  for(j in 1:n_visit){
    for(k in 1:n_species){
      logit.det[i, j, k] <- d0[k] + d1[k]*vis_cov1[i,j]
      theta[i, j, k] <- boot::inv.logit(logit.det[i, j, k])
    }
  }
}

# Simulate data ----
Z <- matrix(NA, nrow = n_point, ncol = n_species)
for(i in 1:n_point){
  for(k in 1:n_species){
    Z[i, k] <- rbinom(1, 1, psi[i, k])
  }
}

det_data <- array(NA, dim = c(n_point, max(n_visit), n_species))
for(i in 1:n_point){
  for(j in 1:n_visit){
    for(k in 1:n_species){
      det_data[i,j,k] <- Z[i, k] * rbinom(1, 1, theta[i,j,k])
    }
  }
}

Q <- apply(det_data, c(1,3), function(x){return(as.numeric(sum(x) > 0))})

stan.data_J <- list(n_point = n_point, n_species = n_species, 
                    n_visit = n_visit,
                    det_data = det_data, 
                    Q = Q, 
                    vis_cov1 = vis_cov1)

# Format for Stan ----
det_df <- apply(det_data, 3, as_data_frame) %>%
  bind_rows(., .id="id_species") %>%
  rename(det_1 = V1, det_2 = V2, det_3 = V3, det_4 = V4) %>%
  mutate(id_species = as.numeric(id_species),
         id_point = rep(1:n_point, n_species), 
         vis_cov1_1 = vis_cov1[id_point, 1], 
         vis_cov1_2 = vis_cov1[id_point, 2], 
         vis_cov1_3 = vis_cov1[id_point, 3],
         vis_cov1_4 = vis_cov1[id_point, 4]) %>%
  mutate(Q = rowSums(select(.,det_1, det_2, det_3, det_4)) > 0) %>%
  arrange(desc(Q), id_species, id_point)

stan_data <- list(
  n_visit = n_visit, 
  n_species = length(unique(det_df$id_species)),
  n_pt = length(unique(det_df$id_point)),
  n_tot = nrow(det_df),
  id_sp = det_df$id_species,
  vis_cov1 = as.matrix(det_df[,paste0("vis_cov1_", 1:n_visit)]),
  det_data = as.matrix(det_df[,paste0("det_", 1:n_visit)]), 
  Q = det_df$Q, 
  grainsize = 200)

num_chains <- 1

setwd('/Users/jacobsocolar/Dropbox/Work/Code/biogeographicMSOM')

# Unthreaded version
mod_R <- cmdstan_model("stan_files/occupancy_problems_reproducible.stan", threads=F)
time_R <- system.time(samps_R <- mod_R$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

# Threaded version
mod_P <- cmdstan_model("stan_files/occupancy_problems_reproducible.stan", threads=T)

# Run 1 thread
# WORKS
set_num_threads(1)
time_1 <- system.time(samps_1 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

stan_data$grainsize <- 50
time_1.1 <- system.time(samps_1.1 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))
# DOES NOT WORK
stan_data$grainsize <- 49
time_1.2 <- system.time(samps_1.2 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

# Run 4 thread ----
# WORKS
set_num_threads(4)
stan_data$grainsize <- 50
time_4 <- system.time(samps_4 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))
# DOES NOT WORK
stan_data$grainsize <- 49
time_4.1 <- system.time(samps_4.1 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

# Run 5 thread ----
# WORKS
set_num_threads(5)
stan_data$grainsize <- 50
time_5 <- system.time(samps_5 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))
# DOES NOT WORK
stan_data$grainsize <- 49
time_5 <- system.time(samps_5 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

stan_data$grainsize <- 40
time_5 <- system.time(samps_5 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

wds15 · May 12, 2020, 10:38pm

Can u please use for now reduce sum static and run this with just one thread and different grainsizes? Pour in addition some prints in there to see what is going on. Thanks!

jsocolar · May 13, 2020, 12:55am

I modified the stan file by replacing reduce_sum with reduce_sum_static. Then, after running lines 1-95 of the R code in the original post (up through the setwd() line), I ran the following:

mod_P <- cmdstan_model("stan_files/occupancy_problems_reproducible.stan", threads=T)
set_num_threads(1)

Compilation went smoothly.

stan_data$grainsize <- 200 # This is the full dataset size
time_1 <- system.time(samps_1 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

The model fit as expected (Running MCMC with 1 chain(s) on 1 core(s)…).

stan_data$grainsize <- 50
time_1 <- system.time(samps_1 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

Again, behavior as expected.

But then:

stan_data$grainsize <- 49
time_1.2 <- system.time(samps_1.2 <- mod_P$sample(data = stan_data,
                                              num_chains = num_chains,
                                              num_cores = num_chains))

R prints the following to the console:

Running MCMC with 1 chain(s) on 1 core(s)…

Running ./occupancy_problems_reproducible ‘id=1’ random ‘seed=1020728084’ data
‘file=/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpjSLWIk/standata-76851a4c73c1.json’ output
‘file=/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpjSLWIk/occupancy_problems_reproducible-202005121954-1-439164.csv’ ‘method=sample’
‘save_warmup=0’ ‘algorithm=hmc’ ‘engine=nuts’ adapt ‘engaged=1’
Chain 1 Exception: Exception: bernoulli_logit_lpmf: Random variable has dimension = 0, expecting dimension = 4; a function was called with arguments of different scalar, array, vector, or matrix types, and they were not consistently sized; all arguments must be scalars or multidimensional values of the same shape. (in ‘/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpHEfrIr/model-71a0fd4f5d3.stan’, line 23, column 16 to line 24, column 77) (in ‘/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpHEfrIr/model-71a0fd4f5d3.stan’, line 23, column 16 to line 24, column 77)
Chain 1 Exception: Exception: bernoulli_logit_lpmf: Random variable has dimension = 0, expecting dimension = 4; a function was called with arguments of different scalar, array, vector, or matrix types, and they were not consistently sized; all arguments must be scalars or multidimensional values of the same shape. (in ‘/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpHEfrIr/model-71a0fd4f5d3.stan’, line 23, column 16 to line 24, column 77) (in ‘/var/folders/j6/dg5l3gl11xb9v8w61w99ngh80000gn/T/RtmpHEfrIr/model-71a0fd4f5d3.stan’, line 23, column 16 to line 24, column 77)
Warning: Chain 1 finished unexpectedly!

Warning message:
In .subset2(public_bind_env, “initialize”)(…) :
No chains finished successfully. Unable to retrieve the fit.

wds15 · May 13, 2020, 9:19am

Ok… found it. Here is the error: det_slice is only a sub-slice of the entire data. Therefore det_slice has the first dimension of size len and you have to replace the above with

bernoulli_logit_lpmf(det_slice[r] | logit_theta[r]);

Then it does work for me. This model is speed up by 2x from my quick runs. Nice.

Another comment: Why do you store in a vector of size Len the logit_psi and the logit_theta? You can just work with local loop variables, I think.

After all you got the wrong error message, which is bad. You should have gotten an error about “index out of range” instead of what you got. I think this got recently fixed. Tagging @rok_cesnovar and @nhuurre to have a quick look since we should otherwise file a stanc3 issue for this.

Thanks for providing that detailed code which made it quick to trace things down.

rok_cesnovar · May 13, 2020, 9:53am

Yes, I think this would get caught by the bounds check @nhuurre added.

Its simple for to try if it does help. Just run the following (@jsocolar you are on Catalina right?):

mac_stanc_path <- file.path(cmdstan_path(), "bin", "mac-stanc")
stanc_path <- file.path(cmdstan_path(), "bin", "stanc")
file.remove(mac_stanc_path)
file.remove(stanc_path)

and then compile the model again (set force_recompile = TRUE in cmdstan_model() or make a whitespace change in the model).

Unrelated side note:

I see that you are timing the $sample() calls. Timing of the sampling process can be retrieved with $time()$total, in your case samps_1$time()$total.

Topic		Replies	Views
Help with reduce_sum Modeling	32	1404	August 4, 2020
Reduce_sum error? Developers	10	689	September 20, 2020
Stan significantly slower after incorporating multithreading? CmdStan paralellization	6	842	May 3, 2023
Trouble with within-chain parallelization with cmdstan (via cmdstanr) Modeling	4	871	August 21, 2020
Possible confusion around 'grainsize' argument for 'reduce_sum' Modeling	2	521	September 4, 2020

Reduce_sum error

Related topics