Stuck at Warmup iteration with no error : CmdStanR

sam_learner · April 14, 2020, 1:49pm

Operating System: Windows 10 and Ubuntu 16.04
CmdStan Version: 2.22.1
Compiler/Toolkit: Rtools 3.5 (For Windows 10)
R version: 3.5.3(Windows 10) and 3.6.1 (Ubuntu 16.04)

Problem description :
My code runs perfectly for 1,000 instances using CmdStan, but when I run it for 10,000 instances it gets stuck on both Windows 10 and Ubuntu 16.04 without throwing any error. I have been stuck on this problem for more than a week now and would be extremely grateful for any help from the community.

Model executable is up to date!

Running MCMC with 2 chain(s) on 2 core(s)...

Running ./tptrain 'id=1' random 'seed=123' data \
  'file=/tmp/Rtmpo8bYix/standata-a5af650b9b25.json' output \
  'file=/tmp/Rtmpo8bYix/tptrain-202004142313-1-46c5da.csv' 'method=sample' \
  'num_samples=100' 'num_warmup=100' 'save_warmup=0' 'algorithm=hmc' \
  'engine=nuts' adapt 'engaged=1'
Running ./tptrain 'id=2' random 'seed=124' data \
  'file=/tmp/Rtmpo8bYix/standata-a5af650b9b25.json' output \
  'file=/tmp/Rtmpo8bYix/tptrain-202004142313-2-46c5da.csv' 'method=sample' \
  'num_samples=100' 'num_warmup=100' 'save_warmup=0' 'algorithm=hmc' \
  'engine=nuts' adapt 'engaged=1'
Chain 1 Iteration:   1 / 200 [  0%]  (Warmup)
Chain 2 Iteration:   1 / 200 [  0%]  (Warmup)

Code Snippet

stan_program <- file.path(cmdstan_path(), "examples/new/tptrain.stan")
mod <- cmdstan_model(stan_program)
mod$print()

stan_data <- list(N = n, K = length(sense_cols), R = 5, L = 5, D=281, s = data.matrix(tp[,sense_cols]),
                  location = tp[,c("location")], domain = tp[,c("domain")],  rating = tp[,c("rating")])
# 
# run MCMC using the 'sample' method
fit_mcmc <- mod$sample(
  data = stan_data,
  num_samples = 100,
  num_warmup = 100,
  seed = 123,
  num_chains = 2,
  num_cores = 2
)

Background:
I have been trying to run the Stan code using RStan in RStudio on a dataset of 10,000 rows for more than a week now. I got the following error while running the code from the terminal on Ubuntu:

Error in FUN(X[[i]], ...) :
  trying to get slot "mode" from an object (class "try-error") that is not an S4 object
Calls: stan ... sampling -> sampling -> .local -> sapply -> lapply -> FUN
In addition: Warning message:
In parallel::mclapply(1:chains, FUN = callFun, mc.preschedule = FALSE,  :
  2 function calls resulted in an error
Execution halted

I was getting the following error running the code via Rstudio GUI in Windows 10

Error in unserialize(socklist[[n]]) : error reading from connection

I went through all the forum discussions, which recommended deleting the Makevars file, running a single chain, moving the transformed parameters code to the model block, reinstalling rstanarm, and making sure there are no NAs in the dataset. Nothing worked, so I moved to CmdStanR.

please let me know if I need to provide more information

mitzimorris · April 14, 2020, 2:00pm

N = 10000, what is K?

CmdStanR is creating the temporary input file /tmp/Rtmpo8bYix/standata-a5af650b9b25.json - it’s possible that the input file is too large for the amount of tmp space you’re allowed.
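
A minimal sketch for checking this from R, assuming the session's temporary directory is the one shown in the output above. `tempdir()`, `list.files()` and `file.size()` are base R; the `df` call assumes a Unix-like system and is skipped elsewhere.

```r
# Directory where this R session (and therefore CmdStanR) writes temp files.
tmp <- tempdir()
print(tmp)

# Size in KB of every file currently in that directory.
files <- list.files(tmp, full.names = TRUE)
sizes_kb <- file.size(files) / 1024
print(data.frame(file = basename(files), size_kb = round(sizes_kb, 1)))

# On Unix-like systems, show free space on the filesystem holding tmp.
if (.Platform$OS.type == "unix") {
  system2("df", c("-h", shQuote(tmp)))
}
```

If the free space reported is far larger than the JSON data file, tmp space is unlikely to be the cause of the stall.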

sam_learner · April 14, 2020, 2:16pm

@mitzimorris K = 2. I have attached the file for reference. (JSON wasn’t an allowed extension to upload, so I uploaded the JSON file saved as txt.)

How can I find out the maximum amount of tmp space I am allowed to use?

standata-a5af650b9b25.txt (177.1 KB)

mitzimorris · April 14, 2020, 2:48pm

question, are you running CmdStan or CmdStanR ?
could you please share the model?

sam_learner · April 14, 2020, 2:58pm

I am using CmdStanR currently. I tried using CmdStan as well but I was getting a lot of errors hence went with CmdStanR

tptrain.stan (2.7 KB)

mitzimorris · April 14, 2020, 3:15pm

in this model, the number of parameters is N * (L + 2) and the number of transformed parameters is N * (R + L + D), and R = 5, L = 5, and D = 282, therefore approx 3M parameters.
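
The count can be reproduced with a few lines of R. This is only the arithmetic from the formulas above. The data list in the first post passes D = 281 while this reply quotes 282, so the sketch uses 281; the total is roughly 3M either way.

```r
# Sizes taken from the thread.
N <- 10000  # rows in the full dataset
L <- 5
R <- 5
D <- 281    # value in stan_data; the reply above quotes 282

n_params      <- N * (L + 2)      # declared parameters
n_transformed <- N * (R + L + D)  # transformed parameters
n_total       <- n_params + n_transformed

print(c(parameters = n_params,
        transformed = n_transformed,
        total = n_total))
```

Setting N to 1000 in the same formulas shows how the counts scale linearly with the number of rows.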

folk theorem says there’s a problem with this model as written.

sam_learner · April 14, 2020, 3:25pm

Is having so many parameters causing memory issues ?

For a noob like me, can you please give some advice if I should look at MPI for parallelizing or try and send data in chunks of 1000 instances?

mitzimorris · April 14, 2020, 3:45pm

when you fit this model to 1000 items, do the answers make sense?

I didn’t look at your model very closely, but I suspect that there’s a fundamental misspecification - if so, this isn’t something you can parallelize away. what kind of model is this? (i.e., if you were to go looking for it by name in the Stan User’s Guide, what would it be called?)

sam_learner · April 15, 2020, 1:06am

@mitzimorris I did check the results of the sampling using posterior predictive checks and plotted the ppc graphs, graphs look good.

Model is categorical distribution and the priors follow normal distribution

sam_learner · April 15, 2020, 1:39pm

Update : The script has now been running for 21 hours and the output has updated slightly from previous post

Model executable is up to date!

Running MCMC with 2 chain(s) on 2 core(s)...

Running ./tptrain 'id=1' random 'seed=123' data \
  'file=/tmp/Rtmpo8bYix/standata-a5af650b9b25.json' output \
  'file=/tmp/Rtmpo8bYix/tptrain-202004142313-1-46c5da.csv' 'method=sample' \
  'num_samples=100' 'num_warmup=100' 'save_warmup=0' 'algorithm=hmc' \
  'engine=nuts' adapt 'engaged=1'
Running ./tptrain 'id=2' random 'seed=124' data \
  'file=/tmp/Rtmpo8bYix/standata-a5af650b9b25.json' output \
  'file=/tmp/Rtmpo8bYix/tptrain-202004142313-2-46c5da.csv' 'method=sample' \
  'num_samples=100' 'num_warmup=100' 'save_warmup=0' 'algorithm=hmc' \
  'engine=nuts' adapt 'engaged=1'
Chain 1 Iteration:   1 / 200 [  0%]  (Warmup)
Chain 2 Iteration:   1 / 200 [  0%]  (Warmup)
Chain 1 Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 2 Iteration: 100 / 200 [ 50%]  (Warmup)

questions:

1. Can reparameterization of the model help, as mentioned here (https://stackoverflow.com/questions/29191538/how-to-speed-up-stan-when-fitting-a-random-effect-model-on-a-large-sparse-dataf/29202610)? If you could please explain to a beginner like me how the reparameterization can be done, I would be extremely grateful @bgoodri @mitzimorris
2. Can https://github.com/rmcelreath/cmdstan_map_rect_tutorial be helpful? Will MPI map-reduce help in utilising all 64 cores on my machine for a single chain?

mitzimorris · April 15, 2020, 2:15pm

I think that the answer is yes to both questions - do the non-centered parameterization. (for testing purposes, I suggest running on only 100 examples and comparing timings.)
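
As a rough illustration of the identity the non-centered parameterization relies on (a sketch in R with made-up values for mu and sigma, not the model from this thread): drawing z from normal(0, 1) and computing mu + sigma * z gives the same distribution as drawing directly from normal(mu, sigma). In a Stan model the sampler then explores z instead of the original parameter, which often has easier geometry.

```r
set.seed(1)

# Made-up location and scale, for illustration only.
mu    <- 2
sigma <- 3
n     <- 1e5

# Centered: draw the quantity directly.
centered <- rnorm(n, mean = mu, sd = sigma)

# Non-centered: draw a standard normal, then shift and scale it.
z            <- rnorm(n)
non_centered <- mu + sigma * z

# Both sets of draws should have mean close to mu and sd close to sigma.
print(c(mean = mean(centered),     sd = sd(centered)))
print(c(mean = mean(non_centered), sd = sd(non_centered)))
```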

I defer to the Bens - @bgoodri and @bbbales2 w/r/t reparameterizations.

As for map_rect, you’re welcome to test the release candidate of CmdStan 2.23, which has a cleaner version of map_rect called reduce_sum. See the forum topic “New reduce_sum makes cross-validation simple; should we standardise?”

mitzimorris · April 15, 2020, 2:17pm

release candidate announced here: Cmdstan 2.23 Release candidate is available!

(CmdStanR’s install_cmdstan function will install for you given the RC URL.)

sam_learner · April 15, 2020, 2:27pm

thanks so much @mitzimorris. I am not very familiar with non-centered parameterization. I am reading through the Stan guide and then will test on 100 examples.

@bgoodri @bbbales2 any insights would be extremely helpful

thanks so much, I will check out the new release candidate of CmdStan 2.23 and share the update here

Any beginner reading this thread in the future :
For reparameterization, I am reading https://mc-stan.org/docs/2_22/stan-users-guide/reparameterization-section.html

bbbales2 · April 15, 2020, 2:58pm

How many parameters do you have?

What leads you to want to run it longer than 1000 iterations?

The problem has been running for a week?

sam_learner · April 15, 2020, 3:14pm

sorry for the confusion: by instances I mean using only 1,000 rows of the dataset instead of all 10,000 rows

the number of parameters is N * (L + 2) and the number of transformed parameters is N * (R + L + D), and R = 5, L = 5, and D = 282, therefore approx 3M parameters as calculated by @mitzimorris

I have been having trouble sampling and have struggled with different errors for over a week. With RStan I got
Error in unserialize(socklist[[n]]) : error reading from connection
and
Error in FUN(X[[i]], ...)
Then I switched to CmdStanR after reading old discussions on the Google Groups mailing list.

Now, using CmdStanR, the code has been running for a day on Windows 10 and for close to 1.5 days on the Ubuntu server, and both are still showing warmup iterations

The input file is not very large; it is just 1662 KB

This is my Stan code file for reference. I am using a categorical distribution with normal priors
tptrain.stan (2.7 KB)

bbbales2 · April 15, 2020, 5:24pm

I think the largest Stan inference I’ve heard of before this was about 20k parameters. I’d consider something with 1000 parameters a very large model, if that gives some context.

For slow models I’d just use base cmdstan. Use refresh=1 and save_warmup=1 to save the warmup iterations and print something out for every draw. You can open these csvs in Excel or OpenOffice and have a look as stuff runs.
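
A sketch of peeking at a Stan CSV from R instead of a spreadsheet. The helper name `read_draws_so_far` and the tiny demo file are made up for illustration; the real file would be the output CSV named in the run log. Stan CSV files mark comment lines with `#`, which `read.csv()` can skip. If you stay in CmdStanR, `mod$sample()` also takes `refresh` and `save_warmup` arguments, to the best of my knowledge.

```r
# Read whatever draws a Stan CSV output file contains so far.
# Lines starting with '#' hold configuration and adaptation info.
read_draws_so_far <- function(path) {
  read.csv(path, comment.char = "#")
}

# Tiny made-up file in the Stan CSV layout, only to exercise the helper.
demo <- tempfile(fileext = ".csv")
writeLines(c(
  "# method = sample (Default)",
  "lp__,accept_stat__,stepsize__",
  "-7.1,0.95,0.8",
  "# Adaptation terminated",
  "-6.8,0.91,0.8"
), demo)

draws <- read_draws_so_far(demo)
print(nrow(draws))             # number of draws written so far
print(tail(draws$stepsize__))  # recent step sizes
```

Watching whether stepsize__ shrinks toward zero during warmup is one quick way to see that the sampler is struggling rather than hung.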

theta_loc[i] ~ normal(theta_loc_mu[i], 1);

I don’t think this needs to be here.

I would have expected something more like:

location[i] ~ categorical(softmax(to_vector(theta_loc_mu[i])));

Without the intermediary.
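
To make the suggested line concrete, here is what softmax followed by a categorical draw does, sketched in R with made-up scores for one row (L = 5 categories). The `softmax` helper is written here for illustration and is not part of base R.

```r
# Map a vector of unconstrained scores to probabilities that sum to 1.
# Subtracting max(x) first avoids overflow in exp().
softmax <- function(x) {
  e <- exp(x - max(x))
  e / sum(e)
}

# Made-up scores standing in for theta_loc_mu[i].
theta_loc_mu_i <- c(0.5, -1.0, 2.0, 0.0, 0.3)

p <- softmax(theta_loc_mu_i)
print(round(p, 3))

# One categorical draw: a location index in 1..L chosen with probabilities p.
location_i <- sample(seq_along(p), size = 1, prob = p)
print(location_i)
```

Dropping the intermediate theta_loc removes N * L declared parameters, since the probabilities are computed directly from theta_loc_mu.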

sam_learner · April 15, 2020, 6:01pm

I will reduce the number of parameters as my model parameters are enormously large

thanks so much. I really appreciate the time you spent in reviewing my code and providing detailed insights. I will work on the suggestions and provide an update by tomorrow

sam_learner · April 17, 2020, 6:49am

I tried reducing the number of rows in the dataset from 10,000 to 1,000 (i.e. N = 1000) and making the changes advised by Ben. I was able to reduce the number of parameters from 3M to 84,000 (I’m aware that is still very big), but the sampling is still extremely slow.

Sorry, this might be a really stupid question; I am just afraid of messing up my existing, fully working CmdStan by installing the release candidate.

Question:
Is it possible to revert to CmdStan 2.22 after running install_cmdstan(release_url = "https://github.com/stan-dev/cmdstan/releases/download/2.23-candidate/cmdstan-2.23-rc1.tar.gz", cores = 4), or is there a way to install CmdStan 2.23 so that it doesn’t affect 2.22?

I am working to use reduce_sum using the tutorial https://github.com/bbbales2/cmdstan_map_rect_tutorial/blob/reduce_sum/reduce_sum_tutorial.Rmd

mitzimorris · April 17, 2020, 2:07pm

you can keep both in appropriately named directories and then in CmdStanR you can set the path to the directory accordingly - see set_cmdstan_path in the cmdstanr documentation (“Get or set the file path to the CmdStan installation”).
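
A sketch of switching between the two installs. The directory names below are hypothetical, so substitute wherever each version actually lives on your machine; `set_cmdstan_path()` and `cmdstan_version()` are cmdstanr functions.

```r
library(cmdstanr)

# Hypothetical install locations; replace with your own directories.
path_222 <- "~/.cmdstanr/cmdstan-2.22.1"
path_223 <- "~/.cmdstanr/cmdstan-2.23-rc1"

# Point CmdStanR at the release candidate to try reduce_sum.
set_cmdstan_path(path_223)
print(cmdstan_version())

# Switch back to the stable install when needed.
set_cmdstan_path(path_222)
print(cmdstan_version())
```

Recompiling the model after each switch helps avoid losing track of which CmdStan version built the executable.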

the danger is losing track of the version of CmdStan used to compile a model; OTOH, if you’re playing with reduce_sum - only 2.23 RC will work.

sam_learner · April 17, 2020, 2:20pm

thanks so much, I can now happily move over to working with 2.23. I am finding reduce_sum a lot easier to adapt to my code than map_rect, as there is no shard construction required. I will report back on the progress as soon as possible.
