Stuck at warmup iteration with no error: CmdStanR

Kudos to all the developers for reduce_sum! I am so grateful for the new facility. Working with map_rect was extremely difficult as a beginner: I was stuck for days and couldn’t parallelize with it, until I was pointed to reduce_sum, which makes it so much easier :)

I am working on a Linux server with 64 cores.

I am using a categorical logit in my model as well, and I hope to see massive speedups once I get the code up and running.

@bbbales2 @mitzimorris Side note, not relevant to the original problem:
I think that, to give the new reduce_sum capabilities a wider reach, we could create a new post about the speedup from reduce_sum, with a title that pushes it to the top of search results so people can find it via a Google search. Currently, all the speedup posts one finds point to map_rect.


I still kinda regret using my real name for the forums here. @syclik had the wherewithal to pick a cool one.


lol, I regret having such a random, boring name =D. syclik definitely tops the chart for cool names! ;)

Update and question:
@bbbales2 @mitzimorris Is it normal that sampling hasn’t even started in the past 50 minutes, after parallelizing the most computationally expensive parts of the code? Could this be attributed to setting the number of threads to 16 instead of 8? I am working on a 64-core Linux server.

set_num_threads(16)
time1 <- system.time(
  fit1 <- tp1$sample(stan_data, num_samples = 100, num_warmup = 100,
                     num_cores = 1, num_chains = 2, refresh = 100)
)
time1
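
For anyone reading this later: current cmdstanr releases have renamed these arguments, and threading is requested per chain directly in $sample(). A minimal sketch, assuming the model was compiled with threading enabled (the file name here is a placeholder):

library(cmdstanr)

# compile with threading support; "model.stan" is a placeholder name
tp1 <- cmdstan_model("model.stan", cpp_options = list(stan_threads = TRUE))

fit1 <- tp1$sample(
  data = stan_data,
  chains = 2,
  parallel_chains = 2,     # run both chains at once
  threads_per_chain = 16,  # 2 chains x 16 threads = 32 of the 64 cores
  iter_warmup = 100,
  iter_sampling = 100,
  refresh = 10             # frequent progress updates while debugging
)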

@wds15, did you face any similar issue while running a categorical logit model?

I am trying two different approaches to parallelizing parts of the code, and neither has started sampling yet. I am attaching the output files for both approaches for reference:
test_parallelV1.txt (3.8 KB)
test_parallelV2.txt (4.2 KB)

Well, it’s possible. Use a smaller test problem to start with; for example, something along the lines of the sketch below.
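
A minimal sketch of that idea, keeping the argument names from the call above. The field names in stan_data are assumptions, since the thread only shows location in the model code:

# fit a few hundred observations first to confirm the model samples at all
# (field names in stan_data are assumptions based on the model code below)
N_small <- 500
idx <- sample(seq_len(stan_data$N), N_small)
stan_data_small <- stan_data
stan_data_small$N <- N_small
stan_data_small$location <- stan_data$location[idx]
# any other observation-level fields would need the same subsetting
fit_small <- tp1$sample(stan_data_small, num_samples = 100, num_warmup = 100,
                        num_chains = 1, refresh = 10)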

This is summing the same term N times, so the log density of the whole dataset gets added N times over:

for (i in 1:N){
    // every iteration adds the identical reduce_sum result again
    target += reduce_sum(parallel_loc, location, grainsize, theta_loc_mu);
}

If you want the equivalent of:

target += categorical_lpmf(location | softmax(to_vector(theta_loc_mu)));

Just do:

target += reduce_sum(parallel_loc, location, grainsize, theta_loc_mu);

Also:

categorical_lpmf(a | softmax(b))

can be rewritten:

categorical_logit_lpmf(a | b)

for better numerical stability.

(Edit: that’s the recommendation for the first; I didn’t look at the second.)
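
Putting the two suggestions together, the partial-sum function passed to reduce_sum could look roughly like the sketch below. The thread never shows the body of parallel_loc, so the signature here is an assumption; it follows the form reduce_sum requires when location is the sliced argument:

functions {
  // assumed signature: reduce_sum slices `location` and passes the slice
  // along with its start/end indices; theta_loc_mu holds the shared logits
  real parallel_loc(array[] int location_slice, int start, int end,
                    vector theta_loc_mu) {
    // categorical_logit_lpmf is vectorized over the outcome array and
    // applies the softmax internally in a numerically stable way
    return categorical_logit_lpmf(location_slice | theta_loc_mu);
  }
}

With a function like that, the single reduce_sum call above is all the model block needs.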

Agreed. When we post, the titles will be properly SEO’d.

I’d like to thank you for working through your model and the problems you’ve encountered in public. This is a huge service both to the dev team and to other users.


I am glad the problems I encountered can help the community in the future. Kudos to the whole team of Stan developers, for the hard work every day and for making this platform so supportive and welcoming for newbies like me :)

I am not sure how much time I will be able to spend on getting the full dataset to parallelize (I modified the script, but it still takes forever to start sampling), as I have final exams in the coming month :( But I will definitely try to share my speedup results on the smaller dataset for future reference.


Coming back to this old message in the thread, as I am not getting the speedup outlined in the posts above. I believe I might have been completely naive in deciding on my own, while discussing with the experts, that the PPC graphs look alright.

I would really appreciate it if you could have a look at the PPC graphs I plotted and uploaded in the other thread: How to summarise graphical Posterior predictive checks in one graph?