Stuck at warmup iteration with no error: CmdStanR

Kudos to all the developers for reduce_sum! I am so grateful for the new facility. Working with map_rect was extremely difficult as a beginner: I was stuck for days and couldn’t parallelize with it, until I was pointed to reduce_sum, which makes it so much easier :)

I am working on a Linux server with 64 cores.

I am using a categorical logit in my model as well, and I hope to see massive speedups once I get the code up and running.

@bbbales2 @mitzimorris Side note, not relevant to the original problem:
I think that, to give the new reduce_sum capabilities a wider reach, we could create a new post about the speedup from reduce_sum, with a title that pushes it to the top of search results so people can find it via a Google search. Currently, all the speedup posts one finds point to map_rect.


I still kinda regret using my real name for the forums here. @syclik had the wherewithal to pick a cool one.


lol, I regret having such a random, boring name =D. syclik definitely tops the chart for cool names! ;)

Update and question:
@bbbales2 @mitzimorris Is it normal that sampling hasn’t even started in the past 50 minutes, after parallelizing the most computationally expensive parts of the code? Could this be attributed to setting the number of threads to 16 instead of 8? I am working on a 64-core Linux server.

set_num_threads(16)
time1 <- system.time(
  fit1 <- tp1$sample(stan_data, num_samples = 100, num_warmup = 100,
                     num_cores = 1, num_chains = 2, refresh = 100)
)
time1
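
For anyone reading this later: current cmdstanr releases have renamed these arguments, and threading is requested per chain directly in $sample(). A minimal sketch, assuming the model was compiled with threading enabled (the file name here is a placeholder):

library(cmdstanr)

# compile with threading support; "model.stan" is a placeholder name
tp1 <- cmdstan_model("model.stan", cpp_options = list(stan_threads = TRUE))

fit1 <- tp1$sample(
  data = stan_data,
  chains = 2,
  parallel_chains = 2,     # run both chains at once
  threads_per_chain = 16,  # 2 chains x 16 threads = 32 of the 64 cores
  iter_warmup = 100,
  iter_sampling = 100,
  refresh = 10             # frequent progress updates while debugging
)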

@wds15, did you face any similar issue while running a categorical logit model?

I am trying two different approaches to parallelizing parts of the code, and neither has started sampling yet. I am attaching the output files for both approaches for reference:
test_parallelV1.txt (3.8 KB)
test_parallelV2.txt (4.2 KB)

Well, it’s possible. Use a smaller test problem to start with; for example, something along the lines of the sketch below.
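
A minimal sketch of that idea, keeping the argument names from the call above. The field names in stan_data are assumptions, since the thread only shows location in the model code:

# fit a few hundred observations first to confirm the model samples at all
# (field names in stan_data are assumptions based on the model code below)
N_small <- 500
idx <- sample(seq_len(stan_data$N), N_small)
stan_data_small <- stan_data
stan_data_small$N <- N_small
stan_data_small$location <- stan_data$location[idx]
# any other observation-level fields would need the same subsetting
fit_small <- tp1$sample(stan_data_small, num_samples = 100, num_warmup = 100,
                        num_chains = 1, refresh = 10)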

This is summing the same term N times, so the log density of the whole dataset gets added N times over:

for (i in 1:N){
    // every iteration adds the identical reduce_sum result again
    target += reduce_sum(parallel_loc, location, grainsize, theta_loc_mu);
}

If you want the equivalent of:

target += categorical_lpmf(location | softmax(to_vector(theta_loc_mu)));

Just do:

target += reduce_sum(parallel_loc, location, grainsize, theta_loc_mu);

Also:

categorical_lpmf(a | softmax(b))

can be rewritten:

categorical_logit_lpmf(a | b)

for better numerical stability.

(Edit: that’s the recommendation for the first; I didn’t look at the second.)
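
Putting the two suggestions together, the partial-sum function passed to reduce_sum could look roughly like the sketch below. The thread never shows the body of parallel_loc, so the signature here is an assumption; it follows the form reduce_sum requires when location is the sliced argument:

functions {
  // assumed signature: reduce_sum slices `location` and passes the slice
  // along with its start/end indices; theta_loc_mu holds the shared logits
  real parallel_loc(array[] int location_slice, int start, int end,
                    vector theta_loc_mu) {
    // categorical_logit_lpmf is vectorized over the outcome array and
    // applies the softmax internally in a numerically stable way
    return categorical_logit_lpmf(location_slice | theta_loc_mu);
  }
}

With a function like that, the single reduce_sum call above is all the model block needs.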

Agreed. When we post, the titles will be properly SEO’d.

I’d like to thank you for working through your model and the problems you’ve encountered in public. This is a huge service both to the dev team and to other users.


I am glad the problems I encountered can help the community in the future. Kudos to the whole team of Stan developers, for the hard work every day and for making this platform so supportive and welcoming for newbies like me :)

I am not sure how much time I will be able to spend on getting the full dataset to parallelize (I modified the script, but it still takes forever to start sampling), as I have final exams in the coming month :( But I will definitely try to share my speedup results on the smaller dataset for future reference.


Coming back to this old message in the thread, as I am not getting the speedup outlined in the posts above. I believe I might have been completely naive in deciding on my own, while discussing with the experts, that the PPC graphs look alright.

I would really appreciate it if you could have a look at the PPC graphs I plotted and uploaded in the other thread: How to summarise graphical Posterior predictive checks in one graph?