This is a misunderstanding; I was not specific enough.
I meant that if a model does not even start sampling, you need to set chains = 1 in order to see all the error messages, which tell you which line of the model code has the problem.
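For example, in RStan a single-chain debugging run might look like this (the model and data object names are placeholders):

```r
library(rstan)

# Placeholder names; any model that fails to start sampling works here.
fit <- sampling(compiled_model, data = stan_data,
                chains = 1, cores = 1)  # one chain, one core: all messages reach the console
```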
It is of course an entirely different matter once the model produces samples. Then you need multiple chains to assess potential scale reduction, check that no chain produces divergences, etc. (I never thought of that part of the workflow as “debugging”.)
I also find that RStan swallows messages when running in parallel. I tend to debug in CmdStan for this reason. In R, I have the number of chains set to the number of cores whenever I start R and I can’t remember how to reset it off the top of my head.
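A sketch of how that setting is usually managed in R (the startup line typically lives in a `.Rprofile`; resetting it is just another `options()` call):

```r
# Typical startup setting: one chain per reported core.
options(mc.cores = parallel::detectCores())

# To get serial execution for debugging, drop back to one core:
options(mc.cores = 1)

getOption("mc.cores")  # check the current value
```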
Also, the detectCores() thing is wrong in that it picks up Intel’s “hyperthreading” feature as if each hyperthread were another core. So my four-core i7 reports 8 cores, even though there are truly only four physical cores. Running on about 5 cores is ideal for me when only running a Stan model (or a build process, which is where I usually set cores), but it will depend on what else is going on on the machine.
As much as we’d like to put all the gotchas in a “prominent” position in the doc, we don’t have that many prominent positions. Do we say “don’t install with a space on Windows” or “debug with one core”?
The problem with “recommend prominently” is that every time someone runs into a problem they ask us to put the solution at the “top of the web page”. Alas, there’s only one top of the doc.
I am not sure there is a bug. You can call parallel::detectCores(logical = FALSE) to get 4 rather than 8 on my laptop. But if you are only running 4 chains, it doesn’t matter. In the past, I found that running 8 chains with 4 cores and 2 threads each was slightly faster than running 8 chains on 4 cores used twice, but something in between may be faster still, depending on the model and on non-Stan activity.
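For reference, the two calls differ only in whether hyperthreads are counted; on the four-core i7 described above they return 8 and 4 respectively:

```r
parallel::detectCores()                 # logical cores, counting hyperthreads
parallel::detectCores(logical = FALSE)  # physical cores only
```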