Question:
Does Stan internally adapt its MCMC sampling to the geometry of the posterior distribution on a parameter-by-parameter basis?
That is, given a parameter vector vector[2] theta, does Stan make separate sampling adjustments for theta[1] and theta[2]? Or does it make a single common adjustment for theta as a whole (i.e., for the entire parameter vector)?
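For concreteness, the kind of declaration I have in mind (a minimal sketch, not my actual model):

```stan
parameters {
  vector[2] theta;  // does Stan adapt a separate scale for theta[1] and theta[2],
                    // or one common adjustment for the whole vector?
}
```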
Background:
I am currently trying to fit a policy-based reinforcement learning model to human (agent) behavior data from a gambling task, in which agents are presented with a different winning probability on each trial and asked to place a bet as a fraction of the chips they currently hold.
The data consist of T-length vectors (arrays) of the presented winning probability p_t ∈ [0, 1], the current chip count c_t ∈ [0, ∞), the bet amount y_t ∈ [0, 1] (expressed as a fraction of the current chips c_t), and the outcome series w_t and l_t, with T = 60 or 100 trials per participant (here, the agent assumed by the model).
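For reference, a minimal sketch of how I pass one agent's data to Stan (the variable names here are placeholders for illustration, not my actual declarations):

```stan
data {
  int<lower=1> T;                   // number of trials (60 or 100)
  vector<lower=0, upper=1>[T] p;    // presented winning probability p_t
  vector<lower=0>[T] c;             // current chip count c_t
  vector<lower=0, upper=1>[T] y;    // bet y_t as a fraction of current chips
  vector[T] w;                      // outcome series w_t
  vector[T] l;                      // outcome series l_t
}
```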
When I ran parameter recovery for the five model parameters on artificial data generated from the model, fitting each agent individually (i.e., one agent at a time), recovery was relatively good given the complexity of the model. I therefore believe there is no problem with the model or with the gambling task set-up (or the data obtained from it), and I would like to refrain from disclosing the unpublished model here.
However, depending on the series of presented winning probabilities p_t and on the parameter combination, the posterior distribution can be multimodal, and there seem to be cases in which Stan's exploration of the parameter space does not work well. I attribute this to both the randomness of the gambling task and the presence of unidentifiable parameter combinations. In other words, the geometry of the posterior distribution can differ from agent to agent depending on the data set and model parameters.
The problem I am actually facing is that, when I fit the model to multiple agents' data in a single batch, MCMC sampling fails with warnings that max_treedepth was exceeded. (Note that when the number of agents estimated in a batch is around 3, recovery works without problems.)
Looking at the trace plots, I found that the chains do not mix for most of the agents' parameters (even for agents whose parameters were recovered well by the separate, per-agent fits). With 4000 post-warmup iterations, the effective sample sizes are at most about 10, which I believe suggests that max_treedepth was hit before the sampler could explore sufficiently, due to an extremely small step size.
Based on the results described above, I am guessing that, of the two possibilities raised in the question at the top, Stan takes the latter approach (a common adjustment for the whole parameter vector). Is this understanding correct?
Also, is it difficult to implement batch estimation and hierarchical modeling with Stan when the geometry of the posterior distribution varies significantly depending on the dataset and parameters, as in the problem setting I am dealing with?
Perhaps it would be possible to explicitly declare separate model parameters for each agent in the Stan code, but that does not scale as the number of agents grows. Alternatively, would it work if the parameters were declared as arrays instead of vectors? A sketch of what I mean is below.
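For concreteness, this is roughly what I mean (N agents with five parameters each; the names are placeholders, and N would be declared in the data block):

```stan
parameters {
  // Option A: explicit scalars for each agent -- does not scale with many agents
  // real alpha_1; real beta_1; ... real alpha_N; real beta_N; ...

  // Option B: one set of parameters per agent, declared as an array of vectors
  array[N] vector[5] theta;   // theta[n] holds the five parameters of agent n
}
```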
I would be very grateful if someone could tell me what is going on.