I am currently running a very simple experiment to generate some data from my proposed model with ‘known’ parameter values and then fit these generated data to the model to examine whether the model can recover the ‘known’ parameter values.
However, I get different sets of generated data each time I simulate from my proposed model with the ‘known’ parameters (the data are generated in the generated quantities block with the normal_rng function, and I supply the ‘known’ mean and standard deviation through the data block).
I set seed = 123456 in the stan() call, with the default 2000 iterations on 4 chains, and I thought each of the resulting 4000 post-warmup draws would be a random sample of data that I can then fit to the model to estimate the ‘known’ parameter values.
So I chose the 4000th iteration, but when I ran it a second time and again picked the 4000th iteration, the generated data were different from the previous run. I am wondering where it went wrong?
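Roughly, my setup looks like the sketch below (the model code, variable names, and values here are simplified placeholders, not my actual model):

```r
library(rstan)

# Simplified placeholder: no parameters block, so the only randomness is the
# normal_rng() draws in generated quantities.
sim_code <- "
data {
  int<lower=1> N;
  real mu_true;
  real<lower=0> sigma_true;
}
generated quantities {
  vector[N] y_sim;
  for (n in 1:N)
    y_sim[n] = normal_rng(mu_true, sigma_true);
}
"

sim_fit <- stan(model_code = sim_code,
                data = list(N = 100, mu_true = 5, sigma_true = 2),
                seed = 123456,
                algorithm = "Fixed_param",  # nothing to sample, RNG draws only
                chains = 4, iter = 2000)
```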
Thank you for your reply. Do you mean that each time I extract from the stanfit object, the ordering of the iterations changes, so what I pick out as the 4000th iteration of generated data is not the same draw because the ordering has changed? I set permuted = TRUE, but that gives me a matrix with 4 columns, each representing a chain, and I don’t know how the rows correspond to the parameters in my model.
That defeats the purpose of convergence diagnostics like R-hat.
This should work in general, even with multiple chains. Look at the rstan vignette on extracting posterior information for the ordering and for all of the arguments.
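As a quick sketch (the object name fit and the variable name y_sim are placeholders; adjust to your own model), asking extract() for an unpermuted array keeps the iteration/chain layout fixed, so a given index always refers to the same draw:

```r
# permuted = TRUE merges and reorders the chains, so a single row index does
# not pin down a particular chain/iteration.  permuted = FALSE keeps the layout.
draws <- rstan::extract(fit, pars = "y_sim", permuted = FALSE)
dim(draws)  # iterations x chains x parameters, e.g. 1000 x 4 x 100

# Last post-warmup iteration of chain 4, all elements of y_sim:
y_rep <- draws[1000, 4, ]
```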
You don’t need to run 1000 sampling iterations per chain. Running until n_eff is around 200 (about 50 per chain) should be sufficient (n_eff will only reach 200 if R-hat gets close to 1), and that probably won’t take 4000 posterior draws.
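For example, something along these lines (the file name, data list, and parameter names are placeholders) lets you check n_eff and R-hat after a much shorter run:

```r
fit <- stan(file = "model.stan", data = stan_data,
            chains = 4, iter = 500, seed = 123456)
print(fit, pars = c("mu", "sigma"))  # printed summary includes n_eff and Rhat
```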