Is there a way (via cmdstan, preferrably via cmdstanpy) to do a fixed param run, but with many chains in parallel and with many different values for the parameters, coming from some fit object/csv? Something like generate quantities, but running not only the generated quantities block, but everything? (Just without the NUTS/HMC part)
I’d just use the Python interface to loop over the different sets of parameters and pass them on to the Stan model one at a time as init values. Any parallelization would probably need to be done via Python as well.
I don’t see a way fo doing it within Stan without modifying the code considerably, and not sure it would be possible to use the cmdstan parallelization for different sets of parameters passed as data (maybe there is, but I can’t think of a way from the top of my head.)
Just had to parse the stan file, remove the model block and join the transformed parameters and generated quantities block as the new generated quantities block, recompile and use the original generate quantities method.
But to be honest, a simpler way would have been appreciated. I don’t think it exists though.
I have an ode problem, for which I want to sample from the prior without doing the costly integration. Then I want to take these draws and simulate my trajectories.
The rationale being that during nuts (and its warmup), more than one integration would have to be performed per draw, more than at least doubling the cost ((1k+1k)*something + adjoint overhead)
Edit:
This is what I ended up adding to CmdStanPy’s class:
@property
def sim_source(self):
rv = self.model_source
lines = rv.splitlines()
model_begin = lines.index('model {') - 1
gq_begin = lines.index('generated quantities {')+1
lines = lines[:model_begin] + lines[gq_begin:]
return '\n'.join(lines).replace('transformed parameters', 'generated quantities')
@property
def sim_file(self):
return self.stan_file.replace(self.name, 'sim_' + self.name)
@cached_property.cached_property
def sim_model(self):
do_write = False
if not os.path.exists(self.sim_file):
do_write = True
else:
with open(self.sim_file, 'r') as fd:
if fd.read() != self.sim_source:
do_write = True
if do_write:
with open(self.sim_file, 'w') as fd:
fd.write(self.sim_source)
return self.__class__(
stan_file=self.sim_file
)
And this should be a minimal non-working stan example:
functions {
vector gflow(real t, vector y, vector k){
// Something something
}
}
data {
int no_main_states;
int no_states;
int no_parameters;
int no_timesteps;
int no_observed;
real t0;
real ts[no_timesteps];
vector[no_observed] observed_states[no_timesteps];
real rel_tol;
real abs_tol;
int max_steps;
}
parameters {
vector[no_states] initial_states;
vector[no_parameters] log_k;
real log_sigma;
}
transformed parameters {
vector[no_parameters] k = exp(log_k);
real sigma = exp(log_sigma);
vector[no_states] true_states[no_timesteps];
if (no_timesteps > 0){
true_states = ode_rk45_tol(
gflow, initial_states, t0, ts,
rel_tol, abs_tol, max_steps,
k
);
}
}
model {
// Something something
}
generated quantities {
vector[no_observed] predicted_states[no_timesteps];
for (i in 1:no_timesteps){
predicted_states[i] = to_vector(normal_rng((true_states[i,:no_observed]), sigma));
}
}
This method requires sub-argument fitted_params which takes as its value an existing Stan CSV file that contains a sample from an equivalent model, i.e., a model with the same parameters, transformed parameters, and model blocks, conditioned on the same data.
Yes, it standalone generated quantities so only GQ.
So, essentially what you want is the fixed_param mode but where you can input an array of param values over which to run? Right now, the fixed_param mode will only run for 1 set of parameter values.
That’s what I figured. (Again, did not try, as the documented interface made it very unlikely to work).
I’m happy with the way it is right now. This makes full use of my cores and is probably much more efficient than looping over the draws via python or always doing the integration.
Do generate_quantities for warmup samples. CmdStanPy “solution”:
def hack_csv(self, **kwargs):
for csv in self.runset.csv_files:
with open(csv, 'r') as fd:
content = fd.read()
for key, value in kwargs.items():
content = re.sub(f'{key} = .+', f'{key} = {value}', content)
with open(csv, 'w') as fd:
fd.write(content)
Before running generate quantities, call the above method changing the reported number of warmup samples…
trying to understand the question: you’ve written a new model which has no parameters, i.e., the parameters are treated as data, and you want to run a bunch of chains where for each chain, you choose a different set of parameter values and those values are taken from the draws from some Stan csv file.
I have an ODE problem with no_timesteps x no_observations data points at no_timesteps timepoints. The integration happens in the transformed parameters block.
I want to sample (initial points and parameters) from the prior, for this I set no_timesteps=0, leading to no numerical integration being performed.
For these samples, I want to compute the states at the specified time points, i.e. changing no_timesteps and the timepoints to their true values.
For a single sample, I can just use fixed_param=True. For multiple samples, I could loop over all samples and do one fixed_param=True run for each. Without modifying the stan file, it does not appear to be possible to use generate quantities, as the integration happens in the transformed parameters block.
My solution is to “automatically” generate a new stan file where the transformed parameters and the generated quantities blocks are merged as the new generated quantities block, enabling me to only pass the parameters and obtain the evolved states.
Is this clear?
But really, it would be much more convenient and would simplify the ODE modeling workflow for everyone, if everything was possible from a single stan file.
Yes, I do not know the implementation, but I assume GQ works without autodiff for efficiency reasons.
The simplest/best solution (in my opinion) would be to be able to pass a csv file of parameters to the fixed_param run. Currently I do not have the time, but I might fiddle with the C++ at some later time…
correct, no auto-diff in GQ. that’s a feature.
transformed params and model blocks do autodiff.
GQ is run once per iteration, using current values of all model variables.