Fixed param, but for whole fit?

Funko_Unko · March 19, 2021, 4:37pm

Is there a way (via cmdstan, preferrably via cmdstanpy) to do a fixed param run, but with many chains in parallel and with many different values for the parameters, coming from some fit object/csv? Something like generate quantities, but running not only the generated quantities block, but everything? (Just without the NUTS/HMC part)

I hope I did not overlook anything.

caesoma · March 22, 2021, 6:31pm

I’d just use the Python interface to loop over the different sets of parameters and pass them on to the Stan model one at a time as init values. Any parallelization would probably need to be done via Python as well.

I don’t see a way fo doing it within Stan without modifying the code considerably, and not sure it would be possible to use the cmdstan parallelization for different sets of parameters passed as data (maybe there is, but I can’t think of a way from the top of my head.)

Funko_Unko · March 22, 2021, 6:43pm

Thanks, I found a solution.

Just had to parse the stan file, remove the model block and join the transformed parameters and generated quantities block as the new generated quantities block, recompile and use the original generate quantities method.

But to be honest, a simpler way would have been appreciated. I don’t think it exists though.

ahartikainen · March 23, 2021, 7:50am

Could you show minimal example what you want to do?

Funko_Unko · March 23, 2021, 8:01am

I’ll post Code in a bit.

I have an ode problem, for which I want to sample from the prior without doing the costly integration. Then I want to take these draws and simulate my trajectories.

The rationale being that during nuts (and its warmup), more than one integration would have to be performed per draw, more than at least doubling the cost ((1k+1k)*something + adjoint overhead)

Edit:

This is what I ended up adding to CmdStanPy’s class:


    @property
    def sim_source(self):
        rv = self.model_source
        lines = rv.splitlines()
        model_begin = lines.index('model {') - 1
        gq_begin = lines.index('generated quantities {')+1
        lines = lines[:model_begin] + lines[gq_begin:]
        return '\n'.join(lines).replace('transformed parameters', 'generated quantities')

    @property
    def sim_file(self):
        return self.stan_file.replace(self.name, 'sim_' + self.name)

    @cached_property.cached_property
    def sim_model(self):
        do_write = False
        if not os.path.exists(self.sim_file):
            do_write = True
        else:
            with open(self.sim_file, 'r') as fd:
                if fd.read() != self.sim_source:
                    do_write = True
        if do_write:
            with open(self.sim_file, 'w') as fd:
                fd.write(self.sim_source)
        return self.__class__(
            stan_file=self.sim_file
        )

And this should be a minimal non-working stan example:

functions {
  vector gflow(real t, vector y, vector k){
    // Something something
  }
}
data {
  int no_main_states;
  int no_states;
  int no_parameters;
  int no_timesteps;
  int no_observed;

  real t0;
  real ts[no_timesteps];
  vector[no_observed] observed_states[no_timesteps];

  real rel_tol;
  real abs_tol;
  int max_steps;
}
parameters {
  vector[no_states] initial_states;
  vector[no_parameters] log_k;
  real log_sigma;
}
transformed parameters {
  vector[no_parameters] k = exp(log_k);
  real sigma = exp(log_sigma);
  vector[no_states] true_states[no_timesteps];
  if (no_timesteps > 0){
    true_states = ode_rk45_tol(
      gflow, initial_states, t0, ts,
      rel_tol, abs_tol, max_steps,
      k
    );
  }
}
model {
  // Something something
}

generated quantities {
  vector[no_observed] predicted_states[no_timesteps];
  for (i in 1:no_timesteps){
    predicted_states[i] = to_vector(normal_rng((true_states[i,:no_observed]), sigma));
  }
}

ahartikainen · March 23, 2021, 8:18am

Ok, so basically CmdStanPy generated quantities does not run any other block than generated quantities?

(cc @mitzimorris is this true?)

Funko_Unko · March 23, 2021, 8:21am

I actually don’t know and never tried. The documentation makes it appear so: 12 Standalone Generate Quantities | CmdStan User’s Guide

This method requires sub-argument fitted_params which takes as its value an existing Stan CSV file that contains a sample from an equivalent model, i.e., a model with the same parameters, transformed parameters, and model blocks, conditioned on the same data.

rok_cesnovar · March 23, 2021, 8:29am

Yes, it standalone generated quantities so only GQ.

So, essentially what you want is the fixed_param mode but where you can input an array of param values over which to run? Right now, the fixed_param mode will only run for 1 set of parameter values.

Funko_Unko · March 23, 2021, 8:33am

Yes, easiest would be via an output csv.

That’s what I figured. (Again, did not try, as the documented interface made it very unlikely to work).

I’m happy with the way it is right now. This makes full use of my cores and is probably much more efficient than looping over the draws via python or always doing the integration.

Funko_Unko · March 23, 2021, 10:04am

Huh. New problem:

Do generate_quantities for warmup samples. CmdStanPy “solution”:



    def hack_csv(self, **kwargs):
        for csv in self.runset.csv_files:
            with open(csv, 'r') as fd:
                content = fd.read()
            for key, value in kwargs.items():
                content = re.sub(f'{key} = .+', f'{key} = {value}', content)
            with open(csv, 'w') as fd:
                fd.write(content)

Before running generate quantities, call the above method changing the reported number of warmup samples…

mitzimorris · March 23, 2021, 3:45pm

trying to understand the question: you’ve written a new model which has no parameters, i.e., the parameters are treated as data, and you want to run a bunch of chains where for each chain, you choose a different set of parameter values and those values are taken from the draws from some Stan csv file.

it that the goal?

Funko_Unko · March 23, 2021, 5:01pm

I have an ODE problem with no_timesteps x no_observations data points at no_timesteps timepoints. The integration happens in the transformed parameters block.

I want to sample (initial points and parameters) from the prior, for this I set no_timesteps=0, leading to no numerical integration being performed.

For these samples, I want to compute the states at the specified time points, i.e. changing no_timesteps and the timepoints to their true values.

For a single sample, I can just use fixed_param=True. For multiple samples, I could loop over all samples and do one fixed_param=True run for each. Without modifying the stan file, it does not appear to be possible to use generate quantities, as the integration happens in the transformed parameters block.

My solution is to “automatically” generate a new stan file where the transformed parameters and the generated quantities blocks are merged as the new generated quantities block, enabling me to only pass the parameters and obtain the evolved states.

Is this clear?

But really, it would be much more convenient and would simplify the ODE modeling workflow for everyone, if everything was possible from a single stan file.

Funko_Unko · April 13, 2021, 7:26am

@martinmodrak

Yes, I do not know the implementation, but I assume GQ works without autodiff for efficiency reasons.

The simplest/best solution (in my opinion) would be to be able to pass a csv file of parameters to the fixed_param run. Currently I do not have the time, but I might fiddle with the C++ at some later time…

Edit: Backlink to my reason https://github.com/stan-dev/design-docs/pull/37#issuecomment-818523886

mitzimorris · April 13, 2021, 1:01pm

correct, no auto-diff in GQ. that’s a feature.
transformed params and model blocks do autodiff.
GQ is run once per iteration, using current values of all model variables.

Topic		Replies	Views
Problem handing over data to fixed_param run with cmdstanpy Interfaces	4	452	February 18, 2022
Generate Quantities - Mismatch between model and fitted_parameters CmdStan	3	588	September 28, 2022
"Fixed_param" mode agonisingly slow General	7	1006	August 3, 2019
CmdStanPy and multithreading Modeling	10	1236	June 27, 2024
Running stan models in parallel in cmdstan py Interfaces cmdstanpy	3	492	August 31, 2021

Fixed param, but for whole fit?

Related topics