Fixing parameters in a model

My model contains an effect size parameter, mu, which is a real. In general, I want to marginalise over mu.

But in special cases of my analysis, I want to check particular values of mu explicitly by fixing it to e.g. mu = 0.

  1. I know I could keep two model files, one in which mu is a parameter declared in the parameters block and one in which it is a constant declared in the data block, and use the latter to investigate fixed values of mu.

I don’t like that duplication, though, because I have to make sure the two model files stay otherwise identical, and I have to compile two models (or keep commenting/uncommenting lines and recompiling for each case).

Is there any preferred pattern for dealing with this situation? E.g.,

  1. I wonder if I could introduce a Boolean variable in my data block that switches on/off whether I want to fix the parameter mu, and add some logic in the data and parameters blocks that uses that Boolean to decide whether to declare mu as a parameter or as data.

I’m not sure about this, as the parameters and data blocks don’t allow much other than declarations. Can we do something fancy by using size-0 arrays?

  2. I wonder if just setting the lower and upper bounds of mu when declaring it as a parameter would be advisable, e.g., to set mu = 0:

    real<lower=0, upper=0> mu; 
    

    or maybe upper = 0 + epsilon. These bounds could be stored in the data file so that they could be chosen at runtime (see the sketch after this list).

  3. Something else?
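
To illustrate option 2 concretely, here is a minimal sketch of what I mean by bounds chosen at runtime (the data names mu_lb and mu_ub are just placeholders):

data {
  real mu_lb;                 // e.g. 0 to pin mu, or something wide to leave it free
  real<lower=mu_lb> mu_ub;    // e.g. 0 (or 0 + epsilon) to pin mu at 0
}
parameters {
  real<lower=mu_lb, upper=mu_ub> mu;
}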

If anyone has any thoughts please let me know :)

I would be interested in the answer to your first question too. My workflow keeps getting overly complicated and any best-practice advice would be welcome!

You could specify an additional integer input fix_mu in your data block that can be either 0 or 1. Then use fix_mu in your parameters block to define the length of your parameter vector mu, and use it in the model block in an if statement to choose between the parameter mu and a fixed value of mu that you supply via the data block.

You can join @andrewgelman in urging us to build in something to clamp parameters. There is not an elegant way to do it now.

Yes, using sizes is what rstanarm does. You can also use includes to keep most of the model and only change part of it. To do it with sizing, you can do this, which is what @LucC was suggesting:

data {
  int<lower=0, upper=1> mu_is_data;
  array[mu_is_data] real mu_val;
  ...
parameters {
  array[1 - mu_is_data] real mu_param;
  ...
transformed parameters {
  real mu = mu_is_data ? mu_val[1] : mu_param[1];
  ...

Then you specify mu_is_data = 1 in the data and give mu_val as a one-element array in the input to fix mu. If you set mu_is_data = 0, then mu_param becomes the one-element array instead and mu is sampled. You can also put the definition of mu down in the model block if you don’t want to output it every iteration.
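
To make the pattern concrete end to end, here is a minimal self-contained sketch with the definition of mu moved into the model block; the data y, its size N, the normal likelihood, and the priors are made up purely for illustration:

data {
  int<lower=0> N;
  vector[N] y;
  int<lower=0, upper=1> mu_is_data;     // 1 = pin mu to mu_val[1], 0 = sample it
  array[mu_is_data] real mu_val;        // length 1 when pinned, length 0 otherwise
}
parameters {
  array[1 - mu_is_data] real mu_param;  // length 0 when pinned, length 1 otherwise
  real<lower=0> sigma;
}
model {
  // mu is a local variable here, so it is not written to the output.
  real mu = mu_is_data ? mu_val[1] : mu_param[1];
  mu ~ normal(0, 5);                    // adds only a constant when mu is data
  sigma ~ normal(0, 1);
  y ~ normal(mu, sigma);
}

Passing mu_is_data = 1 with a one-element mu_val (e.g. [0.0]) pins mu, while mu_is_data = 0 with an empty mu_val leaves it free.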

This won’t work because of the way Stan operates on the unconstrained scale. For a constrained real value, it takes an unconstrained value mu_unconstrained and transforms it to mu = 0 + (0 - 0) * inv_logit(mu_unconstrained). This leads to an improper uniform distribution on mu_unconstrained, because its value has no effect on the value of mu and there are no gradients to keep it well behaved.

Just a linguistic comment, which I know Bob will appreciate:

We should say that we are “clamping” or “pinning” parameters, not that we are “fixing” them. I say this first because “fix” sounds like something was broken and needed fixing, and also, more importantly, because I want to avoid confusion with the already-overloaded term “fixed effects.” So, “clamp” or “pin,” please!

Thank you for your comments. The size-0 array solution isn’t as bad as I feared, that’s what I will go with for now.

I will adopt the terminology pin from here on. I was thinking a little bit about how I’d like this to work. I don’t think I want any new syntax. My preference would be a command-line argument in CmdStan (which I guess would appear as a keyword argument somewhere in downstream packages like cmdstanpy) that works like this:

./example_program sample data file=data.json init=init.json pin=pin.json

that is, I just supply a pin.json file via the pin argument. My pin.json contains entries for any parameters I want to pin, e.g., in my case something like:

{
  "mu": 0.0
}

I would be happy to contribute towards fine-tuning how this could work and implementing it in C++.

We’ve thought about this recently and decided it isn’t very difficult given two restrictions:

  1. If you want to pin a container parameter (like a vector), it would be all-or-nothing. So if you have vector[4] foo;, you can’t pin only foo[3]. This is mostly because it is difficult to imagine what pin.json would look like, and it also makes the implementation a bit more difficult.
  2. The second, more fundamental, restriction would be that you can only pin parameters that are unconstrained. This is because of a situation like
    real a;
    real<lower=a> b;
    
    If we allow you to pin b to say 4, what happens if HMC proposes a value of 5 for a?

One question is whether the feature is still as useful once those restrictions are imposed. I think in your case it would be fine, but there may be other cases where they make the feature unhelpful and you still need to do the size-0 dance.

Good points. I would find something like

{
  "foo": [null, null, 0.0, null]
}

a reasonable solution to pinning only foo[3]. The idea being that null values indicate that there is no pin for that element.

I’m also not sure whether I find case 2 that problematic. When a pin is out of bounds, the target density is zero and the calculation need go no further, even if those bounds are themselves parameters. On the other hand, that introduces a discontinuity in the target function, which may be problematic.

I can’t remember how these cases are usually handled in Stan, but I think they are handled. E.g., when a Poisson rate parameter goes negative, you just get a warning. Can this case be treated similarly?

In any case, even with the two restrictions, I think this will be very useful.

It would become rejection sampling at that point, but it’s possible. The other problem is that we need the unconstrained value to actually run HMC, which means we’d need to:

  • take the b the user defined and unconstrain it
  • run HMC, getting a new a
  • write back out to the user, constraining b

But if a changed, then b won’t be the pinned value any more, but something different.
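
To put made-up numbers on that: Stan’s lower-bound transform is b = a + exp(b_unconstrained), so pinning b = 4 while a = 3 gives the unconstrained value log(4 - 3) = 0. If HMC then moves a to 3.5, constraining back yields b = 3.5 + exp(0) = 4.5, no longer the pinned 4.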

This problem doesn’t happen if we force the user to write, instead,

parameters {
  real a;
  real b_raw;
}
transformed parameters {
  real b = lower_bound_jacobian(b_raw, a);
}

And they give us a pin on b_raw, because that value actually won’t change (though b would), which is much more consistent overall.

If we can get JSON null values to parse consistently across R, Python, and Julia, that may work as a way to write them.

The bigger problem is how we deal with partially specified parameters in Stan. It’s technically possible when there aren’t constraints, but pinning part of a constrained vector is problematic if it’s a joint constraint (say a simplex, or even worse, a correlation matrix). So if we did this, we would almost certainly make it all-or-nothing per parameter.

You get a warning in the console, but the current iteration will be rejected because the expression can’t be evaluated. MCMC can carry on that way, perhaps devolving into very inefficient rejection sampling, but it will absolutely break optimization.

The better thing for the user to write in this specific case would be

real b;
real<upper=b> a;

This is the simplest possible case of a constraint—this gets much harder with a simplex. Let’s say we want to pin the first and third values to 0.1 and 0.3 respectively in this program.

parameters {
  simplex[K] phi;
  ...

The transformed program to do this is:

parameters {
  simplex[K-2] phi_raw;
  ...
transformed parameters {
  simplex[K] phi;
  phi[1] = 0.1;
  phi[3] = 0.3;
  real total = phi[1] + phi[3];
  phi[2] = (1 - total) * phi_raw[1];
  phi[4:] = (1 - total) * phi_raw[2:];
  ...

The pinned entries account for total = 0.4, and the remaining entries split the leftover 1 - total in the proportions given by phi_raw, which itself sums to one, so phi is still a valid simplex.

Thank you Bob. I am persuaded that the restrictions are sensible. And it’s better to start with restrictions and reconsider them later than to introduce functionality that causes headaches and regrets down the line.

Let me know the first steps and I am very willing to start helping.

The first step is to write a short design document on what the functionality would look like from a user’s perspective and roughly how it’d be coded. This doesn’t need to be fancy, but needs to be precise enough that the devs can sign off on changing Stan before the work begins. This kind of issue is tricky because it needs to be designed from the interfaces down to the low-level sampling.

The second step would be to talk to one of the devs more up on the interfaces and how they connect to C++, e.g., @stevebronder, @andrjohns, @mitzimorris, or @WardBrian, who can help with what needs to happen there.

I was discussing this with @stevebronder, and I think some kind of mask would make sense as an implementation, so that you could bind a reduced function to pass into our samplers the way we bind the whole thing currently (this is in the stan repo).

Thanks Bob. Shall I fork the design-docs repo (stan-dev/design-docs on GitHub) and make a PR adding a design doc for this feature?

I.e., make a design doc similar to the existing ones in that repo?

Yes, that’s exactly how we do this. It doesn’t have to be huge to start, but it will have to address the problematic edge cases of constraints (either by saying “no constraints” or something else).