Composable transforms in Stan

Hi all -

This was discussed briefly in today’s Stan meeting but I wanted to make a post here with a summary and ask for comments.
I’ve been working with @Bob_Carpenter to allow users to define variables that have a multiplier/offset while also being constrained. This would allow variable declarations like

real<offset=mu, multiplier=sigma, lower=0> rate;

whose values are constrained and unconstrained through the composition of the re-centering (offset/multiplier) transform and the lower-bound transform, with the offset and multiplier implicitly applied first. More generally, one could compose lower and upper bounds together in this way.
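To spell out the lower-bound case, here is a sketch of the composed transform as it might be written as Stan functions (these function names are hypothetical, and I'm assuming "going first" means the affine map is applied first when going from the unconstrained to the constrained scale):

```stan
functions {
  // Sketch (not the actual implementation): composed constraining transform
  // for <offset=mu, multiplier=sigma, lower=lb>, affine transform first.
  real affine_lb_constrain(real x, real mu, real sigma, real lb) {
    return lb + exp(mu + sigma * x);
  }
  // Log absolute Jacobian determinant of that composition.
  real affine_lb_log_jacobian(real x, real mu, real sigma) {
    return log(sigma) + mu + sigma * x;
  }
}
```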

This will hopefully make some hierarchical models easier to define. @Bob_Carpenter can likely provide a more in-depth example of the usage.

Our main question is: can anyone think of another valid composition besides lower/upper with multiplier/offset? I’ve been working on the implementation assuming these are the only two we would like to compose.

One application of offset/multiplier transforms is to try to get the scales of the margins of the unconstrained posterior to be close to standard-normal, which helps with warmup because the default initialization for the diagonal of the mass matrix is all 1’s. As such, if feasible, I think there would be uses for composing offset/multiplier with just about any transform (ordered vectors, simplexes, you name it). Not saying these uses would be worth whatever time/complexity they introduce, just that the use cases exist.

I’ll put in a vote for combining this with ordered vectors; that would be very helpful!

Would you like to combine ordered with lower/upper, with multiplier/offset, or with both?

multiplier/offset specifically, but I suppose that also raises an implementation question: are you building out a general framework for creating these compositions, or is it more of a per-composition overload approach? Depending on implementability, a framework would be good for generalising to as many combinations as possible.

Sorry if this was already covered at the meeting!

There’s already a positive_ordered type, which takes care of lower or upper, but not lower and upper simultaneously. For lower and upper simultaneously, however, we can instead declare a simplex and work with the cumulative sums along the simplex vector. So I don’t think there’s any particular need to explicitly expose a new option to compose lower/upper with ordered.
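For concreteness, that simplex trick might look like this (a sketch; `N`, `lb`, and `ub` are assumed to be declared in the data block, and whatever density you put on `gaps` determines the implied prior on `x`):

```stan
parameters {
  simplex[N + 1] gaps;  // N + 1 entries so the partial sums stay below 1
}
transformed parameters {
  // partial sums of the first N entries are increasing and lie in (0, 1);
  // rescaling them into (lb, ub) gives an ordered vector with both bounds
  vector[N] x = lb + (ub - lb) * cumulative_sum(gaps[1:N]);
}
```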

To get a little bit into the implementation: the initial decision Bob and I made was to write the lower/upper and multiplier/offset composition as another specialized transform. In C++ it will have its own function, which will more or less just call the existing transforms.

This isn’t to say there’s no chance of generalizing, but the overhead gets significantly higher. Currently the transform on a variable is stored as a variant type. We could make this a tuple, or even a list, of those variant types, but that introduces a lot of extra code later to handle the list, especially when probably >95% of these lists would still have only one entry.

A general framework also leads to the question of which compositions would or would not be valid. Defining and checking the semantics would be very complicated for many of the combinations, even if you limit the user to two.

For these reasons I am hesitant to switch to a more generalized version unless there is a very clear need. It’s worth noting that even the change discussed here can already be expressed in the transformed parameters block; this feature is mostly shorthand for that. If an ability is needed only rarely, the transformed parameters solution should suffice, but part of the reason for this post was to ask whether there is some glaring recurrent need we weren’t properly imagining.
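For reference, the transformed-parameters version of the original rate example would look roughly like this (a sketch; `mu` and `sigma` are assumed to be declared elsewhere):

```stan
parameters {
  real rate_raw;  // unconstrained
}
transformed parameters {
  // affine (offset/multiplier) transform first, then the lower=0 (exp) transform
  real<lower=0> rate = exp(mu + sigma * rate_raw);
}
model {
  // Jacobian adjustment that the declaration syntax would handle automatically
  target += log(sigma) + mu + sigma * rate_raw;
}
```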

I’m not sure if this was brought up independently from the math meeting, but we also spoke there about composable constraint transforms. However, the idea was to expand the language with functions that would let one mix and match whichever transforms one wants, in the order given. My proposal is related to yours but can be considered independent; it would just be another way to accomplish the same task.

I’m going to work with @stevebronder to put together a clear design doc. The idea, though, is that these functions would be callable in the transformed parameters or model blocks. The user would tell a functor which constraint functions to apply, in the given order, and whether a Jacobian adjustment should be made.

It might be accomplished with something like

parameters {
  real rate_raw;
}
transformed parameters {
  real rate = constraint_transform(rate_raw, offset(mu), multiplier(sigma), lower(0), target);
}

Though this is much more verbose, it opens up the constraint transforms to be used outside of the parameters block, where they can be mixed and matched as a user wants. For example, suppose a user wants a Cholesky factor of a correlation matrix where the corresponding entries of the correlation matrix are all constrained to be positive (see here for how to accomplish this in Stan today). Under the proposal this would look something like the following (n.b. this is just one way we could accomplish it; if it looks bad to you, please comment on the design doc to make it better):

parameters {
  vector<lower=0>[N * (N - 1) / 2] corr_raw;
}
transformed parameters {
  matrix[N, N] L = constraint_transform(corr_raw, corr(), cholesky_corr_constrain(), target);
}

Though one could also do

parameters {
  vector[N * (N - 1) / 2] corr_raw;
}
transformed parameters {
  matrix[N, N] L = constraint_transform(corr_raw, lower(0), corr(), cholesky_corr_constrain(), target);
}

We can see how the current constraint transform in the parameters block works together with this proposal: using the lower-bound declaration on the corr_raw vector ensures that only positive elements are given to corr(). That transform is tanh(x) with the appropriate Jacobian adjustment. The final cholesky_corr_constrain() does what is described in section 10.12 (Cholesky factors of correlation matrices) of the Stan Reference Manual.
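As a sketch of that middle tanh step for a single scalar entry (an illustration of the math, not the proposed API):

```stan
parameters {
  real<lower=0> z;  // one positive raw value, as in corr_raw above
}
transformed parameters {
  real<lower=0, upper=1> rho = tanh(z);  // z > 0 implies rho in (0, 1)
}
model {
  target += log1m(square(tanh(z)));  // log Jacobian: log(1 - tanh(z)^2)
}
```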

That looks to me reminiscent of the way TensorFlow Probability does it with their ‘Bijectors’ and the Chain bijector… While it’s certainly more verbose, IMHO it is a better way of allowing the most general form of this composition idea.

From a language perspective, trying to do too much implicitly would probably be a poor idea in the long term. I think something like real&lt;lower=0, offset=10, multiplier=2&gt; working is a reasonable expectation a user might have, but a more complicated transform should probably be delegated to a more complex function as you describe.

Probably OT, but in responding to this Q it occurred to me that a nice feature would be the ability to declare values for the offset/multiplier that are themselves either parameters or transformed parameters. I know nothing about how Stan compiles things, but the parameters-as-values case seems like it would simply require the user to define things in the right order in the parameters block. The TPs-as-values case would require more look-ahead, which might be a headache?

Having parameters or functions of parameters as the offset/multiplier is already possible, and is the whole point of the original feature, is it not? But I would also like transformed parameters to be usable as arguments; I ended up repeating myself because it was not possible :(

Oh yeah! Sorry, lapse on my part. So it’s just the TP-as-value case that’s missing currently.

I’d have to check, but this would probably not be feasible, sadly. Having parameters depend on transformed parameters (which naturally depend on parameters) would be very tricky in anything but the easiest cases. I can take a look next week, or someone already more familiar with the transformed parameters computation could weigh in.
