I am relatively new to Stan, which I’m getting to like more and more. It’s the best code I’ve seen around, and the manual is just wonderful.

I’m trying to implement the “Cruising the simplex” approach by Michael Betancourt, but I’m having trouble coding it.

Could anybody share a working example written in Stan? I have also looked into the alternatives offered in the manual, but I’m not yet experienced enough to implement those either. Any example using those would also be most helpful.

I’m using a simplex to describe the weights to be applied to a set of predictions to fit some data. The standard approach works fine on simple, small datasets, but I’m trying to speed up the process, as eventually I will explore ~6000 weights.

I’ve also looked into the QR parametrisation, but since I need to impose a positivity constraint, it does not seem to be the optimal solution.
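For context, here is a minimal sketch of the kind of weighted-template model described above — not the “Cruising the simplex” construction itself, just the straightforward simplex version. The variable names npix, nssp, ssp, and spec_obs follow the thread; the priors are placeholders to be adapted:

```stan
data {
  int<lower=1> npix;                 // number of pixels
  int<lower=1> nssp;                 // number of template spectra
  matrix[npix, nssp] ssp;            // templates, each column normalised to mean 1
  vector[npix] spec_obs;             // observed spectrum, normalised to mean 1
}
parameters {
  simplex[nssp] w;                   // non-negative weights summing to 1
  real<lower=0> sigma;               // observational noise scale
}
model {
  sigma ~ normal(0, 1);              // placeholder prior; adjust to your noise scale
  spec_obs ~ normal(ssp * w, sigma); // Gaussian observation model
}
```

Computing the predicted spectrum as a single matrix–vector product, rather than in a loop over pixels, is usually the first efficiency win in models of this shape.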

I’d appreciate any suggestion or example you can provide.

spec_obs is normalised to have a mean of 1, and each ssp[:,i] also has a mean of 1.

For my tests I’m using npix ~ 500 and nssp ~ 300, but in real applications they will eventually be npix ~ 4000 and nssp ~ 2000, which with the current implementation is terribly slow (more than an hour in some cases for 500 iterations).

Is there a way to make it more efficient? Any tips/tricks/suggestions are more than welcome.

I recommend stepping back and more carefully analyzing the output of the fit. Most of the time computational problems hint at modeling problems that aren’t being addressed, especially as you consider larger data sets that make your inferences more sensitive to the limitations of your model.

For example, here I presume that spec_obs should be positive; if the data aren’t sufficiently far above zero, then ignoring this constraint could be a problem. The same goes for presuming a Gaussian observation model.

At the same time models will take the time they need to fit. An hour for a sophisticated model with hundreds of parameters is not unreasonable, although the model you present isn’t particularly sophisticated yet. There are various optimizations that can be done – avoiding the temporaries in the model block being the most immediate – but my guess is that you’re seeing the consequences of a pathological posterior.

Keep in mind that the simplex constraint is a weird one with many components. You’re trying to distribute the same unit probability across more and more components, which means each receives correspondingly less. At the same time, the increasing number of components can make your model more vulnerable to multimodality, or to poor identifiability in general, as there will be multiple components that can yield the same observational outcomes. You’ll want to carefully consider all of the HMC diagnostics to learn more.
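To make the dilution concrete: under a symmetric Dirichlet(1, …, 1) prior on a K-simplex,

```latex
\mathbb{E}[w_k] = \frac{1}{K}, \qquad
\operatorname{Var}[w_k] = \frac{K-1}{K^2 (K+1)},
```

so each component’s typical magnitude shrinks like 1/K as components are added, and with K in the thousands individual weights are squeezed very close to zero a priori.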

Without knowing the details of the real data and the measurement process being modeled, all we can do is speak in these generalities.

I think I know what you’re trying to do. I’ve been doing it myself as an odd hobby for some time now with some apparent success. I’ve been keeping a diary online for several months now and just in the past week or so managed to get code organized enough to start a github repository: https://github.com/mlpeck/spmutils.

You are correct that I had not appreciated all the complexities the simplex would bring into the problem, and that I may need to rethink the model I’m proposing a little.

It seems that I am not the only one trying this, as Michael Peck’s response is exactly what I’m trying to do.

I’m very glad someone took the chance and did this in Stan. I had been looking for this kind of information for a while with no success.

I will give your blog a very careful read to understand the path you went through to get to the right answer. I bet I’ve been struggling with pretty much the same things…

Just a note – I don’t think that the simplex is a great way to encode sparsity in a model, because it’s not easy to set up around a relevance threshold (i.e. the threshold below which effects are ignorable and above which they are relevant). The horseshoe, and variants thereof, are designed with this threshold in mind and are hence more natural for incorporating sparsity into a model (the additional hyperparameters also make it clear that sparsity is a subtle concept, and formalizing it is often more complex than taken for granted).
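For reference, a minimal non-centred sketch of the horseshoe prior in Stan. The coefficient vector beta and the length nssp are hypothetical stand-ins for whatever effects you want to sparsify, and the half-Cauchy scales are placeholders (the regularized horseshoe adds further hyperparameters on top of this):

```stan
parameters {
  vector[nssp] z;                  // unscaled coefficients
  vector<lower=0>[nssp] lambda;    // local shrinkage scales
  real<lower=0> tau;               // global shrinkage scale
}
transformed parameters {
  vector[nssp] beta = z .* lambda * tau;  // horseshoe-distributed coefficients
}
model {
  z ~ std_normal();
  lambda ~ cauchy(0, 1);           // half-Cauchy via the lower=0 constraint
  tau ~ cauchy(0, 1);              // placeholder scale; often set from prior beliefs
}
```

The non-centred parameterization (sampling z, lambda, and tau separately and multiplying) tends to sample better than placing the heavy-tailed prior on beta directly.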

In particular, the simplex is all about encoding conservation of a given quantity across numerous components. Does it really make sense to distribute the same amount of a conserved quantity across a growing number of components (i.e. the same amount of stuff but ever more buckets to dump it into)? Or, as you add more components, do you have more stuff to distribute?

Remember that in the limit p -> 0 and N -> infinity the conservation constraint essentially evaporates, which is why a multinomial model can be readily approximated by a product of Poisson models.
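The standard factorization behind this approximation: if the total count is itself Poisson-distributed, the multinomial components decouple exactly,

```latex
N \sim \mathrm{Poisson}(\lambda), \quad
(n_1, \dots, n_K) \mid N \sim \mathrm{Multinomial}(N, p)
\;\Longrightarrow\;
n_k \sim \mathrm{Poisson}(\lambda p_k) \ \text{independently},
```

so for large N the hard sum-to-N constraint contributes almost nothing beyond what the independent Poisson terms already capture.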

BTW, I think I figured out the reason for the poor performance of the code. The problem was not in the model definition but in the normalization of the input data and models.

Thanks a lot for your time. I appreciate your insights on the potential advantages and drawbacks of the simplex.