Trying to understand _glm_lp*f functions in Stan

I couldn’t find too much information about the new _glm_lpdf/lpmf functions. I understand that they are faster than to code a regression by hand, but what exactly are they doing? I have two specific questions:

  1. If I’m doing a regression with a lognormal likelihood, would it be more efficient to do a change of variables to use normal_id_glm_lpdf(log(y)| ...) (and adapt the jacobian) or should I use lognormal_lpdf(y | X * beta , sigma)?

  2. For likelihood functions that don’t have the _glm version (such as gamma_lpdf and maybe lognormal_lpdf) is there a way to apply the same “tricks” thath the _glm functions apply but directly in Stan?

Thanks!
Bruno

2 Likes

There is some documentation for normal_id_glm and bernoulli_logit_glm. My guess is that using normal_id_glm_lpdf(log(y)| ...) would be more efficient, with gains growing with the problem size. I saw a speed-up of 4-5 times when I switched to the glm versions. I would suggest implementing both to verify correctness of your approach as well as performance.

Their main advantage lies in using analytically simplified gradients, so that many computations are avoided or shared within those functions. This is possible because these functions effectively code up some specific models (linear regression, logistic regression), so their use is extremely constrained.

Given that, I don’t think there are tricks you can do in Stan to get close to the performance of the glm functions ,as whatever you code directly in Stan will have to use the general version of the functions you call, which can’t make the simplifying assumptions proper of the glm versions.

3 Likes

ok thanks!
And is the code for each glm completely custom? Is it worth it to try to recreate it for other distributions (e.g., gamma) in c++? (Considering that my c++ skills are based on my amazing copy&paste skills)

Yes. We currently have 6 GLM functions. Attaching the relevant C++ files

While the code is readable it does use a lot of advanced C++/Stan Math code, but dont be too afraid to give it a try. I think you should be able to replicate by taking the most similar function and making the appropriate changes.

Whether it is worth it for your case, this probably depends on how large your problem is and the lp*f function you are trying to implement. The work on GLMs was initially done by @Matthijs and then extended by @tadej. Maybe the two of them have a better “rule of thumb” of when it makes sense to implement another GLM.

If you think the GLM function you are trying to add is general enough for a wider audience, please open an issue on the Math repo and we can discuss there, offer help implementing, etc.

2 Likes

If I’m doing a regression with a lognormal likelihood, would it be more efficient to do a change of variables to use normal_id_glm_lpdf(log(y)| ...) (and adapt the jacobian) or should I use lognormal_lpdf(y | X * beta , sigma) ?

I think normal_id_glm_lpdf(log(y)| ...) will be faster. However, benchmarking your model is the only way to know for sure.

Their main advantage lies in using analytically simplified gradients

That is the idea. However, right now glm functions are also better optimized. I am working on other distributions right now. Some may get in the next release (in ~2 weeks), but for most of them will be only ready for the following one.

Maybe the two of them have a better “rule of thumb” of when it makes sense to implement another GLM.

Any model will be faster when implemented in C++. The question is just if the difference is huge, negligible or something in between. So far glm functions were made for combinations of distribution and link function someone thought would be interesting for wider audience.

2 Likes

Both the the lognormal and the gamma likelihood are used for positive-only data with positively-skewed errors, for example, for reaction (or reading) times. In psycholinguistics, the lognormal likelihood is widely used. I’m now exploring the gamma with and without a log link, because it has some nice properties as well, but I’m not sure how widely use it is. I see that at least, R’s function glm allows for a gamma likelihood so I guess it can’t be uncommon. (It seems that it might be common in ecology: https://seananderson.ca/2014/04/08/gamma-glms/)

So if the plan is to incorporate more of these functions, I’ll open the issues in stan math.

2 Likes

It would be cool to have the Gamma GLM (preferably with log-link, although the inverse link is canonical).

Being a bit pedantic, but I would not think of the LogNormal as GLM: Only after the transformation of the dependent variable it can be thought of as a GLM (Normal GLM, but you’d have to additionally apply the change of variables formula in a Bayesian context). People sometimes confuse this and think that a Normal GLM with a log-link function is a LogNormal, which is (obviously) not the case.

1 Like

Wait, why is a regression with a lognormal likelihood not a glm? (But yeah, I agree that there is an unfortunate confusion with the link)

From wikipedia:

In a generalized linear model (GLM), each outcome Y of the dependent variables is assumed to be generated from a particular distribution in an exponential family

and the lognormal is in the exponential family, again according to wikipedia (https://en.wikipedia.org/wiki/Exponential_family)

Yes, after the change of variables. Then it’s identical to the Normal GLM. But the cahnge of variables makes it a bit awkward.

The problem is that the linear predictor (or \mu parameter) of the LogNormal is the expectation of, lets say, \log(y). It is not the expectation of y (and \exp(\mu) isn’t either). That can make things weird if your regression parameters have meaning and you want to model expectations. There has been a discussion of this in the trade economics literature which let the whole field to use the Poisson Pseudo Maximum Likelihood (PPML) as the workhorse estimator for gravity models in trade. (In my opinion they should have gone for Gamma GLM, but Gamma GLMs can be numerically unstable.) So although the Gamma and LogNormal distributions can look and behave quite similar, the regression coefficients will generally not be similar, which can lead to a lot of confusion.

Edit: To be clear: I think of GLM as a framework to structure a class or classes of models, which can be extremely helpful. And while you can fit in the LogNormal this framework, I just don’t think it is particularly helpful in this particular case, because the change of the dependent variable is not explicit in this framework (unlike the link function and variance function stuff).