Parallel Options for Independent Regressions

Hello everyone,

I’m unsure if this is a silly question, but suppose I want to run a series of completely independent linear regressions as part of my model block. Is this paralellizable using something like reduce sum? For example, I have this in my model block:

for(i in 1:n_securities) {
   target += normal_lpdf(returns[,i] | alpha[i] + 
                         risk_factors * beta[,i], sigma[i]);

This is in an asset pricing context, where alpha is a vector with an intercept term for each security, beta is a matrix of parameters with columns corresponding to each security, and sigma is an independent scale term for each security.

I realize that I could just fit each regression independently, but since I want to work with the parameters in a joint fashion after estimation this is much more convenient. Since each estimate is completely independent, I thought it might be an easy use-case to learn how to use the reduce_sum function. My attempts thus far have not worked, since I think I’m misunderstanding the syntax. I had something like

functions {
  real partial_sum_lpdf(matrix returns,
                   int start,
                   int end,
                   matrix risk_factors,
                   vector alpha,
                   matrix beta,
                   vector sigma) {
    return normal_lpdf(returns[,start:end] | alpha[start:end] + risk_factors * beta[,start:end], sigma[start:end];

But this failed all of the checks, since beta and returns were ill-formed. I have never worked with user-defined functions in Stan, and I’m pretty confused about how this all works, so any help would be appreciated!


To update this, for now I’m running each regression in parallel as a separate stan fit, and then taking the draws for alpha as “data” in the subsequent analysis (either using draws as data or summarizing into mean and variance in a measurement error model).

I would still love to learn how to write this jointly though, as I think it would make subsequent analysis of the joint posterior distribution much easier. I can write it as a joint regression, but it is very slow. So any help would be much appreciated!

1 Like

this is a bit out of my depth, but I will give it a try since nobody else answered. Some ways to speedup the model:

Hope that helps at least a bit

1 Like

Thanks Martin, I think it does help. For now I think I’m just going to run them separately, and treat their posterior distribution as the “data” for the next segment. In my case, each regression by itself is pretty fast, and it looks like most of the parallelization options are for working with a single likelihood. If I end up putting the first step in a hierarchical model, though, I think I might go back to the parallelization toolkit. I think my issue is mostly that, if I use the posterior distribution as data, the posterior distribution for the model predicting the parameter distribution of the original model is really huge (and very slow to fit).

If it helps at all, my goal is to look at the joint behavior of risk and return models. The betas represent a linear factor structure of contemporaneous returns, and the alpha, in this context, represents “return” that could in principle be systematically captured (obviously this is hard to do in real life, so you probably want a really conservative prior on the extent to which alpha is predictable). In practical terms, each beta in the “risk” portion of the model has a corresponding portfolio position that could be taken to hedge that risk. What a person would want to ideally do is develop a model that predicts the alpha for each security, which is why I wanted to work with the posterior distribution of alpha in a separate component of the model.

One of the things I’m interested in exploring is how to optimize over the whole joint problem. Different asset classes have different costs of doing business, short positions are more expensive than long, and so determining the correlation of potential predictors of the intercept term, alpha, with the associated portfolio positions (betas) could matter in terms of the implementation cost of a strategy. The idiosyncratic volatility term, sigma is also often modeled separately and could matter for implementation. For example, if the cases where I get a lot of alpha come with a lot of costs (or idiosyncratic vol), it might not be worth implementing because it is too costly or risky.

A thought I had was to treat each draw as data and use the optimization routine to get a penalized maximum likelihood estimate for each draw, and then combine them into a sort of posterior distribution. Not sure what the math would actually mean at that point (maybe some of the projection results would useful here), but it would prevent the sheer amount of data from exploding (since fitting MCMC draw by draw would be pretty slow). I could also do a kind of measurement error model, where I just pass the mean and sd of the posterior distribution for alpha into the prediction step, but this would lose the correlation structure, which could be important to the cost function that ultimately leads to the optimal portfolio for the forecaster.

Anyway, sorry for the rambling, and thanks for the help!

1 Like

A relevant term for your searching is “model cutting”