Stan Model running too slow (inv-logit transformation seems to slow it down)


data {
    int<lower=1> I;
    int<lower=1> L;
    int<lower=0, upper=1> obs[I, L];
    real<lower=0> sigma_mu;                  // standard deviation of the prior for elements of mu
    matrix[L, L] Sigma_prior;
}
parameters {
    vector<lower=-2*sigma_mu, upper=2*sigma_mu>[L] mu;   // elements of vector mu
    cov_matrix[L] Sigma;
    vector[L] raw_sig_vec[I];
}
transformed parameters {
    vector[L] sig_vec[I];
    for (i in 1:I) {
        sig_vec[i] = inv_logit(raw_sig_vec[i]);           // Gaussian sample to probabilities
    }
}
model {
    real item_prob;

    mu ~ normal(0, sigma_mu);
    Sigma ~ inv_wishart(L + 1, Sigma_prior);
    for (i in 1:I) {
        raw_sig_vec[i] ~ multi_normal(mu, Sigma);
    }
    for (i in 1:I) {
        // hand-rolled Bernoulli likelihood for row i:
        // each factor is sig_vec[i,l] if obs[i,l] == 1 and (1 - sig_vec[i,l]) otherwise
        item_prob = 1.0;
        for (l in 1:L) {
            item_prob *= obs[i, l] * sig_vec[i, l] + (1 - obs[i, l]) * (1 - sig_vec[i, l]);
        }
        target += log(item_prob);
    }
}

I am a Stan newbie. I am trying to run this model at roughly I = 10000 and L = 10, but it is too slow: even with I = 1000 and L = 2 it takes 200+ seconds. If I replace inv_logit with softmax, it becomes even slower, which makes me think this transformation is where the time is going. Is there a way I can optimize the model to make it run faster?
Any help is highly appreciated! :)

1 Like

Have you seen the sufficient statistics trick in the manual for binomial outcomes?
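For reference, the trick is to collapse repeated Bernoulli observations that share a single success probability into one binomial statement on the count of successes. A minimal sketch (the names N_trials, y_count and theta are hypothetical, not from the model above):

data {
    int<lower=0> N_trials;                     // number of 0/1 trials sharing one probability
    int<lower=0, upper=N_trials> y_count;      // sufficient statistic: the number of successes
}
parameters {
    real<lower=0, upper=1> theta;
}
model {
    // one binomial statement replaces N_trials separate bernoulli statements
    y_count ~ binomial(N_trials, theta);
}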

1 Like

Wait. You are either doing something advanced that I’ve never seen before, or you’re doing something wrong. Can you explain your model more? I’ve never seen a model where binomial data are multiplied by parameters and the result simply log transformed to increment the log probability.

It is totally possible that I am making a mistake.
What I am trying to capture here is that the different components for any item i are correlated; the correlation is modelled by the multivariate normal distribution, and its draws are then converted into probabilities.
The generative model is as follows (a small simulation sketch in Stan is included after the steps):

  1. Sample the mean vector mu ~ N (0, sigma_mu) and the covariance matrix Sigma ~ inv_wishart(nu, Sigma_prior).

  2. For each item i in [1, I] do the following:
    (i) Sample a raw vector of length L raw_sig_vec[i] ~ MVN (mu, Sigma).
    (ii) Obtain a probability vector of length L sig_vec[i] = inv_logit(raw_sig_vec[i])
    (iii) For each l in [1, L] observe obs[i,l] ~ Bernoulli (sig_vec[i,l])
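
Here is a minimal Stan sketch of that generative process, just for illustration (it assumes fixed mu and Sigma passed in as data, would be run with algorithm="fixed_param", and the name sim_obs is made up):

data {
    int<lower=1> I;
    int<lower=1> L;
    vector[L] mu;               // fixed for the simulation
    cov_matrix[L] Sigma;        // fixed for the simulation
}
generated quantities {
    int sim_obs[I, L];
    for (i in 1:I) {
        vector[L] raw = multi_normal_rng(mu, Sigma);   // step (i)
        vector[L] p = inv_logit(raw);                  // step (ii)
        for (l in 1:L) {
            sim_obs[i, l] = bernoulli_rng(p[l]);       // step (iii)
        }
    }
}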

The idea of the line

    item_prob *= obs[i,l]*sig_vec[i,l] + (1-obs[i,l])*(1-sig_vec[i,l]);

is that, once the inner loop over l has finished, item_prob is the probability of observing obs[i,1], obs[i,2], …, obs[i,L] for a particular i.
Then I add up the log probabilities over all items i.
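Written out, the per-item probability the loop accumulates is

$$
\Pr(\text{obs}[i,1],\dots,\text{obs}[i,L] \mid \text{sig\_vec}[i]) \;=\; \prod_{l=1}^{L} \text{sig\_vec}[i,l]^{\,\text{obs}[i,l]} \bigl(1-\text{sig\_vec}[i,l]\bigr)^{1-\text{obs}[i,l]},
$$

i.e. the product of independent Bernoulli probabilities for row i, and target gets the sum of their logs over i.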

Why not just use the built-in bernoulli() function? In general, you should avoid coding things by hand when the devs have already done the work to provide a verified, performant built-in.

That makes a lot of sense! I changed my implementation to the following:

data {
    int<lower=1> I;
    int<lower=1> L;
    int<lower=0, upper=1> obs[I, L];
    real<lower=0> sigma_mu;                  // standard deviation of the prior for elements of mu
    matrix[L, L] Sigma_prior;
}
parameters {
    vector<lower=-2*sigma_mu, upper=2*sigma_mu>[L] mu;   // elements of vector mu
    cov_matrix[L] Sigma;
    vector[L] raw_sig_vec[I];
}
model {
    mu ~ normal(0, sigma_mu);
    Sigma ~ inv_wishart(L + 1, Sigma_prior);
    for (i in 1:I) {
        raw_sig_vec[i] ~ multi_normal(mu, Sigma);
        obs[i] ~ bernoulli_logit(raw_sig_vec[i]);   // built-in Bernoulli on the logit scale
    }
}

Unfortunately, the runtime of this is also the same as the earlier code. Are there any other tricks I could use to speed this up?

We usually recommend LKJ priors rather than inverse-Wisharts for covariance matrices. That may not be what's causing the problem above, but in general we don't recommend the inverse-Wishart. Also, the hard constraints on mu look weird.

2 Likes

Thanks for the insights @andrewgelman.
Those constraints can actually be done away with. Unfortunately, removing them doesn’t affect the runtime. :(

There are a couple of things you can try. The first is to use a Cholesky factor and an LKJ prior instead of the inverse-Wishart; you can then also use the vectorised multi_normal_cholesky distribution rather than a loop:

data {
    int<lower=1> I;
    int<lower=1> L;
    int<lower=0, upper=1> obs[I, L];
    real<lower=0> sigma_mu;          // standard deviation of the prior for elements of mu
    matrix[L, L] Sigma_prior;        // no longer used once the inverse-Wishart prior is dropped
}
parameters {
    vector[L] mu;                    // elements of vector mu
    cholesky_factor_corr[L] Sigma;   // Cholesky factor of a correlation matrix (unit variances)
    vector[L] raw_sig_vec[I];
}
model {
    mu ~ normal(0, sigma_mu);
    Sigma ~ lkj_corr_cholesky(1);    // you'll need to work out the appropriate prior
    raw_sig_vec ~ multi_normal_cholesky(mu, Sigma);   // vectorised over all I rows
    for (i in 1:I) {
        obs[i] ~ bernoulli_logit(raw_sig_vec[i]);
    }
}
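
One thing to note: cholesky_factor_corr fixes the marginal standard deviations at 1, whereas the original cov_matrix estimated them. If you need those scales back, a common sketch is to add a positive scale vector (here called tau, a made-up name, with a placeholder prior) and combine it with the correlation factor via diag_pre_multiply:

parameters {
    vector[L] mu;
    vector<lower=0>[L] tau;              // per-dimension standard deviations
    cholesky_factor_corr[L] L_Omega;     // Cholesky factor of the correlation matrix
    vector[L] raw_sig_vec[I];
}
model {
    mu ~ normal(0, sigma_mu);
    tau ~ normal(0, 1);                  // half-normal via the <lower=0> constraint; adjust as needed
    L_Omega ~ lkj_corr_cholesky(1);
    // diag_pre_multiply(tau, L_Omega) is the Cholesky factor of the full covariance
    raw_sig_vec ~ multi_normal_cholesky(mu, diag_pre_multiply(tau, L_Omega));
    for (i in 1:I) {
        obs[i] ~ bernoulli_logit(raw_sig_vec[i]);
    }
}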
3 Likes

Remember to use the sufficient stats speed up I mentioned earlier.

@andrjohns Thanks a lot, this sped it up tremendously!

I am trying to figure out what a good way of doing that would be. For a given i and l, I currently have just one observation.
I wonder whether I could apply the trick to the whole vector of length L, but in that case I don't know what the sufficient statistic would be; it surely wouldn't be the mean, as that wouldn't capture the correlation between the different components.
It would be great if you could give me some direction on how to approach this. :)

Oh, sorry, I was misreading the model. I see now that, as you say, you only have one observation per combination of i and l, so the sufficient statistics trick doesn't apply.

2 Likes

I have a follow-up question on this. When I run this using pystan.sampling(), it works well. However, when I try doing pystan.optimizing(), I get an error:

Initial log joint probability = -5435.98
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      99       8196.55    0.00643506   2.28183e+06     0.03834      0.6638      157   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     199       10122.9     0.0126388   5.66937e+08       1.242      0.1242      296   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

     214       11170.1     0.0376207   4.53054e+09       1e-12       0.001      406  LS failed, Hessian reset 
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

Optimization terminated with error: 
  Line search failed to achieve a sufficient decrease, no more progress can be made

stderr:

Any idea how I can work around this?

1 Like