Bayesian Poisson Tensor Factorization optimization

Hello Stan community,

I am looking for ways to computationally speed up my Bayesian Poisson tensor decomposition model, which I fit with mean-field variational inference. The model is heavily based on the Schein et al. (2015) paper. Here is my model specification:

$$
\begin{aligned}
X_{sbm} &\sim \text{Poisson}\bigl(A_{sb}\,\lambda_{sbm}\bigr) \\
\lambda_{sbm} &= \sum_{k=1}^{K} \sigma_{sk}\,\beta_{bk}\,\mu_{mk} \\
\sigma_{sk} &\sim \text{Gamma}\bigl(a,\; a\,b^{(\sigma)}\bigr) \\
\beta_{bk} &\sim \text{Gamma}\bigl(a,\; a\,b^{(\beta)}\bigr) \\
\mu_{mk} &\sim \text{Gamma}\bigl(a,\; a\,b^{(\mu)}\bigr)
\end{aligned}
$$

X is an observed tensor of size $(S, M, B)$ and A is an observed matrix of size $(S, B)$. $\sigma$, $\beta$, and $\mu$ are the factor matrices I’m interested in.

Here is my Stan code (I’ve tried adapting vectorization methods mentioned in this Stan forum post):

data {
    int<lower=0> S;
    int<lower=0> M;
    int<lower=0> B;
    int<lower=0> K;

    array[S, M, B] int<lower=0> X;       // observed count tensor
    array[S] row_vector<lower=0>[B] A;   // observed scaling matrix

    real<lower=0> a;
    real<lower=0> b_sigma;
    real<lower=0> b_mu;
    real<lower=0> b_beta;
}

parameters {
    array[S] row_vector<lower=0>[K] sigma;
    array[M] row_vector<lower=0>[K] mu;
    matrix<lower=0>[K, B] beta;
}

model {
    // Gamma priors on the factor matrices.
    for (s in 1:S) {
        sigma[s] ~ gamma(a, b_sigma);
    }
    for (m in 1:M) {
        mu[m] ~ gamma(a, b_mu);
    }
    to_vector(beta) ~ gamma(a, b_beta);

    // Likelihood: one row of B counts per (s, m) pair.
    for (s in 1:S) {
        for (m in 1:M) {
            row_vector[B] lambda = (sigma[s] .* mu[m]) * beta .* A[s];
            // Small offset keeps the rate strictly positive.
            row_vector[B] lambda_eps = lambda + rep_row_vector(0.000001, B);
            X[s,m] ~ poisson(lambda_eps);
        }
    }
}

I’m compiling and running it using CmdStanPy:

import cmdstanpy as cs

stan_data = dict(
    S=S,
    M=M,
    B=B,
    K=K,
    X=X.astype(int),
    A=A,
    a=0.1,
    b_sigma=3,
    b_mu=1,
    b_beta=1,
)

model = cs.CmdStanModel(stan_file='models/3D_V1.stan')

vi = model.variational(
    data=stan_data,
    algorithm="meanfield",
    grad_samples=1,
    iter=100,
    draws=1,
    seed=11,
    require_converged=False,
    show_console=True,
)

The problem I’m facing is that with my data tensor of dimensions (536, 96, 680), the model fits very slowly and prints the following message:

Chain [1] 1000 transitions using 10 leapfrog steps per transition would take 1.05478e+06 seconds.
Chain [1] Adjust your expectations accordingly!

What I have already tested / observations:

  • Running with a small tensor of size (10, 96, 15) lets the model finish with no problem.
  • Removing the line row_vector[B] lambda_eps = lambda + rep_row_vector(0.000001, B); makes the model reject the initial values, with the log probability evaluating to log(0), i.e. negative infinity.
  • During adaptation, the target sometimes jumps to very large negative numbers (like -1.56092e+171). This happens right after the gamma prior statements, before the Poisson likelihood is reached.

Any comments would be welcome! I appreciate the effort and patience of those replying to my post.

Andrey

I can’t give more than guesses on the optimization of the code itself. I’m hoping that @Bob_Carpenter shows up to educate us both.

But I do use CmdStanPy very often with a model with gamma and Poisson distributions. I have found that initialization issues can be solved either by passing explicit inits or by using Pathfinder to find reasonable initial values. There’s an example of the latter approach in the CmdStanPy docs: Variational Inference using Pathfinder — CmdStanPy 1.2.5 documentation
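
For what it’s worth, here is a minimal, untested sketch of the Pathfinder-for-inits approach with the model from this thread (it assumes CmdStanPy >= 1.2, where pathfinder() and create_inits() are available):

import cmdstanpy as cs

model = cs.CmdStanModel(stan_file='models/3D_V1.stan')

# Run Pathfinder to get a cheap approximation of the posterior.
pf = model.pathfinder(data=stan_data, seed=11)

# Draw one set of initial parameter values from that approximation.
inits = pf.create_inits(chains=1)

# Start meanfield ADVI from those inits instead of random ones.
vi = model.variational(
    data=stan_data,
    algorithm="meanfield",
    inits=inits,
    seed=11,
)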


I believe Stan will always show that message: you can set up any valid model and, if it is correctly implemented, it will run; it may just be that each iteration involves a lot of computation, so the fit takes a very long time, hence “adjust your expectations accordingly”.

How does the number of operations scale with that increase in size? Does the run time scale proportionally, or is it blowing up faster? There is such a thing as too heavy a per-iteration calculation, which makes the whole fit infeasible.
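
For a rough sense of scale, using the dimensions from the post: the likelihood loop computes one rate per cell of the tensor, so moving from the small test tensor to the full one multiplies the per-iteration work by roughly

$$
\frac{536 \times 96 \times 680}{10 \times 96 \times 15} = \frac{34{,}990{,}080}{14{,}400} \approx 2430,
$$

and each cell’s rate also costs a length-K dot product. A per-iteration cost that is harmless at (10, 96, 15) can easily become infeasible at (536, 96, 680).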

The Poisson distribution only accepts rates greater than or equal to zero, and a rate of exactly zero is tricky in practice: the likelihood of any count other than zero would be zero, so the log probability would be minus infinity. So even if you are not expecting rates below zero, it is possible they are fluctuating to something vanishingly small but still negative. If that is happening, my suggestion would be to clamp those values (to zero, or to a tiny positive floor) before passing them to the distribution. What you are doing now also works, as long as no true rate is smaller than 0.000001 and adding that constant doesn’t distort the model, i.e. it is in practice equivalent to zero.
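
As a sketch of that clamping idea in the likelihood loop of the original model (untested; fmax applies elementwise in recent Stan versions, and the 1e-6 floor is just as arbitrary as the constant already in the post):

row_vector[B] lambda = (sigma[s] .* mu[m]) * beta .* A[s];
// fmax floors each rate at a small positive value instead of adding an offset.
X[s,m] ~ poisson(fmax(lambda, 1e-6));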

I’m not sure what you mean by this: are you debugging by printing the probabilities at each step? It may well be that some proposal has very low probability and ends up rejected (or is accepted but simply has a low value), but is it a large discrepancy from the other proposals? If you have several thousand observations, it is entirely possible that adding up all of their small log probabilities gives you a large negative number. The question is whether that is unexpected for some reason.

Without going into your model in detail, maybe some of those loops can be parallelized or eliminated altogether.

I haven’t done this in a while, but I think for a vector the distribution parameters either match its size or are scalars, in which case they apply to all of its elements. So the prior loops could probably be collapsed to just sigma ~ gamma(a, b_sigma), for instance. I’m not sure that alone will save computation on the order you need, so maybe start by looking at which loops are really necessary, and then at what can be replaced with more efficient vector/matrix multiplication as well as parallelization functions.
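
To make that concrete, here is an untested sketch of what collapsed priors and a matrix-algebra likelihood could look like, assuming sigma and mu are re-declared as matrix[S, K] and matrix[M, K] instead of arrays of row vectors (everything else as in the original post):

model {
    // Priors: one vectorized statement per factor matrix, no loops.
    to_vector(sigma) ~ gamma(a, b_sigma);
    to_vector(mu) ~ gamma(a, b_mu);
    to_vector(beta) ~ gamma(a, b_beta);

    // Likelihood: build the whole M x B rate matrix for slice s at once;
    // lambda[m, b] = sum_k sigma[s, k] * mu[m, k] * beta[k, b], scaled by A[s, b].
    for (s in 1:S) {
        matrix[M, B] lambda = (mu * diag_pre_multiply(sigma[s], beta))
                              .* rep_matrix(A[s], M);
        // to_vector(lambda') flattens row-major, matching to_array_1d(X[s]).
        to_array_1d(X[s]) ~ poisson(to_vector(lambda') + 1e-6);
    }
}

This replaces the S × M sampling statements with S of them, which shrinks the autodiff graph; that is usually where the per-iteration time goes.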
