Hello Stan community,
I am looking for ways to speed up fitting my Bayesian Poisson tensor decomposition model with mean-field VI (ADVI). The model closely follows Schein et al. (2015). Here is my model specification:
X is an observed count tensor of size (S, M, B), A is an observed matrix of size (S, B), and \sigma, \mu, and \beta are the factor matrices I'm interested in.
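In equation form, my understanding of the generative model (with A entering as a multiplicative exposure term):

$$
X_{smb} \sim \text{Poisson}\!\left(A_{sb} \sum_{k=1}^{K} \sigma_{sk}\, \mu_{mk}\, \beta_{kb}\right),
\qquad
\sigma_{sk} \sim \text{Gamma}(a,\, b_\sigma),\quad
\mu_{mk} \sim \text{Gamma}(a,\, b_\mu),\quad
\beta_{kb} \sim \text{Gamma}(a,\, b_\beta).
$$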
Here is my Stan code (I’ve tried adapting vectorization methods mentioned in this Stan forum post):
```stan
data {
  int<lower=0> S;
  int<lower=0> M;
  int<lower=0> B;
  int<lower=0> K;
  array[S, M, B] int<lower=0> X;
  array[S] row_vector<lower=0>[B] A;
  real a;
  real b_sigma;
  real b_mu;
  real b_beta;
}
parameters {
  array[S] row_vector<lower=0>[K] sigma;
  array[M] row_vector<lower=0>[K] mu;
  matrix<lower=0>[K, B] beta;
}
model {
  for (s in 1:S) {
    sigma[s] ~ gamma(a, b_sigma);
  }
  for (m in 1:M) {
    mu[m] ~ gamma(a, b_mu);
  }
  to_vector(beta) ~ gamma(a, b_beta);
  for (s in 1:S) {
    for (m in 1:M) {
      row_vector[B] lambda = (sigma[s] .* mu[m]) * beta .* A[s];
      row_vector[B] lambda_eps = lambda + rep_row_vector(0.000001, B);
      X[s, m] ~ poisson(lambda_eps);
    }
  }
}
```
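For reference, here is a NumPy sketch of the rate computation (all dimensions and values below are made up for illustration), showing that the per-(s, m) loop in the model block is equivalent to a single `einsum` over the whole tensor:

```python
import numpy as np

# Hypothetical small dimensions, just for illustration
S, M, B, K = 4, 3, 5, 2
rng = np.random.default_rng(0)

# NumPy's gamma is parameterized by shape and *scale*, so scale = 1 / rate
sigma = rng.gamma(0.1, 1 / 3.0, size=(S, K))  # ~ gamma(a, b_sigma)
mu = rng.gamma(0.1, 1.0, size=(M, K))         # ~ gamma(a, b_mu)
beta = rng.gamma(0.1, 1.0, size=(K, B))       # ~ gamma(a, b_beta)
A = rng.gamma(1.0, 1.0, size=(S, B))          # stand-in for the observed matrix

# Loop version, mirroring the Stan model block
lam_loop = np.empty((S, M, B))
for s in range(S):
    for m in range(M):
        lam_loop[s, m] = (sigma[s] * mu[m]) @ beta * A[s]

# Fully vectorized equivalent: one contraction for the whole (S, M, B) tensor
lam_ein = np.einsum('sk,mk,kb->smb', sigma, mu, beta) * A[:, None, :]

print(np.allclose(lam_loop, lam_ein))
```

The loop and the `einsum` agree to floating-point tolerance, which is a handy check when refactoring the Stan likelihood.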
I’m compiling and running it with CmdStanPy:
```python
import cmdstanpy as cs

stan_data = dict(
    S=S,
    M=M,
    B=B,
    K=K,
    X=X.astype(int),
    A=A,
    a=0.1,
    b_sigma=3,
    b_mu=1,
    b_beta=1,
)

model = cs.CmdStanModel(stan_file='models/3D_V1.stan')
vi = model.variational(
    data=stan_data,
    algorithm='meanfield',
    grad_samples=1,
    iter=100,
    draws=1,
    seed=11,
    require_converged=False,
    show_console=True,
)
```
The problem I’m facing: with my full data tensor of dimensions (536, 96, 680), the model fits very slowly and prints the following message:

```
Chain [1] 1000 transitions using 10 leapfrog steps per transition would take 1.05478e+06 seconds.
Chain [1] Adjust your expectations accordingly!
```
What I have already tested / observed:
- Running with a small tensor of size (10, 96, 15) finishes with no problem.
- Removing the line `row_vector[B] lambda_eps = lambda + rep_row_vector(0.000001, B);` results in the model rejecting the initial values, with the log probability evaluating to log(0), i.e. negative infinity.
- During adaptation, the target can jump to very low values (e.g. -1.56092e+171). This happens right after the gamma sampling statements, before reaching the Poisson likelihood.
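As a small numerical illustration of why the log probability hits negative infinity without the epsilon (simulation values chosen arbitrarily): with shape a = 0.1 the gamma prior concentrates heavily near zero, so the product of three such factors easily underflows to an exact 0 rate, and the Poisson log-pmf of a positive count at rate 0 is -inf:

```python
import numpy as np

rng = np.random.default_rng(11)

# With shape a = 0.1, a noticeable share of gamma draws are numerically ~0
draws = rng.gamma(0.1, 1.0, size=100_000)
frac_tiny = (draws < 1e-12).mean()
print(frac_tiny)  # a few percent of draws underflow toward zero

# Poisson log-pmf for count k at rate lam: k*log(lam) - lam - log(k!)
# At lam = 0 the k*log(lam) term is -inf for any k > 0.
with np.errstate(divide='ignore'):
    logpmf = 3 * np.log(0.0) - 0.0  # log(k!) term omitted; it is finite anyway
print(logpmf)
```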
Any comments would be welcome! I appreciate the effort and patience of those replying to my post.
Andrey