Latent Model, High Dimension, Rstan and ADVI


#1

Hello everyone!

I develope my model based on this link https://rfarouni.github.io/assets/projects/BayesianFactorAnalysis/BayesianFactorAnalysis.html and I would like to use it for the high dimension. Unfortunately, there is a problem with my program, it doesn’t work for the high dimension.

So this is my model :

data {
  int<lower=1> N; // number of data points in entire dataset
  int<lower=1> P; // number of genes
  int<lower=1> K; // number of mixture components
  matrix[P,N] y; //vector[N] y[P]; // observations
  int<lower=1> a0; // 
  int<lower=1> b0; // 
  int<lower=1> c0; // 
  int<lower=1> d0; // 
}

transformed data {
  vector[P] mu_k;
  mu_k = rep_vector(0.0,P);
}

parameters {
  matrix[K, N] W;  
  matrix[P, K] phi; 
  real<lower=0> tau_W;
  real<lower=0> tau_N;
	cholesky_factor_corr[P] Lcorr;
	vector<lower=0>[P] sigma_rdmInt;
}

transformed parameters {
  real<lower=0> s_W ;
  real<lower=0> s_N ;
  s_W = inv(sqrt(tau_W));
  s_N = inv(sqrt(tau_N));
}

model {
  real dens[N] ;

  // hyper priors
  tau_W ~ gamma(a0,b0);
  tau_N ~ gamma(c0,d0);
  sigma_rdmInt ~ cauchy(0, 5);
  Lcorr ~ lkj_corr_cholesky(2);

// priors  
  to_vector(W) ~ normal(0.0, s_W);

  for (k in 1:K) {
  phi[,k] ~ multi_normal_cholesky(mu_k, diag_pre_multiply(sigma_rdmInt, Lcorr));
  }

  // likelihood ;
  for (n in 1:N) { 
  dens[n]= normal_lpdf(y[,n]| phi*W[,n], s_N); 
  }
  target+= log_sum_exp(dens); 
}

generated quantities {
	matrix[P,P] Omega;
	matrix[P,P] Sigma;
	Omega = multiply_lower_tri_self_transpose(Lcorr);
	Sigma = quad_form_diag(Omega, sigma_rdmInt); 
}

This is my simulated data :


simu <- function(N=250, P=25, factors_sim=50,zeta_sim=4,eta_sim=0.1, rho_sim=rep(2,P),seed=012019){
  set.seed(seed)
  mean_Phi <- rep(0, P)
  Wth <- t(matrix(rnorm(N*factors_sim,0,zeta_sim),nrow=N,ncol=factors_sim))
  Eth <- t(matrix(rnorm(N*P,0,eta_sim),nrow=N,ncol=P))
  Phith <-t(rmvnorm(n=factors_sim, mean=mean_Phi, sigma = diag(rho_sim)))
  Xth <- Phith%*%Wth+Eth
  return(list(Xth,Wth,Phith))
}

P=30
K=30
sim <- simu(N=400, P=P,factors_sim=K) 
X <- sim[[1]]
W <- sim[[2]]
phi <- sim[[3]]

And this is my inference :

info <- list(N=dim(X)[2],
              P=dim(X)[1],
              K=K,
              y=X,
              a0=1.0,
              b0=1.0,
              c0=1.0,
              d0=1.0) 

# init, parameters
init <- list(W=W, phi=phi,
              tau_W=0.5,
              tau_N=10,
              Lcorr=t(chol(cor(t(phi)))),
              sigma_rdmInt=rep(1,info2$P))

latent_advi <- vb(latent_model, data=info, init=init, iter=10000)

When I compile it, I have this issue :

Error in sampler$call_sampler(c(args, dotlist)) : 
  stan::variational::advi::adapt_eta: Cannot compute ELBO using the initial variational distribution. Your model may be either severely ill-conditioned or misspecified.

Could you help me ?

Thank you for your time,


#2

It’s not a compilation error.

It’s an initialisation problem. You may solve this by chaning the init. It’s also possible that the model is ill conditioned in which case stronger priors may help (or not)


#3

The information in the init is the simulation data… How could this be the problem?
If that is not the problem, which priors can I use?


#4

I did not check what the values are, but if you think they are good then you are in trouble. init values set the mean of variational approximation, but there is currently now way to set the scale of initial variational distribution. The ELBO mentioned in the error message is computed by sampling from the variational distribution and if target fails to compute for any of those draws you could get this error. You can try to evaluate target (log density) just with init values without trying to evaluate ELBO e.g. running HMC or optimization for 1 iteration.


#5

You can set the init values for VB? I didn’t know that was implemented.

Re @Rincourt, my experience with latent variable models with Stan/VB is that init values speed convergence but don’t otherwise improve estimation/identification. It’d be best to run this without inits and then add inits later once you know how the model works in your application.

Also, as a general rule, when VB fails you should run it with the full sampler. The full sampler is much more robust and will usually show clearly where the model is miss-specified through divergent transitions/lack of chain convergence.


#6

Thank you for your reply.

@avehtari , how can I evaluate the target without computing the ELBO? Do I find it in the stan object?

I try to test with the HMC (with and without the init) but the R session crash.

@saudiwin , yes we can initialize with the VB algorithm.
What did you mean, when you say “you should run it with the full sampler”? Did you talk to the MCMC?