Convergence issues in simple latent variable model

Dirk_B · February 6, 2020, 6:14pm

Hi everyone,

I try to fit a model with latent variables (i.e. stock variables). We observe the outcome and the flow variables.

I can fit the following model with some toy data without any problems:

Outcome_t = \alpha + \beta Stock_t+\epsilon_t with
Stock_t = \lambda Stock_{t-1} + Flow_{t}+\nu_t ,
Stock_{t=0}=Flow_{t=0}+\nu_{t=0} and
0>\lambda>0.99

I generate the data as follows:

library(rstan)

set.seed(2409)
options(max.print = 999999)

T=200
flow<- rnorm(T) # observed flow varible (i.e. ad spendings at time t)
stock <- as.numeric() # latent stock variable (i.e. stock_t = lambda * stock_t-1 + flow_t + e_t with stock_t=0 = flow_t + e_t)
time <- as.numeric()
for (k in 1:T) {
  time[k] <- k
  if(k==1){ ## k=1 actually t=0
    stock[k] = flow[k] + rnorm(1)
  } else {
    stock[k]<- 0.7 * stock[k-1] + flow[k] + rnorm(1) # real lambda 0.7, real sigma_stock = 1
  }
}
outcome <- 1 + 0.5 * stock + rnorm(T, 0, 1) # real intercept 1, real beta 0.5, real sigma_outcome = 1

data <- data.frame(T = 1:T, outcome, flow, stock)

The Stan code is as follows:

code <- "data {
  int <lower=1> T; // number of obs
  vector[T] flow;//flow variable
  vector[T] outcome;//observed outcome
}

parameters {
  
  //coefficients
  real intercept_raw; //intercept
  real beta_raw; //beta
  real <lower=0, upper=0.98> lambda_raw; // carryover , upper=0.98 implies 0.5+0.5*0.98=0.99
  
  
  //latent variable
  vector[T] stock_raw; // stock
  
  //scales
  real<lower = 0> sigma_y_a; // scale for outcome
  real<lower = 0> sigma_y_b; // scale for outcome
  real<lower = 0> sigma_stock_a; // scale for stock
  real<lower = 0> sigma_stock_b; // scale for stock
}

transformed parameters {

  //parameters
  vector[T] stock; // stock
  vector[T] mu_stock; // mu stock
  real intercept; // intercept
  real beta; // intercept
  real <lower=0, upper=0.99> lambda; // carryover
  real sigma_y; // scale for outcome
  real sigma_stock; // scale for stock variable
  
  //non-centered param.
  
  //coefficients
  intercept = 2 * intercept_raw; //intercept~normal(0,2)
  beta = 2 * beta_raw; //beta~normal(0,2)
  lambda = 0.5 + 0.5 * lambda_raw; //lambda~normal(0.5,0.5)
  
  //scales
  
  sigma_y = sigma_y_a .* sqrt(sigma_y_b); //sigma_y~half-Cauchy(0,1)
  sigma_stock = sigma_stock_a .* sqrt(sigma_stock_b); //sigma_stock~half-Cauchy(0,1)
  
  // state equations
  mu_stock[1] =  flow[1];
  stock[1] = mu_stock[1] + sigma_stock * stock_raw[1];
  for(t in 2:T) {
    mu_stock[t] =  lambda * stock[t-1] + flow[t];
    stock[t] = mu_stock[t] + sigma_stock * stock_raw[t];
  }
  
}


model {
  //priors
  
  //non-centered param.
  //state variable
  stock_raw ~ normal(0, 1);
  
  //coefficients
  intercept_raw ~ normal(0, 1);
  beta_raw ~ normal(0, 1);
  lambda_raw ~ normal(0, 1);
  
  //scales
  sigma_y_a ~ normal(0, 1); 
  sigma_y_b ~ inv_gamma(0.5, 0.5);
  sigma_stock_a ~ normal(0, 1);
  sigma_stock_b ~ inv_gamma(0.5, 0.5);
  
  //likelihood
  for (t in 1:T) 
    outcome[t] ~ normal(intercept + beta * stock[t], sigma_y) ;
}
generated quantities {
  vector[T] outcome_rep ;
  for (t in 1:T) 
    outcome_rep[t] = normal_rng(intercept + beta * stock[t], sigma_y) ;
}"

Fit and results:

model <- stan(model_code=code, 
              data = list(T = nrow(data),
                          flow = data$flow,
                          outcome = data$outcome),
              chains = 4,
              cores = 4,
              iter = 2000,
              warmup = 1000,
              seed = 2409,
              control=list(adapt_delta=0.99,
                           stepsize=0.01,
                           max_treedepth=15))

print(model, c("intercept", "beta", "lambda", "sigma_y", "sigma_stock"))

Inference for Stan model: 43f89ba07911b37e33b31fed5178be72.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept   1.15    0.00 0.13 0.88 1.07 1.15 1.24  1.41  2922    1
beta        0.46    0.00 0.08 0.31 0.40 0.46 0.52  0.62  3640    1
lambda      0.72    0.00 0.07 0.57 0.68 0.72 0.77  0.84  1457    1
sigma_y     1.03    0.00 0.08 0.87 0.98 1.03 1.08  1.19   936    1
sigma_stock 0.96    0.01 0.31 0.47 0.75 0.92 1.12  1.67  1003    1

Samples were drawn using NUTS(diag_e) at Thu Feb 06 19:03:37 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The problem is that the paper we are referring to is modeling the stock variable not as Stock_t = \lambda Stock_{t-1} + Flow_{t}+\nu_t but as Stock_t = \lambda Stock_{t-1} + (1-\lambda)Flow_{t}+\nu_t.

So I generate the data as:

...
stock[k]<- 0.7 * stock[k-1] + 0.3 * flow[k] + rnorm(1) # real lambda 0.7, real sigma_stock = 1
...

and adjust my Stan program accordingly:

transformed parameters {
 ...
  
  for(t in 2:T) {
    mu_stock[t] =  lambda * stock[t-1] + (1 - lambda) * flow[t];
    stock[t] = mu_stock[t] + sigma_stock * stock_raw[t];
  }
}

If I do this adjustment, I get divergent transitions and low effective sample size.

1: There were 5 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems
 
3: The largest R-hat is 1.05, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

Ignoring the warnings, the fits still look ok, but it seems like the adjustment is somehow “killing” the sampling especially for beta:

4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean   sd  2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept   1.14    0.00 0.15  0.83 1.05 1.15 1.24  1.43  2409 1.00
beta        0.52    0.04 0.41 -0.23 0.30 0.49 0.72  1.46    96 1.05
lambda      0.76    0.00 0.10  0.55 0.69 0.76 0.83  0.94  1090 1.00
sigma_y     1.05    0.00 0.08  0.88 1.00 1.05 1.11  1.22   991 1.00
sigma_stock 1.11    0.08 1.22  0.17 0.46 0.76 1.31  4.19   246 1.02

Samples were drawn using NUTS(diag_e) at Thu Feb  6 19:05:41 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

I already increased adapt_delta up to 0.99999 but I still get divergent transitions so there has to be something with my code. I already tried to use non-centered parameterization. Is there anything I missed?

Thank you in advance!

Dirk_B · February 7, 2020, 12:25pm

I could solve the problem. The problem was sampling the stock variable as a parameter in the model block. Now I defined the latent stock variable as a transformed parameter and sampled the lambda directly from an uniform distribution. Further I didn’t include an error term for the latent variable which is also in line with some literature. Now everything runs fine.

library(rstan)

set.seed(2409)
options(max.print = 999999)

T=500
flow<- rnorm(T) # observed flow varible (i.e. ad spendings at time t)
stock <- as.numeric() # latent stock variable (i.e. stock_t = lambda * stock_t-1 + (1 - lambda) * flow_t with stock_t=0 = flow_t and 0<lambda<0.99)
time <- as.numeric()
for (k in 1:T) {
  time[k] <- k
  if(k==1){ ## k=1 actually t=0
    stock[k] = flow[k]
  } else {
    stock[k]<- 0.7 * stock[k-1] + 0.3 * flow[k] # real lambda 0.7
  }
}
outcome <- 1 + 1 * stock + rnorm(T, 0, 1) # real intercept 1, real beta 1, real sigma_outcome = 1

data <- data.frame(T = 1:T, outcome, flow, stock)

code <- "data {
  int <lower=1> T; // number of obs
  vector[T] flow;//flow variable
  vector[T] outcome;//observed outcome
}

parameters {
  
  //coefficients
  real intercept_raw; //intercept
  real beta_raw; //slope
  real <lower=0, upper=0.99> lambda; // carryover coefficient
  
  
  //scales
  real<lower = 0> sigma_y_a; // scale for outcome
  real<lower = 0> sigma_y_b; // scale for outcome
}

transformed parameters {

  //parameters
  vector[T] stock; // stock
  vector[T] mu_stock; // mu stock
  real intercept; // intercept
  real beta; // slope
  real sigma_y; // scale for outcome
  
  //non-centered param.
  
  //coefficients
  intercept = 2 * intercept_raw; //intercept~normal(0,2)
  beta = 1 * beta_raw; //beta~normal(0,1)
  
  //scale
  
  sigma_y = sigma_y_a .* sqrt(sigma_y_b); //sigma_y~half-Cauchy(0,1)
  
  // state equation
  mu_stock[1] =  flow[1];
  stock[1] = mu_stock[1];
  for(t in 2:T) {
    mu_stock[t] =  lambda * stock[t-1] + (1 - lambda) * flow[t];
    stock[t] = mu_stock[t];
  }
  
}


model {
  //priors
  
  //non-centered param.
  
  //coefficients
  intercept_raw ~ normal(0, 1);
  beta_raw ~ normal(0, 1);
  
  //scale
  sigma_y_a ~ normal(0, 1); 
  sigma_y_b ~ inv_gamma(0.5, 0.5);
  
  //likelihood
  for (t in 1:T) 
    outcome[t] ~ normal(intercept + beta * stock[t], sigma_y) ;
}
generated quantities {
  vector[T] outcome_rep ;
  for (t in 1:T) 
    outcome_rep[t] = normal_rng(intercept + beta * stock[t], sigma_y) ;
}"

model <- stan(model_code=code, 
              data = list(T = nrow(data),
                          flow = data$flow,
                          outcome = data$outcome),
              chains = 4,
              cores = 4,
              iter = 2000,
              warmup = 1000,
              seed = 2409,
              control=list(adapt_delta=0.99,
                           stepsize=0.1,
                           max_treedepth=15))

print(model, c("intercept", "beta", "lambda", "sigma_y"))

Inference for Stan model: 18ea097e50e38146f33ce4fbd34e2ac0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

          mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept 0.99       0 0.04 0.90 0.96 0.99 1.02  1.08  2186    1
beta      1.10       0 0.16 0.81 1.00 1.10 1.21  1.42  1821    1
lambda    0.69       0 0.05 0.57 0.65 0.69 0.72  0.78  1902    1
sigma_y   1.00       0 0.03 0.94 0.97 1.00 1.02  1.06  3757    1

Samples were drawn using NUTS(diag_e) at Fri Feb 07 13:22:57 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Topic		Replies	Views
Latent variables, divergent transitions and prior selection Modeling rstan , techniques , fitting-issues	22	1422	September 19, 2021
No convergence - mediation with latent variables in rstan Modeling rstan , fitting-issues	2	433	April 11, 2022
Help with QR decomposition with latent variable Modeling	10	754	May 6, 2020
Multiple Chains do not converge Modeling	1	2087	July 4, 2019
Rstan convergence issue - mediation model with latent variables Modeling rstan , fitting-issues , specification	0	438	April 10, 2022

Convergence issues in simple latent variable model

Related topics