# Convergence issues in simple latent variable model

Hi everyone,

I try to fit a model with latent variables (i.e. stock variables). We observe the outcome and the flow variables.

I can fit the following model with some toy data without any problems:

Outcome_t = \alpha + \beta Stock_t+\epsilon_t with
Stock_t = \lambda Stock_{t-1} + Flow_{t}+\nu_t ,
Stock_{t=0}=Flow_{t=0}+\nu_{t=0} and
0>\lambda>0.99

I generate the data as follows:

library(rstan)

set.seed(2409)
options(max.print = 999999)

T=200
flow<- rnorm(T) # observed flow varible (i.e. ad spendings at time t)
stock <- as.numeric() # latent stock variable (i.e. stock_t = lambda * stock_t-1 + flow_t + e_t with stock_t=0 = flow_t + e_t)
time <- as.numeric()
for (k in 1:T) {
time[k] <- k
if(k==1){ ## k=1 actually t=0
stock[k] = flow[k] + rnorm(1)
} else {
stock[k]<- 0.7 * stock[k-1] + flow[k] + rnorm(1) # real lambda 0.7, real sigma_stock = 1
}
}
outcome <- 1 + 0.5 * stock + rnorm(T, 0, 1) # real intercept 1, real beta 0.5, real sigma_outcome = 1

data <- data.frame(T = 1:T, outcome, flow, stock)


The Stan code is as follows:

code <- "data {
int <lower=1> T; // number of obs
vector[T] flow;//flow variable
vector[T] outcome;//observed outcome
}

parameters {

//coefficients
real intercept_raw; //intercept
real beta_raw; //beta
real <lower=0, upper=0.98> lambda_raw; // carryover , upper=0.98 implies 0.5+0.5*0.98=0.99

//latent variable
vector[T] stock_raw; // stock

//scales
real<lower = 0> sigma_y_a; // scale for outcome
real<lower = 0> sigma_y_b; // scale for outcome
real<lower = 0> sigma_stock_a; // scale for stock
real<lower = 0> sigma_stock_b; // scale for stock
}

transformed parameters {

//parameters
vector[T] stock; // stock
vector[T] mu_stock; // mu stock
real intercept; // intercept
real beta; // intercept
real <lower=0, upper=0.99> lambda; // carryover
real sigma_y; // scale for outcome
real sigma_stock; // scale for stock variable

//non-centered param.

//coefficients
intercept = 2 * intercept_raw; //intercept~normal(0,2)
beta = 2 * beta_raw; //beta~normal(0,2)
lambda = 0.5 + 0.5 * lambda_raw; //lambda~normal(0.5,0.5)

//scales

sigma_y = sigma_y_a .* sqrt(sigma_y_b); //sigma_y~half-Cauchy(0,1)
sigma_stock = sigma_stock_a .* sqrt(sigma_stock_b); //sigma_stock~half-Cauchy(0,1)

// state equations
mu_stock[1] =  flow[1];
stock[1] = mu_stock[1] + sigma_stock * stock_raw[1];
for(t in 2:T) {
mu_stock[t] =  lambda * stock[t-1] + flow[t];
stock[t] = mu_stock[t] + sigma_stock * stock_raw[t];
}

}

model {
//priors

//non-centered param.
//state variable
stock_raw ~ normal(0, 1);

//coefficients
intercept_raw ~ normal(0, 1);
beta_raw ~ normal(0, 1);
lambda_raw ~ normal(0, 1);

//scales
sigma_y_a ~ normal(0, 1);
sigma_y_b ~ inv_gamma(0.5, 0.5);
sigma_stock_a ~ normal(0, 1);
sigma_stock_b ~ inv_gamma(0.5, 0.5);

//likelihood
for (t in 1:T)
outcome[t] ~ normal(intercept + beta * stock[t], sigma_y) ;
}
generated quantities {
vector[T] outcome_rep ;
for (t in 1:T)
outcome_rep[t] = normal_rng(intercept + beta * stock[t], sigma_y) ;
}"


Fit and results:

model <- stan(model_code=code,
data = list(T = nrow(data),
flow = data$flow, outcome = data$outcome),
chains = 4,
cores = 4,
iter = 2000,
warmup = 1000,
seed = 2409,
stepsize=0.01,
max_treedepth=15))

print(model, c("intercept", "beta", "lambda", "sigma_y", "sigma_stock"))

Inference for Stan model: 43f89ba07911b37e33b31fed5178be72.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept   1.15    0.00 0.13 0.88 1.07 1.15 1.24  1.41  2922    1
beta        0.46    0.00 0.08 0.31 0.40 0.46 0.52  0.62  3640    1
lambda      0.72    0.00 0.07 0.57 0.68 0.72 0.77  0.84  1457    1
sigma_y     1.03    0.00 0.08 0.87 0.98 1.03 1.08  1.19   936    1
sigma_stock 0.96    0.01 0.31 0.47 0.75 0.92 1.12  1.67  1003    1

Samples were drawn using NUTS(diag_e) at Thu Feb 06 19:03:37 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


The problem is that the paper we are referring to is modeling the stock variable not as Stock_t = \lambda Stock_{t-1} + Flow_{t}+\nu_t but as Stock_t = \lambda Stock_{t-1} + (1-\lambda)Flow_{t}+\nu_t.

So I generate the data as:

...
stock[k]<- 0.7 * stock[k-1] + 0.3 * flow[k] + rnorm(1) # real lambda 0.7, real sigma_stock = 1
...


and adjust my Stan program accordingly:

transformed parameters {
...

for(t in 2:T) {
mu_stock[t] =  lambda * stock[t-1] + (1 - lambda) * flow[t];
stock[t] = mu_stock[t] + sigma_stock * stock_raw[t];
}
}



If I do this adjustment, I get divergent transitions and low effective sample size.

1: There were 5 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is 1.05, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess


Ignoring the warnings, the fits still look ok, but it seems like the adjustment is somehow â€śkillingâ€ť the sampling especially for beta:

4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd  2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept   1.14    0.00 0.15  0.83 1.05 1.15 1.24  1.43  2409 1.00
beta        0.52    0.04 0.41 -0.23 0.30 0.49 0.72  1.46    96 1.05
lambda      0.76    0.00 0.10  0.55 0.69 0.76 0.83  0.94  1090 1.00
sigma_y     1.05    0.00 0.08  0.88 1.00 1.05 1.11  1.22   991 1.00
sigma_stock 1.11    0.08 1.22  0.17 0.46 0.76 1.31  4.19   246 1.02

Samples were drawn using NUTS(diag_e) at Thu Feb  6 19:05:41 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


I already increased adapt_delta up to 0.99999 but I still get divergent transitions so there has to be something with my code. I already tried to use non-centered parameterization. Is there anything I missed?

1 Like

I could solve the problem. The problem was sampling the stock variable as a parameter in the model block. Now I defined the latent stock variable as a transformed parameter and sampled the lambda directly from an uniform distribution. Further I didnâ€™t include an error term for the latent variable which is also in line with some literature. Now everything runs fine.

library(rstan)

set.seed(2409)
options(max.print = 999999)

T=500
flow<- rnorm(T) # observed flow varible (i.e. ad spendings at time t)
stock <- as.numeric() # latent stock variable (i.e. stock_t = lambda * stock_t-1 + (1 - lambda) * flow_t with stock_t=0 = flow_t and 0<lambda<0.99)
time <- as.numeric()
for (k in 1:T) {
time[k] <- k
if(k==1){ ## k=1 actually t=0
stock[k] = flow[k]
} else {
stock[k]<- 0.7 * stock[k-1] + 0.3 * flow[k] # real lambda 0.7
}
}
outcome <- 1 + 1 * stock + rnorm(T, 0, 1) # real intercept 1, real beta 1, real sigma_outcome = 1

data <- data.frame(T = 1:T, outcome, flow, stock)

code <- "data {
int <lower=1> T; // number of obs
vector[T] flow;//flow variable
vector[T] outcome;//observed outcome
}

parameters {

//coefficients
real intercept_raw; //intercept
real beta_raw; //slope
real <lower=0, upper=0.99> lambda; // carryover coefficient

//scales
real<lower = 0> sigma_y_a; // scale for outcome
real<lower = 0> sigma_y_b; // scale for outcome
}

transformed parameters {

//parameters
vector[T] stock; // stock
vector[T] mu_stock; // mu stock
real intercept; // intercept
real beta; // slope
real sigma_y; // scale for outcome

//non-centered param.

//coefficients
intercept = 2 * intercept_raw; //intercept~normal(0,2)
beta = 1 * beta_raw; //beta~normal(0,1)

//scale

sigma_y = sigma_y_a .* sqrt(sigma_y_b); //sigma_y~half-Cauchy(0,1)

// state equation
mu_stock[1] =  flow[1];
stock[1] = mu_stock[1];
for(t in 2:T) {
mu_stock[t] =  lambda * stock[t-1] + (1 - lambda) * flow[t];
stock[t] = mu_stock[t];
}

}

model {
//priors

//non-centered param.

//coefficients
intercept_raw ~ normal(0, 1);
beta_raw ~ normal(0, 1);

//scale
sigma_y_a ~ normal(0, 1);
sigma_y_b ~ inv_gamma(0.5, 0.5);

//likelihood
for (t in 1:T)
outcome[t] ~ normal(intercept + beta * stock[t], sigma_y) ;
}
generated quantities {
vector[T] outcome_rep ;
for (t in 1:T)
outcome_rep[t] = normal_rng(intercept + beta * stock[t], sigma_y) ;
}"

model <- stan(model_code=code,
data = list(T = nrow(data),
flow = data$flow, outcome = data$outcome),
chains = 4,
cores = 4,
iter = 2000,
warmup = 1000,
seed = 2409,
stepsize=0.1,
max_treedepth=15))

print(model, c("intercept", "beta", "lambda", "sigma_y"))

Inference for Stan model: 18ea097e50e38146f33ce4fbd34e2ac0.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept 0.99       0 0.04 0.90 0.96 0.99 1.02  1.08  2186    1
beta      1.10       0 0.16 0.81 1.00 1.10 1.21  1.42  1821    1
lambda    0.69       0 0.05 0.57 0.65 0.69 0.72  0.78  1902    1
sigma_y   1.00       0 0.03 0.94 0.97 1.00 1.02  1.06  3757    1

Samples were drawn using NUTS(diag_e) at Fri Feb 07 13:22:57 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

1 Like