Hi everyone,
I try to fit a model with latent variables (i.e. stock variables). We observe the outcome and the flow variables.
I can fit the following model with some toy data without any problems:
Outcome_t = \alpha + \beta Stock_t+\epsilon_t with
Stock_t = \lambda Stock_{t-1} + Flow_{t}+\nu_t ,
Stock_{t=0}=Flow_{t=0}+\nu_{t=0} and
0>\lambda>0.99
I generate the data as follows:
library(rstan)
set.seed(2409)
options(max.print = 999999)
T=200
flow<- rnorm(T) # observed flow varible (i.e. ad spendings at time t)
stock <- as.numeric() # latent stock variable (i.e. stock_t = lambda * stock_t-1 + flow_t + e_t with stock_t=0 = flow_t + e_t)
time <- as.numeric()
for (k in 1:T) {
time[k] <- k
if(k==1){ ## k=1 actually t=0
stock[k] = flow[k] + rnorm(1)
} else {
stock[k]<- 0.7 * stock[k-1] + flow[k] + rnorm(1) # real lambda 0.7, real sigma_stock = 1
}
}
outcome <- 1 + 0.5 * stock + rnorm(T, 0, 1) # real intercept 1, real beta 0.5, real sigma_outcome = 1
data <- data.frame(T = 1:T, outcome, flow, stock)
The Stan code is as follows:
code <- "data {
int <lower=1> T; // number of obs
vector[T] flow;//flow variable
vector[T] outcome;//observed outcome
}
parameters {
//coefficients
real intercept_raw; //intercept
real beta_raw; //beta
real <lower=0, upper=0.98> lambda_raw; // carryover , upper=0.98 implies 0.5+0.5*0.98=0.99
//latent variable
vector[T] stock_raw; // stock
//scales
real<lower = 0> sigma_y_a; // scale for outcome
real<lower = 0> sigma_y_b; // scale for outcome
real<lower = 0> sigma_stock_a; // scale for stock
real<lower = 0> sigma_stock_b; // scale for stock
}
transformed parameters {
//parameters
vector[T] stock; // stock
vector[T] mu_stock; // mu stock
real intercept; // intercept
real beta; // intercept
real <lower=0, upper=0.99> lambda; // carryover
real sigma_y; // scale for outcome
real sigma_stock; // scale for stock variable
//non-centered param.
//coefficients
intercept = 2 * intercept_raw; //intercept~normal(0,2)
beta = 2 * beta_raw; //beta~normal(0,2)
lambda = 0.5 + 0.5 * lambda_raw; //lambda~normal(0.5,0.5)
//scales
sigma_y = sigma_y_a .* sqrt(sigma_y_b); //sigma_y~half-Cauchy(0,1)
sigma_stock = sigma_stock_a .* sqrt(sigma_stock_b); //sigma_stock~half-Cauchy(0,1)
// state equations
mu_stock[1] = flow[1];
stock[1] = mu_stock[1] + sigma_stock * stock_raw[1];
for(t in 2:T) {
mu_stock[t] = lambda * stock[t-1] + flow[t];
stock[t] = mu_stock[t] + sigma_stock * stock_raw[t];
}
}
model {
//priors
//non-centered param.
//state variable
stock_raw ~ normal(0, 1);
//coefficients
intercept_raw ~ normal(0, 1);
beta_raw ~ normal(0, 1);
lambda_raw ~ normal(0, 1);
//scales
sigma_y_a ~ normal(0, 1);
sigma_y_b ~ inv_gamma(0.5, 0.5);
sigma_stock_a ~ normal(0, 1);
sigma_stock_b ~ inv_gamma(0.5, 0.5);
//likelihood
for (t in 1:T)
outcome[t] ~ normal(intercept + beta * stock[t], sigma_y) ;
}
generated quantities {
vector[T] outcome_rep ;
for (t in 1:T)
outcome_rep[t] = normal_rng(intercept + beta * stock[t], sigma_y) ;
}"
Fit and results:
model <- stan(model_code=code,
data = list(T = nrow(data),
flow = data$flow,
outcome = data$outcome),
chains = 4,
cores = 4,
iter = 2000,
warmup = 1000,
seed = 2409,
control=list(adapt_delta=0.99,
stepsize=0.01,
max_treedepth=15))
print(model, c("intercept", "beta", "lambda", "sigma_y", "sigma_stock"))
Inference for Stan model: 43f89ba07911b37e33b31fed5178be72.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
intercept 1.15 0.00 0.13 0.88 1.07 1.15 1.24 1.41 2922 1
beta 0.46 0.00 0.08 0.31 0.40 0.46 0.52 0.62 3640 1
lambda 0.72 0.00 0.07 0.57 0.68 0.72 0.77 0.84 1457 1
sigma_y 1.03 0.00 0.08 0.87 0.98 1.03 1.08 1.19 936 1
sigma_stock 0.96 0.01 0.31 0.47 0.75 0.92 1.12 1.67 1003 1
Samples were drawn using NUTS(diag_e) at Thu Feb 06 19:03:37 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The problem is that the paper we are referring to is modeling the stock variable not as Stock_t = \lambda Stock_{t-1} + Flow_{t}+\nu_t but as Stock_t = \lambda Stock_{t-1} + (1-\lambda)Flow_{t}+\nu_t.
So I generate the data as:
...
stock[k]<- 0.7 * stock[k-1] + 0.3 * flow[k] + rnorm(1) # real lambda 0.7, real sigma_stock = 1
...
and adjust my Stan program accordingly:
transformed parameters {
...
for(t in 2:T) {
mu_stock[t] = lambda * stock[t-1] + (1 - lambda) * flow[t];
stock[t] = mu_stock[t] + sigma_stock * stock_raw[t];
}
}
If I do this adjustment, I get divergent transitions and low effective sample size.
1: There were 5 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is 1.05, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
Ignoring the warnings, the fits still look ok, but it seems like the adjustment is somehow “killing” the sampling especially for beta:
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
intercept 1.14 0.00 0.15 0.83 1.05 1.15 1.24 1.43 2409 1.00
beta 0.52 0.04 0.41 -0.23 0.30 0.49 0.72 1.46 96 1.05
lambda 0.76 0.00 0.10 0.55 0.69 0.76 0.83 0.94 1090 1.00
sigma_y 1.05 0.00 0.08 0.88 1.00 1.05 1.11 1.22 991 1.00
sigma_stock 1.11 0.08 1.22 0.17 0.46 0.76 1.31 4.19 246 1.02
Samples were drawn using NUTS(diag_e) at Thu Feb 6 19:05:41 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
I already increased adapt_delta up to 0.99999 but I still get divergent transitions so there has to be something with my code. I already tried to use non-centered parameterization. Is there anything I missed?
Thank you in advance!