Help with QR decomposition with latent variable

Hi everyone,

We are fitting the following linear model:

y_t = \alpha + \mathrm{Stock}_{t}\,\beta + \mathrm{CV}_t\,\gamma + \epsilon_t,
with \mathrm{Stock}_{s,t} = \lambda_s\,\mathrm{Stock}_{s,t-1} + \mathrm{Flow}_{s,t}, where 0 < \lambda_s < 1,
and \epsilon_t = \theta\,\epsilon_{t-1} + \eta_t.

We observe S flow variables Flow and D control variables CV.

The Stan code I run is the following:

"data {
  int <lower=1> N; // number of obs
  int <lower=1> S; // number of flow/stock variables
  int <lower=1> D; // number of control variables
  matrix[N,S] flow; //flow variables
  matrix[N,D] cv; // control variables
  vector[N] y;// outcome
}

transformed data {
  int<lower=0> ar_lag[N]; // 1 if the residual of obs n feeds into obs n+1, 0 for the last obs
  for (n in 1:(N-1)) {
    ar_lag[n] = 1;
  }
  ar_lag[N] = 0;
}

parameters {
  
  //coefficients
  real alpha; //intercept
  vector[S] beta;//coefficients  stock variable
  vector[D] gamma;//coefficients control variables

  real <lower=0, upper=1> ar; //autocorrelation coefficient
  vector <lower=0, upper=1> [S] lambda; // carryover coefficient


  //scales
  real<lower=0> sigma_y; // scale for outcome
}

model {
  matrix[N,S] stock;
  vector[N] mu;
  vector[N] Err;
  vector[N] err;
  
  // state equation
  
  for (s in 1:S) {
    // initialize at the (approximate) steady state implied by the mean of the first 52 flows
    stock[1,s] = mean(flow[1:52,s]) / (1 - lambda[s]);
    for (t in 2:N) {
      stock[t,s] = lambda[s] * stock[t-1,s] + flow[t,s];
    }
  }
  
  //mu for model
  mu = alpha + stock*beta + cv * gamma;

  // include the AR(1) error term: Err[n] holds the lagged residual err[n-1] (0 for n = 1)
  Err = rep_vector(0, N);
  for (n in 1:N) {
    err[n] = y[n] - mu[n];
    for (i in 1:ar_lag[n]) {
      Err[n + 1] = err[n + 1 - i];
    }
    mu[n] += ar * Err[n];
  }

  //priors
  
  //coefficients
  alpha ~ student_t(4, 0, 10);
  beta ~ student_t(4, 0, 10);
  gamma ~ student_t(4, 0, 10);
  ar ~ beta(4, 4);
  lambda ~ beta(4,4);
  //scale
  sigma_y ~ student_t(4, 0, 10);

  //likelihood
  y ~ normal(mu, sigma_y);
}"

Unfortunately, I cannot get this model to sample properly, even with higher settings for max_treedepth (e.g. max_treedepth = 17). With max_treedepth = 17 the model already took several hours to run and still produced max_treedepth warnings.

I have already simplified the model as far as possible by dropping the latent stock variables (i.e. I constructed the stock variables before fitting the model, with lambda fixed to a constant value) and the autocorrelated error. The model still always produces max_treedepth warnings, with most transitions reaching the maximum treedepth (something like 3800 out of 4000) and extremely low n_eff (e.g. n_eff = 2).

After that, I tried the QR decomposition, which immediately solved the problem. For the model with manifest stock variables the code was the following:

"data {
  int <lower=1> N; // number of obs
  int <lower=1> S; // number of flow/stock variables
  int <lower=1> D; // number of control variables
  matrix[N,S] flow; //flow variables
  matrix[N,D] cv; // control variables
  vector[N] y;// outcome
}

transformed data {
  matrix[N,S] stock;
  matrix[N, S+D] X; // final design matrix
  matrix[N, S+D] Q;
  matrix[S+D, S+D] R;
  matrix[S+D, S+D] R_inv;
  int<lower=0> ar_lag[N];
  
  for (s in 1:S) {
    // stock constructed with lambda fixed to 0.52
    stock[1,s] = mean(flow[1:52,s]) / (1 - 0.52);
    for (t in 2:N) {
      stock[t,s] = 0.52 * stock[t-1,s] + flow[t,s];
    }
  }

  X = append_col(stock, cv);

  // thin, scaled QR decomposition of the design matrix
  Q = qr_Q(X)[, 1:(S+D)] * sqrt(N-1);
  R = qr_R(X)[1:(S+D), ] / sqrt(N-1);
  R_inv = inverse(R);

  for (n in 1:(N-1)) {
    ar_lag[n] = 1;
  }
  ar_lag[N] = 0;
}

parameters {
  
  //coefficients
  real alpha; //intercept
  vector[S+D] beta_tilde;//coefficients

  real <lower=0, upper=1> ar;

  //scales
  real<lower=0> sigma_y; // scale for outcome
}

transformed parameters {
  vector[S+D] beta;//coefficients

  beta = R_inv * beta_tilde;
}
model {
  vector[N] mu;
  vector[N] Err;
  vector[N] err;
  
  //mu for model
  mu = alpha + Q*beta_tilde;

  // include AR terms
  Err = rep_vector(0, N);
  for (n in 1:N) {
    err[n] = y[n] - mu[n];
    for (i in 1:ar_lag[n]) {
      Err[n + 1] = err[n + 1 - i];
    }
    mu[n] += ar * Err[n];
  }

  //priors
  
  //coefficients
  alpha ~ student_t(4, 0, 10);
  beta ~ student_t(4, 0, 10);
  ar ~ beta(4, 4);

  //scale
  sigma_y ~ student_t(4, 0, 10);

  //likelihood
  y ~ normal(mu, sigma_y);
}"

The problem now, of course, is that we still want to estimate the stock variable as a latent variable. But I have no idea how to implement the QR decomposition correctly, since we can only apply it to the flow variables. One attempt was as follows:

"data {
  int <lower=1> N; // number of obs
  int <lower=1> S; // number of flow/stock variables
  int <lower=1> D; // number of control variables
  matrix[N,S] flow; //flow variables
  matrix[N,D] cv; // control variables
  vector[N] y;// outcome
}

transformed data {

  matrix[N, S+D] Q;
  matrix[S+D, S+D] R;
  matrix[S+D, S+D] R_inv;
  int<lower=0> ar_lag[N];
  
  // thin, scaled QR decomposition of the flow and control variables only
  Q = qr_Q(append_col(flow, cv))[, 1:(S+D)] * sqrt(N-1);
  R = qr_R(append_col(flow, cv))[1:(S+D), ] / sqrt(N-1);
  R_inv = inverse(R);

  for (n in 1:(N-1)) {
    ar_lag[n] = 1;
  }
  ar_lag[N] = 0;
}

parameters {
  
  //coefficients
  real alpha; //intercept
  vector[S+D] beta_tilde;//coefficients

  real <lower=0, upper=1> ar;
  vector <lower=0, upper=1>[S] lambda;

  //scales
  real<lower=0> sigma_y; // scale for outcome
}

transformed parameters {
  vector[S+D] beta;//coefficients

  beta = R_inv * beta_tilde;
}
model {
  matrix[N,S] stock;
  matrix[N, S+D] Q_final; // final Q matrix
  vector[N] mu;
  vector[N] Err;
  vector[N] err;
  
  for (s in 1:S) {
    // carryover recursion applied to the orthogonalized flow columns of Q
    stock[1,s] = mean(Q[1:52,s]) / (1 - lambda[s]);
    for (t in 2:N) {
      stock[t,s] = lambda[s] * stock[t-1,s] + Q[t,s];
    }
  }

  Q_final = append_col(stock, Q[, (S+1):(S+D)]);
  
  //mu for model
  mu = alpha + Q_final*beta_tilde;

  // include AR terms
  Err = rep_vector(0, N);
  for (n in 1:N) {
    err[n] = y[n] - mu[n];
    for (i in 1:ar_lag[n]) {
      Err[n + 1] = err[n + 1 - i];
    }
    mu[n] += ar * Err[n];
  }

  //priors
  
  //coefficients
  alpha ~ student_t(4, 0, 10);
  beta ~ student_t(4, 0, 10);
  ar ~ beta(4, 4);
  lambda ~ beta(4, 4);

  //scale
  sigma_y ~ student_t(4, 0, 10);

  //likelihood
  y ~ normal(mu, sigma_y);
}"

Even though this model runs without any warnings and the results look face-valid at first sight, the approach ignores the latent structure of the stock variable (the carryover recursion is applied to the orthogonalized Q columns rather than to the raw flows), so it doesn't look right to me.
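To spell out the concern: the stock is a linear function of the flows,

\mathrm{Stock}_{s} = F(\lambda_s)\,\mathrm{Flow}_{s},

where F(\lambda_s) is an N x N filter matrix determined by \lambda_s, so the true design matrix [\mathrm{Stock}, \mathrm{CV}] depends on \lambda and its QR decomposition cannot be precomputed in the transformed data block. Applying the recursion to the columns of Q instead in general defines a different regression, and R_inv then no longer maps beta_tilde back to the coefficients of the original model.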

Does anyone have an idea how to implement the QR decomposition correctly in this setting, and how to convert the coefficient estimates back to the original scale?

Thanks!

How many parameters do you have? If it's fewer than a few hundred, you can probably use a dense metric (add the argument control = list(metric = "dense_e") to the stan() call in rstan).

That can solve the same types of problems.
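For example (a sketch; fill in your own model file and data list):

fit <- stan(
  file = "stock_model.stan",          # placeholder
  data = stan_data,
  chains = 4, iter = 2000,
  control = list(metric = "dense_e")  # dense mass matrix instead of the default diagonal
)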

Thanks for your reply! We only have 14 parameters. I just tried the model with control = list(metric = "dense_e"). Unfortunately this does not solve the problem :(

1: There were 3975 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
2: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
3: Examine the pairs() plot to diagnose sampling problems
 
4: The largest R-hat is 4.22, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
5: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
6: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Is this for the first model? The one where the QR trick worked?

Or is it for the one where there is no known reparameterization yet?

This was the result from the model where we don’t know the reparameterization, i.e. the first model which also includes the estimation of the latent variable.

I now also tried the dense metric for the model where lambda is fixed and the stock variable is created in the transformed data block, as in model 2 (the model where the QR decomposition works). Again the same warnings. :( Additionally, the same model without the QR decomposition gives me several divergent transitions. Both problems disappear after applying the QR trick, even with the default settings (adapt_delta = 0.8 and max_treedepth = 10).

Warning messages:
1: There were 759 divergent transitions after warmup. Increasing adapt_delta above 0.99999 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 2491 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 12. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
3: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
4: Examine the pairs() plot to diagnose sampling problems
 
5: The largest R-hat is 3.55, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Can you do a pairs plot of the fit with only the 14 parameters included?

Like:

pairs(fit, pars = c("alpha", "beta", "gamma", "ar", "lambda", "sigma_y"))

Thanks again for your reply :) The plot was pretty messy so I dropped all the control variables from the model. It looks like they weren’t related to the problem anyway.

Here is the plot for the model without the latent variable (thus, without lambda)…

…and here for the first model with the latent variable

Looks pretty multimodal. Given there are four modes, I assume the four different chains found four different solutions?

Is it possible to rule any of these out? Can you add lp__ to the pair plots and drop beta (since those seem to be fine)?

Yeah, you’re right! The trace plots show exactly the behaviour you just mentioned! Since the problem also persists after dropping both the latent variable and the autocorrelated error, I wonder how multimodality can occur in a simple linear regression of y on x.

Here is the plot with the latent variable:

and here without:

One of the modes is at around -50 lp__ vs. -550 for the other. Can you look at the differences in parameters and piece together a story for the different fits? Like, why are there separate modes?

It looks like the problem is related to the scale of our focal predictors. After rescaling the predictors, the warnings disappear. Unfortunately, the sampling is still extremely inefficient and I have to run the model with 10 chains and 10,000 iterations, but I can work with that. Thanks again for your help! :)
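For completeness, one way to do this rescaling (a sketch; here each flow column is divided by its standard deviation, without centering, so the stock recursion keeps its form):

# rescale the focal predictors before passing them to Stan
flow_scaled <- sweep(flow, 2, apply(flow, 2, sd), "/")

stan_data <- list(N = nrow(flow_scaled), S = ncol(flow_scaled), D = ncol(cv),
                  flow = flow_scaled, cv = cv, y = y)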
