Divergences with truncated normal

Aleksandr_Popkov · April 3, 2020, 9:47am

Hello, I am working on a project on estimating the effect of promotions. As a starting model, I use the Stan model written for the Prophet Python library. In the first experiments, the starting model estimated a negative effect on sales for some of the promotions, which is impossible a priori. Trend charts showed that the trend component had absorbed part of the effect of the promotions.

So I modified the Stan model to put a truncated normal prior on a subset of the beta coefficients:

data {
  int<lower=1> K;           // Number of regressors
  ...
  int n_constr;             // Number of regressors with constrained priors
  int constr_vec[n_constr]; // Indexes to find a priori constrained features in X
  int norm_vec[K-n_constr]; // Indexes to find a priori unconstrained features in X
  real L;                   // Lower bound constraint
  real U;                   // Upper bound constraint
}
...
  // Priors for the unconstrained coefficients
  for (i in norm_vec) {
    beta[i] ~ normal(0, sigmas[i]);
  }
  // Truncated priors for the constrained coefficients
  for (j in constr_vec) {
    beta[j] ~ normal(0, sigmas[j]) T[L, U];
  }
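For reference, the statement `beta[j] ~ normal(0, sigmas[j]) T[L, U]` corresponds to the normal density renormalized to [L, U], written below with σ_j for `sigmas[j]`, φ for the standard normal density and Φ for the standard normal CDF. Outside [L, U] the density is zero, so if `beta` itself is declared without matching bounds, the sampler can step into a region where the log density drops abruptly to minus infinity. A hard wall like that is a plausible source of the divergent transitions described below.

```latex
p(\beta_j) =
\begin{cases}
\dfrac{\varphi(\beta_j/\sigma_j)/\sigma_j}{\Phi(U/\sigma_j) - \Phi(L/\sigma_j)}, & L \le \beta_j \le U,\\[6pt]
0, & \text{otherwise.}
\end{cases}
```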

(I attached the full model code to the message.)
I used 5000 samples and got the expected results, consistent with reality.

But Stan warns me that there are divergent transitions, even with adapt_delta=0.999. Is this critical for truncated normal models like this one? If so, could you please suggest how to modify the model?

The second warning concerns R-hat: for some of the beta coefficients it is slightly above the empirical threshold of 1.1 (1.12 to 1.15). Is that critical too? (I suppose it is a result of truncating the normal distribution.)

I’d appreciate any advice. Thanks.

Aleksandr

andre.pfeuffer · April 3, 2020, 11:12am

Welcome to Stan, Aleksandr!

If you use
beta[j] ~ normal(0, sigmas[j]) T[L, U];
you also have to declare the matching constraint on the parameter:
vector<lower=L, upper=U>[K] beta; // Regressor coefficients
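As a sketch of why the declared bounds matter: for a parameter declared `<lower=L, upper=U>`, Stan samples an unconstrained real value (called u_j here purely for illustration) and maps it into the interval with a scaled inverse logit, adding the log absolute Jacobian to the target automatically. This is the lower- and upper-bounded transform described in the Stan Reference Manual. Every real u_j lands strictly inside (L, U), so the sampler never visits the zero-density region.

```latex
\beta_j = L + (U - L)\,\operatorname{logit}^{-1}(u_j), \qquad u_j \in \mathbb{R},
\\[6pt]
\log\left|\frac{d\beta_j}{du_j}\right|
  = \log(U - L) + \log\operatorname{logit}^{-1}(u_j) + \log\bigl(1 - \operatorname{logit}^{-1}(u_j)\bigr).
```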

Aleksandr_Popkov · April 3, 2020, 11:34am

Thank you!
Could you please advise how to declare a vector in which only some of the elements need to be constrained? I'm a little confused.

andre.pfeuffer · April 3, 2020, 11:48am

parameters {
  vector<lower=L, upper=U>[Nconstrained] beta_constrained; // Regressor coefficients
  vector[Nunconstrained] beta_unconstrained; // Regressor coefficients
}
transformed parameters {
  vector[Nconstrained+Nunconstrained] beta;
  beta[norm_vec] = beta_unconstrained;
  beta[constr_vec] = beta_constrained;
}
model {
  beta_unconstrained ~ normal(0, sigmas[norm_vec]);
  beta_constrained ~ normal(0, sigmas[constr_vec]);
}

You don't need the truncation, but you do need the constraint.
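The reason truncation can be dropped once the constraint is declared: on its support, the log of the mean-0 truncated normal density (σ_j standing for `sigmas[j]`, φ and Φ the standard normal density and CDF) splits into a term that depends on β_j and terms that do not.

```latex
\log p(\beta_j)
  = \log\varphi(\beta_j/\sigma_j) - \log\sigma_j
    - \log\bigl(\Phi(U/\sigma_j) - \Phi(L/\sigma_j)\bigr),
  \qquad L \le \beta_j \le U.
```

The last term contains no β_j. When σ_j, L and U are all data (in Prophet's model `sigmas` is data), it is a constant, so it changes neither the posterior nor the gradients used by HMC. It only matters when the scale or the bounds are themselves parameters.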

Also note that 0 might not lie in the interval [L, U]. In that case you might consider:

beta_constrained ~ normal(L, sigma[constr_vec]);
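Putting the suggestions together, here is a minimal, self-contained sketch of the split-vector approach. It is a toy: the likelihood `y ~ normal(X * beta, sigma_obs)` and the data names `N`, `X` and `y` are hypothetical stand-ins for Prophet's trend-plus-regressors likelihood. It uses current Stan array syntax (`array[n] int`), which replaced the `int x[n]` form used in this thread.

```stan
data {
  int<lower=1> N;                                     // hypothetical: number of observations
  int<lower=1> K;                                     // number of regressors
  matrix[N, K] X;                                     // hypothetical: regressor matrix
  vector[N] y;                                        // hypothetical: response
  vector<lower=0>[K] sigmas;                          // prior scales for beta
  int<lower=0, upper=K> n_constr;                     // number of constrained regressors
  array[n_constr] int<lower=1, upper=K> constr_vec;   // indexes of constrained regressors
  array[K - n_constr] int<lower=1, upper=K> norm_vec; // indexes of unconstrained regressors
  real L;                                             // lower bound
  real<lower=L> U;                                    // upper bound, checked to be >= L
}
parameters {
  // Bounded on declaration, so the sampler never leaves [L, U]
  vector<lower=L, upper=U>[n_constr] beta_constrained;
  vector[K - n_constr] beta_unconstrained;
  real<lower=0> sigma_obs;                            // hypothetical: observation noise
}
transformed parameters {
  // Reassemble the full coefficient vector in the original column order of X
  vector[K] beta;
  beta[norm_vec] = beta_unconstrained;
  beta[constr_vec] = beta_constrained;
}
model {
  // No T[L, U] needed: sigmas, L and U are data, so truncation adds only a constant
  beta_unconstrained ~ normal(0, sigmas[norm_vec]);
  beta_constrained ~ normal(0, sigmas[constr_vec]);
  sigma_obs ~ normal(0, 0.5);
  y ~ normal(X * beta, sigma_obs);                    // hypothetical toy likelihood
}
```

The two index arrays must together cover 1..K exactly once; otherwise some entries of `beta` are never assigned.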

Aleksandr_Popkov · April 3, 2020, 11:58am

I see, thank you! I will modify the code and report back with the results.

Aleksandr_Popkov · April 8, 2020, 4:41pm

Thank you for the advice!

I tested the modified model. It turns out that without truncating the priors, the trace plots show small oscillations near zero with several large jumps. Accordingly, the Gelman-Rubin statistic (R-hat) indicates that the Markov chains have not converged. Perhaps during warmup, while the chains are still approximating the joint distribution, many proposed values are rejected because of the constraints (> 0 in my case).

So I also added truncation to the priors:

transformed parameters {
  vector[K] beta;
  vector[n_constr] sigmas_pos;
  beta[norm_vec] = beta_unconstrained;
  beta[constr_vec] = beta_constrained;
  sigmas_pos = sigmas[constr_vec];
}

model {
  //priors
  k ~ normal(0, 5);
  m ~ normal(0, 5);
  delta ~ double_exponential(0, tau);
  sigma_obs ~ normal(0, 0.5);

  beta_unconstrained ~ normal(0, sigmas[norm_vec]);
  for (i in 1:n_constr) {
    beta_constrained[i] ~ normal(0, sigmas_pos[i]) T[L, U];
  }
  ...
}

This may not be the best option, but it works: the Markov chains converge, R-hat is between 1 and 1.01, all constraints hold, and there are no divergences.
I found the same idea in the documentation.
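For comparison, here is a sketch of what the truncated statement in the loop amounts to, written with explicit `target +=` increments. It assumes `beta_constrained` is declared with `<lower=L, upper=U>`, so the bounds check that `T[L, U]` also performs never triggers, and it matches the `~` form only up to an additive constant, because `~` drops constant terms of `normal_lpdf`.

```stan
  // Same effect as: beta_constrained[i] ~ normal(0, sigmas_pos[i]) T[L, U];
  for (i in 1:n_constr) {
    // Untruncated normal log density
    target += normal_lpdf(beta_constrained[i] | 0, sigmas_pos[i]);
    // Minus the log probability mass of [L, U] under that normal
    target += -log_diff_exp(normal_lcdf(U | 0, sigmas_pos[i]),
                            normal_lcdf(L | 0, sigmas_pos[i]));
  }
```

Since `sigmas_pos` here only copies `sigmas[constr_vec]`, and `sigmas`, `L` and `U` are data, the second increment has the same value at every draw.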

If my reasoning is not quite right, please let me know.

Related topics

Topic	Category	Tags	Replies	Views	Activity
Fitting issue in forcing non-negative additional regressor coefficients	Modeling	fitting-issues	5	1244	May 6, 2020
Divergent transitions in Beta model	General	fitting-issues, specification, performance, divergences	7	1267	February 7, 2022
Truncated Multivariate normal distrbution	Modeling	techniques	1	393	June 26, 2023
Non-centered parameterization of the multivariate normal distribution	Modeling	techniques, fitting-issues, specification	1	1287	May 13, 2019
Non-centered parameterization for likelihood	Modeling	fitting-issues	2	373	April 25, 2023