Non-centered parametrisation of truncated normal distribution

Hello all,

I am aware of the non-centered parameterization for a truncated normal distribution, as shown below:

data {
  int<lower=0> N;
  vector[N] Y;
}

parameters {
  real<lower=0> mu_u;
  real alpha;
  real<lower=0> sigma;
  real<lower=0> sigma_u;
  vector<lower=-mu_u/sigma_u>[N] u_raw;
}

transformed parameters {
  // Implies u ~ normal(mu_u, sigma_u) with lower bound of 0
  vector<lower=0>[N] u = mu_u + u_raw * sigma_u;
  // Calculate density adjustment for lower truncation at 0 for
  // every observation
  real trunc_adjustment = -N * normal_lccdf(0 | mu_u, sigma_u);
}

model {
  alpha ~ normal(0, 0.5);
  mu_u ~ normal(0, 0.5);
  sigma_u ~ normal(0, 0.5);
  sigma ~ normal(0, 0.5);

  u_raw ~ std_normal();
  target += trunc_adjustment;

  Y ~ normal(alpha - u, sigma);
}
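For reference (my own gloss on the code above): the trunc_adjustment term comes from the truncated normal density. Each u[i] is a normal(mu_u, sigma_u) variable truncated below at 0, so its density is the normal density divided by the probability of landing above 0,

$$
p(u_i \mid \mu_u, \sigma_u) \;=\; \frac{\operatorname{normal}(u_i \mid \mu_u, \sigma_u)}{1 - \Phi\!\left(-\mu_u/\sigma_u\right)}, \qquad u_i > 0 .
$$

On the log scale each observation therefore contributes -normal_lccdf(0 | mu_u, sigma_u), and since the scalar mu_u is shared by all N observations the total adjustment is -N * normal_lccdf(0 | mu_u, sigma_u), which is exactly trunc_adjustment.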

However, what if mu_u is a vector? How do you specify u_raw then?

That is:

parameters {
  vector[N] mu_u;
  real alpha;
  real<lower=0> sigma;
  real<lower=0> sigma_u;
  vector<lower=-mu_u/sigma_u>[N] u_raw; // <<<<<< ???
}

Stan seems to want mu_u to be a real here.

Can someone please help?

Thanks
Ants.


Vectors of lower bounds can’t be declared in the constraints for a container data type. The documentation here might provide a solution in your case. If I am understanding that documentation properly, then a solution could look something like this:

data {
  int<lower=0> N;
  vector[N] Y;
}

parameters {
  vector<lower=0>[N] mu_u;
  real alpha;
  real<lower=0> sigma;
  real<lower=0> sigma_u;
  vector<lower=0>[N] u_raw_zero;
}

transformed parameters {
  // Implies u ~ normal(mu_u, sigma_u) with lower bound of 0
  vector[N] u_raw = u_raw_zero - mu_u/sigma_u;
  vector<lower=0>[N] u = mu_u + u_raw*sigma_u;
  // Calculate density adjustment for lower truncation at 0 for
  // every observation
  vector[N] trunc_adjustment_vect;
  real trunc_adjustment;

  for (i in 1:N) {
    // per-observation truncation adjustment; no factor of N here,
    // since the loop already visits every observation
    trunc_adjustment_vect[i] = -normal_lccdf(0 | mu_u[i], sigma_u);
  }

  trunc_adjustment = sum(trunc_adjustment_vect);
}

model {
  alpha ~ normal(0, 0.5);
  mu_u ~ normal(0, 0.5);
  sigma_u ~ normal(0, 0.5);
  sigma ~ normal(0, 0.5);

  u_raw_zero ~ std_normal();

  target += trunc_adjustment;

  Y ~ normal(alpha - u, sigma);
}

Hello Isaac,

Thanks for your suggestion.

Here is the code I tried to run based on your advice, but I got lots of divergences and no samples. Can you please let me know what I am doing wrong here?

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Import the csv file with the explanatory variables
IDF = pd.read_csv("Z:/SF_RE_BC.csv")

X = IDF[["x1", "x2", "x3", "x4", "x5", "x6",
         "q1", "q2", "q3", "TREND", "QUAD_TREND"]]

Z = IDF[["z1", "z2", "z3", "z4"]]

Y = IDF.Y[0:640]
ID = IDF.ID
N = len(Y)

IDF_data = {}
IDF_data['Y'] = Y
IDF_data['N'] = N
IDF_data['X'] = X
IDF_data['Z'] = Z
IDF_data['Q'] = IDF_data['Z'].shape[1]  # number of Z covariates
IDF_data['P'] = IDF_data['X'].shape[1]  # number of X covariates
IDF_data['J'] = 20                      # number of groups
IDF_data['ID'] = ID

stan_code ="""


data {
  int<lower=0> N;
  vector[N] Y;
  int<lower=0> P;               // number of X covariates
  int<lower=0> Q;               // number of Z covariates
  matrix[N, P] X;               // matrix for regression
  matrix[N, Q] Z;               // matrix for regression
  int<lower=0> J;               // number of groups
  int<lower=1, upper=J> ID[N];
}


parameters {
  vector[P] beta;
  vector[Q] zeta;

  real<lower=0> sigma;
  real alpha;
  real alpha_mu;
  vector[J] RE;
  real<lower=0> sigma_RE;
  real<lower=0> sigma_u;
  real<lower=0> sigma_mu;
  vector<lower=0>[N] u_raw_zero;
  vector[N] mu;
}

transformed parameters {
  vector[N] u_raw = u_raw_zero - mu / sigma_u;
  vector<lower=0>[N] u = mu + u_raw * sigma_u;
  vector[N] trunc_adjustment_vect;
  real trunc_adjustment;
  for (i in 1:N) {
    // per-observation truncation adjustment; no factor of N here,
    // since the loop already covers every observation
    trunc_adjustment_vect[i] = -normal_lccdf(0 | mu[i], sigma_u);
  }
  trunc_adjustment = sum(trunc_adjustment_vect);
}

model {
  sigma ~ inv_gamma(2, 1);
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);

  sigma_RE ~ inv_gamma(2, 1);
  RE ~ normal(0, sigma_RE);

  alpha_mu ~ normal(0, 2);
  zeta ~ normal(0, 2);
  sigma_u ~ inv_gamma(2, 1);
  sigma_mu ~ inv_gamma(2, 1);

  mu ~ normal(alpha_mu + Z * zeta, sigma_mu);

  u_raw_zero ~ std_normal();
  target += trunc_adjustment;

  Y ~ normal(alpha + X * beta - u + RE[ID], sigma);
}

generated quantities {
  vector[N] y_predict;
  vector[N] log_likelihood;
  vector[N] RE_ID;
  vector[N] RES;
  vector[N] U;
  for (i in 1:N) {
    y_predict[i] = normal_rng(alpha + X[i] * beta - u[i] + RE[ID[i]], sigma);
    log_likelihood[i] = normal_lpdf(Y[i] | alpha + X[i] * beta - u[i] + RE[ID[i]], sigma);
    RE_ID[i] = RE[ID[i]];
    RES[i] = y_predict[i] - Y[i];
    U[i] = u[i];
  }
}


  
"""
  
import pystan
SF_BC_RE_Model_Recover_1_Y_t_RE_t_mu_normal = pystan.StanModel(model_code=stan_code)
SF_BC_RE_Model_Recover_1_Y_t_RE_t_mu_normal_SAMPLE_1 = SF_BC_RE_Model_Recover_1_Y_t_RE_t_mu_normal.sampling(data=IDF_data,chains=4,iter=1000)
print(SF_BC_RE_Model_Recover_1_Y_t_RE_t_mu_normal_SAMPLE_1.stansummary(['alpha','beta','sigma','sigma_u','zeta','alpha_mu','sigma_RE'],probs = (0.025, 0.975)))


And this is the output:

Your suggestions will be highly appreciated. I have attached the data below as well, in case you require it.

antony
SF_RE_BC.csv (147.3 KB)

The linked documentation is for Stan version 2.18 (note the “view current version” link at the top of the page). As of v. 2.29, vectors of bounds are allowed; see 24.4 Vectors with varying bounds | Stan User’s Guide
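If I am reading that chapter correctly, with a recent enough Stan your second parameters block could declare the element-wise bound directly, something like this (an untested sketch, reusing the names from your snippet):

parameters {
  vector[N] mu_u;
  real alpha;
  real<lower=0> sigma;
  real<lower=0> sigma_u;
  // element-wise lower bound: u_raw[n] > -mu_u[n] / sigma_u
  vector<lower=-mu_u / sigma_u>[N] u_raw;
}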

However, RStan on CRAN is stuck at 2.21 for difficult-to-resolve reasons. If possible, it would be a good idea to use CmdStan here. If you're working in R, the cmdstanr package makes it easy to use CmdStan. Alternatively, you can get RStan 2.29 from GitHub on the experimental branch (GitHub - stan-dev/rstan at experimental). To do this, run the following from R:

# uncomment the next line if you already have a version of rstan installed
# remove.packages(c("rstan", "StanHeaders"))
remotes::install_git("https://github.com/stan-dev/rstan", subdir = "StanHeaders", ref = "experimental") 
remotes::install_git("https://github.com/stan-dev/rstan", subdir = "rstan/rstan", ref = "experimental")

Hi Antony,

You could try @jsocolar 's suggestion of using cmdstanr (or I guess cmdstanpy, since you seem to be using Python). However, there are several things about your model that I am struggling to wrap my head around, and they are probably contributing to the model's severe convergence/misspecification issues. More importantly, are you sure the added complexity of this non-centered truncated normal is necessary? A straightforward multiple linear model fit to your data seems to work well: Rhats and n_effs look good, though I do get some max treedepth warnings (I tried a non-centered parameterization for RE, but it didn't help a lot; I also didn't do any rigorous prior predictive simulations to motivate the priors, so that could help). Here is the model I fit to your data:


data {
  int<lower=0> N;
  vector[N] Y;
  int<lower=0> P; // Number of expla
  int<lower=0> Q; // Number of expla
  matrix[N,P] X; // matrix for  regression
  matrix[N,Q] Z; // matrix for  regression
  int<lower=0> J; // Number of groups
  int<lower=1,upper=J> ID[N];
}


parameters {
  vector[P] beta;
  vector[Q] zeta;

  real<lower=0> sigma;

  vector[J] RE;
  real<lower=0> sigma_RE;
}

transformed parameters {
  vector[N] mu_u = Z*zeta;
}

model {
  // variance
  sigma ~ gamma(3,2);
  
  // X regression slope
  beta ~ normal(0,2);
  
  // Z regression slope
  zeta ~ normal(0,2);

  // RE centered 
  sigma_RE ~ gamma(3,2);
  RE ~ normal(0, sigma_RE);


  // Standard multi-linear regression
  Y ~ normal(X*beta + mu_u + RE[ID], sigma);

}

generated quantities {
  // Array of draws from posterior predictive distribution
  real y_predict[N];
  real log_likelihood[N];
  real RE_ID[N];
  real RES[N];
  
  for (i in 1:N) {
    y_predict[i]  = normal_rng( X[i]*beta + Z[i]*zeta + RE[ID[i]] , sigma);
    log_likelihood[i] = normal_lpdf(Y[i] | X[i]*beta + Z[i]*zeta + RE[ID[i]] , sigma);
    RE_ID[i] = RE[ID[i]];
    RES[i] = y_predict[i] - Y[i];
  }
 }

And here are the posterior predictive checks suggesting that the simple linear model is a decent fit

And here is a subset of the pairs plots, showing no crazy correlations between parameters; as I mentioned, though, there are some max treedepth warnings.

Some changes I made from your original model: I absorbed the intercept term alpha into the ID intercepts, since running the model on simulated data suggested some identifiability issues when keeping it in. Also, for the posterior predictive samples, I think you need to declare the containers that hold the samples as real-valued arrays rather than vectors.

If you have domain expertise that suggests this simpler model is not capturing an important feature of the data, then I would be happy to try and help get a more complicated model that incorporates the truncated normal or something similar working, but maybe consider switching to the simpler model.

Hello Isaac,
I really appreciate you taking the time to write back to me. I will see if I can work with CmdStan.

My model is quite specific: the parameter u must be greater than 0 (there is an economic theory behind it), with its mean mu a linear function of other variables. So the equation has to be of the form:

for (i in 1:N) {
  u[i] ~ normal(alpha_mu + Z[i]*zeta, sigma_u) T[0,];
}
Y ~ normal(X*beta + u + RE[ID], sigma);
The above code is obviously no good at sampling, so I looked into the non-centered parametrisation. The good thing is that I have created this data set myself (fake data, as Professor G calls it).
R file.txt (820 Bytes)
I will attach the R code for it.

So I should be able to recover the parameters, as there is a known DGP.

I do not want to take too much of your time, but if you could assist I would be very grateful.

Cheers
Antony

Hi Antony,

Happy to help as much as I can! I've tried a couple of things and haven't had much luck with the non-centered parameterization in rstan; the naive strategy that I put forth in my first reply removes all dependence of u on the "zeta" regression, which is no good (see the algebra below).
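To spell out the problem with that first attempt, working through its transformed parameters:

$$
u \;=\; \mu_u + \sigma_u\,u_{\text{raw}} \;=\; \mu_u + \sigma_u\!\left(u_{\text{raw\_zero}} - \frac{\mu_u}{\sigma_u}\right) \;=\; \sigma_u\,u_{\text{raw\_zero}},
$$

so mu_u cancels exactly and u never sees the regression on Z.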

I spent a little time trying to optimize the centered parameterization and had one realization. The model as you currently have it specified has an identifiability issue. Stan is not able to uniquely estimate sigma_u (the variance of the unobserved u's about the "zeta" regression) and sigma (the variance of the observed data about the estimated multi-regression). Because the u's are unobserved and are used to inform the expected value of each data point y[i], there is no way for the model to separate variability in the y[i]'s due to uncertainty in u (i.e., sigma_u) from variability about the multi-regression trend (sigma). Therefore, you might want the model to estimate only a single variance parameter, sigma, for the multi-regression, and set the variance parameter for the u regression to a constant. I think whether that constant is accurate or not is largely irrelevant; the model's estimate of sigma will adjust depending on whether the constant under- or over-estimates the variance in u.
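A back-of-the-envelope way to see this (ignoring the truncation at zero, and writing m[i] and eta[i] for the two linear predictors):

$$
u_i \sim \operatorname{normal}(m_i, \sigma_u), \qquad y_i \mid u_i \sim \operatorname{normal}(\eta_i - u_i, \sigma)
\;\Longrightarrow\;
y_i \sim \operatorname{normal}\!\left(\eta_i - m_i,\ \sqrt{\sigma^2 + \sigma_u^2}\right),
$$

so the data only pin down the sum sigma^2 + sigma_u^2, which is exactly the sigma/sigma_u ridge visible in the pairs plot below.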

To show what I mean, I stripped away a lot of the model complexity and just focused on the truncated normal parameter u and its use in a regression model. Here is R code I used for a simplified simulation of just this part of the generative model
Simulation.R (1020 Bytes)

And here is the stripped down Stan model I used:

data {
  int<lower=0> N;
  vector[N] Y;
  int<lower=0> Q; // Number of expla
  matrix[N,Q] Z; // matrix for  regression
}


parameters {
  vector[Q] zeta;
  
  real alpha;
  real<lower=0> sigma_u;
  vector<lower=0>[N] u;

  real<lower=0> sigma;

}


model {
  // u regression intercept and sd
  alpha ~ normal(0, 0.5);
  sigma_u ~ normal(0.25,1);
  
    
  // data regression sd
  sigma ~ gamma(1,1);
  
  // Z regression slope
  zeta ~ normal(1,0.25);
  
  u ~ normal(alpha + Z*zeta, sigma_u);

  Y ~ normal(-u, sigma);
}

And here’s a pairs plot from one run of the simulation + model (there were lots of warnings about rhats and low Bayesian fraction of missing information estimates)

Notice the striking correlation between sigma and sigma_u, which is evidence of the identifiability issue I mentioned.

Using this model instead, where all I changed was replacing sigma_u with a constant, significantly improved model fit and performance:

data {
  int<lower=0> N;
  vector[N] Y;
  int<lower=0> Q; // Number of expla
  matrix[N,Q] Z; // matrix for  regression
}


parameters {
  vector[Q] zeta;
  
  real alpha;
  vector<lower=0>[N] u;

  real<lower=0> sigma;

}


model {
  // u regression intercept and sd
  alpha ~ normal(0, 0.5);
  
    
  // data regression sd
  sigma ~ gamma(1,1);
  
  // Z regression slope
  zeta ~ normal(1,0.25);
  
  u ~ normal(alpha + Z*zeta,  1);

  Y ~ normal(-u, sigma);
}

And the model's estimates of the u's agree well with the simulated values, as shown in this plot (the red line is y = x):

You might also notice that I changed the priors for several of the parameters. That is because I did some prior predictive simulations to make sure that there wasn't significant prior density below 0 for u, though these changes did not seem to significantly alter model performance. The script to reproduce the prior predictive simulations is here:
Priors.R (1.3 KB)

I fit the over-simplified model to your data (so I left out all of the other covariate data), and the model fit really well, though there were still some tail effective sample size warnings. You could try some other tricks, like standardizing your regression covariates (that has helped some of my complicated linear models); a sketch of what I mean is below. I haven't tried fitting a full model with all of the regression parameters to your data, but I hope this helps with the truncated normal parameter that seemed to be causing many of the model convergence issues.
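In case it's useful, here is a sketch of what I mean by standardizing, done in Stan's transformed data block for the Z matrix (the same idea applies to X; this snippet is not part of the models above):

transformed data {
  // centre and scale each column of Z (sketch only)
  matrix[N, Q] Z_std;
  for (q in 1:Q) {
    real m = mean(col(Z, q));
    real s = sd(col(Z, q));
    for (n in 1:N) {
      Z_std[n, q] = (Z[n, q] - m) / s;
    }
  }
}

You would then use Z_std in place of Z in the model, keeping in mind that the zeta estimates are then on the standardized scale.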


This is a common confusion: the so-called "non-centered parameterization" is defined in the context of unconstrained individual parameters drawn from a normal population model. It is not applicable to an arbitrary hierarchical model. In particular, when the individual parameters are constrained one cannot shift them arbitrarily without eventually violating the constraint, which prevents non-centering of the location.
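In other words, if you try to non-center a positive individual parameter as

$$
u = \mu + \tau\,\tilde{u}, \qquad u > 0 \;\Longleftrightarrow\; \tilde{u} > -\frac{\mu}{\tau},
$$

then the bound on the "standardized" variable still depends on the location and scale you were trying to factor out; this is the same parameter-dependent bound that appears in the first post's code.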

When dealing with positive individual parameters the most effective approach is to first identify an appropriate population model and then consider any reparameterizations that might be useful. For example, one might consider a gamma or inverse gamma population model, or a log normal population model. The log normal population model is particularly nice because, with a quick transformation, it can be implemented as a standard normal hierarchical model amenable to centered and non-centered parameterizations. For example:

parameters {
  vector[K] log_u;
}

model {
  log_u ~ normal(mu, tau);
  y ~ normal(exp(log_u[group_idx]) + X * beta, sigma);
}

or

parameters {
  vector[K] log_u_tilde;
}

model {
  log_u_tilde ~ normal(0, 1);
  y ~ normal(exp(mu + tau * log_u_tilde[group_idx]) + X * beta, sigma);
}

The gamma and inverse gamma models admit some reparameterizations, but they’re not as effective as those for the normal model.


Thanks for the info, this is really helpful and clears up my confusion.
