Simple normal regression model with inequality constraints on the parameters

Dear Stan Community,

I am very new to Stan and am trying to fit a simple normal linear regression with linear inequality constraints on the parameters.

y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \epsilon, ~~ \epsilon \sim N(0,\sigma^2)

The inequality constraints on the parameters are \beta_0 > 0, 0 < \beta_1 < 1, 0 < \beta_2 < 1. I first fit the model with a diffuse reference prior for the parameters, p(\beta, \sigma) \propto \sigma^{-1}.

data {
  int<lower=0> N; // Number of observations
  vector[N] X1; // first explanatory variable
  vector[N] X2; // second explanatory variable
  vector[N] y; // response
}
parameters {
  real<lower = 0> sigma;
  real<lower = 0> beta0;
  real<lower = 0, upper = 1> beta1;
  real<lower = 0, upper = 1> beta2;
}
model {
  vector[N] mu = beta0 + beta1 * X1 + beta2 * X2;
  
  // Likelihood 
  y ~ normal(mu, sigma);
}
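(As a side note: as written, I think the code above leaves sigma with a flat prior, so to actually use p(\beta, \sigma) \propto \sigma^{-1} I assume the model block needs an explicit target increment, something like the sketch below; please correct me if that is wrong.)

model {
  vector[N] mu = beta0 + beta1 * X1 + beta2 * X2;

  // reference prior p(beta, sigma) proportional to 1/sigma
  target += -log(sigma);

  // Likelihood
  y ~ normal(mu, sigma);
}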
  1. In terms of the constraints, is this the right way to specify this model in Stan?

  2. More generally, I would like to define a set of inequality constraints, e.g. Q = \{\beta : a \le D\beta \le w\}, where D is a nonsingular matrix of real numbers, and use the indicator function of this set as the constraint on \beta. In vector form:

data {
  int<lower=0> N; // Number of observations
  int<lower=0> K; // number of explanatory variables.
  matrix[N,K] X; // Matrix of explanatory variables 
  matrix[K,K] D; // Matrix that represents inequality constraints
  vector[N] y; // Vector of dependent variables
  vector[K] a; // Vector of lower inequality constraints
  vector[K] w; // Vector of upper inequality constraints
}
parameters {
  vector[K] b; 
  real<lower=0> sigma;
}
model {
  y ~ normal(X*b, sigma);
}

How can I incorporate this indicator function into the above code, so that it acts as the prior distribution on \beta in the model?
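In other words, the prior I have in mind is flat on the constraint set, i.e. the indicator of Q:

p(\beta) \propto \mathbf{1}\{a \le D\beta \le w\}.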

Thank you very much.

If your constraint is:

a \le D\beta \le w

Then, multiplying each part by the inverse of D:

D^{-1}a \le D^{-1}D\beta \le D^{-1}w

This simplifies to:

D^{-1}a \le \beta \le D^{-1}w
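For example, with a diagonal D = \mathrm{diag}(2, 4) (chosen so that multiplying by D^{-1} clearly preserves the elementwise inequalities), a = (0, 0)^\top and w = (1, 2)^\top, the constraint a \le D\beta \le w becomes 0 \le \beta_1 \le 1/2 and 0 \le \beta_2 \le 1/2, which is exactly D^{-1}a \le \beta \le D^{-1}w.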

Stan has overloaded the division operator / for matrices/vectors to perform this multiplication by an inverse in a more numerically stable and efficient way (without explicitly forming D^{-1}). So you would express your model in Stan as:

data {
  int<lower=0> N; // Number of observations
  int<lower=0> K; // number of explanatory variables.
  matrix[N,K] X; // Matrix of explanatory variables 
  matrix[K,K] D; // Matrix that represents inequality constraints
  vector[N] y; // Vector of dependent variables
  vector[K] a; // Vector of lower inequality constraints
  vector[K] w; // Vector of upper inequality constraints
}
transformed data {
  vector[K] D_div_a = D / a;
  vector[K] D_div_w = D / w;
}
parameters {
  vector<lower=D_div_a, upper=D_div_w>[K] b; 
  real<lower=0> sigma;
}
model {
  // More efficient likelihood for linear regression
  y ~ normal_id_glm(X, 0, b, sigma);
}
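For your original three-coefficient example, this corresponds to taking D to be the 3x3 identity matrix, a = (0, 0, 0)^\top, and w holding the elementwise upper bounds; you would need to pick some finite upper bound for \beta_0, since that constraint is one-sided.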

Thank you for the reply.

In the transformed data block, shouldn't the matrix operation be D_div_a = D \ a rather than D / a, for D^{-1}a?

Thanks.


Oh yes, good catch! It should be \ for left division.
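That is, the corrected transformed data block reads:

transformed data {
  vector[K] D_div_a = D \ a; // D^{-1} a via left division
  vector[K] D_div_w = D \ w; // D^{-1} w
}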


Can you clarify what you mean by “transforming the quantities to the original scale”? Which quantities are you wanting to transform and what scale do you want them to be on?


Nvm. My question was not clear. Thanks!


Hi,

I am trying to add time-varying coefficients to the model.

Let \gamma_t = D\beta_t \in \{\gamma : l \le \gamma \le u\}, so the model becomes

z_t = x_t D^{-1}\gamma_t + \epsilon_t, ~~ t = 1, \ldots, T.

We assign appropriate priors to the time-varying \gamma_t. Could you check whether I have coded \gamma and the model for z correctly?

Note that I changed the lower and upper bound notation from a and w to l and u.

data {
  int<lower=0> T; // Number of time points
  int<lower=0> K; // number of explanatory variables.
  matrix[T,K] X; // Matrix of explanatory variables 
  matrix[K,K] D; // Matrix that represents inequality constraints
  vector[T] z; // Vector of dependent variables
  vector[K] l; // Vector of lower inequality constraints
  vector[K] u; // Vector of upper inequality constraints
}
transformed data { 
  vector[K] D_div_l = D \ l; // D^{-1}l
  vector[K] D_div_u = D \ u; // D^{-1}u
}
parameters {
  real<lower=0> sigma;
  vector<lower=l, upper=u>[K] phi_zero; // mean of gamma[, 1]
  vector[K] psi;
  real<lower=0> tau;
  matrix[K, T] gamma;
}
transformed parameters {
  vector<lower=-1, upper=1>[K] phi = 
  matrix<lower=rep_matrix(D_div_l, T), upper=rep_matrix(D_div_u, T)>[K, T] D_div_gamma = D \ gamma; // column t is D^{-1} gamma_t
}
model {
  // Likelihood: z[t] = X[t] * D^{-1} gamma[, t] + eps[t]
  for (t in 1:T)
    z[t] ~ normal(X[t] * D_div_gamma[, t], sigma);
  
  // priors
  psi ~
  sigma ~
  tau ~
  
  // AR(1)-type prior on the time-varying gamma
  gamma[, 1] ~ normal(phi_zero, tau);
  to_vector(gamma[, 2:T]) ~ normal(to_vector(diag_pre_multiply(phi, gamma[, 1:(T-1)])), tau);
}

Thank you.

This seems to be missing the definition for phi, but the constraints are right to make the columns of gamma stationary.

If K is large, I’d suggest making the transformed parameters local variables in the model block so they don’t get saved.
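Something like this sketch, reusing the names from your program (local variables cannot carry lower/upper bounds, so those checks are dropped):

model {
  // computed locally instead of in transformed parameters, so it is not saved in the output
  matrix[K, T] D_div_gamma = D \ gamma;

  for (t in 1:T)
    z[t] ~ normal(X[t] * D_div_gamma[, t], sigma);

  // ... priors and the prior on gamma as before ...
}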

Also, the upper and lower bounds for transformed parameters are just for error checking: they do not change the parameterization, and a violated bound causes the iteration to be rejected.
