Regression with parameter constraints

Hi all,

I am new for STAN and have a question about a code. The model I am working is a sort of simple regression. But I’d like to provide constraint to the coefficients. Simply, sum of the coefficients is equal to 1. Also, all coefficients should be positive.

y_t = alpha_1x1_t + alpha_2x2_t + alpha_3*x3_t + epsilon_t,

where epislon is normal and the coefficient vector (alpha_1, alpha_2, alpha_3) ~ Dirichlet((para_1, para_2, para_3)).

I built this model with a synthetic data example (see below).

The model works well but the posteriors of the parameter vector (para_1, para_2, para_3) are extremly different from the values which I desinged. I’m wondering if there is any suggestion for me.

Thanks in advance!

library(nimble) ## For dirichlet distribution
library(rstan)

a <- c(1,2,3) ## Desinged parameter values
Np <- length(a)
Nn <- 300

zz1 <- NULL
for (i in 1:Nn) {
zz1 <- rbind(zz1, rdirch(n=1,a))
}

Firxx <- runif(Nn, 1, 4)
Secxx <- runif(Nn, 1, 4)
Thrxx <- runif(Nn, 1, 4)

yy1 <- Firxxzz1[,1] + Secxxzz1[,2] + Thrxx*zz1[,3] + rnorm(Nn, 0, 0.2)

######### Model #########

data {
int <lower=0> Nn; // number of time steps
int <lower=0> Np; // number of Xs

// Variables
real yy[Nn];
real<lower=0> Firxx[Nn];
real<lower=0> Secxx[Nn];
real<lower=0> Thrxx[Nn];
}

parameters {
vector<lower=0,upper=100>[Np] alpha; // Define parameters of dirichlet
real<lower=0> sigma; // Define standard deviation
simplex[Np] fracs;
}

transformed parameters {
real Flowhat[Nn];
for (i in 1:Nn) {
Flowhat[i] <- fracs[1]*Firxx[i] + fracs[2]*Secxx[i] + fracs[3]*Thrxx[i];
}
}

model {
yy ~ normal(Flowhat, sigma);
fracs[1:Np] ~ dirichlet(alpha[1:Np]);
}

######### Model #########

Posterior mean of alphas: c(250, 500, 750)

Lucky for you we have this as a built-in Stan type! We call it a simplex. All you have to do is replace vector with simplex and remove those bounds.

(I can’t make any statement on the model or the interpretability of the coefficients. Just trying to answer your question directly.)

Thank you for quick response.
I am using a simplex for “fracs” so the propotion between three covariates is OK.
The problem is my model provides wrong parameter estimations for the dirichlet distribution.

You want to debug with simulated data.

Also, you don’t need to write fracs[1:Np] — you can just write fracs.

We use = now for assignment nad have deprecated <-

Many of your assignments could be vectorized.

If you only have one outcome simplex, you aren’t going to be able to estimate Dirichlet parameters as sum(alpha) won’t be identified with only a single simplex for data. You probably just want to take a fixed Dirichlet prior.