# Constrained parameters are transformed, which affects the calculation of the log posterior probability lp__

Dear STAN community,
I am a new STAN fan. I am testing the following model:

``````data {
int<lower=0> N;         // number of data points
vector[N] x;              // independent variable
vector[N] y;              // dependent variable
}
parameters {
real slope;
real intercept;
real<lower=0> sigma;      // standard deviation
}
model {
vector[N] y_est;
y_est = slope * x + intercept;
y ~ normal(y_est, sigma); // posterior
}
``````

This is a simple linear regression model and works fine. I use auto-generated data to test the model in R,

``````samplesize <- 100
x <- runif(samplesize,min=0,max=samplesize)
slope <- 0.4
intercept <- 2
stdev <- 0.5
y <- slope*x + intercept + rnorm(samplesize,sd=stdev)
fitted_stan_model <- stan(file="linearregressionmodel.stan", data=list(N=samplesize, x=x, y=y))
``````

and after running the model, I use the following R function to calculate the log posterior from the generated samples:

``````f <- function(x,y,slope,intercept,sigma) { ## loglikelihood estimator
y_est <- slope*x + intercept
residuals <- y - y_est
loglikelihood <- sum(-.5*(residuals/sigma)^2)-length(residuals)*(log(sigma)+log(2*pi)/2)
return(loglikelihood)
}
``````

Since I have not defined any prior, the posterior probability should be equal to the likelihood function. I understand that STAN does not scale the log posterior probability (lp__), however I still expect a linear relationship between lp__ and f(…) for a given parameter set. However, when I try to verify this, it does not work:

``````d <- as.data.frame(fitted_stan_model)
lp2 <- vector()
for (i in 1:nrow(d)){
lp2[i] <- with(d[i,],f(x,y,slope,intercept,sigma))
}
m1 <- lm(d\$lp__~lp2)
plot(residuals(m1)~lp2)
``````

This small difference confuses me as it seems to cause small differences in the estimated values when I compare with other methods that I used in the past. If anybody has a helpful explanation for this, it would be greatly appreciated.

Cheers

The difference is due to a transform that constrains `sigma` to be positive. If `sigma` is data the relationship is linear. If `sigma` is declared without `<lower=0>` constraint then the relationship is also linear but the sampler will sometimes fail with a divergent transition because it tries to evaluate the probability of negative `sigma`.

2 Likes

Thank you, that immediately solved it!

Cheers