Constrained parameters are transformed, which affects the calculation of the log posterior probability lp__

Dear Stan community,
I am a new Stan user and I am testing the following model:

data {
  int<lower=0> N;           // number of data points
  vector[N] x;              // independent variable
  vector[N] y;              // dependent variable
}
parameters {
  real slope;
  real intercept;
  real<lower=0> sigma;      // standard deviation
}
model {
  vector[N] y_est;
  y_est = slope * x + intercept;
  y ~ normal(y_est, sigma); // likelihood
}

This is a simple linear regression model and it works fine. I test the model in R with simulated data:

library(rstan)

samplesize <- 100
x <- runif(samplesize, min = 0, max = samplesize)
slope <- 0.4
intercept <- 2
stdev <- 0.5
y <- slope * x + intercept + rnorm(samplesize, sd = stdev)
fitted_stan_model <- stan(file = "linearregressionmodel.stan",
                          data = list(N = samplesize, x = x, y = y))

After running the model, I use the following R function to compute the log likelihood of the data at each set of sampled parameter values:

f <- function(x, y, slope, intercept, sigma) { ## normal log likelihood
  y_est <- slope * x + intercept
  residuals <- y - y_est
  loglikelihood <- sum(-0.5 * (residuals / sigma)^2) -
    length(residuals) * (log(sigma) + log(2 * pi) / 2)
  return(loglikelihood)
}
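As a sanity check, this agrees with R's built-in normal density (here evaluated at the true simulation parameters, just as an example):

## f() should match the sum of normal log densities from dnorm()
all.equal(f(x, y, slope = 0.4, intercept = 2, sigma = 0.5),
          sum(dnorm(y, mean = 0.4 * x + 2, sd = 0.5, log = TRUE)))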

Since I have not defined any priors, the posterior density should equal the likelihood. I understand that Stan does not normalise the log posterior (lp__), so I do not expect the values to match exactly, but for any given parameter set I still expect lp__ and f(…) to differ only by an additive constant, i.e. to be linearly related. However, when I try to verify this, it does not hold:

d <- as.data.frame(fitted_stan_model)   # one row per posterior draw
lp2 <- numeric(nrow(d))
for (i in 1:nrow(d)) {
  # slope, intercept and sigma come from the draw; x and y from the workspace
  lp2[i] <- with(d[i, ], f(x, y, slope, intercept, sigma))
}
m1 <- lm(d$lp__ ~ lp2)
plot(residuals(m1) ~ lp2)   # residuals are not just numerical noise

The difference is small, but it confuses me because it also seems to produce small differences in the estimated parameter values when I compare them with other methods I have used in the past. If anybody has an explanation for this, it would be greatly appreciated.

Cheers

The difference is due to the transform Stan applies to constrain sigma to be positive: sigma is sampled on the unconstrained log scale, and the corresponding Jacobian adjustment adds a log(sigma) term to lp__ that is not in your likelihood function. If sigma is passed in as data, the relationship is exactly linear. If sigma is declared without the <lower=0> constraint, the relationship is also linear, but the sampler will sometimes fail with divergent transitions because it tries to evaluate the density at negative values of sigma.
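You can check this directly from your draws: lp__ minus your f(…) should be log(sigma) plus a constant (the constant being the normalising terms Stan drops from the vectorised ~ statement). A quick sketch, reusing d and lp2 from your code:

delta <- d$lp__ - lp2                # difference between Stan's lp__ and f(...)
plot(d$sigma, delta - log(d$sigma))  # flat once the Jacobian term is removed
range(delta - log(d$sigma))          # numerically constant across draws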


Thank you, that immediately solved it!

Cheers