Correct log likelihood formulation for non-centered model with response uncertainty

Hi,
I’m trying to figure out the appropriate log likelihood formulation for a model in which the response is uncertain. In addition, I’m using a non-centered parameterization to estimate the “true” value of the response, because parameter estimates from the centered formulation exhibit a form of Neal’s funnel.

I ask because my impression was that, for a normal log likelihood, I should calculate the likelihood of the observed response (y_meas) given the expected value (mu) and a scaling parameter (sigma), however the expected “true” value (y_true) is determined from the model after accounting for measurement uncertainty in the response (y_noise). However, when I look at the distribution of log likelihoods from the results (or try to run them through the {loo} package), I do not get values that make any sense. The only way I get likelihoods that seem reasonable is to compare y_meas against y_true and y_noise.
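
To make the setup concrete, the generative model I have in mind (matching the simulation code below) is:

\mu_i = \alpha + \beta x_i
y_{true, i} \sim N(\mu_i, \sigma)
y_{meas, i} \sim N(y_{true, i}, y_{noise, i})

where y_{noise, i} is the known measurement standard error for each observation.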

Below is a toy linear model highlighting what I’m talking about. I have three scenarios: A) how I thought I was supposed to calculate the log likelihood in Stan; B) an alternative that includes the scaling factor on sigma (which helps prevent the “funnel”) and may be identical to A, just more explicit; and C) the parameterization that seems to produce the most reasonable results, though I don’t know why.

This is just a toy example (e.g., these data don’t necessitate a non-centered parameterization), but here is some code to produce some data, using a linear model as the data-generating process:

library(rstan)
library(tidybayes)
library(tidyverse)

set.seed(20821)
#Number of samples
n <- 30
#Some explanatory variable
x <- rnorm(n, mean = 0, sd = 5)
#Intercept
alpha <- 0.5
#Slope
beta <- 2.2
#Relationship uncertainty
epsilon <- rnorm(n = n, mean = 0, sd = 1)
#Perfect fit between response and explanatory
y_perfect <- alpha + beta * x
#Fit after including relationship uncertainty
y_true <- alpha + beta * x + epsilon
#Measurement noise (equivalent to standard error)
y_noise <- rgamma(n, shape = 10, rate = 1)
#Measurement error (and fit uncertainty)
y_meas <- rnorm(n, mean = y_true, sd = y_noise)

#Model data
stan_data_uncertain <- list(N = n,
  y_meas = y_meas,
  y_noise = y_noise,
  x = x)

And the Stan code (with the generated quantities block highlighting the three scenarios):

data {
  int<lower=0> N; //Number of observations
  vector[N] y_meas; //Observed response
  vector<lower=0>[N] y_noise; //Standard error of response
  vector[N] x; //Explanatory variable
}

parameters {
  real alpha; //Intercept
  real beta; //Slope
  real<lower=0> z_sigma; //Scales non-centered standard deviation
  real<lower=0> sigma; //Standard deviation
}

transformed parameters {
  vector[N] mu; //Expected response from model
  vector[N] y_true; //The "true" response value
  mu = alpha + beta * x; //Model
  y_true = mu + z_sigma * sigma; //Non-centered estimate of y_true
}

model {
  target += std_normal_lpdf(alpha); //Standard normal intercept prior
  target += std_normal_lpdf(beta); //Standard normal slope prior
  target += std_normal_lpdf(z_sigma); //Non-centered scaling factor for sigma
  target += exponential_lpdf(sigma | 1); //Exponential prior for sigma
  target += normal_lpdf(y_meas | y_true, y_noise); //Likelihood using latent variable and measurement uncertainty
}

generated quantities {
  vector[N] pred_A; //Predictions from scenario A
  vector[N] pred_B; //Predictions from scenario B
  vector[N] pred_C; //Predictions from scenario C
  vector[N] log_lik_A; //Log likelihood for scenario A
  vector[N] log_lik_B; //Log likelihood for scenario B
  vector[N] log_lik_C; //Log likelihood for scenario C
  
  for(n in 1:N) {
    pred_A[n] = normal_rng(mu[n], sigma);
    pred_B[n] = normal_rng(mu[n], z_sigma * sigma);
    pred_C[n] = normal_rng(y_true[n], y_noise[n]);
    log_lik_A[n] = normal_lpdf(y_meas[n] | mu[n], sigma);
    log_lik_B[n] = normal_lpdf(y_meas[n] | mu[n], z_sigma * sigma);
    log_lik_C[n] = normal_lpdf(y_meas[n] | y_true[n], y_noise[n]);
  }
}
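
For completeness, since the fit is referenced below, a minimal rstan call to produce it (assuming the Stan code above is saved as uncertain_response.stan; the file name is just a placeholder) would be something like:

model_samples_uncertain <- stan(
  file = "uncertain_response.stan", #Placeholder file name for the Stan code above
  data = stan_data_uncertain,
  seed = 20821)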

Here is a visualization (and code; Stan model results are stored in model_samples_uncertain) of the log likelihoods (note the different axis scales):

##Log likelihoods
gather_draws(model_samples_uncertain, log_lik_A[n], log_lik_B[n], log_lik_C[n]) %>%
  ggplot(aes(.value)) +
  geom_density() +
  labs(x = NULL) +
  facet_wrap(~ .variable, scales = "free", strip.position = "bottom") +
  theme_bw() +
  theme(strip.background = element_blank(),
    strip.placement = "outside",
    panel.grid.minor = element_blank()) +
  scale_fill_brewer(palette = "Set1")

The first two scenarios seem pretty extreme, while the scenario that uses the estimated “true” response value and the predetermined uncertainty looks reasonable. Just to help confirm these wacky values, below are the predictions across scenarios, compared to the measured response values.

##Response predictions
gather_draws(model_samples_uncertain, pred_A[n], pred_B[n], pred_C[n]) %>%
  group_by(n, .variable) %>%
  summarise(y_pred = quantile(.value, probs = 0.5),
    y_lo = quantile(.value, probs = 0.05),
    y_hi = quantile(.value, probs = 0.95)) %>%
  full_join(tibble(y_meas) %>%
      mutate(n = 1:n())) %>%
  ggplot(aes(y_pred, y_meas)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_errorbar(aes(ymin = y_lo, ymax = y_hi)) +
  geom_point(size = 3, shape = 21, fill = "gray", color = "black") +
  labs(y = "Measured response",
    x = "Predicted response") +
  facet_wrap(~ .variable) +
  theme_bw() +
  theme(strip.background = element_blank())

To be clear, the x-axis shows the predicted values from the three scenarios, the y-axis shows the data used to fit the model, the vertical lines are 90% credible intervals, and the diagonal line in each panel is the 1:1 line. I’m not actually sure why the model is so confident about the results in the first two scenarios, but if we compare them to the “perfect” response values (no relationship or measurement uncertainty), the results actually make a lot of sense:

gather_draws(model_samples_uncertain, pred_A[n], pred_B[n], pred_C[n]) %>%
  group_by(n, .variable) %>%
  summarise(y_pred = quantile(.value, probs = 0.5),
    y_lo = quantile(.value, probs = 0.05),
    y_hi = quantile(.value, probs = 0.95)) %>%
  full_join(tibble(y_perfect) %>%
      mutate(n = 1:n())) %>%
  ggplot(aes(y_pred, y_perfect)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_errorbar(aes(ymin = y_lo, ymax = y_hi)) +
  geom_point(size = 3, shape = 21, fill = "gray", color = "black") +
  labs(y = "Data generating response",
    x = "Predicted response") +
  facet_wrap(~ .variable) +
  theme_bw() +
  theme(strip.background = element_blank())

So now I’m basically confused about what I’m supposed to be comparing to what.

Does anyone know how one is actually supposed to calculate the log likelihoods for a model like this?

I think a linear regression will work here:

data {
  int<lower=0> N; //Number of observations
  vector[N] y_meas; //Observed response
  vector<lower=0>[N] y_noise; //Standard error of response
  vector[N] x; //Explanatory variable
}

parameters {
  real alpha; //Intercept
  real beta; //Slope
  real<lower=0> sigma; //Standard deviation
}

transformed parameters {
  vector[N] mu; //Expected response from model
  mu = alpha + beta * x; //Model
}

model {
  target += std_normal_lpdf(alpha); //Standard normal intercept prior
  target += std_normal_lpdf(beta); //Standard normal slope prior
  target += exponential_lpdf(sigma | 1); //Exponential prior for sigma
  target += normal_lpdf(y_meas | mu, sqrt(square(sigma) + square(y_noise))); //Marginal likelihood; latent term integrated out
}

The way I read what you wrote was:

y_{meas, i} \sim N(\mu_i + \phi_i, y_{noise, i})
\phi_i \sim N(0, \sigma)
y_{noise, i} \sim \mathrm{Gamma}(10, 1)

You’re passing in y_{noise, i} as data, so that’s known (the Gamma(10, 1) is just how it was simulated). Because \phi_i is normal, this is the same as saying:

y_{meas, i} \sim N(\mu_i, \sqrt{\sigma^2 + y_{noise, i}^2})

This is what you get from integrating \phi_i out of the likelihood: the variances of the two independent normal terms add, so the standard deviations combine in quadrature rather than summing.
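
A quick numerical check in R (just a sketch with arbitrary constants) shows the variances, not the standard deviations, adding:

set.seed(1)
mu <- 2; sigma <- 1.5; tau <- 3 #tau plays the role of a single y_noise value
phi <- rnorm(1e6, mean = 0, sd = sigma)
y <- rnorm(1e6, mean = mu + phi, sd = tau)
sd(y) #Approximately sqrt(sigma^2 + tau^2) = 3.354, not sigma + tau = 4.5
sqrt(sigma^2 + tau^2)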


Oh wow. That’s a really interesting way of approaching the problem that I’ve not seen before, and I really like it. Both the approach and the results make sense, which is great! I also like the idea of using a gamma distribution for the measurement uncertainty when it isn’t known.
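
For instance, if y_noise weren’t available, a rough sketch of that idea (untested; note that a single common measurement standard deviation is confounded with sigma here, so the informative gamma prior is doing real work) might look like:

data {
  int<lower=0> N; //Number of observations
  vector[N] y_meas; //Observed response
  vector[N] x; //Explanatory variable
}

parameters {
  real alpha; //Intercept
  real beta; //Slope
  real<lower=0> sigma; //Relationship standard deviation
  real<lower=0> y_noise_est; //Common measurement sd, now a parameter (hypothetical name)
}

model {
  vector[N] mu = alpha + beta * x; //Model
  target += std_normal_lpdf(alpha);
  target += std_normal_lpdf(beta);
  target += exponential_lpdf(sigma | 1);
  target += gamma_lpdf(y_noise_est | 10, 1); //Informative prior; only weakly identified by the data
  target += normal_lpdf(y_meas | mu, sqrt(square(sigma) + square(y_noise_est)));
}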

Building on the code I’ve already posted, here’s my take:

data {
  int<lower=0> N; //Number of observations
  vector[N] y_meas; //Observed response
  vector<lower=0>[N] y_noise; //Standard error of response
  vector[N] x; //Explanatory variable
}

parameters {
  real alpha; //Intercept
  real beta; //Slope
  real<lower=0> sigma; //Standard deviation
}

transformed parameters {
  vector[N] mu;
  mu = alpha + beta * x;
}

model {
  //Priors
  target += std_normal_lpdf(alpha);
  target += std_normal_lpdf(beta);
  target += exponential_lpdf(sigma | 1);
  //Likelihood
  target += normal_lpdf(y_meas | mu, sqrt(square(sigma) + square(y_noise)));
}

generated quantities {
  vector[N] pred; //Predictions
  vector[N] log_lik; //Log-likelihood

  for(n in 1:N) {
    pred[n] = normal_rng(mu[n], sqrt(square(sigma) + square(y_noise[n])));
    log_lik[n] = normal_lpdf(y_meas[n] | mu[n], sqrt(square(sigma) + square(y_noise[n])));
  }
}

The resulting samples (stored in model_samples; fitting code not shown) do a good job of describing both the measured data and the “true” data from which the measured samples were drawn, which is really neat.

gather_draws(model_samples, pred[n]) %>%
  group_by(n) %>%
  summarise(y_pred = quantile(.value, probs = 0.5),
    y_lo = quantile(.value, probs = 0.05),
    y_hi = quantile(.value, probs = 0.95)) %>%
  full_join(tibble(Measured = y_meas) %>%
      mutate(n = 1:n())) %>%
  full_join(tibble(True = y_true) %>%
      mutate(n = 1:n())) %>%
  pivot_longer(Measured:True, names_to = "source") %>%
  ggplot(aes(y_pred, value)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_errorbar(aes(ymin = y_lo, ymax = y_hi)) +
  geom_point(size = 3, shape = 21, fill = "gray", color = "black") +
  facet_wrap(~ source, strip.position = "left") +
  labs(y = NULL,
    x = "Predicted response",
    title = "Measured (relationship + measurement uncertainty) vs True (relationship uncertainty only)") +
  theme_bw() +
  theme(strip.background = element_blank(),
    strip.placement = "outside")

Also notable is that this version of the model does marginally better than a version that does not include measurement uncertainty (using LOOIC from the {loo} package; the comparison is sketched below), which is what I would expect.
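
The comparison followed the usual {loo} workflow, something like this (a sketch; model_samples_plain is a placeholder name for the fit without measurement uncertainty):

library(loo)
ll_uncertain <- extract_log_lik(model_samples, merge_chains = FALSE)
loo_uncertain <- loo(ll_uncertain, r_eff = relative_eff(exp(ll_uncertain)))
ll_plain <- extract_log_lik(model_samples_plain, merge_chains = FALSE)
loo_plain <- loo(ll_plain, r_eff = relative_eff(exp(ll_plain)))
loo_compare(loo_uncertain, loo_plain)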

Thank you @bbbales2!


You can also optimise this a little with the normal_id_glm distribution (linear regression), specified like so:

 target += normal_id_glm_lpdf(y_meas | x, alpha, beta, sqrt(square(sigma) + square(y_noise)));

You’ll just need to change the declaration of x to matrix[N, 1] and beta to vector[1], and pass x in accordingly (see below).
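
On the R side, the only change is passing x in as a one-column matrix, e.g.:

stan_data_uncertain$x <- matrix(x, ncol = 1)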


Just came across that distribution while reading through some other code. Neat solution! Thank you.