Why exactly is alpha + beta * x ~ normal(y, sigma) wrong? Is it even wrong?

Long time lurker, first time poster. All the normal disclaimers - I looked around to see if I could find something similar and didn’t, so apologies if I missed another post, or famous blog entry, or even something in the Stan manual.

But folks, this is vexing me.

I was working with someone new to Stan on a model and I saw them write something like this in the model block:

alpha + beta * x ~ normal(y, sigma);

And of course, that stood out to me as obviously “wrong.” But here’s the thing:
I ran it both ways on some simulated data, and it … works?
stan_mdl1.csv (3.7 KB)

Model 1: The Right Way
Here’s the model from the guide.

  data {
    int N;
    vector[N] y;
    vector[N] x;
  }
  parameters {
    real alpha;
    real beta;
    real<lower = 0> sigma;
  }
  model {
    y ~ normal(alpha + beta * x, sigma);
  }

Model 2: The Wrong Way?

  data {
    int N;
    vector[N] y;
    vector[N] x;
  }
  parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
  }
  model {
    alpha + beta * x ~ normal(y, sigma);
  }

Now Model 2 kicks the warning about a possible non-linear transform on the left-hand side of the sampling statement. But I know this transform isn’t non-linear (and I could provide a Jacobian adjustment if it were!).

And the punch line is that the estimates from both models are nearly identical.

So, here are my questions:
1a.) Is there a good reason to prefer Model 1 over Model 2?

1b.) Would we say that Model 2 is ‘using data (y) in the prior (for alpha and beta)’, and that’s what’s ultimately wrong about it? The usual complaint about doing that is that we’re ‘using the data twice’, and I don’t think that’s happening here?

2.) Is this a trick of a simplified setting (linear, normal, etc.)?

Some related reading for the interested:
Yes, you can include prior information on quantities of interest, not just on parameters in your model

I’m not 100% sure this is exactly the same situation, but you could say that alpha + beta * x is, in fact, a quantity of interest!

Thanks in advance for your thoughts


This is an accident of symmetric distributions: the two statements wind up being the same thing.
Normal distribution model:

y_i \sim \mathcal{N}(\alpha + \beta x_i, \sigma)
p(y_i \mid x_i, \alpha, \beta, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}\frac{(y_i - (\alpha + \beta x_i))^2}{\sigma^2}\right)

What if we switched the roles of y and \hat \mu ?

\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}\frac{((\alpha + \beta x_i) - y_i)^2}{\sigma^2}\right)

Or to make it clearer:

\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}\frac{\delta_i^2}{\sigma^2}\right)
\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}\frac{(-\delta_i)^2}{\sigma^2}\right)
\delta_i = y_i - \hat\mu_i = y_i - (\alpha + \beta x_i)

As a result, they wind up being the same, in this particular case for this particular distribution, because swapping the order of y_i and \hat\mu_i in the normal density function doesn’t matter; it evaluates to the same value.
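You can check the symmetry directly in Stan itself; here’s a small sketch with made-up numbers (not part of either model) that evaluates normal_lpdf both ways:

  generated quantities {
    // made-up numbers purely to illustrate the symmetry
    real lp_forward = normal_lpdf(2.5 | 1.0, 0.7);   // observation first, location second
    real lp_swapped = normal_lpdf(1.0 | 2.5, 0.7);   // roles of observation and location swapped
    // lp_forward and lp_swapped come out identical
  }

Running it (e.g. with the fixed-parameter sampler) prints the same value for both.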


Perhaps worth noting that we use this identity in Gaussian measurement error models. Suppose our measurement yields a Gaussian posterior distribution for y; i.e. y \sim \mathcal{N}(\mu, \sigma). Note that y is unobserved. Thus, in the measurement error model we treat \mu as data and flip the Gaussian:
\mu \sim \mathcal{N}(y, \sigma)
y \sim ...,
where ... is the rest of the model.
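A minimal Stan sketch of that setup, keeping the same symbols (theta and tau below are just placeholders standing in for the rest of the model):

  data {
    int N;
    vector[N] mu;                // measured values, treated as data
    vector<lower=0>[N] sigma;    // known measurement standard deviations
  }
  parameters {
    vector[N] y;                 // unobserved true values
    real theta;                  // placeholder: rest of the model
    real<lower=0> tau;
  }
  model {
    mu ~ normal(y, sigma);       // the flipped Gaussian
    y ~ normal(theta, tau);      // ... the rest of the model
  }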


Thanks to @Stephen_Martin for an excellent answer! And to @jsocolar for the interesting extension.

So, this is a ‘trick’ of symmetric distributions.

I checked it out with a Gamma GLM to confirm.

Model 1: The Right Way - works as expected

  data {
    int N;
    vector[N] y;
    vector[N] x;
  }
  parameters {
    real alpha;
    real beta;
    real<lower = 0> shape;
  }
  model {
    y ~ gamma(shape, shape ./ (alpha + beta * x));
  }

Model 2: The Wrong Way - Divergent transitions, fails to mix, incorrect estimates

  data {
    int N;
    vector[N] y;
    vector[N] x;
  }
  parameters {
    real alpha;
    real beta;
    real<lower = 0> shape;
  }
  model {
    alpha + beta * x ~ gamma(shape, shape ./ y);
  }

stan_mdl2.csv (3.6 KB)
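For contrast with the normal case, a quick check with made-up numbers (just a sketch) shows the gamma density is not symmetric under the same swap:

  generated quantities {
    real shape = 2.0;
    // "right way": observation first, rate = shape / mean
    real lp_right = gamma_lpdf(4.0 | shape, shape / 6.0);
    // "wrong way": mean first, rate = shape / observation
    real lp_wrong = gamma_lpdf(6.0 | shape, shape / 4.0);
    // lp_right and lp_wrong differ, which is why Model 2 goes wrong here
  }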


Also, when in doubt, recall that
a ~ dist(b, c);
is the same as
target += dist_lpdf(a | b, c);
for any continuous distribution “dist”, up to an additive constant that doesn’t depend on the parameters (the ~ form drops normalizing constants it doesn’t need).

The point is that you should always be able to figure out what the code is doing by thinking of these ~ statements as increments to the target log density.
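Applying that to the original question, the model block of Model 2 could equally be written as

  model {
    target += normal_lpdf(alpha + beta * x | y, sigma);
  }

which makes it explicit that the term being added is the same normal term in (y - (alpha + beta * x)) and sigma that Model 1 adds, up to a constant that doesn’t depend on the parameters.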
