Misleading Warning when providing prior parameters as data

PyStan 3 (or stanc) gives a misleading warning, “Warning: The parameter {parameter_name} has no priors.”, when the parameters of the prior are given as data. The warning is misleading because the sampler seems to respect the prior given in the data. The motivation for doing this is to avoid recompiling the model when experimenting with different priors.

Example code:

```python
import stan

# Testing a prior whose hyperparameter m is passed in as data
# instead of being hard-coded in the model block.
stan_model = """
data {
  real m;
}
parameters {
  real mu;
}
model {
  mu ~ normal(m, 1.0);
}
"""

posterior = stan.build(stan_model, data={"m": 42.0})
fit = posterior.sample(num_chains=1, num_samples=1000)
mu_draws = fit.to_frame()["mu"]
print("mean: ", mu_draws.mean())
```

Output:

```
Building: found in cache, done.
Messages from stanc:
Warning: The parameter mu has no priors.
Sampling: 100% (2000/2000), done.
Messages received during sampling:
  Gradient evaluation took 8e-06 seconds
  1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
  Adjust your expectations accordingly!
mean:  41.89910543191657
```

When m is replaced by a literal number, e.g. normal(42.0, 1.0), the warning goes away.

System details:

  • Mac Monterey (M1)
  • Python 3.8.12
  • Pystan 3.4.0
  • Compiler Clang 10.0.0

This seems like a bug in “pedantic mode”, a stanc3 compiler feature that PyStan 3 turns on by default (I think it’s the only interface that currently does so).

If you don’t mind, could you open a bug report on the stan-dev/stanc3 issue tracker on GitHub and share this example there? Thank you!

I looked at the existing issues, and I think mine would be a duplicate of this one: Pedantic-mode missing prior false alarm · Issue #932 · stan-dev/stanc3 · GitHub. So there’s probably no need to open another one? I’ll cross-reference this post there.

Ah, I remember this issue. I’m actually not sure there’s a great way to solve it.

The issue is: how do we define a prior? I’ve defined it as a factor that’s not a likelihood, and I’ve defined a likelihood as a factor that includes, or has a connection to, the data. So in this case, since the factor normal_lpdf(mu | m, 1.0) includes a data variable, it’s recognized as a likelihood, just as if it were normal_lpdf(m | mu, 1.0).
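
To make that rule concrete, here is a minimal Python sketch of it. It is not stanc3’s implementation (stanc3 is written in OCaml and works on a factor graph, where a “connection to the data” can be indirect); the Factor class and the params_without_priors function are hypothetical, and the sketch only checks whether a factor mentions a data variable directly.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Factor:
    """Hypothetical stand-in for one term of the log density."""
    text: str                  # human-readable form, for reference only
    variables: FrozenSet[str]  # every variable the term mentions


def is_likelihood(factor: Factor, data: Set[str]) -> bool:
    # Simplified rule: a factor that mentions any data is a likelihood.
    return bool(factor.variables & data)


def params_without_priors(factors: List[Factor], params: Set[str], data: Set[str]) -> List[str]:
    # A parameter "has a prior" if it appears in at least one non-likelihood factor.
    with_prior: Set[str] = set()
    for factor in factors:
        if not is_likelihood(factor, data):
            with_prior |= factor.variables & params
    return sorted(params - with_prior)


data = {"m"}
params = {"mu"}

# mu ~ normal(m, 1.0): m is data, so the factor is treated as a likelihood.
hyper_as_data = [Factor("normal_lpdf(mu | m, 1.0)", frozenset({"mu", "m"}))]
# mu ~ normal(42.0, 1.0): no data involved, so the factor counts as a prior.
hyper_as_literal = [Factor("normal_lpdf(mu | 42.0, 1.0)", frozenset({"mu"}))]

print(params_without_priors(hyper_as_data, params, data))     # ['mu'] (the false alarm)
print(params_without_priors(hyper_as_literal, params, data))  # []
```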

Ideally we’d be able to distinguish between “modeled” data variables and “unmodeled” data variables/hyperparameters. Then we’d only mark factors as likelihoods if they touched modeled data.

The obvious fix for this example would be to only mark factors as likelihoods if they touch data on the left-hand side of a twiddle, i.e. the first argument of an lpdf, but this is only a partial solution. As soon as you move away from built-in distributions, for example by writing out the normal density by hand or using any other custom density, you’re back to marking the factor as a likelihood.
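
A minimal Python sketch of the left-hand-side rule, assuming a hypothetical Factor representation (not stanc3’s actual data structures) in which built-in distributions record the variables in their first argument. Hand-written densities have no such argument, so they fall back to the “mentions any data” rule, which reproduces the limitation described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Factor:
    """Hypothetical stand-in for one term of the log density."""
    variables: FrozenSet[str]                 # every variable the term mentions
    outcome: Optional[FrozenSet[str]] = None  # first-argument variables; None for custom densities


def is_likelihood(factor: Factor, data: Set[str]) -> bool:
    if factor.outcome is not None:
        # Built-in distribution: only data on the left of the twiddle counts.
        return bool(factor.outcome & data)
    # Custom density: no outcome argument to inspect, so fall back to
    # "mentions any data".
    return bool(factor.variables & data)


data = {"m"}

# mu ~ normal(m, 1.0): m is only a hyperparameter, so this is now a prior.
builtin_prior = Factor(frozenset({"mu", "m"}), outcome=frozenset({"mu"}))
# m ~ normal(mu, 1.0): m is modeled data, so this is a likelihood.
builtin_likelihood = Factor(frozenset({"mu", "m"}), outcome=frozenset({"m"}))
# target += -0.5 * square(mu - m): the same normal written out by hand
# (up to a constant), still classified as a likelihood.
custom = Factor(frozenset({"mu", "m"}))

print(is_likelihood(builtin_prior, data))       # False
print(is_likelihood(builtin_likelihood, data))  # True
print(is_likelihood(custom, data))              # True (the remaining false alarm)
```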

I suppose we could do this left-hand-side detection for built-in distributions and then make the warning’s wording less assertive for any variable that appears in a factor that isn’t a built-in distribution.
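
One way that combined proposal could look, as a Python sketch using a hypothetical Factor representation in which built-in distributions record their first-argument variables. A parameter with no prior factor gets the current assertive warning only when every factor it appears in is a built-in distribution; otherwise the message is hedged. The softer wording is made up for illustration and is not an actual stanc3 message.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Factor:
    """Hypothetical stand-in for one term of the log density."""
    variables: FrozenSet[str]                 # every variable the term mentions
    outcome: Optional[FrozenSet[str]] = None  # first-argument variables; None for custom densities


def is_likelihood(factor: Factor, data: Set[str]) -> bool:
    # Left-hand-side rule for built-ins, "mentions any data" for custom densities.
    if factor.outcome is not None:
        return bool(factor.outcome & data)
    return bool(factor.variables & data)


def prior_warnings(factors: List[Factor], params: Set[str], data: Set[str]) -> List[str]:
    messages = []
    for p in sorted(params):
        mentioned = [f for f in factors if p in f.variables]
        if any(not is_likelihood(f, data) for f in mentioned):
            continue  # p appears in at least one prior factor
        if any(f.outcome is None for f in mentioned):
            # A custom density is involved, so the classification may be wrong.
            messages.append(
                f"The parameter {p} may have no priors (it appears in a factor "
                "that is not a built-in distribution)."
            )
        else:
            messages.append(f"The parameter {p} has no priors.")
    return messages


data = {"m", "y"}
params = {"mu"}

# mu ~ normal(m, 1.0): a prior under the left-hand-side rule, so no warning.
print(prior_warnings([Factor(frozenset({"mu", "m"}), frozenset({"mu"}))], params, data))  # []
# Only y ~ normal(mu, 1.0): mu genuinely has no prior, so the warning stays assertive.
print(prior_warnings([Factor(frozenset({"mu", "y"}), frozenset({"y"}))], params, data))
# Only target += -0.5 * square(mu - m): hedged warning.
print(prior_warnings([Factor(frozenset({"mu", "m"}))], params, data))
```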

This makes sense to me!