Stan says "no priors" in model

Stan doesn’t see any priors here, but I clearly do have a beta prior on theta…

data {
  int<lower=0> N; // number of observations
  int<lower=0, upper=100000> k[N]; // observed number of successes
  int<lower=0, upper=100000> n[N]; // observed number of trials  
  real<lower=0, upper=10> alpha; // hyperparameter for beta prior
  real<lower=0, upper=10> beta; // hyperparameter for beta prior
}

parameters {
  real<lower=0, upper=1> theta; // probability of success
}

model {
  // priors
  theta ~ beta(alpha, beta);
  
  // likelihood
  for (i in 1:N){
      k[i] ~ binomial(n[i], theta);
  }
  
}
"""

And yet I get this warning: "Warning: The parameter theta has no priors. This means either no prior is provided, or the prior(s) depend on data variables. In the later case, this may be a false positive."


This warning is produced by "pedantic mode", which is known to have false positives. In particular, it doesn't count a ~ statement that involves variables from the data block as a prior, even though such a statement clearly can be one. You can safely ignore this warning.
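For example, a prior with literal hyperparameters should be recognized, so a line like

theta ~ beta(2, 2);  // literals only: pedantic mode counts this as a prior

should produce no warning (the 2, 2 here is just an illustration), whereas theta ~ beta(alpha, beta) with alpha and beta coming from the data block trips the heuristic.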

Are you using PyStan? I believe that is the only interface which enables pedantic mode by default.

Indeed I am using PyStan. I ignored this warning and plowed ahead, getting this error:

RuntimeError: Exception during call to services function:
ValueError("Initialization failed. Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: binomial_lpmf: Successes variable is 1199, but must be in the interval [0, 1051]
(in '/tmp/httpstan_jd0xv6_k/model_i7bmbwl7.stan', line 20, column 6 to column 35)
Rejecting initial value: ...")

traceback:
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/httpstan/services_stub.py", line 185, in call
    future.result()
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)

Stan seems to believe that the successes variable maxes out at 1051, even though I declared an upper bound of 100,000, so I'm a little confused.

The reason I started with the build warning is that I thought it was upstream of this sampling error.
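(For reference, the warning shows up when I call stan.build, while the error above comes from posterior.sample, so the two come from different stages. A minimal sketch of my calls, assuming PyStan 3, with program_code and data as placeholders for the model string and data dict:

import stan

posterior = stan.build(program_code, data=data)  # pedantic-mode warnings are printed here
fit = posterior.sample(num_chains=4)             # initialization/rejection errors surface here
)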

I don’t see where you’ve defined this in the model. You state that n and k are both at most 100,000, but that doesn’t imply any relationship between k and n.

Is the error due to a relationship between k and n? I interpreted the error as simply a problem with k, one that the 100k upper limit should cover.

And I’ve validated that k[i] < n[i] for all i in observations.

The error you’re getting is thrown when the first argument to binomial_lpmf is greater than the second, by this line: math/binomial_lpmf.hpp at 6cd15b88d16dbace9ee4ce9d27997901159c44e7 · stan-dev/math · GitHub. In the vectorized call, each k[i] is checked against the interval [0, n[i]], which is why the message reports 1051 (a value of n) rather than your declared upper bound of 100,000. So it seems like an issue with the input data rather than the model code.

You can encode the constraint k[i] <= n[i] directly in the model by changing your data block:

  int<lower=0, upper=100000> n[N]; // observed number of trials
  int<lower=0, upper=n> k[N]; // observed number of successes
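Putting it together, the revised data block might look like this (a sketch; note that n must be declared before k so the bound is in scope):

data {
  int<lower=0> N;                   // number of observations
  real<lower=0, upper=10> alpha;    // hyperparameter for beta prior
  real<lower=0, upper=10> beta;     // hyperparameter for beta prior
  int<lower=0, upper=100000> n[N];  // observed number of trials
  int<lower=0, upper=n> k[N];       // observed number of successes
}

With this constraint in place, Stan should reject any input where k[i] > n[i] as soon as the data is read, with a message that points at the offending data value rather than at binomial_lpmf.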

Yeah, no rows were returned when I checked df[df['k'] > df['n']]. Data seems fine! Maybe using a Jupyter notebook on AWS is the issue. I haven’t had a problem before, but who knows?
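It might still be worth ruling out a mismatch between the DataFrame and what actually gets passed to Stan: the df check can pass while the arrays handed to stan.build are misaligned, e.g. built from a different sort or filter of the data. A sketch of asserting on the exact objects passed in (df is the DataFrame from this thread; the alpha/beta values are placeholders):

import numpy as np

k = df['k'].to_numpy()
n = df['n'].to_numpy()
assert len(k) == len(n)
assert np.all(k <= n), "some k[i] exceeds its n[i]"

data = {"N": len(k), "n": n, "k": k, "alpha": 2.0, "beta": 2.0}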

Can we just remove this rule? We very often have users who define their prior parameters in the data block, so it’s going to produce a massive number of false positives for all of those models.

I believe @rybern had a suggestion for how to improve it, but I can’t find it now. If I recall correctly, having some way of differentiating between prior and likelihood was important for many of the pedantic-mode analyses, and “does it touch data” was chosen as the criterion.

That’s right - Pedantic Mode guesses what’s a prior based on what touches variables in data, so it’ll get confused when data variables are actually hyperparameters like this. Ideally we’d have something like an annotation or separate block to distinguish ‘true’ data variables, but alas.

One option would be to mention this issue in the warning message.


The current text (… or the prior(s) depend on data variables. In the later case, this may be a false positive.) was an attempt at exactly that, but it might still come off too strongly as a warning.


Oh nice, I didn’t notice! It makes sense to me to keep the warning with this wording or similar. We could consider making it even softer by saying “The parameter theta may have no priors.”