Seeing "Warning: the parameter ... has no priors" when I shouldn't

The following model gives the (incorrect) warning “The parameter x_m has no priors” in pystan3. It seems to be due to the data d_m having dimension 2*ell+1, whereas the parameter x_m has dimension Nm, even though that is actually defined to be the same quantity.

data {
  int ell;    // number of data = Nm = 2*ell+1;
  vector[2] d_m[2*ell+1];  
}

transformed data {
  int Nm = 2*ell+1; // useful constant. Too bad I can't use it in d_m above.
}

parameters { 
  vector[2] x_m[Nm];
}

model {
  for (i in 1:Nm) {
      d_m[i] ~ normal(x_m[i], 1);
      x_m[i] ~ normal(0, 1);
  }
}

The problem goes away if I just make Nm into data and drop all references to ell. (N.B. the actual model is somewhat more complicated; this is an MWE.)

Not a big deal, but this feels like a bug, or at least an inaccurate warning.

Relatedly, is there any way to declare a constant which depends on the data, in such a way that it can actually be used in the data block so that we could replace vector[2] d_m[2*ell+1] with vector[2] d_m[Nm]?

What interface to Stan are you using?

Sorry – pystan3, as now mentioned in the question.

That warning comes from stanc3.

Can you just declare "int Nm=3;"?

So in my case the “natural” variable for the problem is ell, even though it results in arrays of size 2*ell+1, which I also declare to be the value of the transformed data, Nm. Of course I can indeed just use it either by never actually referring to Nm or by never referring to ell itself (and instead giving Nm as data).

But it seems like there is no real reason why the compiler shouldn’t be able to create transformed data-like constants directly in the data block, at least if they are only used for purposes like array sizes, and in this case it could then keep track of the possibly-different array sizes which are actually the same.

It’s because the two blocks are used for different purposes: the data block is for interfacing with externally provided data, while the transformed data block is for manipulations of data. Because of this, the two blocks transpile to different C++ with different purposes.

Thanks, @andrjohns for that explanation. But I think my feature request still stands!

The data block can already have declarations that depend on the data — I am just requesting the ability to capture those calculations in a name. I don’t think allowing this expressiveness would break the internal code model of Stan (but of course I could be wrong!).

My code above is already a case where Stan is forcing me to violate “DRY”: I need to repeat the expression 2*ell+1 at least once, and even then it gives me a warning unless I am willing to repeat it everywhere the current code uses Nm (which is more times in the more complicated production version). Alternatively, I can of course just provide Nm as data instead of ell, but this moves the model a further step away from the way it is actually expressed “scientifically” (and I can certainly think of cases where various different functions of ell appear in other data dimensions). All of this can of course lead to bugs in my code (and it’s part of Stan’s job to make it harder to write programs with bugs!).
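For what it's worth, here is a minimal sketch of the "provide Nm as data" workaround on the Python side, assuming the model is changed to declare Nm (rather than ell) in its data block. The idea is that ell stays the natural input and 2*ell+1 is written exactly once, outside Stan. The helper name make_stan_data and the example values are hypothetical and not part of pystan. The Stan program keeps the array syntax used above; newer Stan releases write it as array[Nm] vector[2] d_m.

```python
# Hypothetical helper (not part of pystan): keep ell as the "scientific"
# input and derive the array size from it in a single place.
def make_stan_data(ell, d_m):
    """Build the data dict for a model that declares Nm (not ell) as data."""
    if ell < 0:
        raise ValueError("ell must be non-negative")
    Nm = 2 * ell + 1  # the only place this expression is written
    if len(d_m) != Nm:
        raise ValueError(f"expected {Nm} rows in d_m, got {len(d_m)}")
    if any(len(row) != 2 for row in d_m):
        raise ValueError("each element of d_m must have length 2")
    return {"Nm": Nm, "d_m": [[float(v) for v in row] for row in d_m]}


# Same model as above, but with Nm supplied as data, so the data and the
# parameter are both sized by the same name.
program_code = """
data {
  int<lower=1> Nm;
  vector[2] d_m[Nm];
}
parameters {
  vector[2] x_m[Nm];
}
model {
  for (i in 1:Nm) {
    d_m[i] ~ normal(x_m[i], 1);
    x_m[i] ~ normal(0, 1);
  }
}
"""

# Made-up example values for ell = 1, i.e. Nm = 3.
data = make_stan_data(1, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(data["Nm"])

# With pystan3 installed, program_code and data would then be handed to
# stan.build(program_code, data=data).
```

This does not address the feature request itself. It only moves the repeated expression out of the Stan program, and the length check catches a mismatch between ell and d_m before the model is built.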


I wholeheartedly agree with this feature request.