Preferably one layer lower in stan-dev/stan (that is, not in stan-dev/cmdstan). If it goes in the cmdstan repo, none of the other interfaces will see it.
This is currently a mess in our code; what winds up happening is that people implement these things multiple times, once for each interface.
Pulling them up to the C++ level in stan-dev/stan would mean providing functions that all the interfaces could use.
Isn’t there just one control variate per parameter?
What does this mean? Generated quantities is a separate block in Stan that’s executed with double
types using pseudo-RNGs, not during sampling. Their values are output like parameters, but they’re not deterministic functions of the parameters the way the transformed parameters are.
Where what wouldn’t be possible? That was the first sentence in the paragraph.
Maybe it’ll help to be specific. If we want to evaluate \textrm{Pr}[\beta > 0 \mid x, y] where \beta is a parameter, x the covariates, and y the observed data, then we would have something like this:
data {
  int<lower = 0> N;
  vector[N] x;
  vector[N] y;
  int<lower = 0> N_tilde;
  vector[N_tilde] x_tilde;
}
parameters {
  real alpha;
  real beta;
  real<lower = 0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
  { alpha, beta, sigma } ~ normal(0, 2);
}
generated quantities {
  int<lower = 0, upper = 1> beta_gt_0 = (beta > 0);  // event probability
  array[N_tilde] real y_tilde
    = normal_rng(alpha + beta * x_tilde, sigma);     // predictions
}
The first generated quantity variable beta_gt_0
is for the event probability: its posterior mean will be the event probability \textrm{Pr}[\beta > 0 \mid x, y]. The second is for posterior prediction. We are very interested in posterior means for beta_gt_0
and interested in estimates for the various y_tilde[n].
I don’t see how to use control variates to help with either estimate. But they’re very often the focus of estimation, especially in predictive settings.
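For concreteness, here is the textbook control-variate construction in a plain Monte Carlo setting. This is an illustrative Python sketch on synthetic draws, not anything in Stan’s code: to estimate E[f(X)], subtract a coefficient times a correlated quantity h(X) whose mean is known.

```python
# Textbook control-variate estimator on synthetic draws; everything here is
# illustrative, not Stan internals. We estimate E[f(X)] for f(x) = exp(x),
# X ~ normal(0, 1), using h(x) = x as the control (its mean, 0, is known).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)   # stand-in for posterior draws
f = np.exp(x)                 # quantity of interest; true mean is exp(0.5)
h = x                         # control variate with known mean mu_h
mu_h = 0.0

# Optimal coefficient c = cov(f, h) / var(h) minimizes the variance of
# f - c * (h - mu_h); the adjusted estimator remains unbiased for E[f].
c = np.cov(f, h)[0, 1] / np.var(h, ddof=1)
cv_estimate = np.mean(f - c * (h - mu_h))
plain_estimate = np.mean(f)
```

The sticking point above is exactly that for an indicator like beta_gt_0 or an RNG-based y_tilde, it is not clear what plays the role of h with a known mean.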
Our users care about estimates of their parameters in the constrained space. The unconstrained space we use for sampling is just a convenience and not something users ever see. All the transforms are implicit results of defining constrained variables. For example, the sigma
parameter above is constrained so that sigma > 0. The user does not want an estimate of log sigma, they want an estimate of sigma.
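As a toy illustration of why the constrained scale is the one that matters (Python sketch with hypothetical draws; the numbers are made up): exponentiating the mean of log sigma is not the mean of sigma.

```python
# Hypothetical draws of log(sigma) on the unconstrained scale (numbers made
# up for illustration). The estimate users want is the posterior mean of
# sigma itself, not the exponential of the mean of log(sigma).
import numpy as np

rng = np.random.default_rng(0)
log_sigma = rng.normal(loc=0.0, scale=0.5, size=100_000)
sigma = np.exp(log_sigma)            # back to the constrained scale

mean_sigma = sigma.mean()            # estimate of E[sigma | y]: what users want
naive = np.exp(log_sigma.mean())     # exp of the unconstrained mean

# Jensen's inequality: E[exp(Z)] > exp(E[Z]), so the two genuinely differ.
```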
Derivatives w.r.t. transformed parameters are available internally, but as @bgoodri notes, not yet exposed anywhere.
Probably. People often want good parameter estimates on the constrained scale, and that includes transformed parameters. Some generated quantities, like beta_gt_0
above, are really just transformed parameters (though they may be discrete); others involve random number generation.
The existential question is the one @andrewgelman raised: when would control variates be useful given that we can’t even diagnose convergence until we have an ESS of at least 400 (100 in each of 4 chains, where 100 seems to be about the lower limit of ESS that we can reliably estimate; this last bit on reliable estimation is hearsay from the reliable source of @avehtari [but don’t blame Aki if I misremembered])?
So the real question is when we need ESS >> 400. For most of our applied stats problems, an ESS of 400 is fine, as it implies a standard error of 1/20th of a posterior standard deviation.
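The 1/20th figure is just this arithmetic:

```python
# The Monte Carlo standard error of a posterior-mean estimate is
# sd / sqrt(ESS), so in units of the posterior sd it is 1 / sqrt(ESS).
import math

ess = 400
mcse_in_sd_units = 1 / math.sqrt(ess)   # 1/sqrt(400) = 1/20 = 0.05
```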