Consider that one wants to declare a 2D array in the data block as:
data {
int<lower=1> T;
real x[T,8];
If all the columns have as bound 0, I can add the argument <lower=0> in it. However, what if a certain column does not have as bound 0. How can I specify different bounds to certain columns in 2D arrays?. Is that possible?
No. In the data block you would have to make 8 different declarations and then glue them together as real x[T,8]; in the transformed data block, which is probably not worthwhile just to check that the data are input correctly. Alternatively, you could pass x to data unbounded and check the bounds yourself in the transformed data block using a reject statement.
Thank you for your answer. Might you give me a hint of how the code for gluing the 8 variables would look like?. I got confused by the type of one dimensional container that I should use (real, vector or row_vector) . How do I index them to use a âforâ loop in the transformed data block?
data {
int<lower=1> T;
// add bounds on these as necessary
real x1[T];
real x2[T];
real x3[T];
real x4[T];
real x5[T];
real x6[T];
real x7[T];
real x8[T];
}
transformed data {
real x[8,T];
x[1] = x1;
x[2] = x2;
x[3] = x3;
x[4] = x4;
x[5] = x5;
x[6] = x6;
x[7] = x7;
x[8] = x8;
}
Thanks, now I got what you meant by not worthwhile just to check that data are loaded correctly.
I thought that by doing so, I would constraint the generated quantities to donât be taken under that threshold.
Re phrasing my entire question, what could I do to accomplish such a thing?
To be more precise, I am estimating parameters in an ODE system, and my generated quantities (x_hat[1,1], [2,1] and so on) should not be under a certain bound. Is that kind of constraining possible?
Is it legitimate to build in soft restraints ? Like
// Set your lower bound, min_val to something
// Set bad_scale to some number which probably scales with the
// number of data points.
for (i in 1:maxi) {
for ( j in 1:maxj) {
real bad = min_val - x_hat[i,j];
if (bad > 0) {
target += - bad * bad * bad_scale;
}
}
}
Wonât this have the desired effect of dissuading the sampler from visiting values that are smaller than you want ?
Obviously it makes a mess of the cost function, but only when your parameters are in the forbidden region.
The way to dissuade the sampler from visiting values that are smaller than you want is to formulate a hard constraint if itâs a logical constraint (like probabilities falling between 0 and 1 or correlations between -1 and 1 or covariance matrices being positive definite) and as a soft constraint with a prior if itâs not.
Trying to hack terms into the density like this is almost always going to be a bad idea as it destroys the generative nature of whatever the underlying model is.
Based on the answer from Andrew, I did a modification that has helped a little bit:
In my transformed parameters block, I defined a sum of squares differences as:
x_hat = integrate_ode_rk45(cbf, x0_1, t0, ts, theta, y_r, y_i,1.0E-6, 1.0E-6, 2.0E6); //generated quantities
for (t in 1:T)
for (n in 1:8)
ssd[t,n] = (xn[t,n] - x_hat[t,n])^2
And in my model block:
for (t in 1:T)
xn[t]~normal(x_hat[t],sigma);
for (t in 1:T)
for (n in 1:8)
target += - lambda*ssd[t,n];
So, from your comments, if I am not mistaken, what I am defining is some sort of prior for the SSD, that will be a soft constraint. And btw, sorry if my questions are naive, but by that in my model block, is not some sort of double definition of SSDâs?
Not sure what you mean by double definition. You usually only want to give each parameter a single prior (if you give it more than one, they multiply (or add on the log scale)).
I am already defining that my data xn is going to be normally distributed with mean x_hat, and error sigma. So, by defining this SSD, am I giving sigma another prior?. In other words, a question of very basic concepts (sorry again), is the expression above equivalent to an OLS regression?
Yes, youâre providing two priors for xn that way. The log posterior (up to an additive constant) is defined by adding together the result of all the sampling, target increment, and Jacobians for constraint transforms.
OLS is just an ad hoc way of generalizing maximum likelihood. For linear regression, theyâre equivalent. We donât do least squares algorithms, just direct max likelihood estimation and posterior sampling (and variational inference).
The statement
for (t in 1:T)
xn[t]~normal(x_hat[t],sigma);
or more efficiently,
xn ~ normal(x_hat, sigma);
adds the following to the likelihood:
SUM_{n in 1:N} log normal(xn[n] | x_hat[n], sigma)
If you work out the math, thereâs a quadratic term in (xn[n] - x_hat[n]) / sigma)^2.
P.S. We also donât recommend tabs in code.
P.P.S. Also wouldnât recommend using x_hat[n] for a parameterâwe usually use âhatâ for estimators.
thanks, now is clearer. However, adding this extra prior to xn does not mean it is wrong, does not it? I ask this, because by adding it has improved the fit of the experimental data.
P.S. My code is a little Frankenstein made up of parts of examples, so notation and indentation needs to be polished, my bad
So are you trying to hand code a normal prior here? BecauseâŚ
Good point, but let me ask another question. I donât think one can say
bad ~ normal(0, some_constant);
The aim is to say that each x_hat should not go below a min_x_hat.
Should one write ?
if (x_hat < min_x_hat)
bad = min_x_hat - x_hat;
else
bad = 0.;
bad ~ normal (0, some_constant);
It looks like it works numerically. Even the derivative goes to zero as x_hat approaches the minimum. It does not look like a valid probability, since it is like a half-gaussian, which goes flat at the top.
It turns it into a tighter prior. You can work out the specifics in some cases, or you can do the multiplcation and plot it to see what kind of prior youâre actually imposing.
or by the other hand, could I play around with the prior xn~normal(x_hat,sigma) in terms of tighten sigma up (I am using sigma~cauchy(0,2.5)), would that give a similar effect or am just digressing too much?
The product of two normal densities is another normal density, so itâd be equivalent if you found the right scaling. This post from John Cook is very clear on the subject and goes through the math of calculating what the result ing variance is (though keep in mind Stan works with scale, not variance):