Bounds on a specific column of a 2D array

data {
  int<lower=1> T;
  // add bounds on these as necessary
  real x1[T];
  real x2[T];
  real x3[T];
  real x4[T];
  real x5[T];
  real x6[T];
  real x7[T];
  real x8[T];
}
transformed data {
  real x[8,T];
  x[1] = x1;
  x[2] = x2;
  x[3] = x3;
  x[4] = x4;
  x[5] = x5;
  x[6] = x6;
  x[7] = x7;
  x[8] = x8;
}
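
For example, if x3 were known to be nonnegative (just as an illustration), its declaration in the data block would become:

real<lower=0> x3[T];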

Thanks, now I understand what you meant about it not being worthwhile just to check that the data are loaded correctly.

I thought that by doing so, I would constrain the generated quantities so that they could not fall below that threshold.

Rephrasing my entire question: what could I do to accomplish such a thing?

To be more precise, I am estimating the parameters of an ODE system, and my generated quantities (x_hat[1,1], x_hat[2,1], and so on) should not fall below a certain bound. Is that kind of constraint possible?

No

Is it legitimate to build in soft constraints? Like:

// Set your lower bound min_val to something.
// Set bad_scale to some number, which probably scales with the
// number of data points.

for (i in 1:maxi) {
  for (j in 1:maxj) {
    real bad = min_val - x_hat[i,j];
    if (bad > 0) {
      target += -bad * bad * bad_scale;
    }
  }
}

Won’t this have the desired effect of dissuading the sampler from visiting values that are smaller than you want?
Obviously it makes a mess of the cost function, but only when your parameters are in the forbidden region.

Thanks, that worked perfectly for my problem.

The way to dissuade the sampler from visiting values that are smaller than you want is to formulate it as a hard constraint if it’s a logical constraint (like probabilities falling between 0 and 1, correlations between -1 and 1, or covariance matrices being positive definite), and as a soft constraint with a prior if it’s not.

Trying to hack terms into the density like this is almost always going to be a bad idea as it destroys the generative nature of whatever the underlying model is.
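
As a minimal sketch of the two options (the names min_val, alpha, and beta here are made up for illustration, not taken from any model in this thread):

data {
  real min_val;
}
parameters {
  real<lower=min_val> alpha;  // hard constraint: the sampler can never propose alpha < min_val
  real beta;                  // unconstrained
}
model {
  // soft constraint: prior mass concentrated above min_val, but smaller values remain possible
  beta ~ normal(min_val + 1, 0.5);
}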

There’s a chapter in the manual explaining the differences between real arrays, vectors, and row vectors, and the factors motivating the choice of each.
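
For a quick illustration of those container types (declarations only; T as in the data block above):

real x[T];        // real array: general-purpose container, no matrix arithmetic
vector[T] v;      // (column) vector: works with matrix/vector arithmetic and vectorized functions
row_vector[T] r;  // row vector: e.g. what indexing a single row of a matrix returns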

So are you trying to hand code a normal prior here? Because

target += -bad^2 * bad_scale;

gives the same result as

bad ~ normal(0, 1 / sqrt(2 * bad_scale));

That is, your bad_scale isn’t a scale in the sense of a location/scale distribution like the normal.

Either way, it’s a data-dependent prior, which can disrupt some theoretical results, such as the data dominating the prior asymptotically.

Based on the answer from Andrew, I did a modification that has helped a little bit:

In my transformed parameters block, I defined a sum of squared differences as:

		x_hat = integrate_ode_rk45(cbf, x0_1, t0, ts, theta, y_r, y_i,1.0E-6, 1.0E-6, 2.0E6); //generated quantities
		for (t in 1:T)
		  for (n in 1:8)
			ssd[t,n] = (xn[t,n] - x_hat[t,n])^2;

And in my model block:

		for (t in 1:T)
			xn[t]~normal(x_hat[t],sigma);
		for (t in 1:T)
			for (n in 1:8)
				target += - lambda*ssd[t,n];

So, from your comments, if I am not mistaken, what I am defining is some sort of prior for the SSD, which acts as a soft constraint. And by the way, sorry if my questions are naive, but isn’t that, in my model block, some sort of double definition of the SSDs?

Not sure what you mean by double definition. You usually only want to give each parameter a single prior (if you give it more than one, they multiply (or add on the log scale)).
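
As a small illustration of what happens when a variable gets more than one distribution statement (theta here is just a stand-in parameter):

theta ~ normal(0, 1);
theta ~ normal(0, 2);

is the same, up to constant terms, as

target += normal_lpdf(theta | 0, 1) + normal_lpdf(theta | 0, 2);

that is, the two densities multiply, so their log densities add.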

I mean, in the part of

		for (t in 1:T)
			xn[t]~normal(x_hat[t],sigma);

I am already saying that my data xn are normally distributed with mean x_hat and error sigma. So, by defining this SSD term, am I giving sigma another prior? In other words (a very basic conceptual question, sorry again), is the expression above equivalent to an OLS regression?

Yes, you’re providing two priors for xn that way. The log posterior (up to an additive constant) is defined by adding together the result of all the sampling, target increment, and Jacobians for constraint transforms.

OLS is just an ad hoc way of generalizing maximum likelihood. For linear regression, they’re equivalent. We don’t do least squares algorithms, just direct max likelihood estimation and posterior sampling (and variational inference).

The statement

for (t in 1:T)
  xn[t]~normal(x_hat[t],sigma);

or more efficiently,

xn ~ normal(x_hat, sigma);

adds the following to the log likelihood:

SUM_{t in 1:T}  log normal(xn[t] | x_hat[t], sigma)

If you work out the math, there’s a quadratic term in ((xn[t] - x_hat[t]) / sigma)^2.
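
Spelling that out with the standard expansion of the normal log density (same notation as above):

log normal(xn[t] | x_hat[t], sigma)
  = -1/2 * ((xn[t] - x_hat[t]) / sigma)^2 - log(sigma) - 1/2 * log(2 * pi)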

P.S. We also don’t recommend tabs in code.

P.P.S. Also wouldn’t recommend using x_hat[n] for a parameter—we usually use “hat” for estimators.

Thanks, now it is clearer. However, adding this extra prior to xn does not mean it is wrong, does it? I ask because adding it has improved the fit to the experimental data.

P.S. My code is a little Frankenstein made up of parts of examples, so the notation and indentation need to be polished, my bad.

So are you trying to hand code a normal prior here? Because…

Good point, but let me ask another question. I don’t think one can say

bad ~ normal(0, some_constant);

The aim is to say that each x_hat should not go below a min_x_hat.
Should one write the following?

if (x_hat < min_x_hat)
    bad = min_x_hat - x_hat;
else
    bad = 0;
bad ~ normal(0, some_constant);

It looks like it works numerically. Even the derivative goes to zero as x_hat approaches the minimum. It does not look like a valid probability density, since it is like a half-Gaussian that goes flat at the top.

It turns it into a tighter prior. You can work out the specifics in some cases, or you can do the multiplication and plot it to see what kind of prior you’re actually imposing.

Or, on the other hand, could I play around with the prior xn ~ normal(x_hat, sigma) by tightening sigma up (I am using sigma ~ cauchy(0, 2.5))? Would that give a similar effect, or am I just digressing too much?

The product of two normal densities is another normal density, so it’d be equivalent if you found the right scaling. This post from John Cook is very clear on the subject and goes through the math of calculating what the resulting variance is (though keep in mind Stan works with scale, not variance):

https://www.johndcook.com/blog/2012/10/29/product-of-normal-pdfs/
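
For quick reference, the result from that post, restated in terms of scales rather than variances: as a function of x,

normal(x | mu1, sigma1) * normal(x | mu2, sigma2)

is proportional to normal(x | mu, sigma) with

1 / sigma^2 = 1 / sigma1^2 + 1 / sigma2^2
mu = sigma^2 * (mu1 / sigma1^2 + mu2 / sigma2^2)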

Hi again. First, I would like to thank you for your advice; I really do appreciate it. One remaining question: in one of your answers to this post you point out that target += -bad^2 * bad_scale; gives the same result as the bad ~ normal(...) statement.

This is something that I don’t get. Let’s say that bad_scale is 70 (just as an example); then bad would get bad ~ normal(0, 0.0845), i.e. a scale of 1 / sqrt(2 * 70)? Is that the case? Because if it is, I tried it in my approach of giving a prior to the sum of squared differences in my problem (which is the only thing that seems to help improve the fit to the experimental data), and it simply doesn’t work.

Thanks in advance.

Is bad defined as

bad = x_min - x

or

if (x < x_min)
    bad = x_min - x;
else
    bad = 0;

? If you are imposing a lower bound, you might want the second version.

Stan only cares about the log density, not how it’s defined. If bad_scale is a constant, then

log normal(bad | 0, 1 / sqrt(2 * bad_scale))

  = -1/2 * [(bad - 0) / (1 / sqrt(2 * bad_scale))]^2 + const

  = -1/2 * bad^2 * 2 * bad_scale + const

  = -bad^2 * bad_scale + const,

which is exactly what target += -bad^2 * bad_scale adds.

Stan doesn’t care about the constants.
