Bounds on a specific column of a 2D array

Hi everyone, I have the following question:

Consider that one wants to declare a 2D array in the data block as:

data {
  int<lower=1> T;
  real x[T, 8];
}

If all the columns have 0 as a lower bound, I can add <lower=0> to the declaration. However, what if a certain column does not have 0 as its bound? How can I specify different bounds for specific columns of a 2D array? Is that possible?

Thanks in advance.

No. In the data block you would have to make 8 different declarations and then glue them together as real x[T,8]; in the transformed data block, which is probably not worthwhile just to check that the data are input correctly. Alternatively, you could declare x unbounded in the data block and check the bounds yourself in the transformed data block with a reject statement.
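A minimal sketch of that reject-statement check, assuming (hypothetically) that only column 3 needs a lower bound of 0:

data {
  int<lower=1> T;
  real x[T, 8];   // left unbounded on input
}
transformed data {
  for (t in 1:T)
    if (x[t, 3] < 0)
      reject("x[", t, ", 3] must be nonnegative; found ", x[t, 3]);
}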

Thank you for your answer. Could you give me a hint of what the code for gluing the 8 variables would look like? I got confused about the type of one-dimensional container I should use (real, vector, or row_vector). How do I index them in a ‘for’ loop in the transformed data block?

data {
  int<lower=1> T;
  // add bounds on these as necessary
  real x1[T];
  real x2[T];
  real x3[T];
  real x4[T];
  real x5[T];
  real x6[T];
  real x7[T];
  real x8[T];
}
transformed data {
  real x[8,T];
  x[1] = x1;
  x[2] = x2;
  x[3] = x3;
  x[4] = x4;
  x[5] = x5;
  x[6] = x6;
  x[7] = x7;
  x[8] = x8;
}

Thanks, now I get what you meant about it not being worthwhile just to check that the data are loaded correctly.

I thought that by doing so, I would constrain the generated quantities so that they are not taken below that threshold.

Rephrasing my entire question: what could I do to accomplish such a thing?

To be more precise, I am estimating the parameters of an ODE system, and my generated quantities (x_hat[1,1], x_hat[2,1], and so on) should not fall below a certain bound. Is that kind of constraint possible?

No

Is it legitimate to build in soft constraints? Like

// Set your lower bound, min_val, to something.
// Set bad_scale to some number which probably scales with the
// number of data points.

for (i in 1:maxi) {
  for (j in 1:maxj) {
    real bad = min_val - x_hat[i, j];
    if (bad > 0) {
      target += -bad * bad * bad_scale;
    }
  }
}

Won’t this have the desired effect of dissuading the sampler from visiting values that are smaller than you want? Obviously it makes a mess of the cost function, but only when your parameters are in the forbidden region.


Thanks, that worked perfectly for my problem.

The way to dissuade the sampler from visiting values that are smaller than you want is to formulate it as a hard constraint if it’s a logical constraint (like probabilities falling between 0 and 1, correlations between -1 and 1, or covariance matrices being positive definite) and as a soft constraint with a prior if it’s not.

Trying to hack terms into the density like this is almost always going to be a bad idea as it destroys the generative nature of whatever the underlying model is.

There’s a chapter in the manual explaining the differences between real arrays, vectors, and row vectors, and the factors motivating the choice of each.
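For example, the glue above could just as well use vectors and a matrix; here is a sketch with only two of the eight columns, under the same assumptions as the earlier code:

data {
  int<lower=1> T;
  vector[T] x1;   // add bounds as necessary, e.g. vector<lower=0>[T] x1;
  vector[T] x2;
}
transformed data {
  matrix[2, T] x;
  x[1] = x1';   // a matrix row is a row_vector, so transpose the column vector
  x[2] = x2';
}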

So are you trying to hand code a normal prior here? Because

target += -bad^2 * bad_scale;

gives the same result as

bad ~ normal(0, 1 / sqrt(2 * bad_scale));
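Matching the quadratic coefficients (treating bad_scale as data),

$$
\texttt{bad\_scale}\cdot\texttt{bad}^{2} \;=\; \frac{\texttt{bad}^{2}}{2\sigma^{2}}
\quad\Longleftrightarrow\quad
\sigma \;=\; \frac{1}{\sqrt{2\,\texttt{bad\_scale}}}.
$$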

That is, your bad_scale isn’t a scale in the sense of a location/scale distribution like the normal.

Either way, it’s a data-dependent prior, which can disrupt some theoretical results, such as the data dominating the prior asymptotically.

Based on the answer from Andrew, I did a modification that has helped a little bit:

In my transformed parameters block, I defined a sum of squared differences as:

		x_hat = integrate_ode_rk45(cbf, x0_1, t0, ts, theta, y_r, y_i,1.0E-6, 1.0E-6, 2.0E6); //generated quantities
		for (t in 1:T)
		  for (n in 1:8)
			ssd[t,n] = (xn[t,n] - x_hat[t,n])^2;

And in my model block:

		for (t in 1:T)
			xn[t]~normal(x_hat[t],sigma);
		for (t in 1:T)
			for (n in 1:8)
				target += - lambda*ssd[t,n];

So, from your comments, if I am not mistaken, what I am defining is some sort of prior on the SSD that acts as a soft constraint. And by the way, sorry if my questions are naive, but isn’t that, in my model block, some sort of double definition of the SSDs?

Not sure what you mean by double definition. You usually only want to give each parameter a single prior (if you give it more than one, they multiply (or add on the log scale)).

I mean, in this part:

		for (t in 1:T)
			xn[t]~normal(x_hat[t],sigma);

I am already defining that my data xn are normally distributed with mean x_hat and error sigma. So, by defining this SSD, am I giving sigma another prior? In other words, as a matter of very basic concepts (sorry again), is the expression above equivalent to an OLS regression?

Yes, you’re providing two priors for xn that way. The log posterior (up to an additive constant) is defined by adding together the result of all the sampling, target increment, and Jacobians for constraint transforms.
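Concretely, with your two statements above (and lambda treated as data), each (t, n) pair contributes

$$
-\frac{(\texttt{xn}_{t,n} - \texttt{x\_hat}_{t,n})^{2}}{2\sigma^{2}}
\;-\;
\lambda\,(\texttt{xn}_{t,n} - \texttt{x\_hat}_{t,n})^{2}
\;+\;\text{const},
$$

which is the same quadratic you would get from a single normal statement with scale $(1/\sigma^{2} + 2\lambda)^{-1/2}$.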

OLS is just an ad hoc way of generalizing maximum likelihood. For linear regression, they’re equivalent. We don’t do least squares algorithms, just direct max likelihood estimation and posterior sampling (and variational inference).
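For a generic linear regression $y_n \sim \text{normal}(x_n^{\top}\beta, \sigma)$ with fixed $\sigma$, the two objectives have the same maximizer:

$$
\arg\min_{\beta}\sum_{n=1}^{N}(y_n - x_n^{\top}\beta)^{2}
\;=\;
\arg\max_{\beta}\sum_{n=1}^{N}\log \text{normal}(y_n \mid x_n^{\top}\beta, \sigma).
$$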

The statement

for (t in 1:T)
  xn[t]~normal(x_hat[t],sigma);

or more efficiently,

xn ~ normal(x_hat, sigma);

adds the following to the likelihood:

SUM_{n in 1:N}  log normal(xn[n] | x_hat[n], sigma)

If you work out the math, there’s a quadratic term in ((xn[n] - x_hat[n]) / sigma)^2.
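Explicitly, each term in the sum above is

$$
\log \text{normal}(\texttt{xn}_{n} \mid \texttt{x\_hat}_{n}, \sigma)
= -\frac{1}{2}\left(\frac{\texttt{xn}_{n} - \texttt{x\_hat}_{n}}{\sigma}\right)^{2}
- \log\sigma - \frac{1}{2}\log(2\pi).
$$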

P.S. We also don’t recommend tabs in code.

P.P.S. Also wouldn’t recommend using x_hat[n] for a parameter—we usually use “hat” for estimators.

Thanks, now it is clearer. However, adding this extra prior to xn does not mean it is wrong, does it? I ask this because adding it has improved the fit to the experimental data.

P.S. My code is a little Frankenstein made up of parts of examples, so the notation and indentation need to be polished, my bad.

So are you trying to hand code a normal prior here? Because…

Good point, but let me ask another question. I don’t think one can say

bad ~ normal(0, some_constant);

The aim is to say that each x_hat should not go below a min_x_hat. Should one write something like this?

real bad;
if (x_hat < min_x_hat)
  bad = min_x_hat - x_hat;
else
  bad = 0;
bad ~ normal(0, some_constant);

It looks like it works numerically. Even the derivative goes to zero as x_hat approaches the minimum. It does not look like a valid probability density, since it is like a half-Gaussian that goes flat at the top.
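In other words, dropping constants, that snippet adds the following term to the target:

$$
-\frac{1}{2}\left(\frac{\max(0,\;\texttt{min\_x\_hat} - \texttt{x\_hat})}{\texttt{some\_constant}}\right)^{2},
$$

which is flat above min_x_hat and half-Gaussian below it.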

It turns it into a tighter prior. You can work out the specifics in some cases, or you can do the multiplication and plot it to see what kind of prior you’re actually imposing.


Or, on the other hand, could I play around with the prior xn ~ normal(x_hat, sigma) by tightening sigma up (I am using sigma ~ cauchy(0, 2.5))? Would that give a similar effect, or am I just digressing too much?

The product of two normal densities is proportional to another normal density, so it’d be equivalent if you found the right scaling. This post from John Cook is very clear on the subject and goes through the math of calculating what the resulting variance is (though keep in mind Stan works with scale, not variance):

https://www.johndcook.com/blog/2012/10/29/product-of-normal-pdfs/
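The key result there: the product of $\text{normal}(x \mid \mu_1, \sigma_1)$ and $\text{normal}(x \mid \mu_2, \sigma_2)$ densities is proportional to $\text{normal}(x \mid \mu, \sigma)$ with

$$
\frac{1}{\sigma^{2}} = \frac{1}{\sigma_1^{2}} + \frac{1}{\sigma_2^{2}},
\qquad
\mu = \sigma^{2}\left(\frac{\mu_1}{\sigma_1^{2}} + \frac{\mu_2}{\sigma_2^{2}}\right).
$$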
