Dear Stan users,
I’ve been considering the so-called matrix balancing problem, which consists of estimating a matrix based only on known row and column totals (x and y, respectively). Rather than estimating matrix elements directly, I estimated row shares (Lambda), for which there is also some prior information available (formulated in terms of Dirichlet priors). The following code illustrates the 2x2 case, for simplicity.

data {
  vector[2] y;       // column totals
  vector[2] x;       // row totals
  vector[2] alpha1;  // Dirichlet prior parameters for row 1 shares
  vector[2] alpha2;  // Dirichlet prior parameters for row 2 shares
  real<lower=0> s;   // error standard deviation
}
parameters {
  simplex[2] Lambda1; // row shares of the unknown matrix (row 1)
  simplex[2] Lambda2; // row shares of the unknown matrix (row 2)
}
model {
  Lambda1 ~ dirichlet(alpha1);
  Lambda2 ~ dirichlet(alpha2);
  y[1] ~ normal(Lambda1[1] * x[1] + Lambda2[1] * x[2], s);
  y[2] ~ normal(Lambda1[2] * x[1] + Lambda2[2] * x[2], s);
}

That formulation (seemingly) worked fine in practice. However, it includes what I consider a workaround, namely normal errors with arbitrarily chosen standard deviations. These standard deviations were set to very small values, so that the estimated column totals match their target values (y) closely, though still only approximately. But in fact these errors are not part of the original estimation problem.
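A quick way to see why a very small s behaves almost like a hard constraint: each normal "observation" contributes a log-density penalty proportional to (err/s)^2 for a constraint violation of size err. A minimal sketch in plain Python (function name hypothetical, constants dropped):

```python
def soft_constraint_logpen(err, s):
    """Log-density contribution (up to an additive constant) that a
    normal 'observation' adds when the constraint is violated by err,
    with standard deviation s."""
    return -0.5 * (err / s) ** 2

# The same violation is penalized far more heavily as s shrinks,
# which is why a tiny s pushes the column totals close to their targets.
print(soft_constraint_logpen(1.0, 1.0))   # -0.5
print(soft_constraint_logpen(1.0, 0.01))  # about -5000
```

So shrinking s tightens the soft constraint, but it never makes the identity exact, and very small s can also make the posterior geometry hard to sample.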

My question is: how can I program the above problem in Stan while avoiding the addition of errors?

That is, instead of assuming “y[1] ~ normal(Lambda1[1]*x[1] + Lambda2[1]*x[2], s)”, I would rather impose a strict identity, such as “y[1] = Lambda1[1]*x[1] + Lambda2[1]*x[2]”, etc.
Note that reverting to an identity does not rule out the need for estimation: the unknown matrix is still underdetermined. Rather, it implies a peculiar likelihood function, which takes the value 1 for any combination of Lambdas satisfying (in matrix notation) y = Lambda’ * x, and 0 otherwise.
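To make the identity concrete with the 2x2 totals used later in this thread (row totals x = (70, 30), column totals y = (60, 40)): once Lambda1[1] is chosen, the column-total identity pins down Lambda2[1] exactly, so only one degree of freedom remains. A sketch in plain Python (function name and numbers illustrative only):

```python
def solve_lambda2_col1(lam1_col1, x=(70.0, 30.0), y=(60.0, 40.0)):
    """Given row 1's first share, the identity
    y[0] = lam1_col1 * x[0] + lam2_col1 * x[1]
    determines row 2's first share exactly."""
    return (y[0] - lam1_col1 * x[0]) / x[1]

print(solve_lambda2_col1(0.6))  # 0.6, since (60 - 0.6*70) / 30 = 18/30
# The simplex constraint 0 <= lam2_col1 <= 1 then restricts
# lam1_col1 to the interval [3/7, 6/7].
```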

I’ll be grateful for your suggestions
Thanks
Jakub

Sorry for the late reply—I’m way behind on Discourse.

What’s going on generatively in the model?

To express constrained parameterizations (like a simplex, where the values need to be positive and sum to one, or a covariance matrix, which must be positive definite), you need to work backward from the fact that Stan requires unconstrained parameters. So what you need to do with constraints is reduce the number of degrees of freedom in your parameterization and then reconstruct what you need. For instance, to get a sum-to-zero vector, you declare a parameter vector of one smaller size, then define a transformed parameter whose last value is the negative sum of the first values, which guarantees the whole vector sums to zero. There’s a reduced number of actual parameters. Then you need to apply the log Jacobian if the transform is non-linear in the parameters.
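The sum-to-zero construction described above can be sketched in plain Python (this particular transform is linear in the parameters, so no Jacobian adjustment would be needed in Stan):

```python
def sum_to_zero(raw):
    """Given N-1 unconstrained values, append the negative of their sum
    so the full length-N vector is constrained to sum to zero."""
    return list(raw) + [-sum(raw)]

v = sum_to_zero([0.3, -1.2, 2.0])
print(len(v))   # 4 values, but only 3 free parameters
print(sum(v))   # 0.0 (up to floating point)
```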

This looks like a tricky thing to solve as it involves all the data x — are you sure there are solutions?

Following your suggestions, I have rewritten (and reparametrized) the illustrative problem. I have also included example numbers for clarity.
(Answering your question: yes, I think there are solutions here, meaning there is an infinite number of matrices satisfying the constraints.)

/*
  Estimate matrix elements a, b, c, d
  given row and column totals:

              a    b | 70
              c    d | 30
           ----------+
             60   40
*/
parameters {
  real<lower=0, upper=60> a;
}
transformed parameters {
  real b;
  real c;
  real d;
  b = 70 - a;
  c = 60 - a;
  d = 30 - c;
}
model {
  a ~ normal(40, 15);
}

Now we have only one free parameter, a, while b, c & d are determined by the constraints, i.e. such that they add up to the known row and column totals.
Imposing a prior (here, normal) distribution on a, we can derive distributions of b, c & d, as the figure below shows.
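As a sanity check, the transformed parameters can be mirrored in plain Python (function name hypothetical) to confirm that any value of a reproduces all four totals exactly:

```python
def fill_matrix(a, row_totals=(70.0, 30.0), col_totals=(60.0, 40.0)):
    """Mirror of the Stan transformed parameters block:
    b, c, d follow deterministically from a."""
    b = row_totals[0] - a
    c = col_totals[0] - a
    d = row_totals[1] - c
    return a, b, c, d

# Every admissible a yields a matrix hitting all four totals.
for a in (30.0, 40.0, 55.0):
    a_, b, c, d = fill_matrix(a)
    assert a_ + b == 70.0 and c + d == 30.0   # row totals
    assert a_ + c == 60.0 and b + d == 40.0   # column totals
```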

OK, in this setting it no longer seems to be Bayesian inference (unless perhaps we stick with the interpretation mentioned in the original post, that the constraints themselves are a peculiar likelihood, taking the value 1 when the constraints are satisfied and 0 otherwise).

Still, I have a few concerns:

I would actually like to assume that a, b, c, d are non-negative. Bounding a to <0, 60> ensures non-negativity of b & c, but not of d. Of course, for this 2x2 example one can easily find the proper bounds <30, 60> for a, but that is not the point, since for bigger or more general problems such derivations will likely be infeasible. Adding explicit zero lower bounds to the transformed parameters leads to sampling problems (initial value rejections and divergent transitions) and much, much slower execution; I presume the sampler bumps against the bounds of the parameter space. Is it therefore legitimate to run the program without zero lower bounds, then simply remove all ‘posterior’ samples containing negative values and use such a ‘filtered’ distribution for inference?
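The mechanics of that filtering idea can be sketched in plain Python (function name hypothetical): sample a under the loose <0, 60> bound, construct b, c, d as in the Stan program, and keep only draws where every cell is non-negative. For this particular example the kept draws turn out to be exactly those with a in <30, 60>, i.e. filtering coincides with the truncation that was hard to derive in general. Whether the filtered draws target the intended truncated posterior is of course the statistical question being asked here; this only checks the mechanics:

```python
import random

random.seed(1)

def draw_filtered(n):
    """Sample a ~ normal(40, 15) restricted to (0, 60), build b, c, d
    deterministically, and keep only fully non-negative matrices."""
    kept = []
    while len(kept) < n:
        a = random.gauss(40.0, 15.0)
        if not (0.0 <= a <= 60.0):
            continue
        b = 70.0 - a
        c = 60.0 - a
        d = 30.0 - c
        if min(b, c, d) >= 0.0:
            kept.append((a, b, c, d))
    return kept

draws = draw_filtered(1000)
# All kept draws satisfy the tighter analytical bound a in <30, 60>.
assert all(a >= 30.0 - 1e-9 for a, b, c, d in draws)
```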

This simplistic problem is in fact an illustration of a more general situation, in which constraints on parameters are formulated as a system of (possibly non-linear) equations. Is it possible to solve such a system in Stan - e.g. in the transformed parameters block - involving both free and transformed parameters, without deriving an explicit analytical solution? In the context of the above example, would it be possible to write the constraints in their ‘structural’ form, i.e. a + b = 70 etc., and solve them numerically?
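For the linear case, the ‘structural’ form can indeed be solved numerically without an analytical derivation. A self-contained sketch in plain Python (a tiny Gaussian elimination; inside Stan itself one would reach for the built-in algebraic solver instead, as noted in the reply below):

```python
def solve_linear(M, rhs):
    """Tiny Gaussian elimination with partial pivoting (pure Python),
    solving M @ z = rhs for square M."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(n):
            if r != i and A[r][i] != 0.0:
                f = A[r][i] / A[i][i]
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
    return [A[i][n] / A[i][i] for i in range(n)]

def structural_solve(a):
    """Constraints in structural form, a+b=70, a+c=60, c+d=30,
    solved numerically for (b, c, d) given the free parameter a."""
    M = [[1.0, 0.0, 0.0],   # b         = 70 - a
         [0.0, 1.0, 0.0],   #    c      = 60 - a
         [0.0, 1.0, 1.0]]   #    c + d  = 30
    rhs = [70.0 - a, 60.0 - a, 30.0]
    return solve_linear(M, rhs)

print(structural_solve(40.0))  # [30.0, 20.0, 10.0]
```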

My previous approach implied too wide a parameter space at the outset (with four free parameters instead of just one), which was then constrained significantly by the ‘sampling distribution’. I believe this hampered execution times etc. On the other hand, it allowed me to avoid analytical derivations of the transformed parameters. Would you consider that approach valid as such?

It’s still Bayesian, but there’s only a single free parameter. You can get Bayesian inference over the others.

But note how few constraints the totals really give you: the row and column totals overlap in the grand total, so an n x n matrix has n^2 unknowns and only 2n - 1 independent constraints. Even in the 2 x 2 case one free parameter remains (your a), and in the 3 x 3 case there are four, so you won’t get unique solutions.
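Counting degrees of freedom makes this concrete: an n x n matrix has n*n unknowns, and the 2n totals include one redundancy because both the row totals and the column totals sum to the same grand total. A one-line check in plain Python (function name hypothetical):

```python
def free_dof(n):
    """Free parameters left in an n x n matrix after imposing row and
    column totals: n*n unknowns minus 2n - 1 independent constraints,
    i.e. (n - 1)^2."""
    return n * n - (2 * n - 1)

print(free_dof(2))  # 1 (the single parameter a in the example above)
print(free_dof(3))  # 4
```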

Yes, you can solve algebraic equations now in Stan, but I’m not sure how effective that’d be at producing constraints. In general, using constraints where they’re valid is much more effective than adding soft constraints.