Suppose I have a matrix or other array of parameters that I’m trying to fit. However, I know that some (arbitrary) entries of the matrix must be 0.
What is a good approach to handling this?
I have two ideas, but neither seems great:
Have the full matrix in parameters, but multiply it by some masking matrix before using it anywhere in the model (a minimal sketch follows below). This should be mathematically valid, but computationally I’m fitting a bunch of parameters that are completely irrelevant to the model and have zero gradient.
Use some sort of sparse representation. For example, in the CSR representation I would only fit the w vector as parameters; the u and v can in principle be passed as data. This would solve the problem at the cost of additional code, but it still feels a bit clunky, and I’m not sure how good it is computationally. The matrix isn’t actually very sparse in practice.
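For reference, here’s a minimal sketch of the first idea (assuming mask is a 0/1 data matrix; all names are illustrative):

```stan
data {
  int<lower=0> N;
  int<lower=0> M;
  matrix[N, M] mask;  // 1 where the entry is free, 0 where it must be zero
}
parameters {
  matrix[N, M] theta_raw;
}
transformed parameters {
  // elementwise product zeroes out the masked entries; the masked-out
  // entries of theta_raw remain as parameters but get zero gradient
  matrix[N, M] theta = theta_raw .* mask;
}
```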
Furthermore, I will probably want to have the parameters constrained in some way. Say I want to use not a vanilla matrix but something like a row_stochastic_matrix (because I want to interpret each row as the parameter of a categorical/multinomial distribution), together with the above. Then I suppose I’d need to additionally write a custom transformation that does the same kind of logic STAN would do with these constraints on some hidden unconstrained parameters? Or is there a way to combine this with the existing capabilities?
How many of the entries will be zero? If it’s more than half, then I would use the CSR representation. If it’s less than half, you can pass in an array with the indices that are nonzero.
```stan
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0> num_nonzero;
  // holds the (row, column) indices of the nonzero cells, like [(1, 2), (8, 9)]
  array[num_nonzero, 2] int<lower=1> nonzero_idxs;
}
parameters {
  vector[num_nonzero] raw_param;
}
transformed parameters {
  matrix[N, M] param = rep_matrix(0, N, M);
  for (i in 1:num_nonzero) {
    param[nonzero_idxs[i, 1], nonzero_idxs[i, 2]] = raw_param[i];
  }
}
```
But this is a thing I’ve been trying to think about for row- and column-stochastic matrices. We often know that some states cannot change, so it would be nice to tell Stan that those cells are zero when we build the stochastic matrix, via something like a declaration that takes the zero cells as data.
This won’t work with simplexes because the other values won’t add up to one. You can fudge this, but there are better ways.
This is usually best if you can do it.
We discussed this the other day around pinning parameters. To keep it simple, let’s say you have a simplex of size N and know that some (but not all) of the indexes have values of zero. You can code this in Stan as follows.
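A minimal sketch along those lines (the data names are illustrative):

```stan
data {
  int<lower=1> N;
  int<lower=1, upper=N - 1> num_zero;
  // indices of the entries that are *not* pinned to zero
  array[N - num_zero] int<lower=1, upper=N> free_idxs;
}
parameters {
  simplex[N - num_zero] theta_free;  // only the free entries are parameters
}
transformed parameters {
  vector[N] theta = rep_vector(0, N);
  theta[free_idxs] = theta_free;  // everything else stays exactly 0
}
```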
It’s a bit trickier if it’s a simplex with known non-zero values, because those have to be constrained to be non-negative and to not exceed 1 in sum, and then you have to multiply the free simplex by one minus that sum to make everything add up to 1.
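Something along these lines (again a sketch with illustrative names; it assumes the known values sum to less than 1):

```stan
data {
  int<lower=1> N;
  int<lower=1, upper=N - 1> num_known;
  array[num_known] int<lower=1, upper=N> known_idxs;
  vector<lower=0>[num_known] known_vals;  // must sum to less than 1
  array[N - num_known] int<lower=1, upper=N> free_idxs;
}
parameters {
  simplex[N - num_known] theta_free;
}
transformed parameters {
  vector[N] theta;
  theta[known_idxs] = known_vals;
  // scale the free simplex by the leftover mass so theta sums to 1
  theta[free_idxs] = (1 - sum(known_vals)) * theta_free;
}
```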
P.S. Stan is named after Stanislaw Ulam—it’s not an acronym, so we usually don’t write “STAN”.
Yeah, this would indeed be lovely, but for now, I have to work with what we have :)
This seems good for a single simplex, but for a row_stochastic_matrix (or equivalently array[m] simplex[n]) it would be a bit more complicated, because there may be a different number of zero indices for each row, and we don’t have native support for ragged arrays. It could likely be worked around, but at the cost of more code and, more importantly, of obscuring the semantics.
What I would also really like to see is native support for boolean mask indexing (as in NumPy etc.). But I suppose a custom function can be written for that without big issues.
Maybe Steve’s idea of storing just the coordinates would at least read better, and then do some row-wise softmaxing to ensure the constraints. I’m not sure whether there’s a significant difference in performance; I suspect Bob’s native use of simplex may be better.
Do you mean so that you can do something like a[b == 1] to pull out the values of a where the parallel array b has value 1? We really should add something like that—it’s been on our to-do list forever. We can’t literally overload it the way R does.
That is, unfortunately, the cost for now. We’d also like to add ragged arrays. But it’s hard finding volunteers for a project as complicated as Stan.
That can work, but you have to be careful with identifiability, because softmax(x) = softmax(x + c) for any scalar c. So you need either a sum-to-zero constraint or to take the traditional approach of pinning one of the values to zero. We usually prefer the latter because it allows symmetric priors.
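For the ragged row-stochastic case upthread, that pinning approach could look something like this (just a sketch; the data layout and all names are made up):

```stan
data {
  int<lower=1> M;  // number of rows
  int<lower=1> N;  // number of columns
  array[M] int<lower=1, upper=N> num_free;  // non-zero entries per row
  // column indices of the free entries, all rows flattened in order
  array[sum(num_free)] int<lower=1, upper=N> free_cols;
}
parameters {
  // one logit per free entry, minus the one pinned to zero in each row
  vector[sum(num_free) - M] eta;
}
transformed parameters {
  matrix[M, N] theta = rep_matrix(0, M, N);
  {
    int col_pos = 1;  // position in free_cols
    int eta_pos = 1;  // position in eta
    for (m in 1:M) {
      int k = num_free[m];
      // first free entry's logit pinned at 0 for identifiability
      vector[k] probs = softmax(append_row(0, segment(eta, eta_pos, k - 1)));
      for (j in 1:k) {
        theta[m, free_cols[col_pos + j - 1]] = probs[j];
      }
      col_pos += k;
      eta_pos += k - 1;
    }
  }
}
```

Each row of theta then sums to 1 with the remaining cells exactly zero, and priors can go directly on eta.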
Yeah, this is what I mean. Probably native support for booleans would be a prerequisite, since indexing with an integer array is already defined. (That would allow overloads of the logical operators, such as array[] bool operator==(array[] int, int), etc.)
Going a bit off topic here, but I am convinced Stan is amazing enough to be eligible for some serious corporate funding. Look, for example, at Facebook Prophet, which is built on top of Stan (although it’s essentially just a regularized GLM). Facebook is a big company with loads of money, and perhaps it’s in their interest. I believe data scientists working for other corporations could deliver good value to their employers with Stan, which goes hand in hand with Stan being more feature-complete. But my personal impression is that Stan goes under the radar of people who could put it to good use. With appropriate funding, you wouldn’t need to find volunteers (which is always hard) but could simply pay people.
We don’t see how to overload and maintain backward compatibility. Instead, we’d go with a custom indexing operation, along the lines of:
vector index_bool(vector, array[] int idxs)
And we might have to do the same for your array[] bool operator==(array[] int, int). So we could have the functionality, just not with the clean syntax of R.
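In the meantime, something like it can already be written as a user-defined function, with a 0/1 integer array standing in for the boolean mask (a sketch, not an existing built-in):

```stan
functions {
  // return the elements of v where the parallel 0/1 mask is 1
  vector index_bool(vector v, array[] int mask) {
    vector[sum(mask)] out;
    int pos = 1;
    for (i in 1:size(mask)) {
      if (mask[i]) {
        out[pos] = v[i];
        pos += 1;
      }
    }
    return out;
  }
}
```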
I don’t think it’s because it’s under the radar, it’s more because it’s mature. We have tried and are still trying to get money from everywhere (others more than me now that I’m no longer in a soft money position). @andrewgelman is trying to convince one of the big tech companies to give us a few more nickels and dimes right now. The problem is that the people who use it don’t control big budgets within those very rich companies. And these big companies, despite sitting on dragon hoards of cash, do not like to spend money. Even Sean Taylor, who was behind the Prophet project at Facebook, couldn’t drum up money for us. Ditto every other major company. They can sometimes shake loose amounts in the $10–25K range.
I mean, wouldn’t it be possible to also do an overload like array[] T operator[](array[] T, array[] bool)? Then you could get the clean syntax like x[a >= b] etc. I don’t see into the internals, but I would expect the indexing operator to be just another overloadable function.
That would be nice, but we don’t have a bool type or a clean, backward-compatible path to adding one that I can see. So we can’t overload bool and int the way that R does.
If someone has some suggestion on how to add a bool type, I’d love to hear it.