I’m trying to model several teams competing in a group tournament. I’m fairly new to Stan, and I’m struggling to understand how to build a model beyond the simple examples I could find.

Given: the number of points won by each team by the end of the tournament (one point per win).
Estimate: the relative strength of the teams.

This is what I’ve got so far (using the rstan package):

stan(
  model_code = "
    data {
      int num_teams;
      int points[num_teams];
    }
    transformed data {
      int outcome[num_teams, num_teams];
    }
    parameters {
      vector[num_teams] strength;
    }
    model {
      strength ~ normal(0, 1);
      for (i in 1:num_teams) {
        for (j in 1:num_teams) {
          outcome[i, j] ~ bernoulli_logit(strength[i] - strength[j]);
        }
      }
      for (i in 1:num_teams) {
        points[i] = 0;
        for (j in 1:num_teams) {
          if (i != j) points[i] = points[i] + outcome[i, j]
        }
      }
    }
  ",
  data = list(
    num_teams = 4,
    points = c(6, 4, 2, 0)
  )
)

I’m getting this error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Cannot assign to variable outside of declaration block; left-hand-side variable origin=data
error in 'model2294209945cd_ded447960a469e2e3ca50ae670299991' at line 25, column 27
-------------------------------------------------
23:
24: for (i in 1:num_teams) {
25: points[i] = 0;
^
26:
-------------------------------------------------
PARSER EXPECTED: <expression assignable to left-hand side>
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'ded447960a469e2e3ca50ae670299991' due to the above error.

It looks like I cannot assign values to anything that comes from the data block. But how should I go about building this model, then?

Could someone nudge me in the right direction, please?

I cannot understand why you are changing the value of your input data (the variable points) in the model block. What are you trying to do?
Moreover, in the transformed data block you are only declaring a 4x4 int array without assigning any values to it.
So, on my PC, every element of the array is set to a default value (-2147483648).

Stan requires a log density. It’s not clear from that sketch what log density you’re after, because it’s not clear where match_outcome is coming from. Is that observed or unknown? Also, ~ sum(...) isn’t a distribution. Is that an exact requirement?

Stan doesn’t allow discrete parameters, so if the match outcome is unknown, then it has to be marginalized out.

The “more sophisticated” model I’m trying to build should only have total_points in the data block.

The model should output a posterior distribution for the strength parameter of each team.

The tricky part is that there’s no direct relationship (or, at least, none that I can easily model) between strength and total_points. Instead, I’ve got a relationship between strength and individual match outcomes, with total_points simply being the sum of the latter (positive outcome = 1 point). This is what I tried to express as total_points ~ sum(match_outcome), which is, of course, not valid syntax.

Basically, I need a model that’s fitted to aggregate counts, and I can’t figure out how to insert that extra step.

Then you need to marginalize out the discrete parameters. That is, if you only observe the totals and the points behind them are latent and discrete, you sum over all the ways the individual teams can make points that add up to each observed total.
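
To make "sum over all the ways" concrete, here is a brute-force sketch in plain Python (outside Stan) of what the marginal likelihood looks like. It rests on assumptions the thread does not confirm: a double round-robin (each ordered pair of teams plays one game, which is what 4 teams with totals 6, 4, 2, 0 suggest), no draws, and a logistic win probability driven by the strength difference. The names win_prob and log_marginal_likelihood are made up for this illustration.

```python
import itertools
import math


def win_prob(s_i, s_j):
    """P(team i beats team j) under a logistic model on the strength difference."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))


def log_marginal_likelihood(strength, points):
    """Log P(points | strength), summed over every consistent set of results.

    Assumption: double round-robin, i.e. each ordered pair (i, j) with i != j
    is one game, and the winner of each game gets one point.
    """
    n = len(strength)
    games = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = 0.0
    # Enumerate all 2^len(games) outcome vectors (1 means i beat j).
    for outcome in itertools.product((0, 1), repeat=len(games)):
        tally = [0] * n
        prob = 1.0
        for (i, j), first_won in zip(games, outcome):
            p = win_prob(strength[i], strength[j])
            if first_won:
                tally[i] += 1
                prob *= p
            else:
                tally[j] += 1
                prob *= 1.0 - p
        # Only outcome sets that reproduce the observed totals contribute.
        if tally == list(points):
            total += prob
    return math.log(total) if total > 0 else float("-inf")


print(log_marginal_likelihood([1.0, 0.3, -0.3, -1.0], [6, 4, 2, 0]))
```

The enumeration grows as 2 to the number of games, so this only works for tiny tournaments. It is meant to show which quantity a Stan program would have to add to the log density, not how to compute it efficiently.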

You could also model everything continuously. But you have to deal with variable bounds for different game contributions and there will be a lot of latent parameters.

I think it might help to write down the generative model for what you’re doing. How would you simulate from the model? I don’t understand why you’re counting points, and also modeling a logistic regression for match outcome (that’s basically a Bradley-Terry model in the middle).
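
One way to answer the "how would you simulate" question is to write the simulation out. The sketch below is in plain Python and is only a guess at the intended generative story: strengths drawn from the normal(0, 1) prior in the original post, every ordered pair of teams playing one game (an assumption), the winner decided by a logistic function of the strength difference, and only the point totals reported. The function name simulate_tournament is hypothetical.

```python
import math
import random


def simulate_tournament(strength, rng):
    """Simulate one double round-robin and return each team's point total."""
    n = len(strength)
    points = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Probability that team i beats team j in this game.
            p = 1.0 / (1.0 + math.exp(-(strength[i] - strength[j])))
            if rng.random() < p:
                points[i] += 1
            else:
                points[j] += 1
    return points


rng = random.Random(1)
# Step 1: draw strengths from the normal(0, 1) prior used in the question.
strength = [rng.gauss(0.0, 1.0) for _ in range(4)]
# Step 2: play every game. Step 3: keep only the totals, as in the data.
print(simulate_tournament(strength, rng))
```

Written this way, the match outcomes are clearly intermediate quantities that are generated and then thrown away, which is why they end up as latent discrete variables when only the totals are observed.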

I can help you do the marginalization, but you have to meet me in the middle by writing down the density for the model with the latent discrete parameters.

No, you can’t calculate means of posteriors within Stan. Stan calculates a log density, and the algorithms on top of Stan do things like sample from the posterior. With a sample from the posterior, you can then compute expectations like means.
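
As a small illustration of that last step, here is how an expectation is estimated once draws are available. This is plain Python, and the draws are fake stand-ins generated on the spot, not output from any fitted model; with rstan, the real draws would typically be pulled out of the fit object with extract().

```python
import random
import statistics

# Stand-in for posterior draws of one team's strength. These are made-up
# numbers for illustration only; real draws come from the sampler's output.
rng = random.Random(0)
draws = [rng.gauss(0.8, 0.3) for _ in range(4000)]

# Monte Carlo estimate of the posterior mean E[strength].
posterior_mean = statistics.fmean(draws)
# The same idea gives any other expectation, e.g. P(strength > 0).
prob_positive = sum(d > 0 for d in draws) / len(draws)
print(posterior_mean, prob_positive)
```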

I think what you’re trying to do is fit a Bradley-Terry model, which is a piece of cake in Stan. First,