Need help building a two-stage model (beginner)

Cervus1983 · March 26, 2018, 5:29pm

I’m trying to model several teams competing in a group tournament. I’m fairly new to Stan, and I’m struggling to understand how to build a model beyond simple examples I could find.

Given: number of points won by each team by the end of the tournament (one point for win).
Estimate: relative strength of teams.

This is what I’ve got so far (using rstan package):

stan(
	model_code = "
		data {
			int num_teams;
			int points[num_teams];
		}

		transformed data {
			int outcome[num_teams, num_teams];
		}
		
		parameters {
			vector[num_teams] strength;
		}

		model {
			strength ~ normal(0, 1);

			for (i in 1:num_teams) {
				for (j in 1:num_teams) {
					outcome[i, j] ~ bernoulli_logit(strength[i] - strength[j]);
				}
			}

			for (i in 1:num_teams) {
				points[i] = 0;

				for (j in 1:num_teams) {
					if (i != j) points[i] = points[i] + outcome[i, j]
				}
			}
		}
	",
	data = list(
		num_teams = 4,
		points = c(6, 4, 2, 0)
	)
)

I’m getting this error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Cannot assign to variable outside of declaration block; left-hand-side variable origin=data
  error in 'model2294209945cd_ded447960a469e2e3ca50ae670299991' at line 25, column 27
  -------------------------------------------------
    23: 
    24: 			for (i in 1:num_teams) {
    25: 				points[i] = 0;
                                  ^
    26: 
  -------------------------------------------------

PARSER EXPECTED: <expression assignable to left-hand side>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'ded447960a469e2e3ca50ae670299991' due to the above error.

Looks like I cannot assign values to whatever comes from data block. But how should I go about building this model then?

Could someone nudge me in the right direction, please?

fabio · March 27, 2018, 12:35pm

maybe try:

for (i in 1:num_teams) {
  points[i] = 0;
    for (j in 1:num_teams) { 
       if (i != j) {
          points[i] = points[i] + outcome[i, j];
       }
    }
}

I just try to fix the syntax error and improving readability (my readability)… I don’t know if your model is correct or not…

Cervus1983 · March 27, 2018, 2:06pm

Thanks, but this doesn’t seem to change the error message:

Cannot assign to variable outside of declaration block; left-hand-side variable origin=data

fabio · March 27, 2018, 2:30pm

I cannot understand why you are changing the value of your input data (the variable point ) in the model block. what are you trying to do?
Moreover, in the transformed data block you are just declaring a matrix of int(4x4) without assigning any values to it.
So, on my PC, every value of the matrix elements is set by default (-2147483648)

Cervus1983 · March 27, 2018, 2:53pm

I’m trying to move away from a simple model:

data { match_outcome }
parameters { team_strength }
model {
  team_strength ~ normal()
  match_outcome ~ bernoulli_logit(team_strength[i] - team_strength[j])
}

to something more sophisticated:

data { total_points }
parameters { team_strength }
model {
  strength ~ normal()
  match_outcome ~ bernoulli_logit(team_strength[i] - team_strength[j])
  total_points ~ sum(match_outcome)
}

I can’t understand how I’m supposed to add that extra step.

Bob_Carpenter · March 27, 2018, 6:55pm

Stan requires a log density. It’s not clear from that sketch what log density your’e after, because it’s not clear where match_outcome is coming from. Is that observed or unknown? Also, ~ sum(...) isn’t a distribution. Is that an exact requirement?

Stan doesn’t allow discrete parameters, so if the match outcome is unknown, then it has to be marginalized out.

Cervus1983 · March 27, 2018, 8:39pm

The “more sophisticated” model I’m trying to build should only have total_points in data block.

The model should output posterior distribution for strength parameter for each team.

The tricky part is that there’s no direct relationship (or, at least, none that I can easily model) between strength and total_points. Instead, I’ve got a relationship between strength and individual match outcomes, with total_points simply being sum of the latter (positive outcome = 1 point). This is what I tried to express as total_points ~ sum(match_outcome), which is, of course, not appropriate syntax.

Basically, I need a model that’s fitted to aggregate counts, and I can’t figure out how to insert that extra step.

Bob_Carpenter · March 28, 2018, 4:45am

Then you need to marginalize out the discrete parameters. That is, sum over all the ways the individual teams can make points that add up to the total if you only observe the total and there’s a latent discrete number of points for each team.

You could also model everything continuously. But you have to deal with variable bounds for different game contributions and there will be a lot of latent parameters.

I think it might help to write down the generative model for what you’re doing. How would you simulate from the model? I don’t understand why you’re counting points, and also modeling a logistic regression for match outcome (that’s basically a Bradley-Terry model in the middle).

Cervus1983 · March 28, 2018, 7:44am

Is there an example out there I could use? I’m not sure how to start implementing this in code.

Cervus1983 · March 28, 2018, 8:17am

I’ve read Chapter15. Latent Discrete Parameters of the reference manual, but it’s still unclear to me how to approach implementing my model.

Bob_Carpenter · March 30, 2018, 8:27pm

I can help you do the marginalization, but you have to meet me in the middle by writing down the density for the model with the latent discrete parameters.

Cervus1983 · April 2, 2018, 9:04am

Sorry, I just don’t know how to do it. I don’t even know if it’s possible.

To belatedly answer your previous question:

Having estimated strength of each team (as mean of posterior distribution):

outcome <- matrix(
	rep(0, num_teams^2),
	nrow = num_teams,
	ncol = num_teams
)

for (i in 1:num_teams) {
	for (j in 1:i) {
		if (i != j) {
			# probability of team [i] beating team [j]
			win_probability <- exp(strength[i] - strength[j]) / (exp(strength[i] - strength[j]) + 1)
			
			# simulate outcome (1 point for win, 0 for loss)
			outcome[i, j] <- runif(1) < win_probability
			outcome[j, i] <- 1 - outcome[i, j]
		}
	}
}

points <- rep(0, num_teams)

for (i in 1:num_teams) {
	points[i] <- sum(outcome[i, ])
}

Does this make it a bit clearer what I’m trying to do? Is this something that can be easily modeled in Stan?

Bob_Carpenter · April 6, 2018, 7:52pm

No, you can’t calculate means of posteriors within Stan. Stan calculates a log density and then the algorithms on top of Stan do things like sample from the posterior, then with a sample from the posterior, you can compute expectations like means.

I think what you’re trying to do is fit a Bradley-Terry model, which is a piece of cake in Stan. First,

inv_logit(u) = exp(u) / (1 + exp(u))

so you don’t need to do all that yourself.

Here are Stan implementations of Bradley-Terry for a case study.

In particular, you can start with this Stan program for Bradley-Terry that is completely self-contained.

Topic		Replies	Views
Doubts when modelling the probability of a team beating another team Modeling specification	3	600	November 23, 2020
Bradley-Terry model for teams Modeling	1	102	July 11, 2024
One-off STAN tutoring for a small research project Jobs	9	676	April 15, 2020
Stan coding difficulty in lognormal race model Modeling techniques , specification , cognitive-science	6	508	January 22, 2021
Stan model to fit ranks to match data like Bradley & Terry (1952)? General example-models	3	1335	May 21, 2018

Need help building a two-stage model (beginner)

Related topics