Weighting observations and arviz.loo



I’m trying to model a problem where I have reason to believe that certain observations matter more than others (or, put another way, I consider some observations to be more important in the overall final fit than others). I found another post that recommended replacing:

for (i in 1:N_obs) {
    pred[i] = a[i] + b[i] * x[i];
    mean_y[i] ~ normal(pred[i], sigma[i]);
}

with:


for (i in 1:N_obs) {
    pred[i] = a[i] + b[i] * x[i];
    target += normal_lpdf(mean_y[i] | pred[i], sigma[i]) * weight[i];
}

In my case, I’m using:

win ~ bernoulli_logit(win_chance_logit);

which suggests switching to

for (n in 1:N_obs) {
    target += bernoulli_logit_lpmf(win[n] | win_chance_logit[n]) * weight[n];
}

(Note that the vectorized bernoulli_logit_lpmf returns a single summed value, so the per-observation weights have to be applied inside a loop.)

However, I’m also capturing log likelihoods and generated data in order to use arviz.loo:

generated quantities {
    vector[NG] log_lik;
    vector[NG] win_hat;

    for (n in 1:NG) {
        log_lik[n] = bernoulli_logit_lpmf(win[n] | win_chance_logit[n]);
        win_hat[n] = bernoulli_logit_rng(win_chance_logit[n]);
    }
}
What would the right way to weight those be in order for arviz.loo to respect the importance of each observation? Should I just multiply the bernoulli_logit_lpmf by the weight?
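For concreteness, here is a small NumPy sketch (names hypothetical, not from the thread) of what multiplying the lpmf by a weight produces per observation; `bernoulli_logit_lpmf(y | x)` is `y * x - log(1 + exp(x))`:

```python
import numpy as np

def weighted_log_lik(win, logit, weight):
    """Weighted Bernoulli-logit log-pmf, one value per observation.

    Mirrors target += bernoulli_logit_lpmf(win[n] | logit[n]) * weight[n]:
    log p(y | x) = y * x - log(1 + exp(x)), computed stably with logaddexp.
    """
    win = np.asarray(win, dtype=float)
    logit = np.asarray(logit, dtype=float)
    lp = win * logit - np.logaddexp(0.0, logit)
    return np.asarray(weight) * lp

# At logit 0 each outcome has probability 0.5, so the unweighted
# log-pmf is log(0.5); a weight of 2 doubles that contribution.
print(weighted_log_lik([1, 0], [0.0, 0.0], [1.0, 2.0]))
```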

From a modelling perspective, I understand that maybe I shouldn’t be manually adding weights into my model. However, in this particular case, the parameter I’m focused on understanding is supposed to capture something about the dynamics of a game when it’s played optimally, so I want it to most closely match the performance of the highest-level players. If there’s a way to represent that in my model definition, then I’d love to hear about it, but I couldn’t figure out any way to do it except to weight the observations (the results of games played) based on knowledge of who the top players are.


Yes, you can weight them in the same way as in the model block.


I wouldn’t also weight the bernoulli_logit_rng, right?


No, and you don’t weight win_chance_logit[n] either. If you want draws from the weighted marginal, then after sampling you can subsample from each win_hat[n] according to the weights, but I don’t see what the need for that would be in this case.
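A minimal sketch of that post-hoc subsampling (array shapes and names are assumptions, not from the thread): draw observation indices in proportion to the weights, then take one predictive draw for each sampled observation:

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical posterior predictive draws: S draws x N observations of 0/1.
S, N = 1000, 4
win_hat = rng.integers(0, 2, size=(S, N))
weights = np.array([1.0, 1.0, 2.0, 4.0])

# Sample observation indices proportionally to the normalized weights,
# then pick the corresponding predictive draw from each posterior draw.
obs_idx = rng.choice(N, size=S, p=weights / weights.sum())
weighted_draws = win_hat[np.arange(S), obs_idx]
```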


As I am working with weights in my own model, I would like to know what would be the rationale for that as opposed to arviz.loo.

In my model I am using the generated quantities block for my predictions, so I too was in doubt about whether or not to use weights there as in the model block.

Thanks in advance and for this great explanation.


I don’t understand what you mean by “as opposed to arviz.loo.”


Sorry for the unclear passage. What I would like to know is the rationale for this:

Thanks in advance.


I ended up with this model:

data {
    int<lower=0> NG; // Number of games
    int<lower=0> NM; // Number of matchups
    int<lower=0> NP; // Number of players
    int<lower=0> NC; // Number of characters

    int<lower=0, upper=1> win[NG]; // Did player 1 win game
    int<lower=1, upper=NM> mup[NG]; // Matchup in game
    vector<lower=0, upper=1>[NG] non_mirror; // Is this a mirror matchup: 0 = mirror
    int<lower=1, upper=NC> char1[NG]; // Character 1 in game
    int<lower=1, upper=NC> char2[NG]; // Character 2 in game
    int<lower=1, upper=NP> player1[NG]; // Player 1 in game
    int<lower=1, upper=NP> player2[NG]; // Player 2 in game
    vector[NG] elo_logit; // Player 1 ELO-based logit win chance
    vector[NG] obs_weights;
}
parameters {
    vector[NM] mu; // Matchup value
    vector<upper=0>[NP] char_skill_raw[NC]; // Player skill at character
    vector<lower=-4, upper=0>[NC] char_skill_mean; // Average player skill at character (bounds match the uniform prior)
    vector<lower=0>[NC] char_skill_variance; // Player skill variance per character
    real elo_logit_scale; // elo_logit scale
}
transformed parameters {
    vector[NG] player_char_skill1;
    vector[NG] player_char_skill2;
    vector[NG] win_chance_logit;
    vector<upper=0>[NP] char_skill[NC]; // Player skill at character

    for (c in 1:NC) {
        char_skill[c] = char_skill_mean[c] + char_skill_raw[c] * char_skill_variance[c];
    }
    for (n in 1:NG) {
        player_char_skill1[n] = char_skill[char1[n], player1[n]];
        player_char_skill2[n] = char_skill[char2[n], player2[n]];
    }
    win_chance_logit = (player_char_skill1 - player_char_skill2) + non_mirror .* mu[mup] + elo_logit_scale * elo_logit;
}
model {
    for (c in 1:NC) {
        char_skill_raw[c] ~ std_normal();
    }
    mu ~ normal(0, 0.5);
    elo_logit_scale ~ std_normal();
    char_skill_mean ~ uniform(-4, 0);
    char_skill_variance ~ normal(0, 3);

    for (n in 1:NG) {
        target += bernoulli_logit_lpmf(win[n] | win_chance_logit[n]) * obs_weights[n];
    }
}
generated quantities {
    vector[NG] log_lik;
    vector[NG] win_hat;

    for (n in 1:NG) {
        log_lik[n] = bernoulli_logit_lpmf(win[n] | win_chance_logit[n]) * obs_weights[n];
        win_hat[n] = bernoulli_logit_rng(win_chance_logit[n]);
    }
}
So, I’m weighting the update to target, and I’m weighting the log likelihoods. Does that seem right? Like @realkrantz, I’m a bit confused by “you don’t weight win_chance_logit[n]”.


You don’t weight win_chance_logit[n] in the model block, and you don’t weight win_chance_logit[n] when computing log_lik in generated quantities. Why, then, would you weight it when you use it to generate 0s and 1s from the predictive distribution? Think again about what win_chance_logit[n] and win_hat[n] are, and then ask why you would want to weight either of them.


Is it possible to compare the accuracy of a weighted model with an unweighted one? The values I get from arviz.loo seem to scale differently based on what my weight values are. Do I need to normalize the weights so that they sum to the number of observations?
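One common convention (an assumption here, not something settled in this thread) is to rescale the weights so they sum to the number of observations, which keeps the total weighted log-likelihood on the same scale as an unweighted model with N observations:

```python
import numpy as np

weights = np.array([1.0, 2.0, 4.0, 1.0])

# Rescale so the weights sum to N: relative importance is preserved,
# but the implied "effective number of observations" stays at N.
norm_weights = weights * len(weights) / weights.sum()
print(norm_weights.sum())  # → 4.0
```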


Hi, @avehtari. Thanks for your explanation. My question is about those cases where the weights are part of the generative story.

For these cases, where the use of weights in the model block is reasonable, I would appreciate a rationale for:

Because, like @Calen_Pennington, I am not certain which approach is more appropriate here:


I’m pretty sure you don’t want to weight the predicted values inside generated quantities. In particular, you don’t want to scale y_pred down; you want to scale the likelihood of specific y values in your observations. So you want to scale the probability mass/density functions (to make those observations less likely), but not the output generated by applying the model to make predictions from your existing data.


I think so too. Thanks, @Calen_Pennington, for the input. Very helpful.