Rhat > 3 for all parameters in simple latent variable model

tm87 · December 24, 2019, 12:55am

I’m modeling a setting where critics assign scores to films. There are three critics and N films, and each critic assigns a score in [0, 1] to each film. Film n has true score U^{(n)}, and critic i has idiosyncrasy \nu_i. My model is R_i^{(n)} = U^{(n)} + \nu_i + \epsilon, where \epsilon is noise. I’m currently simulating data based on this model and trying to recover the ground truth distributions for my parameters.

My Stan model is:

    data {
        int<lower=0> N;
        vector[N] R_0;
        vector[N] R_1;
        vector[N] R_2;
    }
    parameters {
        vector<lower=0, upper=1>[N] U;
        real nu_0;
        real nu_1;
        real nu_2;
    }
    model {
        // Specify priors
        for (n in 1:N) {
            U[n] ~ uniform(0, 1);
        }
        nu_0 ~ normal(0, 1);
        nu_1 ~ normal(0, 1);
        nu_2 ~ normal(0, 1);

        // Specify model
        for (n in 1:N) {
            R_0[n] ~ normal(U[n] + nu_0, 1e-6);
            R_1[n] ~ normal(U[n] + nu_1, 1e-6);
            R_2[n] ~ normal(U[n] + nu_2, 1e-6);
        }
    }

and the Python code for generating the simulated data is:

    n = 1000
    nu_means = [-0.2, 0, 0.2]

    data = {'N': n}
    u = rng.uniform(0, 1, n)
    for critic_idx, nu_mean in enumerate(nu_means):
        nu = rng.normal(nu_mean, 0.1, n)
        epsilon = rng.normal(0, 1e-6, n)
        data[f'R_{critic_idx}'] = u + nu + epsilon

The posterior means of the \nu variables appear to match the ground truth, but Rhat is greater than 3 for every parameter. I’ve tried adjusting iter, max_treedepth, and adapt_delta based on recommendations given in the output, but I’m not sure what to do next.

Jean_Billie · December 24, 2019, 1:46am

I am not sure your model, but I guess the parameter U should be independent of the samples, namely,

use real<lower=0, upper=1> U;
instead of vector<lower=0, upper=1>[N] U;

I also changed the variance. The critics assign scores to films which changes by films and I think such variation should be represented by the variance of distribution rather than model parameter.
But … I am not sure :)

  data {
        int<lower=0> N;
        vector[N] R_0;
        vector[N] R_1;
        vector[N] R_2;
    }
    parameters {
        real<lower=0, upper=1>U;
        real nu_0;
        real nu_1;
        real nu_2;
    }
    model {
        // Specify priors
  
            U ~ uniform(0, 1);
      
        nu_0 ~ normal(0, 1);
        nu_1 ~ normal(0, 1);
        nu_2 ~ normal(0, 1);

        // Specify model
        for (n in 1:N) {
            R_0[n] ~ normal(U + nu_0, 1);
            R_1[n] ~ normal(U + nu_1, 1);
            R_2[n] ~ normal(U + nu_2, 1);
        }
    }

tm87 · December 24, 2019, 1:51am

Thank you for your comment. Without a separate U^{(n)} for each film, I would be unable to draw any conclusions about a specific film, despite having three reviews for each.

maedoc · December 24, 2019, 4:31am

1e-6 is pretty tight, but in any case you’re simulating n different values of nu while fitting 1, that could multimodal your posterior and raise Rhat.

FJCC · December 24, 2019, 2:54pm

I am not very experienced with Stan or Python, so there are many opportunities for me to misunderstand the situation. Be cautious with my advice!
I read the Python code that generates the data to be using a standard deviation of 0.1 to generate the nu values. This would place a standard deviation of 0.1 on R_0, R_1, and R_2. The standard deviation of 1e-6 on epsilon would be entirely washed out. Yet the sampling statements for the R_i variables only allow for a standard deviation of 1e-6. Since the U[n] value is shared among the three R_i variables and each nu_i is used for all data points, the difference between R_i and U[n] + nu_i will almost always be orders of magnitude larger than 1e-6. The mismatch between the variation in the data and the variation expected by the model results in there being no good fit, I think.

Separately, the model structure allows for R values outside of [0, 1]. Ignoring epsilon for the moment, if R = U + nu and U is near 1 and nu is positive, R will be greater than one. The same problem will happen near zero when nu is negative. Your priors even allow nu to have an absolute value greater than one, which clearly cannot be if both U and R are in [0, 1]. Perhaps this is not a problem with your application.

I invented some data in R and modified your model a bit. Rather than fixing the standard deviation in the sampling statements, I made it a parameter. I also tightened the prior on the nu_i variables. The fit runs with good Rhat and acceptable n_eff, though the number of samples and the accuracy on nu_i are not great. Of course, you could just set the standard deviation in the sampling statements to a fixed but larger value, say 0.1.

I hope that helps a bit. Again, keep in mind that I am no expert.

set.seed(10)
DF <- data.frame(U = runif(50, min = 0, max = 1))
DF$R0 <- DF$U - 0.2 + rnorm(50, mean = 0, sd = 0.1)
DF$R1 <- DF$U + rnorm(50, mean = 0, sd = 0.1)
DF$R2 <- DF$U + 0.1 + rnorm(50, mean = 0, sd = 0.1)
dataList <- list(N = nrow(DF), R_0 = DF$R0, R_1 = DF$R1, R_2 = DF$R2)

data {
  int<lower=0> N;
  vector[N] R_0;
  vector[N] R_1;
  vector[N] R_2;
}
parameters {
  vector<lower=0, upper=1>[N] U;
  real nu_0;
  real nu_1;
  real nu_2;
  real<lower=0, upper=0.3> sig;
}
model {
// Specify priors
  for (n in 1:N) {
    U[n] ~ uniform(0, 1);
  }
  nu_0 ~ normal(0, 0.3);
  nu_1 ~ normal(0, 0.3);
  nu_2 ~ normal(0, 0.3);

// Specify model
  for (n in 1:N) {
    R_0[n] ~ normal(U[n] + nu_0, sig);
    R_1[n] ~ normal(U[n] + nu_1, sig);
    R_2[n] ~ normal(U[n] + nu_2, sig);
  }
}

FIT1 <- stan(model_code = MODEL, data = dataList, chains = 4, iter = 4000)

Jean_Billie · December 24, 2019, 8:21pm

I learn a lot from your comment.
The current model R_i^{(n) }= U^{(n)}+\nu_i + \epsilon
cannot generate data in the interval [0,1], because the left hand side is belong to [0,1] but the right hand side is in \mathbb{R}. In order to validate models, we need to draw samples from a model, but this model cannot replicate data. Some special technique is required to obtain a generative model. It’s hard.

Topic		Replies	Views
Question about Rhat statistics General rstan	5	636	June 25, 2020
Fairly simple hierarchical model with odd Rhat and neff values Modeling fitting-issues	17	1268	June 7, 2019
Rhat calculation fails in for sentence? Modeling fitting-issues	2	541	November 30, 2021
Bad RHat values and divergent transitions when sampling from the prior Modeling fitting-issues , specification	11	746	September 22, 2020
Huge RHATs, Extreme Model Performance Variability Depending on Priors Modeling	10	879	September 23, 2020

Rhat > 3 for all parameters in simple latent variable model

Related topics