How to deal with uncentered parameters not estimating well, potentially due to bad identifiability constraints?

I am having a problem with identifiability when using non centered parameters. My beta and type parameters (when i use real data) will likely be positive, lets say normal(1,.5) for beta and normal(2,.5) for type. And so, since neither are centered at 0 it is making it difficult to model this. My data is structured as follow: theta = people, beta = test item difficulty, type = item type effect. So i could have 100 people taking 40 items, where each item is 1 of 4 types. Important to also note that the type an item is does not change between persons, so item 1 for instance, if type 3, is always type 3.

I have tried to use a sum to 0 constraint on type or beta, however i run into problems when doing that. Because my beta parameters are mostly all positive, the pinned last beta value to be the - sum ends up being a large negative number which makes it difficult to estimate the type parameter that contains that constrained beta. Similarly with type. I then tried to pin a value of beta or type to 0, but it does not seem to consistently work well. What else can i do in this case?

here is my data generation

    J <- 150
    I <- 40
    K <- 4
    
    theta <- rnorm(J, 0, .4)
    beta <- rnorm(I, .7, .5)
    type <- rnorm(K, 2, .5)
    type[1] <- 0
    
    N <- I*J
    ii <- rep(1:I, times = J)
    jj <- rep(1:J, each = I)
    kk <- rep(1:K, times = N/K)

    y <- numeric(N)
    for(n in 1:N){
      y[n] <- rpois(1,exp(theta[jj[n]] + beta[ii[n]] + type[kk[n]]))
    }

    fit <- sampling(mod, data = list(I = I,
                                                          J = J,
                                                          N = N,
                                                          ii = ii,
                                                          jj = jj,
                                                          kk = kk,
                                                          K = K,
                                                          y = y),
                              iter = 4000)
data {
      int<lower=0> I;
      int<lower=0> J;
      int<lower=0> K;
      int<lower=1> N;
      int<lower=1,upper=I> ii[N];
      int<lower=1,upper=J> jj[N];
      int<lower=1,upper=K> kk[N];
      int<lower=0> y[N];
    }
    parameters {
      vector[J] theta;
      vector[K] type;
      vector[I-1] b_free;
      real<lower=0> sigma_beta;
      real<lower=0> sigma_type;
      real<lower=0> sigma_theta;
      real mu_type;
      real mu_beta;

 
    }
    transformed parameters{
      vector[I] beta = append_row(0, b_free);
    }


    model {

      sigma_theta ~ cauchy(0,2);
      sigma_beta ~ cauchy(0,2);
      sigma_type ~ cauchy(0,2);

      mu_type ~ normal(0,3);
      mu_beta ~ normal(0,3);
    
      theta ~ normal(0,sigma_theta);
      beta ~ normal(mu_beta,sigma_beta);
      type ~ normal(mu_type,sigma_type);
      
      
      y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
  }
1 Like

The model

y ~ poisson_log( theta[jj] + beta[ii] + type[kk] );

can’t separate the mean beta from the mean type as the likelihood remains the same if we subtract a constant from the betas and add it to the types.

Since the items are nested within types (an item i is always of the same type k[i]), one option is to model the betas with type k as normal with mean type[k]. Then the type means are identified. And since there only four types, with the same number of observations per type, it doesn’t seem necessary to specify a multi-level model for the types (ie. treat those as fixed/population-level effects instead).

I tried the suggestion

data {
  int<lower=0> I;            // number of items
  int<lower=0> J;            // number of people
  int<lower=0> K;            // number of types
  int<lower=1> N;            // total number of observations
  int<lower=1,upper=I> ii[N]; // item index for each observation
  int<lower=1,upper=J> jj[N]; // context index for each observation
  int<lower=1,upper=K> kk[N]; // type index for each observation
  int<lower=0> y[N];         // observed counts
}

parameters {
  vector[J] theta;             
  vector[K] type;                
  vector[I] beta;               
  real<lower=0> sigma_type;
  real<lower=0> sigma_beta;
  real<lower=0> sigma_theta;
  real mu_type;

}
model {
  // Priors
  theta ~ normal(0, sigma_theta); 
  type ~ normal(mu_type, sigma_type);

  mu_type ~ normal(0,4);
  sigma_type ~ cauchy(0,2);
  sigma_beta ~ cauchy(0,2);
  sigma_theta ~ cauchy(0,2);

  // Loop over items and assign betas according to their type k
  for (k in 1:K) {
    for (i in 1:I) {
      if (kk[i] == k) { // Check if item i is of type k
        beta[i] ~ normal(type[k], sigma_beta);
      }
    }
  }

  // Likelihood
  y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
}

where kk[1:I] = 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

This however still suffers from additive unidentifiability between beta and type. The unique estimates are off, but their sum is identified. Also, what other options exist here?

This is a property of the likelihood:

y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);

is the same as

y ~ poisson_log(theta[jj] + (beta[ii] - a) + (type[kk] + a));

So in this sense you cannot estimate beta effects and type effects. For example, you can estimate differences between types, or you can assume that the mean beta is 0 and then estimate type effects. It’s a question of what specification is most meaningful to your analysis.

Thanks. If this is the case, one of the problems i may run into is that it is extremely likely when using real data that beta and type both will not be centered around 0. Knowing this, how would i go about centering beta for example around 0? Or can this be done by just setting the prior for beta to normal(0,sigma_beta)? Also, would pinning the first beta to 0 be an effective alternative to this problem?

I don’t understand the “pinning to 0” idea: you set type[1] = 0 when you generate the data, then set beta[1] = 0 in the Stan code but still put a normal prior on it. It’s more natural to set beta have normal(0, sigma_beta) prior. (I’m still not sure you gain much from the normal(mu_type, sigma_type) prior on the type parameters as there are only 4 of those; are there any other possible types?) And how do you assess whether the model fitting works or not?

I meant pinning the first beta value to 0 to set it as a reference that all other betas can be compared to. When i generate the data i would set the first beta to 0 as well, then allowing type to float. I assess model fit by comparing the estimates to my true generating parameters. I will now try what you suggested, having beta be centered and then type can be freely estimated.

So i ran this code:

data {
      int<lower=0> I;
      int<lower=0> J;
      int<lower=0> K;
      int<lower=1> N;
      int<lower=1,upper=I> ii[N];
      int<lower=1,upper=J> jj[N];
      int<lower=1,upper=K> kk[N];
      int<lower=0> y[N];
    }
    parameters {
      vector[J] theta;
      vector[K] type;
      vector[I] beta;
      real<lower=0> sigma_theta;
      real<lower=0> sigma_beta;
      real<lower=0> sigma_type;
      real mu_type;
    }
    model {
    
      sigma_theta ~ cauchy(0,2);
      sigma_beta ~ cauchy(0,2);
      sigma_type ~ cauchy(0,2);
      
      theta ~ normal(0, sigma_theta);
      beta ~ normal(0, sigma_beta);
      
      mu_type ~ normal(0,5);
      type ~ normal(mu_type,sigma_type);
      
      
      y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
  }

This centers beta at 0, even though it was generated with mean(beta) = .98


#true parameters
    J <- 150
    I <- 40
    K <- 4
    
    theta <- rnorm(J, 0, .3)
    beta <- rnorm(I, 1, .4)
    type <- rnorm(K, 2, .7)


mean(summary(fit, pars = c("beta"))$summary[,1]) = .0004

So what happens here is that because beta does indeed have an avg effect of ~1 due to my data generation, when it gets centered to 0, that effect is offloaded onto type. So for instance here, if i subtract the true effect of beta from type:

#true type
1.984104 1.147125 2.998301 1.496971


#stan estimates
summary(fit, pars = c("type"))$summary[,1] - mean(beta)
type[1]  type[2]  type[3]  type[4] 
2.118548 1.162591 2.856867 1.431059 

And so here I would probably say that the estimate was successful (theta looked good also).

1 Like

Good. With your real data, you don’t know the true mean(beta) so won’t be able to compare directly the true type effects with the estimated effects. The pairwise difference though are estimable: type[1] - type[2], type[1] - type[3] and so on.

This makes sense i believe as beta would now be seen more as a random effect such that I would likely not be reporting the actual beta estimates themselves, but the variance of beta in order to provide sort of an “error” effect of the type estimates. For instance, the type of an item can explain something, but the difficulty of the items (beta) also has a small or large effect depending on what sigma_beta is estimated to be.

So in the end, is there no way to estimate my model such that i can have beta and type not centered, and return accurate estimates? If possible i would like to be able to determine for any given item, how much beta and type uniquely contribute. Or does one of them have to be centered in some way? I believe i read somewhere in relation to this model that it would be possible if items were not constantly the same type. For instance, if item 1 could be type 2 for person 1, and then for person 2, item 1 could be type 3, etc. This mixing would allow to get unique contributions i suppose?

Not under the model you are considering. This doesn’t imply the estimates are not accurate but that the mean beta and the mean type are non-identifiable in the model. (If you had an idea what the mean beta is, you could use that instead of 0 but I assume the question is whether you can learn it from the data.)