# How to deal with uncentered parameters not estimating well, potentially due to bad identifiability constraints?

I am having a problem with identifiability when using non centered parameters. My beta and type parameters (when i use real data) will likely be positive, lets say normal(1,.5) for beta and normal(2,.5) for type. And so, since neither are centered at 0 it is making it difficult to model this. My data is structured as follow: theta = people, beta = test item difficulty, type = item type effect. So i could have 100 people taking 40 items, where each item is 1 of 4 types. Important to also note that the type an item is does not change between persons, so item 1 for instance, if type 3, is always type 3.

I have tried to use a sum to 0 constraint on type or beta, however i run into problems when doing that. Because my beta parameters are mostly all positive, the pinned last beta value to be the - sum ends up being a large negative number which makes it difficult to estimate the type parameter that contains that constrained beta. Similarly with type. I then tried to pin a value of beta or type to 0, but it does not seem to consistently work well. What else can i do in this case?

here is my data generation

``````    J <- 150
I <- 40
K <- 4

theta <- rnorm(J, 0, .4)
beta <- rnorm(I, .7, .5)
type <- rnorm(K, 2, .5)
type[1] <- 0

N <- I*J
ii <- rep(1:I, times = J)
jj <- rep(1:J, each = I)
kk <- rep(1:K, times = N/K)

y <- numeric(N)
for(n in 1:N){
y[n] <- rpois(1,exp(theta[jj[n]] + beta[ii[n]] + type[kk[n]]))
}

fit <- sampling(mod, data = list(I = I,
J = J,
N = N,
ii = ii,
jj = jj,
kk = kk,
K = K,
y = y),
iter = 4000)
``````
``````data {
int<lower=0> I;
int<lower=0> J;
int<lower=0> K;
int<lower=1> N;
int<lower=1,upper=I> ii[N];
int<lower=1,upper=J> jj[N];
int<lower=1,upper=K> kk[N];
int<lower=0> y[N];
}
parameters {
vector[J] theta;
vector[K] type;
vector[I-1] b_free;
real<lower=0> sigma_beta;
real<lower=0> sigma_type;
real<lower=0> sigma_theta;
real mu_type;
real mu_beta;

}
transformed parameters{
vector[I] beta = append_row(0, b_free);
}

model {

sigma_theta ~ cauchy(0,2);
sigma_beta ~ cauchy(0,2);
sigma_type ~ cauchy(0,2);

mu_type ~ normal(0,3);
mu_beta ~ normal(0,3);

theta ~ normal(0,sigma_theta);
beta ~ normal(mu_beta,sigma_beta);
type ~ normal(mu_type,sigma_type);

y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
}
``````
1 Like

The model

``````y ~ poisson_log( theta[jj] + beta[ii] + type[kk] );
``````

canâ€™t separate the mean `beta` from the mean `type` as the likelihood remains the same if we subtract a constant from the `beta`s and add it to the `type`s.

Since the items are nested within types (an item `i` is always of the same type `k[i]`), one option is to model the `beta`s with type `k` as normal with mean `type[k]`. Then the type means are identified. And since there only four types, with the same number of observations per type, it doesnâ€™t seem necessary to specify a multi-level model for the types (ie. treat those as fixed/population-level effects instead).

I tried the suggestion

``````data {
int<lower=0> I;            // number of items
int<lower=0> J;            // number of people
int<lower=0> K;            // number of types
int<lower=1> N;            // total number of observations
int<lower=1,upper=I> ii[N]; // item index for each observation
int<lower=1,upper=J> jj[N]; // context index for each observation
int<lower=1,upper=K> kk[N]; // type index for each observation
int<lower=0> y[N];         // observed counts
}

parameters {
vector[J] theta;
vector[K] type;
vector[I] beta;
real<lower=0> sigma_type;
real<lower=0> sigma_beta;
real<lower=0> sigma_theta;
real mu_type;

}
model {
// Priors
theta ~ normal(0, sigma_theta);
type ~ normal(mu_type, sigma_type);

mu_type ~ normal(0,4);
sigma_type ~ cauchy(0,2);
sigma_beta ~ cauchy(0,2);
sigma_theta ~ cauchy(0,2);

// Loop over items and assign betas according to their type k
for (k in 1:K) {
for (i in 1:I) {
if (kk[i] == k) { // Check if item i is of type k
beta[i] ~ normal(type[k], sigma_beta);
}
}
}

// Likelihood
y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
}
``````

where kk[1:I] = 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

This however still suffers from additive unidentifiability between beta and type. The unique estimates are off, but their sum is identified. Also, what other options exist here?

This is a property of the likelihood:

``````y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
``````

is the same as

``````y ~ poisson_log(theta[jj] + (beta[ii] - a) + (type[kk] + a));
``````

So in this sense you cannot estimate beta effects and type effects. For example, you can estimate differences between types, or you can assume that the mean beta is 0 and then estimate type effects. Itâ€™s a question of what specification is most meaningful to your analysis.

Thanks. If this is the case, one of the problems i may run into is that it is extremely likely when using real data that beta and type both will not be centered around 0. Knowing this, how would i go about centering beta for example around 0? Or can this be done by just setting the prior for beta to normal(0,sigma_beta)? Also, would pinning the first beta to 0 be an effective alternative to this problem?

I donâ€™t understand the â€śpinning to 0â€ť idea: you set type[1] = 0 when you generate the data, then set beta[1] = 0 in the Stan code but still put a normal prior on it. Itâ€™s more natural to set beta have normal(0, sigma_beta) prior. (Iâ€™m still not sure you gain much from the normal(mu_type, sigma_type) prior on the type parameters as there are only 4 of those; are there any other possible types?) And how do you assess whether the model fitting works or not?

I meant pinning the first beta value to 0 to set it as a reference that all other betas can be compared to. When i generate the data i would set the first beta to 0 as well, then allowing type to float. I assess model fit by comparing the estimates to my true generating parameters. I will now try what you suggested, having beta be centered and then type can be freely estimated.

So i ran this code:

``````data {
int<lower=0> I;
int<lower=0> J;
int<lower=0> K;
int<lower=1> N;
int<lower=1,upper=I> ii[N];
int<lower=1,upper=J> jj[N];
int<lower=1,upper=K> kk[N];
int<lower=0> y[N];
}
parameters {
vector[J] theta;
vector[K] type;
vector[I] beta;
real<lower=0> sigma_theta;
real<lower=0> sigma_beta;
real<lower=0> sigma_type;
real mu_type;
}
model {

sigma_theta ~ cauchy(0,2);
sigma_beta ~ cauchy(0,2);
sigma_type ~ cauchy(0,2);

theta ~ normal(0, sigma_theta);
beta ~ normal(0, sigma_beta);

mu_type ~ normal(0,5);
type ~ normal(mu_type,sigma_type);

y ~ poisson_log(theta[jj] + beta[ii] + type[kk]);
}

``````

This centers beta at 0, even though it was generated with mean(beta) = .98

``````
#true parameters
J <- 150
I <- 40
K <- 4

theta <- rnorm(J, 0, .3)
beta <- rnorm(I, 1, .4)
type <- rnorm(K, 2, .7)

mean(summary(fit, pars = c("beta"))\$summary[,1]) = .0004
``````

So what happens here is that because beta does indeed have an avg effect of ~1 due to my data generation, when it gets centered to 0, that effect is offloaded onto type. So for instance here, if i subtract the true effect of beta from type:

``````#true type
1.984104 1.147125 2.998301 1.496971

#stan estimates
summary(fit, pars = c("type"))\$summary[,1] - mean(beta)
type[1]  type[2]  type[3]  type[4]
2.118548 1.162591 2.856867 1.431059
``````

And so here I would probably say that the estimate was successful (theta looked good also).

1 Like

Good. With your real data, you donâ€™t know the true mean(beta) so wonâ€™t be able to compare directly the true type effects with the estimated effects. The pairwise difference though are estimable: type[1] - type[2], type[1] - type[3] and so on.

This makes sense i believe as beta would now be seen more as a random effect such that I would likely not be reporting the actual beta estimates themselves, but the variance of beta in order to provide sort of an â€śerrorâ€ť effect of the type estimates. For instance, the type of an item can explain something, but the difficulty of the items (beta) also has a small or large effect depending on what sigma_beta is estimated to be.

So in the end, is there no way to estimate my model such that i can have beta and type not centered, and return accurate estimates? If possible i would like to be able to determine for any given item, how much beta and type uniquely contribute. Or does one of them have to be centered in some way? I believe i read somewhere in relation to this model that it would be possible if items were not constantly the same type. For instance, if item 1 could be type 2 for person 1, and then for person 2, item 1 could be type 3, etc. This mixing would allow to get unique contributions i suppose?

Not under the model you are considering. This doesnâ€™t imply the estimates are not accurate but that the mean beta and the mean type are non-identifiable in the model. (If you had an idea what the mean beta is, you could use that instead of 0 but I assume the question is whether you can learn it from the data.)