Pedagogical example: search for a "good" non-trivial unidentifiable model

For teaching, I’m looking for some interesting (practically) unidentifiable models which, ideally, are quite simple. I have a couple of examples that I use (the Lotka-Volterra ODE model with Gaussian noise sampled sparsely, for example), but I’m sure there are better and simpler ones out there.

Specifically, I’m looking for a model + data where:

  • The model is relatively simple to explain
  • Ideally, the model has relatively few parameters, so can be compactly written down
  • Generating simulated data from the model is straightforward so that the class could do this as part of the exercise to show the system is unidentified
  • The reason for the non-identifiability is non-trivial. So, for example, it’s not (say) a regression model where two parameters appear in the likelihood only as a product

I’m looking for an example that’d be good for an audience of early career researchers who are just beginning in Bayesian inference.

Does anyone have a favourite example here?

Models with two random intercepts with similar but not-identical groups (e.g. nested random effects with few levels of the nested effect per level of the coarser effect), where the total variance is identified but the individual variances not so much. This can be usefully reparameterized by the total variance and a (weakly identified) parameter from zero to one that allocates that variance to one or the other grouping.

A more sophisticated application of this trick is used for the spatial versus nonspatial variances in some parameterizations of ICAR models.
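The variance-allocation reparameterization described above can be sketched in R roughly as follows. This is only an illustration under assumptions: the helper `split_variance`, the names `sigma_total` and `rho`, and the design sizes are all hypothetical and not from any particular package.

```r
# Hypothetical helper: map (total SD, allocation fraction rho in [0, 1])
# to the SDs of the coarse and nested random intercepts.
split_variance <- function(sigma_total, rho) {
  stopifnot(sigma_total >= 0, rho >= 0, rho <= 1)
  list(
    sigma_coarse = sigma_total * sqrt(rho),
    sigma_nested = sigma_total * sqrt(1 - rho)
  )
}

# Two very different allocations share the same total variance (4 here).
a <- split_variance(sigma_total = 2, rho = 0.1)
b <- split_variance(sigma_total = 2, rho = 0.9)
total_a <- a$sigma_coarse^2 + a$sigma_nested^2
total_b <- b$sigma_coarse^2 + b$sigma_nested^2

# Simulate a design with few nested levels per coarse level (assumed sizes).
set.seed(1)
n_coarse <- 10
nested_per_coarse <- 2
n_obs <- 5
coarse_id <- rep(seq_len(n_coarse), each = nested_per_coarse * n_obs)
nested_id <- rep(seq_len(n_coarse * nested_per_coarse), each = n_obs)
u <- rnorm(n_coarse, 0, a$sigma_coarse)                      # coarse intercepts
v <- rnorm(n_coarse * nested_per_coarse, 0, a$sigma_nested)  # nested intercepts
y <- u[coarse_id] + v[nested_id] + rnorm(length(coarse_id), 0, 1)
```

With only two nested levels per coarse level, data simulated under allocation `a` or `b` should be hard to tell apart, while `sigma_total` stays comparatively well determined.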

SEM-style latent variable models are notorious for needing identifiability constraints. Simplest:

# R code to generate simulated data

# means
z_mu <- rnorm(1)
x_mu <- rnorm(1)
y_mu <- rnorm(1)

# SDs (exponentiated so they are positive)
z_sigma <- exp(rnorm(1))
x_sigma <- exp(rnorm(1))
y_sigma <- exp(rnorm(1))

# loading weights
z_x_beta <- rnorm(1)
z_y_beta <- rnorm(1)

# sample size
n <- 100

# latent variable z
z <- rnorm(n, z_mu, z_sigma)

# observed variables
x <- z * z_x_beta + rnorm(n, x_mu, x_sigma)
y <- z * z_y_beta + rnorm(n, y_mu, y_sigma)

Then have them try to do inference on the *_mu, *_sigma & *_beta values as well as the latent z values.

Eventually it’ll become apparent that even with a high sample size, without identifiability constraints Stan’s sampler will struggle thanks to the multimodality. Then you can show that there are different kinds of constraints necessary.
  • The signs of the *_betas and the relative ordering of the z-values are interdependent (so a common approach is to fix the sign of one beta to be positive).
  • The magnitudes of the *_betas and z_sigma are interdependent (so a common approach is to fix z_sigma=1).
  • The values of the *_mus and *_betas are interdependent (so a common approach is to fix z_mu=0).

There are probably more interdependencies I’m missing (I’m still relatively new to SEM), and certainly other interesting identifiability constraint options than the ones I note parenthetically above.
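One way to make these interdependencies concrete is to compare the moments that different parameter values imply for the observed (x, y). Because (x, y) is jointly Gaussian in the simulation above, equal means, variances, and covariance mean an identical likelihood. The sketch below is illustrative only: `implied_moments` is a hypothetical helper and the numeric values are arbitrary.

```r
# Hypothetical helper: moments of (x, y) implied by the latent-variable model
#   z ~ N(z_mu, z_sigma), x = z * z_x_beta + N(x_mu, x_sigma), likewise for y.
implied_moments <- function(z_mu, z_sigma, x_mu, y_mu, x_sigma, y_sigma,
                            z_x_beta, z_y_beta) {
  c(
    mean_x = z_x_beta * z_mu + x_mu,
    mean_y = z_y_beta * z_mu + y_mu,
    var_x  = z_x_beta^2 * z_sigma^2 + x_sigma^2,
    var_y  = z_y_beta^2 * z_sigma^2 + y_sigma^2,
    cov_xy = z_x_beta * z_y_beta * z_sigma^2
  )
}

base <- implied_moments(z_mu = 0.5, z_sigma = 2, x_mu = 1, y_mu = -1,
                        x_sigma = 0.7, y_sigma = 1.3,
                        z_x_beta = 0.8, z_y_beta = -0.4)

# Sign flip: negate both betas and z_mu (equivalent to replacing z by -z).
flipped <- implied_moments(z_mu = -0.5, z_sigma = 2, x_mu = 1, y_mu = -1,
                           x_sigma = 0.7, y_sigma = 1.3,
                           z_x_beta = -0.8, z_y_beta = 0.4)

# Rescaling: multiply z_mu and z_sigma by 4, divide both betas by 4.
rescaled <- implied_moments(z_mu = 2, z_sigma = 8, x_mu = 1, y_mu = -1,
                            x_sigma = 0.7, y_sigma = 1.3,
                            z_x_beta = 0.2, z_y_beta = -0.1)

# Location shift: add 3 to z_mu and absorb beta * 3 into x_mu and y_mu.
shifted <- implied_moments(z_mu = 3.5, z_sigma = 2,
                           x_mu = 1 - 0.8 * 3, y_mu = -1 + 0.4 * 3,
                           x_sigma = 0.7, y_sigma = 1.3,
                           z_x_beta = 0.8, z_y_beta = -0.4)

print(rbind(base, flipped, rescaled, shifted))
```

All four rows should print the same values, which is why no amount of data can separate these parameter settings without constraints.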

Oh, and multi-item ordinal-outcome models have some neat identifiability aspects.

A categorical/multinomial regression? There’s even an example in the manual.

Thanks @mike-lawrence @jsocolar @bnicenboim – those are all good options. My plan is to make a (pedagogical) repository with these, so will share it once it’s online.

Anyone else – still interested to hear of your interestingly unidentified models!

I have an IMHO neat sigmoid example at Identifying non-identifiability

Mixture models pose all sorts of identifiability problems.

This post exhibits a subtle one with just Gaussians. If the means are close and the variances high enough, there’s nearly no way to figure out how many Gaussians there should be. You can put “repulsive” priors on the means to help a bit, but then you need some “attractive” priors too! It just gets really unwieldy, even in a simple Gaussian mixture model.

See also @betanalpha Identifying Bayesian Mixture Models (betanalpha.github.io).
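To see how hard it is to count components when the means are close, here is a rough sketch comparing a two-component mixture with a single Gaussian that has the same mean and variance. The separation `delta` and the grid are arbitrary choices made for illustration.

```r
# Two equally weighted unit-variance components with means -delta and +delta,
# versus one Gaussian matched on mean (0) and variance (1 + delta^2).
grid <- seq(-6, 6, by = 0.01)
delta <- 0.2

mixture_density <- 0.5 * dnorm(grid, -delta, 1) + 0.5 * dnorm(grid, delta, 1)
single_density  <- dnorm(grid, 0, sqrt(1 + delta^2))

# Largest pointwise gap between the two densities on the grid
max_gap <- max(abs(mixture_density - single_density))
print(max_gap)
```

The gap is tiny relative to the peak density of about 0.39, so a realistic sample would carry very little information about whether there is one component or two.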

I think measurement error models that involve x_{obs} \sim N(x_{true}, \tau_x) produce some interestingly degenerate likelihoods

@bnicenboim do I recall correctly that diffusion models for response time data require identifiability constraints? Maybe LBA as well?

Here’s another one: as described here, a von Mises / uniform mixture for circular data has identifiability issues when p(uniform) is high and the von Mises precision is low.
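A quick way to see the von Mises / uniform problem is to note that a low-precision von Mises density is itself nearly uniform, so the mixing proportion and the precision can trade off against each other. The sketch below assumes the standard von Mises density; `dvonmises` and `mix_density` are hypothetical helpers written here for illustration, and the parameter values are arbitrary.

```r
# von Mises density on the circle, using base R's besselI for the normaliser
dvonmises <- function(theta, mu, kappa) {
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# Mixture of a uniform on the circle and a von Mises component
mix_density <- function(theta, p_uniform, mu, kappa) {
  p_uniform / (2 * pi) + (1 - p_uniform) * dvonmises(theta, mu, kappa)
}

theta <- seq(-pi, pi, length.out = 361)

# Halving kappa while doubling the von Mises weight gives almost the same curve
d1 <- mix_density(theta, p_uniform = 0.9, mu = 0, kappa = 0.5)
d2 <- mix_density(theta, p_uniform = 0.8, mu = 0, kappa = 0.25)

max_gap <- max(abs(d1 - d2))
print(max_gap)
```

The two settings differ by well under 1% of the uniform density level 1 / (2 * pi), which is about 0.159.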

mike-lawrence (June 17) wrote: “@bnicenboim do I recall correctly that diffusion models for response time data require identifiability constraints?”

Yes, the scale. Also LBA.

@Ben_Lambert here’s a paper whose intro has some refs on identifiability in drift-diffusion models.

If you fit a multivariate normal distribution with a single-factor structure, then you have a sign identification problem (the factor can be oriented positively or negatively). You have to constrain one of the factor loadings to be greater than zero to identify the rest properly.
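The sign problem can be written down in one line. The notation here is an assumption for illustration: the factor is taken to have unit variance, λ is the vector of loadings, and Ψ is the diagonal matrix of residual variances.

```latex
\[
  y \sim \mathcal{N}\!\left(\mu,\; \lambda\lambda^{\top} + \Psi\right),
  \qquad
  (-\lambda)(-\lambda)^{\top} + \Psi \;=\; \lambda\lambda^{\top} + \Psi .
\]
```

Since λ and -λ give the same covariance matrix, the likelihood has two mirror-image modes until the sign of one loading is fixed.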

One has to be careful to separate out technical non-identifiabilities (which obstruct almost all realized likelihood functions from contracting to a point with infinite data) from the more common complex uncertainties that arise in many models. To avoid confusion I refer to the latter as degeneracies (unidentifiable models lead to degeneracies, but not all degeneracies are due to unidentifiable models).

I discuss this terminology and review a variety of common sources of degeneracies (and how to investigate them with Stan) in Identity Crisis.

My case studies reviewing particular modeling techniques also review the degeneracies inherent to those models, for example:

Hierarchical Modeling (Section 3 and Section 4)

Ordinal Regression (Section 2.1)

Robust Gaussian Process Modeling (Section 3.2)

If you want to dive into the topic, you may be interested in a recent book on parameter identifiability. It has both trivial and non-trivial examples (the latter can be fairly complicated).

Regarding non-trivial yet not too complicated examples: any logistic-style population growth model fitted to time series that do not cover both very high and very low densities will have difficulty separating the maximum growth rate from the other parameters (carrying capacity, speed of return to equilibrium, ...). An example here
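As a rough illustration of one side of this, the sketch below uses the textbook logistic growth solution and only looks at low densities, where the trajectory barely depends on the carrying capacity. The function name `logistic_growth` and all parameter values are assumptions chosen for the example.

```r
# Closed-form logistic growth: N(t) = K / (1 + ((K - N0) / N0) * exp(-r * t))
logistic_growth <- function(t, r, K, N0) {
  K / (1 + ((K - N0) / N0) * exp(-r * t))
}

# Observation window in which the population stays far below either K
times <- seq(0, 5, by = 0.5)

traj_small_K <- logistic_growth(times, r = 0.5, K = 1000,  N0 = 1)
traj_large_K <- logistic_growth(times, r = 0.5, K = 10000, N0 = 1)

# Largest relative difference between the two trajectories
max_rel_gap <- max(abs(traj_small_K - traj_large_K) / traj_large_K)
print(max_rel_gap)
```

A tenfold change in K moves the trajectory by about 1% over this window, so with any realistic observation noise the data would say almost nothing about K.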

Unfortunately, the terminology (as noted by Mike) can be quite confusing: identifiability has a precise definition used in most of statistics, but in the Stan community (and possibly elsewhere) it is used more loosely. In the looser sense, “non-identifiability” is sometimes also used for models that are identifiable in the technical sense, but where the amount or nature of the data we currently have results in the likelihood being so similar across vastly different parameter values that it poses the same obstacles for computation as a strictly non-identifiable model would. Some people use “weakly identifiable” for this case, but I don’t think it is established usage.

I kind of like using the word “degenerate” for the broader class of problems we can encounter, but it has a similar problem since “degenerate posterior” already has an established meaning in statistics which is substantially different.

Terminology can definitely be changed, but I think that for the time being, if precision is desired, it is IMHO best to be a bit more verbose and say exactly what kind of problem one wants to talk about, e.g. “multimodality”, “uneven curvature”, “available data inform only some aspects of the model” etc. but that’s just my personal opinion :-)

Good luck with your teaching!

Here’s another simple suite of examples with e.g. fixed vs random intercepts in linear mixed models

Thanks all – great suggestions. I’m still leaving this open in case anyone else wants to contribute their examples.

The intercept in case-control logistic regression.