Hierarchical mixture

Take the simple hierarchical model:

data{
  int n_id ;                   // number of individuals
  int n_obs ;                  // number of observations per individual
  matrix[n_obs,n_id] obs ;     // observations, one column per individual
}
parameters{
  vector[n_id] mu ;            // per-individual latent means
  real mu_mean ;               // population mean of the latent means
  real<lower=0> mu_sd ;        // population SD of the latent means
  real<lower=0> obs_noise ;    // observation noise SD
}
model{
  mu ~ normal(mu_mean,mu_sd) ;
  mu_mean ~ std_normal() ;
  mu_sd ~ weibull(2,1) ;
  obs_noise ~ weibull(2,1) ;
  for( i_id in 1:n_id){
    obs[,i_id] ~ normal( mu[i_id], obs_noise) ;
  }
}

I’d like to modify this to express a model in which there are two latent groups: one where mu_mean is positive and one where it is negative. I’m uncertain, however, whether I should have a single mixture probability parameter, as in:

...
parameters{
  vector[n_id] mu_neg ;              // per-individual latent means under the negative group
  vector[n_id] mu_pos ;              // per-individual latent means under the positive group
  real<upper=0> mu_mean_neg ;        // population mean for the negative group
  real<lower=0> mu_mean_pos ;        // population mean for the positive group
  real<lower=0> mu_sd ;              // population SD of the latent means
  real<lower=0> obs_noise ;          // observation noise SD
  real<lower=0,upper=1> group_prob ; // probability of the negative-mean group (first component of log_mix)
}
model{
  mu_neg ~ normal(mu_mean_neg , mu_sd) ;
  mu_pos ~ normal(mu_mean_pos , mu_sd) ;
  mu_mean_neg ~ std_normal() ;
  mu_mean_pos ~ std_normal() ;
  mu_sd ~ weibull(2,1) ;
  obs_noise ~ weibull(2,1) ;
  for( i_id in 1:n_id){
    // marginalize over this individual's group membership
    target += log_mix(
      group_prob
      , normal_lupdf( obs[,i_id] | mu_neg[i_id], obs_noise) 
      , normal_lupdf( obs[,i_id] | mu_pos[i_id], obs_noise) 
    ) ;
  }
}

or a mixture probability parameter for each individual:

...
parameters{
  ...
  vector<lower=0,upper=1>[n_id] group_prob ;
}
model{
  ...
  for( i_id in 1:n_id){
    target += log_mix(
      group_prob[i_id]
      ...
    ) ;
  }
}

Thoughts?

I think it’s more common to have a single mixture probability parameter. I should add that I’m not an expert, but as far as I understand mixtures, if you want to sample from a mixture distribution you first sample group membership from the common group membership distribution (say, Bernoulli in your case), and then you sample each observation according to its group distribution.
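
Just to make that two-step picture concrete, here is a minimal simulation sketch of the generative story, written as a Stan generated quantities block. It borrows the names from your second snippet (n_id, n_obs, group_prob, mu_mean_neg, mu_mean_pos, mu_sd, obs_noise) and is only an illustration, not tested code:

generated quantities{
  array[n_id] int z_sim ;         // simulated group membership per individual
  vector[n_id] mu_sim ;           // simulated latent mean per individual
  matrix[n_obs,n_id] obs_sim ;    // simulated observations
  for( i_id in 1:n_id){
    // step 1: draw group membership from the common Bernoulli distribution
    // (group_prob is the weight of the first, negative-mean component in log_mix)
    z_sim[i_id] = bernoulli_rng(group_prob) ;
    // step 2: draw this individual's latent mean from its group's distribution
    mu_sim[i_id] = z_sim[i_id] == 1
                   ? normal_rng(mu_mean_neg, mu_sd)
                   : normal_rng(mu_mean_pos, mu_sd) ;
    // finally, draw observations around that latent mean
    for( i_obs in 1:n_obs){
      obs_sim[i_obs,i_id] = normal_rng(mu_sim[i_id], obs_noise) ;
    }
  }
}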

Thinking about this another way: if each observation had its own mixture probability, then wouldn’t the prior you place on those individual probabilities ultimately control group membership in much the same way a single parameter does?

What I’m trying to say is that, if your individual group membership probabilities were drawn from (e.g.) a Beta(1,1) distribution, then you could marginalize out the intermediate step and conclude that the observations had an equal probability of belonging to each group, right?
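
To make that marginalization explicit (assuming the individual probabilities are independent a priori): if p_i \sim Beta(a,b) and z_i \mid p_i \sim Bernoulli(p_i), then marginally \Pr(z_i = 1) = E[p_i] = a/(a+b); for Beta(1,1) that is 1/2, i.e. the same as a single fixed mixture probability of 0.5.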

I hope this helps :-)

Hey @MauritsM, thank you for your comment! I also think the OP has a good idea for the hierarchy here.

I had a somewhat different thought last night: ultimately I think there is something about each individual that determines their latent group membership, and as you say, at present the only constraint on each is a common prior. But suppose we have other information, manifest in other variables measured on each individual, that might help predict latent group membership. Only the per-individual probability parameterization lets me start incorporating that information into the model. That in turn leaves me thinking that, even absent such information, the per-individual parameterization at least makes sense, even if it can be marginalized to an equivalent one-probability model with an appropriate prior.

Ah, that makes sense. I believe that is a slightly different model than the “plain vanilla” mixture model, though. In that one, you only observe the values without knowing anything about group memberships, and the groups are purely a latent variable that helps explain the data better than a single distribution would. If you have additional information “prior to observing the outcome” then it makes a lot of sense to have individual-level group membership probabilities.

Just as a thought experiment, the most useful information you could have is the actual group memberships; knowing those, it would obviously not make sense to disregard that information :-)

Sometimes it is easiest to think about these models in their unmarginalized form, where the latent state is a parameter. The mixture probability is a prior on the group membership, and in practice we generally use a hierarchical prior since the mixture probability is typically a fitted parameter with a prior of its own.
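
Concretely, marginalizing the latent indicator z_i out of the unmarginalized model gives p(\text{obs}_i) = \sum_{z_i} p(z_i) \, p(\text{obs}_i \mid z_i) = m \, p(\text{obs}_i \mid \text{group 1}) + (1 - m) \, p(\text{obs}_i \mid \text{group 2}), where m is the mixture probability; this sum is exactly what log_mix computes on the log scale.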

Let’s consider the class of priors that have the form of a logistic regression, i.e. m_i = L(\alpha + X_i\theta), where m_i is the mixture probability for individual i, L is the inverse logit, X_i is that individual’s row of covariates, and \alpha and \theta are parameters with priors of their own. The intercept-only regression is the case of just one mixture probability, but it’s also clear that we can add covariates.
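
As a rough sketch of how that could sit on top of the OP's second model (the covariate matrix X, its width n_cov, and the coefficients alpha and theta are names I'm introducing here purely for illustration):

data{
  // ...as in the OP's model, plus:
  int n_cov ;               // number of individual-level covariates
  matrix[n_id,n_cov] X ;    // covariates, one row per individual
}
parameters{
  // ...as in the OP's model, but with the scalar group_prob replaced by:
  real alpha ;              // intercept on the logit scale
  vector[n_cov] theta ;     // covariate effects on the logit scale
}
model{
  // ...priors on mu_neg, mu_pos, mu_mean_neg, mu_mean_pos, mu_sd, obs_noise as before
  alpha ~ std_normal() ;
  theta ~ std_normal() ;
  {
    // per-individual mixture probabilities from a logistic regression
    vector[n_id] group_prob = inv_logit(alpha + X * theta) ;
    for( i_id in 1:n_id){
      target += log_mix(
        group_prob[i_id]
        , normal_lupdf( obs[,i_id] | mu_neg[i_id], obs_noise)
        , normal_lupdf( obs[,i_id] | mu_pos[i_id], obs_noise)
      ) ;
    }
  }
}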

This is all a restatement of what @mike-lawrence and @MauritsM have already said, but there’s an important twist that gets revealed by viewing the problem this way. Namely, observation-level random effects are not well identified in logistic regression except via the prior. Thus, I think it is highly unlikely that fitting observation-specific latents will work well unless you have a highly restrictive hierarchical prior (like a logistic regression), and not a prior that induces observation-level flexibility to adjust the mixture probabilities for observations one-at-a-time.

This is a paper that may be similar to what you are trying to do.
Nasserinejad.pdf (1.2 MB)