Multilevel linear model showing signs of multimodality


functions {
#include cpl.stan
#include pgstat.stan
#include foldinterval.stan
}

data {

  int<lower=1> N_intervals;
  int max_n_echan;
  int max_n_chan;

  int<lower=0> N_dets[N_intervals]; // number of detectors poer data type
  int<lower=0> N_chan[N_intervals, max(N_dets)]; // number of channels in each detector
  int<lower=0> N_echan[N_intervals,  max(N_dets)]; // number of energy side channels in each detector

  int grb_id[N_intervals];
  int N_grbs;

  vector[max_n_echan] ebounds_hi[N_intervals, max(N_dets)];
  vector[max_n_echan] ebounds_lo[N_intervals, max(N_dets)];



  vector[max_n_chan] observed_counts[N_intervals, max(N_dets)];
  vector[max_n_chan] background_counts[N_intervals, max(N_dets)];
  vector[max_n_chan] background_errors[N_intervals, max(N_dets)];

  int idx_background_zero[N_intervals, max(N_dets), max_n_chan];
  int idx_background_nonzero[N_intervals, max(N_dets), max_n_chan];
  int N_bkg_zero[N_intervals,max(N_dets)];
  int N_bkg_nonzero[N_intervals,max(N_dets)];

  real exposure[N_intervals, max(N_dets)];

  matrix[max_n_echan, max_n_chan] response[N_intervals, max(N_dets)];



  int mask[N_intervals, max(N_dets), max_n_chan];
  int N_channels_used[N_intervals,max(N_dets)];

  vector[N_intervals] dl;
  vector[N_intervals] z;


  int N_gen_spectra;
  vector[N_gen_spectra] model_energy;

  /* int N_correlation; */
  /* vector[N_correlation] model_correlation; */


}

transformed data {
  vector[N_intervals] dl2 = square(dl);
  int N_total_channels = 0;



  vector[N_intervals] log_zp1 = log10(z+1);
  vector[N_intervals] log_dl2 =2*log10(dl) + log10(4*pi());

  real emin = 10.;
  real emax = 1E4;
  vector[max_n_echan] ebounds_add[N_intervals, max(N_dets)];
  vector[max_n_echan] ebounds_half[N_intervals, max(N_dets)];

  int all_N[N_intervals];


  // precalculation of energy bounds

  for (n in 1:N_intervals) {

    all_N[n] = n;
    
    for (m in 1:N_dets[n]) {
      ebounds_half[n, m, :N_echan[n, m]] = 0.5*(ebounds_hi[n, m, :N_echan[n, m]]+ebounds_lo[n, m, :N_echan[n, m]]);
      ebounds_add[n, m, :N_echan[n, m]] = (ebounds_hi[n, m, :N_echan[n, m]] - ebounds_lo[n, m, :N_echan[n, m]])/6.0;
      N_total_channels += N_channels_used[n,m];
    }


  }




}



parameters {

  vector<lower=0>[N_grbs] gamma;
  vector[N_grbs] delta_offset;
  vector<lower=-1.8, upper=1.>[N_intervals] alpha;

  vector [N_intervals] log_epeak;






}

transformed parameters {

  
  vector[N_grbs] delta;
  vector[N_intervals] log_energy_flux;
  vector[N_intervals] norm_log_epeak = (sort_desc(log_epeak) +log_zp1 ) - 2.;
  
  delta = delta_offset + 52;
 

  log_energy_flux = (delta[grb_id] + gamma[grb_id] .* (norm_log_epeak)) -  log_dl2[grb_id];   

  
}


model {

  int grainsize = 1;

  alpha ~ normal(-1,.5);

  log_epeak ~ normal(2.,2);

  //  log_energy_flux ~ normal(-7, 1);
  gamma ~ normal(1.5,.2);
  delta_offset ~ std_normal();


  target += reduce_sum(partial_log_like, all_N, grainsize,  alpha,  log_epeak,  log_energy_flux,  observed_counts,  background_counts, background_errors,  mask, N_channels_used, exposure,  ebounds_lo,  ebounds_hi,  ebounds_add,  ebounds_half,  response,   idx_background_zero,   idx_background_nonzero,  N_bkg_zero, N_bkg_nonzero, N_dets,  N_chan,  N_echan,  max_n_chan,  emin,  emax) ;

}


I am trying to model some data where there are multiple observations of a special data type in high-energy astrophysics with a special likelihood. The details of this likelihood are irrelevant for this issue. Each of the i^{th} observations can be thought of as having a likelihood:

p(D_i | a_i, b_i ) \simeq f(a_i,b_i).

Here i corresponds to N_intervals in the above code and is split up with partial sum function above.

I have a simulation to generate this data. If I fit each of the observations as independent intervals, I recover my simulated parameters. In the simulation, I link some of the parameters by a linear relation:


log_energy_flux = (delta[grb_id] + gamma[grb_id] .* (norm_log_epeak)) -  log_dl2[grb_id];  

again, if I do not try to estimate gamma and delta here and just fit for log_energy_flux as a parameter, this code works. However, if I try to estimate gamma and delta, gamma has multiple modes. Moreover, I simulate gamma as a positive value and the relation tries to go very negative. I’m curious if this is a centering problem.

To give some back story. The lower level of this likelihood is typically fit separately, yielding estimates for log_peak and log_energy_flux. These estimates are then fit to the linear relation via a measurement error likelihood. I am trying to do it all in one step.

This may appear cumbersome, but if anyone has an idea why the relation wants to flip over… I would be very happy.

Sorry, short on time, maybe @imadmali can answer?

Don’t sort_desc() parameters. That’s going to create multimodality because the applied permutation acts as a discrete latent parameter. Instead you can declare the parameter as ordered vector.

parameters {
  ordered[N_intervals] log_epeak;
}

Note that ordered is in ascending order so you may need to adapt you code a bit.
Also I don’t know what partial_log_like() does with log_epeak, maybe there’s something order-dependent there too?

1 Like

Apologies, the ordered part was from a test where I was trying to kill the odd behavior.

I’ve played a bit with this and think I have an idea what is occurring but not how to solve it.

If you think old the linear part of the model having log_energy_flux as the y data and log_epeak as the x data, then the idea would be to standardize these data for a regression. However, in this case they are latent variables that are inputs to a spectrum/function that predicts a shape for multivariate data. So:

f(x_{i,j}; log energy flux_j, log epeak_j)

predicts the data at the j^{th} datum which is taken care of in the partial_log_like sum. There could be an issue introduced in the form of f… but I’m currently thinking the issue is related to standardizing. If I try to force the linear model to be standardized by subtracting of what I know (because I use simulations) the means of log_energy_flux and log_epeak, then the model begins to behave better, but still with the oddness. I simulate with gamma = 1.5 and one of the chains tries to go to gamma= -10 or (if I set <lower=0>gamma) the chain will go to gamma = 0.

I’ve been searching for linear models that have latent variables rather than data… but I’ve not run across these in the literature.

I don’t see anything in particular, but if estimating log_energy_flux works I would try to extract a submodel that takes log_energy_flux as data and fits the other parameters to it… This might help you isolate the issue.

Best of luck!

1 Like

Thanks. So I’m curious what you mean by “extract a submodel.” The way this analysis is classically done is that the lower level observables are fit individually, and then the linear model is fit to the expectations from those fits. This could introduce all kinds of bias. This is not what you mean, correct?

Can you try and do it with Stan the way it’s done classically? It’s not the analysis you ultimately want but at this stage the pertinent question is not whether it introduces bias but whether it introduces multimodality. If it does, you have a simplified model where you can examine the problems plaguing the complicated model. On the other hand, if the simplified model seemingly works then the problems arise from the interaction between model components.