Help with a non-converging Gaussian dynamic system

Hello,
I have been trying for some time to fit a custom Gaussian dynamic linear system over a set of latent variables, and the model just doesn’t converge, even in the minimal example presented below. I have tried the QR decomposition, the Cauchy reparameterization, changing the sampler settings (adapt_delta, number of draws, max_treedepth), and so on…
My posterior predictive checks look great, but the chain diagnostics are a complete disaster!
I am really out of options here, so I hope someone will have an idea how to improve the model, or can point me to a mistake that I am probably missing.

data {
    int W;                                                  // Number of time points
    int N;                                                  // Number of subjects
    int Xdim;                                               // Dimension of the latent state X
    int exo_q_num;                                          // Number of exogenous inputs
    real U[N,W,exo_q_num];                                  // Exogenous inputs - subj x time x input
    int<lower=0> idx_itc_obs[N,W];                          // Indices of time points with data
    int<lower=1> P_itc;                                     // number of parameters
    int<lower=0> T_max_itc;                                 // Max number of trials across subjects across time points
    int<lower=0> Tr_itc[N, W];                              // Number of trials for each subj for each time point
    real<lower=0> delay_later[N, W, T_max_itc];             // Delay later - subj x time x trials
    real<lower=0> amount_later[N, W, T_max_itc];            // Amount later - subj x time x trials      
    real<lower=0> amount_sooner[N, W, T_max_itc];           // Amount sooner - subj x time x trials
    int<lower=-1, upper=1> choice_itc[N, W, T_max_itc];     // choice itc - 0 for instant reward, 1 for delayed reward - subj x time x trials
}

transformed data {
    real Q = 1; 
    real cauchy_alpha = 10;                 // cauchy scale parameter hyperprior  
    int num_par = P_itc;
}

parameters {
    real<lower=0> sigma_x;            // for time point 1, sigma for mu_prior_x
    real<lower=0> sigma_a;            // sigma for A
    real<lower=0> sigma_b;            // sigma for B
    real<lower=0> sigma_c;            // sigma for C
    real<lower=0> sigma_mu;           // sigma for mu_d
    real<lower=0> sigma_v;            // for week 1, sigma for X
    vector<lower=0>[num_par] sigma_r;             
    vector[num_par] mu_d;             // constant offset
    real mu_prior_x;                  // for week 1, X mean     
    real X[N,W,Xdim];
    vector[Xdim] A;                   // dynamics matrix (here I create just the diagonal vector)
    matrix[Xdim, exo_q_num] B;              
    matrix[num_par, Xdim] C;               
    real itc_k_pr[N,W];               // unconstrained discounting rate
    real itc_beta_pr[N,W];            // unconstrained inverse temperature
}

transformed parameters {
    real<lower=0> itc_k[N,W];                
    real<lower=0> itc_beta[N,W];
    for (n in 1:N) {
        itc_k[n,] = exp(itc_k_pr[n,]);                                 
        itc_beta[n,] = exp(itc_beta_pr[n,]);                 
    } 
}

model { 
    // put sigma priors
    sigma_v ~ cauchy(0,cauchy_alpha);
    sigma_r ~ cauchy(0,cauchy_alpha); 
    sigma_x ~ cauchy(0,cauchy_alpha); 
    sigma_mu ~ cauchy(0,cauchy_alpha); 
    sigma_a ~ cauchy(0,cauchy_alpha); 
    sigma_b ~ cauchy(0,cauchy_alpha); 
    sigma_c ~ cauchy(0,cauchy_alpha); 
    
    mu_prior_x ~ normal(0,sigma_x); // prior on X mean   
    mu_d ~ normal(0,sigma_mu);

    // put priors on A, B, C
    A ~ normal(0,sigma_a);
    to_vector(B) ~ normal(0,sigma_b);
    to_vector(C) ~ normal(0,sigma_c);          
     
    for (s in 1:N) {
        for (w in 1:W) {

            vector[num_par] params_pr;
            params_pr[1] = itc_k_pr[s,w];
            params_pr[2] = itc_beta_pr[s,w];

            if (w == 1) {                                               // first time point
                to_vector(X[s,w,]) ~ normal(mu_prior_x, sigma_v);
            }
            else {
                to_vector(X[s,w,]) ~ normal(diag_matrix(A) * to_vector(X[s,w-1,]) + B * to_vector(U[s,w-1,]), Q);
            }

            params_pr ~ normal(mu_d + C * to_vector(X[s,w,]), sigma_r);

            if (idx_itc_obs[s,w] != 0) {                                // if time point exists
                vector[Tr_itc[s,w]] ev_later  = to_vector(amount_later[s,w,:Tr_itc[s,w]]) ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]]) / 7);
                vector[Tr_itc[s,w]] ev_sooner = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);
                choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));
            }
        }
    }
}

I use cmdstanpy with CmdStan 2.24.1.

Any suggestion is welcome!

to_vector(X[s,w,]) ~  normal(mu_prior_x,sigma_v);

You’ll probably want to use a non-centered parameterization for X here. For what those are and why you’d use them, check out: Diagnosing Biased Inference with Divergences

Kind of a long read, but it’s worth it.

Also, unless you reaaally want Cauchy priors, switch those to normals. Heavy-tailed distributions are hard for the sampler, so unless you really need them, avoid them. If you really do need to sample them, check out the reparameterization in the manual here. There’s a case study on this here.
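For intuition, the manual’s trick is that if u is uniform on (-π/2, π/2), then mu + tau * tan(u) is Cauchy(mu, tau). Here’s a quick numerical sanity check of that transform (plain Python, not Stan; the sample size and seed are arbitrary):

```python
import math
import random
from statistics import quantiles

# If u ~ Uniform(-pi/2, pi/2), then mu + tau * tan(u) ~ Cauchy(mu, tau).
# Check against the known Cauchy(0, 1) quartiles: -1, 0, +1.
random.seed(0)
draws = [math.tan(random.uniform(-math.pi / 2, math.pi / 2))
         for _ in range(200_000)]

q25, q50, q75 = quantiles(draws, n=4)
print(round(q25, 2), round(q50, 2), round(q75, 2))  # should be near -1, 0, 1
```

Sampling the bounded uniform and transforming avoids the heavy tails in the unconstrained space, which is exactly why the reparameterization helps HMC.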

But otherwise, this looks like a very complicated model. Probably the way to get the big model working is to start with a simpler one. If this is structurally the simplest thing you have, start by:

  1. Generating data using known parameters
  2. Fitting the model with most parameters set to their known value (only fit a couple)
  3. Once you get this working, add some more parameters and repeat until you’re estimating all the things you want to estimate. The hope is that by adding model complexity just a bit at a time it’ll be easier to debug and fix problems.
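Step 1 might look something like this: a pure-Python sketch that simulates one subject from a stripped-down 1-D version of your state space model, with known “true” parameters to recover later. All the fixed values here are made-up illustrations, not recommendations:

```python
import math
import random

# Simulate one subject from a 1-D version of the latent dynamics, with known
# "true" parameters, so a fitted model can be checked against them.
random.seed(1)
W = 20            # number of time points
a_true = 0.8      # dynamics coefficient (diagonal of A)
q_true = 1.0      # innovation sd (Q)
mu_d_true, c_true, sigma_r_true = -4.0, 0.5, 0.3

x = [random.gauss(0.0, 1.0)]              # x_1 ~ normal(mu_prior_x, sigma_v)
for w in range(1, W):
    x.append(a_true * x[w - 1] + random.gauss(0.0, q_true))

# Observation-level parameter, e.g. log k: params_pr ~ normal(mu_d + C*x, sigma_r)
log_k = [mu_d_true + c_true * xw + random.gauss(0.0, sigma_r_true) for xw in x]
k = [math.exp(v) for v in log_k]          # itc_k must be positive after exp()

print(len(x), len(k), all(v > 0 for v in k))
```

Feed data simulated like this into the Stan model with most parameters fixed at their true values, and only free up more parameters once the small version recovers them.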

First of all, thanks for replying! I tried changing the Cauchy priors to normals, but this made things even worse!
I am trying to implement a non-centered parameterization right now, and the model is already sampling. I implemented it for the unconstrained parameters itc_k_pr and itc_beta_pr, based on this:

params_pr[1] = itc_k_pr[s,w];
params_pr[2] = itc_beta_pr[s,w];
.
.
params_pr ~ normal(mu_d + C * to_vector(X[s,w,]),sigma_r)

so in the transformed parameters section I have:

    for (n in 1:N) {
        for (w in 1:W) {
            if (w == 1) {
                itc_k[n,w] = exp(mu_prior_x + sigma_v * itc_k_pr[n,w]);                            
                itc_beta[n,w] = exp(mu_prior_x + sigma_v * itc_beta_pr[n,w]);                
            }                        
            else {
                itc_k[n,w] = exp((mu_d[1] + C[1,] * to_vector(X[n,w,])) + sigma_r[1] * itc_k_pr[n,w]);                                  
                itc_beta[n,w] = exp((mu_d[2] + C[2,] * to_vector(X[n,w,])) + sigma_r[2] * itc_beta_pr[n,w]);                  

            }
        }
    }
Does it mean that in the reparametrized version I should have

params_pr ~ normal(0,1)

Also, to follow your suggestion, since X is a dynamic variable that changes over time (it is basically a latent time-series), how would I implement a non-centered parameterization on it?
My model is quite complex, I know; the presented example is about 10% of the full model. But I cannot get reliable convergence even from this reduced example…
Thanks again!

params_pr ~ normal(0, 1);

Yeah, so whenever you do the non-centering, you start with something with a normal(0, 1) prior and then transform it so that the transformed variable has the distribution you want.

So if you want:

x ~ normal(a, b);

Do:

parameters {
  real x_z;
}
transformed parameters {
  real x = a + b * x_z;
}
model {
  x_z ~ normal(0, 1);
}

You can equivalently code this like:

parameters {
  real<offset = a, multiplier = b> x;
}
model {
  x ~ normal(a, b);
}

But in explaining the non-centered I like the longer version. Here’s some more discussion on it: Updated non-centered parametrization with offset and multiplier

x_t \sim N(x_{t-1}, \sigma) works similarly to the above. Here is a verbose example (normally you’d do this with vectors and loops):

To non-center:

x1 ~ normal(0, b);
x2 ~ normal(x1, b);

Do:

parameters {
  real x1_z;
  real x2_z;
}
transformed parameters {
  real x1 = b * x1_z;
  real x2 = x1 + b * x2_z; // note: can use the transformed x1 here
}
model {
  x1_z ~ normal(0, 1);
  x2_z ~ normal(0, 1);
}
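As a sanity check that this transform really gives back the centered distribution, here’s a small Monte Carlo comparison (pure Python, not Stan; sample size and seed are arbitrary):

```python
import random
from statistics import stdev

# Non-centered random walk: x1 = b*z1, x2 = x1 + b*z2, with z ~ normal(0, 1).
# Marginally x2 should then be normal(0, sqrt(2)*b), exactly as in the
# centered version x1 ~ normal(0, b); x2 ~ normal(x1, b).
random.seed(2)
b = 0.5
x2_draws = []
for _ in range(100_000):
    x1 = b * random.gauss(0.0, 1.0)
    x2 = x1 + b * random.gauss(0.0, 1.0)
    x2_draws.append(x2)

print(round(stdev(x2_draws), 2))  # should be close to sqrt(2)*b ≈ 0.71
```

Same joint distribution, but the sampler now explores independent standard normals, which removes the funnel between the latent states and their scale.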

Edit: Fixed typo in eq.


Thanks so much! I tried to do it myself yesterday and am happy to see I did it correctly! Unfortunately, it didn’t solve the problem and X still won’t converge… I will open a new topic about it where I give more info about math and show chain traces and info.
Btw, an unrelated question – how do I write math equations in the forum?


Surround stuff with single dollar signs if you want inline math, or double dollar signs if you want it to be

standalone

Nice! So just to make sure (and to try out math!), you meant:
x_{t+1} \sim N(x_t, \sigma)
right?
The double dollar sign doesn’t work for me though. All I get is:
$$x_{t+1} \sim N(x_t, \sigma)$$

I have a related question - what if I have another parameter, A, that I want to non-center and that is a function of X, e.g., A \sim N(X, \sigma). Should I use the raw X or the transformed X when I reparametrize A?

Ooof, yeah, that’s right.

Put the double dollars on their own lines with the math in between. It’s meant for multiline stuff, which is why it works that way.


Transformed. If you used the untransformed X, that would be A \sim N(X_{untransformed}, \sigma).

Ok! So this was a mistake. I will fit the new model now; maybe you solved my problem!

Hmm, something doesn’t work; I get even worse results. Just to make sure I am doing it right:

parameters {
        real x1_z;
        real x2_z;
        real a1_z;
        real a2_z;
}

transformed parameters {
        real x1 = sigma_x * x1_z;
        real x2 = x1 + sigma_x * x2_z;
        
        real a1 = x1 + sigma_a * a1_z;    // and NOT a1 = x1_z + sigma_a * a1_z, right?
        real a2 = x2 + sigma_a * a2_z;
}
model {
       x1_z ~ normal(0, 1);
       x2_z ~ normal(0, 1);
       a1_z ~ normal(0, 1);
       a2_z ~ normal(0, 1);
}

is this correct?

Sorry for the long delay. Your model says this to me:

a_1 \sim N(x_1, \sigma_a)

Which looks like a reasonable thing.
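To see numerically why the transformed x1 is the right thing to condition on, compare the two versions (pure Python, not Stan; all the fixed values are arbitrary for illustration):

```python
import random
from statistics import mean

# Compare the correct non-centering (a1 = x1 + sigma_a * a1_z) with the
# mistaken one (a1 = x1_z + sigma_a * a1_z). Conditional on x1, the correct
# version is centered at x1; the wrong one is centered at x1_z = x1 / sigma_x.
random.seed(3)
sigma_x, sigma_a = 2.0, 0.1
x1_z = 1.5                      # one fixed draw of the raw parameter
x1 = sigma_x * x1_z             # the transformed latent state (here 3.0)

right = [x1 + sigma_a * random.gauss(0.0, 1.0) for _ in range(50_000)]
wrong = [x1_z + sigma_a * random.gauss(0.0, 1.0) for _ in range(50_000)]
print(round(mean(right), 2), round(mean(wrong), 2))  # ~3.0 vs ~1.5
```

The wrong version silently rescales the conditional mean by 1/sigma_x, which is exactly the kind of bug that looks fine in the code but wrecks the fit.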