Help with a non-converging Gaussian dynamic system

Hello,
I have been trying for some time to fit a custom Gaussian dynamic linear system over a set of latent variables, and the model just doesn’t converge, even in the minimal example presented below. I have tried the QR decomposition, the Cauchy reparameterization, changing the sampler settings (adapt_delta, number of draws, max_treedepth), and so on…
My posterior predictive checks look great, but the chain diagnostics are a complete disaster!
I am really out of options here, so I hope someone will have an idea how to improve the model, or can point me to a mistake that I am probably missing.

data {
    int W;                                                  // Number of time points
    int N;                                                  // Number of subjects
    int Xdim;                                               // Dimension of the latent state X
    int exo_q_num;                                          // Number of exogenous inputs
    real U[N,W,exo_q_num];                                  // Exogenous inputs - subj x time x input
    int<lower=0> idx_itc_obs[N,W];                          // Indices of time points with data
    int<lower=1> P_itc;                                     // number of parameters
    int<lower=0> T_max_itc;                                 // Max number of trials across subjects across time points
    int<lower=0> Tr_itc[N, W];                              // Number of trials for each subj for each time point
    real<lower=0> delay_later[N, W, T_max_itc];             // Delay later - subj x time x trials
    real<lower=0> amount_later[N, W, T_max_itc];            // Amount later - subj x time x trials      
    real<lower=0> amount_sooner[N, W, T_max_itc];           // Amount sooner - subj x time x trials
    int<lower=-1, upper=1> choice_itc[N, W, T_max_itc];     // choice itc - 0 for instant reward, 1 for delayed reward - subj x time x trials
}

transformed data {
    real Q = 1; 
    real cauchy_alpha = 10;                 // cauchy scale parameter hyperprior  
    int num_par = P_itc;
}

parameters {
    real<lower=0> sigma_x;            // for time point 1, sigma for mu_prior_x
    real<lower=0> sigma_a;            // sigma for A
    real<lower=0> sigma_b;            // sigma for B
    real<lower=0> sigma_c;            // sigma for C
    real<lower=0> sigma_mu;           // sigma for mu_d
    real<lower=0> sigma_v;            // for week 1, sigma for X
    vector<lower=0>[num_par] sigma_r;             
    vector[num_par] mu_d;             // constant offset
    real mu_prior_x;                  // for week 1, X mean     
    real X[N,W,Xdim];
    vector[Xdim] A;                   // dynamics matrix (here I create just the diagonal vector)
    matrix[Xdim, exo_q_num] B;              
    matrix[num_par, Xdim] C;               
    real itc_k_pr[N,W];               // unconstrained discounting rate
    real itc_beta_pr[N,W];            // unconstrained inverse temperature
}

transformed parameters {
    real<lower=0> itc_k[N,W];                
    real<lower=0> itc_beta[N,W];
    for (n in 1:N) {
        itc_k[n,] = exp(itc_k_pr[n,]);                                 
        itc_beta[n,] = exp(itc_beta_pr[n,]);                 
    } 
}

model { 
    // put sigma priors
    sigma_v ~ cauchy(0,cauchy_alpha);
    sigma_r ~ cauchy(0,cauchy_alpha); 
    sigma_x ~ cauchy(0,cauchy_alpha); 
    sigma_mu ~ cauchy(0,cauchy_alpha); 
    sigma_a ~ cauchy(0,cauchy_alpha); 
    sigma_b ~ cauchy(0,cauchy_alpha); 
    sigma_c ~ cauchy(0,cauchy_alpha); 
    
    mu_prior_x ~ normal(0,sigma_x); // prior on X mean   
    mu_d ~ normal(0,sigma_mu);

    // put priors on A, B, C
    A ~ normal(0,sigma_a);
    to_vector(B) ~ normal(0,sigma_b);
    to_vector(C) ~ normal(0,sigma_c);          
     
    for (s in 1:N) {
        for (w in 1:W) {

            vector[num_par] params_pr;
            params_pr[1] = itc_k_pr[s,w];
            params_pr[2] = itc_beta_pr[s,w];

            if (w == 1) {                                               // first time point
                to_vector(X[s,w,]) ~ normal(mu_prior_x, sigma_v);
            }
            else {
                to_vector(X[s,w,]) ~ normal(diag_matrix(A) * to_vector(X[s,w-1,]) + B * to_vector(U[s,w-1,]), Q);
            }

            params_pr ~ normal(mu_d + C * to_vector(X[s,w,]), sigma_r);

            if (idx_itc_obs[s,w] != 0) {                                // if time point exists
                vector[Tr_itc[s,w]] ev_later  = to_vector(amount_later[s,w,:Tr_itc[s,w]]) ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]]) / 7);
                vector[Tr_itc[s,w]] ev_sooner = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);
                choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));
            }
        }
    }
}

I use cmdstanpy with CmdStan 2.24.1.

Any suggestion is welcome!

to_vector(X[s,w,]) ~  normal(mu_prior_x,sigma_v);

You’ll probably want to use a non-centered parameterization for X here. For what those are and why you’d use them, check out: Diagnosing Biased Inference with Divergences

Kind of a long read, but it’s worth it.

Also, unless you reaaally want Cauchy priors, switch those to normals. Heavy-tailed distributions are hard for the sampler, so unless you really need them, avoid them. If you really do need to sample them, check out the reparameterization in the manual here. There’s a case study on this here.
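For intuition, the manual’s trick is that if u is uniform on (-π/2, π/2), then mu + tau * tan(u) is Cauchy(mu, tau). Here’s a quick numerical sanity check of that transform (plain Python, not Stan; the sample size and seed are arbitrary):

```python
import math
import random
from statistics import quantiles

# If u ~ Uniform(-pi/2, pi/2), then mu + tau * tan(u) ~ Cauchy(mu, tau).
# Check against the known Cauchy(0, 1) quartiles: -1, 0, +1.
random.seed(0)
draws = [math.tan(random.uniform(-math.pi / 2, math.pi / 2))
         for _ in range(200_000)]

q25, q50, q75 = quantiles(draws, n=4)
print(round(q25, 2), round(q50, 2), round(q75, 2))  # should be near -1, 0, 1
```

Sampling the bounded uniform and transforming avoids the heavy tails in the unconstrained space, which is exactly why the reparameterization helps HMC.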

But otherwise, this looks like a very complicated model. Probably the way to get the big model working is to start with a simpler one. If this is structurally the simplest thing you have, start by:

  1. Generating data using known parameters
  2. Fitting the model with most parameters set to their known value (only fit a couple)
  3. Once you get this working, add some more parameters and repeat until you’re estimating all the things you want to estimate. The hope is that by adding model complexity just a bit at a time it’ll be easier to debug and fix problems.
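Step 1 might look something like this: a pure-Python sketch that simulates one subject from a stripped-down 1-D version of your state space model, with known “true” parameters to recover later. All the fixed values here are made-up illustrations, not recommendations:

```python
import math
import random

# Simulate one subject from a 1-D version of the latent dynamics, with known
# "true" parameters, so a fitted model can be checked against them.
random.seed(1)
W = 20            # number of time points
a_true = 0.8      # dynamics coefficient (diagonal of A)
q_true = 1.0      # innovation sd (Q)
mu_d_true, c_true, sigma_r_true = -4.0, 0.5, 0.3

x = [random.gauss(0.0, 1.0)]              # x_1 ~ normal(mu_prior_x, sigma_v)
for w in range(1, W):
    x.append(a_true * x[w - 1] + random.gauss(0.0, q_true))

# Observation-level parameter, e.g. log k: params_pr ~ normal(mu_d + C*x, sigma_r)
log_k = [mu_d_true + c_true * xw + random.gauss(0.0, sigma_r_true) for xw in x]
k = [math.exp(v) for v in log_k]          # itc_k must be positive after exp()

print(len(x), len(k), all(v > 0 for v in k))
```

Feed data simulated like this into the Stan model with most parameters fixed at their true values, and only free up more parameters once the small version recovers them.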

First of all, thanks for replying! I tried changing the Cauchy priors to normals, but this made things even worse!
I am trying to implement a non-centered parameterization right now, and the model is already sampling. I implemented it for the unconstrained parameters itc_k_pr and itc_beta_pr, based on this:

params_pr[1] = itc_k_pr[s,w];
params_pr[2] = itc_beta_pr[s,w];
.
.
params_pr ~ normal(mu_d + C * to_vector(X[s,w,]),sigma_r)

so in the transformed parameters section I have:

    for (n in 1:N) {
        for (w in 1:W) {
            if (w == 1) {
                itc_k[n,w] = exp(mu_prior_x + sigma_v * itc_k_pr[n,w]);                            
                itc_beta[n,w] = exp(mu_prior_x + sigma_v * itc_beta_pr[n,w]);                
            }                        
            else {
                itc_k[n,w] = exp((mu_d[1] + C[1,] * to_vector(X[n,w,])) + sigma_r[1] * itc_k_pr[n,w]);                                  
                itc_beta[n,w] = exp((mu_d[2] + C[2,] * to_vector(X[n,w,])) + sigma_r[2] * itc_beta_pr[n,w]);                  

            }
        }
    }
Does it mean that in the reparametrized version I should have

params_pr ~ normal(0,1)

Also, to follow your suggestion, since X is a dynamic variable that changes over time (it is basically a latent time-series), how would I implement a non-centered parameterization on it?
My model is quite complex, I know; the presented example is about 10% of the full model. But I cannot get reliable convergence even from this reduced example…
Thanks again!

params_pr ~ normal(0, 1);

Yeah, so whenever you do the non-centering, you start with something with a normal(0, 1) prior and then transform it so that the transformed variable has the distribution you want.

So if you want:

x ~ normal(a, b);

Do:

parameters {
  real x_z;
}
transformed parameters {
  real x = a + b * x_z;
}
model {
  x_z ~ normal(0, 1);
}

You can equivalently code this like:

parameters {
  real<offset = a, multiplier = b> x;
}
model {
  x ~ normal(a, b);
}

But in explaining the non-centered I like the longer version. Here’s some more discussion on it: Updated non-centered parametrization with offset and multiplier

x_t \sim N(x_{t-1}, \sigma) works similarly to the above. Here is a verbose example (normally you’d do this with vectors and loops):

To non-center:

x1 ~ normal(0, b);
x2 ~ normal(x1, b);

Do:

parameters {
  real x1_z;
  real x2_z;
}
transformed parameters {
  real x1 = b * x1_z;
  real x2 = x1 + b * x2_z; // note: can use the transformed x1 here
}
model {
  x1_z ~ normal(0, 1);
  x2_z ~ normal(0, 1);
}
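As a sanity check that this transform really gives back the centered distribution, here’s a small Monte Carlo comparison (pure Python, not Stan; sample size and seed are arbitrary):

```python
import random
from statistics import stdev

# Non-centered random walk: x1 = b*z1, x2 = x1 + b*z2, with z ~ normal(0, 1).
# Marginally x2 should then be normal(0, sqrt(2)*b), exactly as in the
# centered version x1 ~ normal(0, b); x2 ~ normal(x1, b).
random.seed(2)
b = 0.5
x2_draws = []
for _ in range(100_000):
    x1 = b * random.gauss(0.0, 1.0)
    x2 = x1 + b * random.gauss(0.0, 1.0)
    x2_draws.append(x2)

print(round(stdev(x2_draws), 2))  # should be close to sqrt(2)*b ≈ 0.71
```

Same joint distribution, but the sampler now explores independent standard normals, which removes the funnel between the latent states and their scale.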

Edit: Fixed typo in eq.


Thanks so much! I tried to do it myself yesterday and am happy to see I did it correctly! Unfortunately, it didn’t solve the problem and X still won’t converge… I will open a new topic about it where I give more info about math and show chain traces and info.
Btw, an unrelated question – how do I write math equations in the forum?


Surround stuff with single dollar signs if you want inline math, or double dollar signs if you want it to be

standalone

Nice! So just to make sure (and to try out math!), you meant:
x_{t+1} \sim N(x_t, \sigma)
right?
The double dollar sign doesn’t work for me though. All I get is:
$$x_{t+1} \sim N(x_t, \sigma)$$

I have a related question - what if I have another parameter, A, that I want to non-center and that is a function of X, e.g., A \sim N(X, \sigma). Should I use the raw X or the transformed X when I reparametrize A?

Ooof, yeah, that’s right.

Put the double dollars on their own lines with the math in between. It’s meant for multiline stuff, which is why it works that way.


Transformed. If you used the untransformed X, that would be A \sim N(X_{untransformed}, \sigma).

Ok! So this was a mistake. I will fit the new model now; maybe you solved my problem!

Hmm, something doesn’t work; I get even worse results. Just to make sure I am doing it right:

parameters {
        real x1_z;
        real x2_z;
        real a1_z;
        real a2_z;
}

transformed parameters {
        real x1 = sigma_x * x1_z;
        real x2 = x1 + sigma_x * x2_z;
        
        real a1 = x1 + sigma_a * a1_z;    // and NOT a1 = x1_z + sigma_a * a1_z, right?
        real a2 = x2 + sigma_a * a2_z;
}
model {
       x1_z ~ normal(0, 1);
       x2_z ~ normal(0, 1);
       a1_z ~ normal(0, 1);
       a2_z ~ normal(0, 1);
}

is this correct?

Sorry for the long delay. Your model says this to me:

a_1 \sim N(x_1, \sigma_a)

Which looks like a reasonable thing.
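To see numerically why the transformed x1 is the right thing to condition on, compare the two versions (pure Python, not Stan; all the fixed values are arbitrary for illustration):

```python
import random
from statistics import mean

# Compare the correct non-centering (a1 = x1 + sigma_a * a1_z) with the
# mistaken one (a1 = x1_z + sigma_a * a1_z). Conditional on x1, the correct
# version is centered at x1; the wrong one is centered at x1_z = x1 / sigma_x.
random.seed(3)
sigma_x, sigma_a = 2.0, 0.1
x1_z = 1.5                      # one fixed draw of the raw parameter
x1 = sigma_x * x1_z             # the transformed latent state (here 3.0)

right = [x1 + sigma_a * random.gauss(0.0, 1.0) for _ in range(50_000)]
wrong = [x1_z + sigma_a * random.gauss(0.0, 1.0) for _ in range(50_000)]
print(round(mean(right), 2), round(mean(wrong), 2))  # ~3.0 vs ~1.5
```

The wrong version silently rescales the conditional mean by 1/sigma_x, which is exactly the kind of bug that looks fine in the code but wrecks the fit.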