Samples of a parameter in the model sometimes stick to a constant when doing dynamic linear modeling

data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // row_vector[K-1] Xdummy[N];  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> groupmax;  // number of grouping levels
  // int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> group[N];  // grouping indicator per observation
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[K-1] beta1;  // population-level effects
  real beta0;  // temporary intercept for centered predictors
  real alpha[groupmax]; 
  real<lower=0> sigma;
}
model {
  sigma ~ gamma(0.1, 0.01);
  target += student_t_lpdf(beta1 | 7, 0, 2.5);
  target += student_t_lpdf(beta0 | 7, 0, 2.5);

  target += normal_lpdf(alpha[1] | 0, 0.1);
  target += normal_lpdf(alpha[2] | 0, 1/sigma);
  for (k in 3:groupmax) {
    alpha[k] ~ normal(2*alpha[k-1] - alpha[k-2], 1/sigma);
  }

  for (n in 1:N) {
    Y[n] ~ bernoulli(inv_logit(beta0 + alpha[groupmax+1-group[n]] + Xc[n]*beta1));
  }
}


tempdata.RData (961 Bytes)

Figure 1 is the model I try to fit. Figure 2 is the example traceplot. File 1 is the example dataset used to generate figure 2.
Here is the R command with the seed used to generate figure 2:

fit <- rstan::sampling(logisticmodel_rand,
                       data = dataran, chains = 1, refresh = 0,
                       warmup = 2500, iter = 5000, seed = 93)
traceplot(fit)

Question: I am a beginner. As you can see in figure 2, the values of the parameters stick to a constant after the warm-up phase, which causes problems when making inferences. This happens for some seeds. May I ask how to deal with this dynamic linear model using Stan?

Thank you very much for anyone making comments!

@andrjohns

The R version is 4.1.2
The rstan version is 2.21.3

That can happen for many reasons, but it’s likely a symptom of some underlying issue with the model, or at least the model given the data (e.g. too little, or more likely too much, data for the model to handle). If you try it with different data sets, or with smaller values of K for instance, you may get an idea of whether it’s a structural problem or not.

Another possibility, given that this doesn’t happen for at least some part of the chain (and, I infer from your comment, for other entire chains), is that the optimization of the algorithm parameters during the warmup phase is not adequate and the sampler is getting stuck in some parameter regions. That can happen for complicated models when a chain starts in a weird place and can’t get out of it during warmup, so the metric and step size are not good values for the rest of the chain. If these parameter values are far from what you get on other chains, and the likelihood hasn’t converged to higher-probability regions, that’s an indication of the problem.

You may change the algorithm tuning parameters, like adapt_delta, so that adaptation takes longer to find the optimal values but may explore the probability space better, or run a longer warmup and chain, or both.

There are probably other causes, but it depends on the “symptoms” you are getting.

Hi Caesoma,

Thank you very much for your reply.

The K in this example equals 2. I have tried to use JAGS. JAGS works well on this model. Thank you very much.

Ziyan

Do you mean the software JAGS? That uses Gibbs sampling, which is very different, and “working well” may only be true in the sense of not getting the error you were getting with Stan. HMC is a much more powerful sampler, so it’s unlikely to underperform if working as intended. I’d say there are two possibilities here:

  1. The problem is simple enough that Gibbs sampling actually works well (HMC should produce a similar output);
  2. The Gibbs sampler generates a trace that doesn’t have obvious problems, but it’s actually a poor sample from the posterior.

You may not be able to distinguish between the two cases just from simple trace diagnostics, so it may be worth trying to make HMC work – depending on the application, a reviewer may be inclined to reject the use of an outdated sampler and require a state-of-the-art method.

Yes, JAGS. I am willing to use Stan, but when sampling from this hierarchical model something strange happens, as I described. I tried to debug this issue but did not succeed. All the other models in my thesis are sampled using Stan and work well, and I would like to keep my thesis consistently in Stan. Would you mind downloading the data file I have attached (tempdata.RData) and sampling from the model with the given seed to see whether the problem can be avoided? I cannot find a way to solve this problem at the moment. Maybe I need to reparameterize this model?

Thank you very much
Caesoma

Have you tried making the prior on sigma more restrictive? Something like gamma(0.1, 0.1) would constrain sigma a bit more. It looks like what’s happening is that the sampler gets stuck in a region where sigma is large, and then the dynamics on alpha get stuck: if sigma is large, alpha[2] ≈ 0 and alpha[k] ≈ 2*alpha[k-1] - alpha[k-2], so alpha[k] ≈ alpha[1].
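To make this concrete, here is a quick numerical sketch (Python, not from the thread; the sizes and seed are made up) of the second-order random walk prior on alpha from the model above. When sigma is large, the innovation sd 1/sigma is tiny and the walk collapses to an almost deterministic linear extrapolation of its first two values, which is exactly the "stuck" behavior in the traceplot:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_alpha(sigma, groupmax=20):
    """Draw alpha from the priors in the Stan model above:
    alpha[1] ~ N(0, 0.1), alpha[2] ~ N(0, 1/sigma),
    alpha[k] ~ N(2*alpha[k-1] - alpha[k-2], 1/sigma)."""
    alpha = np.empty(groupmax)
    alpha[0] = rng.normal(0.0, 0.1)
    alpha[1] = rng.normal(0.0, 1.0 / sigma)
    for k in range(2, groupmax):
        alpha[k] = rng.normal(2.0 * alpha[k - 1] - alpha[k - 2], 1.0 / sigma)
    return alpha

# With a large sigma the innovations vanish, so alpha[k] lies (almost
# exactly) on the straight line through the first two values:
alpha = simulate_alpha(sigma=1e6)
line = alpha[1] + (alpha[1] - alpha[0]) * np.arange(-1, len(alpha) - 1)
print(np.abs(alpha - line).max())  # tiny: the walk is effectively deterministic
```

A more restrictive prior on sigma (or parameterizing directly in terms of the innovation sd) keeps the sampler out of this degenerate region.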

1 Like

Thank you very much.

I have rewritten the Stan code, which seems to avoid this problem. I am testing the result and will update tomorrow.

Best wishes
Ziyan

There’s nothing “wrong” with the Gibbs sampler or JAGS/BUGS, but it would be a fair question from a reviewer/examiner whether there’s something odd about the model if it won’t work with HMC when other models do (and whether Gibbs is actually working).
I cannot run and check your model myself, but if it’s an important piece of your thesis and you are worried about the robustness of the results, I’d strongly suggest fiddling with it a bit and trying to fix it.

There are a few things you could relatively easily do:

1. Try to restrict the parameter space by constraining parameters or using stronger priors (making sure you are able to justify the constraints);
2. Run longer warmup and change adapt_delta to try and make sure the algorithm parameters are well-optimized;
3. Try to start the chains from regions of higher probability (but making sure they are different initial points to better assess convergence);
4. Simulate data and see how the inference behaves under known parameters;
5. Simplify the model and scale up to see when problems start arising.
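For point 4, here is a minimal simulation sketch (Python, just for illustration; the sizes and "true" parameter values are assumptions of mine, not from the thread) that generates data in the format the first Stan model expects, so the fit can be checked against known parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# assumed sizes -- adjust to match the real data
N, K, groupmax = 200, 2, 10

# known "true" parameters to try to recover
beta0_true = -0.5
beta1_true = rng.normal(0.0, 1.0, size=K - 1)
sigma_true = 2.0

# second-order random walk for the time effects, as in the model
alpha = np.empty(groupmax)
alpha[0] = rng.normal(0.0, 0.1)
alpha[1] = rng.normal(0.0, 1.0 / sigma_true)
for k in range(2, groupmax):
    alpha[k] = rng.normal(2 * alpha[k - 1] - alpha[k - 2], 1.0 / sigma_true)

# design matrix with an intercept column, plus group indices in 1..groupmax
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
group = rng.integers(1, groupmax + 1, size=N)

# linear predictor with centered predictors; note the reversed time
# index alpha[groupmax + 1 - group[n]] from the Stan code (0-based here)
Xc = X[:, 1:] - X[:, 1:].mean(axis=0)
eta = beta0_true + alpha[groupmax - group] + Xc @ beta1_true
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

stan_data = dict(N=N, K=K, groupmax=groupmax, Y=Y, X=X, group=group)
```

If the fit on such data recovers beta0_true, beta1_true and sigma_true, the problem is more likely in the real data or the adaptation than in the model structure.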

Here is the Stan code that works better. There is no stickiness now.

data {
  int<lower=1> N;  // number of patients
  int<lower=1> K;  // number of treatment arms
  int<lower=1> groupmax;  // number of time points
  int<lower=1, upper=K> X[N];  // predictor variable
  int<lower=0, upper=1> Y[N];  // response variable
  int<lower=1> group[N];
}

parameters {
  real beta_0;  // intercept
  vector[K] beta;  // coefficients for predictor variable
  vector[groupmax] alpha;  // random effects for each time point
  real<lower=0> gamma;  // precision parameter for alpha
}

model {
  // priors
  beta_0 ~ normal(0, 1.8);
  beta ~ normal(0, 1.8);
  alpha[1] ~ normal(0,0.01);
  alpha[2] ~ normal(0, sqrt(1/gamma));
  alpha[3:groupmax] ~ normal(2*alpha[2:(groupmax-1)] - alpha[1:(groupmax-2)], sqrt(1/gamma));
  gamma ~ gamma(0.1, 0.01);

  // likelihood
  for (n in 1:N) {
    vector[K] pi;
    for (k in 1:K) {
      pi[k] = inv_logit(beta_0 + beta[k] * (X[n] == k) + alpha[groupmax-group[n]+1]);
    }
    Y[n] ~ bernoulli(pi[X[n]]);
  }
}