Casual question about convergence

Hello Community,

I have always been curious why the chains sometimes converge to a different value close to the end of the “warmup” phase — sometimes making a substantial jump.

In this case, for example, I have this:

The chains seem to mix decently and then suddenly change behaviour as the sampling phase approaches.

P.S. Although this is a general question, FYI the (work-in-progress) model is the following.

data {
	int G;                                  // Number of marker genes
	int P;                                  // Number of cell types
	int S;                                  // Number of mix samples
	int<lower=1> R;                         // Number of covariates (e.g., treatments)
	matrix[S, R] X;                         // Array of covariates
	matrix<lower=0>[S, G] y;                // Mix samples matrix
	matrix<lower=0>[G, P] x;                // Reference signatures
	// Background cell types
	vector<lower=0, upper=1>[S] p_ancestor;
	matrix[S, G] y_bg_hat;
}
transformed data {
	matrix<lower=0>[S, G] y_log;            // Log-transformed mix samples matrix

	y_log = log(y + 1);
}
parameters {
	simplex[P] beta[S];                     // Coefficients for predictors
	real<lower=0> sigma;                    // Error scale
	matrix[R, P] alpha;                     // Prior location for beta
	vector<lower=0.1>[P] phi;
}
transformed parameters {
	matrix[S, P] beta_adj;
	matrix[S, G] y_hat;

	for (s in 1:S) beta_adj[s] = to_row_vector(beta[s]) * p_ancestor[s];
	y_hat = beta_adj * x';
	y_hat = y_hat + y_bg_hat;
}
model {
	matrix[S, P] beta_hat;

	// Priors
	sigma ~ normal(0, 0.1);
	alpha[1] ~ normal(0, 1);
	if (R > 1) to_vector(alpha[2:R]) ~ normal(0, 1);
	phi ~ normal(0.1, 5); // phi ~ normal(phi_prior[1], phi_prior[2]);

	// Regression likelihood
	to_vector(y_log) ~ student_t(8, log(to_vector(y_hat)), sigma);

	// Hypothesis testing
	beta_hat = X * alpha;
	for (s in 1:S) for (p in 1:P) beta_hat[s, p] = inv_logit(beta_hat[s, p]);
	for (s in 1:S) beta[s] * mean(p_ancestor) ~ beta(beta_hat[s] .* to_row_vector(phi), (1 - beta_hat[s]) .* to_row_vector(phi));
}


You might want to look at this in the context of what happens during the various adaptation windows. I think what you’re seeing is the final stepsize adaptation phase, where the sampler can end up further away from where it previously settled as larger stepsizes are tried out. The phases are described in the Stan reference manual, and the default values are described in the CmdStan documentation and probably the rstan documentation.
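For reference, the default warmup schedule can be sketched roughly as follows. This is a Python reconstruction of the logic as described in the manual (not Stan's actual source): a 75-iteration initial fast interval that adapts only the stepsize, then doubling "slow" windows starting at 25 iterations that adapt the metric, then a 50-iteration terminal fast interval that re-adapts the stepsize — which is the phase where a freshly reset, temporarily-too-large stepsize can kick the chains to a different region right before sampling starts.

```python
def warmup_windows(num_warmup=1000, init_buffer=75, term_buffer=50, base_window=25):
    """Sketch of Stan's default windowed warmup schedule.

    Returns the (start, end) iteration ranges of the metric-adaptation
    ("slow") windows between the initial and terminal stepsize-only
    ("fast") buffers. Windows double in size; if doubling again would
    overshoot the terminal buffer, the current window is extended to
    fill the remaining slow-adaptation span instead.
    """
    slow_end = num_warmup - term_buffer
    windows = []
    start, size = init_buffer, base_window
    while start < slow_end:
        # Current window plus a doubled next window would overshoot:
        # extend this window to the end of the slow phase.
        if start + 3 * size > slow_end:
            size = slow_end - start
        windows.append((start, start + size))
        start += size
        size *= 2
    return windows
```

For the default 1000 warmup iterations this reproduces the schedule documented in the manual: `warmup_windows(1000)` gives `[(75, 100), (100, 150), (150, 250), (250, 450), (450, 950)]`, i.e. slow windows of 25, 50, 100, 200, and 500 iterations between the two fast buffers.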