State space model too flexible

Dear Stan community,

I am developing a State-Space Model (SSM) for annual tree radial growth. I am using this framework for two reasons:

  1. The measured diameters have uncertainties (two error terms a ‘routine’ observation error sigma_obs, and an extreme observation error eta_obs), and I want to account for this hierarchy
  2. I have missing informations (tree diameters are measured every five years, but I want to estimate their annual growth rate—i.e., annual diameters)

I now detail my model (I give the complete structure of the model at the end of this message).

Context

The observation equation is a mixture of normal distributions, where lambda is the probability of occurrence of extreme errors (such as a typo in the diameter):

obs_growth = (1 - lambda) normal(latent_growth, sigma_obs) + lambda normal(latent_growth, eta_obs)

and the process equation is:

latent_growth ~ lognormal(meanlog, sdlog)

where sigma_proc is the process error, and meanlog and sdlog are the parameters of the lognormal distribution. I use the method of moments to determine them:

meanlog = log(prediction^2/sqrt(sigma_proc^2 + prediction^2))
sdlog = sqrt(log(sigma_proc^2/prediction^2 + 1))

where prediction is my regression (i.e., predicted growth for a given diameter and a set of explanatory variables).

Problem
With this formulation of the model, I unfortunately allow for insane growth rates, despite I use a prior quite informative for sigma_proc (see end of message for the model with priors). Here is an image of posterior generated diameters (done with the bloc generated quantities) in function of time. The red dots are measured data. As can be seen, the third datum is wrong. Therefore, I would like this observation to be from the extreme error mixture rather than the routine error mixture. But instead, sigma_proc is increased (8 times larger than the average of its prior) so that the data are fitted. This is what I mean by a SSM too flexible!

timeSeries_3measures_1wrong.pdf (235.2 KB)

Except that I allow for insane growth (curves in green), all the chains converged and mixed well.

My first attempt has been to fix sigma_proc to a constant value (1.3 mm/yr, obtained from another paper and it is quite sensible in my case). In this case, I am able to produce what I want (see picture below, same data), however the chains do not mix well for all the parameters (some environmental variables are having problems, rhat is around 1.2 to 1.4).

timeSeries_3measures_1wrong_fixed_sigmaProc.pdf (120.5 KB)

Do you have any idea of what is happening? I feel the problems are around sigma_proc, the two other errors, and the probability lambda… As a second attempt, I also fixed the observation errors (so no error to estimate), but the mixing was terrible.

Sorry for this long message! Here is my model with hopefully enough comments to be understood. Just one note on the variables’ name: I use the word parent for the beginning of each individual time series, and the word children for all the subsequent measurements. This implied a lot of indexing!

model {
	// Declare variables
	int growth_counter = 1; // Counter in the latent space
	int children_counter = 1; // Counter in the 'observation space' for chidlren
	int record_children_counter = 1; // Counter to know when we should register the current dbh value into latent_dbh_children
	real expected_growth;
	real temporary;
	vector [n_children] latent_dbh_children;
	real growth_mean_logNormal;
	real growth_sd_logNormal;

	// Priors
	// --- Growth parameters
	target += normal_lpdf(averageGrowth | -4, 10); // sd_dbh * exp(-4) is around 2 to 3 mm (reasonable average growth) for sd_dbh = 130
	target += normal_lpdf(dbh_slope | 0, 5);
	
	target += normal_lpdf(pr_slope | 0, 5);
	target += normal_lpdf(pr_slope2 | 0, 5);

	target += normal_lpdf(tas_slope | 0, 5);
	target += normal_lpdf(tas_slope2 | 0, 5);

	target += normal_lpdf(ph_slope | 0, 5);
	target += normal_lpdf(ph_slope2 | 0, 5);

	target += normal_lpdf(competition_slope | 0, 5);

	// --- Errors
	/*
		A note on the folowing priors:
		--> Let m be the mean of the gamma distribution, and v its variance (see comment at the beginning of this file to get shape and
			rate from mean and variance).

		For instance, for the routine measure error, which is the standard deviation of the observation around the latent state, we have:
		m = sqrt(3)
		v = 0.025

		Then, because I work on standardised dbh, I need to divide m by sd(dbh), and v by sd(dbh)^2. Therefore, we get the following:
		shape = (m/sd(dbh))^2 / (v/sd(dbh)^2) = m^2/v
		rate =  (m/sd(dbh))   / (v/sd(dbh)^2) = sd(dbh) * m/v

		--> Let m be the mean of the lognormal distribution, and s its standard deviation (see comment at the beginning of this file to get
		meanlog and sdlog from mean and sd).
		For the process error, which is the sd of growth, we have:
		meanlog = log( (m/sd(dbh))^2/sqrt((s/sd(dbh))^2 + (m/sd(dbh))^2) ) = log(m^2/sqrt(s^2 + m^2)) - log(sd(dbh)) = meanlog - log(sd(dbh))
		sdlog = sqrt(log( (s/sd(dbh))^2 / (m/sd(dbh))^2 + 1)) = sqrt(log(s^2/m^2 + 1))

		However, for the lognormal, I used meanlog = log(mu_h) = log(1.28), and sd_log = sigma_h = 0.16 (Nadja Rüger, personnal
		communication)

		The values are taken from Rüger et al (2011), Growth Strategies of Tropical Tree Species: Disentangling Light and Size Effects
		for sigmaProc (personnal communication), etaObs and proba priors.

		The values are taken from Luoma et al (2017), Assessing Precision in Conventional Field Measurements of Individual Tree Attributes
		for sigmaObs prior.

		The average values of the priors on the standardised scale are approximately (obtained with one sample of 1e8):
		sigmaProc: 0.008977944
		sigmaObs: 0.01209929
		etaObs: 0.1788352
	*/
	target += lognormal_lpdf(sigmaProc | 0.2468601 - log(sd_dbh), 0.09); // <=> procError = 1.29 mm/yr ± 0.12 mm/yr
	target += gamma_lpdf(sigmaObs | 3.0/0.025, sd_dbh*sqrt(3)/0.025); // <=> routine measurement error (sd) = sqrt(3) mm
	target += gamma_lpdf(etaObs | 25.6^2/6.2, sd_dbh*25.6/6.2); // <=> extreme measurement error (sd) = 25.6 mm

	// target += beta_lpdf(proba | 3.95, 391.05); // This corresponds to a 1 % chance extrem error, ± 0.5 %
	target += beta_lpdf(proba | 48.67, 1714.84); // This corresponds to a 2.76 % chance extrem error, ± 0.39 % (Rüger et. al 2011)

	// Model
	for (i in 1:n_indiv) // Loop over all the individuals
	{
		temporary = latent_dbh_parents[i];
		for (j in 1:nbYearsGrowth[i]) // Loop for all years but the first (which is the parent of indiv i)
		{
			// Process model
			expected_growth = growth(temporary, normalised_precip[climate_index[i] + j - 1],
				normalised_tas[climate_index[i] + j - 1], normalised_ph[plot_index[i]], normalised_standBasalArea[climate_index[i] + j - 1],
				averageGrowth, dbh_slope, pr_slope, pr_slope2, tas_slope, tas_slope2, ph_slope, ph_slope2, competition_slope);

			growth_mean_logNormal = log(expected_growth^2/sqrt(sigmaProc^2 + expected_growth^2));
			growth_sd_logNormal = sqrt(log(sigmaProc^2/expected_growth^2 + 1));

			target += lognormal_lpdf(latent_growth[growth_counter] | growth_mean_logNormal, growth_sd_logNormal);

			// Dbh at time t + 1
			temporary += latent_growth[growth_counter];

			growth_counter += 1;
			record_children_counter += 1;

			// Only the relevant (i.e., children) dbh are recorded
			if (record_children_counter == latent_children_index[children_counter])
			{
				latent_dbh_children[children_counter] = temporary;
				children_counter += 1;
			}
		}
		record_children_counter += 1; // The growth counter stops one year earlier! Indeed 3 measurements implies only 2 growing years!
	}
	
	// Prior on initial hidden state: This is a diffuse initialisation
	target += uniform_lpdf(latent_dbh_parents | 0.1/sd_dbh, 3000/sd_dbh); // Remember that the dbh is in mm and standardised
	
	// --- Observation model
	for (k in 1:n_inventories)
	{
		// Compare true (i.e., hidden or latent) parents with observed parents
		// Do not try to vectorise here! https://mc-stan.org/docs/2_29/stan-users-guide/vectorizing-mixtures.html
		for (i in start_nfi_parents[k]:end_nfi_parents[k])
			target += log_mix(proba[k], normal_lpdf(normalised_Yobs[parents_index[i]] | latent_dbh_parents[i], etaObs[k]),
				normal_lpdf(normalised_Yobs[parents_index[i]] | latent_dbh_parents[i], sigmaObs[k]));

		// Compare true (i.e., hidden or latent) children with observed children
		for (i in start_nfi_children[k]:end_nfi_children[k])
			target += log_mix(proba[k], normal_lpdf(normalised_Yobs[children_index[i]] | latent_dbh_children[i], etaObs[k]),
				normal_lpdf(normalised_Yobs[children_index[i]] | latent_dbh_children[i], sigmaObs[k]));
	}
}