Issues with joint model estimating variation

Deirdre_L · July 1, 2023, 2:54am

I am working on a joint model that includes a hierarchical linear model that partitions variation across multiple factors (species, and study, and residual variation). The species-level estimate of the traits are then used as predictors in the second part of the model. My aim is to develop a model that allows the traits to effect the cue, while also allowing us to estimate species-level variation in cues not explained by the trait.

But my model is struggling to estimate the study level variation (sigma_study) for my test data and I am not sure why. Specifically the model appears to struggle to differentiate the study sigma from the general variance.

It is unclear to me whether there is an issue with the specification of the model or how to improve it. Any insights or help would be much appreciated.

Below is the code for the Stan model:

data {
  int<lower = 1> n_spec; // number of random effect level
  // Trait model
  int<lower = 1> N; // Sample size 
  int<lower = 1, upper = n_spec> trait_species[N]; // id of random effect (species)
  int<lower = 1> n_study; // number of random effect levels (study) 
  int<lower = 1, upper = n_study> study[N]; // id of random effect (study)
  vector[N] yTraiti; // Observed trait
  real prior_mu_grand_mu; 
  real prior_mu_grand_sigma;
  real prior_sigma_sp_mu;
  real prior_sigma_sp_sigma;
  real prior_sigma_study_mu;
  real prior_sigma_study_sigma;
  real prior_sigma_traity_mu;
  real prior_sigma_traity_sigma;
  
  // Cue model
  int<lower = 1> Ncue; // Sample size for second part
  int<lower = 1, upper = n_spec> cue_species[Ncue]; // id of random effect (species)
  vector[Ncue] yRespi; // Outcome phenology
  vector[Ncue] cuei; // predictor forcing 
  real prior_muCueSp_mu;
  real prior_muCueSp_sigma;
  real prior_muRespSp_mu;
  real prior_muRespSp_sigma;
  real prior_sigmaCueSp_mu;
  real prior_sigmaCueSp_sigma;
  real prior_sigmaRespSp_mu;
  real prior_sigmaRespSp_sigma;
  real prior_betaTraitxCue_mu;
  real prior_betaTraitxCue_sigma;
  real prior_sigmaRespy_mu;
  real prior_sigmaRespy_sigma;
}

parameters{
  // Traits
  real mu_grand; // grand mean for trait value 
  vector[n_spec] muSp; // species offsets
  vector[n_study] muStudy; // study offsets
  real<lower = 0> sigma_traity; // sd general
  real<lower = 0> sigma_sp; // sd species
  real<lower = 0> sigma_study; // sd study
 
  real alphaCueSp[n_spec]; 
  real muCueSp; 
  real<lower = 0> sigmaCueSp;
  real alphaRespSp[n_spec];
  real muRespSp;
  real<lower = 0> sigmaRespSp; 
  real betaTraitxCue; 
  real<lower = 0> sigmaResp_y;
}

transformed parameters{
  
  vector[N] y_hat; 
  vector[n_spec] mu_grand_sp;
  
  real betaCueSp[n_spec];     //species level beta forcing 
 
  for(i in 1:n_spec){
    mu_grand_sp[i] = mu_grand + muSp[i];
  }
  for (i in 1:N){
    y_hat[i] = mu_grand + muSp[trait_species[i]] + muStudy[study[i]];
  }
  
  for (isp in 1:n_spec){
    betaCueSp[isp] = alphaCueSp[isp] + betaTraitxCue * (mu_grand_sp[isp]);
  }
}

model{
  // Traits
  //// likelihood
  yTraiti ~ normal(y_hat, sigma_traity);
  muSp ~ normal(0, sigma_sp);
  muStudy ~ normal(0, sigma_study);
  //// priors
  mu_grand ~ normal(prior_mu_grand_mu, prior_mu_grand_sigma);
  sigma_sp ~ normal(prior_sigma_sp_mu, prior_sigma_sp_sigma);
  sigma_study ~ normal(prior_sigma_study_mu, prior_sigma_study_sigma);
  sigma_traity ~ normal(prior_sigma_traity_mu, prior_sigma_traity_sigma);

  //// likelihood
  for (i in 1:Ncue){
    yRespi[i] ~ normal(alphaRespSp[cue_species[i]] + betaCueSp[cue_species[i]] * cuei[i], sigmaResp_y);
  }
  alphaRespSp ~ normal(muRespSp, sigmaRespSp);
  alphaCueSp ~ normal(muCueSp, sigmaCueSp); 

  //// priors
  muRespSp ~ normal(prior_muRespSp_mu, prior_muRespSp_sigma);
  sigmaRespSp ~ normal(prior_sigmaRespSp_mu, prior_sigmaRespSp_sigma);
  
  sigmaResp_y ~ normal(prior_sigmaRespy_mu, prior_sigmaRespy_sigma);

  muCueSp ~ normal(prior_muCueSp_mu,  prior_muCueSp_sigma);
  sigmaCueSp ~ normal(prior_sigmaCueSp_mu, prior_sigmaCueSp_sigma);

  betaTraitxCue ~ normal(prior_betaTraitxCue_mu, prior_betaTraitxCue_sigma);
 
}

We are using test data in which every study contains every species, using the below code:

# Part 1
Nrep <- 10 # rep per trait
Nstudy <- 20 # number of studies w/ traits 
Nspp <- 20 # number of species 

Ntrt <- Nspp * Nstudy * Nrep 

mu.grand <- 15 
sigma.species <- 5 
sigma.study <- 2
sigmaTrait_y <- 2

trt.dat <- data.frame(matrix(NA, Ntrt, 1))
names(trt.dat) <- c("rep")
trt.dat$rep <- c(1:Nrep)
trt.dat$study <- rep(rep(c(1:Nstudy), each = Nspp), each = Nrep)
trt.dat$species <- rep(rep(c(1:Nspp), times = Nstudy), each = Nrep)

alphaTraitSp <- rnorm(Nspp, 0, sigma.species)
trt.dat$alphaTraitSp <- rep(rep(alphaTraitSp, times = Nstudy), each = Nrep) 

muStudy <- rnorm(Nstudy, 0, sigma.study) 
trt.dat$muStudy <- rep(muStudy, each = Nspp) 

sigmaTraity <- 5 
trt.dat$sigmaTraity <- rnorm(Ntrt, 0, sigmaTraity)

for (i in 1:Ntrt){
  trt.dat$mu_grand_sp[i] <-  trt.dat$alphaTraitSp[i] +  mu.grand
}

for (i in 1:Ntrt){
  trt.dat$yTraiti[i] <-  trt.dat$alphaTraitSp[i] + trt.dat$muStudy[i] +  mu.grand + trt.dat$sigmaTraity[i]
}

#################################################################
# Part 2
n_spec <-Nspp
nRep <- 20
Ncue <- n_spec * nRep

cue.dat <- data.frame(matrix(NA, Ncue, 2))
names(cue.dat) <- c("rep","species")
cue.dat$rep <- c(1:Ncue)
cue.dat$species <- rep(c(1:n_spec), each = nRep)

cue.dat$cuei <- rnorm(Ncue, 1, 1)

sigmaRespSp <- 20
muRespSp <- 80
alphaRespSp <- rnorm(n_spec, muRespSp, sigmaRespSp)
cue.dat$alphaRespSp <- rep(alphaRespSp, each = nRep)

betaTraitxCue <- 0.3 #Cue effects

muCueSp <- -15
sigmaCueSp <- 4
alphaCueSp <- rnorm(n_spec, muCueSp, sigmaCueSp)
cue.dat$alphaCueSp <- rep(alphaCueSp, each = nRep)

sigmaResp_y <- 5
cue.dat$eResp <- rnorm(Ncue, 0, sigmaResp_y)

cueTrtDat <- merge(cue.dat, unique(trt.dat[,c("species","mu_grand_sp")]), by = "species")

for (i in 1:Ncue){
  cueTrtDat$betaCueSp[i] <-  cueTrtDat$alphaCueSp[i] + (betaTraitxCue * cueTrtDat$mu_grand_sp[i])
}
for (i in 1:Ncue){
  cueTrtDat$yMu[i] <-  cueTrtDat$alphaRespSp[i] +  cueTrtDat$betaCueSp[i] * cueTrtDat$cuei[i] 
}

cueTrtDat$yRespi <- cueTrtDat$yMu + cueTrtDat$eResp

sim.data <- list(yTraiti = trt.dat$yTraiti,
                 N = Ntrt,
                 n_spec = Nspp,
                 trait_species = as.numeric(as.factor(trt.dat$species)),
                 n_study = Nstudy,
                 study = as.numeric(as.factor(trt.dat$study )),
                 prior_mu_grand_mu = 20,
                 prior_mu_grand_sigma = 10,
                 prior_sigma_sp_mu = 4,
                 prior_sigma_sp_sigma = 5,
                 prior_sigma_study_mu = 2,
                 prior_sigma_study_sigma = 5,
                 prior_sigma_traity_mu = 3,
                 prior_sigma_traity_sigma = 5,
                 Ncue = nrow(cue.dat),
                 cue_species = as.numeric(as.factor(cueTrtDat$species )),
                 yRespi = cueTrtDat$yRespi,
                 cuei = cueTrtDat$cuei,
                 prior_muCueSp_mu = -15,
                 prior_muCueSp_sigma = 10, 
                 prior_muRespSp_mu = 40,
                 prior_muRespSp_sigma = 10,
                 prior_sigmaCueSp_mu = 5,
                 prior_sigmaCueSp_sigma = 5,
                 prior_sigmaRespSp_mu = 5,
                 prior_sigmaRespSp_sigma = 5, 
                 prior_betaTraitxCue_mu = 0,
                 prior_betaTraitxCue_sigma = 1,
                 prior_sigmaRespy_mu = 10,
                 prior_sigmaRespy_sigma = 5 
)

The pairs plot using this test data and model is as follows:

simulatedPairs3

Thanks!

lizzieinvancouver · July 4, 2023, 9:26am

Do you think there could be something funky in these lines? I see you want mu_grand_sp to use in later parts of the model but it looks like you could be giving the model more information about mu_grand and muSp than your test data includes …

I am not sure how to define parameters (e.g., mu_grand_sp) you want created solely to be used later and not included in the estimation of the parameters they’re made up of (e.g., mu_grand and muSp) but I feel like I have heard this option in intro Stan classes and hope someone else can help.

L

Deirdre_L · July 14, 2023, 10:58pm

@lizzieinvancouver thanks for the suggestion.

I am not sure if that is the issue. I tried running the model without the mu_grand_sp, and got the same issue using the below code instead:

transformed parameters{
    vector[N] y_hat;
    
    for (i in 1:N)
      y_hat[i] = mu_grand + muSp[species[i]] + muStudy[study[i]];
    }

The models produce estimates like the following, with especially poor estimates for sigma_study:

      Parameter param        esti
1      mu_grand  15.0  16.1610177
2      sigma_sp   5.0   5.0885989
3   sigma_study   2.0   0.4694376
4   sigmaResp_y   5.0   5.0000000
5       muCueSp -15.0 -14.7467721
6       muResSp  80.0  73.5317974
7    sigmaCuesp   4.0   3.7861210
8   sigmaRespSp  20.0  15.5854671
9   sigmaResp_y   5.0   5.2531993
10 betaTraitCue   0.3   0.3023376

The model does not seem able to return the test data values for the muStudy, but can for other parameters, like muRespSp:

lizzieinvancouver · July 17, 2023, 8:30am

@Deirdre_L Thanks! On a glance the Stan code looks good, but I think something might be wrong with your test data code. Just inspecting some of it you see that the study value is not constant across studies:

> trt.dat[1:100,]
    rep study species alphaTraitSp    muStudy sigmaTraity mu_grand_sp   yTraiti
1     1     1       1   -0.3662840 -0.4475684   6.2716654   14.633716 20.457813
2     2     1       1   -0.3662840 -0.4475684  -2.1875409   14.633716 11.998607
3     3     1       1   -0.3662840 -0.4475684  -8.4131014   14.633716  5.773046
4     4     1       1   -0.3662840 -0.4475684  -2.0532432   14.633716 12.132904
5     5     1       1   -0.3662840 -0.4475684   3.6280606   14.633716 17.814208
6     6     1       1   -0.3662840 -0.4475684   5.0288262   14.633716 19.214974
7     7     1       1   -0.3662840 -0.4475684  -4.7502807   14.633716  9.435867
8     8     1       1   -0.3662840 -0.4475684   0.7819488   14.633716 14.968096
9     9     1       1   -0.3662840 -0.4475684  -1.1076715   14.633716 13.078476
10   10     1       1   -0.3662840 -0.4475684  -0.3748877   14.633716 13.811260
11    1     1       2    0.5588979 -0.4475684  -3.4416627   15.558898 11.669667
12    2     1       2    0.5588979 -0.4475684   7.5966889   15.558898 22.708018
13    3     1       2    0.5588979 -0.4475684  -0.2218172   15.558898 14.889512
14    4     1       2    0.5588979 -0.4475684  -1.6187178   15.558898 13.492612
15    5     1       2    0.5588979 -0.4475684  -1.5606937   15.558898 13.550636
16    6     1       2    0.5588979 -0.4475684   3.9817567   15.558898 19.093086
17    7     1       2    0.5588979 -0.4475684  -4.7895378   15.558898 10.321792
18    8     1       2    0.5588979 -0.4475684 -11.1263806   15.558898  3.984949
19    9     1       2    0.5588979 -0.4475684   5.6209827   15.558898 20.732312
20   10     1       2    0.5588979 -0.4475684  -5.5183353   15.558898  9.592994
21    1     1       3    1.5095331  0.5555627   0.6129953   16.509533 17.678091
22    2     1       3    1.5095331  0.5555627  -6.3046517   16.509533 10.760444
23    3     1       3    1.5095331  0.5555627   4.9164946   16.509533 21.981590
24    4     1       3    1.5095331  0.5555627  -1.4197892   16.509533 15.645307
25    5     1       3    1.5095331  0.5555627  -1.7363043   16.509533 15.328792
26    6     1       3    1.5095331  0.5555627   3.6585174   16.509533 20.723613
27    7     1       3    1.5095331  0.5555627   9.0563550   16.509533 26.121451
28    8     1       3    1.5095331  0.5555627  -5.4503614   16.509533 11.614734
29    9     1       3    1.5095331  0.5555627   7.7744355   16.509533 24.839531
30   10     1       3    1.5095331  0.5555627  -7.7482102   16.509533  9.316886
31    1     1       4   -7.2267318  0.5555627  -2.4502182    7.773268  5.878613
32    2     1       4   -7.2267318  0.5555627  -5.1584692    7.773268  3.170362
33    3     1       4   -7.2267318  0.5555627   0.5121159    7.773268  8.840947
34    4     1       4   -7.2267318  0.5555627   3.0612902    7.773268 11.390121
35    5     1       4   -7.2267318  0.5555627   0.4636415    7.773268  8.792472
36    6     1       4   -7.2267318  0.5555627  11.8531050    7.773268 20.181936
37    7     1       4   -7.2267318  0.5555627  -3.9456552    7.773268  4.383176
38    8     1       4   -7.2267318  0.5555627   2.6287028    7.773268 10.957534
39    9     1       4   -7.2267318  0.5555627   0.8468259    7.773268  9.175657
40   10     1       4   -7.2267318  0.5555627  -1.2486952    7.773268  7.080136

Species looks good – each species appears to get their own value (for example, subset(trt.dat, species==1) gave all one alphaTraitSp value) but if I do the same for study (subset(trt.dat, study==1)) I get 10 different values. I think this line may be your problem, perhaps?

trt.dat$muStudy <- rep(muStudy, each = Nspp)

which contrasts with …
trt.dat$alphaTraitSp <- rep(rep(alphaTraitSp, times = Nstudy), each = Nrep)

L

Deirdre_L · July 18, 2023, 9:36pm

Thanks @lizzieinvancouver , I think that is the issue!

The model estimates look much better:

      Variable Parameter    Estimate         2.5       97.5
1     mu_grand      15.0  13.1826762  10.8605118 15.6679884
2     sigma_sp       5.0   4.9062756   3.5509274  6.8316180
3  sigma_study       2.0   1.9663274   1.4023942  2.8284024
4  sigmaResp_y       5.0   4.9497511   4.6061570  5.3291774
5      muCueSp     -15.0 -15.4105784 -22.4536695 -8.3219147
6      muResSp      80.0  73.4124964  64.5280508 81.9648668
7   sigmaCuesp       4.0   5.4873654   3.8633931  7.8094396
8  sigmaRespSp      20.0  21.2382161  17.2866958 25.9841407
9 betaTraitCue       0.3   0.3237353  -0.2031581  0.8470558

Topic		Replies	Views
Model struggles to systematically estimate simple three estimate model Modeling	2	291	August 18, 2023
Varying Variances Modeling	4	1817	November 27, 2017
Issues with hierarchical linear regression stan model Modeling rstan , fitting-issues	2	469	January 26, 2021
Joint Species Distribution Model Performance Modeling ecology	3	1112	December 15, 2020
Generalized linear latent variable model in Stan Modeling	3	3792	October 9, 2017

Issues with joint model estimating variation

Related topics