Unable to retrieve parameters from a dynamic model

Hello,

TL;DR

I apologize if these are too many questions for a single post – I’ll be grateful if you can help me to solve question 1, and I could keep the rest of the questions for future posts if you think that would be more appropriate. My questions, in order of decreasing importance, are:

  1. Why am I unable to retrieve the parameters of this model? Model (soy_dev.stan) plus fake data and sampling statement (soy_dev_fake_data.R) are attached
  2. Is my approach to modifying mu in the model block of soy_dev.stan correct? Or would it be better instead to create a custom function in the functions block of soy_dev.stan to later determine mu?
  3. In this example, we are simulating fake data for a single soybean genotype, but do you think that this model can be scaled into a hierarchical model, where parameters dOptFiFl, dOptSdPm, dOptSdPm, Pcrit and Psen can vary between several soybean genotypes?
  4. How would you call this model? From my ignorance this is a “dynamic model” but my definition may be too broad!

Longer story

I am interested in modeling how a soybean crop develops. Development in this species depends on temperature and day length (i.e. photoperiod) – it is faster at an optimum temperature (around 30 °C) and when day length shortens. Inspired by Ben Bolker’s Ecological Models and Data in R, Chapter 11, Dynamic models, I tried to write a dynamic model with a daily time step. For this, I coded developmental stages, which are discrete events, into a continuous scale (called R) as follows:

  • R = 0: plant emergence
  • R = 1: end of juvenile phase
  • R = 2: flowering initiation
  • R = 3: flowering
  • R = 4: start of seed growth
  • R = 5: end of seed growth (physiological maturity)

Temperature and photoperiod are observed daily (weather.csv).

Under ideal temperature and photoperiod conditions, development progreses at a rate (with units 1/day) equal to:

  • 1/dOptEmJuv: between emergence (R = 0) and the end of the juvenile stage (R = 1)
  • 1/dOptJuvFi: between the end of the juvenile phase (R = 1) and flowering induction (R = 2)
  • 1/dOptFiFl: between flowering induction (R = 2) and flowering (R = 3)
  • 1/dOptFlSd: between flowering (R = 3) and the start of seed growth (R = 4)
  • 1/dOptSdPm: between the start (R = 4) and the end of seed growth (R = 5)

Under non-optimal temperature and photoperiod conditions, these developmental rates are penalized by temperature (a, in the following figure) and photoperiod multipliers (b) that look like this:

The parameters of the equation describing the temperature multiplier (i.e. Tmin, Topt and Tmax) don’t change much among soybean genotypes, so I fixed them as constants (see lines 24:32 in soy_dev_fake_data.R, and lines 3:19 in soy_dev.stan).

In turn, the parameters describing the photoperiod multiplier can vary between soybeans, and I am therefore interested in their estimation. There’s a threshold photoperiod (Pcrit), below which developmental rate is maximum (multiplier = 1), and a photoperiod sensitivity (Psen). For photoperiods above Pcrit, the developmental rate is reduced at a rate equal to Psen.

Parameters dOptEmJuv and dOptJuvFi are assumed constant and are thus fixed as 3.7 days (transformed data block in soy_dev.stan).

To test the model I generated some fake data, starting from the following parameters

pars <- c(Topt      = 30,
          Pcrit     = 13,
          Psen      = 0.29,
          dOptEmJuv = 3.7,
          dOptJuvFi = 3.7,
          dOptFiFl  = 16,
          dOptFlSd  = 17,
          dOptSdPm  = 39,
          sigma     = 0.1)

and simulating for three emergence dates. Simulations look like the following figure:

In summary, I am interested in estimating parameters dOptFiFl, dOptSdPm, dOptSdPm, Pcrit and Psen.

I am not proficient coding Stan scripts so I used brms::make_stancode() to make an initial Stan script (soy_dev_to_be_edited.stan) which I later modified (soy_dev.stan). Comparing both files, you’ll notice that

  • I modified how mu is estimated in the model block (lines 87:104)
  • I added a one-dimensional array called time, so that I can define when mu = 0 at time = 1 (line 50)

There are no warnings after fitting this model, but I am not able to retrieve the original parameter values (e.g. b_dOptSdPm), whereas the posterior distributions of other parameters resemble that of their priors (e.g. b_dOptFiFl, b_dOptFlSd):

Inference for Stan model: soy_dev.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                 mean se_mean    sd    2.5%     25%     50%     75%   97.5%
b_dOptFiFl[1]   14.95    0.06  5.90    3.51   11.07   14.85   18.87   26.53
b_dOptFlSd[1]   15.97    0.07  7.00    2.52   11.09   15.96   20.86   29.47
b_dOptSdPm[1]   13.13    0.01  0.45   12.19   12.88   13.18   13.42   13.91
b_Pcrit[1]      11.09    0.01  0.38   10.28   10.89   11.12   11.34   11.71
b_Psen[1]        0.16    0.00  0.02    0.13    0.15    0.16    0.17    0.20
sigma            0.94    0.00  0.03    0.89    0.92    0.94    0.97    1.01
lp__          -921.31    0.46 15.61 -952.03 -931.78 -921.46 -910.43 -891.56
              n_eff Rhat
b_dOptFiFl[1]  9135    1
b_dOptFlSd[1]  8718    1
b_dOptSdPm[1]  2254    1
b_Pcrit[1]     2017    1
b_Psen[1]      2872    1
sigma          3317    1
lp__           1134    1

Samples were drawn using NUTS(diag_e) at Tue Dec  3 15:56:10 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

My questions are:

  1. Why am I unable to retrieve the parameters of this model? Model (soy_dev.stan) plus fake data and sampling statement (soy_dev_fake_data.R) are attached
  2. Is my approach to modifying mu in the model block of soy_dev.stan correct? Or would it be better instead to create a custom function in the functions block of soy_dev.stan to later determine mu?
  3. In this example, we are simulating fake data for a single soybean genotype, but do you think that this model can be scaled into a hierarchical model, where parameters dOptFiFl, dOptSdPm, dOptSdPm, Pcrit and Psen can vary between several soybean genotypes?
  4. How would you call this model? From my ignorance this is a “dynamic model” but my definition may be too broad!

Thanks in advance.

soy_dev.stan (4.0 KB) soy_dev_fake_data.R (5.4 KB) soy_dev_to_be_edited.stan (3.3 KB) weather.csv (100.3 KB)

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  grid      methods  
[8] base     

other attached packages:
 [1] brms_2.10.0        Rcpp_1.0.3         rstan_2.19.2       StanHeaders_2.19.0
 [5] forcats_0.4.0      stringr_1.4.0      dplyr_0.8.3        purrr_0.3.3       
 [9] readr_1.3.1        tidyr_1.0.0        tibble_2.1.3       tidyverse_1.3.0   
[13] ggplot2_3.2.1     

loaded via a namespace (and not attached):
 [1] nlme_3.1-140         matrixStats_0.55.0   fs_1.3.1            
 [4] xts_0.11-2           lubridate_1.7.4      threejs_0.3.1       
 [7] httr_1.4.1           tools_3.6.1          backports_1.1.5     
[10] R6_2.4.1             DT_0.10              DBI_1.0.0           
[13] lazyeval_0.2.2       colorspace_1.4-1     withr_2.1.2         
[16] tidyselect_0.2.5     gridExtra_2.3        prettyunits_1.0.2   
[19] processx_3.4.1       Brobdingnag_1.2-6    compiler_3.6.1      
[22] cli_1.1.0            rvest_0.3.5          shinyjs_1.0         
[25] xml2_1.2.2           labeling_0.3         colourpicker_1.0    
[28] scales_1.1.0         dygraphs_1.1.1.6     ggridges_0.5.1      
[31] callr_3.3.2          digest_0.6.23        base64enc_0.1-3     
[34] pkgconfig_2.0.3      htmltools_0.4.0      dbplyr_1.4.2        
[37] fastmap_1.0.1        htmlwidgets_1.5.1    rlang_0.4.2         
[40] readxl_1.3.1         rstudioapi_0.10      shiny_1.4.0         
[43] farver_2.0.1         generics_0.0.2       zoo_1.8-6           
[46] jsonlite_1.6         gtools_3.8.1         crosstalk_1.0.0     
[49] inline_0.3.15        magrittr_1.5         loo_2.1.0           
[52] bayesplot_1.7.1      Matrix_1.2-17        munsell_0.5.0       
[55] abind_1.4-5          lifecycle_0.1.0      stringi_1.4.3       
[58] pkgbuild_1.0.6       plyr_1.8.4           parallel_3.6.1      
[61] promises_1.1.0       crayon_1.3.4         miniUI_0.1.1.1      
[64] lattice_0.20-38      haven_2.2.0          hms_0.5.2           
[67] zeallot_0.1.0        ps_1.3.0             pillar_1.4.2        
[70] igraph_1.2.4.2       markdown_1.1         shinystan_2.5.0     
[73] codetools_0.2-16     reshape2_1.4.3       stats4_3.6.1        
[76] rstantools_2.0.0     reprex_0.3.0         glue_1.3.1          
[79] modelr_0.1.5         vctrs_0.2.0          httpuv_1.5.2        
[82] cellranger_1.1.0     gtable_0.3.0         assertthat_0.2.1    
[85] mime_0.7             xtable_1.8-4         broom_0.5.2         
[88] coda_0.19-3          later_1.0.0          rsconnect_0.8.15    
[91] shinythemes_1.1.2    bridgesampling_0.7-2
1 Like

Potential answers to this question will widen our ability to tackle agronomic questions through modelling physiological processes using Stan, so I will be following this thread up close!

1 Like

I am sorry that neither me nor anyone else responded to your post yet. I currently don’t have time myself to dive into this in all detail. Basically, what you may be looking for is a statistician proficient in Stan with whom you can collaborate on this project who has then more incentives to invest considerable amounts of time to help you with the Stan modelling.

2 Likes

Hi Paul,

No problem at all. And many thanks for your response. We would be very interested in joining a statistician proficient in Stan that could mentor us into how to develop hierarchical discrete-time dynamic models. Time-frame would not be an issue since we are certain that this is the way forward in our present and future projects. It would be great if we can share collaboration with someone in the Forum who can enlighten us and perhaps share coauthorship with us in a paper for a plant biology/agricultural audience.

I am wondering if stripping back the model to something simple like a non-linear curve with no inflection point would be a better starting point? That way you could get a handle on the Stan and recovering your parameters from a simpler case. I did this with another group of stake holders. They have similar data (but for fish) so we started with the simple curve and moved into a Von Bertalanffy Growth Function model. With the idea of moving the model into a hierarchical model with photo-period and what not added in.

Thanks @Ara_Winter! I will try your suggestion and let you know soon. But I must recognise that I didn’t expect an issue with the nonlinear formulae. I thought my way of setting mu to zero when time = 1 might be the weakest part of my code. Do you agree with the way I specified it (line 92 of soy_dev.stan)? I’ll keep working on specifying the photoperiod multiplier as a (reversed) Von Bertalanffy growth curve.

@Alan_Severini let me get my two non-linear examples up on github for you to look at. It’s worth noting that I did the bare skeleton on these and then let folks play them. If I can get some time I can dig deeper into your model. I downloaded it and the sim data file but am currently swamped writing up a model for work ;) I can get to it this weekend but hopefully someone else can help before that.

EDIT:
After you have the Von B up and running maybe code up a seasonal Von B but swap in the photo-period for the seasonality? Spitballing here I haven’t really thought that through yet. It might be a terrible idea.

1 Like

Sorry about how messy the models are!
Messy models

Messy models look good! Thanks. Yes, I will code the Von B with photoperiod in the abscissa. I have one less parameter there since the asymptote equals 1 ;)

1 Like

Just got this up and running. So your prior choice is dictating your parameters. When I set
target += normal_lpdf(b_dOptSdPm | 40, 1.5);

Then I can recover the simulated value. However this is well less than ideal. :) We need to dive in the model code to see what’s driving this.

Thanks! I’ve just run the model with the von Bertalanffy growth model replacing the original phoFun function…

  real phoFunVB(real pho, real Pcrit, real Psen) {
    real x;
    if (pho < Pcrit)
      x = 1 - exp(Psen * (pho - Pcrit)); 
    else
      x = 0;
    return x;
  }

and the same issue appeared: b_dOptSdPm is much smaller than expected:

Inference for Stan model: soy_dev.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.
Inference for Stan model: soy_dev.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.
                 mean se_mean    sd    2.5%     25%     50%     75%   97.5%
b_dOptFiFl[1]   15.01    0.06  6.04    3.22   10.85   15.04   19.18   26.72
b_dOptFlSd[1]   15.89    0.07  7.24    2.00   10.91   15.94   20.87   29.76
b_dOptSdPm[1]   13.11    0.01  0.46   12.17   12.81   13.11   13.41   14.02
b_Pcrit[1]      15.79    0.00  0.20   15.45   15.65   15.78   15.91   16.22
b_Psen[1]        0.55    0.00  0.06    0.44    0.51    0.55    0.59    0.67
sigma            0.99    0.00  0.03    0.93    0.97    0.99    1.01    1.05
lp__          -956.12    0.51 16.01 -988.04 -966.32 -955.79 -945.19 -925.62
              n_eff Rhat
b_dOptFiFl[1]  8957    1
b_dOptFlSd[1]  9660    1
b_dOptSdPm[1]  3274    1
b_Pcrit[1]     3197    1
b_Psen[1]      4629    1
sigma          2696    1
lp__            978    1

Looking forward for you to check the model, @Ara_Winter. Perhaps the fact that sigma and energy__ correlate with each other indicates that the model needs to be reparameterized?

soy_dev_fake_data_VB.R (5.8 KB) soy_dev.stan (4.2 KB) weather.csv (100.3 KB)

1 Like

@Alan_Severini Oh nice find! Yeah it looks like trying to reparameterize the model would be a good idea.

1 Like

Also looking at your code all the vector[N]'s, vector[N] mu, Yl[Jmi] = Ymi, mu[1] = 0, and for loop feel like they should be in the transformed parameters block. But I am still reading up on this type of model so maybe it’s fine?

I was also messing around with the mu in the loop so that might mess things up for ya. Sorry forgot to change that back to what you had.

soy_dev_winter_mod.stan (4.0 KB)

No problem @Ara_Winter – I saw the change in mu. Also, there was an update in brms and C_1 and C_2 are now C[, 1] and C[, 2], respectively. I changed that in the attached stan code, but am still unable to retrieve b_dOptSdPm. I also narrowed the prior of b_Pcrit, since we know from plant ecophysiology that it should be around 11 to 14 h for soybean:
target += normal_lpdf(b_Pcrit | 13, 1);

And I replaced mu[1] = 0 by mu[1] = init, where init is

real<lower=0> init; // initial value of mu

with prior

target += normal_lpdf(init | 0, 0.01);

but still no luck:

Inference for Stan model: soy_dev_winter_mod.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                 mean se_mean    sd    2.5%     25%     50%     75%   97.5%
b_dOptFiFl[1]   15.18    0.07  5.98    3.54   11.20   15.12   19.14   26.94
b_dOptFlSd[1]   16.12    0.10  7.02    2.27   11.39   16.11   20.93   29.58
b_dOptSdPm[1]   13.37    0.01  0.38   12.61   13.12   13.38   13.63   14.10
b_Pcrit[1]      11.30    0.01  0.31   10.69   11.10   11.31   11.51   11.89
b_Psen[1]        0.17    0.00  0.02    0.14    0.16    0.17    0.18    0.21
init             0.02    0.00  0.01    0.00    0.01    0.02    0.02    0.04
sigma            0.94    0.00  0.03    0.88    0.92    0.94    0.96    1.00
lp__          -919.79    0.46 16.12 -952.37 -930.30 -919.42 -908.49 -889.17
              n_eff Rhat
b_dOptFiFl[1]  6670    1
b_dOptFlSd[1]  5355    1
b_dOptSdPm[1]  2050    1
b_Pcrit[1]     1668    1
b_Psen[1]      1752    1
init           3546    1
sigma          2537    1
lp__           1236    1

Samples were drawn using NUTS(diag_e) at Sun Jan 12 20:42:03 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

soy_dev_winter_mod.stan (4.2 KB)

1 Like

Ok. Hmm I’ll look at the new model. So state space models are in this same category of model types. Those I am familiar with. So let me compare a few I have coded up and see we can figure out something from there.

1 Like

Thanks! I know almost nothing about state space models, but I presume that in my case the process error would be nil? Observation error would be the only source of error – I guess. Good luck with that! I’ll be looking forward to your progress.

It looks like there is a new variable in your model now? Kc? Stan throws an error about it when I try and run the latest soy_dev_winter_mod.stan .

EDIT:
I went back to the earlier model without the Kc and added in the update to the prior. I threw a print statement into the mu loop (this will eventually crash R btw). I noticed after the first else if statement mu was returning nan’s at some point.

I think that Kc is generated by brms::make_stancode(). I’m using brms 2.11.0. Which version is yours?

Let me check!

1 Like

Can you tell me a bit more about how you picked your priors for the model?