Model consistently under estimates parameters


#1

I’ve been trying to fit the model below, with simulated data, (below that). The whole thing is a pretty simple system. You have a variable MS, which measures steady state protein levels, and a variable ribo - which should measure their synthesis, modulo some constant factor rTE.

If ribo was constant, this would be a simple ODE. However since it varies, things get complicated. rTE and the degredation constant for (‘deg’ or it’s log transform ‘ldeg’) are non identifiable, in cases where the ribo variable changes in such a way that it looks like a degredation curve.

Still, This model should I think, work pretty well when riboseq changes upwards. And hopefully some kind of shared distribution would allow those cases to inform others. Below is my attempt at this. It’s spitting out warnings about divergences, and if I scale up the number of fake genes i fit it to, it goes completely nuts and gives reports of what look like totally multimodal posteriors. It is also, even on small input datasets, biased downwards - it consistently estimates an rTE lower than the one I used to simulate the data.

I’m kind of out of my depth here so any help would be very much appreciated:

  1. Am I doing something obviously wrong in model specification? E.g. my gamma prior?
  2. How can I check that the parameters I use to simulate data work best in the model I"ve specified? Is there a way to evaluate the model at specific parameters?
  3. Why do rTE and ldeg look colinear in the pairs plot?
  4. Why am I getting this bimodality in tau - the standard deviation of the MS?
#Runs pretty acceptably
data {
  int<lower=0> G;          //  genes
  int<lower=0> T;          //  timepoints
  int<lower=0> K;          //   replicates
  vector<lower=0>[G] MS[T,K];  // mass spec data
  vector<lower=0>[G] ribo[T]; // riboseq (synthesis) data

}
parameters {
  vector[G] ms0logratio;  // starting mass spec relative to rTE

  vector<lower=-10,upper=0>[G] ldeg;  //amount degraded each tp - log scale
  
  
  real<lower=0> rTEmu; //
  real<lower=0> rTEsig; //
  
  real<lower=0> tau;
  vector<lower=0>[G] rTE;
  
  
  real<lower=0>ms0lr_sig;
  real<lower=0>ms0lr_mu;
}

transformed parameters {

  vector<lower=0>[G] deg; 
  vector[G] prot[T];
  vector<lower=0>[G] MS0;
 
  #defining our starting parameter MS0 in terms of it's ratio to the production
  MS0 = rTE .* exp(ms0logratio);
  
  #defining deg in terms of logdeg
  deg = exp(ldeg);
  
  prot[1,1:G] = MS0[1:G];
  
  for(t in 2:T){
    prot[t,1:G] = (ribo[t,1:G] .* rTE[1:G]);
    for(ti in 1:(t-1)){
      prot[t,1:G] += ((prot[ti,1:G]) .* exp(- ti * deg[1:G])) ;
    }
  }  
}

model{
  
  ms0lr_sig ~ gamma(1,1);// weak prior on the ms0lr variance
  
  deg ~ beta(0.25,0.175); //weak prior on amount degraded between timepoints.
  
  ms0logratio ~ normal(rep_vector(ms0lr_mu,G),rep_vector(ms0lr_sig,G)); # join distribution of the log ratios for m0
  rTE ~ normal(rep_vector(rTEmu,G),rep_vector(rTEsig,G)); # join distribution of the rTE valuess
  
  for(t in 1:T){//for each tp
    for(k in 1:K){//for each replicate
      MS[t,k,1:G] ~ normal(prot[t,1:G], tau ) ;//normally distributed MS signal
    }
  }
  
  
}



#function simulating ms given a degredation constant, a constant value for rTE, starting ms, the riboseq,
simulate_data <- function(ldeg,rTE,ribo,prot0,ms_sd=1,n_reps=K){
  deg = exp(ldeg)
  prot = rep(NA,tps)
  prot[1] = prot0
  for (i in 2:length(prot)){

    prot[i] = rTE*ribo[i] + (prot[i-1]*(1-deg)) 
  }
  prot

  MS=replicate(n_reps,{rnorm(n=seq_along(prot),mean=prot,sd=ms_sd)})

  data_frame(ldeg=ldeg,rTE=rTE,ribo=list(ribo),prot0=prot0,ms_sd=ms_sd,MS=list(MS))
}

#some basic patterns for the synthesis change pattern
tps = 5
ribopatterns = list(
  increasing=c(100,100,300,300,500),
  stable=c(200,200,200,200,200),
  decreasing=c(300,300,100,100,50)
)
#constants describing data shape
tps = 5
n_genes = 7
K = 3
#mostly increasing, a few stable/decreasing genes
ribopatinds <- sample(1:3,n_genes,rep=T,prob=c(.9,.05,.05))
ribopatinds <- c(1,1,1,1,1,2,3)

ribopatinds[1]<-1
message(capture.output(table(ribopatinds))%>%paste0(collapse='\n'))
#choose a pattern
ribo = ribopatterns[ribopatinds]%>%simplify2array()%>%t
ribo = ribo + rpois(length(ribo),10)
#simulaterTE values
rTEs = rnorm(n_genes,1000,10)
#simulate degredations based on a half life distribution with median 48, sd~30 in the log10 scale 
k= - log(0.5) / (10^rnorm(n_genes,log10(48),log10(30)))
degs <- 1-exp(-k*36)
#simualte the ms0 starting values
ms0=ribo[,1]*exp(rnorm(n_genes,0,3))*rTEmu
#
simdata <- map_df(1:n_genes,~simulate_data(
  log(degs[.]),
  rTEs[.],
  ribo[.,],
  ms0[.],
  ms_sd=1)
)

ms_array <- simdata$MS%>%simplify2array%>%aperm(c(1,2,3))
ribo_mat <- simdata$ribo%>%simplify2array()

stanfit <- rstan::stan(file='src/degmodel.stan',data=list(G=n_genes,T=tps,K=K,
                        MS=ms_array,ribo=ribo_mat),
                       control=list(adapt_delta=0.95,max_treedepth=15),
                       # init=function(z) 

Pairs plot: