I’ve been trying to fit the model below, with simulated data, (below that). The whole thing is a pretty simple system. You have a variable MS, which measures steady state protein levels, and a variable ribo - which should measure their synthesis, modulo some constant factor rTE.
If ribo was constant, this would be a simple ODE. However since it varies, things get complicated. rTE and the degredation constant for (‘deg’ or it’s log transform ‘ldeg’) are non identifiable, in cases where the ribo variable changes in such a way that it looks like a degredation curve.
Still, This model should I think, work pretty well when riboseq changes upwards. And hopefully some kind of shared distribution would allow those cases to inform others. Below is my attempt at this. It’s spitting out warnings about divergences, and if I scale up the number of fake genes i fit it to, it goes completely nuts and gives reports of what look like totally multimodal posteriors. It is also, even on small input datasets, biased downwards - it consistently estimates an rTE lower than the one I used to simulate the data.
I’m kind of out of my depth here so any help would be very much appreciated:
- Am I doing something obviously wrong in model specification? E.g. my gamma prior?
- How can I check that the parameters I use to simulate data work best in the model I"ve specified? Is there a way to evaluate the model at specific parameters?
- Why do rTE and ldeg look colinear in the pairs plot?
- Why am I getting this bimodality in tau - the standard deviation of the MS?
#Runs pretty acceptably
data {
int<lower=0> G; // genes
int<lower=0> T; // timepoints
int<lower=0> K; // replicates
vector<lower=0>[G] MS[T,K]; // mass spec data
vector<lower=0>[G] ribo[T]; // riboseq (synthesis) data
}
parameters {
vector[G] ms0logratio; // starting mass spec relative to rTE
vector<lower=-10,upper=0>[G] ldeg; //amount degraded each tp - log scale
real<lower=0> rTEmu; //
real<lower=0> rTEsig; //
real<lower=0> tau;
vector<lower=0>[G] rTE;
real<lower=0>ms0lr_sig;
real<lower=0>ms0lr_mu;
}
transformed parameters {
vector<lower=0>[G] deg;
vector[G] prot[T];
vector<lower=0>[G] MS0;
#defining our starting parameter MS0 in terms of it's ratio to the production
MS0 = rTE .* exp(ms0logratio);
#defining deg in terms of logdeg
deg = exp(ldeg);
prot[1,1:G] = MS0[1:G];
for(t in 2:T){
prot[t,1:G] = (ribo[t,1:G] .* rTE[1:G]);
for(ti in 1:(t-1)){
prot[t,1:G] += ((prot[ti,1:G]) .* exp(- ti * deg[1:G])) ;
}
}
}
model{
ms0lr_sig ~ gamma(1,1);// weak prior on the ms0lr variance
deg ~ beta(0.25,0.175); //weak prior on amount degraded between timepoints.
ms0logratio ~ normal(rep_vector(ms0lr_mu,G),rep_vector(ms0lr_sig,G)); # join distribution of the log ratios for m0
rTE ~ normal(rep_vector(rTEmu,G),rep_vector(rTEsig,G)); # join distribution of the rTE valuess
for(t in 1:T){//for each tp
for(k in 1:K){//for each replicate
MS[t,k,1:G] ~ normal(prot[t,1:G], tau ) ;//normally distributed MS signal
}
}
}
#function simulating ms given a degredation constant, a constant value for rTE, starting ms, the riboseq,
simulate_data <- function(ldeg,rTE,ribo,prot0,ms_sd=1,n_reps=K){
deg = exp(ldeg)
prot = rep(NA,tps)
prot[1] = prot0
for (i in 2:length(prot)){
prot[i] = rTE*ribo[i] + (prot[i-1]*(1-deg))
}
prot
MS=replicate(n_reps,{rnorm(n=seq_along(prot),mean=prot,sd=ms_sd)})
data_frame(ldeg=ldeg,rTE=rTE,ribo=list(ribo),prot0=prot0,ms_sd=ms_sd,MS=list(MS))
}
#some basic patterns for the synthesis change pattern
tps = 5
ribopatterns = list(
increasing=c(100,100,300,300,500),
stable=c(200,200,200,200,200),
decreasing=c(300,300,100,100,50)
)
#constants describing data shape
tps = 5
n_genes = 7
K = 3
#mostly increasing, a few stable/decreasing genes
ribopatinds <- sample(1:3,n_genes,rep=T,prob=c(.9,.05,.05))
ribopatinds <- c(1,1,1,1,1,2,3)
ribopatinds[1]<-1
message(capture.output(table(ribopatinds))%>%paste0(collapse='\n'))
#choose a pattern
ribo = ribopatterns[ribopatinds]%>%simplify2array()%>%t
ribo = ribo + rpois(length(ribo),10)
#simulaterTE values
rTEs = rnorm(n_genes,1000,10)
#simulate degredations based on a half life distribution with median 48, sd~30 in the log10 scale
k= - log(0.5) / (10^rnorm(n_genes,log10(48),log10(30)))
degs <- 1-exp(-k*36)
#simualte the ms0 starting values
ms0=ribo[,1]*exp(rnorm(n_genes,0,3))*rTEmu
#
simdata <- map_df(1:n_genes,~simulate_data(
log(degs[.]),
rTEs[.],
ribo[.,],
ms0[.],
ms_sd=1)
)
ms_array <- simdata$MS%>%simplify2array%>%aperm(c(1,2,3))
ribo_mat <- simdata$ribo%>%simplify2array()
stanfit <- rstan::stan(file='src/degmodel.stan',data=list(G=n_genes,T=tps,K=K,
MS=ms_array,ribo=ribo_mat),
control=list(adapt_delta=0.95,max_treedepth=15),
# init=function(z)
Pairs plot: