Hello all,
I am very interested in using this model for some enquiries:
The model is very simple, we have f, vector or uniform distributed frequencies between 0 and 0.5, fst a scalar parameter relating populations and beta, the genetic effects for each f.
The authors state that for 10000 snps their model needs one day to run, in my machine its less than an hour, still I think their original code can be improved.
In my case I plan to use it for hundreds of thousands of betas, thus any performance gain would be welcomed. I would be grateful if you have any thoughts on how to improve performance.
This is the original authors code,
data {
int<lower=0> J; // number of snps
real beta[J]; // effect sizes
real<lower=0> f[J]; // minor allele frequences
real<lower=0> Fst; // Known value of Fst
}
parameters {
real S;
real<lower=0> sigma_b;
real<lower=0,upper=0.5> p[J]; // true minor allele frequences
}
transformed parameters {
}
model {
S ~ uniform(-2,2);
sigma_b ~ uniform(0,2);
for(j in 1:J){
f[j] ~ beta( ((1-Fst)/Fst)*p[j], ((1-Fst)/Fst)*(1-p[j]) );
beta[j] ~ normal(0,pow(p[j]*(1-p[j]),S/2)*sigma_b);
}
}
which I have vectorized as follows
data {
int<lower=0> J; // number of snps
vector[J] beta; // effect sizes
vector<lower=0,upper=1>[J] f; // minor allele frequences
real<lower=0> Fst; // Known value of Fst
}
parameters {
real<lower=-2,upper=2> S;
real<lower=0, upper=2> sigma_b;
vector<lower=0,upper=0.5>[J] p; // true minor allele frequences
}
transformed parameters {
vector[J] snpsd;
for(j in 1:J)
snpsd[j] = pow(p[j]*(1-p[j]),S/2)*sigma_b;
}
model {
// vectorised likelihood
f ~ beta( ((1-Fst)/Fst)*p, ((1-Fst)/Fst)*(1-p ));
beta ~ normal(0,snpsd);
}
Which really does not make much difference performance-wise from what I tested.