Hello,
I have a model that works well, but it takes a really long time to run. Given the number of parameters and the simplicity of the model, I would expect much faster execution.
I have 25 dimensions, 500 points, and 100 samples.
The log data has mean 6 and sd 2, so the original data has a big range. Would standardisation make sense in this case?
The formula is roughly:
S = 100
P = 500
y[p, s] ~ normal(a[s,1] * b[p,1] + … + a[s,25] * b[p,25]);
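Written out as a sum, a sketch of the shorthand above (the scale parameter \sigma is an assumption here, since the shorthand gives only the mean):

```latex
y_{p,s} \sim \mathrm{Normal}\!\left( \sum_{k=1}^{25} a_{s,k}\, b_{p,k},\; \sigma \right),
\qquad p = 1,\dots,500, \quad s = 1,\dots,100
```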
data {
  int G;                        // Number of marker genes
  int P;                        // Number of cell types
  int S;                        // Number of mix samples
  int R;                        // Number of covariates (e.g., treatments)
  int K;                        // Not used in the blocks below
  int group[S];                 // Array of covariates
  vector[G] y[S];               // Mix samples matrix
  vector[P] expr[G];            // Per-gene values on the log scale (they are added to log_pi)
}
parameters {
  simplex[P] pi[S];             // Matrix of cell type proportions
  real<lower=0> sigma;          // Error
  vector<lower=0>[P] alpha[R];  // Prior to pi
}
transformed parameters {
  vector[P] log_pi[S];
  vector[G] expr_conv[S];
  for (s in 1:S) log_pi[s] = log(pi[s]);
  for (s in 1:S) for (g in 1:G) {
    expr_conv[s, g] = log_sum_exp(expr[g] + log_pi[s]);
  }
}
model {
  sigma ~ normal(0, 0.05);  // Prior to sigma
  // Note: this loop adds the gamma prior once per sample, not once per group
  for (s in 1:S) alpha[group[s]] ~ gamma(1.05, 0.05);
  for (s in 1:S) pi[s] ~ dirichlet(alpha[group[s]]);  // Proportions
  for (s in 1:S) y[s] ~ student_t(2, expr_conv[s], sigma);  // Calculating probability of mix
}
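One possible speed-up, as a minimal sketch rather than a tested fix: since log_sum_exp(expr[g] + log(pi[s])) equals log(dot_product(exp(expr[g]), pi[s])), the double loop over s and g can be written as one matrix-vector product per sample, with exp(expr) computed once in transformed data. The name exp_expr is hypothetical, the unused K is left out, and the sketch keeps the array syntax of the program above. It assumes exp(expr) stays in a range where leaving the log scale does not overflow or underflow. Whether it is actually faster here would need to be measured.

```stan
data {
  int G;                 // Number of marker genes
  int P;                 // Number of cell types
  int S;                 // Number of mix samples
  int R;                 // Number of covariates (e.g., treatments)
  int group[S];          // Array of covariates
  vector[G] y[S];        // Mix samples (logged outside Stan)
  vector[P] expr[G];     // Same log-scale input as in the program above
}
transformed data {
  // Hypothetical helper: exp(expr) as a G x P matrix, computed once
  matrix[G, P] exp_expr;
  for (g in 1:G) exp_expr[g] = exp(expr[g])';
}
parameters {
  simplex[P] pi[S];
  real<lower=0> sigma;
  vector<lower=0>[P] alpha[R];
}
model {
  sigma ~ normal(0, 0.05);
  for (s in 1:S) alpha[group[s]] ~ gamma(1.05, 0.05);
  for (s in 1:S) {
    pi[s] ~ dirichlet(alpha[group[s]]);
    // log(exp_expr * pi[s]) matches the original expr_conv[s] element by element
    y[s] ~ student_t(2, log(exp_expr * pi[s]), sigma);
  }
}
```

In this form log_pi and expr_conv are no longer transformed parameters, so they are not written to the output for every draw.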
Any suggestions? (Some simple regression tools in R take seconds while my model takes several hours, and I would like to stay competitive.) Thanks!
P.S. I am taking the log of y outside Stan. Should I use lognormal inside Stan instead?
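For reference on the P.S., the standard relation between the two densities, as a sketch for the normal case only (the model above uses student_t, to which this does not directly apply; y^{raw} is a hypothetical name for the unlogged data):

```latex
\mathrm{LogNormal}\!\left(y^{\mathrm{raw}} \mid \mu, \sigma\right)
  = \frac{1}{y^{\mathrm{raw}}}\,
    \mathrm{Normal}\!\left(\log y^{\mathrm{raw}} \mid \mu, \sigma\right)
```

Because y^{raw} is data, the factor 1 / y^{raw} does not depend on any parameter, so under this relation taking the log outside Stan and using normal gives the same posterior as using lognormal on the raw values.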