Hi everyone,
We are running into some issues with initializing our negative binomial distribution (log alternative parameterization). We are trying to model the counts of transcripts (in RNASeq, etc.) using a negative binomial distribution, where the LFC follows a normal distribution and the variance follows a Cauchy distribution.
However, we get the “Initialization between (-2, 2) failed after 100 attempts” error message:
“Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.”
We tried various things, including limiting our Cauchy distribution:
reciprocal_phi ~ cauchy(0., 5.); -->
reciprocal_phi ~ cauchy(0., 1.);
and also specifying an init_r of 0.1, but we are still getting the error message. There’s probably something pretty fundamental we are missing, would appreciate any suggestions.
Our transcripts count data is a matrix of 5 genomic regions X 10 samples (as an initial trial run). None of the rows or columns sum to zero. Another strange thing is that I am also running the full counts matrix of 20k genomic regions X 10 samples and it seemed to get over the initialization…?
Thank you! Posting the stan code and the python code below:
Stan:
data {
int<lower=0> N; // number of samples
int<lower=0> D; // number of dimensions
int<lower=0> p; // number of covariates
real depth[N]; // sequencing depths of microbes
matrix[N, p] x; // covariate matrix
int y[N, D]; // observed microbe abundances
}
parameters {
// parameters required for linear regression on the species means
matrix[p, D-1] beta;
real reciprocal_phi;
}
transformed parameters {
matrix[N, D-1] lam;
matrix[N, D] lam_clr;
matrix[N, D] prob;
vector[N] z;
real phi;
phi = 1. / reciprocal_phi;
z = to_vector(rep_array(0, N));
lam = x * beta;
lam_clr = append_col(z, lam);
}
model {
// setting priors ...
reciprocal_phi ~ cauchy(0., 1.);
for (i in 1:D-1){
for (j in 1:p){
beta[j, i] ~ normal(0., 1.); // uninformed prior
}
}
// generating counts
for (n in 1:N){
for (i in 1:D){
target += neg_binomial_2_log_lpmf(y[n, i] | depth[n] + lam_clr[n, i], phi);
}
}
}
Python code:
# Load the data
reads_array = pd.read_csv('/mnt/ceph/users/dli/atacseq/pca_clustering/counts_universe_gut/subset_counts.csv', sep='\t', index_col=0)
metadata = pd.read_csv('/mnt/ceph/users/dli/atacseq/pca_clustering/counts_universe_gut/metadata.csv', sep=' ', index_col=0)
formula = 'C(HistoStandard) + C(Batch) + C(Diagnosis)'
X = dmatrix(formula, metadata, return_type='dataframe')
Y = reads_array.T
data = {
'N' : X.shape[0],
'D' : reads_array.shape[0],
'p' : X.shape[1],
'depth' : np.log(reads_array.sum(axis=0).values),
'x' : X.values,
'y' : Y.values.astype(np.int64)
}
sm = pystan.StanModel(model_code=stan_model)
fit = sm.sampling(data=data, iter=1000, chains=4, init_r=.1, warmup=500)
res = fit.extract(permuted=True)