Effect of drug on gene expression using Bayes

Would I be correct in my understanding that the data consists of a single count observation per combination of person and gene?

In the interim, it strikes me that while there is the potential for correlation structure in your experiment that you might benefit from incorporating in your modelling, the specific notion that only some genes’ expression will be affected by the drug is best captured by either a mixture model or something like the horseshoe prior. @avehtari might advise on best use of the horseshoe (and whether it’s recommended over the mixture model below), but here’s how I’d do it with a mixture and considering only the post-test data (to highlight the mixture approach, I’ll follow-up with a pre/post model that includes some correlation structure):

data{
  int num_obs ;
  int obs_count[num_obs] ;
  int num_subj ;
  int subj_for_obs[num_obs] ;
  int num_genes ;
  int gene_for_obs[num_obs] ;
  int<lower=0,upper=1> pre_or_post[num_obs] ;  
  int<lower=0,upper=1> drug_or_placebo[num_obs] ;
}
parameters{
  //subject parameters
  real subj_mean ;
  real<lower=0> subj_sd ;
  vector[num_subj] subj ;
  //gene parameters:
  real affected_genes_mean ;
  real<lower=0> affected_genes_sd ;
  real<lower=0> null_genes_sd ;
  real<lower=0,upper=1> prop_genes_affected ;
  vector[num_genes] gene ;
}
model{
  subj ~ normal(subj_mean,subj_sd) ;
  prop_genes_affected ~ ... //put your prior on the proportion of genes affected here
  for(this_gene in 1:num_genes){ //have to loop for log_mix
    target += log_mix(
      prop_genes_affected
      , normal_lpdf(gene[this_gene] | affected_genes_mean,affected_genes_sd)
      , normal_lpdf(gene[this_gene] | 0,null_genes_sd)
    ) ;
  }
  obs_count ~ count_distribution( //insert whatever count dist is best for your data here
    subj[subj_for_obs] 
    + gene[gene_for_obs]
  ) ;
}

That’s the “centered parameterization” version, but it’s easy to add non-centered to either or both subj & gene.

3 Likes