# Using a Poisson likelihood with spike counts and unequal bin sizes

I’m comparing the firing rates of a population of neurons (i.e., cells) in the brain under two conditions (A and B). For each neuron, we have four data points:

```
- sample_time_A
- spike_count_A
- sample_time_B
- spike_count_B
```

For each neuron in each condition, we counted the number of events (`spike_count`) occurring within a fixed interval (`sample_time`, in seconds). We are interested in determining whether the rate (events per second) differs between the two conditions across the population. The distributions of the rates (`spike_count_A/sample_time_A` in blue, `spike_count_B/sample_time_B` in a second color) are shown below:

*(figure omitted: overlaid histograms of `rate_A` and `rate_B`)*

Now, this seems like a very straightforward paired t-test or Wilcoxon signed-rank test to see if the means of `A` and `B` are different. I can just compute `rate_A` and `rate_B` (i.e., `spike_count/sample_time`) and use a paired t-test or Wilcoxon signed-rank test. Both return significant p-values (p ≤ 0.01), as in the sketch below.
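
For reference, a minimal sketch of those two frequentist tests in Python with SciPy; the data arrays here are hypothetical stand-ins for the real recordings.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the real data: one entry per neuron.
sample_time_A = np.array([10.0, 12.0, 9.5, 11.0])
spike_count_A = np.array([42, 55, 31, 47])
sample_time_B = np.array([10.0, 11.5, 10.0, 12.0])
spike_count_B = np.array([58, 70, 45, 66])

# Convert counts to rates (events per second).
rate_A = spike_count_A / sample_time_A
rate_B = spike_count_B / sample_time_B

# Paired t-test on the per-neuron rates.
t_stat, t_p = stats.ttest_rel(rate_A, rate_B)

# Wilcoxon signed-rank test on the paired differences.
w_stat, w_p = stats.wilcoxon(rate_A, rate_B)

print(f"paired t-test: p = {t_p:.4f}")
print(f"Wilcoxon signed-rank: p = {w_p:.4f}")
```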

However, I thought it would be fun to test my rudimentary knowledge of Bayesian statistics. Since `rate_A` and `rate_B` can be modeled with gamma distributions and the observed variable (`spike_count`) can be modeled as Poisson, why not try it out? However, I am getting a result that tells me the difference in rate between the two conditions is not significant (i.e., the 90% credible interval contains 0), as shown by the posterior for the difference in the means of `rate_A` and `rate_B`.

I calculated the posterior for the mean of `A` by dividing the posterior of `sr_alpha_A` by the posterior of `sr_beta_A` (since the expected value of a gamma distribution is alpha divided by beta). For `B`, the calculation was `(sr_alpha_A + sr_alpha_delta_B) / (sr_beta_A + sr_beta_delta_B)`. The posterior for the difference in the means of `A` and `B` is the difference between those two posteriors, computed draw by draw (see the sketch below).
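
Done on the posterior samples, that calculation looks roughly like this; `draws` is a hypothetical dictionary of posterior draws standing in for whatever the Stan interface returns.

```python
import numpy as np

# Hypothetical posterior draws, one value per MCMC sample.
# In practice these would be extracted from the fitted Stan model.
draws = {
    "sr_alpha_A":       np.random.gamma(2.0, 1.0, size=4000),
    "sr_beta_A":        np.random.gamma(2.0, 1.0, size=4000),
    "sr_alpha_delta_B": np.random.normal(0, 0.01, size=4000),
    "sr_beta_delta_B":  np.random.normal(0, 0.01, size=4000),
}

# Posterior of the population mean rate under each condition
# (mean of a gamma distribution is alpha / beta).
mean_A = draws["sr_alpha_A"] / draws["sr_beta_A"]
mean_B = (draws["sr_alpha_A"] + draws["sr_alpha_delta_B"]) / \
         (draws["sr_beta_A"] + draws["sr_beta_delta_B"])

# Posterior of the difference, and its 90% credible interval.
mean_diff = mean_B - mean_A
lo, hi = np.percentile(mean_diff, [5, 95])
print(f"90% credible interval for the difference: [{lo:.3f}, {hi:.3f}]")
```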

The lack of significance is fine with me. If it’s not significant, it’s important to report that result in the paper. However, I thought the paired t-test had pretty weak assumptions about normality, and isn’t the Wilcoxon supposed to be non-parametric? So shouldn’t the Bayesian and frequentist approaches agree? This makes me wonder if I set up my model wrong. Here it is:

```
data {
  int<lower=1> n_cells;
  real<lower=0> sample_time_A[n_cells];
  real<lower=0> sample_time_B[n_cells];
  int<lower=0> spike_count_A[n_cells];
  int<lower=0> spike_count_B[n_cells];
}

parameters {
  // Population-level gamma parameters for condition A,
  // plus additive offsets for condition B.
  real<lower=0> sr_alpha_A;
  real<lower=0> sr_beta_A;
  real sr_alpha_delta_B;
  real sr_beta_delta_B;

  // Per-cell firing rates in each condition.
  real<lower=0> sr_cell_A[n_cells];
  real<lower=0> sr_cell_B[n_cells];
}

model {
  sr_alpha_A ~ gamma(0.5, 0.1);
  sr_beta_A ~ gamma(0.1, 0.1);
  sr_alpha_delta_B ~ normal(0, 0.01);
  sr_beta_delta_B ~ normal(0, 0.01);

  sr_cell_A ~ gamma(sr_alpha_A, sr_beta_A);
  sr_cell_B ~ gamma(sr_alpha_A + sr_alpha_delta_B, sr_beta_A + sr_beta_delta_B);

  // Counts are Poisson with expected count = rate * exposure time.
  for (i in 1:n_cells) {
    spike_count_A[i] ~ poisson(sr_cell_A[i] * sample_time_A[i]);
    spike_count_B[i] ~ poisson(sr_cell_B[i] * sample_time_B[i]);
  }
}
```

I would really appreciate any pointers!

So, I just realized that the Bayesian model is not properly accounting for the fact that the observations are paired. Any advice on how I should do this? Most of the examples I’ve seen are for normal distributions, so it’s a bit unclear how I might do this when dealing with gamma distributions.

Problem solved. It was a matter of modeling the pairing between samples explicitly. New model below:

```
data {
  int<lower=1> n_cells;
  real<lower=0> sample_time_A[n_cells];
  real<lower=0> sample_time_B[n_cells];
  int<lower=0> spike_count_A[n_cells];
  int<lower=0> spike_count_B[n_cells];
}

parameters {
  // Population-level gamma parameters and per-cell rates in condition A.
  real<lower=0> rate_alpha;
  real<lower=0> rate_beta;
  real<lower=0> rate_cell[n_cells];

  // Per-cell multiplicative change from A to B, with a population mean and sd.
  real<lower=0> rate_ratio_mean;
  real<lower=0> rate_ratio_sd;
  real<lower=0> rate_ratio_cell[n_cells];
}

transformed parameters {
  real rate_A;
  real rate_B;
  real rate_change;
  real rate_cell_A[n_cells];
  real rate_cell_B[n_cells];
  real rate_cell_change[n_cells];

  rate_A = rate_alpha / rate_beta;
  rate_B = rate_A * rate_ratio_mean;
  rate_change = rate_B - rate_A;

  rate_cell_A = rate_cell;
  for (i in 1:n_cells) {
    rate_cell_B[i] = rate_cell[i] * rate_ratio_cell[i];
    rate_cell_change[i] = rate_cell_B[i] - rate_cell_A[i];
  }
}

model {
  rate_alpha ~ gamma(0.5, 0.1);
  rate_beta ~ gamma(0.1, 0.1);

  rate_ratio_mean ~ normal(1, 1)T[0, ];
  rate_ratio_sd ~ normal(0, 1)T[0, ];

  rate_cell ~ gamma(rate_alpha, rate_beta);
  rate_ratio_cell ~ normal(rate_ratio_mean, rate_ratio_sd);

  // Each cell's B rate is its own A rate times its own ratio,
  // so the pairing is built into the likelihood.
  for (i in 1:n_cells) {
    spike_count_A[i] ~ poisson(rate_cell[i] * sample_time_A[i]);
    spike_count_B[i] ~ poisson((rate_cell[i] * rate_ratio_cell[i]) * sample_time_B[i]);
  }
}
```
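
For anyone following along, a minimal sketch of fitting this model and summarizing `rate_change` with CmdStanPy; the file name and data values are hypothetical.

```python
import numpy as np
from cmdstanpy import CmdStanModel

# Hypothetical data; in practice these come from the recordings.
data = {
    "n_cells": 4,
    "sample_time_A": [10.0, 12.0, 9.5, 11.0],
    "sample_time_B": [10.0, 11.5, 10.0, 12.0],
    "spike_count_A": [42, 55, 31, 47],
    "spike_count_B": [58, 70, 45, 66],
}

model = CmdStanModel(stan_file="paired_rates.stan")
fit = model.sample(data=data, chains=4)

# Posterior draws of the population-level rate change (B minus A).
rate_change = fit.stan_variable("rate_change")
lo, hi = np.percentile(rate_change, [5, 95])
print(f"90% credible interval for rate_change: [{lo:.3f}, {hi:.3f}]")
print(f"P(rate_change > 0) = {np.mean(rate_change > 0):.3f}")
```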

Thanks for sharing! What were the final results?

And I don’t understand the `T[0, ]` here:

```
rate_ratio_mean ~ normal(1, 1)T[0, ];
```

do you mind explaining it?

It truncates the distribution: `normal(mean, sd)T[lower, upper]`. If `lower` or `upper` is left blank, the distribution is unbounded in that direction. So `normal(0, 1)T[0, ]` creates a half-normal distribution with a standard deviation of 1. You can create a half-Cauchy using this approach as well. A quick numerical check of what truncation does is below.
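
As a side note, here is a numerical illustration of that truncation in Python (rather than Stan), using `scipy.stats.truncnorm`: a standard normal truncated to [0, ∞) retains half its probability mass, so its density is exactly twice the normal density on the positive half-line.

```python
import numpy as np
from scipy.stats import norm, truncnorm

# Standard normal truncated to [0, inf): a half-normal with sd = 1.
# truncnorm takes its bounds in standard-deviation units relative to loc.
half_normal = truncnorm(a=0.0, b=np.inf, loc=0.0, scale=1.0)

x = np.linspace(0.0, 3.0, 7)
# The truncated density is the normal density rescaled by the
# retained probability mass (1/2 here), i.e., doubled for x >= 0.
print(np.allclose(half_normal.pdf(x), 2.0 * norm.pdf(x)))  # True
```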


That’s what I thought, but isn’t that redundant with the way you define this parameter?

```
real<lower=0> rate_ratio_mean;
```

I don’t know enough about Stan to know whether it was necessary to do both. Sounds like it’s not? I thought `T[0, ]` was the correct way to define half-priors. However, I found that I must use `real<lower=0>` to avoid initialization errors: with `T[0, ]` alone, I get initialization errors (since Stan tries negative initial values). So that’s why I have both in my code.

Is the correct way to implement a half-prior to use `real<lower=0>`?

I think you can just use

```
real<lower=0> rate_ratio_mean;
```

The `<lower=0>` constraint already keeps the sampler on the positive half-line, and because the truncation point is a constant, the `T[0, ]` truncation only changes the log density by a constant, which does not affect sampling.

Thanks!