Borrowing Strength Regression Model with Different Numbers of Observations

I would like to estimate a linear time trend with a varying-slopes/varying-intercepts model for hierarchical data (borrowing strength). I have found examples of how to do this when each row in the dataset represents an individual observation: groups with many observations largely stand on their own, while groups with few observations are pulled toward the overall mean. (For example, the radon model in the Gelman and Hill textbook.)

Unfortunately, the data I am working with are aggregated. I know how many observations there were at each time point, but only the mean of those observations was recorded for each group. I would like slope/intercept estimates that are pulled more strongly toward the overall mean when there is less data to support them.

I thought I could do this by specifying that the mean I observe is normally distributed with mean equal to a linear function of time and a standard deviation that shrinks as the number of underlying observations grows (the standard error of a group mean scales as 1/sqrt(n)), as shown in the model below. I created some simulated data (also below) to test this out, but unfortunately I get divergences. I have increased the number of iterations and increased adapt_delta, but that didn't fix the problem. I suspect a model mis-specification, and I would welcome feedback on what else I could try or how else I might conceptualize this problem. Thank you very much!

data {
  int<lower=0> N;
  int<lower=0> ncompany;
  vector[N] y;
  vector[N] x;
  int company[N];
  vector<lower=0>[N] sigma;
}
parameters {
  real<lower=0> sigma_a;
  real<lower=0> sigma_b;
  vector[ncompany] a;
  vector[ncompany] b;
  real mu_a;
  real mu_b;
}
model {
  mu_a ~ normal(0, 10000);
  mu_b ~ normal(0, 10000);

  a ~ normal(mu_a, sigma_a);
  b ~ normal(mu_b, sigma_b);
  y ~ normal(a[company] + b[company] .* x, sigma);
}

library(rstan)

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')

set.seed(1)  # make the simulated data reproducible

t <- 1:12
y <- c(5*t + 3
       , 5*t + 6
       , 4*t + 7
       , 4*t + 12
       , 3*t + 2
       , 3*t + 4
       , 15*t + 2)
y <- y + rnorm(length(y), 0, 1)

vsvi_dat <- list(N = length(y)
                 , ncompany = 7
                 , y = y
                 , x = rep(t, 7)
                 , company = c(rep(1:6, each = 12), rep(7, 12))
                 , sigma = pmax(c(rnorm(72, 0.001, 0.0005), rnorm(12, 0.06, 0.02))
                                , 0.00001))

fit <- stan(file = 'vsvi.stan', data = vsvi_dat
            , control=list(max_treedepth = 20
                           , adapt_delta = 0.95))

Some problems I see at first glance:

You might get better results from the brms package, as it handles a lot of this stuff for you. brms lets you attach a known standard error to each observation. Since you do not have the actual sigmas of the data, you could make each time point's standard error proportional to 1/sqrt(N_observations) (the standard error of a mean is inversely proportional to the square root of the number of observations).
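A minimal sketch of that idea with brms, assuming a data frame `d` with one row per (company, time point) aggregate and columns `y`, `x`, `company`, and `n_obs` (the number of raw observations behind each recorded mean); the scale constant `s` and the column names are placeholders, not from your data:

```r
library(brms)

# Assumed common within-group SD of the raw observations (placeholder).
s <- 1

# Standard error of each recorded mean shrinks as 1/sqrt(n).
d$se_y <- s / sqrt(d$n_obs)

# Varying intercepts and slopes by company; the se() addition term
# feeds the known standard error of each aggregated mean into the
# likelihood, and sigma = TRUE keeps an additional residual SD.
fit <- brm(y | se(se_y, sigma = TRUE) ~ x + (1 + x | company),
           data = d)
```

Since `s` is unknown, only the relative sizes of the `se_y` values matter much here; `sigma = TRUE` lets the model absorb the overall scale.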