Mean by groups

What would be the best way to do a mean by groups in Rstan ? Data looks something like:

Groups X
1 10
1 20
1 23
2 15
2 17
3 20
3 25

data {
  int<lower=0> N;  // number of observations
  int<lower=0> J;   // number of groups
  int<lower=1,upper=J> id[N]; // group membership indicators
  vector[N] X; // data
}
parameters {
  vector[J] eps;
  real mu_global;
  real<lower=0> sigma_mu;
  real<lower=0> sigma_X;
}
transformed parameters {
  vector[J] mu = mu_global + sigma_mu * eps;
}
model {
  target += normal_lpdf(X | mu[id], sigma_X);
  target += normal_lpdf(eps | 0, 1); // implies mu ~ normal(mu_global, sigma_mu)
  // priors for mu_global, sigma_mu, and sigma_X
}

Although for a model this simple, you could do it with the stan_lmer function in the rstanarm R package or the brm function in the brms R package.

Thanks! Does this assume that all the groups have same number of observations ?

No.

It seems like the stan model code that you provided works better than stan_lmer. Output from stan_lmer does not return a shinystan ourput. Thanks.

All of the model fitting functions in the rstanarm package work with ShinyStan; in fact, there are a number of enhanced posterior predictive checks that are possible when the model is estimated via rstanarm that are not directly possible when estimated with rstan. You are doing something wrong, but I am not sure what.

model <- stan_lmer(X~ -1 +(1|Group), data = df)

I am doing a no intercept model here.

And that causes launch_shinystan(model) to error? If so, it is a bug. However, it is usually not a good idea to remove the global intercept. The default prior on the group-specific deviations from the intercept is normal with a standard deviation that is unit exponential. If you are deviating from zero rather than from the the global intercept, that may well be unreasonably strong. Moreover, it is rare for an outcome to have a mean of zero by construction of the research design.

The stan_lmer model eventually converges but gives the following error for inital 3 chains (out of 4)

Click the Refresh button to see progress of the chains
starting worker pid=9492 on localhost:11681 at 15:46:05.901
starting worker pid=14348 on localhost:11681 at 15:46:06.289
starting worker pid=12188 on localhost:11681 at 15:46:06.673
starting worker pid=13296 on localhost:11681 at 15:46:07.052
Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
no valid constructor available for the argument list
trying deprecated constructor; please alert package maintainer

SAMPLING FOR MODEL ‘continuous’ NOW (CHAIN 1).

Gradient evaluation took 0.002 seconds
1000 transitions using 10 leapfrog steps per transition would take 20 seconds.
Adjust your expectations accordingly!

Iteration: 1 / 1000 [ 0%] (Warmup)
Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
no valid constructor available for the argument list
trying deprecated constructor; please alert package maintainer

SAMPLING FOR MODEL ‘continuous’ NOW (CHAIN 2).

Gradient evaluation took 0.003 seconds
1000 transitions using 10 leapfrog steps per transition would take 30 seconds.
Adjust your expectations accordingly!

That isn’t an actual error. And it has nothing to do with ShinyStan.

OK, do you happen to know what this error means ?

It means you are using rstan 2.16.x to estimate a model that was built with rstanarm 2.15.x. For the 2.16 release of Stan the preferred constructor of the model object changed so that it would take three arguments instead of two, so that any model which did pseduo-random number generation in the transformed data block would yield the same realizations across the chains before striding each chain by a function of the chain ID so that the realizations of the parameters would be independent. The Stan programs in rstanarm (like over 99% of Stan programs) do not do pseudo-random number generation in the transformed data block, so it makes little difference whether the newer three-argument constructor or the older two-argument constructor is used. But rstan 2.16.x first tries the three-argument constructor and upon finding that it does not exist for models such as those in rstanarm 2.15.x, falls back to the two-argument constructor with a message saying “trying deprecated constructor”. It then samples the same way as it did in 2.15.x, 2.14.x, etc.

What if the data looks like this
Group X
1 10 20 30
2 15 25
3 18 30 55
what’s the best way to do it in Rstan?

Hi! Welcome to the Stan discourse! :)

I guess the easiest way is to transform the data to the format seen in the first post - concatenate the X values and keep the group indices. Then you can use rstanarm or brms or Rstan as described above.

Cheers!

1 Like

Yes,
Max_Mantei
is correct.

In terms that are often used in R when talking about data structures, you usually want your data in a long format (though, there are some occasions where that is not true). So the original poster had data in long, your data is specified in wide format.

depending on how much data you are dealing with there are lots of methods to convert your data from one format to the other. Search for tidy data, long, wide or something similar and you’ll find plenty of techniques.

1 Like