Mean by groups

Mayank_Nautiyal · July 20, 2017, 2:32pm

What would be the best way to do a mean by groups in Rstan ? Data looks something like:

Groups X
1 10
1 20
1 23
2 15
2 17
3 20
3 25

bgoodri · July 20, 2017, 5:52pm

data {
  int<lower=0> N;  // number of observations
  int<lower=0> J;   // number of groups
  int<lower=1,upper=J> id[N]; // group membership indicators
  vector[N] X; // data
}
parameters {
  vector[J] eps;
  real mu_global;
  real<lower=0> sigma_mu;
  real<lower=0> sigma_X;
}
transformed parameters {
  vector[J] mu = mu_global + sigma_mu * eps;
}
model {
  target += normal_lpdf(X | mu[id], sigma_X);
  target += normal_lpdf(eps | 0, 1); // implies mu ~ normal(mu_global, sigma_mu)
  // priors for mu_global, sigma_mu, and sigma_X
}

Although for a model this simple, you could do it with the stan_lmer function in the rstanarm R package or the brm function in the brms R package.

Mayank_Nautiyal · July 21, 2017, 3:25pm

Thanks! Does this assume that all the groups have same number of observations ?

bgoodri · July 21, 2017, 3:33pm

No.

Mayank_Nautiyal · July 23, 2017, 5:54pm

It seems like the stan model code that you provided works better than stan_lmer. Output from stan_lmer does not return a shinystan ourput. Thanks.

bgoodri · July 23, 2017, 6:04pm

All of the model fitting functions in the rstanarm package work with ShinyStan; in fact, there are a number of enhanced posterior predictive checks that are possible when the model is estimated via rstanarm that are not directly possible when estimated with rstan. You are doing something wrong, but I am not sure what.

Mayank_Nautiyal · July 23, 2017, 6:25pm

model <- stan_lmer(X~ -1 +(1|Group), data = df)

I am doing a no intercept model here.

bgoodri · July 23, 2017, 6:45pm

And that causes launch_shinystan(model) to error? If so, it is a bug. However, it is usually not a good idea to remove the global intercept. The default prior on the group-specific deviations from the intercept is normal with a standard deviation that is unit exponential. If you are deviating from zero rather than from the the global intercept, that may well be unreasonably strong. Moreover, it is rare for an outcome to have a mean of zero by construction of the research design.

Mayank_Nautiyal · July 23, 2017, 7:48pm

The stan_lmer model eventually converges but gives the following error for inital 3 chains (out of 4)

Click the Refresh button to see progress of the chains
starting worker pid=9492 on localhost:11681 at 15:46:05.901
starting worker pid=14348 on localhost:11681 at 15:46:06.289
starting worker pid=12188 on localhost:11681 at 15:46:06.673
starting worker pid=13296 on localhost:11681 at 15:46:07.052
Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
no valid constructor available for the argument list
trying deprecated constructor; please alert package maintainer

SAMPLING FOR MODEL ‘continuous’ NOW (CHAIN 1).

Gradient evaluation took 0.002 seconds
1000 transitions using 10 leapfrog steps per transition would take 20 seconds.
Adjust your expectations accordingly!

Iteration: 1 / 1000 [ 0%] (Warmup)
Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
no valid constructor available for the argument list
trying deprecated constructor; please alert package maintainer

SAMPLING FOR MODEL ‘continuous’ NOW (CHAIN 2).

Gradient evaluation took 0.003 seconds
1000 transitions using 10 leapfrog steps per transition would take 30 seconds.
Adjust your expectations accordingly!

bgoodri · July 23, 2017, 9:51pm

That isn’t an actual error. And it has nothing to do with ShinyStan.

Mayank_Nautiyal · July 23, 2017, 10:00pm

OK, do you happen to know what this error means ?

bgoodri · July 23, 2017, 11:21pm

It means you are using rstan 2.16.x to estimate a model that was built with rstanarm 2.15.x. For the 2.16 release of Stan the preferred constructor of the model object changed so that it would take three arguments instead of two, so that any model which did pseduo-random number generation in the transformed data block would yield the same realizations across the chains before striding each chain by a function of the chain ID so that the realizations of the parameters would be independent. The Stan programs in rstanarm (like over 99% of Stan programs) do not do pseudo-random number generation in the transformed data block, so it makes little difference whether the newer three-argument constructor or the older two-argument constructor is used. But rstan 2.16.x first tries the three-argument constructor and upon finding that it does not exist for models such as those in rstanarm 2.15.x, falls back to the two-argument constructor with a message saying “trying deprecated constructor”. It then samples the same way as it did in 2.15.x, 2.14.x, etc.

huxxx657 · October 27, 2019, 8:03pm

What if the data looks like this
Group X
1 10 20 30
2 15 25
3 18 30 55
what’s the best way to do it in Rstan?

Max_Mantei · October 27, 2019, 8:08pm

Hi! Welcome to the Stan discourse! :)

I guess the easiest way is to transform the data to the format seen in the first post - concatenate the X values and keep the group indices. Then you can use rstanarm or brms or Rstan as described above.

Cheers!

MegBallard · October 31, 2019, 12:45am

Yes,
Max_Mantei is correct.

In terms that are often used in R when talking about data structures, you usually want your data in a long format (though, there are some occasions where that is not true). So the original poster had data in long, your data is specified in wide format.

depending on how much data you are dealing with there are lots of methods to convert your data from one format to the other. Search for tidy data, long, wide or something similar and you’ll find plenty of techniques.

Topic		Replies	Views
Difference in posterior distributions for group effects for brms and hand coded stan model Modeling rstan , brms	2	941	February 17, 2022
Infer mean and std of four separate groups, need help indexing data vector Modeling specification	9	437	June 29, 2022
Stan does not converge Modeling rstan	7	70	May 7, 2025
Fitting a multi-group Rasch Model (IRT) Modeling	2	1200	February 11, 2020
Is there an equivalent of make_standata from brms for rstanarm? rstanarm	2	369	August 22, 2023

Mean by groups

Related topics