Choosing categories

I have a model in mind but I am not sure how to write it.

Each unit i \in 1..5 gets discrete value g_i \in 1..3 with G_j \sim \text{Categorical}. Then for each i a h_i \in 1..4 is selected randomly with H_k|G_{j} \sim \text{Categorical}_{j}. Then Y_i|H_k \sim \text{Normal}_k.

It doesn’t work because, I think,

  1. I’m probably writing it wrong in the first place.
  2. It involves discrete choices so I need to use special tricks from 5.3 Summing out the responsibility parameter | Stan User’s Guide.

But I am a beginner so I would like to ask for some help. How can I write the model I described?

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> h;
  int g[N];
  int h[N];
  vector[N] y;
}
parameters {
  vector[K] alpha;
  real<lower=0> sigma;
  simplex[K] theta;
  simplex[J] phi[K];
  vector[K] v;
  vector[J] w;
}
model {
    // Assign a mean for each group.
    for (j in 1:J) {
        alpha[j] ~ normal(0, sigma);
    }


    for (k in 1:K) {
      phi[k] ~ dirichlet(w);
    }

    for (n in 1:N) {

        // Choose a group.
        g[n] ~ categorical(theta);

        // Choose a group.
        h[n] ~ categorical(phi[g[n]]);

        // Distribution centered at the group mean.
        y[n] ~ normal(alpha[h[n]], sigma);
    }
}


{
    "N": 5,
    "K" : 3,
    "J" : 4,
    "g": [1,2,3,1,2],
    "h": [1,2,3,4,1],
    "y": [9,19,30,11,21]
}

If both g and h are observed then you don’t need to sum over discrete choices. Those tricks are only needed for inferring unknown discrete quantities.
Your model looks correct but needs priors for w or v. Actually, v doesn’t do anything so it’s totally unconstrained by the model.

I guess it’s saying there’s an out-of-bounds access somewhere?

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> J;
  int g[N];
  int h[N];
  vector[N] y;
}
parameters {
  vector[K] alpha;
  real<lower=0> sigma;
  simplex[K] theta;
  simplex[J] phi[K];
  vector[K] v;
  vector[J] w;
}
model {
    // Assign a mean for each group.
    for (j in 1:J) {
        alpha[j] ~ normal(0, sigma);
    }

    for (j in 1:J) {
      w[j] ~ exponential(1);
    }

    for (k in 1:K) {
      phi[k] ~ dirichlet(w);
    }

    for (n in 1:N) {

        // Choose a group.
        g[n] ~ categorical(theta);

        // Choose a group.
        h[n] ~ categorical(phi[g[n]]);

        // Distribution centered at the group mean.
        y[n] ~ normal(alpha[h[n]], sigma);
    }
}
{
    "N": 5,
    "K" : 3,
    "J" : 4,
    "g": [1,2,3,1,2],
    "h": [1,2,3,4,1],
    "y": [9,19,30,11,21]
}

program: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h:408: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Matrix<double, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
zsh: abort (core dumped)  ./program sample data file=data.json

The newest Stan gives a bit more information:

Exception: vector[uni] indexing: accessing element out of range. index 4 out of range; expecting index to be between 1 and 3 (in ‘bounds.stan’, line 20, column 8 to column 36)

The declared size of alpha is wrong.

 parameters {
-  vector[K] alpha;
+  vector[J] alpha;

Also note that w needs a lower bound, and v is still unused.

-  vector[K] v;
-  vector[J] w;
+  vector<lower=0>[J] w;
2 Likes

I expected sigma_across to be ~10, sigma_within to be ~1, and the alphas to be 10,20,30,40. But the results are very different.

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> J;
  int g[N];
  int h[N];
  vector[N] y;
}
parameters {
  vector[J] alpha;
  real<lower=0> sigma_within;
  real<lower=0> sigma_across;
  simplex[K] theta;
  simplex[J] phi[K];
  vector<lower=0>[J] w;
}
model {
  sigma_across ~ exponential(10);
  sigma_within ~ exponential(10);

  // Assign a mean for each group.
  for (j in 1:J) {
      alpha[j] ~ normal(0, sigma_across);
  }

  for (j in 1:J) {
    w[j] ~ exponential(10);
  }

  for (k in 1:K) {
    phi[k] ~ dirichlet(w);
  }

  for (n in 1:N) {

    // Choose a group.
    g[n] ~ categorical(theta);

    // Choose a group.
    h[n] ~ categorical(phi[g[n]]);

    // Distribution centered at the group mean.
    y[n] ~ normal(alpha[h[n]], sigma_within);
  }
}
 Row │ variable       mean           eltype   
     │ Symbol         Float64        DataType 
─────┼────────────────────────────────────────
   1 │ lp__           -164.212       Float64
   2 │ accept_stat__     0.814788    Float64
   3 │ stepsize__        0.0549356   Float64
   4 │ treedepth__       5.461       Int64
   5 │ n_leapfrog__     57.503       Int64
   6 │ divergent__       0.174       Int64
   7 │ energy__        174.419       Float64
   8 │ alpha.1           0.0122528   Float64
   9 │ alpha.2           0.0057851   Float64
  10 │ alpha.3           0.0200274   Float64
  11 │ alpha.4           0.00494958  Float64
  12 │ sigma_within      6.65023     Float64
  13 │ sigma_across      0.0866053   Float64
  14 │ theta.1           0.38032     Float64
  15 │ theta.2           0.363308    Float64
  16 │ theta.3           0.256371    Float64
  17 │ phi.1.1           0.466454    Float64
  18 │ phi.2.1           0.456802    Float64
  19 │ phi.3.1           0.105254    Float64
  20 │ phi.1.2           0.0420757   Float64
  21 │ phi.2.2           0.432637    Float64
  22 │ phi.3.2           0.0769598   Float64
  23 │ phi.1.3           0.0544648   Float64
  24 │ phi.2.3           0.0559648   Float64
  25 │ phi.3.3           0.733882    Float64
  26 │ phi.1.4           0.437006    Float64
  27 │ phi.2.4           0.0545955   Float64
  28 │ phi.3.4           0.0839046   Float64
  29 │ w.1               0.179877    Float64
  30 │ w.2               0.114077    Float64
  31 │ w.3               0.139717    Float64
  32 │ w.4               0.141734    Float64

17% divergent transitions is very bad. The sampler is not able to explore the posterior distribution adequately. I wouldn’t trust those results.

The exponential prior for w concentrates near zero and that’s where phi's Dirichlet prior has very difficult geometry, especially if you don’t have much data. Unless you have a good reason to believe w is near zero you should consider a different prior, for example log-normal.

1 Like

That worked, thank you.