# Choosing categories

I have a model in mind but I am not sure how to write it.

Each unit i \in 1..5 gets a discrete value g_i \in 1..3 with g_i \sim \text{Categorical}(\theta). Then for each i an h_i \in 1..4 is drawn from a categorical distribution whose probabilities depend on g_i: h_i \mid g_i \sim \text{Categorical}(\phi_{g_i}). Finally, y_i \mid h_i \sim \text{Normal}(\alpha_{h_i}, \sigma).
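To pin down the generative story, this is how I would simulate it in Python (the parameter values theta, phi, alpha, sigma below are made up purely for illustration):

```python
import random

# Illustrative parameter values, not estimated from anything.
theta = [0.3, 0.4, 0.3]                   # P(g = j), j in 1..3
phi = [[0.25, 0.25, 0.25, 0.25],          # P(h = k | g = j), one row per j
       [0.10, 0.20, 0.30, 0.40],
       [0.40, 0.30, 0.20, 0.10]]
alpha = [10.0, 20.0, 30.0, 40.0]          # mean of y given h = k
sigma = 1.0

def simulate_unit(rng):
    """Draw one (g, h, y) triple: group, then subgroup, then outcome."""
    g = rng.choices(range(3), weights=theta)[0]    # choose a group
    h = rng.choices(range(4), weights=phi[g])[0]   # choose a subgroup given the group
    y = rng.gauss(alpha[h], sigma)                 # outcome centered at the subgroup mean
    return g + 1, h + 1, y                         # 1-based indices, Stan-style

rng = random.Random(0)
units = [simulate_unit(rng) for _ in range(5)]
```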

It doesn’t work, and I think that’s because:

1. I’m probably writing it wrong in the first place.
2. It involves discrete choices, so I may need the marginalization tricks from 5.3 Summing out the responsibility parameter | Stan User’s Guide.

But I am a beginner so I would like to ask for some help. How can I write the model I described?

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> J;
  int g[N];
  int h[N];
  vector[N] y;
}
parameters {
  vector[K] alpha;
  real<lower=0> sigma;
  simplex[K] theta;
  simplex[J] phi[K];
  vector[K] v;
  vector[J] w;
}
model {
  // Assign a mean for each group.
  for (j in 1:J) {
    alpha[j] ~ normal(0, sigma);
  }

  for (k in 1:K) {
    phi[k] ~ dirichlet(w);
  }

  for (n in 1:N) {
    // Choose a group.
    g[n] ~ categorical(theta);

    // Choose a subgroup within the group.
    h[n] ~ categorical(phi[g[n]]);

    // Distribution centered at the subgroup mean.
    y[n] ~ normal(alpha[h[n]], sigma);
  }
}
```


```json
{
  "N": 5,
  "K": 3,
  "J": 4,
  "g": [1, 2, 3, 1, 2],
  "h": [1, 2, 3, 4, 1],
  "y": [9, 19, 30, 11, 21]
}
```



If both g and h are observed then you don’t need to sum over discrete choices. Those tricks are only needed for inferring unknown discrete quantities.
Your model looks correct, but it needs a prior for w. As for v, it doesn’t appear anywhere in the model block, so it’s totally unconstrained.
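To make the contrast concrete, here is a small Python sketch (not Stan; the helper names are my own) of the difference: with h observed the likelihood is a single normal term, while with h latent you sum its contributions out with log-sum-exp, which is what the User’s Guide section describes.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def loglik_observed(y, h, alpha, sigma):
    """h observed: the likelihood term is one normal density."""
    return normal_lpdf(y, alpha[h], sigma)

def loglik_marginal(y, probs, alpha, sigma):
    """h latent: sum over all categories, weighted by the categorical probabilities."""
    return log_sum_exp([math.log(p) + normal_lpdf(y, a, sigma)
                        for p, a in zip(probs, alpha)])
```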

I guess it’s saying there’s an out-of-bounds access somewhere?

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> J;
  int g[N];
  int h[N];
  vector[N] y;
}
parameters {
  vector[K] alpha;
  real<lower=0> sigma;
  simplex[K] theta;
  simplex[J] phi[K];
  vector[K] v;
  vector[J] w;
}
model {
  // Assign a mean for each group.
  for (j in 1:J) {
    alpha[j] ~ normal(0, sigma);
  }

  for (j in 1:J) {
    w[j] ~ exponential(1);
  }

  for (k in 1:K) {
    phi[k] ~ dirichlet(w);
  }

  for (n in 1:N) {
    // Choose a group.
    g[n] ~ categorical(theta);

    // Choose a subgroup within the group.
    h[n] ~ categorical(phi[g[n]]);

    // Distribution centered at the subgroup mean.
    y[n] ~ normal(alpha[h[n]], sigma);
  }
}
```

```json
{
  "N": 5,
  "K": 3,
  "J": 4,
  "g": [1, 2, 3, 1, 2],
  "h": [1, 2, 3, 4, 1],
  "y": [9, 19, 30, 11, 21]
}
```


```
program: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h:408: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Matrix<double, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
zsh: abort (core dumped)  ./program sample data file=data.json
```


```
Exception: vector[uni] indexing: accessing element out of range. index 4 out of range; expecting index to be between 1 and 3 (in 'bounds.stan', line 20, column 8 to column 36)
```

The declared size of alpha is wrong:

```diff
 parameters {
-  vector[K] alpha;
+  vector[J] alpha;
```

Also note that w needs a lower bound, and v is still unused:

```diff
-  vector[K] v;
-  vector[J] w;
+  vector<lower=0>[J] w;
```
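The bound matters because the Dirichlet log density involves log Γ(w_j) and (w_j − 1) log φ_j, neither of which is defined for w_j ≤ 0. A minimal Python check of this (my own helper, not Stan’s implementation):

```python
import math

def dirichlet_lpdf(phi, w):
    """Dirichlet log density at simplex point phi with concentrations w (all > 0)."""
    if any(wj <= 0 for wj in w):
        raise ValueError("Dirichlet concentrations must be positive")
    return (math.lgamma(sum(w))
            - sum(math.lgamma(wj) for wj in w)
            + sum((wj - 1) * math.log(pj) for wj, pj in zip(w, phi)))
```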


I expected sigma_across to be ~10, sigma_within to be ~1, and the alphas to be 10, 20, 30, 40, but the results are very different.

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> J;
  int g[N];
  int h[N];
  vector[N] y;
}
parameters {
  vector[J] alpha;
  real<lower=0> sigma_within;
  real<lower=0> sigma_across;
  simplex[K] theta;
  simplex[J] phi[K];
  vector<lower=0>[J] w;
}
model {
  sigma_across ~ exponential(10);
  sigma_within ~ exponential(10);

  // Assign a mean for each group.
  for (j in 1:J) {
    alpha[j] ~ normal(0, sigma_across);
  }

  for (j in 1:J) {
    w[j] ~ exponential(10);
  }

  for (k in 1:K) {
    phi[k] ~ dirichlet(w);
  }

  for (n in 1:N) {
    // Choose a group.
    g[n] ~ categorical(theta);

    // Choose a subgroup within the group.
    h[n] ~ categorical(phi[g[n]]);

    // Distribution centered at the subgroup mean.
    y[n] ~ normal(alpha[h[n]], sigma_within);
  }
}
```

```
 Row │ variable       mean           eltype
     │ Symbol         Float64        DataType
─────┼────────────────────────────────────────
   1 │ lp__           -164.212       Float64
   2 │ accept_stat__     0.814788    Float64
   3 │ stepsize__        0.0549356   Float64
   4 │ treedepth__       5.461       Int64
   5 │ n_leapfrog__     57.503       Int64
   6 │ divergent__       0.174       Int64
   7 │ energy__        174.419       Float64
   8 │ alpha.1           0.0122528   Float64
   9 │ alpha.2           0.0057851   Float64
  10 │ alpha.3           0.0200274   Float64
  11 │ alpha.4           0.00494958  Float64
  12 │ sigma_within      6.65023     Float64
  13 │ sigma_across      0.0866053   Float64
  14 │ theta.1           0.38032     Float64
  15 │ theta.2           0.363308    Float64
  16 │ theta.3           0.256371    Float64
  17 │ phi.1.1           0.466454    Float64
  18 │ phi.2.1           0.456802    Float64
  19 │ phi.3.1           0.105254    Float64
  20 │ phi.1.2           0.0420757   Float64
  21 │ phi.2.2           0.432637    Float64
  22 │ phi.3.2           0.0769598   Float64
  23 │ phi.1.3           0.0544648   Float64
  24 │ phi.2.3           0.0559648   Float64
  25 │ phi.3.3           0.733882    Float64
  26 │ phi.1.4           0.437006    Float64
  27 │ phi.2.4           0.0545955   Float64
  28 │ phi.3.4           0.0839046   Float64
  29 │ w.1               0.179877    Float64
  30 │ w.2               0.114077    Float64
  31 │ w.3               0.139717    Float64
  32 │ w.4               0.141734    Float64
```



17% divergent transitions is very bad: the sampler is not able to explore the posterior distribution adequately, so I wouldn’t trust those results.

The exponential prior for w concentrates mass near zero, and that is exactly where phi’s Dirichlet prior has very difficult geometry, especially when you don’t have much data. Unless you have a good reason to believe w is near zero, consider a different prior, for example a log-normal.
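You can see the difference near zero by comparing log densities (plain Python, my own helper functions): the exponential log density stays essentially flat as x → 0, while the log-normal log density drops off sharply, keeping w away from the problematic region.

```python
import math

def exponential_lpdf(x, rate):
    """Log density of Exponential(rate) at x > 0."""
    return math.log(rate) - rate * x

def lognormal_lpdf(x, mu, sigma):
    """Log density of LogNormal(mu, sigma) at x > 0."""
    return (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))

# Near zero the exponential prior still assigns substantial density,
# while the log-normal prior vanishes there.
for x in (0.01, 0.1, 1.0):
    print(x, round(exponential_lpdf(x, 10.0), 3), round(lognormal_lpdf(x, 0.0, 1.0), 3))
```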


That worked, thank you.