I want to use this model.
data {
int<lower=0> N;
vector[N] u;
real g[N];
int h[N];
vector[N] y;
}
parameters {
real alpha;
real phi;
real upsilon;
real kappa;
real<lower=0> sigma1;
real<lower=0> sigma2;
}
model {
sigma2 ~ lognormal(0,1);
phi ~ normal(0,1);
upsilon ~ normal(0,1);
kappa ~ normal(0,1);
alpha ~ normal(0,1);
for (n in 1:N) { g[n] ~ normal(0,1); }
for (n in 1:N) { h[n] ~ categorical_logit(phi * g[n] + upsilon * u[n]); }
for (n in 1:N) { y[n] ~ normal(alpha[h[n]] + kappa * u[n], sigma2); }
}
Everybody recommends to first generate data so I can see if the model is ok on assumed parameters, so I swapped the data
and parameters
blocks to get
data {
int<lower=0> N;
real alpha;
real phi;
real upsilon;
real kappa;
real<lower=0> sigma1;
real<lower=0> sigma2;
}
parameters {
vector[N] u;
real g[N];
int h[N];
vector[N] y;
}
model {
sigma2 ~ lognormal(0,1);
phi ~ normal(0,1);
upsilon ~ normal(0,1);
kappa ~ normal(0,1);
alpha ~ normal(0,1);
for (n in 1:N) { g[n] ~ normal(0,1); }
for (n in 1:N) { h[n] ~ categorical_logit(phi * g[n] + upsilon * u[n]); }
for (n in 1:N) { y[n] ~ normal(alpha[h[n]] + kappa * u[n], sigma2); }
}
but Stan isn’t willing to generate data for me to test with.
line 13, column 4 to column 13:
-------------------------------------------------
11: vector[N] u;
12: real g[N];
13: int h[N];
^
14: vector[N] y;
15: }
-------------------------------------------------
(Transformed) Parameters cannot be integers.
I tried generating the data with a linear continuous version instead
data {
int<lower=0> N;
real alpha;
real phi;
real upsilon;
real kappa;
real<lower=0> sigma1;
real<lower=0> sigma2;
}
parameters {
vector[N] u;
real g[N];
real h[N];
vector[N] y;
}
model {
sigma1 ~ lognormal(0,1);
sigma2 ~ lognormal(0,1);
phi ~ normal(0,1);
upsilon ~ normal(0,1);
kappa ~ normal(0,1);
alpha ~ normal(0,1);
for (n in 1:N) { g[n] ~ normal(0,1); }
for (n in 1:N) { h[n] ~ normal(phi * g[n] + upsilon * u[n], sigma1); }
for (n in 1:N) { y[n] ~ normal(alpha * h[n] + kappa * u[n], sigma2); }
}
which did generate some data, and then I converted the h
values into int
s by cutting the continuous values into bins. Then stan was able to sample from the int
model but it produced wrong answers
> summarize_stan(indir("analysis.comcsv"))
13×4 DataFrame
Row │ variable mean std eltype
│ Symbol Float64 Float64 DataType
─────┼────────────────────────────────────────────────────────
1 │ lp__ -2535.86 1.72457 Float64
2 │ accept_stat__ 0.918997 0.0958284 Float64
3 │ stepsize__ 0.0273539 3.12406e-17 Float64
4 │ treedepth__ 6.288 1.15948 Int64
5 │ n_leapfrog__ 103.196 37.6453 Int64
6 │ divergent__ 0.0 0.0 Int64
7 │ energy__ 2538.92 2.52855 Float64
8 │ alpha -0.0209184 0.0342002 Float64
9 │ phi -0.0857657 0.0631577 Float64
10 │ upsilon 0.000929568 8.91259e-5 Float64
11 │ kappa -0.99995 0.000101321 Float64
12 │ sigma1 2.05748 0.0459591 Float64
13 │ sigma2 2.25148 0.047969 Float64
> trueparams
Dict{String, Real} with 7 entries:
"sigma1" => 1.0
"upsilon" => 2.0
"N" => 1000
"kappa" => 1.0
"phi" => 0.0
"alpha" => -1.0
"sigma2" => 2.0
This whole thing was a lot of work, especially to get a wrong answer. My questions are:
- How can I generate test data for the original model? Can Stan do it?
- Is the original model ok? If I write code in some other language to generate data like that model, will Stan be able to sample from it correctly?