Hi all!

I have a question about applying Stan results to downstream information-theoretic metrics, including entropy, conditional entropy, and information gain.

Here’s the background: I have a vector of response variables y, say measurements of individuals’ stature, and my target (conditioning) variable x is the country each individual comes from. My ultimate goal is to learn the information gain, i.e., how much more certain we are of an individual’s country of origin given their stature.

To be precise, IG = H(x) - H(x|y), i.e., the entropy of the target variable minus the conditional entropy of x given y. Since stature is continuous, the (differential) entropy is H(x) = -\int_{x} f(x)\log f(x)\,dx, and the conditional entropy is H(x|y) = -\int_{x,y} f(x,y)\log f(x|y)\,dx\,dy.
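To make the target concrete, here is a toy plug-in calculation of IG in Python. All numbers are fabricated for illustration: x is a discrete label over two countries and y | x is Gaussian stature with known parameters, so H(x|y) reduces to the expectation over f(y) of the posterior entropy of x, computed here by grid integration.

```python
# Toy plug-in IG = H(x) - H(x|y) with fabricated parameters:
# x in {country 0, country 1}, y | x ~ Normal(mu_x, sd_x).
import numpy as np

p_x = np.array([0.5, 0.5])        # prior over two countries
mu = np.array([165.0, 178.0])     # mean stature (cm) per country
sd = np.array([7.0, 7.0])         # sd of stature per country

# H(x): entropy of the discrete target (in nats)
H_x = -np.sum(p_x * np.log(p_x))

# Grid over y for the integral in H(x|y)
y = np.linspace(120.0, 230.0, 4001)
dy = y[1] - y[0]

# f(y | x) per country, and the marginal f(y) = sum_x p(x) f(y|x)
f_y_given_x = (np.exp(-0.5 * ((y[None, :] - mu[:, None]) / sd[:, None]) ** 2)
               / (sd[:, None] * np.sqrt(2 * np.pi)))
f_y = p_x @ f_y_given_x

# Bayes' rule: p(x | y), then
# H(x|y) = -int f(y) * sum_x p(x|y) log p(x|y) dy
p_x_given_y = p_x[:, None] * f_y_given_x / f_y
H_x_given_y = -np.sum(f_y * np.sum(p_x_given_y * np.log(p_x_given_y),
                                   axis=0)) * dy

IG = H_x - H_x_given_y
```

In the Stan setting, the known densities above would be replaced by posterior predictive densities from the fitted model.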

Math aside, does anyone have suggestions on how Stan's log probability density could be used here? Either by extracting the results after fitting, or possibly within generated quantities?

Here I include a VERY general model that treats stature as normally distributed with a mean function and an sd function. Note that X here is not the target variable from above; it is a covariate, age. I assume I'd have to include the target somewhere. Ultimately, I'd also like to get IG from an MVN model of more than one trait (e.g., stature and weight).

```
data {
  int<lower=1> N;        // number of individuals
  vector[N] y;           // response (stature) per individual
  // predictors
  vector[N] X;           // age
}
parameters {
  real a;
  real b;
  real<lower=0> s_scale;
  real kappa;
}
transformed parameters {
  vector[N] mu;
  vector[N] sigma;
  for (i in 1:N) {
    mu[i] = a * X[i]^(1 + b);
    // note: this is only positive when kappa * X[i] > -1
    sigma[i] = s_scale * (1 + kappa * X[i]);
  }
}
model {
  a ~ normal(0, 10);
  b ~ normal(0, 10);
  kappa ~ normal(0, 1);
  s_scale ~ cauchy(0, 5);
  y ~ normal(mu, sigma);
}
```
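For completeness, here is one hedged post-processing sketch of what I have in mind, not tied to any particular Stan interface. Suppose the model above were fit once per country and posterior draws of mu and sigma extracted for a held-out individual; the arrays in `draws` below are fabricated stand-ins for those posterior draws. The posterior predictive log f(y | country) is Monte Carlo averaged over draws, p(x | y) follows from Bayes' rule, and H(x|y) is estimated by averaging the posterior entropy over held-out statures (which stands in for the expectation over f(y)).

```python
# Sketch: plug-in Monte Carlo estimate of IG from posterior draws.
# All draws and held-out values below are fabricated for illustration.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)

def log_pred_density(y_i, mu_draws, sigma_draws):
    """Posterior predictive log f(y_i): log (1/S) sum_s N(y_i | mu_s, sigma_s)."""
    lp = (-0.5 * np.log(2 * np.pi) - np.log(sigma_draws)
          - 0.5 * ((y_i - mu_draws) / sigma_draws) ** 2)
    return logsumexp(lp) - np.log(lp.size)

# Fabricated posterior draws (S each) standing in for per-country fits
S = 2000
draws = {
    "A": (170 + rng.normal(0, 0.5, S), np.abs(6 + rng.normal(0, 0.2, S))),
    "B": (178 + rng.normal(0, 0.5, S), np.abs(6 + rng.normal(0, 0.2, S))),
}
p_x = {"A": 0.5, "B": 0.5}      # prior over countries

# Held-out statures; in practice these come from a test set
y_heldout = np.array([168.0, 175.0, 181.0])

H_x = -sum(p * np.log(p) for p in p_x.values())

# p(x | y_j) ∝ p(x) f(y_j | x); average -sum_x p log p over held-out y
H_given = 0.0
for y_j in y_heldout:
    log_post = np.array([np.log(p_x[c]) + log_pred_density(y_j, *draws[c])
                         for c in p_x])
    log_post -= logsumexp(log_post)   # normalize on the log scale
    post = np.exp(log_post)
    H_given += -np.sum(post * log_post)
H_given /= y_heldout.size

IG = H_x - H_given
```

If something like this is sensible, the per-draw log densities could presumably also be emitted from generated quantities (e.g., via `normal_lpdf`) rather than recomputed externally.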