Prior on sum of parameters - what about Jacobian?

Hello,

I would like to insert information on some of the proportion values.

For example

simplex[K] p;
y ~ normal(p[1] * a + p[2] * b + p[3] * c ... , sigma);
p ~ dirichlet();
p[1] + p[2] ~ normal(0.6, 0.0001);

What would be the Jacobian here?

The reason why I would like to give prior is that with the same code I should be able to to give arbitrary sum values to arbitrary groups of “p” (with input mapping array), and arbitrary K.

Thanks

Try this

parameters{
  simplex[K] p;
} 
transformed parameters{
  real sum_p12;
  sum_p12 = p[1] + p[2];
}
model {
  y ~ normal(p[1] * a + p[2] * b + p[3] * c ... , sigma);
  p ~ dirichlet();
  sum_p ~ normal(0.6, 0.0001);
}

Mixing is going to be pretty poor though as you effectively have a redundant parameter.

Thanks Julian,

What would the redundant parameter?

I don’t understand what you’re trying to do here. It helps immensely if people tell us what’s data and what’s a parameter and what the shape of everything is (vector, matrix, etc.). We’re good guessers, but it’s a constant effort.

Why do you want two priors on p here?

Also, you don’t really ever want to use normal(0.6, 0.0001) in Stan as that constrains the value to a very small window around 0.6—better to take a raw variable with a unit scale and rescale it if you really need this prior.

Hello @Bob_Carpenter ,

yes I always try to keep at minimum for not overwhelm. Here a better part of the (bigger) original model.

basically sometime I have information about groups of pi, for an higher hierarchy. I known that the proportion of higher hierarchy sub cell types, meaning that the sum off immune cell types makes the 50% of the mixture.
Therefore I want to constrain their sum to be what I want.

Very important… I want to do it in a way that the same code can deal with any grouping so this grouping/constrain/prior cannot be hard coded. That’s why I though to put a harsh prior on their sum.

data{
  int G;                                                           // Number of marker genes
  int P;                                                           // Number of cell types
  int S;                                                           // Number of mix samples 
  
  vector[G] m[S];                                                  // Mix samples matrix

}
parameters{
  simplex[P] pi[S];                                                // Matrix of cell type proportions
  real<lower=0> sigma;                           // Standard deviation of predicted mixed expression
  vector[P] expr[G];                                            // Predicted marker signature

}
model{
  sigma ~ normal(0,0.5);  
   
  for(s in 1:S) alpha[group[s]] ~ gamma(1.1,0.1);              // Prior to alpha
  for(s in 1:S) pi[s] ~ dirichlet(alpha[group[s]]);                // Prior to a
  for(s in 1:S) for(g in 1:G) {
    m[s,g] ~ student_t(2, log_sum_exp(expr[g] + log(pi[s])), sigma);               // Calculating probability of inferred mixed gene expression
  }
}

You don’t want to try to make things deterministic by imposing a very strong prior. It’s better to code the constraint up directly.

As I keep trying to tell people on the list, it’s hard for me to go back through chains of posts and put information together. I read a lot of these every day and they all run together after a while.

The manual explains how to impose summation constraints. If you want a parameter vector alpha to sum to sum_alpha, then you can do this:

parameters {
  vector[K - 1] alpha_raw;

transformed parameters {
  vector[K] alpha;
  alpha[1:(K - 1)] = alpha_raw;
  alpha[K] = alpha_sum - sum(alpha_raw);

And then you’re guaranted that sum(alpha) = alpha_sum by construction.

Is there a reason to do that rather than using simplex? It seems that the alphas are constrained to be positive.

parameters{
  simplex[K] alpha_raw;
}
transformed_parameters{
  vector[K] alpha;
  alpha = alpha_sum*alpha;
}

Also, is the block assignment new? I didn’t see it in the manual and I think I tried it at one point and got an error.

Thanks Bob and Aaron,

The @aaronjg solution seems quite elegant

The problem is that I cannot programmatically create groups of proportions of different length.

For example for the proportion vector

K=10;
alpha[K];

I cannot group alpha arbitrarily, for example

alpha = list(
simplex[2],
simplex[5],
simplex[3]
)

and then do

y ~ normal(alpha[1,1] * SUM_P_GROUP_1 * a + alpha[1,2] * SUM_P_GROUP_1 * b + alpha[2,1] * SUM_P_GROUP_2 * c ... , sigma);

As far as I know lists are not available in Stan.

In that case, I would look at the manual to see how the simplex transformation is implemented, and then implement it yourself. So you could have a new unconstrained vector of length (K-Groups) and then use the broken stick transformation to get the constrained scaled simplex.