Gaussian quadrature inside Stan?

Hi everybody,

My question is: how can I do Gaussian quadrature?

I am fitting an IRT-like model.

Since I have over 10,000 subjects, I need to marginalize the likelihood

so that the number of subjects is not a problem.

And the person parameters are also nuisance parameters…

I hear it’s doable with the ODE solver,

but I am new to Stan and I wonder if somebody can help me with it…

What I need to do: with th \sim N(0, 1), compute \int f(x) p(x)\, dx, where p(x) is the normal density.

Thanks in advance

Gaussian quadrature is not exposed in Stan but the integrate_1d function does tanh-sinh quadrature which is better for a wider class of one-dimensional integrals.

You can also precalculate the nodes and weights in the software of your choice, and then provide them as data.
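For example, here is a minimal sketch of that precalculation in Python (assuming NumPy; the 15-point rule is arbitrary), including the rescaling from the Gauss-Hermite weight function \exp(-x^2) to the standard normal density:

import numpy as np

# The Gauss-Hermite rule targets the weight exp(-x^2); substituting
# x = sqrt(2) * t turns it into an integral against the N(0, 1) pdf,
# which rescales the nodes by sqrt(2) and the weights by 1/sqrt(pi).
def std_normal_quadrature(n):
    t, w = np.polynomial.hermite.hermgauss(n)
    return np.sqrt(2.0) * t, w / np.sqrt(np.pi)

nodes, weights = std_normal_quadrature(15)
print(weights.sum())             # ~1.0: the weights integrate the density
print((nodes**2 * weights).sum())  # ~1.0: E[X^2] = 1 for X ~ N(0, 1)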


Well, the problem is that the integrand is also a function of other parameters… I am wondering whether integrate_1d can handle such functions.


Sorry, I don’t understand. Given nodes and weights, Gaussian quadrature is just a weighted sum (or a dot product). When integrating

g(\lambda) = \int_{-\infty}^{\infty} f(x, \lambda) p(x) dx \approx \sum_i f(x_i, \lambda) w_i

where p(x) is the density of N(0, 1), you are free to calculate f in any arbitrary way at the fixed x_i.
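As a quick numerical check of this approximation, here is a Python sketch (assuming NumPy and SciPy; lam = 0.5 is just a hypothetical stand-in for \lambda). The nodes and weights stay fixed no matter what value \lambda takes:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Gauss-Hermite nodes/weights rescaled for the N(0, 1) density (see above).
t, w = np.polynomial.hermite.hermgauss(15)
nodes, weights = np.sqrt(2.0) * t, w / np.sqrt(np.pi)

lam = 0.5  # stands in for the model parameter lambda
f = lambda x: 1.0 / (1.0 + np.exp(-1.7 * (x + lam)))

approx = np.dot(f(nodes), weights)  # the weighted sum g(lambda) above
exact, _ = quad(lambda x: f(x) * norm.pdf(x), -np.inf, np.inf)
print(approx, exact)  # should agree to several decimal places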

Yes, in your formula f has another variable \lambda, and in my case it is not data but a model parameter, so I have to do the integration by hand to incorporate the fact that the integral depends on model parameters…


I don’t see why \lambda cannot be a parameter.

Can you provide some Stan code (without the integral calculation part, of course) to make the discussion more concrete?


I have yet to translate the JAGS code to Stan, but in case you understand JAGS,

my initial model goes like this:

model {
  for (n in 1:N) {
    for (i in 1:I) {
      # w = 1 if skill k is required by item i but not mastered by person n
      for (k in 1:K) { w[n, i, k] <- (1 - alpha[n, k]) * Q[i, k] }
      # IRT term (logistic with the usual 1.7 scaling)
      IRT[n, i] <- 1/(1 + exp(-1.7*(th[n] + easy[i])))
      # RUM success probability: baseline * skill penalties * IRT term
      p[n, i] <- pai_star[i] * prod(pow(r_star[i, 1:K], w[n, i, 1:K])) * IRT[n, i]
      Y[n, i] ~ dbern(p[n, i])
    }
    # skill profile alpha is determined by the latent class c[n]
    for (k in 1:K) { alpha[n, k] <- all.patterns[c[n], k] }
    c[n] ~ dcat(pai[1:C])
  }

  pai[1:C] ~ ddirch(delta[1:C])

  for (n in 1:N) {
    th[n] ~ dnorm(0, 1)
  }
  for (i in 1:I) {
    pai_star[i] ~ dbeta(a.pai_star, b.pai_star)
    easy[i] ~ dunif(-3, 3)
    for (k in 1:K) {
      r_star[i, k] <- xr_star[i, k] * Q[i, k]
      xr_star[i, k] ~ dbeta(a.xr_star, b.xr_star)
    }
  }
}

This is called the “reparameterised unified model” (RUM), a.k.a. the Fusion model, if you are interested. And I want to marginalise the likelihood over th so that having so many subjects is not a problem.

By “marginalization” I mean what people do for frequentist mixed-effects models: rather than estimating each individual group effect, they estimate only its variance, given that the individual effects are normally distributed.

So for my model,

IRT[n, i] <- 1/(1+exp(-1.7*(th[n] + easy[i])))

should be replaced by

\int_{-\infty}^{\infty} \frac{1}{1+\exp(-1.7 \times (\theta + easy[i]))}\, \phi(\theta)\, d\theta

where \phi(\theta) is the standard normal density.

Actually, the model above also has individual alpha[n, k] (where n is the individual index), but I also need to marginalise that, as in (https://journals.sagepub.com/doi/10.1177/0146621617707511).

Most of the time, delta[1:C] is just a vector of ones, and the rows of all.patterns are the vectors in \{0,1\}^K.

PS. The distribution of \theta can be parameterised as \theta \sim \mathcal{N}(0, \sigma_\theta^2), and the logistic function can be approximated by the normal CDF.
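For reference, this is the classic role of the 1.7 scaling constant:

\frac{1}{1+e^{-1.7\theta}} \approx \Phi(\theta), \qquad \max_\theta \left| \frac{1}{1+e^{-1.7\theta}} - \Phi(\theta) \right| < 0.01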

I don’t know much JAGS but I think I can translate your model to Stan.
First, let’s just write the type declarations and create some fake Stan code that closely resembles the original JAGS.

data {
  int<lower=1> N;
  int<lower=1> I;
  int<lower=1> K;
  int<lower=1> C;
  vector<lower=0>[C] delta;
  real<lower=0> a_pai_star;
  real<lower=0> b_pai_star;
  real<lower=0> a_xr_star;
  real<lower=0> b_xr_star;
  int<lower=0,upper=1> Q[I,K];
  int<lower=0,upper=1> all_patterns[C,K];
  int<lower=0,upper=1> Y[N,I];
} parameters {
  simplex[C] pai;
  vector[N] th;
  real<lower=0,upper=1> pai_star[I];
  real<lower=-3,upper=3> easy[I];
  real<lower=0,upper=1> xr_star[I, K];
  int<lower=1,upper=C> c[N];
} transformed parameters {
  real r_star[I,K] = xr_star .* Q;
  int alpha[N,K] = all_patterns[c];
  int w[N,I,K] = (1 .- alpha) .* Q;
  real IRT[N,I];
  real p[N,I];
  for (n in 1:N) for (i in 1:I) {
    IRT[n,i] = inv_logit(1.7*(th[n] + easy[i]));
    p[n,i] = pai_star[i] * IRT[n,i];
    for (k in 1:K)
      p[n,i] *= pow(r_star[i,k],w[n,i,k]);
  }
} model {
  c ~ categorical(pai);
  pai ~ dirichlet(delta);
  th ~ std_normal();
  pai_star ~ beta(a_pai_star, b_pai_star);
  xr_star ~ beta(a_xr_star, b_xr_star);
  easy ~ uniform(-3, 3);
  Y ~ bernoulli(p);
}

This will not compile because Stan does not allow discrete parameters, and there are some vectorization mistakes, but it should be straightforward to compare with the JAGS code. The next step is to merge the model and transformed parameters blocks into a single model block and move all variables into the innermost loop possible.

model {
  for (n in 1:N) {
    for (i in 1:I) {
      real r_star[K] = xr_star[i] .* Q[i];
      int alpha[K] = all_patterns[c[n]];
      int w[K] = (1 .- alpha) .* Q[i];
      real IRT = inv_logit(1.7*(th[n] + easy[i]));
      real p = pai_star[i] * IRT;
      for (k in 1:K)
        p *= pow(r_star[k], w[k]);
      Y[n,i] ~ bernoulli(p);
    }
    c[n] ~ categorical(pai);
  }
  pai ~ dirichlet(delta);
  th ~ std_normal();
  pai_star ~ beta(a_pai_star, b_pai_star);
  xr_star ~ beta(a_xr_star, b_xr_star);
  easy ~ uniform(-3, 3);
}

Then simplify the code and eliminate some variables. When Q=0 you have pow(0,0), which I assume evaluates to 1 in JAGS. Also, here I replace the sampling statements with target += in preparation for the last step.

model {
  for (n in 1:N) {
    for (i in 1:I) {
      real IRT = inv_logit(1.7*(th[n] + easy[i]));
      real p = pai_star[i] * IRT;
      for (k in 1:K)
        if (all_patterns[c[n],k] == 0 && Q[i,k] == 1)
          p *= xr_star[i,k];
      target += bernoulli_lpmf(Y[n,i]|p);
    }
    target += categorical_lpmf(c[n]|pai);
  }
  pai ~ dirichlet(delta);
  th ~ std_normal();
  pai_star ~ beta(a_pai_star, b_pai_star);
  xr_star ~ beta(a_xr_star, b_xr_star);
  easy ~ uniform(-3, 3);
}

As I mentioned, Stan cannot sample discrete parameters, therefore we must marginalize out c. For each subject, that means summing over the C latent classes:
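p(Y_{n,1:I}) = \sum_{c=1}^{C} \pi_c \prod_{i=1}^{I} p(Y_{n,i} \mid c)

On the log scale this is a log_sum_exp of the per-class log joints, which is what the code below accumulates. The marginalized model is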

data {
  int<lower=1> N;
  int<lower=1> I;
  int<lower=1> K;
  int<lower=1> C;
  vector<lower=0>[C] delta;
  real<lower=0> a_pai_star;
  real<lower=0> b_pai_star;
  real<lower=0> a_xr_star;
  real<lower=0> b_xr_star;
  int<lower=0,upper=1> Q[I,K];
  int<lower=0,upper=1> all_patterns[C,K];
  int<lower=0,upper=1> Y[N,I];
} parameters {
  simplex[C] pai;
  vector[N] th;
  real<lower=0,upper=1> pai_star[I];
  real<lower=-3,upper=3> easy[I];
  real<lower=0,upper=1> xr_star[I,K];
} model {
  for (n in 1:N) {
    real lp1[C];  // per-class log joint probability for subject n
    for (c in 1:C) {
      real lp2[I+1];
      for (i in 1:I) {
        real IRT = inv_logit(1.7*(th[n] + easy[i]));
        real p = pai_star[i] * IRT;
        for (k in 1:K) if (all_patterns[c,k] == 0 && Q[i,k] == 1)
          p *= xr_star[i,k];
        lp2[i] = bernoulli_lpmf(Y[n,i]|p);  // item likelihood given class c
      }
      lp2[I+1] = categorical_lpmf(c|pai);   // prior mass of class c
      lp1[c] = sum(lp2);
    }
    target += log_sum_exp(lp1);  // sum over classes on the log scale
  }
  pai ~ dirichlet(delta);
  th ~ std_normal();
  pai_star ~ beta(a_pai_star, b_pai_star);
  for (i in 1:I)
    xr_star[i] ~ beta(a_xr_star, b_xr_star);
  easy ~ uniform(-3, 3);
}

The simplification removed alpha from the output, but I don’t think you can marginalize th.


I may be missing something, but I am assuming that you want th not as a parameter, but as the variable of integration in the expression you quoted,

multiplied by the standard normal density in th.

Then, as I said, you want to pass in th_weights and th_nodes as data for Gaussian quadrature (note that the standard normal pdf does not exactly coincide with the Gauss-Hermite weight function; there is a scale adjustment), and simply evaluate

dot_product(1/(1+exp(-1.7*(th_nodes + easy[i]))), th_weights)

which I think should vectorize.

Wow, I think this would work.

I could not see it that way, thank you!


Very minor, but you could also simplify that further to:

dot_product(inv_logit(1.7*(th_nodes + easy[i])), th_weights)

since the inv_logit function is also built to handle extreme values better than hand-coding the math.
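For example, in Python (assuming SciPy, whose expit is the analogue of inv_logit):

import numpy as np
from scipy.special import expit  # numerically stable inverse logit

x = -500.0
print(1.0 / (1.0 + np.exp(-x)))  # exp(500) overflows: warning, returns 0.0
print(expit(x))                  # ~7.1e-218: underflows gracefully instead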
