Sooo…
the aftermath of my previous topic led me to discover the existence (and usefulness) of Wallenius’ noncentral hypergeometric distribution for tackling the biased urn problem.
I ping @mathDR because he has done some work on it.
Is there any code available for this distribution?
Hey @fabio, there isn’t a built-in distribution in Stan. I made a few attempts to code one up and landed on the following derivation. Wallenius__Noncentral_Hypergeometric_Distribution.pdf (131.8 KB)
The unnormalized log probability mass function (wrapped below in a signature reconstructed from the body’s arguments) is:
real wallenius_lpmf(int[] x, int[] m, vector omega) { // signature reconstructed from the body's arguments
  int c = size(m);
  int numpoints = 100;
  vector[c] vec_x = to_vector(x); // number of balls of each category that were drawn
  vector[c] vec_m = to_vector(m); // number of balls of each category in the urn
  real D = sum(omega .* (vec_m - vec_x));
  real r = 2.5 * D; // This should be optimized using Agner Fog's derivation
  real dtau = 1.0 / numpoints;
  vector[c] r_times_omega = r * omega;
  vector[numpoints - 1] return_val;
  // Special cases
  if (D == 0) {
    return 0.0;
  }
  // left-hand rule over a uniform grid of numpoints points, on the log scale
  for (i in 1:numpoints - 1) {
    real log_eye_dtau = log(i * dtau);
    return_val[i] = (r * D - 1.0) * log_eye_dtau
                    + sum(vec_x .* log1m_exp(r_times_omega * log_eye_dtau));
  }
  return log(r) + log(D) + log_sum_exp(return_val);
}
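For context, this is the standard integral representation of the multivariate Wallenius pmf (cf. Fog’s papers and the writeup above; notation mine):

$$
P(\mathbf{x}) = \left(\prod_{i=1}^{c} \binom{m_i}{x_i}\right) \int_0^1 \prod_{i=1}^{c} \left(1 - t^{\omega_i / D}\right)^{x_i} dt,
\qquad D = \sum_{i=1}^{c} \omega_i \left(m_i - x_i\right).
$$

The code above drops the constant binomial coefficients and evaluates a transformed version of the integral (the substitution $t = \tau^{rD}$, discussed further down the thread) by a left-hand rule, up to an additive constant on the log scale.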
This could probably be optimized some more (and I would like to add a wallenius_rng function, but haven’t needed it yet).
Please let me know if you find it useful and/or optimize it further! (Note: one could probably make use of the new algebra_solver_newton to find the optimal r in the function, whereas I just use r = 2.5*D.)
Dan
Can you use the Fisher noncentral hypergeometric distribution?
@spinkey No, the Fisher noncentral hypergeometric distribution describes a different process. For the difference, see Noncentral_hypergeometric_distributions. However, both distributions could be of interest to me.
I noticed some typos in my writeup above. I will fix them, then re-upload.
I played around and got integrate_1d to work for this distribution. No doubt this is way slower than @mathDR’s approximation.
This is my first time using the integrate functionality, so that was cool! I’m sure there’s a more numerically stable way to code it (@andrjohns or @bbbales2 may know a good way to use the xc parameter here).
functions {
  // two-color Wallenius integrand: (1 - t^(omega/D))^k * (1 - t^(1/D))^(n - k),
  // with theta = {odds, D} and x_i = {k, n}
  real wallenius_integral(real t,       // Function argument
                          real xc,      // Complement of function argument
                                        // on the domain (defined later)
                          real[] theta, // parameters
                          real[] x_r,   // data (real)
                          int[] x_i) {  // data (integer)
    real omega = theta[1];
    real Dinv = 1 / theta[2];
    int k = x_i[1];
    int n = x_i[2];
    return pow(1 - t^(omega * Dinv), k) * pow(1 - t^Dinv, n - k);
  }

  // log binomial coefficients: log C(m[1], k) + log C(m[2], n - k)
  real wallenius_lpmf(int k, int n, int[] m) {
    return -log1p(m[1]) - lbeta(m[1] - k + 1, k + 1) -
           log1p(m[2]) - lbeta(m[2] - n + k + 1, n - k + 1);
  }
}
data {
  int<lower=0> N;
  int y[N];
  int n;
  int m[2];
  real tol;
}
transformed data {
  real x_r[0];
  int x_i[N, 2];
  for (i in 1:N)
    x_i[i] = {y[i], n};
}
parameters {
  real<lower=0> odds;
}
model {
  odds ~ exponential(1);
  for (i in 1:N) {
    real D = odds * (m[1] - y[i]) + (m[2] - n + y[i]);
    y[i] ~ wallenius(n, m);
    target += log(integrate_1d(wallenius_integral, 0, 1,
                               {odds, D}, x_r, x_i[i], tol));
  }
}
generated quantities {
  real out = odds / (1 + odds);
}
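As an aside, the wallenius_lpmf above is just the log of the binomial coefficients, written using the identity

$$
\log \binom{m}{k} = -\log(m + 1) - \log B(m - k + 1,\, k + 1),
$$

which is why it appears in terms of log1p and lbeta.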
Here’s R code to generate data.
library(cmdstanr)
library(BiasedUrn)
fp <- file.path("../wallenius.stan")
mod <- cmdstan_model(fp, force_recompile = T)
# dWNCHypergeo(nrand, m1, m2, n, odds, precision=1E-7)
# m1: initial number of red balls in urn
# m2: initial number of white balls in urn
# n : total number of balls sampled
# odds: probability ratio of red over white
# number of realizations
N <- 10
m1 <- 1000
m2 <- 1000
n <- 200
prob <- 0.8
odds <- prob / (1 - prob)
y <- rWNCHypergeo(N, m1, m2, n, odds)
mod_out <- mod$sample(
data = list(N = N,
y = y,
n = n,
m = c(m1, m2),
tol = 0.1),
chains = 2,
init = 1,
adapt_delta = 0.8,
parallel_chains = 2,
iter_warmup = 200,
iter_sampling = 200
)
@andrjohns is there a way to call integrate_1d inside the lpmf function? I was getting complaints about the inputs having to be data.
OMG this is AWESOME! I am going to use this to check my approximation and report back! Maybe with ~250 points the left-hand rule will get pretty close?
That exponential prior is too strong when the odds are large. I’ve changed the prior to
odds ~ gamma(2, 0.1);
which has mean 20 and so puts more mass on larger odds.
Here’s the multivariate version
functions {
  // integrand; theta = {D, odds} and x_i = {n, y_1, ..., y_C}
  real multi_wallenius_integral(real t,       // Function argument
                                real xc,      // Complement of function argument
                                              // on the domain (defined later)
                                real[] theta, // parameters
                                real[] x_r,   // data (real)
                                int[] x_i) {  // data (integer)
    real Dinv = 1 / theta[1];
    int Cp1 = num_elements(x_i);
    int n = x_i[1];
    real v = 1;
    for (i in 2:Cp1)
      v *= pow(1 - t^(theta[i] * Dinv), x_i[i]);
    return v;
  }

  // log of the product of binomial coefficients
  real multi_wallenius_lpmf(int[] k, int n, vector m) {
    int C = num_elements(m);
    real lp = 0;
    for (i in 1:C)
      lp += -log1p(m[i]) - lbeta(m[i] - k[i] + 1, k[i] + 1);
    return lp;
  }
}
data {
  int<lower=0> N;
  int<lower=0> C;
  int y[N, C + 1];
  int n;
  vector[C] m;
  real tol;
}
transformed data {
  real x_r[0];
  // int x_i[N, C + 1];
  //
  // for (i in 2:N)
  //   x_i[i] = append_array({n}, y[i]);
}
parameters {
  simplex[C] odds;
}
model {
  for (i in 1:N) {
    real D = to_row_vector(odds) * (m - to_vector(y[i, 2:C + 1]));
    y[i, 2:C + 1] ~ multi_wallenius(n, m);
    target += log(integrate_1d(multi_wallenius_integral, 0, 1,
                               append_array({D}, to_array_1d(odds)),
                               x_r, y[i], tol));
  }
}
Corresponding R code:
### multi-wallenius
fp <- file.path("../multi_wallenius.stan")
mod <- cmdstan_model(fp, force_recompile = T)
N <- 10
m <- c(500, 1000, 800)
n <- 55
odds <- c(0.2, 0.7, 0.1)
y <- rMWNCHypergeo(N, m, n, odds)
meanMWNCHypergeo(m, n, odds, precision=1E-7)
mod_out <- mod$sample(
data = list(N = N,
C = length(m),
y = cbind(n, t(y)),
n = n,
m = m,
tol = 0.1),
chains = 2,
init = 1,
adapt_delta = 0.8,
parallel_chains = 2,
iter_warmup = 200,
iter_sampling = 200
)
I fixed the typos and added some exposition. Note, there is working Stan code outlined in the paper.
wallenius_noncentral_stan_writeup.pdf (132.7 KB)
How do the approximation results compare to the full model? Is there bias, sampler issues, or anything else that you’ve noticed? Do you know when “r” may be an issue and the Fog derivation is necessary (and what is that, btw)?
I’ve put my code at helpful_stan_functions/multi_wallenius_hypergeometric.stan at main · spinkney/helpful_stan_functions · GitHub
Example program at
R program to run it all at
I’d like to add the approximation (with all credit attributed to you, of course). I just want to include notes on when users should be cautious about using it.
@spinkney did you run any examples with n closer to the total number of balls in the urn? I am getting some initialization errors when I increase n to something like 2100.
Also, thanks for the question about what value of r should be used. I actually reformulated the approximation in terms of the untransformed integral. I will post that in a bit.
And the error is something like log(0) cannot evaluate likelihood, right?
I’ve been able to get rid of these errors and speed the code up if we allow reals instead of ints. Since these hypergeometric distributions work on ratios, I think it’s OK to allow reals. What I do when I work with large “urn” values is divide everything by some amount (10, 100, etc.) and work on the smaller scale. The parameter values I’m interested in, the “weighted odds”, stay the same, but the code runs much, much faster.
For example, opening up the pmf to a pdf and letting the lbeta functions “interpolate” across the integers gives:
functions {
  // integrand; theta = {D, p} and x_r = {n, y_1, ..., y_C}
  real multi_wallenius_integral(real t,       // Function argument
                                real xc,      // Complement of function argument
                                              // on the domain (defined later)
                                real[] theta, // parameters
                                real[] x_r,   // data (real)
                                int[] x_i) {  // data (integer)
    real Dinv = 1 / theta[1];
    int Cp1 = num_elements(x_r);
    real v = 1;
    for (i in 2:Cp1)
      v *= pow(1 - t^(theta[i] * Dinv), x_r[i]);
    return v;
  }

  real multi_walleniusc_lpdf(data real[] k, vector m, vector p,
                             data int[] x_i, data real tol) {
    int C = num_elements(m);
    real D = dot_product(to_row_vector(p), (m - to_vector(k[2:C + 1])));
    real lp = log(integrate_1d(multi_wallenius_integral, 0, 1,
                               append_array({D}, to_array_1d(p)),
                               k, x_i, tol));
    for (i in 1:C)
      lp += -log1p(m[i]) - lbeta(m[i] - k[i + 1] + 1, k[i + 1] + 1);
    return lp;
  }
}
data {
  int<lower=0> N;
  int<lower=0> C;
  real y[N, C + 1];
  vector[C] m;
  real tol;
}
transformed data {
  real x_r[0];
  int x_i[0];
}
parameters {
  simplex[C] probs;
}
model {
  // for (i in 1:N)
  //   y[i] ~ multi_wallenius(m, probs, x_r, tol);
  for (i in 1:N)
    y[i] ~ multi_walleniusc(m, probs, x_i, tol);
}
To test:
fp <- file.path("../multi_walleniusc.stan")
mod <- cmdstan_model(fp, force_recompile = T)
N <- 20
m <- c(2525, 1333, 888)
n <- 4234
odds <- c(0.2, 0.7, 0.1)
y <- rMWNCHypergeo(N, m, n, odds)
meanMWNCHypergeo(m, n, odds, precision=1E-7)
mod_out <- mod$sample(
data = list(N = N,
C = length(m),
y = cbind(n / 100, t(y / 100)),
m = m / 100,
tol = 0.01),
chains = 2,
init = 1,
adapt_delta = 0.8,
parallel_chains = 2,
iter_warmup = 200,
iter_sampling = 200
)
mod_out
Oh, I really like that idea! I asked about using high draw values because, if you know a priori that n_draws < min(m_i), I would think to model the likelihood with a standard multinomial (just assuming the individual draws are made with replacement). I guess there is some ambiguity because the noncentral distributions’ bias depends on the number of balls in the urn, so the probability should decrement as you draw them (though the weights don’t), but “in real life” I don’t know how important that fact is.
In particular, for “most” problems where I would apply this, I would basically just want to use categorical likelihoods for each draw, and then have the likelihood’s dimension vary as the colors run out. I tried going down that path, but when I found the noncentral hypergeometric distributions, I stopped that branch.
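For concreteness, here is a minimal sketch of that with-replacement baseline (my own illustration, not code from this thread), assuming weights w on the simplex and draw probabilities proportional to m_i * w_i:

data {
  int<lower=1> C;
  array[C] int y;       // number of balls drawn of each color
  vector<lower=0>[C] m; // initial number of balls of each color
}
parameters {
  simplex[C] w; // color weights
}
model {
  // with-replacement approximation: ignore urn depletion, so every draw
  // lands on color i with probability proportional to m[i] * w[i]
  y ~ multinomial(softmax(log(m) + log(w)));
}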
Sorry I meant to reply to this ages ago. The “r” comes from a transform that is applied to the integral to move the mass of the integrand away from 0. This is covered here with the explanation in section 5.5.
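For anyone following along, the substitution is $t = \tau^{rD}$, which gives

$$
\int_0^1 \prod_i \left(1 - t^{\omega_i / D}\right)^{x_i} dt
= r D \int_0^1 \tau^{rD - 1} \prod_i \left(1 - \tau^{r \omega_i}\right)^{x_i} d\tau,
$$

exactly the form @mathDR’s code integrates; $r$ is then chosen so that the integrand’s peak sits near $\tau = 1/2$.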
The optimal “r” is the solution to an optimization problem that hypothetically could be done in Stan using the new algebraic solvers…
I haven’t had time to revisit this, but I plan to in the new year. I think it would make a nice write up to put somewhere!
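A rough sketch of what that solver setup might look like (untested, and my own guess rather than anything posted in the thread): setting the derivative of the log integrand to zero at $\tau = 1/2$ reduces to the condition $D - 1/r - \sum_i x_i \omega_i / (2^{r \omega_i} - 1) = 0$, which could be handed to algebra_solver_newton as

functions {
  // hypothetical residual for the optimal r;
  // theta = append_row([D]', omega), x_r holds the counts x as reals
  vector r_system(vector r, vector theta, real[] x_r, int[] x_i) {
    int c = num_elements(x_r);
    real z = theta[1] - 1 / r[1];
    for (i in 1:c) {
      // expm1(r * omega_i * log(2)) = 2^(r * omega_i) - 1
      z -= x_r[i] * theta[i + 1] / expm1(r[1] * theta[i + 1] * log2());
    }
    return [z]';
  }
}
// ...and then, with x_r and x_i passed in as data, something like:
// vector[1] r = algebra_solver_newton(r_system, [2.5 * D]',
//                                     append_row([D]', omega), x_r, x_i);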
Yo, @mathDR and @fabio. I rewrote the BiasedUrn numerical integration for the Wallenius multivariate function in Stan. It’s much faster and works for large values. You may have to increase the size of the inits to around 4, but it seems to generally work well.
library(BiasedUrn)
N <- 10
m <- c(256342, 4656, 15234, 3400)
n <- 10000
odds <- c(0.2, 0.3, 0.1, 0.25)
true_omega <- odds / sum(odds)
y <- rMWNCHypergeo(N, m, n, odds)
mod_out <- mod$sample(
data = list(N = N,
C = length(m),
y = t(y),
m = m),
chains = 2,
init = 1,
adapt_delta = 0.8,
parallel_chains = 2,
iter_warmup = 200,
iter_sampling = 200
)
functions {
real log1pow(real q, real x) {
return x * log1m_exp(q);
}
real pow2_1(real q, real p) {
  // calculate 2^q and (1-2^q) without loss of precision;
  // returns 2^q if p != 0, otherwise returns (1-2^q)
  real y;
  real y1;
  real qq = q * log2();
  if (abs(qq) > 0.1) {
    y = exp(qq); // 2^q
    y1 = 1.0 - y; // 1-2^q
  } else {
    // use expm1 for precision near 0
    y1 = expm1(qq); // 2^q-1
    y = y1 + 1.0; // 2^q
    y1 = -y1; // 1-2^q
  }
  if (p != 0) {
    return y; // 2^q
  }
  return y1; // 1-2^q
}
real lnbico(array[] int x, array[] int m) {
return sum(lchoose(m, x));
}
array[] real findpars(array[] real x, vector omega, array[] real m) {
// calculate d, E, r, w
// find r to center peak of integrand at 0.5
real dd = 0; // scaled d
real dr; // 1/d
real E = 0;
real lastr = 1;
real z;
real zd;
real rr;
real rrc;
real rt;
real r21;
real a;
real b;
real ro;
real k1;
real r2 = 0.0;
real omax = max(omega); // highest omega
real omaxr = 1. / omax; // 1/omax
int colors = num_elements(x);
real r = 1;
array[colors] real omeg; // scaled weights
for (i in 1 : colors) {
// scale weights to make max = 1
omeg[i] = omega[i] * omaxr;
// calculate d and E
dd += omeg[i] * (m[i] - x[i]);
E += omeg[i] * m[i];
}
dr = 1.0 / dd;
E *= dr;
rr = r * omax;
if (rr <= dr) {
rr = 1.2 * dr;
} // initial guess
int j = 0;
while (1) {
rrc = 1.0 / rr;
z = dd - rrc; // z(r)
zd = rrc * rrc; // z'(r)
for (i in 1 : colors) {
rt = rr * omeg[i];
if (rt < 100.0 && rt > 0.0) {
// avoid overflow and division by 0
r21 = pow2_1(rt, 0); // r2=2^r, r21=1.-2^r
r2 = pow2_1(rt, 1);
a = omeg[i] / r21; // omegai/(1.-2^r)
b = x[i] * a; // x*omegai/(1.-2^r)
z += b;
zd += b * a * r2 * log2();
}
}
if (zd == 0) {
reject("can't find r in function findpars");
}
rr -= z / zd; // next r
if (rr <= dr) {
rr = lastr * 0.125 + dr * 0.875;
}
j = j + 1;
if (j == 70) {
reject("convergence problem searching for r in function findpars");
}
if (abs(rr - lastr) < rr * 1e-5) {
break;
}
lastr = rr;
}
real rd = rr * dd;
r = rr * omaxr;
// find peak width
real phi2d = 0.0;
for (i in 1 : colors) {
ro = rr * omeg[i];
if (ro < 300 && ro > 0.0) {
// avoid overflow and division by 0
k1 = pow2_1(ro, 0);
k1 = -1.0 / k1;
k1 = omeg[i] * omeg[i] * (k1 + k1 * k1);
} else {
k1 = 0.0;
}
phi2d += x[i] * k1;
}
phi2d *= -4.0 * rr * rr;
if (phi2d > 0.0) {
reject("peak width undefined in function findpars");
}
real wr = sqrt(-phi2d);
real w = 1.0 / wr;
return {E, r, rd, w, wr, phi2d, log(rr) + log(dd)};
}
real integrate_step(real ta, real tb, vector omega, array[] int x,
array[] int m, real r, real rd, real bico) {
// integration subprocedure used by integrate()
// makes one integration step from ta to tb using Gauss-Legendre method.
// result is scaled by multiplication with exp(bico)
real ab;
real delta;
real tau;
real ltau;
real y;
real tot;
real taur;
real rdm1;
int colors = num_elements(x);
array[8] real xval = {-0.960289856498, -0.796666477414, -0.525532409916,
-0.183434642496, 0.183434642496, 0.525532409916,
0.796666477414, 0.960289856498};
array[8] real weights = {0.10122853629, 0.222381034453, 0.313706645878,
0.362683783378, 0.362683783378, 0.313706645878,
0.222381034453, 0.10122853629};
delta = 0.5 * (tb - ta);
ab = 0.5 * (ta + tb);
rdm1 = rd - 1.0;
tot = 0.0;
for (j in 1 : 8) {
tau = ab + delta * xval[j];
ltau = log(tau);
taur = r * ltau;
y = 0.0;
for (i in 1 : colors) {
// possible loss of precision due to subtraction here:
y += log1pow(taur * omega[i], x[i]); // ln((1-e^taur*omegai)^xi)
}
y += rdm1 * ltau + bico;
if (y > -50.0) {
tot += weights[j] * exp(y);
}
}
return delta * tot;
}
real search_inflect(real t_from, real t_to, vector omega, array[] int x,
real r, real rd) {
// search for an inflection point of the integrand PHI(t) in the interval
// t_from < t < t_to
int colors = num_elements(x);
real t;
real t1; // independent variable
vector[colors] rho = r * omega; // r*omega[i]
real q; // t^rho[i] / (1-t^rho[i])
real q1; // 1-t^rho[i]
array[colors, 4, 4] real zeta; // zeta[i,j,k] coefficients
array[4] real phi; // derivatives of phi(t) = log PHI(t)
real Z2; // PHI''(t)/PHI(t)
real Zd; // derivative in Newton Raphson iteration
real rdm1; // r * d - 1
real tr; // 1/t
real log2t; // log2(t)
real method; // 0 for z2'(t) method, 1 for z3(t) method
int iter = 0; // count iterations
real t_fromm = t_from;
real t_too = t_to;
rdm1 = rd - 1.0;
if (t_fromm == 0.0 && rdm1 <= 1.0) {
return 0.0;
} //no inflection point
t = 0.5 * (t_fromm + t_too);
for (i in 1 : colors) {
// calculate zeta coefficients
zeta[i, 1, 1] = rho[i];
zeta[i, 1, 2] = rho[i] * (rho[i] - 1.0);
zeta[i, 2, 2] = rho[i] * rho[i];
zeta[i, 1, 3] = zeta[i, 1, 2] * (rho[i] - 2.0);
zeta[i, 2, 3] = zeta[i, 1, 2] * rho[i] * 3.0;
zeta[i, 3, 3] = zeta[i, 2, 2] * rho[i] * 2.0;
}
while (1) {
t1 = t;
tr = 1.0 / t;
log2t = log(t) * (1.0 / log2());
phi[1] = 0.0;
phi[2] = 0.0;
phi[3] = 0.0;
for (i in 1 : colors) {
// calculate first 3 derivatives of phi(t)
if (rho[i] != 0.0) {
q1 = pow2_1(rho[i] * log2t, 0);
q = pow2_1(rho[i] * log2t, 1) / q1;
phi[1] -= x[i] * zeta[i, 1, 1] * q;
phi[2] -= x[i] * q * (zeta[i, 1, 2] + q * zeta[i, 2, 2]);
phi[3] -= x[i] * q
* (zeta[i, 1, 3]
+ q * (zeta[i, 2, 3] + q * zeta[i, 3, 3]));
}
}
phi[1] += rdm1;
phi[2] -= rdm1;
phi[3] += 2.0 * rdm1;
phi[1] *= tr;
phi[2] *= tr * tr;
phi[3] *= tr * tr * tr;
method = ceil(iter % 2); // alternate between the two methods
Z2 = phi[1] * phi[1] + phi[2];
Zd = method * phi[1] * phi[1] * phi[1]
+ (2.0 + method) * phi[1] * phi[2] + phi[3];
if (t < 0.5) {
if (Z2 > 0.0) {
t_fromm = t;
} else {
t_too = t;
}
if (Zd >= 0.0) {
// use binary search if Newton-Raphson iteration makes problems
t = (t_fromm != 0.0 ? 0.5 : 0.2) * (t_fromm + t_too);
} else {
// Newton-Raphson iteration
t -= Z2 / Zd;
}
} else {
if (Z2 < 0.0) {
t_fromm = t;
} else {
t_too = t;
}
if (Zd <= 0.0) {
// use binary search if Newton-Raphson iteration makes problems
t = 0.5 * (t_fromm + t_too);
} else {
// Newton-Raphson iteration
t -= Z2 / Zd;
}
}
if (t >= t_too) {
t = (t1 + t_too) * 0.5;
}
if (t <= t_fromm) {
t = (t1 + t_fromm) * 0.5;
}
iter += 1;
if (iter > 20) {
reject("Search for inflection point failed!");
}
if (abs(t - t1) < 1e-5) {
break;
}
}
return t;
}
real integrate(real accuracy, vector omega, array[] int x, array[] int m) {
real s; // result of integration step
real tot; // integral
real ta; // subinterval for integration step
real tb;
array[7] real pars = findpars(x, omega, m); // {E, r, rd, w, wr, phi2d, logrd}
real E = pars[1];
real r = pars[2];
real rd = pars[3];
real w = pars[4];
real wr = pars[5];
real phi2d = pars[6];
real logrd = pars[7];
real bico = lnbico(x, m);
// Choose method:
if (w < 0.02) {
// normal method. Step length determined by peak width w
real delta;
real s1;
s1 = accuracy < 1e-9 ? 0.5 : 1.0;
delta = s1 * w; // integration steplength
ta = 0.5 + 0.5 * delta;
// real ta, real tb, vector omega, array[] int x, array[] int m, real r, real rd, real bico
tot = integrate_step(1.0 - ta, ta, omega, x, m, r, rd, bico); // first integration step around center peak
while (1) {
tb = ta + delta;
if (tb > 1.0) {
tb = 1.0;
}
s = integrate_step(ta, tb, omega, x, m, r, rd, bico); // integration step to the right of peak
s += integrate_step(1.0 - tb, 1.0 - ta, omega, x, m, r, rd, bico); // integration step to the left of peak
tot += s;
if (s < accuracy * tot) {
break;
} // stop before interval finished if accuracy reached
ta = tb;
if (tb > 0.5 + w) {
delta *= 2.0;
} // increase step length far from peak
if (tb >= 1.) {
break;
}
}
} else {
// difficult situation. Step length determined by inflection points
real t1 = 0;
real t2 = 0.5;
real tinf;
real delta;
real delta1;
tot = 0.0;
// do left and right half of integration interval separately:
while (1) {
// integrate from 0 to 0.5 or from 0.5 to 1
tinf = search_inflect(t1, t2, omega, x, r, rd); // find inflection point
delta = tinf - t1;
if (delta > t2 - tinf) {
delta = t2 - tinf;
} // distance to nearest endpoint
delta *= 1.0 / 7.0; // 1/7 will give 3 steps to nearest endpoint
if (delta < 1e-4) {
delta = 1e-4;
}
delta1 = delta;
// integrate from tinf forwards to t2
ta = tinf;
while (1) {
tb = ta + delta1;
if (tb > t2 - 0.25 * delta1) {
tb = t2;
} // last step of this subinterval
s = integrate_step(ta, tb, omega, x, m, r, rd, bico); // integration step
tot += s;
delta1 *= 2.0; // double steplength
if (s < tot * 1e-4) {
delta1 *= 8.0;
} // large step when s small
ta = tb;
if (tb >= t2) {
break;
}
}
if (tinf != 0.0) {
// integrate from tinf backwards to t1
tb = tinf;
while (1) {
ta = tb - delta;
if (ta < t1 + 0.25 * delta) {
ta = t1;
} // last step of this subinterval
s = integrate_step(ta, tb, omega, x, m, r, rd, bico); // integration step
tot += s;
delta *= 2.0; // double steplength
if (s < tot * 1e-4) {
delta *= 8.0;
} // large step when s small
tb = ta;
if (ta <= t1) {
break;
}
}
}
t1 += 0.5;
t2 += 0.5;
if (t1 >= 1) {
break;
}
}
}
return log(tot) + logrd;
}
real multi_wallenius_lpmf(data array[,] int y, data array[] int m, vector p) {
int N = size(y);
real lpmf = 0;
for (n in 1 : N) {
lpmf += integrate(1e-12, p, y[n], m);
}
return lpmf;
}
}
data {
int<lower=0> N;
int<lower=0> C;
array[N, C] int y;
array[C] int m;
}
parameters {
simplex[C] probs;
}
model {
y ~ multi_wallenius(m, probs);
}
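A note for anyone reading integrate_step above: it is plain 8-point Gauss-Legendre quadrature on each subinterval,

$$
\int_{t_a}^{t_b} f(t)\, dt \approx \frac{t_b - t_a}{2} \sum_{j=1}^{8} w_j\, f\!\left(\frac{t_a + t_b}{2} + \frac{t_b - t_a}{2}\, \xi_j\right),
$$

with hard-coded nodes $\xi_j$ and weights $w_j$, and with the integrand evaluated on the log scale (the y > -50 check just skips terms too small to affect the sum).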
This is great! So recently I have been using my “homegrown” unnormalized log marginal likelihood of:
real approximate_wallenus_integral(vector weights,
                                   data array[] int y,
                                   data array[] int m,
                                   data vector log_n_dt) {
  int num_categories = size(y);
  vector[num_categories] vec_y = to_vector(y); // number of balls of each category that were drawn
  vector[num_categories] vec_m = to_vector(m); // number of balls of each category in the urn
  real one_over_D = 1.0 / dot_product(weights, (vec_m - vec_y));
  return log_sum_exp(log1m_exp(one_over_D * (log_n_dt * weights')) * vec_y);
}

real multi_wallenius_lpmf(data array[] int y,
                          data array[] int m,
                          vector weights,
                          data vector log_n_dt) {
  return approximate_wallenus_integral(weights, y, m, log_n_dt);
}
where log_n_dt has length equal to the number of points you slice [0, 1] into for the integral, and is defined as:
int num_points = 100;
vector[num_points - 1] log_n_dt;
for (n in 1:num_points - 1) {
  log_n_dt[n] = log(1.0 * n / num_points);
}
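In other words, up to the constant $\log(1/\texttt{num\_points})$ and the dropped binomial coefficients, this computes a left-hand-rule approximation of the untransformed integral on the log scale:

$$
\log \int_0^1 \prod_i \left(1 - t^{\omega_i / D}\right)^{x_i} dt
\approx \log \sum_{n=1}^{N - 1} \exp\!\left( \sum_i x_i \log\!\left(1 - \tau_n^{\omega_i / D}\right) \right) - \log N,
\qquad \tau_n = n / N,
$$

with $N = \texttt{num\_points}$ and $D = \sum_i w_i (m_i - x_i)$.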
The reason I like this is that the formulation allows us to write out the gradients analytically.
I have those at hand and am trying to put them into C++ to see how fast analytical gradients will be vs. autodiff.
If anyone is interested, I can post the gradients here.
Also, it goes without saying, but if anyone sees any speedups here, they would be MOST welcome!
I am trying to scale this up to tens of thousands of items in omega, so every little bit will help!!
Hey @spinkney, I can definitely say that the above approximation returns the correct weights when doing inference against simulated data.
That said, as I scale up the size of the weights, the autodiff gradient calculation is getting prohibitively long.
I would like to apply analytical gradients to help speed things up, but I don’t have the best handle on the C++ templating. Is this something you are familiar with?
I can help with this.
To get started with adding the C++ code and gradients, check out:
- Gamma distribution quantile and inverse quantile function - #37 by andrjohns
- Hoping for some guidance / help with implementing custom log likelihood and gradient for research project (details below) - #14 by rok_cesnovar
- Vector-valued functions with manual gradients in external C++ - #3 by stevebronder