Variance model

It did converge: good Rhat, good n_eff, no divergences, good mixing. However, predictions were very poor, since the distribution of s1 ranged between 5 and 7 while y was in the range [0, 1]. I am curious whether such a transformation is valid.

y = 0, 0.1, 0.2, …, 1. There are 11 categories. For simplicity I assumed it was continuous. There are a bunch of 0s and 1s, so beta freaked out. Probably binomial?
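Since y takes 11 evenly spaced values, one way to read "probably binomial" is to recode round(10 * y) as counts out of 10 trials. A minimal sketch, assuming a probit link; the names y10 and mu are illustrative, not from the original model:

data {
  int<lower=1> N;
  int<lower=0, upper=10> y10[N];   // y recoded as counts: y10 = round(10 * y)
}
parameters {
  vector[N] mu;                    // unconstrained latent means
}
model {
  y10 ~ binomial(10, Phi(mu));     // probit link maps mu into (0, 1)
}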

Why don’t you model it as a negative binomial distribution?

Physically, y is not counts. Let's see how it goes.

Sure, I thought about some tweaking. But the beta distribution should work:

b ~ beta(mu * s, (1 - mu) * s);
target += y * log(b) + (1 - y) * log1m(b);
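For context, a minimal sketch of how that suggestion could sit in a full Stan program, assuming mu lies in (0, 1), s > 0, and b is declared as a parameter in (0, 1); the sum(...) is needed because target += takes a scalar:

data {
  int<lower=1> N;
  vector<lower=0, upper=1>[N] y;   // observations in [0, 1]
  vector<lower=0, upper=1>[N] mu;  // assumed known means in (0, 1)
  real<lower=0> s;                 // precision
}
parameters {
  vector<lower=0, upper=1>[N] b;   // latent probabilities
}
model {
  b ~ beta(mu * s, (1 - mu) * s);                    // beta prior on b
  target += sum(y .* log(b) + (1 - y) .* log1m(b));  // likelihood for y in [0, 1]
}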

OK, I tried normal with constant sd, and it was still best. Normal with a "humped" sd, or negative binomial, converged but the predictions were very bad. I am surprised about the convergence and the lack of divergences. I will try what you have suggested. Since y is a vector, should mu be a vector and s a real?

I need some clarification. Is it possible to write this in vector form? In my case y is a vector and mu is an unconstrained vector. Also, I am more comfortable with the _lpdf form, since I want to run those statements in a function. Is this correct:

vector fun(vector y, vector mu, real s, int N) {
  vector[N] pmu = Phi(mu);
  vector[N] b;
  real lp = 0;
  vector[N] one = rep_vector(1.0, N);

  for (i in 1:N) {
    b[i] = beta_lpdf(pmu[i] * s, (one[i] - pmu[i]) * s);
    lp += y[i] * log(b[i]) + (one[i] - y[i]) * log1m(b[i]);
  }
  return [lp]';
}

Can it be vectorized?

See Chapter 18.3 of the Stan User's Guide: https://mc-stan.org/docs/2_19/stan-users-guide/functions-accessing-the-log-probability-accumulator.html

You have to add _lp to your function name. Then you can use target += inside your function, and return sum(lp).

You also don't need to define a one vector; you can write 1 - y.
Yes, it can be vectorized.
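A minimal sketch of that vectorized _lp version, assuming the beta parameterization above with a Phi link (the name fun_lp is hypothetical):

functions {
  // the _lp suffix allows the function to write to the target accumulator
  void fun_lp(vector y, vector mu, real s) {
    vector[rows(y)] pmu = Phi(mu);                    // map mu into (0, 1)
    target += beta_lpdf(y | pmu * s, (1 - pmu) * s);  // vectorized beta
  }
}

It is sketched as a void function so it can be called as a bare statement, fun_lp(y, mu, s);, in the model block.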

Since I am using MPI, I have to follow its standard and return [lp]'. Could you please help with vectorization? Here is my attempt:

vector fun(vector y, vector mu, real s, int N) {
  vector[N] pmu = Phi(mu);
  vector[N] b;
  real lp;

  for (i in 1:N)
    b[i] = beta_lpdf(pmu[i] * s, (1 - pmu[i]) * s);
  lp = sum(y .* log(b) + (1 - y) .* log1m(b));

  return [lp]';
}

This is not working. I wonder how to express your suggestion in a mapped function.

I highly recommend coding a non-parallel version first and then transforming it into a parallel version.
Are you thinking of using map_rect or reduce_sum?

The problem is that I already have the normal version coded; see below. It already makes good predictions, but I am a perfectionist and am looking for something even better. Also, I don't like using a normal distribution for data in [0, 1].

vector lp_reduce(vector phi, vector theta, real[] xr, int[] xi) {
  int N = xi[1];                // shard size, packed as the first integer
  int y[N] = xi[2:(N + 1)];     // observations for this shard
  vector[N] yT = theta[1:N];    // latent values for this shard
  real sigma = phi[1];          // shared dispersion parameter

  real lp = neg_binomial_2_lpmf(y | exp(yT), sigma);
  return [lp]';
}

lp = sum(map_rect(lp_reduce, phi, theta, xr, xi));
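For what it's worth, a hedged sketch of how the beta likelihood might slot into the same shard-function shape, assuming y is packed into xr (since it is continuous) and the precision s into phi; the name lp_reduce_beta is hypothetical. Note it still inherits the boundary problem for y = 0 or 1 discussed above:

vector lp_reduce_beta(vector phi, vector theta, real[] xr, int[] xi) {
  int N = xi[1];                       // shard size
  vector[N] y = to_vector(xr[1:N]);    // observed canopy in [0, 1]
  vector[N] pmu = Phi(theta[1:N]);     // probit link on the latent means
  real s = phi[1];                     // shared precision
  real lp = beta_lpdf(y | pmu * s, (1 - pmu) * s);
  return [lp]';
}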

Here is the current model. The model predictions for in-sample data are quite good, but since the observed variable is in [0, 1], I am not confident that the normal distribution is best.

Objective: model the canopy of trees over several years. The trees are infected by some bug and are being treated with an insecticide against it. The observed canopy is a number in [0, 1].

The model is a latent-variable Gaussian process:

$$yT_t^x \sim \mathrm{multivariate\_normal}\!\left(yT_{t-1}^x,\ K(X_t^x \mid \alpha, \rho)\right)$$
$$y_t^x \sim \mathrm{normal}(yT_t^x, \sigma)$$

where $yT_t^x$ is the true canopy of tree $x$ at time $t$, $y_t^x$ is the observed canopy of tree $x$ at time $t$, $\sigma$ is the observation error, $X_t^x$ is the design matrix, $K$ is the squared exponential kernel, and $\alpha, \rho$ are the kernel parameters.

Model parameters: $\alpha$, $\rho$, $\sigma$, $yT_0^x$.
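For concreteness, a hedged Stan sketch of that latent GP random walk; the names T, Nx, D, X, yT0 and the diagonal jitter are assumptions for illustration, not the author's code:

data {
  int<lower=1> T;                  // number of years
  int<lower=1> Nx;                 // number of trees
  int<lower=1> D;                  // number of covariates
  row_vector[D] X[T, Nx];          // design matrix per year and tree
  vector[Nx] y[T];                 // observed canopy in [0, 1]
}
parameters {
  real<lower=0> alpha;             // kernel marginal standard deviation
  real<lower=0> rho;               // kernel length scale
  real<lower=0> sigma;             // observation error
  vector[Nx] yT0;                  // initial latent canopy
  vector[Nx] yT[T];                // latent canopy
}
model {
  for (t in 1:T) {
    matrix[Nx, Nx] K = cov_exp_quad(X[t], alpha, rho)
                       + diag_matrix(rep_vector(1e-9, Nx));  // jitter
    yT[t] ~ multi_normal(t == 1 ? yT0 : yT[t - 1], K);       // latent random walk
    y[t] ~ normal(yT[t], sigma);                             // observation model
  }
}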

Can somebody help me figure out a more appropriate distribution?

You can always consider a logit transformation.


Thanks. The problem is that there are a bunch of y = 0 or 1 values.

Do you mean $\mathrm{logit}(\tilde y_t^x) \sim \mathrm{normal}(yT_t^x, \sigma)$, where $\tilde y_t^x = y_t^x + \delta$, with $\delta = 10^{-5}$ if $y_t^x = 0$, $\delta = -10^{-5}$ if $y_t^x = 1$, and $\delta = 0$ otherwise?

If that hack gives good results, then yes, that's what I mean ;-)

Is there a better way to implement this hack?

for (i in 1:N)
  y_bar[i] = if_else(y[i] == 0, y[i] + 1e-5, if_else(y[i] == 1, y[i] - 1e-5, y[i]));

Or rather:

for (i in 1:N)
  y_bar[i] = y[i] == 0 ? y[i] + 1e-5 : (y[i] == 1 ? y[i] - 1e-5 : y[i]);
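Either way, the nudging would typically live in transformed data, with the normal likelihood then applied on the logit scale; a sketch under the model above (y, yT, sigma, N as already defined there):

transformed data {
  vector[N] y_logit;
  for (i in 1:N) {
    real y_bar = y[i] == 0 ? y[i] + 1e-5 : (y[i] == 1 ? y[i] - 1e-5 : y[i]);
    y_logit[i] = logit(y_bar);    // nudge the boundaries, then transform
  }
}
model {
  y_logit ~ normal(yT, sigma);    // normal on the logit scale
}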


Predictions were much worse than with the normal model. That is not too surprising, since the implied variance was large for y = 0 or y = 1 and small for y = 0.5. Physically, the variance is small for y = 0 or y = 1 and large for y = 0.5.