# Unable to initialize sampler in 2 stage latent variable model

I’m trying to fit a latent variable model with two stages of noise (normal, then binomial-logit) added to the input (A bunch of stimulus values ranging from 9-16) giving rise to categorical outputs (binary decisions).

I keep getting “Error in sampler\$call_sampler(args_list[[i]]) : Initialization failed.”

I have tried this in pyStan, matlabStan and rStan, and have tried uncentering, adding priors to all my params, initializing with fixed values, looping instead of vectorized sampling, and replacing the final sampling statement with target increments - always get the same error. Please help!

The model is:
p(obs |x) = N(x,\sigma)
q_{obs} = \beta(2\Phi\big(\frac{\mu-obs}{\sigma}\big)-1)
p(y=1|q_{obs}) = \frac{1}{1+e^{-q_{obs}}}

The marginal likelihood (i.e. p(y=1|x) marginalized over the latent variable q_{obs}) is an intractable integral, so I cannot specify it analytically. I also get the non-fatal warning:
“Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform. Left-hand-side of sampling statement:
obs ~ normal(…)”
I’m not sure how to do this…

It seems like this should be doable, since my model is similar to the “reg_dif” model here http://mc-stan.org/users/documentation/case-studies/tutorial_twopl.html . Would appreciate any help :)

Here’s my rStan code:

data {
int<lower=1> N;
vector[N] x;
int<lower=0,upper=1> y[N];
}
parameters {
real<lower=0> sigma;
real<lower=0> beta;
real mu;
}
transformed parameters{
vector[N] eps;
vector[N] obs;
vector[N] q_obs;
obs = x + sigma*eps;
q_obs = beta*(2*Phi((mu-obs)/sigma)-1);
}
model {
mu ~ normal(12.5,3);
eps ~ normal(0,1);
y ~ bernoulli_logit(q_obs);
}


My data is in an rdump here: https://drive.google.com/file/d/1QMc6expP0VN1MtpkroaY27ktzuOZzJQ0/view?usp=sharing

“Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform. Left-hand-side of sampling statement:
obs ~ normal(…)”

This happens because you’re declaring eps as a transformed parameter, then placing a prior on it in the model block. You probably want to move vector[N] eps; up to the parameters block. Here’s a thread that goes into detail on placing priors on transformed parameters: Putting priors on transformed parameters .

I’m guessing this isn’t your intention though, so just rerun the model like this and it should initialize (it did for me).

data {
int<lower=1> N;
vector[N] x;
int<lower=0,upper=1> y[N];
}
parameters {
real<lower=0> sigma;
real<lower=0> beta;
real mu;
vector[N] eps;
}
transformed parameters{
//vector[N] eps;
vector[N] obs;
vector[N] q_obs;
obs = x + sigma*eps;
q_obs = beta*(2*Phi((mu-obs)/sigma)-1);
}
model {
mu ~ normal(12.5,3);
eps ~ normal(0,1);
y ~ bernoulli_logit(q_obs);
}



Unrelated to the initialization issue, you may want to place stronger prior/constraint on sigma to keep it away from 0 so that the (mu-obs)/sigma term doesn’t blow up. around slide 28 in this presentation by Andrew Gelman he gives a good example for a nonnegative parameter prior that avoids the boundary.

2 Likes

Thank you so much, that works like a charm! This also helped me to understand the difference between the blocks a bit better, thanks a ton!

As for the sigma constraint…I tried using a gamma prior as Andrew suggested to keep it away from zero, but it still kept blowing up, whereas inverting the parameter and using beta = 1/sigma instead seemed to work better…Is this trick often used or is it better to find a weakly informative prior that works?

I actually tried fitting your model to the data, and had the opposite problem of what I expected: sigma shot off to infinity rather than rubbing up against zero.

A few questions I have:

In p(obs |x) = N(x,\sigma) what does the variable obs mean? And what effect does the stimulus x have on this variable

Also, in your Stan code you have \beta lower bounded to zero. Why is this the case? Since your inputing q_{obs} as the bernoulli_logit parameter, shouldn’t q_obs live in the real numbers?

2\Phi\big(\frac{\mu-obs}{\sigma}\big)-1 lives in (0, 1), so q_{obs} = \beta(2\Phi\big(\frac{\mu-obs}{\sigma}\big)-1) lives in (0, inf)

if beta was unbounded, q_obs would live in the real numbers. I’m curious as to why you are “squishing” your predictor twice, ie once with the cumulative normal function Phi and once with the bernoulli_logit

Yeah I was able to accurately recover sigma when the second source of (logistic) noise was not added, but not so well when it is…I do know that the sigma &beta parameters tradeoff.

obs[i] is meant to represent a noisy observation of the stimulus x[i] on every trial - I’m assuming that the observation noise is additive, zero-mean gaussian noise with s.d. \sigma, so obs is normally distributed around the stimulus on that trial.

Beta is meant to represent the inverse temperature of a softmax decision rule based on q_{obs}, which represents the action value difference between decision=1 and decision=0. In other words it is the inverse S.D. of logistic noise added to q_{obs} hence is always positive.
(2\Phi\big(\frac{\mu-obs}{\sigma}\big)-1) actually lives in [-1,1] , so q_{obs} lives in (-inf,inf) and beta just acts as a scale factor.

The model that I have described here is actually novel in my subfield, since people typically either use observation noise (in perceptual decision making studies) or decision noise (in value based decision making studies) but we have some evidence for both in our data…
This paper has both in separate chapters - http://www.gatsby.ucl.ac.uk/~dayan/papers/NDM002wc.pdf
The observation process giving rise to q_obs is described around page 438, while the stochastic decision rule is described around page 448.