# Slight specification changes resulting in warnings and unexpected results

Hello,

I'm new to Stan and currently having trouble testing various specifications of my model. I'm using RStan as part of a replication project that estimates the ideological positions of political figures from Twitter data. I am most interested in the estimates of phi. Below is the original model. It works as expected, and the estimates of phi range between -2 and 2.

```stan
data {
  int<lower=1> J;               // number of twitter users
  int<lower=1> K;               // number of elite twitter accounts
  int<lower=1> N;               // N = J x K
  int<lower=1,upper=J> jj[N];   // twitter user for observation n
  int<lower=1,upper=K> kk[N];   // elite account for observation n
  int<lower=0,upper=1> y[N];    // dummy if user i follows elite j
}

parameters {
  vector[K] alpha;
  vector[K] phi;
  vector[J] theta;
  vector[J] beta;
  real mu_beta;
  real<lower=0.1> sigma_beta;
  real mu_phi;
  real<lower=0.1> sigma_phi;
  real<lower=0.1> sigma_alpha;
  real gamma;
}

model {
  alpha ~ normal(0, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  phi ~ normal(mu_phi, sigma_phi);
  theta ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ bernoulli_logit(alpha[kk[n]] + beta[jj[n]]
                           - gamma * square(theta[jj[n]] - phi[kk[n]]));
}
```

I’m trying to re-specify the model, moving from one based on Euclidean distance to a bilinear one. Specifically, I’m changing the model from this:

```stan
y[n] ~ bernoulli_logit(alpha[kk[n]] + beta[jj[n]]
                       - gamma * square(theta[jj[n]] - phi[kk[n]]));
```

To this:

```stan
y[n] ~ bernoulli_logit(alpha[kk[n]] + beta[jj[n]]
                       - gamma * (theta[jj[n]] * phi[kk[n]]));
```

When I test the bilinear model I get warnings about divergent transitions and large R-hats that I don’t get with the Euclidean model. On small sample sizes, the bilinear model gives estimates of phi that range between -5 and 5, with standard deviations about half the size of the estimates. When I increase the sample size, all the estimates hover around zero and the standard deviations are 3-5 times the size of the estimates. The signs of the estimates are consistently correct (e.g. Barack Obama points in the opposite direction from Fox News), but the values themselves are strangely small while the standard deviations are huge. This does not happen with the Euclidean model, and I’m unclear why it happens with the bilinear one.

Nothing else changes in either the Stan code or my R code outside of the model specification I highlighted.

I’m currently running 2 chains with 500 iterations, 100 of which are warmup. I’ve tried increasing all of these, but it makes no difference. My R code is below.

stan model.R (4.0 KB)

My guess is that the seemingly small difference actually changes the posterior geometry substantially, leading to ridges or funnels. Have you taken a look at the pairs plot? Maybe @betanalpha could comment on the geometry and on options for reparameterizing the bilinear model.

I think you’re right. It seems to converge at some sample sizes but not others: 100 and 1,000 work, while 300 and 10,000 do not. Unfortunately I’m unsure how to troubleshoot it.

I haven’t been able to look at the pairs plot either.

```r
> pairs(stan.fit)
```

results in

```
Error in plot.new() : figure margins too large
```

So I tried to export the plots as a pdf instead but it just hangs indefinitely.

Oh, I just noticed that you don’t have priors specified for several of the parameters. When you don’t specify an explicit prior for a parameter, Stan implies a uniform prior across the parameter’s domain, which is usually a really bad idea. So you should definitely put some thought into at least weakly-informative priors for all the parameters. Also, is there any reason you’re specifying a lower bound of 0.1 on the sigma parameters? A lower bound of zero is more common, and it’s easy to specify zero-avoiding priors if you have negligible credibility at zero.
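For example (sketch only; the parameter names match the model above, but the unit scales on the priors are placeholders you’d want to think about):

```stan
parameters {
  // lower bound of 0 rather than 0.1
  real<lower=0> sigma_beta;
  real<lower=0> sigma_phi;
  real<lower=0> sigma_alpha;
}
model {
  // with the <lower=0> constraint these act as half-normal priors,
  // which allow mass near zero while still regularizing large values;
  // a zero-avoiding alternative would be e.g. a gamma or lognormal prior
  sigma_beta ~ normal(0, 1);
  sigma_phi ~ normal(0, 1);
  sigma_alpha ~ normal(0, 1);
}
```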

For the pairs plots, I’d focus on the non-vector parameters first, so do:

```r
pairs(
  x = my_stanfit
  , pars = c(
      'mu_beta'
    , 'sigma_beta'
    , 'mu_phi'
    , 'sigma_phi'
    , 'sigma_alpha'
    , 'gamma'
  )
)
```

If, after specifying weakly informative priors for all parameters, you’re still encountering divergences, try doing the non-centered parameterization for `beta` and/or `phi` (possibly `alpha` too?). The pairs plot I suggest above might help you discern which parameters need to be non-centered.
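As a sketch of what a non-centered parameterization for `phi` could look like (only the pieces involving `phi` are shown; everything else stays as in your model):

```stan
parameters {
  vector[K] phi_raw;        // standard-normal "raw" scale
  real mu_phi;
  real<lower=0> sigma_phi;
}
transformed parameters {
  // non-centered: phi is a deterministic transform of phi_raw,
  // which breaks the direct funnel-like coupling between phi and sigma_phi
  vector[K] phi = mu_phi + sigma_phi * phi_raw;
}
model {
  phi_raw ~ normal(0, 1);   // implies phi ~ normal(mu_phi, sigma_phi)
}
```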

I thought I had set the priors, but maybe I’m misunderstanding something about how RStan works. I set my inits in R:

```r
inits <- rep(list(list(alpha = normalize(log(colK + 0.0001)),
                       beta = normalize(log(rowJ + 0.0001)),
                       theta = rnorm(J), phi = start.phi, mu_beta = 0,
                       sigma_beta = 1, gamma = abs(rnorm(1)), mu_phi = 0,
                       sigma_phi = 1, sigma_alpha = 1)), 2)
```

Reference them in the model block:

```stan
model {
  alpha ~ normal(0, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  phi ~ normal(mu_phi, sigma_phi);
  theta ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ bernoulli_logit(alpha[kk[n]] + beta[jj[n]]
                           - gamma * (theta[jj[n]] * phi[kk[n]]));
}
```

And then fit the model, passing in the inits:

```r
stan.fit <- stan(model_code = stan.code, data = stan.data,
                 iter = n.iter, warmup = n.warmup, init = inits, chains = 2,
                 thin = thin, cores = 2)
```

I was able to get pairs working thanks to your help! If I’m interpreting this correctly, the plot for gamma is telling me there’s multimodality, and the model is failing to converge as a result.

This reading (https://www.martinmodrak.cz/2018/05/14/identifying-non-identifiability/) seems to indicate that such models are dead in the water?

OK, so the specification of priors goes in the model block. Specifying non-default initial values usually shouldn’t be required if all parameters are on a reasonable scale. To specify a prior, you would do something like:

```stan
model {
  alpha ~ normal(0, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  phi ~ normal(mu_phi, sigma_phi);
  theta ~ normal(0, 1);
  mu_beta ~ normal(0, 1);
  sigma_beta ~ normal(0, 1);
  mu_phi ~ normal(0, 1);
  sigma_phi ~ normal(0, 1);
  sigma_alpha ~ normal(0, 1);
  for (n in 1:N) {
    y[n] ~ bernoulli_logit(alpha[kk[n]] + beta[jj[n]]
                           - gamma * (theta[jj[n]] * phi[kk[n]]));
  }
}
```

Now, I just put a standard normal on all the `mu`s and a half-standard-normal on all the `sigma`s; you’ll have to figure out for yourself what distributions actually reflect your beliefs.

Looking at the pairs plot, yes, `gamma` is bimodal, and you’ll have to figure out how to fix that. It looks like the modes are on either side of zero, so constraining `gamma` to positive might do the trick.
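Concretely, that constraint is just a bound on the declaration (the half-normal prior here is a placeholder; pick whatever reflects your beliefs):

```stan
parameters {
  real<lower=0> gamma;   // positivity removes the mirror-image mode
}
model {
  gamma ~ normal(0, 1);  // half-normal under the <lower=0> constraint
}
```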

It also looks like there’s a funnel geometry between `mu_phi` & `sigma_phi`, so you might try a non-centered parameterization for `phi`.

Hi Mike,

Welcome to the community.

These kinds of specifications are really hard to fit. Actually, just multiplying two parameters is already hard to fit, for two reasons. First, if one of the parameters is 0, the others can take whatever values they want without affecting the likelihood. Second, if two of the parameters switch signs, there is no change in the likelihood. I think that is why you see two roughly symmetric modes around zero for gamma in the pairs plots. You will want to look into IRT and latent factor models on the forum for tips on how to constrain the model to make it identifiable.
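To make the sign-switching explicit: the multiplicative term in the linear predictor satisfies

$$
\gamma \, \theta_j \, \phi_k
  = (-\gamma) \, (-\theta_j) \, \phi_k
  = \gamma \, (-\theta_j) \, (-\phi_k),
$$

so jointly flipping the signs of any two of $\gamma$, $\theta$, and $\phi$ (across all their elements) leaves every likelihood term unchanged. That symmetry is exactly what produces the two mirror-image modes for gamma in the pairs plot.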


Right, moving from the Euclidean distance structure to a multiplicative structure (the phrase “bilinear” has a formal use in math which is very different from how it is used here, so I would recommend against using it) is a huge change in the model. As @stijn notes, the new model has the form of an item response theory model (see also Bradley-Terry models), which can be fit in Stan but requires care to ensure sufficient identification. There is plenty of discussion of these issues and how to work with them in the series of IRT case studies at https://mc-stan.org/users/documentation/case-studies.html.

In addition to the multimodality inherent in the multiplicative term, the hierarchical structure probably needs to be non-centered, given that you see both multiple modes and funnel-like structures in your pairs plots.

When trying to reproduce existing analyses, keep in mind that the Hamiltonian Monte Carlo diagnostics used in Stan are very sensitive, so Stan is much better than other algorithms at knowing when it’s not working, and hence when there are modeling issues to deal with. Consequently one shouldn’t be surprised to find that a previous model implementation is problematic and that the previous analysis may in fact have been limited by biased inferences.


Thanks everyone, this is all very helpful; great community here. As an initial attempt at a fix I tried constraining gamma to be positive and specifying more priors, but this didn’t seem to change much.
I think I found a potential solution here: Non convergence issue on polytomous IRT model

That said, I’m getting into territory that’s over my head and need to turn my attention to other things at the moment. So I’ll take a few days to read up, test a few solutions, and post back here with what I find, along with follow-up questions.
