Hello,
I’m new to Stan and am having trouble testing different specifications of my model. I’m using RStan as part of a replication project that estimates the ideological positions of political figures from Twitter data. I am most interested in the estimates of phi. Below is the original model. It works as expected, and the estimates for phi range between -2 and 2.
data {
  int<lower=1> J;               // number of Twitter users
  int<lower=1> K;               // number of elite Twitter accounts
  int<lower=1> N;               // N = J x K
  int<lower=1,upper=J> jj[N];   // Twitter user for observation n
  int<lower=1,upper=K> kk[N];   // elite account for observation n
  int<lower=0,upper=1> y[N];    // 1 if user jj[n] follows elite kk[n], else 0
}
parameters {
  vector[K] alpha;
  vector[K] phi;
  vector[J] theta;
  vector[J] beta;
  real mu_beta;
  real<lower=0.1> sigma_beta;
  real mu_phi;
  real<lower=0.1> sigma_phi;
  real<lower=0.1> sigma_alpha;
  real gamma;
}
model {
  alpha ~ normal(0, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  phi ~ normal(mu_phi, sigma_phi);
  theta ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] -
                            gamma * square( theta[jj[n]] - phi[kk[n]] ) );
}
I’m trying to re-specify the model from one based on Euclidean distance to a bilinear one. Specifically, I’m changing the sampling statement from this:
  y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] -
                          gamma * square( theta[jj[n]] - phi[kk[n]] ) );
To this:
  y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] -
                          gamma * ( theta[jj[n]] * phi[kk[n]] ) );
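To make the change explicit, here are the two linear predictors written out in math notation. This is only a restatement of the two Stan statements above, not anything new. The subscripts j[n] and k[n] are my shorthand for jj[n] and kk[n], and the block assumes the amsmath package.

```latex
\begin{align*}
\text{Euclidean:}\quad
  \operatorname{logit}\Pr(y_n = 1)
  &= \alpha_{k[n]} + \beta_{j[n]} - \gamma\,\bigl(\theta_{j[n]} - \phi_{k[n]}\bigr)^2 \\
\text{bilinear:}\quad
  \operatorname{logit}\Pr(y_n = 1)
  &= \alpha_{k[n]} + \beta_{j[n]} - \gamma\,\theta_{j[n]}\,\phi_{k[n]}
\end{align*}
```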
When I test the bilinear model I get warnings about divergent transitions and large R-hat values that I don’t get with the Euclidean model. On small samples, the bilinear model gives estimates of phi that range between -5 and 5, with standard deviations about half the size of the estimates. When I increase the sample size, all the estimates hover around zero and the standard deviations are 3-5 times the size of the estimates. The signs of the estimates are consistently correct (e.g. Barack Obama points in the opposite direction from Fox News), but the values themselves are strangely small while the standard deviations are huge. None of this happens with the Euclidean model, and I’m unclear as to why it happens with the bilinear one.
Nothing else changes in either the Stan code or my R code outside of the model specification I highlighted.
I’m currently running 2 chains with 500 iterations each, 100 of which are warmup. I’ve tried increasing all of these, but it makes no difference. My R code is below.
stan model.R (4.0 KB)