Dear all,

I’m looking for some modeling advice for the following setup: modeling daily polls at a national level. I’ll first describe the setup, then present the model and the problem, and finally ask a couple of specific questions.

**The Setup**

I have a set of pre-election opinion polls. There are 5 candidates in this election. Any given poll presents a comparison between a subset of candidates (e.g. the 1st candidate vs. the 2nd, or the 2nd vs. the 3rd), but only a small subset of the polls presents the full choice-set.

(Toy) Example:

Day 1: Biden = 47%, Trump = 45%, Jill Stein = 3%, Other = 5%;

Day 2: Biden = 50%, Trump = 50%;

Day 3: Biden = 50%, Trump = 45%; Other = 5%;

… etc.

I want to estimate the underlying vote-choice probabilities for all 5 candidates using all of the polls, those with partial choice-sets as well as those with the full choice-set.

The information from each poll can be broken down into two parts:

1. the support of the candidates relative to each other;
2. the effect of the specific comparison (i.e. if Jill Stein stays in the set, does her presence have an independent effect on the choices voters make?).

Let’s ignore (2) (it is taken care of by choice-specific random effects), and assume each poll shows us the genuine relative support per candidate, independent of the choice-set; that is, respondents are unaffected by the choice-set (I believe this is the `Independence of Irrelevant Alternatives’ assumption).

I want my model to estimate the correlation between any two candidates. When I see the poll on day 2, I want my model to understand that this represents a relative gain of Trump over Biden, and predict the appropriate share for Stein and Other according to the correlations between the candidates.

**If Trump and the `Other’ vote are negatively correlated, for example, I’d like to see the predicted `Other’ share for that day decrease, even if we did not see a single poll with `Other’ in the choice-set for that day.**
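To make this concrete, here is a minimal numerical sketch (all numbers hypothetical) of the update I’d like: if the latent daily shifts of Trump and `Other’ are jointly normal with a negative correlation, then conditioning on an observed Trump gain pulls the expected `Other’ shift down.

```python
# Hypothetical numbers: latent daily shifts (on the logit scale) for Trump
# and 'Other', assumed jointly normal with a negative correlation.
rho = -0.8                      # assumed correlation between the two shifts
sigma_t, sigma_o = 0.10, 0.05   # assumed daily-shift scales

# For a bivariate normal, E[d_other | d_trump] = rho * (sigma_o / sigma_t) * d_trump.
d_trump = 0.05                  # the observed Trump gain (e.g. the day-2 poll)
e_other = rho * (sigma_o / sigma_t) * d_trump

print(e_other)  # negative: the model should revise the 'Other' share downward
```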

**The Model**

I have tried to model this via a Poisson likelihood with a latent multivariate random walk, as follows:

```
data {
  int<lower=1> N;                       // total n. of candidate measurements
  array[N] int Y;                       // daily vote counts
  array[N] int n;                       // daily sample sizes
  array[N] int choice_id;               // candidate id
  int<lower=1> choice_N;                // total number of candidates under consideration
  array[N] int dte_id;                  // days-to-election id
  int<lower=1> dte_N;                   // max number of days to election
  array[N] int set_id;                  // candidate-set id
  int<lower=1> set_N;                   // max number of different candidate-sets
  real<lower=0> lkj_prior;              // prior for the strength of candidate correlation
}
parameters {
  matrix[choice_N, set_N] gamma;                       // candidate choice effect
  vector<lower=0>[set_N] gamma_scale;                  // scale of the candidate choice effect
  array[dte_N] matrix[choice_N, set_N] z;              // auxiliary variable for multivariate random walk
  matrix<lower=0>[choice_N, set_N] delta_scale;        // day-to-day change
  array[set_N] cholesky_factor_corr[choice_N] L_Omega; // correlation between candidates
}
transformed parameters {
  vector[N] lambda;                                // latent support
  array[dte_N] matrix[choice_N, set_N] prob;       // probability for each candidate
                                                   // ('pi' would clash with Stan's built-in pi())
  matrix[choice_N, set_N] gamma_star;              // choice-level support on election day
  array[dte_N] matrix[choice_N, set_N] delta_star; // daily support per candidate
  array[set_N] matrix[choice_N, choice_N] L;       // Cholesky factor of the covariance matrix
  for (s in 1:set_N) {
    // non-centered parameterisation of gamma
    gamma_star[:, s] = gamma[:, s] * gamma_scale[s];
    // define the covariance matrix
    L[s] = diag_pre_multiply(delta_scale[:, s], L_Omega[s]);
  }
  // multivariate random walk on daily support
  for (s in 1:set_N) {
    delta_star[1][:, s] = rep_vector(0, choice_N); // identify via corner constraint
    for (t in 2:dte_N) {
      delta_star[t][:, s] = delta_star[t - 1][:, s] + L[s] * z[t][:, s];
    }
  }
  // softmax ensures the probabilities sum to 1
  for (t in 1:dte_N) {
    for (s in 1:set_N) {
      prob[t][:, s] = softmax(gamma_star[:, s] + delta_star[t][:, s]);
    }
  }
  // linear predictor
  for (i in 1:N) {
    lambda[i] = n[i] * prob[dte_id[i]][choice_id[i], set_id[i]];
  }
}
model {
  to_vector(gamma) ~ std_normal();
  gamma_scale ~ std_normal();
  for (t in 1:dte_N) to_vector(z[t]) ~ std_normal();
  to_vector(delta_scale) ~ std_normal();
  for (s in 1:set_N) L_Omega[s] ~ lkj_corr_cholesky(lkj_prior);
  Y ~ poisson(lambda); // likelihood
}
```
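For reference, the generative process this model assumes can be sketched outside Stan; the following is a simplified simulation (toy sizes, independent daily shifts, all values assumed) of the random walk, the softmax, and the Poisson counts:

```python
import numpy as np

rng = np.random.default_rng(1)
choice_N, dte_N = 5, 30   # toy sizes: 5 candidates, 30 days
n = 1000                  # assumed daily poll sample size

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Election-day support plus a random walk of daily shifts (here independent,
# i.e. without the Cholesky term, purely to illustrate the generative process).
gamma_star = rng.normal(0, 1, choice_N)
delta_star = np.zeros((dte_N, choice_N))  # corner constraint at t = 1
for t in range(1, dte_N):
    delta_star[t] = delta_star[t - 1] + rng.normal(0, 0.05, choice_N)

prob = np.apply_along_axis(softmax, 1, gamma_star + delta_star)  # daily probabilities
Y = rng.poisson(n * prob)                                        # simulated daily counts
```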

I then calculate the daily probabilities by applying softmax to the estimated daily preferences:

\pi_{tj} = \frac{\exp(\gamma^\star_{j, s = s^\star} + \delta^\star_{t, j, s = s^\star})}{\sum_k \exp(\gamma^\star_{k, s = s^\star} + \delta^\star_{t, k, s = s^\star})} .

Note here that the daily preferences \delta^\star are estimated for each choice-set seen in the polls, and s = s^\star denotes the `full’ choice-set, i.e. the one that includes all of the candidates.

This model is generally quite good, and manages to recover the correct ‘unobserved’ change in support, or at least to include it in a 95% interval.

**The Problem**

I ran some tests on a simulated dataset. I fit the above model and another version without the correlation (so a simple random walk).

Unfortunately, it seems the above correlated model provides no improvement in fit (RMSE, Bias, correlation, coverage are all roughly the same for correlated and independent models).

Even when the underlying correlations in the simulated data are high, I don’t see any improvement from accounting for this correlation. I’m a bit puzzled about this (maybe I shouldn’t be?).

I have observed that whatever underlying correlation I try to model is mostly overtaken by the sum-to-1 effect of the softmax, which imposes its own (mostly negative) correlation structure.
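This is easy to check empirically. In the sketch below (assumed scales, independent draws), the latent supports carry no correlation at all, yet the softmax-transformed shares come out strongly negatively correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 20000  # 5 shares, many independent draws

# Latent supports drawn independently: no correlation is put in by hand.
eta = rng.normal(0, 1, (T, K))
e = np.exp(eta)
shares = e / e.sum(axis=1, keepdims=True)  # softmax applied to each draw

corr = np.corrcoef(shares.T)               # pairwise correlations of the shares
off_diag = corr[~np.eye(K, dtype=bool)]
print(off_diag.mean())                     # markedly negative despite independent inputs
```

With five exchangeable shares, the pairwise correlation settles near -1/(K-1) = -0.25 regardless of what the latent inputs do.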

**Questions**

(1) Do you have any idea why this implementation of the model provides no gains over the independent random-walk implementation? Is there something wrong in the code?

(2) Is it possible to model a situation in which time-series of probabilities that sum to 1 are assigned a correlation structure of the sort I’m imagining, where a small probability is highly negatively correlated with a large one? Looking at the correlation structure implied by a multinomial distribution:

\rho(X_i, X_j) = \frac{\operatorname{Cov}(X_i, X_j)}{\sqrt{\operatorname{Var}(X_i)\operatorname{Var}(X_j)}} = \frac{-p_i p_j}{\sqrt{p_i(1-p_i)\,p_j(1-p_j)}} = -\sqrt{\frac{p_i p_j}{(1-p_i)(1-p_j)}} .

This correlation, which I understand arises as a result of the sum-to-1 constraint, suggests that changes in a small probability are always bound to have a minimal, undifferentiated impact on the larger probabilities (please tell me if I’m wrong).
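Plugging toy shares into that formula (hypothetical numbers) illustrates the point: a small candidate’s implied correlation with each of the large candidates is weak and nearly identical, while the two large candidates are strongly negatively correlated with each other:

```python
import math

def multinomial_corr(p_i, p_j):
    # Correlation between counts X_i and X_j implied by the sum-to-1 constraint.
    return -math.sqrt((p_i * p_j) / ((1 - p_i) * (1 - p_j)))

# A small candidate against two large ones: weak, nearly identical correlations.
print(multinomial_corr(0.03, 0.47))   # Stein vs. Biden, roughly -0.17
print(multinomial_corr(0.03, 0.45))   # Stein vs. Trump, roughly -0.16
# Two large candidates: a much stronger negative correlation.
print(multinomial_corr(0.47, 0.45))   # roughly -0.85
```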

Any other suggestions would be highly appreciated.

Many thanks in advance!

Roberto