Ragged arrays and time series operations

Michael_F · February 19, 2018, 3:09pm

Thanks, Bob and Ben. I’ve got the model working on a limited set of data, but it’s slow, and when I expand to my full data set it takes a very long time and usually runs into problems (e.g. Error in unserialize(socklist[[n]]) : error reading from connection). Because of the ragged array setup, the mu matrix I’m estimating is about 37000x4. I was getting the inefficient deep copy warning because I had mu[t,] = mu[t-1,]. I now loop over the columns instead, which helped immensely on a small dataset, but there must be other things I’m doing inefficiently in the Stan model. Does anything I’m doing look especially egregious or contrary to best practice? One thought I had is that I may be specifying the prior for the final day of each election twice: once through the innovation and once through mu_finish.

data{
  //read in dimensions
  int N; //number of unique polls
  int p; //number of parties
  int N_election; //number of elections max(election_order)
  int N_poll_elec_prov_id; //number of unique poll-election-region pairs
  int s[N_poll_elec_prov_id]; //lengths of ragged pollster/elec/region arrays
  int total_days; //necessary length of mu
  int n_days[N_election]; //number of days in each election
  int day_index[N]; //index for matching polls to mu

  int N_pollster; //number of pollsters for house effects
  int pollster_id[N_poll_elec_prov_id]; //pollster for each ragged group
  int last_day_index[N_election-1]; //row of mu matched to each row of mu_finish
  matrix[N_election,p] mu_start; //starting values from previous election
  matrix[N_election-1,p] mu_finish; //finishing values from election results

  //actual values in polls, in 0-1 scale
  matrix[N,p] y; //matrix of polls - with p as number of parties
  matrix[N,p] y_se; //matrix of se - with p as number of parties
}

parameters{
  matrix[N_pollster,p] house_effect; //house effects
  matrix[total_days,p] innovation;
  real<lower=0> sigma;
  real<lower=0> sd_inflator[N_pollster]; //inflating the sd from each poll by same factor
  //real<lower=0> sd_inflator; //inflating the sd from each poll by same factor

}

transformed parameters{
  matrix[total_days,p] mu;
  {
    int pos = 1;
    for (k in 1:N_election) {
      mu[pos,] = mu_start[k,];
      for (t in (pos+1):(pos+n_days[k]-1)){
        for (i in 1:p){
          mu[t,i] = mu[t-1,i] + innovation[t,i]*sigma;
        }
      }
      pos = pos + n_days[k];
    }
  }
}

model{
  for (i in 1:p){
    innovation[,i] ~ student_t(4, 0, 1);
    mu_finish[,i] ~ normal(mu[last_day_index,i], 0.0001);
    house_effect[,i] ~ normal(0,.05);
  }
  sigma ~ normal(0.001,0.001);
  sd_inflator ~ normal(1,5);

  {
    int pos2 = 1;
    for (k in 1:N_poll_elec_prov_id) {
      for (i in 1:p){
        segment(y[,i], pos2, s[k]) ~ normal(mu[segment(day_index, pos2, s[k]),i]+house_effect[pollster_id[k],i], sd_inflator[pollster_id[k]]*segment(y_se[,i], pos2, s[k]));
      }
      pos2 = pos2 + s[k];
    }
  }
}
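
For comparison, here is a minimal sketch of two restructurings that might cut down on the element-wise work in the model above. I have not timed it on the full data, so treat it as something to try rather than a known fix. First, within each election the random walk is a running sum of the scaled innovations, so each column can be filled with cumulative_sum instead of a loop over days. Second, because the house effect and the inflator depend only on the pollster, the pollster index can be expanded to one entry per poll, which lets each party's likelihood be a single vectorized statement with no segment calls. The name poll_pollster is a hypothetical helper that is not in the original model, sd_inflator is declared as a vector here so it can be multi-indexed and multiplied element-wise, and the declarations use the newer array[...] syntax (on the Stan versions current when this was posted they would be written like int s[N_poll_elec_prov_id]). The mu_finish term is left out only to keep the sketch short.

```stan
data {
  int<lower=1> N;                              // number of polls
  int<lower=1> p;                              // number of parties
  int<lower=1> N_election;                     // number of elections
  int<lower=1> N_poll_elec_prov_id;            // number of ragged groups
  array[N_poll_elec_prov_id] int<lower=1> s;   // group lengths, assumed to sum to N
  int<lower=1> total_days;                     // number of rows in mu
  array[N_election] int<lower=1> n_days;       // days per election, assumed to sum to total_days
  array[N] int<lower=1, upper=total_days> day_index;
  int<lower=1> N_pollster;
  array[N_poll_elec_prov_id] int<lower=1, upper=N_pollster> pollster_id;
  matrix[N_election, p] mu_start;
  matrix[N, p] y;
  matrix<lower=0>[N, p] y_se;
}
transformed data {
  // Hypothetical helper: pollster index expanded from one per group to one per poll.
  // Assumes polls are stored group by group, as the segment() loop already requires.
  array[N] int poll_pollster;
  {
    int pos = 1;
    for (k in 1:N_poll_elec_prov_id) {
      for (j in pos:(pos + s[k] - 1)) {
        poll_pollster[j] = pollster_id[k];
      }
      pos += s[k];
    }
  }
}
parameters {
  matrix[N_pollster, p] house_effect;
  matrix[total_days, p] innovation;
  real<lower=0> sigma;
  vector<lower=0>[N_pollster] sd_inflator;     // vector instead of a real array (a change)
}
transformed parameters {
  matrix[total_days, p] mu;
  {
    int pos = 1;
    for (k in 1:N_election) {
      int last = pos + n_days[k] - 1;
      mu[pos, ] = mu_start[k, ];
      if (n_days[k] > 1) {
        for (i in 1:p) {
          // mu[t, i] = mu_start[k, i] + sigma * (sum of innovation[pos+1 .. t, i])
          mu[(pos + 1):last, i] = mu_start[k, i]
            + sigma * cumulative_sum(innovation[(pos + 1):last, i]);
        }
      }
      pos = last + 1;
    }
  }
}
model {
  to_vector(innovation) ~ student_t(4, 0, 1);
  to_vector(house_effect) ~ normal(0, 0.05);
  sigma ~ normal(0.001, 0.001);
  sd_inflator ~ normal(1, 5);
  for (i in 1:p) {
    // One vectorized statement per party, with no segment() calls.
    y[, i] ~ normal(mu[day_index, i] + house_effect[poll_pollster, i],
                    sd_inflator[poll_pollster] .* y_se[, i]);
  }
}
```

As in the original loop, the innovation row for the first day of each election never enters mu, so those parameters are informed only by their prior. Starting the innovations at the second day of each election would remove them, at the cost of some extra index bookkeeping.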
