# Time-series in Stan, I am new to Stan and need hints to develop the model. THANKS

I am very new to stan programming. I have learned rethinking and brms, however, I want to model time-series ordinal variables. I am quite confused about how to manage the data and how to adapt this code to ordinal regression. I appreciate any help and guidance. The code is as below and attached is the model I have been trying to address. But, this is the first time I try Stan code, and have no idea if I am on the

``````data {

int < lower = 0 > N; // number of individuals
int < lower = 1 > G; // number of covariates
int < lower = 1 > P; // Index of time .
int < lower = 1 > Y[N,G,P]; //  matrices dimension NxG for each P period
}
parameters {
vector [beta];
vector[K - 1] c ; // cut-point

}

model {
for (t in 2:P) {
for (g in 1:G) {
for (i in 1:N) {
Y[i, g, t] ~ ordered_logistic(beta*Y[i,g,t-1] + c [i,g,t]) ;
}}}
}

``````
2 Likes

when I started out with Stan I was also quite bewildered. You probably want to start with examples from the Stan manual closest to your use case and tweak it until you get what you want.

Thanks for your empathy and encouragement. Indeed, I am doing it and it is moving forward slowly. But, it is fun to pass through difficulties. I think I had to read at least three books so far and learn more in-depth to apply it to programing in STAN. I will share it when it is finalized. Stay well and safe.

So far as I can see without running it, your code is in order ; whether it makes sense or not is a different question :-)

I started learning Bayesian stuff under a year ago (but Iâ€™ve more experience in the non Bayesian world). What I personally found helpful is setting up a little REPL for yourself for every model: a way to go from running it to seeing how it fits quickly, so you can learn from the feedback. I find PPC and simulation to be especially useful. Itâ€™s also an easy way to understand what effect your changes are having. I also found fitting synthetic data first to be necessary; it makes clear what your generative assumptions are and gives you confidence that your model is doing what itâ€™s supposed to when the answer is known. Iâ€™ve personally found that more often than not you can make your synthetic data such that the model can not recover the parameters of the generative function. How and why this happens has almost always been insightful.

For most purposes; your model is â€ścorrectâ€ť because it fits the data as youâ€™d expect, and in the ensuing wrestling match of wrangling your model to fit the data itâ€™s pretty difficult not to get a good sense of how the code works.

5 Likes

I have done statistics of my project for man years and have learned Bayes using Statistics rethinking book, read Vhetari et al. papers, listened to his videos and also brms and recent papers by Paul on ordinal regression.
But, a time-series model and auto-regression on ordered variables with covariables in the model is a challenge and have not been published as I know.

Happy to help you muddle through and figure it out but itâ€™d be useful if you could provide some synthetic data for your model with known parameters that we can try to recover. It isnâ€™t obvious what (if anything) is the problem with the model at the moment.

Your help would be great and appreciate it.
I will prepare and will post here step by step to explain and share what I have done.
In advance, as I am using ordinal variables and monotonic effects, I used brms in STAN by make_stan with the hope to incorporate the time series.
I will post the syntax and a generative data soon. Thanks again for showing interest.

Hello,
Attached is a generic file.
I want to get time series for a model that also has co-variables and the response and co-variables are ordinal.
here

``````e4~ e3+e2+e1+hr1+hr2+hr3
``````

generic.csv (2.9 KB)

1 Like

I defined the data for Stan as below:

``````data_csv<-read.csv("generice.csv",header=TRUE,sep=",")

data_csv<-na.omit(data_csv)

offset <- 2 # column number we ignore to start with in data.frame
imo <- as.integer(27)
jmo <- array(NA, imo)

for(i in 1:imo){
jmo[i] <- max(data_csv[,i + offset])   ## i+1 as the first column is idno
}

nth <- as.integer(6)

gen_con_simo <- function(length){
return(rep(c(2), length))
}

######

data <-list (
N = nrow(data_csv),
Y = as.array(data_csv[, "e4"]),
nthres = nth,
K = imo +1,
X = as.matrix(data_csv[, (1 + offset):(imo + offset + 1)], rownames.force =
NA),
Ksp = imo,
Imo = imo,
Jmo = jmo,
disc =as.integer(1),
prior_only = 1
)
for(i in 1:imo) {
ind <- i + offset
data[[paste("Xmo_", as.character(i), sep = '')]] = array(data_csv[, ind])
data[[paste("con_simo_", as.character(i), sep = '')]] = gen_con_simo(jmo[i])
}
``````
``````functions {
/* cumulative-logit log-PDF for a single response
* Args:
*   y: response category
*   mu: latent mean parameter
*   disc: discrimination parameter
*   thres: ordinal thresholds
* Returns:
*   a scalar to be added to the log posterior
*/
real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
int nthres = num_elements(thres);
real p;
if (y == 1) {
p = inv_logit(disc * (thres[1] - mu));
} else if (y == nthres + 1) {
p = 1 - inv_logit(disc * (thres[nthres] - mu));
} else {
p = inv_logit(disc * (thres[y] - mu)) -
inv_logit(disc * (thres[y - 1] - mu));
}
return log(p);
}
/* cumulative-logit log-PDF for a single response and merged thresholds
* Args:
*   y: response category
*   mu: latent mean parameter
*   disc: discrimination parameter
*   thres: vector of merged ordinal thresholds
*   j: start and end index for the applid threshold within 'thres'
* Returns:
*   a scalar to be added to the log posterior
*/
real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
}

/* compute monotonic effects
* Args:
*   scale: a simplex parameter
*   i: index to sum over the simplex
* Returns:
*   a scalar between 0 and 1
*/
real mo(vector scale, int i) {
if (i == 0) {
return 0;
} else {
return rows(scale) * sum(scale[1:i]);
}
}
}
data {
int a;
int b[1];

int<lower=1> N;  // number of observations
int Y[N];  // response variable
int<lower=2> nthres;  // number of thresholds
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int<lower=1> Ksp;  // number of special effects terms
int<lower=1> Imo;  // number of monotonic variables
int<lower=1> Jmo[Imo];  // length of simplexes
// monotonic variables
int Xmo_1[N];
int Xmo_2[N];
int Xmo_3[N];
int Xmo_4[N];
int Xmo_5[N];
int Xmo_6[N];

// prior concentration of monotonic simplexes
vector[Jmo[1]] con_simo_1;
vector[Jmo[2]] con_simo_2;
vector[Jmo[3]] con_simo_3;
vector[Jmo[4]] con_simo_4;
vector[Jmo[5]] con_simo_5;
vector[Jmo[6]] con_simo_6;

real<lower=0> disc;  // discrimination parameters
int prior_only;  // should the likelihood be ignored?
}

transformed data {
int Kc = K;
matrix[N, Kc] Xc;  // centered version of X
vector[Kc] means_X;  // column means of X before centering
for (i in 1:K) {
means_X[i] = mean(X[, i]);
Xc[, i] = X[, i] - means_X[i];
}
}

parameters {
vector[Kc] b;  // population-level effects
ordered[nthres] Intercept;  // temporary thresholds for centered predictors
vector[Ksp] bsp;  // special effects coefficients
// simplexes of monotonic effects
simplex[Jmo[1]] simo_1;
simplex[Jmo[2]] simo_2;
simplex[Jmo[3]] simo_3;
simplex[Jmo[4]] simo_4;
simplex[Jmo[5]] simo_5;
simplex[Jmo[6]] simo_6;
}
transformed parameters {
}
model {
// initialize linear predictor term
vector[N] mu = Xc * b;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += (bsp[1]) * mo(simo_1, Xmo_1[n]) + (bsp[2]) * mo(simo_2, Xmo_2[n]) + (bsp[3]) * mo(simo_3, Xmo_3[n]) + (bsp[4]) * mo(simo_4, Xmo_4[n]) + (bsp[5]) * mo(simo_5, Xmo_5[n]) + (bsp[6]) * mo(simo_6, Xmo_6[n]));
}
// priors including all constants
target += normal_lpdf(b | 0,1);
target += normal_lpdf(Intercept | 0,10);
target += normal_lpdf(bsp | 0,1);
target += dirichlet_lpdf(simo_1 | con_simo_1);
target += dirichlet_lpdf(simo_2 | con_simo_2);
target += dirichlet_lpdf(simo_3 | con_simo_3);
target += dirichlet_lpdf(simo_4 | con_simo_4);
target += dirichlet_lpdf(simo_5 | con_simo_5);
target += dirichlet_lpdf(simo_6 | con_simo_6);
// likelihood including all constants
if (!prior_only) {
for (n in 1:N) {
target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
}
}
}
generated quantities {
// compute actual thresholds
vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}
``````

I ran the ordinal regression in brms and tried to convert the syntax to stan to apply time-series autoregressive with 4 time points.
Here are the syntax I used without time-series.
Thanks,
Shabnam

1 Like

Good day,

First question. Are trying multivariate or univariate models? Cause this code you post looks for one dimension, but the diagram above is for more dimensions.

Second question, your data is integer (1,2,3,4,â€¦) or boolean ([0,1] data)?

That will help me to understand better your code purpose. :)

Thanks for pitching in. Sheâ€™s included generic.csv above, which is a sample sheâ€™d like to model. Otherwise, have a look at the â€śdataâ€ť section of the model with regards to the type of each variable.

1 Like

OK I will update my question.

Can you please give me more info of the goal model? Cause the diagram looks like a order logistic VAR model, which I donâ€™t have any references or find anything useful to start with so I can be updated to the current workflow.

The Stan model written above looks like a univariate autoregressive logit model but it has some additional updates like thresholds adapted to ordinal data that I do not really follow.

So every tip or external info you had will make easier to me to catch up and understand the stancode purpose.

Hello,
I mainly need time-series autocorrelation for ordinal regression.
I already could apply ordinal regression for multivariate using brms.
Letâ€™s say e4 ~ e3+ e2+ e1
I need to get an autoregressive value for e.
The data is ordinal, not boolean.
Thanks,
Shabnam

The code I put here is for multivariate ordinal regression. I tried to transfer brms code to Stan code to apply time series to ordinal regression. I thought I need to change the looping and the equation in the model part, where the confusion stopped me. Please let me know if you have any question.
Thanks again.

HI! I have a proposal on how to do a univariate ordinal regression with an autoregressive component. I use the ideas of this article, and the Stan UserÂ´s guide.

My proposal code is this one: ordered_dynamic_regression_try.stan (951 Bytes) .

I didnâ€™t try to change your code cause it has some extras that I donâ€™t understand too well, but if you are satisfied with this code you can make the updates with no problem.

For the multivariate part, If you have the base multivariate ordinal regression that brms provides I can try to incorporate the autoregressive matrix, but basically is the same idea as this code.

There are some theoric components on the air, but we can discuss it and try to solve them later. Just tell me if this is on the right track and good luck. :)

I donâ€™t understand this part too well:

The code I put here is for multivariate ordinal regression.

Because the response variable in the code above is a univariate vector:

It confuses me, and thatâ€™s why I just try a simple model first, and this can give you the idea of the modelâ€™s big structure. Then you can add extra features to adjust the modelâ€™s fit. :)

Hello,
Thanks a lot for sharing. I use your code with a univariate model and will try a multivaraite model. I get back to you here.
Thanks again for your kind help.
Shabnam

What I did in brms was just a lagged model for one outcome. It is very different from a dynamic model that is in the diagram. I wanted to figure out hos to use STAN code first.
Modeling the dynamic part needs looping including the auto-regression and lagged parts. The article you shared is useful. I just need to figure out the coding. I will try your code and incorporate the looping part. Thanks again.

Yes I think a exponential family DLM will be a better approach than a logistic VAR regression! The theory in the first model is better fundamented and less constrained!

My concern with the logistic VAR is:
A glm ARÂ§ model is just adding the AR part to the regression and then apply an inverse transformation to recover the parameter. This way doesnot guaranty low dependency in the data so the estimation could be not reliable. And not to mention that generalized to a multivariate outcome could be hard

Because we use a latent variable as a linearity assumption, I assume VAR should be OK. But, maybe I misunderstood.
DLM is not in STAN, is there any other package or I should use STAN functions to develop it?
Thanks,