# Modeling some components of a vector that are equal

I am interested by modeling the force of infection for a given disease: \lambda(a,t) is the force of infection for people of age a and at time t. This is the instantaneous per capita rate at which susceptible individuals of age a acquire infection at time t. Ferguson et al. (1999) explained that Usually the proportion seropositive increases with age, in which case the rate of increase with age can be interpreted as a measure of the ‘intensity’ of infection experiences by the population in the past. Specifically, a large increase in seroprevalence between age a and a+1 could result from a high risk of infection which was specific to that age window but constant through time, or from an age-independent risk that was specific to the time period between a and a+1 years ago, or from any combination of the two. Age- and time-specific effects are invariably confounded in the absence of other data.
Concerning my modeling, I assume that the force of infection depends on time but not on age: \lambda(a,t)=\lambda(t), and that it is piecewise-constant with one value per year: \lambda(t)=\lambda_t. I also assume that the force of infection is constant for years before the 62th year before the time of observation: \lambda_{62}=\lambda_{63}=\ldots,\lambda_{N}, N being the age of the oldest person in my sample. Indeed, the number of individuals aged of more than 62 years is quite small in our sample, we do not have enough data to estimate the force of infection yearly for these years. Eventually, I have some other assumptions I do not write here, as they are not necessary to understand the model.

The data at hand consists of a sample of individuals taken in the whole population of interest, and for each sampled individual we have its age and its status according to the disease: seronegative or seropositive. Hence we can count the number of individuals in the sample of a given age, and the number of indivudals in the sample of a given age that are seronegative. We then have:
n a vector, with n[a] giving the number of individuals in our dataset of age a at the time of observation.
n0 a vector, with n0[a] giving the number of individuals in our dataset of age a that are seronegative at the time of observation.
The parameter of interest is lambda a vector, with lambda[a] giving the force of infection for the a^{th} year before the time of observation.

If we are interested by the status of only one individual of age a at time of observation, the random variable “The individual is seronegative” follows a Bernoulli distribution, and the probability for the individual to be seronegative is the probability to not have been infected the years 1,2,3,\ldots, a before the observation, which is:
e^{-\int_1^a \lambda(u)du}=e^{-\sum_1^a \lambda_j} = e^{-\theta_a} = \delta_a.

If we are interested by the random variable n0[a] that represents “The number of individuals of age a at time of observation which are seronegative among n[a] individuals”, we then look at the sum of n[a] independent and identical Bernoulli variables: it follows a binomial distribution with parameters n[a] and \delta_a.

My stan model is then the following, with prior distributions for the logit of the components of \lambda that are Normal(0,10).

// The input data:
// 'n0' and 'n'  of length 'N' the number of ages observed in the sample.

it is bewitching
Autres solutions :
data {
int<lower=1> N;
array[N] int<lower=0> n0; //int<lower=0> n0[N];
array[N] int<lower=0> n;//int<lower=0> n[N];
}
// The parameters of the model.
parameters {
vector<lower=0,upper=1>[N] lambda;
real temp;
}
// Transformed parameters. Here theta and delta calculated from lambda
transformed parameters {
vector[N] theta;
vector[N] delta;
//  real delta[N];
theta = cumulative_sum(lambda);
delta = exp(-theta);
}
// The model to be estimated.
model {
vector[N] logitlambda;
logitlambda = logit(lambda);
// prior
for (i in 1:61){
logitlambda[i] ~ normal(0, 10);
}
temp ~ normal(0, 10);
for (i in 62:N){
logitlambda[i] = temp;
}
n0 ~ binomial(n,delta);
}


Using cmdstanr, this model is compiling and I do not have any error when fitting it with the following simulated data:

data_list <- list(N = 95, n0 = c(6,4,5,6,5,1,5,4,7,1,5,7,4,6,8,5,8,5,16,8,16,37,25,16,16,15,21,21,21,22,23,23,26,11,17,15,16,25,10,14,13,
18,15,12,12,12,13,12,12,12,14,15,6,12,8,9,9,6,12,3,13,16,4,8,3,6,4,3,1,4,5,6,2,2,2,3,0,2,2,0,1,2,0,0,1,0,0,0,0,0,0,0,0,0,0),n=c(6,4,5,6,6,
1,6,4,8,1,6,9,5,8,10,7,10,7,22,11,22,52,34,22,23,22,30,31,30,32,34,34,39,17,27,24,26,40,16,22,22,30,25,21,20,22,24,22,22,21,26,27,10,22,14,
17,16,11,23,5,25,33,9,16,7,13,8,6,3,9,11,13,4,5,4,8,1,5,4,0,3,6,1,0,2,1,0,0,1,0,0,1,0,0,1))

fit <- mod2\$sample(
data = data_list,
seed = 123,
chains = 4,
parallel_chains = 4,
refresh = 500
)


But there is something I do not understand: I want all the components of lambda (and hence of logitlambda) to be the same from the 62th to the last one (N=95). That is why I wrote in my model

temp ~ normal(0, 10);
for (i in 62:N){
logitlambda[i] = temp;
}


But when I look at the output, I do not have the same estimates for the components 62 to 95 (and they are quite different from one component to another).
Hence I probably did not write my model correctly. The problem seems to be with the temp parameter: I actualise it at each iteration and ask for all the components of \lambda to be equal to temp from the 62th to the last one, but it is not working.
I could try a model that focus only on the first 61 components of \lambda, dropping the last components that are assumed equal. It would be probably easier. But It would mean that I focus on the first 61 components of n0 and n, hence dropping all the information given by the individuals older than 62 in my sample. It would be a pity, as including these individuals in my sample took time and money. Moreover, even if they are not numerous, they can bring us some information. That is why I would prefer keeping all the components of my \lambda vector, assuming that the components from the 62th to the last one are equal.

See my response to your previous post, here.

I would suggest you only specify 62 elements of logitlambda, then copy the last element to logit[62:N]. This should achieve what you are looking for.

parameters{
// 1. specify logitlambda as a parameter
// 2. only specify 62 elements of logitlambda
vector[62] logitlambda;
}
transformed parameters{
// 3. treat lambda as a transformed parameter
vector[N] lambda;

lambda[1:61] = inv_logit(logitlambda[1:61]);
// 4. Assign the last element of logitlambda to the last 62:N elements of lambda
lambda[62:N] = inv_logit(logitlambda[62]);
}
model {
// 5. Put a prior on logitlambda; no need for temp
logitlambda ~ normal(0, 10);
}


As a side-note, transformed parameters in the model block are not saved. If you want to analyze them after drawing samples, you need to either put them in the transformed parameters block or generated quantities block.

1 Like

It seems to work with this model:

// The input data:
// 'n0' an array of integers giving the numbers of seronegative persons of the different ages in the sample,
// 'n' an array if integers giving the numbers of persons of the different ages in the sample,
// 'n0' and 'n'  of length 'N' the number of ages observed in the sample.
data {
int<lower=62> N;
array[N] int<lower=0> n0; //int<lower=0> n0[N];
array[N] int<lower=0> n;//int<lower=0> n[N];
}

parameters {
// specify logitlambda as a parameter
// only specify 62 elements of logitlambda
vector[62] logitlambda;
}

transformed parameters {
// treat lambda as a transformed parameter and assign the last element of logitlambda to the last 62:N elements of lambda
vector[N] lambda;
lambda[1:61] = inv_logit(logitlambda[1:61]);
for (i in 62:N){
lambda[i] = inv_logit(logitlambda[62]);
}
vector[N] theta;
vector[N] delta;
theta = cumulative_sum(lambda);
delta = exp(-theta);
}

model {
for (i in 1:62){
logitlambda[i] ~ normal(0, 10);
}
n0 ~ binomial(n,delta);
}


The results obtained with simulations are not good yet, but I am working on it!

Do you mean to do something like this?

transformed parameters {
// lambda = probability of infection at time T | uninfected at time T-1 (conditional probability of infection)
vector[N] lambda;
// theta = logged-probability of being infected BY time T (cumulative probability of infection)
vector[N] theta;
// delta = probability of being uninfected BY time T (cumulative probability of non-infection)
vector[N] delta;

lambda[1:61] = inv_logit(logitlambda[1:61]);
for (i in 62:N){
lambda[i] = inv_logit(logitlambda[62]);
}

theta = cumulative_sum(log(lambda));
delta = 1 - exp(theta);
}


Not related to your question, but your statement quoted above made me trip: why is lambda logit-transformed, i.e., bounded between zero and one? It is a rate parameter, so should only be positively constrained (or only log-transformed, not logit). Also, did you check how much probability mass a logit-normal(0,10) prior puts near zero and one? Is that really your prior expectation?

Yes, exactly!

You are right. lambda is an instantaneous per capita rate at which susceptible individuals acquire infection. For example, if there is x(a,t) susceptibles of age a at time t and x(a+1,t+1) susceptibles of age a+1 at time t+1, then lambda (per unit of time) is simply -ln[x(a+1,t+1)/x(a,t)] (see Grenfell and Anderson 1985). Concerning my prior, when I wrote the algorithm I did not think about it a lot and I simply took a prior that has been already used by other authors in my field of application. Now that my algorithm is working (I succeed to obtain good results on some simulations), I will try to think about it, and see if it has a lot of influence or not.