Sub-vectors of random variables with predefined vector of indices

#1

Dear All,

I am using Stan of a couple of months, and I hope I am not asking for too much by posting this problem here.

I am trying to create a sub-vector of random variables that has known vector of indices IDX_1.
The error message I am getting is “Cannot assign to function argument variables.”, and I would think that this is caused by the declaration of IDX_1 as a vector.

I am convinced that this is not the correct way of creating such sub-vectors, and at the same time I could not find any example that would show how to resolve this problem properly. Would you mind sharing your experience with Stan and present some solution to this problem?

Thank you in advance,
Peter

Please share your Stan program and accompanying data if possible.


data {
int<lower=1> N; // length of the data
vector[N] Factor1_est_0; // mu of factor 1
vector[N] Factor1_sd_0; // sd of factor 1
int<lower=1> N1; // length of subset of factor 1
vector[N1] IDX_1; // indices of subset of factor 1

}
parameters {
vector [N] Factor1_0; // unknown true value of factor 1
vector [N1] Factor1_1; // subvector of unknown true value of Factor1_0

}
model {
for(i in 1:N){
Factor1_0[i]~normal(Factor1_est_0[i], Factor1_sd_0[i]);
}
Factor1_1=Factor1_0[IDX_1]; // here’s the problem

}


0 Likes

#2

Indices need to be integers.

Try:

int<lower = 1, upper = N> IDX_1[N1]
1 Like

#3

Hello,

Thank you very much for help. This makes perfect sense. I am still getting some problems with assigning the filtered vector Factor1_0[IDX_1] to Factor1_1, but I think this should not be in the model block, rather in the log likelihood function that I created.

Thanks a mil!
Peter

0 Likes

#4

Hello,

I am having difficulty with applying that. The code seems to be correct - I defined a function, which is a log likelihood, and specified vector of integers, and I am getting the following error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

error in ‘model252876226dbb_stanmodel’ at line 14, column 5

13:   vector Factor1_0,
14:   int<lower = 1, upper = N1> IDX_1,
        ^
15:   //vector IDX_1,

Here’s the pseudo code:

FUNCT <- "functions {
real logLikelihood (
int N1,
vector Factor1_0,
int<lower = 1, upper = N1> IDX_1,

)
{
row_vector[N1] Factor1_1;
Factor1_1=Factor1_0[IDX_1];

}
"
Thanks in advance
Peter

0 Likes

#5

Can you post full code? It makes it easier to find what’s going wrong (cause I can just try to compile it).

1 Like

#6

Lower bounds and upper bounds are not allowed in function declarations.

1 Like

#7

Hello,
Thank you very much for your help and your time.
I simplified the problem to two toy examples, the first one doesn’t include any generation of sub-vectors within the model, but it shows that he code works (at least sometimes as this is a toy example).

The objective is to have some IHPP of regression type with error in variables (Facotor_1). I can send R files if needed.

rm(list = ls())
library(rstan)

Simulate some dummy data --------------------------------------------------

N0=500
N1=150
Factor1_mu_0=runif(N0,1,2)
Factor1_sd_0=rep(0.2,length(Factor1_mu_0))
IDX_1=sort(sample(1:N0,N1))
Factor1_1=Factor1_mu_0[IDX_1]

Toy Stan model - working -----------------------------------------------------

MODEL_SPEC <- ’
functions {
real logLikelihood (
int N0,
int N1,
vector Factor1_0,
vector Factor1_1,
real beta
)
{
real logLik;
logLik=sum(Factor1_1beta)-sum(betaFactor1_0);
return (logLik);
}
}
data {
int<lower=0> N0;
int<lower=1> N1;
vector[N0] Factor1_mu_0;
vector[N0] Factor1_sd_0;
vector[N1] Factor1_1;
}
parameters {
real<lower=0> beta;
vector[N0] Factor1_0;
}
model {
for(i in 1:N0){
Factor1_0[i]~normal(Factor1_mu_0[i], Factor1_sd_0[i]);
}
beta~normal(0,2);
target +=logLikelihood (N0,N1,Factor1_0,Factor1_1,beta);
}

MODEL = stan_model(model_code = MODEL_SPEC, model_name=“stanmodel”)
MODEL_SIM = sampling(MODEL,
data = list(N0 = N0, N1 = N1, Factor1_mu_0= Factor1_mu_0,Factor1_sd_0=Factor1_sd_0, Factor1_1= Factor1_1),
iter = 5000, warmup = 1000, chains = 1, cores = 1,control = list(adapt_delta = 0.80))
MODEL_POST = extract(MODEL_SIM)
par(mfrow=c(1,2))
hist(MODEL_POST$beta,xlab= expression(beta),main=’’)
plot(MODEL_POST$beta,xlab= expression(beta),main=’’,ylab=‘trace’, type=‘l’)

and here’s the model that includes sub-vectors within the model (i excluded form the code simulation of the data):

Toy Stan model -not working -----------------------------------------------

MODEL_SPEC <- ’
functions {
real logLikelihood (
int N0,
int N1,
vector Factor1_0,
int<lower = 1, upper = N1> IDX_1, // 1 difference
real beta
)
{
row_vector[N1] Factor1_1; // 2 difference
Factor1_1=Factor1_0[IDX_1]; // 3 difference
real logLik;
logLik=sum(Factor1_1beta)-sum(betaFactor1_0);
return (logLik);
}
}
data {
int<lower=0> N0;
int<lower=1> N1;
vector[N0] Factor1_mu_0;
vector[N0] Factor1_sd_0;
int<lower = 1, upper = N1> IDX_1; // 4 difference
}
parameters {
real<lower=0> beta;
vector[N0] Factor1_0;
}
model {
for(i in 1:N0){
Factor1_0[i]~normal(Factor1_mu_0[i], Factor1_sd_0[i]);
}
beta~lognormal(0,2);
target +=logLikelihood (N0,N1,Factor1_0,IDX_1,beta); // 5 difference
}

MODEL = stan_model(model_code = MODEL_SPEC, model_name=“stanmodel”)
MODEL_SIM = sampling(MODEL,
data = list(N0 = N0, N1 = N1, Factor1_mu_0= Factor1_mu_0,Factor1_sd_0=Factor1_sd_0, IDX_1= IDX_1),
iter = 5000, warmup = 1000, chains = 1, cores = 1,control = list(adapt_delta = 0.80))
MODEL_POST = extract(MODEL_SIM)
par(mfrow=c(1,2))
hist(MODEL_POST$beta,xlab= expression(beta),main=’’)
plot(MODEL_POST$beta,xlab= expression(beta),main=’’,ylab=‘trace’, type=‘l’)

Thank you very much!

Kind Regards
Peter

0 Likes

#8

Hello,

Thank you, for your reference. I believe the problem is outside of defining bounds, but I can be wrong here.

Best RGeards
Peter

0 Likes

#9

Surround code clips with three backticks “`” on both sides to get code formatting to work and then it’ll be easier to read on here.

You have to declare variables at the top of the code block they’re in (real logLik should go one line up)

For the assignment Factor1_1=Factor1_0[IDX_1]; to work, if Factor1_0 is a vector, then IDX_1 should be an array of integers (declared as int[] in the arguments list) and Factor1_1 should be a vector instead of a row_vector

Here’s a function that will compile (but I’m not sure if it’s doing the right thing):

  real logLikelihood (int N0, int N1, vector Factor1_0, int[] IDX_1, real beta) {
    vector[N1] Factor1_1; // 2 difference
    real logLik;
    Factor1_1=Factor1_0[IDX_1]; // 3 difference
    logLik=sum(Factor1_1 * beta)-sum(beta * Factor1_0);
    return (logLik);
  }

To get your full model to compile you’ll also need to declare IDX_1 to be an array of integers – if it’s not an array then this isn’t quite the code you want.

1 Like

#10

Hello,

Thank you very much for your code, time and proposed solution. I tested it, and it did not work, however I think I understand why this is happening.

Vectors of integers in Stan can be defined in data block - this part works very well, and apparently there is a bug in Stan that prevents defining vectors of integers in block of user defined functions.

If the above is correct (and everything tells me it is), this would imply that given Factor1_0~normal(mu1[i], sd1[i]), creating a sub-vector of Factor1_0 can be only done in the model block.
The only problem with this is that (as far as I can see/understand) we can’t assign variables “=” in the model block. I mean we can’t do anything like Factor1_1[j]=Factor1_0[IDX_1[j]].

If this is correct, the only solution to this problem is iterating in the model bloc over:
Factor1_1[j]~normal(Factor1_0[IDX_1[i]], 0.00000000001)

I am wondering if this would be in line with your understanding as well.

Many thanks for your help, I appreciate it very much!

Kind Regards
Peter Franke

0 Likes

#11

The error messages can be difficult to work through. I had this same thought when I was working on the previous thing, but eventually worked through all the other issues and got things moving. I do not think there is a bug here, though the error messages can be strange. Here is an example model:

functions {
  vector myFunc(vector v, int[] index) {
    vector[size(index)] out;
    out = v[index];
    return out;
  }
}

transformed data {
  vector[3] myData = [ 1, 2, 3 ]';
  int index[3] = { 3, 2, 1 };
}

parameters {
  real x;
}

model {
  print(myData, index);
  print(myFunc(myData, index));
  
  x ~ normal(0, 1);
}

If you save this in the file “test_indexing.stan”, you can run it in Rstan with:

library(rstan)
model = stan_model("test_indexing.stan")
sampling(model, iter = 1, chains = 1)

And the output will include a bunch of:

[1,2,3][3,2,1]
[3,2,1]

blocks of text.

The only thing you can assign in the model block are variables declared in the model block, but if Factor1_1 was defined in the model block, it is assignable.

This is not assignment. Sampling statements increment the log density. This translates to:

target += normal_lpdf(Factor1_1[j] - Factor1_0[IDX_1[i]], 0.00000000001);

Just to be clear, this is not randomly sampling a random number and assigning it to Factor1_1.

Can you post your model with the function in place and the error message you’re getting?

0 Likes