There are 5 stocks, and based on historical data, the winning ratio of these 5 stocks ( the stock with the highest daily return is considered the winner) is 1:(1 - \beta):(1 - 2\beta):2\beta:\beta. The past 250 days of data shows a performance of (84,64,48,39,15). Based on this data, the task is to estimate posterior \beta, which prior distribution is uniform distribution between 0 and 0.5.
The specific approach to solve this problem is to define a custom probability density function, i.e.
X = (X_1,X_2,X_3,X_4,X_5) ~ \sim (\frac{1}{3})^{X_{1}}(\frac{1-\beta}{3})^{X_2}(\frac{1-2\beta}{3})^{X_2}(\frac{2\beta}{3})^{X_4}(\frac{\beta}{3})^{X_5}
Furthermore, in the coding, I have considered two methods:
Method 1: Treating the 250 days of data as 250 individual samples, the data is transformed as follows:Get the reasonable parameter estimation result I expected.
(1,0,0,0,0) , stock1,84 days
…
(0,1,0,0,0) ,stock2,64 days
…
(0,0,1,0,0), stock3, 48 days
…
(0,0,0,1,0) ,stock4, 39 days
…
(0,0,0,0,1), stock5, 15 days
The stand code is:
functions {
real multinomialCustom_lpdf(real[] aX,real abeta){
return(
aX[1]*log(1.0/3) +
aX[2]*log((1-abeta)/3.0) +
aX[3]*log((1-2*abeta)/3.0) +
aX[4]*log(2*abeta/3.0) +
aX[5]*log(abeta/3.0)
);
}
}
data {
int<lower=1> N;
int<lower = 1> K;
real<lower = 0>y[N,K];
}
parameters {
real<lower=0,upper = 0.5> abeta;
}
model {
for(i in 1:N){
y[i] ~ multinomialCustom(abeta);
}
//y ~ multinomialCustom(abeta);
}
The corresponding R code is:
multi_custom <- function(y) {
y_mat <- NULL
k = length(y)
N = sum(y)
for(i in 1:k){
a <- mat.or.vec(y[i],k)
a[,i] <- 1
y_mat <- rbind(y_mat,a)
}
y_mat <- y_mat[sample(1:N,size = N,replace = FALSE),]
return(y_mat)
}
beta_fit1 <- stan(
file = "multinomial.stan",
data = list(N = 250,K = 5,y = multi_custom(y = c(84,64,48,39,15))
), # named list of data
chains = 4, # number of Markov chains
warmup = 5000, # number of warmup iterations per chain
iter = 20000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 1, # no progress shown
init_r = 0.25 )
Method 2: Considering the 250 days of data as only one sample,
(x_1,x_2,x_3,x_4,x_5) = (84,64,48,39,15)
The stan code:
functions {
real multinomialCustom_lpdf(real[] aX,real abeta){
return(
aX[1]*log(1/3) +
aX[2]*log((1-abeta)/3) +
aX[3]*log((1-2*abeta)/3) +
aX[4]*log(2*abeta/3) +
aX[5]*log(abeta/3)
);
}
}
data {
int<lower = 1> K;
real<lower = 0> y[K];
}
parameters {
real<lower = 0,upper = 0.5> abeta;
}
model {
// prior
abeta ~ uniform(0, 0.5);
// likehood
target += multinomialCustom_lpdf(y|abeta);
target += log(2.0);
}
beta_fit2 <- stan(
file = "multinomial1.stan",
data = list(K = 5,y = c(84,64,48,39,15)),
chains = 4, # number of Markov chains
warmup = 5000, # number of warmup iterations per chain
iter = 20000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 1 # no progress shown
)
However, I am confused because Method 2 is not successful and results in an error of
Chain 4: Log probability evaluates to log(0), i.e. negative infinity.
Chain 4: Stan can't start sampling from this initial value.
Chain 4: Initialization between (-2, 2) failed after 100 attempts.
Chain 4: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
error occurred during calling the sampler; sampling not done
here are whatever error messages were returned
Therefore, I would greatly appreciate any help or guidance.