Unobserved Parameters & Multilevel Priors

I am following a Bayesian Learning Model, where consumers are assumed to hold some beliefs, and updated their beliefs as a Bayesian learner. At each decision point, they make decisions based on their beliefs.
The model has the following hierarchy:
image

where Q_ijt represents the mean of the belief distribution which follows a normal distribution. The issue here is that at any point of time t is Q_ijt unobserved. The typical MLE approach is to simulate Q_ijt, using simulated maximum likelihood. I wonder in the Bayesian approach, how should I deal with Q_ijt (how to construct the hierarchy)? Should I also use simulated data as data augmentation and then for each individual, compute the mean likelihood and add to the target+? or using transformed data/parameters?

parameters{
    real<lower=0> M;
    real<lower=0> V;
}

transformed parameters {
   array[nupdt] real Mit; // this is the unobserved parameter
}

model{
    M ~ uniform(1,4);
    V ~ uniform(1,4);

    int ob;
    ob=0;
    int upt;
    upt = 0;

    for (i in 1:ncust){
        int start_i;int end_i;int count_i;
        real mit0;real vit0;
        real mit1;real vit1;
        real mit_bar;real vit_bar;

        start_i = start[i];
        end_i = end[i];
        count_i = count[i];
        
        mit0 = 0.5;
        vit0 = 5;
            for (t in 1:count_i){
                ob +=1;
                int ini_it;
                int com_it;
                ini_it = Initiated[ob];
                com_it = Completed[ob];

                if (ini_it==1){
                    upt +=1;
                    vit1 = 1/(1/vit0 + ini_it/V);
                    vit_bar=ini_it*vit1^2/V;
                    mit_bar = vit1/vit0*mit0+ini_it*vit1/V*M; 
                    com_it ~ bernoulli_logit(M);

                    Mit[upt] ~ normal(mit_bar,vit_bar); //add the hierarchy by assuming a normal distribution
                } else{
                    vit1 = vit0;
                    mit1 = mit0;
                }

                ini_it ~ bernoulli_logit(Mit[upt]);
                mit0 = Mit[upt];
                vit0 = vit1;        
            }
    }
}

currently, I specify M_it (which is the Q_ijt) as transformed parameters, and then in model block, I specified the normal distribution. However, when I run this, it gives me the following error:
Chain [1] Rejecting initial value:
Chain [1] Error evaluating the log probability at the initial value.
Chain [1] Exception: normal_lpdf: Random variable is nan, but must be not nan! (in ‘het0804_2 copy.stan’, line 65, column 20 to column 55)

What should I do with this?

Thank you very much for your help!

Hi!

Not sure I understand your model fully and I’m not an expert but I think the rejection of initial values might be related to your constraints on M and V being different in the parameters block (greater than 0) to your priors (between 1 and 4). If the constraint is only to positive values but the prior gives zero mass outwith 1-4 then I can imagine you getting lots of impossible values. You could change the constraint in the parameter block or use a more flexible prior (uniforms are generally tricky to sample).

Hi, thank you very much for your help. So I changed the prior for M and V from uniform to normal distribution, and the model gets runing now even though the chain fails to converge.

Sorry for the confusion. I will try my best to clarify the model and problem. Roughly speaking, the context is as follows:
An individual makes a daily usage decision (ini_it = 1 or 0) based on an unobserved variable m_it, which is called his belief about the value from using the product. This belief is updated each time when the individual used the product based on some rules (Bayesian updating process). The main challenge is that this belief is not observed by researchers, so we have to integrate it out. Because this belief m_it is determined by m_i(t-1), so there is a chain in this unobserved parameter. And we can construct a hierarchy for m_it| m_i(t-1) in some way, and incorporate this into the model. The problem of treating m_it as parameters in stan is that the number of parameters needed to be estimated would grow considerately with the number of t (days) for each individual.

My question is, instead of directly adding m_it as a hierarchical level, can I directly use simulated data for m_it to avoid intergration and evaluate the loglikelihood?

So here is a simple version of my model:
what I did is to augment the likelihood with simulated data for integral, and for each individual, I compute their probability of the observed history, and then add to the target+.

will this work?

data{
    int <lower=0> ncust;
    int <lower=0> nobs;
    int <lower=0> nsimu;

    array[nobs] int<lower=0,upper=1> Initiated;
    array[nobs] int<lower=0,upper=15> Itd;
    array[nobs] vector[nsimu] Signals;
    array[ncust] int count;
}

parameters {
   real M;
   real <lower=0.01> V;
}

model{
    M ~ normal(2,2);
    V ~ lognormal(0,2);    

    int ob=1;

    for (i in 1:ncust){

        vector[nsimu] mit0 = rep_vector(0.4,nsimu); 
        vector[nsimu] vit0 = rep_vector(10,nsimu);
        vector[nsimu] mit1;vector[nsimu] vit1;

        int count_i = count[i];
        //array[count_i] vector[nsimu] Mi;
        //array[count_i] vector[nsimu] Vi;
        
        array[count_i] row_vector[nsimu] Prob;

        for (t in 1:count_i){
            int Initiated_it= Initiated[ob];
            for (s in 1:nsimu){
                Prob[t,s] = bernoulli_logit_lpmf(Initiated[ob]| mit0[s]);
                //print("prob_it: ",Prob[t,s]);
            }

            vector[nsimu] signal_it = Signals[ob]*sqrt(V)+M;
            vit1 = 1/(1/vit0+Initiated_it/V);
            //print("vit1:",vit1);
            mit1 = (inv(vit0).*mit0+Initiated_it*signal_it/V);
            mit1 = mit1.*vit1;
            //print("mit1:",mit1);
            mit0 = mit1;
            vit0 = vit1;

            ob +=1;
        }
        array[nsimu] real Prob_a;
        for (t in 1:count_i){
            for (s in 1:nsimu){
                Prob_a[s] += Prob[t,s];
            }
        }
        Prob_a = exp(Prob_a);
        
        target += log(mean(Prob_a));

    }
}

and here is code for the whole input data:

import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt
import pandas as pd
#from numpyro.distributions import LKJ
from jax import random
rng_key = random.PRNGKey(0)

from cmdstanpy import CmdStanModel
import time
import arviz as az
class StopWatch:
    def __init__(self):
        self.start()
    def start(self):
        self.start_time = time.time()
    def elapsed(self):
        return time.time() - self.start_time
timer = StopWatch()

import seaborn as sns
np.random.seed(123)

ncust = 200
triallen = np.random.choice([7,15],p=[0.5,0.5],size=ncust)
M = 0.5;V=4
inibelief=[1,10]
Signals = np.random.normal(loc=M,scale=np.sqrt(V),size=(ncust,15))

Ms = []; Vs=[]; Uids=[]; Itd=[]; Initiated=[]
for i in np.arange(ncust):
    signals_i = Signals[i]
    triallen_i = triallen[i]

    mit0 = inibelief[0];vit0 = inibelief[1]

    for t in np.arange(triallen_i):
        util_initiated_itd = np.zeros(2)
        util_initiated_itd[1] = mit0
        prob_initiated_itd = np.exp(util_initiated_itd)/np.exp(util_initiated_itd).sum()
        initiated_itd = np.random.choice(range(2),p=prob_initiated_itd)

        Initiated = Initiated + [initiated_itd]
        vit1 = 1/(1/vit0+initiated_itd/V)
        mit1 = ((1/vit0)*mit0+initiated_itd*signals_i[t]/V)*vit1
        Ms = Ms + [mit1]
        Vs = Vs + [vit1]

        vit0 = vit1
        mit0 = mit1
    Uids = Uids + [i]*triallen_i
    Itd = Itd + np.arange(triallen_i).tolist()

df_use = {'Uids':Uids,'Itd':Itd,'Initiated':Initiated,'Ms':Ms,'Vs':Vs}

df_use = pd.DataFrame(df_use)

nsimu = 50
Signals = np.random.normal(loc=0,scale=1,size=(df_use.shape[0],nsimu))

data = {
    'ncust':ncust,
    'nobs':df_use.shape[0],
    'nsimu':nsimu,

    'Initiated':Initiated,
    'Itd':Itd,
    'Signals':Signals,
    'count':triallen
}