EBFMI (very) low, looking for reparameterization advice

Summary: I have a model that recovers its parameters, with no divergences and good \hat{R}, but very low (< 0.1) E-BFMI. Increasing warmup iterations moved E-BFMI up, but it is still < 0.1. I am looking for suggestions to reparameterize or otherwise address this, and/or advice on how to proceed. Thanks for taking a look!

The model

I have some response time data, and I want to model each response time as a reaction to an unobserved event. The probability of the unobserved event happening at time t is proportional to a normal cumulative density function. I am modeling the reaction time (i.e. how much time passes between the unobserved event and when the button is actually pressed) as lognormal.

\begin{align} \mathrm{rt}[i] &= E[i] + d_{rt}[i] \\ E[i] &< \mathrm{rt}[i] \\ E &\sim g(\mu_a, \sigma_a) \\ \log(d_{rt}) &\sim \mathcal{N}(\mu_r, \sigma_r) \end{align}

where

g(x \mid \mu_a, \sigma_a) \propto \Phi\!\left(\frac{x - \mu_a}{\sigma_a}\right) - \Phi\!\left(\frac{-\mu_a}{\sigma_a}\right)

where \Phi is the standard normal CDF.
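Since g only becomes normalizable once an upper bound is imposed (here E[i] < rt[i]), a minimal NumPy sketch of the normalized density looks like this; the parameter values are purely illustrative:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    # normal CDF at x with mean mu and sd sigma, via the error function
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def g_density(x, mu_a, sigma_a, upper, n_grid=1001):
    """Density proportional to NormalCDF(t) - NormalCDF(0) on (0, upper),
    normalized numerically with the trapezoid rule."""
    grid = np.linspace(0.0, upper, n_grid)
    unnorm = np.array(
        [norm_cdf(t, mu_a, sigma_a) - norm_cdf(0.0, mu_a, sigma_a) for t in grid]
    )
    # normalizing constant: trapezoid-rule integral over the grid
    z = float(np.sum((unnorm[1:] + unnorm[:-1]) * np.diff(grid) / 2.0))
    return (norm_cdf(x, mu_a, sigma_a) - norm_cdf(0.0, mu_a, sigma_a)) / z
```

Because the unnormalized density is a (shifted) CDF, it is monotone increasing in x, so most of g's mass piles up near the upper bound.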

I have medium-strong expectations about reasonable values for the mus and sigmas based on typical reaction time distributions and the construction of my experiment:
\mu_a \sim \mathcal{N}(1.35, 0.4)
\sigma_a \sim \mathcal{N}^+(0, 1)
\mu_r \sim \mathcal{N}(log(0.35), 0.2)
\sigma_r \sim \mathcal{N}^+(0, 0.2)

I’m simulating about 900 observations on rt, which is about how much data I have.

Code for the model (it uses reduce_sum so maybe isn’t the most readable) and a generator are attached:
antic_only_analysis.stan (4.6 KB)
antic_only_gen.stan (1.4 KB)

The Problem

I managed to simulate and get fairly quick and trouble-free fitting on simpler models. In one, I just simulated E and recovered \mu_a and \sigma_a. Similarly, I can simulate d_{rt} and recover \mu_r and \sigma_r. When I fit their combination (as described above), I get low E-BFMI. I looked for guidance in the Brief Guide to Stan’s Warnings and in A Conceptual Introduction to Hamiltonian Monte Carlo. This helped me understand what is going on, but I am not sure of the best approach to fixing it.

Here’s the arviz.plot_energy() output:
[arviz energy plot]

And a pairs plot of energy__ and the parameters of the model:

[pairs plot]
lmu_react and lsig_react are \mu_r and \sigma_r, the parameters of the lognormal, and mu_antic, sig_antic are \mu_a, \sigma_a, respectively.

There is a strong correlation between \mu_a and \sigma_a, as well as a correlation between energy__ and \sigma_r. The advice I have found suggests I should consider reparameterizing something to do with \sigma_r, but I am not sure where to begin.

Do I need to build a non-centered parameterization? Something like

...
parameters {
  vector[N] raw_log_drt;
  ...
}
transformed parameters {
  vector[N] drt = exp(raw_log_drt * sigma + mu);
  ...
}
model {
  raw_log_drt ~ std_normal();
  ...
}

I also considered using numeric integration to marginalize out the unobserved E, but that feels like a pretty terrible idea that might not address the BFMI problem anyway.

Having written all this down, I guess it’s not hard to try the non-centered approach I just described, so I’ll go ahead and try it to see if it helps. Maybe there’s some other problem lurking here, and perhaps you experts will spot it. Thanks again for reading!

Edit: This solution is not as straightforward as I thought, because E[i] + d_rt[i] must add up to rt[i], so I can’t just do that proposed transformation.

I don’t think the correlation between mu_antic and sig_antic is the root of the issue; the variant of HMC used by Stan can handle correlations in the posterior just fine usually. I suspect something is up with E. Maybe add a few randomly-sampled entries in E to the pairs plots?

Thanks for taking a look!

That’s good to know.

Things get pretty tiny, here, but I couldn’t see anything in the pairs plot that looked too unusual, other than the relationship between energy and lsig_react looking at various random smatterings of E.

Stepping back a bit, are you familiar with the models of response time explored in the cognitive science literature? The model you’ve implemented is a bit unusual in attempting inference on a per-response value for E (which would be akin to a “non-decision time” or possibly “lapse time” in the cog sci world). Normally I’d say that a per-response parameter is going to experience serious identifiability issues, but the information you’re adding through your CDF and priors makes me less certain that this is a no-hope scenario. Still, I suspect that if you don’t outright need the per-response value of E and can work out the proper marginalization, you’d probably be best off doing the work to achieve that. And if you haven’t seen the stuff the cog sci folks use, I’d suggest starting with a lit review of those options first.

Hi @mike-lawrence thanks for the suggestions. I guess I’ll try to marginalize, because I really don’t need estimates about E.

Yes, I’m pretty familiar with various approaches to response time modeling, and without more context my model looks pretty bonkers. I’ll describe what I’m doing in a bit more detail, in case anyone is curious. The short version is that I’m aiming for a mixture model where some responses are ‘anticipatory’ and others are actual reactions. For now, I’m trying to get the anticipation part working (the other part is relatively easy).

I did an experiment in which participants knew to expect (and there was a visual indicator to this effect) that a stimulus would appear, and their task was to respond to the stimulus as quickly as possible, but to not respond before the stimulus appeared. Stimulus appearance time was normally distributed around a fixed time from the start of the trial. The experiment was about how response times change, and how many early responses we got based on whether we gave participants an explicit representation of the distribution of appearance times, or if we just showed them the average appearance time.

There were plenty of early responses, and there were more early responses when the stimulus appearance was ‘late’ relative to the mean appearance time.

So eventually I want to model my data as a mixture of reactions to the appearance of the target and as ‘anticipatory’ responses. With the model we’ve been discussing above, I am isolating the ‘anticipatory’ part for now.

Ah, neat, then you’re exploring stuff that’s in the same ballpark as my MSc work (Lawrence & Klein, 2013; JEP:Gen) and work that my advisor’s lab is continuing to do (they’re mostly doing elaborations of Kingstone’s cue-identity paradigm; ex here and here).


Is there a standard approach to modeling early responses in these temporal cuing paradigms, or are they typically just discarded? Our original plan was to just discard responses that were early, but there were so many obviously early responses we couldn’t ignore them, and there are plenty of responses that happen so soon after target onset that they mess up straightforward attempts to fit the RT distribution.

Using a simple cutoff and tossing RTs that are “too fast” is lamentably the most common thing (at least when I last had a good sense of the field, which is getting dated). But a proper mixture model would certainly be best.

So if I understand your paradigm, you should have data whereby you have a cued-time, a target time, and a response time, where the target times have a mean of the cued time but variability around that time so they can appear a bit ahead or behind. With simple detection as a response, you of course observe anticipations and wish to let your model capture those as such. Wouldn’t a simple mixture model suffice then?

For a single condition:

data{
    int<lower=1> N ;
    vector[N] cued_time ; // cued time of target relative to cue (i.e. all positive)
    vector[N] target_time ; // actual time of target relative to cue (i.e. all positive)
    vector[N] response_time ; // time of response *relative to cue* (i.e. all positive)
    array[N] int<lower=0,upper=1> r_before_t ; // 1 if response occurs before target (i.e. for sure anticipation)
}
transformed data{
    vector[N] response_time_to_target = response_time - target_time ;
}
parameters{
    vector[2] mu ; // means (anticipation, non-anticipation)
    vector<lower=0>[2] sigma ; // sds (ditto)
    simplex[2] theta ; // mixing proportion
    vector[2] shift ; // shift of distributions
}
model{
    mu ~ std_normal() ;
    sigma ~ weibull(2,1) ; // peaked-away-from-zero (zero shouldn't be the most credible prior value for a noise parameter!)
    for(i in 1:N){
        if(r_before_t[i]==1){
            target += lognormal_lupdf( response_time[i] - shift[1] | mu[1] , sigma[1] ) ;
        }else{
            target += log_mix(
                theta[1]
                , lognormal_lupdf( response_time[i] - shift[1] | mu[1] , sigma[1] )
                , lognormal_lupdf( response_time_to_target[i] - shift[2] | mu[2] , sigma[2] )
            ) ;
        }
    }
}

I’ve tweaked things such that both anticipations and real responses have a 3-parameter “shifted log-normal” distribution, and I always forget what scale the mean & sd are on for lognormals, so you’ll want to check that the priors I set up are sensible. You might need to constrain at least the anticipation shift to be smaller than the fastest clearly-an-anticipation response time.
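For reference, the data-scale mean and sd of a lognormal relate to the log-scale parameters via mean = exp(mu + sigma^2/2) and var = (exp(sigma^2) - 1) * mean^2. Here’s a quick Python helper (purely illustrative) for sanity-checking prior choices:

```python
from math import log, exp, sqrt

def lognormal_params_from_mean_sd(mean, sd):
    """Convert a desired data-scale mean and sd of a lognormal variable
    into the (mu, sigma) parameters on the log scale."""
    sigma2 = log(1.0 + (sd / mean) ** 2)
    mu = log(mean) - sigma2 / 2.0
    return mu, sqrt(sigma2)

def lognormal_mean_sd(mu, sigma):
    """Inverse direction: data-scale mean and sd from log-scale params."""
    mean = exp(mu + sigma * sigma / 2.0)
    sd = mean * sqrt(exp(sigma * sigma) - 1.0)
    return mean, sd
```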

Note that for multiple conditions, you’d then use cued_time as a covariate affecting everything (means, sds, mixing proportions). And now that I think of it, I don’t think the conditional using r_before_t is doing much besides improving efficiency very slightly.

It’s otherwise also not particularly compute-optimized; it may be more performant to collect the lupdf values in an array of vectors so that log_mix is called outside the loop as is done here. I’ve also coded an attempt at computing mixtures efficiently here but I did that before seeing the official example, so the official example might be faster. (R code to play with the latter here and resulting plot below)


Wow, you really hammered that out quickly. Thanks for putting in so much effort to help me!

Your model is doing more-or-less what I want, but it doesn’t quite get at something I consider important, which is that I don’t think the anticipatory responses are well-captured with a single shift[2]. My earlier attempts included something like that, and prior predictive checks produced fairly convincing data, but the anticipatory lognormal ended up fitting a large standard deviation to cover a broad range of data with high probability of being anticipatory.

Unfortunately, I’ll have to continue working on this another time, as I must go pick up my kid from school! Thanks again for all the effort to help, I really appreciate it.

I don’t quite follow. Are you able to amend my model to convey what you mean? Are you possibly missing that in the data declarations all times are relative to the cue onset, hence the need for a shift in the anticipations that’s separate from the shift of the real RTs (which is actually parameterized as a shift relative to actual target onset)? The latter shift captures standard “non-decision time”, while the anticipation shift is more about the build-up of anticipatory potential.

Ah, yes, that would be a pathology of the mixture likelihood that is remediable by (1) a good prior on the mixture proportions and (2) a good prior on sigma[1] that keeps it from expanding. The structure I added whereby there is no mixture modelled for the clearly-anticipation responses might help too, but I’m less certain on that one.

Hi @mike-lawrence, I continue to appreciate your efforts here. I have not described my paradigm in sufficient detail for you to craft a reasonable model of the data it produces. I don’t have the methods section for this study written up yet (usually I would at this point, but it was part of a summer intern project, so we had to rush some bits), which would probably deliver the minimum level of detail needed to get advice on a good overall modelling approach. Understanding what we did and why we did it, as well as what we have tried that hasn’t worked, would involve a larger time commitment than I would expect from the kind folks on this forum.

With all that said, I think I can try to address this point:

Yes. I think what’s missing is that I expect the anticipation buildup to be variable (and I even have a hypothesis about the distribution of the variability), rather than to be well-captured by some single value. As written, your approach shoves that variability into the lognormal.

Of course, my intuition of how the fixed shift would work might be wrong. Perhaps the thing to do is to use the data generator I posted up-thread and do posterior predictive checks using the fixed shift approach. I might get to something like that later today, but if anyone wants to play around, they would need the inputs to the generator. I’m using n_trial = 900, speed=0.5, offset=0.6 and x is a sample from a truncated normal distribution with mean 1.5, standard deviation 0.4 truncated such that x*speed + offset > 0.

To be clear: I don’t expect anyone to do this work for me, but if they do, they might as well have the right numbers.

A dict of data that works for the generator

{'n_trial': 900,
'x': array([1.71463285, 1.66689475, 0.90731924, 1.71307008, 1.31193632,
1.44996142, 2.32668219, 0.89325194, 1.13971568, 1.00624363,
1.91225642, 1.70907977, 1.22368643, 1.04844887, 1.71911565,
1.15301112, 1.27015213, 1.71679685, 0.8956403 , 1.30126462,
1.50655083, 1.63352945, 1.39779123, 2.48004625, 1.79298203,
0.4813209 , 1.32449366, 0.68100022, 1.90142081, 1.0107301 ,
1.36114914, 1.82897625, 1.38188801, 0.72820253, 1.24423134,
1.3276236 , 1.5665894 , 1.79116287, 1.54880894, 0.65047677,
1.9103229 , 1.40934777, 1.78049746, 1.2058491 , 0.7849356 ,
1.35361581, 1.84845632, 2.22593959, 1.55227381, 1.81984828,
1.76025853, 0.76347841, 1.23800091, 1.02958715, 1.7289381 ,
1.01296009, 1.35965887, 1.17442139, 1.99598201, 1.83563993,
1.60630454, 1.18922454, 1.63723032, 1.49002682, 1.54287883,
1.95798897, 1.61394736, 1.27033043, 0.84155626, 1.32800517,
1.12869114, 1.20304446, 1.20318948, 1.23324058, 1.44377833,
1.27088956, 1.85315824, 1.57030199, 1.2666693 , 1.61093914,
1.00049481, 2.31923263, 1.82110694, 1.58621455, 1.6200748 ,
0.68461534, 1.35212787, 1.06662714, 1.68554411, 1.6358601 ,
2.02848215, 1.98702585, 2.18259066, 1.24812925, 1.25189015,
1.67745755, 1.71519595, 1.69183858, 1.64069403, 1.22749556,
1.84766861, 1.03273096, 1.45826971, 1.38283206, 1.60686946,
1.08443876, 1.69462284, 0.81316794, 2.01783959, 1.70718083,
1.9610907 , 1.83068041, 1.92627154, 1.99657413, 1.18082694,
0.98157704, 1.18232998, 1.35076343, 1.20619968, 1.08509699,
1.1280464 , 1.11920178, 1.09686694, 1.62086147, 1.96347604,
1.80253859, 0.95330026, 1.04132678, 1.46579874, 1.10955758,
1.33101545, 2.21423583, 1.43850687, 1.44520157, 1.72586252,
1.8089055 , 1.12446164, 2.02342588, 1.22606346, 2.52615033,
1.03511571, 1.92434987, 1.52615717, 2.00670719, 1.21306754,
1.59624943, 1.03360743, 1.05383834, 1.94995245, 2.02665669,
1.44359185, 2.03372728, 1.53040402, 1.65643279, 1.57468869,
1.75690086, 1.46184878, 1.59768032, 1.23098918, 1.33708018,
1.23399816, 1.08203408, 1.78176266, 1.25965472, 1.5337637 ,
2.01475279, 1.393136 , 1.8706551 , 1.11545181, 1.63388978,
1.78245451, 1.39345904, 1.09173469, 0.78618834, 1.03505652,
2.05331546, 1.28990848, 1.29916115, 0.81913944, 1.51053294,
1.4422668 , 1.82643041, 1.64660503, 1.71455167, 1.29287306,
1.69863057, 1.35654487, 1.65497962, 0.69062366, 1.20619905,
1.15390829, 1.53190636, 1.21426985, 1.74151888, 1.6333172 ,
1.35668188, 1.28105821, 1.38259594, 1.7783407 , 1.32755597,
1.38404155, 0.74263812, 1.49813512, 2.22117267, 1.43459651,
1.85509068, 1.38053967, 1.65122942, 0.91841248, 1.30352408,
1.29470082, 1.77673836, 1.57322709, 1.37747663, 1.29417811,
0.95072383, 1.37129909, 1.24497054, 1.31706388, 1.71968896,
0.58407867, 2.24813603, 1.60680373, 1.39291529, 1.83672922,
1.97071587, 1.14396165, 1.43880287, 1.7460923 , 1.60867927,
0.98798407, 1.96990396, 1.36448092, 1.65050172, 1.36679484,
1.88497946, 1.36080395, 1.81383336, 1.54263596, 1.82019101,
1.14640677, 0.65992776, 1.44684069, 1.24374296, 1.76923697,
1.37159481, 1.09661808, 2.06670466, 1.59739686, 1.44933372,
1.70686039, 0.99293329, 0.79224166, 1.92847163, 0.68967432,
2.41267697, 1.22253856, 1.30212025, 1.34877076, 1.82864399,
1.43720914, 2.41155551, 1.40308105, 1.55894971, 1.39788224,
1.4069269 , 1.24825872, 1.52967965, 1.8463802 , 1.35757903,
1.68841576, 1.05647921, 2.06255563, 0.86785457, 0.74051683,
0.86246504, 0.93159555, 1.05686055, 1.22259543, 1.95923815,
2.07663797, 1.25836004, 0.99992641, 1.73060089, 1.35297436,
0.9886346 , 1.27441345, 1.14550238, 2.00797807, 1.54740973,
1.80841585, 2.04497283, 1.40529412, 2.04335478, 1.82012415,
1.01633159, 1.91198009, 1.53362666, 1.5670273 , 0.87215216,
1.01164201, 1.34994048, 1.11823014, 1.67101063, 1.6166815 ,
1.09432881, 1.65274026, 1.04763288, 1.29412816, 1.47059081,
0.95988715, 1.09967059, 1.63403312, 1.43200101, 1.23255506,
1.57896525, 1.68481131, 2.20179102, 1.49090524, 0.88465731,
1.49848901, 1.76120428, 1.21013197, 1.38558028, 1.94651303,
1.62826254, 1.59729457, 1.91020226, 1.41061969, 1.64590243,
1.68724435, 1.12295343, 1.65456974, 0.93417275, 1.56933947,
1.81908228, 1.74303448, 1.24218675, 1.4353728 , 1.60239699,
1.26817191, 1.55225422, 1.63062612, 1.42146777, 1.1726542 ,
1.9485992 , 1.49897682, 1.17451942, 1.01423384, 1.36232369,
1.76330664, 1.28112729, 1.76907132, 1.5450341 , 1.16901005,
0.4481761 , 1.08256353, 1.32337861, 1.82635545, 2.18060832,
2.08542958, 1.72678751, 1.65811799, 2.06226756, 0.65676932,
1.42533439, 1.5233514 , 1.57877291, 1.34381009, 1.84565896,
1.31488243, 2.15875776, 1.7917435 , 1.52514179, 1.84988618,
1.16702769, 1.2566659 , 1.93775394, 1.48422246, 0.81275899,
0.80972178, 1.54504395, 1.67047107, 1.4255102 , 1.03914139,
1.49905117, 1.90585224, 1.94419814, 1.87702087, 1.17132595,
1.16767503, 0.92545812, 1.30240543, 1.66728468, 1.27841999,
1.43545897, 1.99439952, 1.03229532, 1.20078765, 0.91849164,
2.10494513, 2.14666041, 1.6804755 , 2.11250748, 2.00403577,
1.60841894, 2.20306183, 1.34236012, 1.48328287, 1.13137855,
1.38601093, 1.74763679, 1.05433886, 1.06326583, 1.6509275 ,
1.5177439 , 1.93494944, 1.30072436, 1.96418181, 1.47946113,
1.78208035, 1.17734493, 1.03981158, 1.13766344, 1.22209118,
1.50368442, 1.62185037, 2.27607412, 1.06242796, 1.53950485,
1.62002305, 1.20846696, 1.35788256, 1.68903824, 1.73243762,
2.25131805, 1.29337909, 1.1704503 , 2.00418248, 1.58125277,
1.85715857, 1.44906238, 1.11395936, 1.27648016, 1.81507292,
2.52310839, 1.7308328 , 2.08743733, 1.14926832, 2.019069 ,
1.16078664, 1.56252167, 1.41232572, 0.98999018, 0.80934903,
0.96980471, 1.52876938, 1.06479208, 1.31834463, 1.65832826,
1.42146178, 1.9561115 , 1.16310461, 2.21646323, 1.0412194 ,
1.68832791, 2.36257425, 1.32101045, 1.25728909, 1.54846982,
1.72411618, 1.01877469, 0.82379229, 1.51763467, 1.79559066,
1.42125335, 2.30277292, 1.94233597, 1.10191683, 1.68848332,
1.62518851, 1.61994904, 1.18129106, 1.78693737, 1.70697347,
1.16468437, 1.72696201, 1.22833675, 1.98598007, 1.15451829,
1.43935782, 1.67034698, 1.32863299, 1.81156725, 2.00943525,
2.40020496, 1.4901117 , 1.10478916, 1.12536501, 1.6120362 ,
1.75108655, 1.18599724, 1.72462402, 1.76267758, 1.83320655,
1.09647001, 1.30055274, 1.54043957, 0.77924227, 0.77152645,
1.50244127, 1.79812322, 1.13941687, 0.35291121, 1.60652455,
1.57896198, 0.92196547, 1.81693339, 2.21072702, 1.48835434,
1.21356814, 2.24883167, 1.34935901, 0.78926687, 0.82997834,
1.26579909, 1.32808331, 2.00175711, 1.95995968, 1.23929407,
1.43031391, 1.05469712, 1.21317114, 1.49084462, 1.51862192,
1.71236049, 1.67819836, 1.27240101, 1.83222945, 1.90802218,
1.64153393, 1.18996029, 1.25777007, 2.55708106, 1.23721765,
1.71453747, 1.73741001, 1.08678483, 1.38527555, 1.51470701,
1.24175409, 0.99156607, 1.64086503, 1.72590692, 1.37217605,
2.073707 , 0.89075541, 1.41334665, 2.27725675, 1.1865692 ,
1.47303008, 1.2560904 , 1.88787993, 1.74305116, 1.6333613 ,
1.63335762, 0.49388104, 1.23681437, 0.53692182, 1.22824339,
0.76292418, 1.69849503, 0.92589493, 1.57849516, 1.91678051,
1.63325409, 1.43248634, 1.16836458, 1.31842898, 1.43710868,
1.6402075 , 1.36736783, 1.39327693, 1.0812158 , 2.00723136,
1.35803991, 2.1627488 , 2.12341854, 0.84851886, 1.91923389,
0.76996699, 1.74303569, 0.44509898, 1.72026222, 2.01386418,
1.08003328, 1.34477569, 1.10533382, 0.96994145, 2.19852246,
1.97605538, 1.59301917, 1.82924604, 1.56212813, 1.94835323,
1.57141078, 1.5054509 , 1.72687721, 2.00690011, 1.92529319,
1.43853406, 1.94123959, 1.78927523, 0.64043908, 1.66707223,
0.99173274, 2.00656888, 1.33208685, 1.77562991, 2.06631587,
1.51374642, 1.23724896, 1.98017486, 1.60559504, 1.01234142,
1.75459959, 2.07547197, 1.63751913, 1.85606954, 1.17926343,
1.67274263, 2.07457123, 1.57996246, 1.16679774, 0.63501066,
1.62565372, 1.39936525, 2.26337194, 1.56248251, 1.90086155,
2.31643861, 1.63928433, 1.68333444, 1.60019671, 1.04772703,
1.98293651, 2.02300711, 2.18794756, 1.51951032, 0.99452297,
1.87728471, 1.68020186, 0.73746721, 1.45070275, 1.07620892,
1.31750901, 1.56066008, 1.21223382, 1.37301306, 1.67132645,
1.01180417, 1.04463269, 1.827016 , 1.92832295, 1.15110249,
2.33138556, 1.22779065, 1.47645222, 1.50106393, 1.52521838,
1.0394237 , 1.70750946, 1.43926356, 1.47819922, 1.10256747,
2.27997926, 0.49996611, 1.964369 , 1.65116239, 1.44731014,
2.16631243, 1.96845672, 1.33431872, 2.02717684, 1.93655268,
1.64180069, 1.71739631, 1.38051891, 1.4364717 , 1.98780913,
1.58628969, 0.72876095, 1.05476247, 0.78767555, 1.6115925 ,
0.98176347, 1.68563339, 1.74183964, 1.11093133, 1.55465234,
1.49946743, 1.3751751 , 1.33098199, 1.14225007, 1.68179629,
1.34248706, 1.41760965, 1.16407453, 1.65852036, 1.65613244,
0.65182448, 1.67526897, 2.1027117 , 2.09009647, 0.99783751,
1.36993122, 0.98981812, 1.03911643, 1.5935461 , 1.29838472,
0.66831748, 1.1016819 , 1.83844026, 1.41140539, 1.40894711,
1.18166355, 0.75582509, 1.58957992, 1.59767891, 1.67954386,
2.02715156, 1.74406052, 2.07014822, 2.13592923, 1.28003736,
1.61252277, 1.55438622, 1.90093171, 1.75987071, 2.36166838,
1.26060982, 1.29932783, 1.71678983, 1.61881236, 1.8568486 ,
1.62483518, 1.37111666, 1.86635259, 1.64617955, 1.7273529 ,
1.71705431, 1.08387838, 1.58607817, 1.03390473, 1.04431186,
1.20840003, 1.84687332, 1.96671491, 1.41084629, 1.89274108,
1.22191699, 1.51477716, 1.15399196, 1.86516082, 1.74347267,
1.00843463, 1.68494499, 1.60784254, 1.39911197, 1.3817205 ,
1.85188123, 1.59069061, 1.93320487, 1.4614318 , 1.69600429,
0.79395593, 1.46501481, 2.00196172, 1.74819466, 1.01779144,
1.57553618, 1.85119082, 1.50010091, 1.56699365, 1.36427381,
1.24300679, 2.14286506, 1.59398389, 1.45568081, 1.04927128,
1.5371623 , 1.80048409, 1.75316067, 1.27759954, 1.39032567,
1.40418482, 0.91907227, 1.22689382, 1.58171714, 2.09277857,
0.81487917, 2.15399552, 1.69876473, 1.47055098, 1.57529422,
1.35261053, 1.72897618, 0.53266903, 1.66388025, 1.90484943,
2.04521388, 2.39594575, 1.80452427, 0.92714725, 1.27421888,
1.96265603, 1.24267467, 2.1990744 , 2.27470253, 1.85310056,
1.39121889, 1.10854567, 1.88746342, 1.44893753, 1.79395335,
1.73395611, 1.32483362, 1.63990176, 1.83818099, 1.16999171,
0.78796025, 1.60617802, 1.04841343, 1.43661427, 1.45943919,
1.53514566, 1.95043635, 1.65028446, 1.68635235, 1.3880886 ,
1.32706886, 1.46790275, 2.47685703, 1.37287167, 1.36341208,
1.62446536, 2.07240138, 1.68833411, 1.29406607, 1.53130423,
1.20555371, 1.85808303, 1.03028183, 1.6106548 , 1.75266787,
1.46135747, 1.68914659, 1.45331804, 1.38766034, 1.61615369,
1.57382249, 1.67065774, 1.86530354, 1.62003532, 1.48214774,
1.14711647, 1.5838087 , 1.94523722, 2.00244614, 1.63363049,
1.84898474, 1.29667099, 0.65949719, 1.09506918, 1.7700706 ,
1.0774581 , 1.53167818, 1.69386866, 1.14178004, 2.16057685,
1.23572318, 1.70430607, 1.70835561, 1.44159575, 1.17905709,
1.24395464, 2.02645419, 1.21944601, 1.38200902, 1.35488915,
1.97516424, 1.45373348, 1.54546202, 1.58411512, 1.58878201,
1.92357928, 1.99314196, 1.30315019, 1.5236839 , 2.54998211]),
'speed': 0.49844344476991026,
'offset': 0.6108466976506971}

Maybe my math intuitions are off, but if you’re using a log-normal CDF/hazard function to capture the rising structure of this build-up, that by definition yields a log-normal PDF for the observations, so rt~lognormal(...) captures precisely what you are deriving more laboriously by hand with the CDF/hazard stuff, no?

Well, I’m modeling the response time rt as the sum of two variables. One is lognormally distributed, and the other has a distribution proportional to a normal cdf. If that somehow reduces to a shifted lognormal, or is even reasonably approximated by a shifted lognormal, then the posterior predictive check would reveal that nicely.
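If it helps anyone follow along, here’s a quick NumPy simulation sketch of that sum. The parameter values are just the prior means from up-thread, and sample_E uses a grid inverse-CDF as an illustration rather than my actual generator:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def norm_cdf(x, mu, sigma):
    # normal CDF at x with mean mu and sd sigma
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def sample_E(n, mu_a, sigma_a, upper, n_grid=2000):
    """Sample the unobserved event time E from a density proportional to
    NormalCDF(t) - NormalCDF(0) on (0, upper), via grid inverse-CDF."""
    grid = np.linspace(0.0, upper, n_grid)
    dens = np.array(
        [norm_cdf(t, mu_a, sigma_a) - norm_cdf(0.0, mu_a, sigma_a) for t in grid]
    )
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    u = rng.random(n)
    return grid[np.searchsorted(cdf, u)]

# rt = E + lognormal reaction time; parameter values are the prior means
n = 900
E = sample_E(n, mu_a=1.35, sigma_a=0.4, upper=3.0)
d_rt = rng.lognormal(mean=np.log(0.35), sigma=0.2, size=n)
rt = E + d_rt
```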

Using the generator I posted, here are the distributions of response times (collapsed over all values of x) of the anticipatory response for 10 realizations of the generating process (i.e. draws of the mus and sigmas):
[plot: simulated anticipatory RT distributions]

The varying x can obscure the distribution, so here’s what it looks like with a constant x-value
[plot: same, at a constant x-value]

As you can see, none of these look much like a shifted lognormal. So, to use the shifted lognormal model would mean rejecting the structure of the data generating process I’m positing. That might be the correct thing to do, but I’m not willing to abandon it just yet.

Oh! Yes, I see now that I hadn’t looked properly at what you were doing with the CDF/hazard. I considered recommending a negative-log-normal with zero set to the cued time, but the more I think about it, the more your strategy of using the hazard function makes sense as a theory-informed approach.

It took some doing, but it seems that integrating out the hidden E event has worked. I ended up calculating integrals with trapezoid integration.

Why not use integrate_1d?

In tests, my integrator returned values very close to the built-in integrate_1d when parameters were pretty close to the typical set. When the parameters got out into their tails, though, integrate_1d would fail because it detected errors bigger than the tolerance, but those errors were on the order of 10^{-200}. I figure this had to do with underflow, so I tried to build some simple checks into my version that let it manage anyway.
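For the curious, the underflow-safe idea is to do the trapezoid rule entirely in log space. Here’s a pure-Python sketch (illustrative only, not my actual Stan implementation):

```python
from math import log, exp, inf

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -inf:
        return -inf
    return m + log(sum(exp(x - m) for x in xs))

def log_trapezoid(log_f, dx):
    """log of the trapezoid-rule integral of f over an evenly spaced grid,
    given only log f at the grid points. Working in log space lets values
    like exp(-500) contribute correctly instead of underflowing to zero."""
    n = len(log_f)
    # trapezoid weights: dx/2 at the endpoints, dx in the interior
    log_w = [log(dx / 2.0) if i in (0, n - 1) else log(dx) for i in range(n)]
    return log_sum_exp([lf + lw for lf, lw in zip(log_f, log_w)])
```

The same trick carries over to Stan via its built-in log_sum_exp.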

And thanks to @mike-lawrence for the tip to use Weibull priors instead of half-normal for the standard deviation parameters; that seems to keep the sampler from letting those get too close to zero. That solved my last few problems with divergences.
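For anyone wondering why this helps: weibull(2, 1) has zero density at zero, while a half-normal puts its mode there. A quick pure-Python comparison (the sigma = 0.2 default mirrors my original half-normal prior):

```python
from math import exp, pi, sqrt

def halfnormal_pdf(x, sigma=0.2):
    # half-normal density on x >= 0; maximized at exactly x = 0
    return sqrt(2.0 / pi) / sigma * exp(-x * x / (2.0 * sigma * sigma))

def weibull_pdf(x, k=2.0, lam=1.0):
    # Weibull density on x >= 0; for shape k > 1 it is zero at x = 0
    return (k / lam) * (x / lam) ** (k - 1.0) * exp(-((x / lam) ** k))
```

So the half-normal makes zero the single most credible prior value for the sd, whereas weibull(2,1) gently pushes the sampler away from the neck of the funnel.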

Honestly, I am not sure I have totally understood why this version works and the earlier one didn’t, but over a few cycles of parameter recovery, it is recovering parameters and passing the cmdstan diagnose() checks. And it’s about 4x faster. Here’s the model as re-written.
antic_only_gen.stan (2.2 KB)
antic_only_analysis.stan (5.8 KB)


The strong correlation between energy and \sigma_{r} indicates that the posterior geometry limits how well each Hamiltonian trajectory can explore \sigma_{r}. This is typically due to nasty correlations between \sigma_{r} and other parameters; given the hierarchical structure, the immediate candidate would be a funnel geometry between \sigma_{r} and the \log d_{rt}.

“Reparameterization” is often thrown around as a cure-all without the necessary context. In the case of a hierarchical model with a normal population model, the centered and non-centered parameterizations provide contrasting behaviors that are useful in different circumstances. If you had a normal population model then yes, a non-centered parameterization would be worth checking.

The problem here is that while it may look like you have a normal population model for the \log d_{rt} you actually don’t because of the constraint. An interval censored normal model does not admit the same reparameterization options.

Zero-avoiding prior models should only ever be used if one has actual domain expertise that conflicts with values near zero. Otherwise one is just forcing larger heterogeneity than is actually consistent with the observed data.

Indeed. Though I’d argue that some useful generalization can be achieved through the heuristic that any parameter that serves the role (as in this case) of defining the magnitude of measurement noise should never have a peaked-at-zero prior because real-world measurements always have noise.

Actually, now that I think on it a moment, this may be a rare case where I disagree with @betanalpha! At least, I’m sure he’d agree that in the ideal case the precise structure of every prior is well-informed by domain expertise. But absent that ideal, I’d assert that non-zero-peaked priors for variance terms should be the heuristic default and that zero-peaked ones are what need specific justification from domain expertise.

Certainly any variance term intended to reflect the magnitude of measurement noise should be non-zero-centered (the world is noisy and our measurements moreso), and I think that even the variance terms describing variability not at the observation level but variability of latent quantities should also usually default to non-zero-peaked. At least in the domains I’m used to seeing these models (psychology, where it’s ridiculous to posit that an effect manifests equally among all people).


If one can elicit a principled lower bound on the magnitude of the noise due to, for example, domain knowledge of the detectors then, sure, go for it. Without the explicit elicitation, however, one is vulnerable to unnecessarily excluding small but non-zero heterogeneities. The problem with a zero-avoiding prior is not the suppression of zero, it’s the suppression of model configurations near zero.

Can you give an example where you’ve encountered this? I’ve only ever experienced the pathology reported by OP related to zero-peaked priors on variances, and have a hard time imagining a scenario where, given reasonable weakly informed scale choice, non-zero-peaked priors for variances would yield misleading inference.