Pkl generates wrong declaration dimension

This appears to be a bug. I have the following declarations:

data {
	int N;
	int T;
	int D;
	int S;
	row_vector[D+1] z[N, T];
	row_vector[D] X[N, T];
	int y[N, 2, T]; // outcome
	int transD;
	int dphi;
	vector[dphi] mu_phi;
	matrix[dphi,dphi] Omega_phi;

When I run the code, it complains that I declared z with the wrong dimensions:
initialization; variable name=z; position=0; dims declared=(34,80,6); dims found=(100,80,6)

N is definitely 100. If I re-pkl the .stan file, sometimes it complains that the declared dimensions are (33,80,6) instead.
Finally, I forced the declaration to be

	row_vector[D+1] z[100, T]; 

then it runs without the complaint.

Is that indeed a bug?

What is pkl?


Thanks. Does this only apply to PyStan?

Yes, it’s part of running ep-stan. It’s all in Python.

Mind moving the thread to Interfaces -> PyStan? I don’t think this has to do with stanc based on what info you’ve provided.

Really? I would have thought that if it misinterpreted the dimension N, the problem rests with the compiler.

AFAIK, the pickling / unpickling process doesn’t run the stanc compiler a second time.

stanc is a compiler and it translates the .stan file to C++. Once the compiler is compiled, it will always create the same C++ file given the same .stan input. There really isn’t a chance for it to interpret differently based on multiple runs. (Unless there’s something really funky going on in Python, which is possible, but I think if you’re talking about behavior just when pickling / unpickling, then stanc isn’t the culprit.)
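One way to check the pickling side of this without involving Stan at all is to confirm that a pickle round trip leaves the program text byte-for-byte unchanged. The sketch below is only an illustration: it uses a plain dict as a hypothetical stand-in for the model object (not PyStan's StanModel), and the key name `model_code` and the shortened Stan source are assumptions.

```python
import hashlib
import pickle


def code_digest(model):
    """Return a SHA-256 hex digest of the Stan source stored in the model."""
    return hashlib.sha256(model["model_code"].encode("utf-8")).hexdigest()


# Hypothetical stand-in for a model object; a real StanModel holds much more.
model = {"model_code": "data { int N; int T; int D; row_vector[D+1] z[N, T]; }"}

# Round trip through pickle, as happens when a .pkl file is written and read back.
restored = pickle.loads(pickle.dumps(model))

# Matching digests mean pickling did not alter the program text.
same_code = code_digest(model) == code_digest(restored)
print("source unchanged after round trip:", same_code)
```

If the digests match but the reported dimensions still differ between runs, the data passed to the model is a more likely suspect than the compiled program.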

Sorry, I believe my description was confusing. What I am saying is that I re-pickled the Stan file and ended up with a different result, meaning there is something random about this. The entire process, however, is that the pkl file is passed to the client code and model.sampling is called on it. Downstream there is actually a Cython compilation step that takes the pkl as input. This is as far as I understand the process. Nevertheless, it seems clear that the compilation process misread the N dimension in the z[N, T] declaration.

Could it be that N is changing depending on the subprocess?

Add prints to your code and see what they say.
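A minimal sketch of that kind of check, assuming the data is passed as a dict of nested Python lists: print N next to the actual shape of z right before the sampling call in each process. The helper names `nested_shape` and `report_sizes` and the toy sizes are hypothetical, not part of PyStan or ep-stan.

```python
def nested_shape(x):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(x, (list, tuple)):
        shape.append(len(x))
        if len(x) == 0:
            break
        x = x[0]
    return tuple(shape)


def report_sizes(data):
    """Print N next to the shape of z and report whether they agree."""
    shape = nested_shape(data["z"])
    print("N =", data["N"], "| shape of z =", shape)
    return data["N"] == shape[0]


# Toy data with hypothetical sizes: N=4, T=3, D=2, so each row of z has D+1=3 entries.
toy = {"N": 4, "z": [[[0.0] * 3 for _ in range(3)] for _ in range(4)]}
consistent = report_sizes(toy)
```

Calling something like `report_sizes` on the exact dict handed to each worker would show whether N and the first dimension of z disagree there.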

I am quite sure this is a bug.

All I did was pass in a second parameter, Nz, with an identical value; now the declaration works correctly.

Sorry, I still don’t know what you’re trying to do / have done. Can you put down the full Stan program and a reproducible example in the form of a script or even just Python code?

I didn’t see an Nz in the original code, so it’s really hard to tell how this is relevant. (I’m not saying that it isn’t; I’m just saying that there isn’t enough information to help.)

After playing with the code, I can tell that the offending part is where I used [D + 1] as the declared dimension. The compiler (or the process of instantiating StanModel, pickling, then compiling with Cython) gets confused about both the N and D + 1 dimensions for that variable.

Feel free to test that with any code you have. Just declare a dimension as [something + 1].

Mind posting a minimal example that shows the bug? I still don’t understand what you do to trigger the bug.

I really don’t understand this. The Stan language only allows for one declaration per variable.

Sorry, but please give us a minimal working example that shows each step you are doing. Otherwise, this is not going anywhere.

Is this a normal PyStan issue or something else? I’m confused about why you are doing the pickle/re-pickle step. Is it for the .stan file?

The 33/34 vs 100 would indicate that you split your data in three parts.
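The arithmetic behind that observation can be sketched as follows. This only shows how 100 items divide into three near-equal parts; how ep-stan actually partitions the data is an assumption here, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n, parts):
    """Sizes of `parts` near-equal chunks of n items; larger chunks come first."""
    base, extra = divmod(n, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


sizes = split_sizes(100, 3)
print(sizes)  # [34, 33, 33]
```

A worker that receives one such chunk would see 33 or 34 rows, which matches both values reported in the error messages.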

Hey guys, I apologize for creating this confusing issue, and thanks for painstakingly following the thread. I realize now that this bug only occurs in the combination of Stan + ep-stan + the modifications I personally made to ep-stan. I am the only person who can really get to the root of it, and I can only do that if I learn a lot more about Cython, since the bug appears in the generated Cython. For now, I am happy to have found the workaround, which is eliminating the use of [D + 1].

The 33/34 vs 100 would indicate that you split your data in three parts.

This is good insight, I didn’t notice that. It will be a good hint if I dig into the root cause.

I really don’t understand this. The Stan language only allows for one declaration per variable

This is referring to the declaration

row_vector[D+1] z[N, T];

You can see that even though it is a single declaration, it involves three dimension variables.
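To make the three dimensions concrete, here is a small sketch of how the sizes in that declaration line up with the tuple in the error message: the array sizes come first, then the row_vector length. The values T = 80 and D = 5 are inferred from the reported dims (…,80,6) and are not stated explicitly in the thread; `declared_dims` is a hypothetical helper.

```python
def declared_dims(N, T, D):
    """Dims of `row_vector[D+1] z[N, T]`: array sizes first, then the vector length."""
    return (N, T, D + 1)


# T = 80 and D = 5 are inferred from the error message, not given in the thread.
print(declared_dims(34, 80, 5))   # (34, 80, 6), the dims the model declared
print(declared_dims(100, 80, 5))  # (100, 80, 6), the dims found in the data
```

Only the first entry differs between the two tuples, so the D + 1 part of the declaration is evaluated consistently and the disagreement is in N.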