TypeError: Object of type int64 is not JSON serializable

pd.DataFrame.min() returns 0 as an int64, which doesn’t bother my laptop (Ubuntu 21.04, PyStan 3.2, etc.).
But on my server (Ubuntu 20.04, PyStan 3.2, etc.), my Stan model won’t even compile:

import stan
import numpy as np

schools_code = """
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""

schools_data = {"J": 8,
                'test': np.int64(0),
                "y": [28,  8, -3,  7, -1,  1, 18, 12],
                "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}
posterior = stan.build(schools_code, data=schools_data, random_seed=1)

results in:

TypeError                                 Traceback (most recent call last)
~/papers/focal-responses-paper/bin/pystan3_debugging.py in <module>
     26                 "y": [28,  8, -3,  7, -1,  1, 18, 12],
     27                 "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}
---> 28 posterior = stan.build(schools_code, data=schools_data, random_seed=1)
     30 fit = posterior.sample(num_chains=4, num_samples=1000)

~/.local/lib/python3.8/site-packages/stan/model.py in build(program_code, data, random_seed)
    448     """
    449     # `data` must be JSON-serializable in order to send to httpstan
--> 450     data = json.loads(DataJSONEncoder().encode(data))
    452     async def go():

/usr/lib/python3.8/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/usr/lib/python3.8/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

~/.local/lib/python3.8/site-packages/stan/model.py in default(self, obj)
     25         if isinstance(obj, np.ndarray):
     26             return obj.tolist()
---> 27         return json.JSONEncoder.default(self, obj)

/usr/lib/python3.8/json/encoder.py in default(self, o)
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')

TypeError: Object of type int64 is not JSON serializable
> /usr/lib/python3.8/json/encoder.py(179)default()
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')

Feature? Bug?

Btw, .astype(int) in pandas gives me this int64.
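(For anyone hitting the same wall: one workaround is to cast numpy scalars to plain Python ints before calling stan.build. A sketch, not an officially sanctioned PyStan recipe:)

```python
import json
import numpy as np

# Sketch of a workaround: int() unwraps the numpy scalar, so the
# standard-library json encoder accepts the data dict.
schools_data = {"J": 8,
                "test": int(np.int64(0)),
                "y": [28, 8, -3, 7, -1, 1, 18, 12],
                "sigma": [15, 10, 16, 11, 9, 11, 10, 18]}
print(type(schools_data["test"]))  # <class 'int'>
json.dumps(schools_data)           # no longer raises TypeError
```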

As a naive, very applied, simple user, I have to say, to be honest, sorry: my productivity on my main project has somewhat ground to a halt since I followed the recommendations everywhere to switch from PyStan 2 to PyStan 3. (Update: okay, finally learning to use venv may keep me sane, though I’m now developing for both at once.)


A deliberate design choice?

Have you had other problems using pystan or is the problem the new workflow?

Yes, but I’m a little too overwhelmed to itemize them all without making mistakes, and this is not the right part of the forum for general beginner confusions. I had built a lot of infrastructure around PyStan 2, so it’s felt very difficult to be stalled, going in circles, and falling back to PyStan 2 whenever I’m completely stuck and many of my scripts no longer work. I’ve already sought help here about managing jobs with PyStan 3 (and it was suggested I look at CmdStan!), I find the available examples skimpy, it runs my model significantly slower than PyStan 2 for identical data and Stan code, and not being able to pass the integer 0 in a natural way (!) makes me feel like I must be the sole non-developer using it. In your language, maybe that’s “yes, just the new workflow”. :)


PyStan 3 running a lot slower than CmdStan is a bug, assuming both are using the same version of Stan and Math. Please open an issue if you’ve found something like this.

It’s really hard to compare PyStan 2 and PyStan 3 performance directly because the Stan and Stan Math libraries have changed so much.

Definitely a feature. Sorry this wasn’t fixed in PyStan 2.

Stan uses int32 for integers. See https://mc-stan.org/docs/2_27/reference-manual/numerical-data-types.html#integers for details. So not accepting int64 seems reasonable to me. PyStan doesn’t want to be in the business of maintaining type conversion rules.

Also, I think it makes sense to be careful with packages outside the standard library. If we start adding special handling for numpy’s types, it’s difficult to defend not adding special handling for tensorflow’s types, torch’s types, jax’s types, etc.

Thanks for the report. If you’d like to add an FAQ entry on this, I would be glad to review it.

My guess is this problem would also arise if the data had numpy’s int32 type, not just the larger types. The builtin JSON library will fail on more or less any non-Python base type.
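(A quick check, assuming numpy is installed, confirms this: the standard-library encoder rejects numpy’s int32 just as it rejects int64.)

```python
import json
import numpy as np

# No numpy fixed-width integer subclasses Python's int, so json's
# isinstance(o, int) check fails for all of them.
for value in (np.int32(8), np.int64(8)):
    try:
        json.dumps({"J": value})
    except TypeError as e:
        print(e)  # Object of type int32 / int64 is not JSON serializable
```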

Tangentially related, but I think it’s almost always worth supporting numpy. A lot of numerical/scientific code treats numpy as a second standard library, and a lot of other libraries (like Jax et al) use numpy’s standards for type conversions and array APIs. Of the three you listed, to my knowledge all of them use numpy dtypes or compatible ones, meaning it is generally sufficient to call np.asarray on them and proceed as if it is a pure numpy object. These 8 lines in cmdstanpy are enough to cover most reasonable cases we’ve encountered with different scalar types, pandas objects, etc
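(The np.asarray trick described above can be sketched in a few lines; `coerce` here is a hypothetical helper name for illustration, not the actual cmdstanpy code:)

```python
import numpy as np

def coerce(value):
    # np.asarray accepts numpy scalars/arrays and anything exposing a
    # compatible array interface (pandas objects, jax arrays via
    # __array__, etc.), so one code path covers many libraries.
    arr = np.asarray(value)
    if arr.shape == ():       # zero-dimensional: a scalar
        return arr.item()     # -> plain Python int/float
    return arr.tolist()       # -> nested Python lists

print(coerce(np.int64(0)))    # 0
print(coerce(np.arange(3)))   # [0, 1, 2]
```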


I agree about numpy int32, given that it’s Stan’s integer type. I wouldn’t object to a patch which makes PyStan 3 handle this without complaint.

Any sense in having a quick error check higher up the stack, so that a user like me gets an error message from Stan rather than from a dependency, saying what the problem is? One could catch the error at line 450 of model.py and point people to a FAQ, or mention the list of allowed types for values in the data dict, …
I would have been comforted to hear from Stan rather than from json, though I now see that the comment in the error trace was in exactly the right place:

# `data` must be JSON-serializable in order to send to httpstan

I have no idea whether that kind of check is consistent with whatever philosophy and style you have all set up. Just throwing out the idea.

Thanks, all

I agree that it’s frustrating. I wouldn’t object to adding an isinstance check in this particular case, although doing so is definitely unpythonic (duck-typing, and all that).

More details:

Allowable types are indicated in the API documentation for build: API Reference — pystan 3.2.0 documentation

Here’s the type for data: Dict[str, Union[int, float, Sequence[Union[int, float]], numpy.ndarray]]

This is hard to read and will be improved eventually, when we can assume everyone uses Python 3.10. Here’s the type in Python 3.10: dict[str, int | float | Sequence[int | float] | numpy.ndarray].

Looking at that type, I can see that a bare numpy.int64 would not be accepted as an instance of the type.
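(Indeed, assuming numpy is installed, a bare numpy scalar fails an isinstance check against int in Python 3 — interestingly, the float case behaves differently:)

```python
import numpy as np

# numpy's fixed-width integers do not subclass Python's
# arbitrary-precision int, so np.int64 fails the declared type...
print(isinstance(np.int64(3), int))        # False
# ...while np.float64 does subclass Python's float.
print(isinstance(np.float64(3.0), float))  # True
```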

Update: I created a pull request which might (marginally) help things: docs: Avoid suggesting support for numpy types by riddell-stan · Pull Request #325 · stan-dev/pystan · GitHub


CmdStanPy has a write_stan_json function which should take care of this problem.


I think I have a clear example of why we do not want to encourage people to think that PyStan will automatically convert integer-like types from arbitrary third-party libraries.

(Note that the word “numpy” does not occur anywhere in PyStan’s documentation.)

Here’s numpy’s int64, which is well-behaved:

import numpy as np
import numbers
isinstance(np.int64(3), numbers.Integral)  # True

Here’s jax, another extremely popular library for scientific computing:

import jax.numpy as jnp
import numbers
isinstance(jnp.int64(3), numbers.Integral)  # False

In short, it’s hard to predict how third-party packages will behave.

Numpy integers (sub-dtypes of numpy.integer) will now be silently converted to Python ints, solving this problem. Here’s the PR: feat: Convert numpy integers in `data` to Python int by riddell-stan · Pull Request #327 · stan-dev/pystan · GitHub
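(The conversion presumably keys off numpy’s scalar hierarchy; a sketch of the idea, not the actual PR code:)

```python
import numpy as np

def normalize(value):
    # All numpy integer dtypes (int8..int64, uint8..uint64) are
    # sub-dtypes of np.integer, so one check covers them all.
    if isinstance(value, np.integer):
        return int(value)
    return value

print(normalize(np.int64(0)))  # 0, now a plain Python int
print(normalize(1.5))          # 1.5, untouched
```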
