TypeError: Object of type int64 is not JSON serializable

pd.DataFrame.min() is returning a 0 as an int64. That doesn't bother my laptop (Ubuntu 21.04, PyStan 3.2, etc.), but on my server (Ubuntu 20.04, PyStan 3.2, etc.) my Stan model won't even compile:


import stan
import numpy as np

schools_code = """
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""

schools_data = {"J": 8,
                'test': np.int64(0),
                "y": [28,  8, -3,  7, -1,  1, 18, 12],
                "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}
posterior = stan.build(schools_code, data=schools_data, random_seed=1)

results in:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/papers/focal-responses-paper/bin/pystan3_debugging.py in <module>
     26                 "y": [28,  8, -3,  7, -1,  1, 18, 12],
     27                 "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}
---> 28 posterior = stan.build(schools_code, data=schools_data, random_seed=1)
     29 
     30 fit = posterior.sample(num_chains=4, num_samples=1000)

~/.local/lib/python3.8/site-packages/stan/model.py in build(program_code, data, random_seed)
    448     """
    449     # `data` must be JSON-serializable in order to send to httpstan
--> 450     data = json.loads(DataJSONEncoder().encode(data))
    451 
    452     async def go():

/usr/lib/python3.8/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/usr/lib/python3.8/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

~/.local/lib/python3.8/site-packages/stan/model.py in default(self, obj)
     25         if isinstance(obj, np.ndarray):
     26             return obj.tolist()
---> 27         return json.JSONEncoder.default(self, obj)
     28 
     29 

/usr/lib/python3.8/json/encoder.py in default(self, o)
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181 

TypeError: Object of type int64 is not JSON serializable
> /usr/lib/python3.8/json/encoder.py(179)default()
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181

Feature? Bug?

Btw, .astype(int) in pandas gives me this int64.
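
For what it's worth, here is a minimal illustration of how the int64 sneaks in and the cast that sidesteps it (the frame and column name are made up):

import pandas as pd

df = pd.DataFrame({"J": [8, 8, 8]})  # hypothetical frame, just to show the dtype
j = df["J"].min()
print(type(j))       # <class 'numpy.int64'> -- this is what json chokes on
print(type(int(j)))  # <class 'int'>         -- a plain Python int serializes fine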

As a naive, very applied, simple user, I have to say, to be honest, sorry, that my productivity on my main project has somewhat ground to a halt since I followed the recommendations everywhere to switch from pystan2 to pystan3. (Update: okay, finally learning to use venv may keep me more sane while developing for both at once.)


A deliberate design choice?

Have you had other problems using pystan or is the problem the new workflow?

Yes, but I'm a little too overwhelmed to itemize them all without making mistakes, and this is not the right part of the forum for general beginner confusions. I had built a lot of infrastructure around pystan2, so it has felt very difficult to be stalled, going in circles, and falling back to pystan2 whenever I'm completely stuck, with many different codes of mine no longer working. I've already sought help here about managing jobs with pystan3 (and it was suggested I should look at cmdstan!), I find the available examples skimpy, it runs my model significantly slower than pystan2 for identical data and Stan code, and not being able to pass the integer 0 in a natural way (!) makes me feel like I must be the sole non-developer using it. In your language, maybe that's "yes, just the new workflow". :)


PyStan 3 running a lot slower than CmdStan is a bug, assuming both are using the same version of Stan and Math. Please open an issue if you’ve found something like this.

It’s really hard to compare PyStan 2 and PyStan 3 performance directly because the Stan and Stan Math libraries have changed so much.

Definitely a feature. Sorry this wasn’t fixed in PyStan 2.

Stan uses int32 for integers. See https://mc-stan.org/docs/2_27/reference-manual/numerical-data-types.html#integers for details. So not accepting int64 seems reasonable to me. PyStan doesn’t want to be in the business of maintaining type conversion rules.

Also, I think it makes sense to be careful with packages outside the standard library. If we start adding special handling for numpy’s types, it’s difficult to defend not adding special handling for tensorflow’s types, torch’s types, jax’s types, etc.

Thanks for the report. If you’d like to add an FAQ entry on this, I would be glad to review it.

My guess is this problem would also arise if the data had numpy's int32 type, not just the larger types. The built-in JSON library will fail on more or less any non-Python base type.
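
A quick illustration with plain json, nothing PyStan-specific (numpy's int32 fails just like int64):

import json
import numpy as np

json.dumps({"J": 8})            # fine: '{"J": 8}'
json.dumps({"J": np.int32(8)})  # TypeError: Object of type int32 is not JSON serializable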

Tangentially related, but I think it's almost always worth supporting numpy. A lot of numerical/scientific code treats numpy as a second standard library, and a lot of other libraries (like Jax et al.) use numpy's standards for type conversions and array APIs. Of the three you listed, to my knowledge all of them use numpy dtypes or compatible ones, meaning it is generally sufficient to call np.asarray on them and proceed as if they were pure numpy objects. These 8 lines in cmdstanpy are enough to cover most reasonable cases we've encountered with different scalar types, pandas objects, etc.
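
In that spirit, here is a rough sketch of what such a normalization step can look like (not the actual CmdStanPy code, just the general np.asarray idea; the function name is made up):

import numpy as np

def normalize_value(v):
    """Coerce numpy scalars, arrays, and array-likes to plain Python objects."""
    arr = np.asarray(v)        # works for numpy, pandas, and most array-API objects
    if arr.ndim == 0:
        return arr.item()      # 0-d array / numpy scalar -> Python int or float
    return arr.tolist()        # nested lists of Python ints/floats

data = {"J": np.int64(8), "y": np.array([28, 8, -3, 7, -1, 1, 18, 12])}
clean = {k: normalize_value(v) for k, v in data.items()}
# {'J': 8, 'y': [28, 8, -3, 7, -1, 1, 18, 12]}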


I agree about numpy int32, given that it’s Stan’s integer type. I wouldn’t object to a patch which makes PyStan 3 handle this without complaint.

Any sense in having a quick error check higher up the stack, so that a user like me gets an error message from Stan rather than from a dependency, saying what the problem is? One could catch the error at line 450 of model.py and point people to a FAQ, or mention the list of allowed types for the data dict, … (see the rough sketch at the end of this post).
I would have been comforted to hear from Stan rather than from json, though I now see the comment in the error trace I got is in exactly the right place:

# `data` must be JSON-serializable in order to send to httpstan

I have no reason to think that kind of weirdness is consistent with whatever philosophy and style you have all set up. Just throwing out the idea.
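
For concreteness, purely a sketch of the idea, reusing the encoder visible in the traceback above (not PyStan's actual code):

import json
import numpy as np

class DataJSONEncoder(json.JSONEncoder):
    # same behaviour as the encoder in the traceback: arrays become lists
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

def encode_data(data):
    # `data` must be JSON-serializable in order to send to httpstan
    try:
        return json.loads(DataJSONEncoder().encode(data))
    except TypeError as exc:
        # re-raise with a Stan-facing hint instead of the bare json error
        raise TypeError(
            f"{exc}. Stan `data` values must be Python ints, floats, "
            "sequences of these, or numpy ndarrays (see the `build` docs)."
        ) from exc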

Thanks, all

I agree that it’s frustrating. I wouldn’t object to adding an isinstance check in this particular case, although doing so is definitely unpythonic (duck-typing, and all that).

More details:

Allowable types are indicated in the API documentation for build: API Reference — pystan 3.2.0 documentation

Here’s the type for data: Dict[str, Union[int, float, Sequence[Union[int, float]], numpy.ndarray]]

This is hard to read and will be improved eventually when we can assume everyone uses Python 3.10. Here's the type in Python 3.10: dict[str, int | float | Sequence[int | float] | numpy.ndarray].

Looking at that type, I can see that a bare numpy.int64 would not be accepted as an instance of the type.
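
A one-line check makes the point (on Python 3, numpy integer scalars are not subclasses of int):

import numpy as np

isinstance(np.int64(0), int)  # False, so it does not match the declared type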


Update: Created a pull request which might (marginally) help things. docs: Avoid suggesting support for numpy types by riddell-stan · Pull Request #325 · stan-dev/pystan · GitHub


CmdStanPy has a function write_stan_json which should take care of this problem.
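
Something along these lines, assuming a reasonably recent CmdStanPy:

import numpy as np
from cmdstanpy import write_stan_json

schools_data = {"J": np.int64(8),
                "y": np.array([28, 8, -3, 7, -1, 1, 18, 12]),
                "sigma": [15, 10, 16, 11, 9, 11, 10, 18]}
write_stan_json("schools_data.json", schools_data)  # numpy types are converted on the way out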


I think I have a clear example of why we do not want to encourage people to think that PyStan will automatically convert integer-like types from arbitrary third-party libraries.

(Note that the word “numpy” does not occur anywhere in PyStan’s documentation.)

Here’s numpy’s int64, which is well-behaved:

import numpy as np
import numbers
isinstance(np.int64(3), numbers.Integral)  # True

Here’s jax, another extremely popular library for scientific computing:

import jax.numpy as jnp
import numbers
isinstance(jnp.int64(3), numbers.Integral)  # False

In short, it’s hard to predict how third-party packages will behave.

Numpy integers (sub-dtypes of numpy.integer) will now be silently converted to Python ints, solving this problem. Here’s the PR: feat: Convert numpy integers in `data` to Python int by riddell-stan · Pull Request #327 · stan-dev/pystan · GitHub
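
Roughly in spirit, the encoder's default hook gains one more case; the class name below is made up, see the PR for the actual change:

import json
import numpy as np

class IntFriendlyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):  # numpy int scalars -> Python int
            return int(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

json.dumps({"J": np.int64(8)}, cls=IntFriendlyEncoder)  # '{"J": 8}'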
