Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in JSON.stringify replacer type. #36

Closed
MicahZoltu opened this issue Jan 6, 2024 · 13 comments
Closed

Bug in JSON.stringify replacer type. #36

MicahZoltu opened this issue Jan 6, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@MicahZoltu
Copy link

Code:

JSON.stringify(value, (_, value) => value)

Type error:

No overload matches this call.
  Overload 1 of 3, '(value: unknown, replacer?: (string | number)[] | null | undefined, space?: string | number | null | undefined): StringifyResult<unknown>', gave the following error.
    Argument of type '(this: JSONComposite<unknown>, _: string, value: unknown) => unknown' is not assignable to parameter of type '(string | number)[]'.
  Overload 2 of 3, '(value: unknown, replacer: (this: JSONComposite<unknown>, key: string, value: unknown) => JSONValueF<unknown>, space?: string | number | null | undefined): string', gave the following error.
    Type 'unknown' is not assignable to type 'JSONValueF<unknown>'.
  Overload 3 of 3, '(value: unknown, replacer: (this: JSONComposite<unknown>, key: string, value: unknown) => JSONValueF<unknown> | undefined, space?: string | number | null | undefined): string | undefined', gave the following error.
    Type 'unknown' is not assignable to type 'JSONValueF<unknown> | undefined'.

I'm pretty sure this code is sound/correct, as you can always return the input value as output from the replacer function (it is the expected behavior for anything you don't want to replace), so this should not result in an error. The value parameter should be a subset type of the return type for the replacer function, but as things are currently defined A is not assignable to JSONValueF<A>.

One maybe solution that appears to work is changing:

type ToJSON<A> = A extends { toJSON(...args: any): infer T } ? T : A;

to

type ToJSON<A> = A extends { toJSON(...args: any): infer T } ? T : JSONValueF<A>;

Another maybe solution that appears to work is changing:

value: ToJSON<A>

to

value: JSONValueF<A>

I have no idea if these changes have affects outside of JSON.stringify, or if they are the right way to address this problem. They are just me poking around trying to figure out the issue.

@uhyo uhyo added the bug Something isn't working label Jan 6, 2024
@aaditmshah
Copy link
Contributor

I'm pretty sure this code is sound/correct, as you can always return the input value as output from the replacer function (it is the expected behavior for anything you don't want to replace), so this should not result in an error.

This is false. For example, you can't stringify a bigint.

JSON.stringify(0n, (_, value) => value); // TypeError: Do not know how to serialize a BigInt

The JSON.stringify function is supposed to convert JSON values into strings. A definition of a JSON value is precise.

type JSONValue = string | number | boolean | { [key: string]: JSONValue } | JSONValue[] | null;

However, sometimes we want to stringify a value which isn't strictly JSON. For example, we might want to stringify an object which contains a Set or a Map. To do so, we first need to convert the value into a JSON value. We can convert any value to JSON using an anamorphism. An anamorphism is a function which recursively generates a data structure. In our case, we want to recursively generate a JSON value. Here's what an anamorphism for JSON looks like.

const ana = <A>(value: A, replacer: (value: A) => JSONValueF<A>): JSONValue => {
  const result = replacer(value);
  if (result === null || typeof result !== 'object') return result;
  if (Array.isArray(result)) return result.map((value) => ana(value, replacer));
  return Object.fromEntries(
    Object.entries(result).map(([key, value]): [string, JSONValue] => [
      key,
      ana(value, replacer),
    ])
  );
};

As you can see, the ana function converts any value of type A into a JSONValue. However, in order for it to do that you need to provide it a replacer function which converts a value of type A into a JSONValueF<A>. The ana function recursively generates the JSON value, and the replacer function is a single step in this recursive process. That means the replacer function needs to convert a value of type A into an "almost JSON" value. This "almost JSON" value is JSONValueF<A>, i.e. the F-algebra of JSON values, and it's defined as follows.

type JSONValueF<A> = string | number | boolean | { [key: string]: A } | A[] | null;

As you can see, JSONValueF<A> is almost the same as JSONValue. The only difference is that we replace the recursion in JSONValue with the non-recursive type A. In fact, JSONValue is the fixed point of JSONValueF<A>. This means that, JSONValue is the same type as JSONValueF<JSONValue>. Let's write that down, because it's important.

type JSONValue = JSONValueF<JSONValue>

// Therefore, substituting `JSONValueF<JSONValue>` for `JSONValue` we get

type JSONValue = JSONValueF<JSONValueF<JSONValue>>

// Therefore, substituting `JSONValueF<JSONValue>` for `JSONValue` we get

type JSONValue = JSONValueF<JSONValueF<JSONValueF<JSONValue>>>

// Therefore, substituting `JSONValueF<JSONValue>` for `JSONValue` we get

type JSONValue = JSONValueF<JSONValueF<JSONValueF<JSONValueF<JSONValue>>>>

As you can see, repeated applications of JSONValueF to itself ad-infinitum is the same as its fixed point, i.e. JSONValue. So, if you think of JSONValue as an onion, you can think of JSONValueF as a single layer of that onion. It's a single recursive step.

Anyway, coming back to the ana function. If you give ana a replacer function that performs a single step of the recursion by converting a value of type A into a JSONValueF<A>, then ana can use that replacer function to recursively build a JSONValue.

And, once you have a JSONValue you can convert it to a string using JSON.stringify. So, JSON.stringify is really the combination of two different functions. It's a combination of ana and stringify. First, it recursively converts a value into a JSONValue and then it converts the JSONValue into a string.

The value parameter should be a subset type of the return type for the replacer function, but as things are currently defined A is not assignable to JSONValueF<A>.

The value parameter is not supposed to be assignable to JSONValueF<A>. At least not for all possible types A. The job of the replacer function is not to replace a value with another value. The job of the replacer function is to generate one layer of a JSON value. It's unfortunate that this function is called replacer because it's a misnomer. It's not actually replacing anything. It would be more accurate to call the replacer function something like toJSONStep instead.

So, how do you actually use the replacer function of JSON.stringify? I wrote an example of how it can be used in the following thread.

mattpocock/ts-reset#124

Hope that clears your doubts. Let me know if there's anything else I can elucidate for you.

@MicahZoltu
Copy link
Author

I see the point you are making, but it is still not clear to me how to actually put this to use properly. In my actual use case, I want to do a normal JSON.stringify, but I would like to add support for bigint and Uint8Array. I otherwise want it to behave exactly like the built-in JSON.stringify. My code (which works as desired at runtime) looks something like this:

export function jsonStringify(value: unknown, space?: string | number | undefined): string {
    return JSON.stringify(value, (_, value) => {
		if (typeof value === 'bigint') return `0x${value.toString(16)}n`
		if (value instanceof Uint8Array) return `b'${Array.from(value).map(x => x.toString(16).padStart(2, '0')).join('')}'`
		return value
    }, space)
}

My expectation is that other than adding support for bigint and Uint8Array the behavior of JSON.stringify remains unchanged, including throwing an exception if it tries to serialize something that it doesn't know how to, and including all of the weird idiosyncrasies around default serialization of maps, functions, classes, etc. My intent isn't to support all possible types, only enhance the default serialization/deserialization with a couple of types specific to my application.

I think the place my mental model is not aligning with your mental model is that I am of the opinion that the replacer function is not expected to handle all possible inputs, it is only meant to enhance the default JSON.stringify with some subset of possible inputs that otherwise wouldn't serialize the way one expects/wants. It sounds like you are of the belief that the replacer function should handle every possible input or the replacer itself should throw/return undefined, but it should never return an invalid JSON item?

Even if I accept that the replacer function should never return an invalid JSON value, it still unclear to me how I can define my jsonStringify function without having to manually handle every possible built-in supported type or typecast. The following "works" but it feels like I'm doing something wrong when my goal is to replicate the built-in JSON.stringify behavior (including errors).

export function jsonStringify(value: unknown, space?: string | number | undefined): string | undefined {
    return JSON.stringify(value, (_, value) => {
		if (typeof value === 'bigint') return `0x${value.toString(16)}n`
		if (value instanceof Uint8Array) return `b'${Array.from(value).map(x => x.toString(16).padStart(2, '0')).join('')}'`
		if (typeof value === 'string') return value
		if (typeof value === 'number') return value
		if (typeof value === 'boolean') return value
		if (value === null) return value
		if (Array.isArray(value)) return value
		if (typeof value === 'object') return value as Record<string, unknown> // TODO: validate that all keys are strings
		return undefined
    }, space)
}

Tangential question: Are there any cases where JSON.stringify will throw an error other than bigint? In my testing, all other types of variables seem to either serialize or turn into undefined. I could probably get on board with the replacer needing to handle erroneous cases (like bigint), but I don't think TypeScript's type system really supports that since you can't do unknown except bigint.

@aaditmshah
Copy link
Contributor

aaditmshah commented Jan 7, 2024

The easiest way to solve this problem is to strengthen the input type. Consider defining your own JSON-like type.

export type MyJSONValue =
  | JSONPrimitive
  | bigint
  | Uint8Array
  | MyJSONValue[]
  | { [key: string]: MyJSONValue };

Now you can easily define your jsonStringify function. Just change the type of value from unknown to MyJSONValue.

export const jsonStringify = (value: MyJSONValue, space?: string | number): string =>
  JSON.stringify(
    value,
    (_, value) => {
      if (typeof value === 'bigint') return `0x${value.toString(16)}n`;
      if (value instanceof Uint8Array)
        return `b'${[...value]
          .map((x) => x.toString(16).padStart(2, '0'))
          .join('')}'`;
      return value;
    },
    space
  );

This has several advantages.

  1. You don't need to add extra code to handle all the cases.
  2. Because TypeScript uses structural typing, as opposed to nominal typing, it just works out of the box.
  3. If the input value has an unexpected type then TypeScript will flag it for you. This is a good thing, because you can then decide how you want to handle the issue. For example, you could decide to add an extra case to MyJSONValue. Or, you might find out that some other part of your application has a bug which is causing TypeScript to complain. There are a lot of benefits to using a more precise type than unknown.

However, if you don't want to strength the input type then you can weaken the output type. I see that you already figured out how to make jsonStringify type check by weakening the output type. There are two disadvantages of weakening the output type.

  1. You have to handle a lot of extra cases.
  2. The result will be of type string | undefined instead of just string.

Both of these problems can be easily solved. To solve the first problem, we just need to create a new type predicate.

export const isJSONValueF = (value: unknown): value is JSONValueF<unknown> =>
  typeof value === 'string' ||
  typeof value === 'number' ||
  typeof value === 'boolean' ||
  typeof value === 'object';

Now, instead of writing several cases you can write a single case for JSON values.

Next, to solve the second problem, we just throw an error if the return value of JSON.stringify is undefined.

export const jsonStringify = (value: unknown, space?: string | number): string => {
  const result = JSON.stringify(
    value,
    (_, value) => {
      if (typeof value === 'bigint') return `0x${value.toString(16)}n`;
      if (value instanceof Uint8Array)
        return `b'${[...value]
          .map((x) => x.toString(16).padStart(2, '0'))
          .join('')}'`;
      return isJSONValueF(value) ? value : undefined;
    },
    space
  );
  if (result === undefined)
    throw new TypeError('The input value contains a function or a symbol');
  return result;
};

Personally, I prefer strengthening the input type instead of weakening the output type. Strengthening the input type helps catch more bugs at compile time and doesn't require any extra JavaScript code.

Tangential question: Are there any cases where JSON.stringify will throw an error other than bigint?

Yes, it will also throw an error if your input value has circular references.

@MicahZoltu
Copy link
Author

The easiest way to solve this problem is to strengthen the input type. Consider defining your own JSON-like type.

In situations where you are stringifying a concretely typed thing, I agree that this approach is certainly the way to go. In my use case, this function is used to serialize thrown data for error reporting, where I don't have control over what is thrown (libraries can throw arbitrarily shaped things) and I need a reliable way to serialize that to a string. I can't think of how I could handle cycles other than by building my own serializer from scratch, but I can trivially handle bigints, and I can serialize common objects (e.g., Uint8Array) into something readable.

Yes, it will also throw an error if your input value has circular references.

I believe the type definitions you have here don't protect against circular references either, so it seems that if one handles the bigint case, everything else should be able to be returned from the replacer function and stuff will work properly?


I can appreciate, and support, the desire to give stronger type checking when you are serializing something with a shape known at compile time and I wouldn't want a change that breaks that. Perhaps an additional overload can be supplied for when value is explicitly of type unknown, and in this case the replacer function would just be (string, unknown) => unknown? This would retain the strict validation that the replacer function properly handles everything in the case where you have a well known input type, while still allowing for the case where you are serializing unknown JS values without needing typecasts in the replacer function.

@aaditmshah
Copy link
Contributor

aaditmshah commented Jan 8, 2024

The easiest way to solve this problem is to strengthen the input type. Consider defining your own JSON-like type.

In situations where you are stringifying a concretely typed thing, I agree that this approach is certainly the way to go. In my use case, this function is used to serialize thrown data for error reporting, where I don't have control over what is thrown (libraries can throw arbitrarily shaped things) and I need a reliable way to serialize that to a string.

If you can't strengthen the input then you have to weaken the output. That means, instead of writing return value you have to write return isJSONValueF(value) ? value : undefined. It's a minor change.

I can't think of how I could handle cycles other than by building my own serializer from scratch, but I can trivially handle bigints, and I can serialize common objects (e.g., Uint8Array) into something readable.

You don't handle cycles. Trying to stringify a cyclic data structure is an error. And, you don't handle errors. You fix them. So, don't worry about "handling" cyclic data structures.

Yes, it will also throw an error if your input value has circular references.

I believe the type definitions you have here don't protect against circular references either

Correct. The TypeScript type system is not sophisticated enough to be able to guard against circular references.

so it seems that if one handles the bigint case, everything else should be able to be returned from the replacer function and stuff will work properly?

I don't understand the question. Could you elucidate? Understood the question. Yes, if the return type was something like SetDifference<unknown, bigint>, i.e. any value except for bigint, then it would be safe. However, safety is not the only issue here. We can be precise about the return type, and being precise has additional advantages which I explain in the next section.

Perhaps an additional overload can be supplied for when value is explicitly of type unknown, and in this case the replacer function would just be (string, unknown) => unknown? This would retain the strict validation that the replacer function properly handles everything in the case where you have a well known input type, while still allowing for the case where you are serializing unknown JS values without needing typecasts in the replacer function.

I don't see any reason to add an additional overload for unknown inputs. What you originally thought was a bug, isn't a bug. The type definition of JSON.stringify is correct. Additionally, the specific problem that you're trying to solve is easily solvable by weakening the output.

Finally, the type (key: string, value: unknown) => unknown is plain wrong. The unknown return type allows you to return anything as an output. However, we statically know that the output can't be anything. For example, we know that the output definitely can't be a bigint because that's an error. However, even if it wasn't an error we should still restrict the output to JSONValueF<unknown> | undefined because the expectation is that replacer must return a JSON-like value. In vanilla JavaScript it's all right to return non-JSON values because JSON.stringify would just ignore them. However, in TypeScript we can be more precise about the return type.

Additionally, by forcing developers to return undefined instead of non-JSON values we can be more precise about the return type of JSON.stringify too. If the replacer function never returns undefined then the output of JSON.stringify is just string. However, if the replacer function does return undefined then the output of JSON.stringify is correctly string | undefined. If we change the return type of replacer to unknown then the output of JSON.stringify would always be string | undefined. We wouldn't be able to take advantage of the the overload with the return type string.

@MicahZoltu
Copy link
Author

I think our disagreement comes down to a disagreement on what the "correct" function of the replacer is. This is valid code IMO, but it sounds like you are of the opinion it is not:

JSON.stringify({ a: new Map(), b: new Set(), c: () => {}, d: 5n }, x => x)

IMO, the job of the replacer isn't to definitely handle all possible inputs, but instead it is meant to supplement default JSON.stringify behavior with additional behaviors. If I understand your position correctly, the replacer function should handle all provided input subtypes, only delegating handling to the built-in JSON.stringify for things that are well supported by the bulit-in behavior (which is limited to the subset of string | number | boolean | object.

If I follow your suggestion, are you confident that all default behavior would be retained? This includes things like arrays with function elements serializing as null instead of undefined or omitted and similar unexpected behavior. I think the thing I'm struggling with is that I have used JS enough to not be confident that your proposed isJSONValueF(value) ? value : undefined will correctly mimic built-in behavior.

@aaditmshah
Copy link
Contributor

I think our disagreement comes down to a disagreement on what the "correct" function of the replacer is. This is valid code IMO, but it sounds like you are of the opinion it is not:

JSON.stringify({ a: new Map(), b: new Set(), c: () => {}, d: 5n }, x => x)

No, this code looks correct to me too.

IMO, the job of the replacer isn't to definitely handle all possible inputs, but instead it is meant to supplement default JSON.stringify behavior with additional behaviors. If I understand your position correctly, the replacer function should handle all provided input subtypes, only delegating handling to the built-in JSON.stringify for things that are well supported by the bulit-in behavior (which is limited to the subset of string | number | boolean | object.

No, I'm not of the opinion that the replacer function should handle all possible inputs either. In fact, it's perfectly fine if the replacer function doesn't handle any input at all. For example, I think the following code is correct.

JSON.stringify(class FooBar {}, () => Math.E);

If I follow your suggestion, are you confident that all default behavior would be retained? This includes things like arrays with function elements serializing as null instead of undefined or omitted and similar unexpected behavior. I think the thing I'm struggling with is that I have used JS enough to not be confident that your proposed isJSONValueF(value) ? value : undefined will correctly mimic built-in behavior.

Yes, it works as expected. Playground Link.

const result = JSON.stringify([() => {}], (_, x) => typeof x === 'function' ? undefined : x);

console.log(result); // "[null]"

@MicahZoltu
Copy link
Author

Yes, it works as expected. Playground Link.

Yeah, I checked that one before posting but still included it as an example of a weird edge case. I'm curious if all such weird edge cases are handled correctly with your proposed solution, or if there will still be some differences in the output?

JSON.stringify({ a: new Map(), b: new Set(), c: () => {}, d: 5n }, x => x)

No, this code looks correct to me too.

Sorry, I meant this (my previous example was definitely wrong), which fails to typecheck with better-typescript-lib:

JSON.stringify({ a: new Map(), b: new Set(), c: () => {}, d: 5n }, (_key, value) => value)

@aaditmshah
Copy link
Contributor

Yeah, I checked that one before posting but still included it as an example of a weird edge case. I'm curious if all such weird edge cases are handled correctly with your proposed solution, or if there will still be some differences in the output?

Yes, all edge cases are handled correctly. If you return a non-JSON value from the replacer function then it's treated as undefined by JSON.stringify anyway. The internal SerializeJSONProperty function literally returns undefined for non-JSON values. So, return isJSONValueF(value) ? value : undefined always produces the same result as return value.

Sorry, I meant this (my previous example was definitely wrong), which fails to typecheck with better-typescript-lib:

JSON.stringify({ a: new Map(), b: new Set(), c: () => {}, d: 5n }, (_key, value) => value)

It also throws a TypeError. Hence, better-typescript-lib correctly prevents you from making such mistakes.

@MicahZoltu
Copy link
Author

It also throws a TypeError. Hence, better-typescript-lib correctly prevents you from making such mistakes.

If you know the type in advance, yes (and I can appreciate your argument that the code is incorrect because of that). These do not throw a runtime error though, and are also correct IMO (but this library prevents me from writing them):

JSON.stringify({ a: new Map(), b: new Set(), c: () => {} }, (_key, value) => value)
JSON.stringify({ a: new Map(), b: new Set(), c: () => {}, d: 5n }, (_key, value) => typeof value === 'bigint' ? value.toString() : value)

@aaditmshah
Copy link
Contributor

There's a flaw in your reasoning though. You're assuming that what you want is also what everybody else wants. Consider the following data.

const exhibitA = {
  foo: new Map([[0, 1], [1, 2], [2, 4], [3, 8], [4, 16], [5, 31]])
};

We want to serialize it. Hence, we use JSON.stringify with a replacer that converts every Map into an isomorphic object. Then the software requirements change, and now we need to serialize the following data too.

const exhibitB = {
  bar: new Set([1, 2, 4, 8, 16, 31])
};

However, we forget to update the replacer function. If we're using your definition of JSON.stringify then we won't get any compile-time type error. Hence, JSON.stringify would silently and incorrectly stringify exhibitB as {foo:{}}. This violates The Zen of Python, which states, "Errors should never pass silently."

This is a bug which could have easily been caught at compile time. So, there's real value in forcing developers to only return a value which satisfies JSONValueF<A>. The objective of the replacer is to convert non-JSON values into JSON values so that they can be serialized. Hence, it makes perfect sense to force developers to return a JSON-like value.

If you don't want to return a JSON-like value, then you can explicitly return undefined instead. It doesn't make sense to reduce type safety for everybody just to satisfy your desire to avoid an if statement. Besides, as The Zen of Python states, "Explicit is better than implicit." Hence, IMHO explicitly returning undefined for non-JSON values makes the code more understandable.

@MicahZoltu
Copy link
Author

Your reference to The Zen of Python makes me chuckle given how amazingly un-typed (dynamic) Python is. 😆

I agree with you that if a user is leveraging the structural typing you have created here, then a change as you suggested should result in an error and I would not want any change to support my use case to break that. This is why ideally I would like (but maybe it isn't possible) an overload specifically for unknown inputs, which is a pretty common scenario for (de)serializing JSON in my experience (e.g., data read off disk, data received over a non-JSON wire protocol, error data from libraries, interfacing with poorly typed libraries, etc.).

Are you of the opinion that there isn't a way to add an overload that is a bit more permissive specifically for the unknown input case?

@aaditmshah
Copy link
Contributor

This is why ideally I would like (but maybe it isn't possible) an overload specifically for unknown inputs

You've not provided any convincing reason to add an overload for unknown inputs.

Are you of the opinion that there isn't a way to add an overload that is a bit more permissive specifically for the unknown input case?

Indeed. I see no way to add an overload for unknown inputs without sacrificing type safety or impairing the experience of other developers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants