
JsonSerializerOptions.WriteIndividualObjects #38344

Closed
juliusfriedman opened this issue Jun 24, 2020 · 4 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@juliusfriedman
Contributor

juliusfriedman commented Jun 24, 2020

Background and Motivation

Streaming large JSON payloads to a client often requires streaming the response in order to achieve ideal latency during loading.

E.g. one usually knows the total data size and the current position within it, and can typically display a progress element client side during such streaming.

This is especially hard when you have an object that contains nested objects. Unfortunately, depending on your settings, TimeSpan is serialized as a nested object; consider:

public class MyModel
{
    public string Name { get; set; } = "Whatever";
    public int Age { get; set; } = 777;
    public DateTime Birthday { get; set; } = new DateTime(1986, 11, 12);
    public TimeSpan SomeDuration { get; set; } = TimeSpan.Zero;
}

Resulting Json would be:

{
  "name": "Whatever",
  "age": 777,
  "birthday": "1986-11-12T00:00:00.0000000",
  "someDuration": {"ticks":0,"days":0,"hours":0,"milliseconds":0,"minutes":0,"seconds":0,"totalDays":0,"totalHours":0,"totalMilliseconds":0,"totalMinutes":0,"totalSeconds":0}
}

You can see that the duration is more likely to be split across chunks than anything else in this model. What usually happens is that part of a key is written at the end of a chunk, i.e. tot or total or totalS, which means I have to go back and find wherever my object started (which can be really hard depending on the nesting), delimit/slice there, attempt to partially parse what I do have (if anything), and then wait for the rest before proceeding.

Proposed API

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions .. {
+       public bool WriteIndividualObjects { get; init; }
    }
}
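A hypothetical usage sketch of the proposed property (WriteIndividualObjects does not exist today; this only illustrates the intended shape, and `response` stands in for an ASP.NET Core HttpResponse):

```csharp
using System.Text.Json;

var options = new JsonSerializerOptions
{
    // Hypothetical member: flush to the underlying stream only at
    // object boundaries instead of whenever the buffer fills.
    WriteIndividualObjects = true,
    DefaultBufferSize = 8192
};

// ASP.NET would pick these options up via its usual JSON configuration;
// the serializer would then emit each array element as its own chunk.
await JsonSerializer.SerializeAsync(response.Body, models, options);
```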

Usage Examples

I have a list of 1–N objects (or just 1 really large object).

I want to continue to let ASP.NET handle response serialization, and I don't want to manually write a JsonConverter for each individual type to get the semantics I want.

I want the JsonSerializer to respect WriteIndividualObjects; although it cannot be guaranteed that the client has an adequate buffer to receive such data in a single receive, writes would only occur in the chunks specified.

The goal is to avoid, when chunking the data, that the receiver has to deal with partial objects, if at all possible:

Extreme Case

[{...................... | Chunk (partial)
....},{...........}| Chunk (Split, End and Start Partial)
,{.......................| Chunk (partial)
.}] |EOS

Typical Case

[{......},{.........},{....... | Chunk (2 objects 1 Partial)
....},{...........}| Chunk (Split, End and Start Partial)
,{.......................| Chunk (partial)
.}] |EOS

I would like to avoid splitting JSON objects (and especially keys) between chunks if at all possible.

I realize this is highly dependent on the client's receive buffer, among other things; however, if there were at least a strategy that did not involve manually writing each object into the response stream, I would be satisfied.

After setting the proposed property, I would expect writes/flushes to the stream to occur at object boundaries, so it would be less likely that I have to deal with partial nested object graphs (yes, I know that if the object is very large this is hard to avoid).

Result Case

[{......},| Start of list and 1 complete object and delimiter
{.........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{.....................}]|EOS 1 Complete object and end of list

Perhaps the start and end of list should also be small writes to allow for even easier parsing. (Especially on fast connections), i.e.:

[|Start of list
{......},| 1 complete object and delimiter
{.........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{...........},| 1 Complete object and delimiter
{.....................} 1 Complete object (user knows end of list must be next)
]|end of list
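For comparison, the closest approximation today is to bypass automatic response serialization and write each element manually, flushing at object boundaries. A minimal sketch using existing APIs (the method name is illustrative; note that for a single very large element, SerializeAsync may still flush mid-object once DefaultBufferSize is exceeded):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

static async Task WriteIndividuallyAsync<T>(
    Stream output, IEnumerable<T> items, JsonSerializerOptions options)
{
    // Start of list as its own small write, matching the second
    // "Result Case" layout above.
    await output.WriteAsync(Encoding.UTF8.GetBytes("["));

    bool first = true;
    foreach (var item in items)
    {
        if (!first)
            await output.WriteAsync(Encoding.UTF8.GetBytes(","));
        first = false;

        // Serialize one complete object, then flush so the chunk
        // boundary falls on an object boundary.
        await JsonSerializer.SerializeAsync(output, item, options);
        await output.FlushAsync();
    }

    await output.WriteAsync(Encoding.UTF8.GetBytes("]"));
    await output.FlushAsync();
}
```

This is exactly the manual per-type plumbing the proposal is trying to avoid.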

Alternative Designs

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions .. {
+       public int FlushSize { get; init; } // or char[] / Span<char> FlushMarker(s), etc.
    }
}

This seems to complement DefaultBufferSize nicely; in theory I would then set DefaultBufferSize to something like 8192 and FlushSize to 4096, but that still can't guarantee I don't write a chunk containing invalid JSON, i.e.: {.......,"SomeKe followed by y": null} in the next chunk.

Consider Also

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions .. {
+       public void OnStartWrite(); // Called when a single complete object or primitive is about to be written to the underlying writer
+       public void OnEndWrite();   // Called when a single complete object or primitive has been completely written to the underlying writer
+       public void OnStartRead();  // Called when a single complete object or primitive is about to be read from the underlying reader
+       public void OnEndRead();    // Called when a single complete object or primitive has been completely read from the underlying reader
    }
}

This would allow manual flushing of the underlying Stream, or whatever other action might be required.

These could be offered in addition to (or technically instead of) the aforementioned property, though the latter would require sizes to be known in advance; they are probably useful in their own right.
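A hypothetical consumption sketch, assuming the callbacks surfaced as assignable delegates rather than the plain methods in the diff above (none of these members exist today, and `responseStream` is illustrative):

```csharp
using System.IO;
using System.Text.Json;

Stream responseStream = /* e.g. HttpResponse.Body */ Stream.Null;
var options = new JsonSerializerOptions();

// Hypothetical: OnEndWrite does not exist; shown here as a delegate,
// which is the likelier shape such a hook would take.
options.OnEndWrite = () =>
{
    // A complete object just finished writing; flush so the chunk
    // boundary lands on an object boundary.
    responseStream.Flush();
};
```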

One could also use WebSockets / SignalR, though that requires somewhat of a paradigm shift in the API and overhead for running the endpoint.

One could also use a client-side parser such as oboe.js, which provides a variety of options and is very small.

I believe there are also several variations of this approach, most of which are spec-compliant, e.g.:

  1. Using NewLine to delimit chunks of complete data only: {....}\r\n or [\r\n{....}\r\n,\r\n{....}\r\n,\r\n]\r\n
  2. Using NewLine to delimit all pieces of an object, e.g. WriteIndented (just need to ensure a partial object key or value is never emitted; in a large graph, stop at the last valid key that fits in the buffer and start the next write at that key)
  3. Using a special object to delimit completed data, e.g. [{...},true,{....}] (true in this case is the delimiter; still error-prone without other changes, as the delimiter itself may currently be split, no matter what you choose)
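Variation 1 can be sketched today with existing APIs by writing newline-delimited JSON, one complete document per line, flushed per line (the method name is illustrative):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

static async Task WriteNdjsonAsync<T>(
    Stream output, IEnumerable<T> items, JsonSerializerOptions options)
{
    foreach (var item in items)
    {
        // Each line is a single, complete JSON document.
        await JsonSerializer.SerializeAsync(output, item, options);
        await output.WriteAsync(Encoding.UTF8.GetBytes("\r\n"));
        await output.FlushAsync();
    }
}
```

A client can then split the stream on newlines and parse each line independently, never having to handle a partial object, at the cost of the payload no longer being one JSON array.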

Risks

Low, depending on design choice; the OnWrite methods could get chatty very quickly.

Benefits

Using fetch and ReadableStream, one could use this setting to avoid complex parsing logic that must deal with incomplete JSON data while streaming.

Other notes

It may be beneficial to control the Tab, NewLine, and other characters emitted by the readers and writers through the options.

I would not be opposed to FlushIndividualObjects as the name, as that is perhaps more in line with the actual semantics of its operation.

I would also not be opposed to FlushDelimiters as the name, if that approach were taken.

@juliusfriedman juliusfriedman added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 24, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jun 24, 2020
@juliusfriedman juliusfriedman changed the title JsonSerializerOptions JsonSerializerOptions.WriteIndividualObjects Jun 24, 2020
@danmoseley
Member

@juliusfriedman could you remove the "Please provide the specific public API signature diff that you are proposing. For example:" line above maybe then we could use this as an example of a good proposal?

@juliusfriedman
Contributor Author

juliusfriedman commented Jun 26, 2020

@danmosemsft, sorry, I have done that. I am in the midst of a few deadlines, so I apologize for that as well as for the quality of my other proposals. I will get to that this weekend. LMK if you need anything else.

@danmosemsft I Updated the proposal, please let me know if anything else needs to be clarified

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Jun 29, 2020
@layomia layomia added this to the Future milestone Jun 29, 2020
@doggy8088

Refer to #55583. I think there should be a way to avoid chunking.

@eiriktsarpalis
Member

I'm not sure what "Objects" means in the proposed WriteIndividualObjects name. What if you're serializing an array of tree-like documents that contain an arbitrary degree of nesting? Should the serializer flush every time it finishes writing an object at depth n? But then its parent objects would always be rendered incomplete in the flushed chunk.

We actually debated exposing something similar when designing IAsyncEnumerable serialization support, where the user could force flushing after iterating a fixed interval of elements. We ended up not implementing the feature, instead relying on DefaultBufferSize being the only mechanism for controlling flush intervals.

Given the above, I don't believe we should implement this feature.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 21, 2021
Projects
None yet
Development

No branches or pull requests

6 participants