Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

with Multiline option still get \n at the end of json text #40

Open
ethanvc opened this issue May 13, 2024 · 7 comments
Open

with Multiline option still get \n at the end of json text #40

ethanvc opened this issue May 13, 2024 · 7 comments

Comments

@ethanvc
Copy link

ethanvc commented May 13, 2024

	var buf bytes.Buffer
	encoder := jsontext.NewEncoder(&buf, jsontext.Multiline(false))
	encoder.WriteToken(jsontext.ObjectStart)
	encoder.WriteToken(jsontext.ObjectEnd)
	s := buf.String()
	if s == "{}\n" {
                 // will go here, which is not consistent with Multiline option
		fmt.Println("here")
	}

This behavior is not expected by library users.
Expected output: "{}" without the tail \n.

@dsnet
Copy link
Collaborator

dsnet commented May 13, 2024

The Multiline option used to be called Expand, which somewhat avoided this oddity. \cc @veqryn

The documentation for Multiline says:

the JSON output should expand to multiple lines,
where every JSON object member or JSON array element appears on
a new, indented line according to the nesting depth.

In other words, it affects the emission of newlines within top-level JSON values, and does not affect the emission of newlines between top-level JSON values.

It is a valid use-case to have output where each top-level JSON value be non-multiline, but separated by newlines:

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

Such a format is usually called "newline delimited JSON".

Furthermore, the current semantic matches the behavior of the v1 "json" package.

Thus, the current semantic is reasonable, though the naming of the option may need to be adjusted to be clear that it only takes effect within a top-level value.

The Marshal and MarshalWrite functions internally avoid the trailing newline because they use an unexported OmitTopLevelNewline option. Perhaps exposing this option is what you really want?

@dsnet
Copy link
Collaborator

dsnet commented May 13, 2024

One reason we did not expose OmitTopLevelNewline is because of the following edge case:

func main() {
    var buf bytes.Buffer
    encoder := jsontext.NewEncoder(&buf, jsontext.OmitTopLevelNewline(true))
    encoder.WriteToken(jsontext.Uint(98765))
    encoder.WriteToken(jsontext.Uint(43210))
    fmt.Println(buf.String())
}

What should this do?

  • Report an error when writing 43210? (i.e., we only permit encoding a single top-level token)
  • Emit 9876543210 in which case it is impossible to parse back the original input?
  • Emit 98765 43210, where we special-case putting a whitespace in-between JSON numbers to avoid any ambiguities?

@dsnet
Copy link
Collaborator

dsnet commented May 13, 2024

An alternative API is an option that emits newlines between top-level values, instead of after every top-level value. The distinction is subtle, but accomplishes what you would want:

between after
"{}" "{}\n"
"{}\n{}" "{}\n{}\n"
"0" "0\n"
"0\n1" "0\n1\n"

@ethanvc
Copy link
Author

ethanvc commented May 14, 2024

One reason we did not expose OmitTopLevelNewline is because of the following edge case:

func main() {
    var buf bytes.Buffer
    encoder := jsontext.NewEncoder(&buf, jsontext.OmitTopLevelNewline(true))
    encoder.WriteToken(jsontext.Uint(98765))
    encoder.WriteToken(jsontext.Uint(43210))
    fmt.Println(buf.String())
}

What should this do?

  • Report an error when writing 43210? (i.e., we only permit encoding a single top-level token)
  • Emit 9876543210 in which case it is impossible to parse back the original input?
  • Emit 98765 43210, where we special-case putting a whitespace in-between JSON numbers to avoid any ambiguities?

I think the first option is good and if we want write multi top json value in one buffer, we should do it explicitly.

@ethanvc
Copy link
Author

ethanvc commented May 16, 2024

An alternative API is an option that emits newlines between top-level values, instead of after every top-level value. The distinction is subtle, but accomplishes what you would want:

between after
"{}" "{}\n"
"{}\n{}" "{}\n{}\n"
"0" "0\n"
"0\n1" "0\n1\n"

Yes, this is what i want, and i think is more reasonable.

@ethanvc
Copy link
Author

ethanvc commented May 23, 2024

so, what's the plan for this topic? @dsnet

@dsnet
Copy link
Collaborator

dsnet commented May 23, 2024

I agree this is something to address, but I think we'll defer addressing it until the v2 work lands in stdlib. There's already enough other work to do in preparation for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants