Rusty byte strings in RON, deprecate base64 (byte) strings (#438)
* Switch from base64 to rusty byte strings, deprecate base64 support

* Add the Value::Bytes variant

* Extend Value tests for Value::String and Value::Bytes

* Include byte strings in the RON grammar

* Fix ASCII escape decoding for strings and byte strings

* Fix byte string error display for #462 test

* Fix byte string error test

* Add a CHANGELOG entry

* Added a deprecation error test for v0.10

* Add tests for v0.9 optional base64 byte string support

Co-authored-by: Sebastian Dröge <sebastian@centricular.com>

* Add an example for using base64-encoded bytes with ron

* Fix formatting in README

* Remove outdated extension docs

* Add tests for unescaped and raw byte strings

* Fix fuzzer-found issue with serialising invalid UTF-8 byte strings

* Fix fuzzer-found issue with `br#` being parsed as the identifier `br`

* Fix parsing of byte escapes in UTF-8 strings to produce proper Unicode characters

* Fix fuzzer-found interaction with unwrap_variant_newtypes

* Add support for strongly typed byte literals

* Add missing Value serialising tests

* Add test to show that #436 is solved with strongly typed base64 user-side types

* Add more coverage tests

---------

Co-authored-by: Sebastian Dröge <sebastian@centricular.com>
juntyr and sdroege committed Sep 1, 2023
1 parent bc723e1 commit ba66c8e
Showing 22 changed files with 1,271 additions and 114 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -27,6 +27,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Allow `ron::value::RawValue` to capture any whitespace to the left and right of a ron value ([#487](https://github.com/ron-rs/ron/pull/487))
- Fix serialising reserved identifiers `true`, `false`, `Some`, `None`, `inf`[`f32`|`f64`], and `Nan`[`f32`|`f64`] ([#487](https://github.com/ron-rs/ron/pull/487))
- Disallow unclosed line comments at the end of `ron::value::RawValue` ([#489](https://github.com/ron-rs/ron/pull/489))
- **Format-Breaking:** Switch from base64-encoded to Rusty byte strings, still allow base64 deserialising for now ([#438](https://github.com/ron-rs/ron/pull/438))
- Add support for byte literals as strongly typed unsigned 8-bit integers ([#438](https://github.com/ron-rs/ron/pull/438))

## [0.8.1] - 2023-08-17

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -24,6 +24,7 @@ default = []
integer128 = []

[dependencies]
# FIXME @juntyr remove base64 once old byte strings are fully deprecated
base64 = "0.21"
bitflags = { version = "2.0", features = ["serde"] }
indexmap = { version = "2.0", features = ["serde"], optional = true }
@@ -37,3 +38,4 @@ serde_bytes = "0.11"
serde_json = "1.0"
option_set = "0.2"
typetag = "0.2"
bytes = { version = "1.3", features = ["serde"] }
1 change: 1 addition & 0 deletions README.md
@@ -120,6 +120,7 @@ While data structures with any of these attributes should roundtrip through RON,

* Numbers: `42`, `3.14`, `0xFF`, `0b0110`
* Strings: `"Hello"`, `"with\\escapes\n"`, `r#"raw string, great for regex\."#`
* Byte Strings: `b"Hello"`, `b"with \x65\x73\x63\x61\x70\x65\x73\n"`, `br#"raw, too"#`
* Booleans: `true`, `false`
* Chars: `'e'`, `'\n'`
* Optionals: `Some("string")`, `Some(Some(1.34))`, `None`
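
To illustrate the standard byte string form listed above, here is a minimal sketch that renders raw bytes as a Rust-style byte string literal using only the standard library. It is a stand-in written under assumptions, not ron's serialiser: `to_byte_string_literal` is a hypothetical helper, and it relies on `<[u8]>::escape_ascii` (available since Rust 1.60), whose escaping choices may differ in detail from what ron actually emits.

```rust
/// Render `bytes` as a Rust-style byte string literal such as `b"Hello"`.
///
/// Hypothetical helper for illustration only; ron's own serialiser may
/// pick different escapes or prefer raw byte strings in some cases.
fn to_byte_string_literal(bytes: &[u8]) -> String {
    // `escape_ascii` keeps printable ASCII as-is, escapes `\t`, `\r`, `\n`,
    // `\'`, `\"` and `\\`, and writes every other byte as `\xNN`.
    format!("b\"{}\"", bytes.escape_ascii())
}
```

With this helper, the bytes `[1, 2, 3, 4]` become `b"\x01\x02\x03\x04"`, while plain ASCII text such as `Hello` stays readable as `b"Hello"`.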
30 changes: 27 additions & 3 deletions docs/grammar.md
@@ -40,7 +40,7 @@ For the extension names see the [`extensions.md`][exts] document.
## Value

```ebnf
-value = integer | float | string | char | bool | option | list | map | tuple | struct | enum_variant;
+value = integer | byte | float | string | byte_string | char | bool | option | list | map | tuple | struct | enum_variant;
```
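
Since `byte` and `byte_string` values both begin with the letter `b`, a parser has to look one or two bytes ahead to tell them apart; the commit message also mentions a fuzzer-found case where `br#` was read as the identifier `br`. The following is a rough sketch of that lookahead, assuming byte literals are written `b'…'` as in Rust. `BPrefix` and `classify_b_prefix` are hypothetical names invented for this example and are not ron's internals.

```rust
/// What a token that begins with `b` might turn out to be.
/// Hypothetical type for illustration; not part of ron.
#[derive(Debug, PartialEq)]
enum BPrefix {
    /// `b'x'`: a strongly typed byte literal.
    ByteLiteral,
    /// `b"..."`: a standard byte string.
    ByteString,
    /// `br"..."` or `br#"..."#`: a raw byte string.
    RawByteString,
    /// Anything else, e.g. an identifier such as `bar` or `br`.
    Other,
}

/// Classify the start of `input` by peeking at its first few bytes.
/// A real parser would still have to validate the rest of the token.
fn classify_b_prefix(input: &[u8]) -> BPrefix {
    match input {
        [b'b', b'\'', ..] => BPrefix::ByteLiteral,
        [b'b', b'"', ..] => BPrefix::ByteString,
        [b'b', b'r', b'"', ..] | [b'b', b'r', b'#', ..] => BPrefix::RawByteString,
        _ => BPrefix::Other,
    }
}
```

Checking for the raw prefix before falling back to identifier parsing is what keeps `br#"..."#` from being split into the identifier `br` followed by stray input.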

## Numbers
@@ -60,6 +60,8 @@ unsigned_octal = "0o", digit_octal, { digit_octal | "_" };
unsigned_hexadecimal = "0x", digit_hexadecimal, { digit_hexadecimal | "_" };
unsigned_decimal = digit, { digit | "_" };
byte = ascii | ("\\", (escape_ascii | escape_byte));
float = ["+" | "-"], ("inf" | "NaN" | float_num), [float_suffix];
float_num = (float_int | float_std | float_frac), [float_exp];
float_int = digit, { digit | "_" };
@@ -74,9 +76,13 @@ float_suffix = "f", ("32", "64");
```ebnf
string = string_std | string_raw;
string_std = "\"", { no_double_quotation_marks | string_escape }, "\"";
-string_escape = "\\", ("\"" | "\\" | "b" | "f" | "n" | "r" | "t" | ("u", unicode_hex));
-string_raw = "r" string_raw_content;
+string_escape = "\\", (escape_ascii | escape_byte | escape_unicode);
+string_raw = "r", string_raw_content;
string_raw_content = ("#", string_raw_content, "#") | "\"", { unicode_non_greedy }, "\"";

+escape_ascii = "'" | "\"" | "\\" | "n" | "r" | "t" | "0";
+escape_byte = "x", digit_hexadecimal, digit_hexadecimal;
+escape_unicode = "u", digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal]]]]];
```

> Note: Raw strings start with an `r`, followed by n `#`s and a quotation mark
@@ -93,6 +99,24 @@ Also see [the Rust document] about context-sensitivity of raw strings.

[the Rust document]: https://github.com/rust-lang/rust/blob/d046ffddc4bd50e04ffc3ff9f766e2ac71f74d50/src/grammar/raw-string-literal-ambiguity.md

## Byte String

```ebnf
byte_string = byte_string_std | byte_string_raw;
byte_string_std = "b\"", { no_double_quotation_marks | string_escape }, "\"";
byte_string_raw = "br", string_raw_content;
```

> Note: Byte strings are similar to normal strings but are not required to
contain only valid UTF-8 text. RON's byte strings follow the updated Rust
byte string literal rules as proposed in [RFC #3349], i.e. byte strings
allow the exact same characters and escape codes as normal strings.

[RFC #3349]: https://github.com/rust-lang/rfcs/pull/3349

> Note: Raw byte strings start with a `br` prefix and follow the same rules
as raw strings, which are outlined above.
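
To make the escape rules concrete, here is a minimal sketch of decoding the body of a standard (non-raw) byte string into bytes. It handles the `escape_ascii` and `escape_byte` productions from the grammar and deliberately leaves out `escape_unicode`. The function name `decode_byte_string_body`, its `String` error type, and the assumption that the surrounding `b"` and `"` delimiters have already been stripped are all choices made for this example; it is not ron's parser.

```rust
/// Decode the text between `b"` and the closing `"` into raw bytes.
///
/// Hypothetical helper for illustration only. Unicode escapes (`\u...`)
/// are not handled here and are reported as unsupported.
fn decode_byte_string_body(body: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(body.len());
    let mut chars = body.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            // Unescaped characters contribute their UTF-8 encoding.
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            continue;
        }

        match chars.next() {
            // escape_ascii
            Some('n') => out.push(b'\n'),
            Some('r') => out.push(b'\r'),
            Some('t') => out.push(b'\t'),
            Some('0') => out.push(0),
            Some('\\') => out.push(b'\\'),
            Some('\'') => out.push(b'\''),
            Some('"') => out.push(b'"'),
            // escape_byte: exactly two hexadecimal digits, any value 0x00..=0xFF
            Some('x') => {
                let hi = chars.next().and_then(|d| d.to_digit(16));
                let lo = chars.next().and_then(|d| d.to_digit(16));
                match (hi, lo) {
                    (Some(hi), Some(lo)) => out.push((hi * 16 + lo) as u8),
                    _ => return Err("expected two hex digits after \\x".to_string()),
                }
            }
            Some(other) => return Err(format!("unsupported escape \\{}", other)),
            None => return Err("dangling backslash at end of input".to_string()),
        }
    }

    Ok(out)
}
```

Because the result is a `Vec<u8>` rather than a `String`, an escape such as `\xff` is perfectly acceptable here even though it is not valid UTF-8 on its own.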

## Char

```ebnf
146 changes: 146 additions & 0 deletions examples/base64.rs
@@ -0,0 +1,146 @@
//! ron initially encoded byte-slices and byte-bufs as base64-encoded strings.
//! However, since v0.9, ron now uses Rusty byte string literals instead.
//!
//! This example shows how the previous behaviour can be restored by serialising
//! bytes with strongly-typed base64-encoded strings, or accepting both Rusty
//! byte strings and the legacy base64-encoded string syntax.

use base64::engine::{general_purpose::STANDARD as BASE64, Engine};
use serde::{de::Visitor, Deserialize, Deserializer, Serialize, Serializer};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct Config {
    #[serde(with = "ByteStr")]
    bytes: Vec<u8>,
    #[serde(with = "Base64")]
    base64: Vec<u8>,
    #[serde(with = "ByteStrOrBase64")]
    bytes_or_base64: Vec<u8>,
}

enum ByteStr {}

impl ByteStr {
    fn serialize<S: Serializer>(data: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_bytes(data)
    }

    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        struct ByteStrVisitor;

        impl<'de> Visitor<'de> for ByteStrVisitor {
            type Value = Vec<u8>;

            fn expecting(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
                fmt.write_str("a Rusty byte string")
            }

            fn visit_bytes<E: serde::de::Error>(self, bytes: &[u8]) -> Result<Self::Value, E> {
                Ok(bytes.to_vec())
            }

            fn visit_byte_buf<E: serde::de::Error>(self, bytes: Vec<u8>) -> Result<Self::Value, E> {
                Ok(bytes)
            }
        }

        deserializer.deserialize_byte_buf(ByteStrVisitor)
    }
}

enum Base64 {}

impl Base64 {
    fn serialize<S: Serializer>(data: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(&BASE64.encode(data))
    }

    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        let base64_str = <&str>::deserialize(deserializer)?;
        BASE64.decode(base64_str).map_err(serde::de::Error::custom)
    }
}

enum ByteStrOrBase64 {}

impl ByteStrOrBase64 {
    fn serialize<S: Serializer>(data: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        if cfg!(all()) {
            // either of these would work
            serializer.serialize_str(&BASE64.encode(data))
        } else {
            serializer.serialize_bytes(data)
        }
    }

    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        struct ByteStrOrBase64Visitor;

        impl<'de> Visitor<'de> for ByteStrOrBase64Visitor {
            type Value = Vec<u8>;

            fn expecting(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
                fmt.write_str("a Rusty byte string or a base64-encoded string")
            }

            fn visit_str<E: serde::de::Error>(self, base64_str: &str) -> Result<Self::Value, E> {
                BASE64.decode(base64_str).map_err(serde::de::Error::custom)
            }

            fn visit_bytes<E: serde::de::Error>(self, bytes: &[u8]) -> Result<Self::Value, E> {
                Ok(bytes.to_vec())
            }

            fn visit_byte_buf<E: serde::de::Error>(self, bytes: Vec<u8>) -> Result<Self::Value, E> {
                Ok(bytes)
            }
        }

        deserializer.deserialize_any(ByteStrOrBase64Visitor)
    }
}

fn main() {
    let ron = r#"Config(
        bytes: b"only byte strings are allowed",
        base64: "b25seSBiYXNlNjQtZW5jb2RlZCBzdHJpbmdzIGFyZSBhbGxvd2Vk",
        bytes_or_base64: b"both byte strings and base64-encoded strings work",
    )"#;

    assert_eq!(
        ron::from_str::<Config>(ron).unwrap(),
        Config {
            bytes: b"only byte strings are allowed".to_vec(),
            base64: b"only base64-encoded strings are allowed".to_vec(),
            bytes_or_base64: b"both byte strings and base64-encoded strings work".to_vec()
        }
    );

    let ron = r#"Config(
        bytes: b"only byte strings are allowed",
        base64: "b25seSBiYXNlNjQtZW5jb2RlZCBzdHJpbmdzIGFyZSBhbGxvd2Vk",
        bytes_or_base64: "Ym90aCBieXRlIHN0cmluZ3MgYW5kIGJhc2U2NC1lbmNvZGVkIHN0cmluZ3Mgd29yaw==",
    )"#;

    assert_eq!(
        ron::from_str::<Config>(ron).unwrap(),
        Config {
            bytes: b"only byte strings are allowed".to_vec(),
            base64: b"only base64-encoded strings are allowed".to_vec(),
            bytes_or_base64: b"both byte strings and base64-encoded strings work".to_vec()
        }
    );

    println!(
        "{}",
        ron::ser::to_string_pretty(
            &Config {
                bytes: b"only byte strings are allowed".to_vec(),
                base64: b"only base64-encoded strings are allowed".to_vec(),
                bytes_or_base64: b"both byte strings and base64-encoded strings work".to_vec()
            },
            ron::ser::PrettyConfig::default().struct_names(true)
        )
        .unwrap()
    );
}
24 changes: 9 additions & 15 deletions src/de/mod.rs
@@ -5,7 +5,6 @@ use std::{
     str,
 };

-use base64::Engine;
 use serde::{
     de::{self, DeserializeSeed, Deserializer as _, Visitor},
     Deserialize,
@@ -17,7 +16,7 @@ use crate::{
     error::{Result, SpannedResult},
     extensions::Extensions,
     options::Options,
-    parse::{Bytes, NewtypeMode, ParsedStr, StructType, TupleMode, BASE64_ENGINE},
+    parse::{Bytes, NewtypeMode, ParsedByteStr, ParsedStr, StructType, TupleMode},
 };

 mod id;
@@ -322,8 +321,12 @@ impl<'de, 'a> de::Deserializer<'de> for &'a mut Deserializer<'de> {
             b'{' => self.deserialize_map(visitor),
             b'0'..=b'9' | b'+' | b'-' | b'.' => self.bytes.any_number()?.visit(visitor),
             b'"' | b'r' => self.deserialize_string(visitor),
+            b'b' if matches!(self.bytes.bytes().get(1), Some(b'\'')) => {
+                self.bytes.any_number()?.visit(visitor)
+            }
+            b'b' => self.deserialize_byte_buf(visitor),
             b'\'' => self.deserialize_char(visitor),
-            other => Err(Error::UnexpectedByte(other as char)),
+            other => Err(Error::UnexpectedByte(other)),
         }
     }

@@ -460,18 +463,9 @@ impl<'de, 'a> de::Deserializer<'de> for &'a mut Deserializer<'de> {
             return visitor.visit_byte_buf(bytes);
         }

-        let res = {
-            let string = self.bytes.string()?;
-            let base64_str = match string {
-                ParsedStr::Allocated(ref s) => s.as_str(),
-                ParsedStr::Slice(s) => s,
-            };
-            BASE64_ENGINE.decode(base64_str)
-        };
-
-        match res {
-            Ok(byte_buf) => visitor.visit_byte_buf(byte_buf),
-            Err(err) => Err(Error::Base64Error(err)),
+        match self.bytes.byte_string()? {
+            ParsedByteStr::Allocated(byte_buf) => visitor.visit_byte_buf(byte_buf),
+            ParsedByteStr::Slice(bytes) => visitor.visit_borrowed_bytes(bytes),
         }
     }

2 changes: 1 addition & 1 deletion src/de/tests.rs
@@ -334,7 +334,7 @@ fn test_byte_stream() {
             small: vec![1, 2],
             large: vec![1, 2, 3, 4]
         }),
-        from_str("BytesStruct( small:[1, 2], large:\"AQIDBA==\" )"),
+        from_str("BytesStruct( small:[1, 2], large:b\"\\x01\\x02\\x03\\x04\" )"),
     );
 }

2 changes: 1 addition & 1 deletion src/de/value.rs
@@ -167,7 +167,7 @@ impl<'de> Visitor<'de> for ValueVisitor {
     where
         E: Error,
     {
-        self.visit_string(String::from_utf8(v).map_err(|e| Error::custom(format!("{}", e)))?)
+        Ok(Value::Bytes(v))
     }

     fn visit_none<E>(self) -> Result<Self::Value, E>