Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::any::Provider support to Serializer #2420

Closed

Conversation

bryanburgers
Copy link
Contributor

@bryanburgers bryanburgers commented Mar 27, 2023

Turn a Serializer into a std::any::Provider in a backward compatible way.

This allows specialization between a Serializer and a Serialize in a generic way.

The one particular use case I have in mind is for serde_dynamo. DynamoDB supports lists of generic items – which is what serde_dynamo currently uses to serialize e.g. a Vec<String>. However, DynamoDB also supports specialized types like String Set that it would be nice to be able to opt in to.

Using std::any::Provider, this could look something like this:

/// A specialized serializer
///
/// This would most likely be used with `#[serde(with = "string_set")]`
struct StringSet<T>(T);

impl<T> Serialize for StringSet<T> where T: Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        if let Some(string_set_serializer) = std::any::request_value::<serde_dynamo::StringSetSerializer>(&serializer.provider()) {
            self.0.serialize(string_set_serializer)
        } else {
            self.0.serialize(serializer)
        }
    }
}

/// In user code, a struct that wants `strings` to be serialized
/// using a DynamoDb string set *if* the serializer is
/// `serde_dynamo`
#[derive(Debug, Serialize, Deserialize)]
struct Subject {
    strings: StringSet<Vec<String>>,
}

With this set up, Subject.strings would be serialized as normal using serde_json

{
    "strings": ["one", "two"]
}

but when using serde_dynamo, Subject.strings would use the specific StringSetSerializer to get custom behavior.

This is a very specific example where serde_dynamo would likely provide both the StringSet wrapper and the StringSetSerializer.

There may also be situations where the ecosystem would develop common practices. One example is the Serializer::is_human_readable function. If this existed before that was introduced, it's possible the serializers could have provided struct IsHumanReadable(bool) and serializers could have used that to determine how to display timestamps. It's possible that this alleviates some of the need for #455, but I haven't looked at every use case there.

Links regarding std::any::Provider:

Implementation notes:

  • I opted for getting this out for discussion and review quickly, which means I didn't spend too much time on documentation and zero time on tests. If this is an approach the team is interested in, I'll be happy to revisit those. I've added doc tests to provide and provider.
  • There doesn't necessarily need to be a Serializer::provider convenience method. We could use documentation to push implementors to use SerializerProvider::wrap(&serializer) when they need a provider. Based on doc tests and some experimenting, I think having the default provider method is valuable.
  • This is most useful to my use case on Serializer, but I wonder if it also would be useful on the Serialize* sub-traits of Serializer, and/or on Deserialize traits. I've added this to Deserialize as well.

@RReverser
Copy link

This would be extremely useful for things like serde_json::RawValue and similar!

@RReverser
Copy link

Although it would need to apply to Deserialize as well.

@bryanburgers
Copy link
Contributor Author

I can certainly add it for Deserialize, too!

@RReverser
Copy link

RReverser commented Apr 2, 2023

FWIW I'm just a drive-by reader, not maintainer of Serde, although I could very much make use of such feature if it ever lands to avoid hacks for communication between deserializer & custom type in serde-wasm-bindgen.

@bryanburgers bryanburgers force-pushed the serializer-provider branch 2 times, most recently from 6b6b91e to 3ac28f0 Compare April 3, 2023 13:55
@oli-obk
Copy link
Member

oli-obk commented Apr 4, 2023

I don't know about the timeline on std::any::Provider stabilization, but if/when it lands this would indeed be a great solution to the serializer-specific specialization problem.

I think having the default provider method is valuable.

While it is valuable for everyone who needs it, it also adds a just-for-convenience method to an already extremely big trait. Afaict even the SerializerProvider type is not needed in serde itself, right?

@bryanburgers
Copy link
Contributor Author

I don't know about the timeline on std::any::Provider stabilization, but if/when it lands this would indeed be a great solution to the serializer-specific specialization problem.

🥳

Yeah, I'm aware that std::any::Provider isn't stabilized, and I'm in no hurry.

I think having the default provider method is valuable.

While it is valuable for everyone who needs it, it also adds a just-for-convenience method to an already extremely big trait. Afaict even the SerializerProvider type is not needed in serde itself, right?

I suppose there's a balance here between smaller API surface and more documentation burden.

  • Serializer::provider method + SerializerProvider
    • Adds an unnecessary method to the trait
    • Documentation says "Use serializer.provider()"
  • SerializerProvider only
    • No unnecessary method on the trait
    • Documentation says "Use SerializerProvider::wrap(&serializer)" and users need to know about SerializerProvider
  • Neither
    • No unnecessary method on the trait
    • No extra SerializerProvider and DeserializerProvider structs
    • Documentation has at least one example of creating your own Provider adapter and then using it

serde is a huge part of the Rust ecosystem, so I'm going to defer to you if we fall at different points on that balance.

I'll be glad to take that stuff out and update the documentation.

Turn a `Serializer` and a `Deserializer` into a [`std::any::Provider`]
in a backward compatible way.

This allows specialization between a `Serializer` and a `Serialize`, and
between a `Deserializer` and a `Deserialize`.

The one particular use case I have in mind is for [`serde_dynamo`].
DynamoDB supports lists of generic items – which is what `serde_dynamo`
currently uses to serialize e.g. a `Vec<String>`. However, DynamoDB also
supports specialized types like [String Set] that it would be nice to be
able to opt in to.

Using [`std::any::Provider`], this could look something like this:

    /// A specialized serializer
    ///
    /// This would most likely be used with `#[serde(with = "string_set")]`
    struct StringSet<T>(T);

    impl<T> Serialize for StringSet<T> where T: Serialize {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: serde::Serializer,
        {
            if let Some(string_set_serializer) = std::any::request_value::<serde_dynamo::StringSetSerializer>(&serializer.provider()) {
                self.0.serialize(string_set_serializer)
            } else {
                self.0.serialize(serializer)
            }
        }
    }

    /// In user code, a struct that wants `strings` to be serialized
    /// using a DynamoDb string set *if* the serializer is
    /// `serde_dynamo`
    #[derive(Debug, Serialize, Deserialize)]
    struct Subject {
        strings: StringSet<Vec<String>>,
    }

With this set up, `Subject.strings` would be serialized as normal using
`serde_json`

    {
        "strings": ["one", "two"]
    }

but when using `serde_dynamo`, `Subject.strings` would use the specific
`StringSetSerializer` to get custom behavior.

This is a very specific example where `serde_dynamo` would likely
provide both the `StringSet` wrapper and the `StringSetSerializer`.

There *may* also be situations where the ecosystem would develop common
practices. One example is the `Serializer::is_human_readable` function.
If this existed before that was introduced, it's *possible* the
serializers could have provided `struct IsHumanReadable(bool)` and
serializers could have used that to determine how to display timestamps.

Links regarding [`std::any::Provider`]:

* `std::any::Provider` RFC: https://rust-lang.github.io/rfcs/3192-dyno.html
* `std::any::Provider` tracking issue: rust-lang/rust#96024

[`std::any::Provider`]: https://doc.rust-lang.org/nightly/core/any/trait.Provider.html
[`serde_dynamo`]: https://docs.rs/serde_dynamo
[String Set]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html
@Mingun
Copy link
Contributor

Mingun commented Apr 25, 2023

This would be very useful for deserialization to add an ability to provide spans of deserialized objects to the Deserialize implementation aware of spans. Unfortunately, deserializer is consumed during deserialization so providing safe interface and implementation could be impossible (it could not be possible to ask deserializer of spans after deserialization because it is consumed).

I was able to implement such a concept by extending Deserializer trait with safe interface, but with unsafe implementation, a sketch:

trait PositionProvider {
  type Position;

  fn current_position() -> Self::Position;
}
trait Deserializer<'de> {
  type PositionProvider: PositionProvider;

  fn position_provider(&self) -> Self::PositionProvider;
  ...
}

// impl
impl<'de, 'a> Deserializer<'de> for &'a mut MyDeserializer {
  type PositionProvider = MyPositionProvider;

  fn position_provider(&self) -> Self::PositionProvider {
    MyPositionProvider(self as _)
  }
}

// Cannot use reference here because of borrow checker
// and that fact that deserializer is consumed during deserialization
struct MyPositionProvider(*const MyDeserializer);
impl PositionProvider for MyPositionProvider {
  type Position = usize;

  fn current_position(&self) -> usize {
    let de: &MyDeserializer = unsafe { &*self.0 };
    de.current_position();
  }
}

Then any type can be wrapped into

struct Spanned<T> {
  value: T,
  span: Range<usize>,
}
impl Deserialize<'de> for Spanned<T>
where
  T: Deserialize<'de>,
{
  fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> {
    let provider = deserializer.position_provider();
    let start = provider.current_position();
    let value = T::deserialize(deserializer)?;
    let end = provider.current_position();

    Ok(Self { value, span: (start..end).into() })
  }
}

This trick is only worked for deserializers that implements Deserializer for a reference to the deserializer struct (but it seems most of the deserializers implemented like that)

@RReverser
Copy link

@Mingun You could just make position_provider return an Rc<Cell<...>> or something. It doesn't have to hold a reference to the Deserializer - instead, Deserializer could own position provider and hand out its Rc-clones to anyone who wants them, making the code safe.

@RReverser
Copy link

Looks like the Provider feature has been removed from nightly Rust in favour of just extending the Error trait, so this PR is sadly no longer viable.

@oli-obk oli-obk closed this Dec 12, 2023
@Mingun
Copy link
Contributor

Mingun commented Dec 13, 2023

Actually, the Provider trait was needed only to use std::any::provide_value helper method. It is still possible to implement that method in serde for use with Serializer / Deserializer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants