apache-avro cannot serialize generated unions #71

Closed
mati865 opened this issue Aug 16, 2024 · 4 comments


@mati865
Contributor

mati865 commented Aug 16, 2024

Related (but for deserialization): #18

Input.avsc:

{
  "name": "Foo",
  "type": "record",
  "fields": [
    {
        "name": "bar",
        "type": [
            "string",
            "bytes"
        ]
    }
  ]
}

Reproducer:

use apache_avro::Schema;

fn foo() {
    let msg = Foo {
        bar: "bar".to_string().into(),
    };

    let schema = Schema::parse_str("<schema str>").unwrap();
    let value = apache_avro::to_value(msg).unwrap();

    // Fails with: Could not find matching type in union
    let result = value.resolve(&schema);
}

Here result contains Err(Could not find matching type in union). Again, a quick-and-dirty fix is to apply #[serde(untagged)], but a custom serializer would probably be preferable.
I might be able to find time for it within the next two weeks.
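A toy, std-only model may help show why the resolution fails. Everything here is hypothetical and is not apache_avro's implementation: Value, Branch, the Tagged wrapper, and resolve_union are illustrative names, and Tagged only stands in for whatever wrapped shape an externally tagged enum variant takes. Resolution is modeled as picking the first union branch whose type matches the value, so a tagged value matches neither "string" nor "bytes", while an untagged one matches directly.

```rust
/// Toy value model (hypothetical, not apache_avro's `Value`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Bytes(Vec<u8>),
    /// Stand-in for an enum variant serialized *with* its tag.
    Tagged { variant: &'static str, inner: Box<Value> },
}

/// The two branches of the union ["string", "bytes"].
#[derive(Debug, Clone, Copy, PartialEq)]
enum Branch {
    String,
    Bytes,
}

/// Return the index of the first branch whose type matches the value.
fn resolve_union(value: &Value, branches: &[Branch]) -> Result<usize, String> {
    branches
        .iter()
        .position(|b| {
            matches!(
                (value, b),
                (Value::String(_), Branch::String) | (Value::Bytes(_), Branch::Bytes)
            )
        })
        .ok_or_else(|| "Could not find matching type in union".to_string())
}

fn main() {
    let union = [Branch::String, Branch::Bytes];

    // Tagged: the variant name wraps the payload, so no branch matches.
    let tagged = Value::Tagged {
        variant: "String",
        inner: Box::new(Value::String("bar".to_string())),
    };
    println!("{:?}", resolve_union(&tagged, &union));

    // Untagged: the payload is emitted as-is and matches a branch directly.
    println!("{:?}", resolve_union(&Value::String("bar".to_string()), &union));
    println!("{:?}", resolve_union(&Value::Bytes(b"bar".to_vec()), &union));
}
```

In this model, anything that drops the variant tag before resolution (such as #[serde(untagged)] or a custom serializer) lets the value land on a union branch.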

@lerouxrgd
Owner

I gave it a quick try and the following seems to work:

Working test for your example
#[cfg(test)]
mod tests {
    /// Auto-generated type for unnamed Avro union variants.
    #[derive(Debug, PartialEq, Eq, Clone, serde::Serialize, serde::Deserialize)]
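    // remote = "Self" makes the derives generate inherent associated functions
    // instead of trait impls, so the manual impls below can wrap them.
    #[serde(remote = "Self")]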
    pub enum UnionStringBytes {
        String(String),
        Bytes(#[serde(with = "apache_avro::serde_avro_bytes")] Vec<u8>),
    }

    impl From<String> for UnionStringBytes {
        fn from(v: String) -> Self {
            Self::String(v)
        }
    }

    impl TryFrom<UnionStringBytes> for String {
        type Error = UnionStringBytes;

        fn try_from(v: UnionStringBytes) -> Result<Self, Self::Error> {
            if let UnionStringBytes::String(v) = v {
                Ok(v)
            } else {
                Err(v)
            }
        }
    }

    impl From<Vec<u8>> for UnionStringBytes {
        fn from(v: Vec<u8>) -> Self {
            Self::Bytes(v)
        }
    }

    impl TryFrom<UnionStringBytes> for Vec<u8> {
        type Error = UnionStringBytes;

        fn try_from(v: UnionStringBytes) -> Result<Self, Self::Error> {
            if let UnionStringBytes::Bytes(v) = v {
                Ok(v)
            } else {
                Err(v)
            }
        }
    }

    impl<'de> serde::Deserialize<'de> for UnionStringBytes {
        fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where
            D: serde::Deserializer<'de>,
        {
            Self::deserialize(deserializer)
        }
    }

    #[rustfmt::skip]
    impl serde::Serialize for UnionStringBytes {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: serde::Serializer,
        {
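            // Wraps the real serializer and forwards only the payload of a
            // newtype variant, dropping the variant tag.
            struct NewtypeVariantSerializer<S>(S);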

            impl<S> serde::Serializer for NewtypeVariantSerializer<S>
            where
                S: serde::Serializer,
            {
                type Ok = S::Ok;
                type Error = S::Error;
                type SerializeSeq = serde::ser::Impossible<S::Ok, S::Error>;
                type SerializeTuple = serde::ser::Impossible<S::Ok, S::Error>;
                type SerializeTupleStruct = serde::ser::Impossible<S::Ok, S::Error>;
                type SerializeTupleVariant = serde::ser::Impossible<S::Ok, S::Error>;
                type SerializeMap = serde::ser::Impossible<S::Ok, S::Error>;
                type SerializeStruct = serde::ser::Impossible<S::Ok, S::Error>;
                type SerializeStructVariant = serde::ser::Impossible<S::Ok, S::Error>;
                fn serialize_bool(self, _v: bool) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_i8(self, _v: i8) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_i16(self, _v: i16) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_i32(self, _v: i32) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_i64(self, _v: i64) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_u8(self, _v: u8) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_u16(self, _v: u16) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_u32(self, _v: u32) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_u64(self, _v: u64) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_f32(self, _v: f32) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_f64(self, _v: f64) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_char(self, _v: char) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_str(self, _v: &str) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_bytes(self, _v: &[u8]) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_none(self) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_some<T: ?Sized + serde::Serialize>(self, _value: &T) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_unit(self) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_unit_variant(self, _name: &'static str, _variant_index: u32, _variant: &'static str) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_newtype_struct<T: ?Sized + serde::Serialize>(self, _name: &'static str, _value: &T) -> Result<Self::Ok, Self::Error> { unimplemented!() }
                fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> { unimplemented!() }
                fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> { unimplemented!() }
                fn serialize_tuple_struct(self, _name: &'static str, _len: usize) -> Result<Self::SerializeTupleStruct, Self::Error> { unimplemented!() }
                fn serialize_tuple_variant(self, _name: &'static str, _variant_index: u32, _variant: &'static str, _len: usize) -> Result<Self::SerializeTupleVariant, Self::Error> { unimplemented!() }
                fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> { unimplemented!() }
                fn serialize_struct(self, _name: &'static str, _len: usize) -> Result<Self::SerializeStruct, Self::Error> { unimplemented!() }
                fn serialize_struct_variant(self, _name: &'static str, _variant_index: u32, _variant: &'static str, _len: usize) -> Result<Self::SerializeStructVariant, Self::Error> { unimplemented!() }
                fn serialize_newtype_variant<T: ?Sized + serde::Serialize>(
                    self,
                    _name: &'static str,
                    _variant_index: u32,
                    _variant: &'static str,
                    value: &T,
                ) -> Result<Self::Ok, Self::Error> {
                    value.serialize(self.0)
                }
            }

            Self::serialize(self, NewtypeVariantSerializer(serializer))
        }
    }

    #[derive(Debug, PartialEq, Eq, Clone, serde::Serialize, serde::Deserialize)]
    pub struct Foo {
        pub bar: UnionStringBytes,
    }

    #[test]
    fn union_serde() {
        let msg = Foo {
            bar: "bar_value".to_string().into(),
        };

        let schema = apache_avro::Schema::parse_str(
            r#"{
  "name": "Foo",
  "type": "record",
  "fields": [
    {
        "name": "bar",
        "type": [
            "string",
            "bytes"
        ]
    }
  ]
}"#,
        )
        .unwrap();
        let value = apache_avro::to_value(msg).unwrap();
        dbg!(&value);
        let s = value.resolve(&schema).unwrap();
        dbg!(s);
    }
}

You can copy it directly into lib.rs and play with it using cargo test union_serde -- --nocapture.

If this approach works for you, I can implement it fully (in the templating system, etc.).
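The key piece of the workaround above is NewtypeVariantSerializer: #[serde(remote = "Self")] keeps the derived, externally tagged serializer around as an inherent function, and the manual Serialize impl feeds it a wrapper serializer that forwards only the payload of each newtype variant. The sketch below mirrors that shape without serde so it runs with std alone. Ser, Serial, Untag, Show, and serial_tagged are hypothetical stand-ins (a tiny subset of the serde::Serializer idea), not serde's API.

```rust
/// Hypothetical mini serializer: a tiny subset of what serde::Serializer offers.
trait Ser: Sized {
    type Ok;
    fn ser_str(self, v: &str) -> Self::Ok;
    fn ser_bytes(self, v: &[u8]) -> Self::Ok;
    fn ser_newtype_variant<T: Serial + ?Sized>(self, variant: &'static str, value: &T) -> Self::Ok;
}

/// Hypothetical counterpart of serde::Serialize.
trait Serial {
    fn serial<S: Ser>(&self, s: S) -> S::Ok;
}

impl Serial for String {
    fn serial<S: Ser>(&self, s: S) -> S::Ok {
        s.ser_str(self)
    }
}

impl Serial for Vec<u8> {
    fn serial<S: Ser>(&self, s: S) -> S::Ok {
        s.ser_bytes(self)
    }
}

enum UnionStringBytes {
    String(String),
    Bytes(Vec<u8>),
}

impl UnionStringBytes {
    /// Plays the role of the derived (remote = "Self") impl: externally tagged.
    fn serial_tagged<S: Ser>(&self, s: S) -> S::Ok {
        match self {
            UnionStringBytes::String(v) => s.ser_newtype_variant("String", v),
            UnionStringBytes::Bytes(v) => s.ser_newtype_variant("Bytes", v),
        }
    }
}

/// Plays the role of NewtypeVariantSerializer: drop the tag, forward the payload.
struct Untag<S>(S);

impl<S: Ser> Ser for Untag<S> {
    type Ok = S::Ok;
    fn ser_str(self, _v: &str) -> Self::Ok {
        unreachable!("the union only emits newtype variants")
    }
    fn ser_bytes(self, _v: &[u8]) -> Self::Ok {
        unreachable!("the union only emits newtype variants")
    }
    fn ser_newtype_variant<T: Serial + ?Sized>(self, _variant: &'static str, value: &T) -> Self::Ok {
        value.serial(self.0)
    }
}

/// The "manual" impl: run the tagged logic through the untagging wrapper.
impl Serial for UnionStringBytes {
    fn serial<S: Ser>(&self, s: S) -> S::Ok {
        self.serial_tagged(Untag(s))
    }
}

/// A serializer that renders values as text, so the two outputs can be compared.
struct Show;

impl Ser for Show {
    type Ok = String;
    fn ser_str(self, v: &str) -> String {
        format!("{:?}", v)
    }
    fn ser_bytes(self, v: &[u8]) -> String {
        format!("bytes{:?}", v)
    }
    fn ser_newtype_variant<T: Serial + ?Sized>(self, variant: &'static str, value: &T) -> String {
        format!("{{{:?}: {}}}", variant, value.serial(Show))
    }
}

fn main() {
    let s = UnionStringBytes::String("bar".to_string());
    let b = UnionStringBytes::Bytes(vec![1, 2]);
    println!("{}", s.serial_tagged(Show)); // tagged form
    println!("{}", s.serial(Show)); // untagged form
    println!("{}", b.serial(Show));
}
```

As in the workaround above, the wrapper only needs real logic for newtype variants, because every variant of the generated union wraps exactly one value; the remaining methods are never reached.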

@mati865
Contributor Author

mati865 commented Aug 18, 2024

Thank you, looks like a lot of work!
I'll try it tomorrow and report back.

@mati865
Contributor Author

mati865 commented Aug 19, 2024

Although we cannot yet verify the correctness of the serialized data, Avro accepts it without complaint. I think it should be fine.
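One way to sanity-check serialized output by hand is to compare it against the Avro binary encoding rules from the specification: a union is written as its zero-based branch index (a long), followed by the value encoded with that branch's schema; longs use zigzag plus variable-length encoding; string and bytes are both a long length followed by the raw bytes; and a record is just its fields in order. The std-only sketch below applies those rules to the Foo record from this issue. The function names are hypothetical, and for bar = "bar" the expected bytes are 00 06 62 61 72.

```rust
/// Append `n` as an Avro `long`: zigzag-encode it, then emit base-128 varint
/// bytes, least significant group first, with the high bit marking continuation.
fn write_long(out: &mut Vec<u8>, n: i64) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    loop {
        let byte = (z & 0x7f) as u8;
        z >>= 7;
        if z == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Std-only mirror of the generated union for ["string", "bytes"].
enum UnionStringBytes {
    String(String),
    Bytes(Vec<u8>),
}

/// Union: branch index first, then the value per that branch's schema.
fn encode_union(out: &mut Vec<u8>, v: &UnionStringBytes) {
    let (index, payload): (i64, &[u8]) = match v {
        UnionStringBytes::String(s) => (0, s.as_bytes()),
        UnionStringBytes::Bytes(b) => (1, b.as_slice()),
    };
    write_long(out, index);
    // `string` and `bytes` share the same layout: a long length, then the raw bytes.
    write_long(out, payload.len() as i64);
    out.extend_from_slice(payload);
}

/// Record `Foo` has one field, so its encoding is just the encoding of `bar`.
fn encode_foo(bar: &UnionStringBytes) -> Vec<u8> {
    let mut out = Vec::new();
    encode_union(&mut out, bar);
    out
}

fn main() {
    let s = encode_foo(&UnionStringBytes::String("bar".to_string()));
    let b = encode_foo(&UnionStringBytes::Bytes(vec![1, 2]));
    println!("{:02x?}", s);
    println!("{:02x?}", b);
}
```

This only covers a single datum; container files add a header and sync markers on top of it.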

lerouxrgd added a commit that referenced this issue Sep 3, 2024
@lerouxrgd
Owner

Released in 0.15.0

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants