Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core-util] Add reshape helper #28095

Closed
wants to merge 4 commits into from
Closed

[core-util] Add reshape helper #28095

wants to merge 4 commits into from

Conversation

xirzec
Copy link
Member

@xirzec xirzec commented Dec 21, 2023

Packages impacted by this PR

@azure/core-util

Describe the problem that is addressed by this PR

When trying to serialize/deserialize object payloads, we often have to recurse into them in order to rename properties and stringify/destringify values. This helper allows for applying these kinds of operations to an object where there is a known path to the property being modified. In the event the property does not exist, no mutation occurs.

// changes input to look like `{b: 1}`
reshape({a: 1}, "a", "b");

// changes input to look like `{a: "hello 1"}`
reshape({a: 1}, "a", (value) => `hello ${value}`);

// changes input to look like `{b: "hello 1"}`
reshape({a: 1}, "a", "b", (value) => `hello ${value}`);

// arrays are also supported, output looks like `["1", 2", "3"]`
reshape({a: [1, 2, 3]}, "a[]", String);

The only thing I don't currently handle is repeated selection, e.g. if you have a potential cycle in the path to a property and want to try recursing as deeply as you can.

Are there test cases added in this PR? (If not, why?)

Yes!

Checklists

  • Added impacted package name to the issue description
  • Added a changelog (if necessary)

@xirzec xirzec added Client This issue points to a problem in the data-plane of the library. Azure.Core labels Dec 21, 2023
@xirzec xirzec self-assigned this Dec 21, 2023
@azure-sdk
Copy link
Collaborator

azure-sdk commented Dec 21, 2023

API change check

APIView has identified API level changes in this PR and created following API reviews.

@azure/core-util

Copy link
Member

@joheredi joheredi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super cool, this will help us simplify the generated code a lot!

@joheredi
Copy link
Member

needs to be exported in index.ts

@joheredi
Copy link
Member

to provide some perspective, integrating this into the codegen will help us to go from this:

image

To this:
image

expect(() => reshape(input, "a.b[]", "oops")).toThrowError("cannot rename array indices");
});

it("can rename properties of elements of an array", function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to flatten the properties by using this util?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can support add nodes into it, then the flatten problem should also be solved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write something like this:

reshape({ properties: { a: 1, b: 2 } }, "", (input: any) => input.properties);

(This helped me find a bug where I wasn't handling if you returned something different when selecting the root object)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not do something like

reshape({ properties: { a: 1, b: 2 } }, "properties.a", "a") 
reshape({ properties: { a: 1, b: 2 } }, "properties.b", "b") 

want to sell jsonpath a little :)
if we support jsonpath, we could just use something like reshape({ properties: { a: 1, b: 2 } }, "$.properties", "$.")

@xirzec what I was referring to adding nodes was something during serialization in flatten's case

reshape({ a: 1, b: 2 }, "a", "properties.a") 
reshape({ a: 1, b: 2 }, "b", "properties.b") 

Which I am not sure if it's working?

Back to the flatten, it looks a little weird to me, that where we put the flatten reshape could have impact on the reshape of the properties inside it.

if we have something like body = {properties: { a: {c: 3}, b: 2 } },

when we do flatten first, we will get

body = reshape(body, "properties.a", "a") 
body = reshape(body, "a", "d")

when we do flatten last, we will get

body = reshape(body, "properties.a", "properties.d") 
body = reshape(body, "properties.d", "d")

which is why I prefer to support reshape in codegen as models based, instead of properties based. see Azure/autorest.typescript#2180 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mark it as resolved now, I still have concerns about this part. Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does TypeSpec have a way to move properties around on models?

I know we've been talking about flatten, but in general my understanding is that when we map "friendly names" they exist at the same level. That's why the "rename" signature for reshape takes only the new name of the final property pointing to the selected object rather than a full path again.

For example, reshape({a: { b: { c: 1 }}}, "a.b.c", "d") produces {a: {b: { d: 1}}} not {a: { b: {} }, d: 1}

To be clear, I'm generally opposed to flattening per our other discussions, so I wasn't really trying to design for it. You are correct though that we'd have to be careful to do deeper modifications first if we were going to flatten a parent property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does TypeSpec have a way to move properties around on models?

No, TypeSpec doesn't and they don't intend to provide this ability if I understand correctly. Though @bterlson could comment to confirm.

If our intention for reshape is just for rename properties or changing the type of values, then reshape would be a little weird name to me ?

Per the other discussion, I think our point for flatten is to only support flatten for those RPs with high usage. which means we still need to support flatten, let's say we have other way to support flatten in codegen, the flatten order still have influence on the reshape.

I still prefer to have a reshape helper that can more generally do the convert on models basis instead of properties basis, maybe not for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point is if the purpose of reshape is limited to rename the properties and change the properties value, I am actually not very convinced that to how much extent it could simply the generated code, the current generated code for serialize and deserialize has a lot of lines because most of the lines are just copy values, which is a problem that can somehow resolved a little if we can have better modularization issue Azure/autorest.typescript#1971 resolved first ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to both better names and alternative codegen approaches. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work on a poc of how I think the generated code should look like ?

Copy link
Member

@qiaozha qiaozha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, but I do feel like this is the right thing to do about the serialize and deserialize :)


type SelectorPart = SelectorArrayIterator | SelectorPropertyName;

function* selectorParts(selector: string): Iterable<SelectorPart> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it's possible, but could we leverage jsonpath to do the selection ? I feel like that's more generally used ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are considering jsonpath to do the selection, could it be used in the customization? like in the original autorest modelerfour, they use directive for customized transformation, and the selector is also using jsonpath.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was looking at this a bit, and it seems that jsonpath never quite made it into an RFC: https://datatracker.ietf.org/wg/jsonpath/about/

The original description appears to be hosted here: https://goessner.net/articles/JsonPath/index.html#e2

I think my main concern with trying to support it is that it is more verbose and has additional capabilities that I'm not sure we would want, like the array slice operator and traversal expressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask a silly question here, I am not sure I get the point of why jsonpath being made into an RFC matters to us ? 😊

My concerns of supporting selector by ourself is that, it could possibly be buggy as I am not sure if we have all cases considered ? I am also not sure if it's a good idea that service team who wants to write their own customization serialize and deserialize function and would have to learn the grammer we invented here?

Copy link
Member

@MaryGao MaryGao Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer to have jsonpath to locate the objects with less learning curve. Is it possible our selectors are just selected/supported subsets of jsonpath?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose there are two different considerations here, one is taking a dependency on some third party jsonpath library, the other is supporting the full jsonpath syntax.

The former I'm not that interested in, since it feels unnecessary to take a dependency for the problem we have (provide an efficient and debuggable primitive for renaming properties and mapping values) and we wouldn't be able to easily optimize any code in the dependency itself.

As to the latter, if we're saying "hey this is a syntax people know, so we should use it", that's great except since it's not a standard, there could be different implementations with different behaviors/syntax, so it's not really buying us as much familiarity with an existing syntax. In addition, since this isn't a consumer-facing function, I'm not sure I buy the learning curve being difficult or there being a high need for familiarity.

To address @MaryGao's point, if we wanted to align the syntax we totally could, but it would simply change something like a.b.c[]d into $.a.b.c[*].d and all we've really done is:

  • add a questionably useful $. prefix to every single selector
  • forced you to write * in between the array braces
  • put an extra . after the brace syntax to access properties inside an array element.

I'm not convinced that adding these extra characters greatly improves readability.

I'm curious about this statement:

service team who wants to write their own customization serialize and deserialize function

Is this a scenario we support today? And if so, why would they necessarily have to use this helper? Presumably if they are going through the work of writing a custom TS function that converts one shape to another, they could do so efficiently without any helpers, much like we did in past track 2 dataplane packages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder a case like { a: {b: {c: 1}}, b: {c: 2} }, if we pass b.c, would it be confusing which part we are referring to ?

Is this a scenario we support today? And if so, why would they necessarily have to use this helper?

Can't argue with that, customer can definitely write customization helper as they like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder a case like { a: {b: {c: 1}}, b: {c: 2} }, if we pass b.c, would it be confusing which part we are referring to ?

I mean if a model from a service has multiple sections with repeated names it could be a little confusing to look at, but is it more confusing than reading parsedBody.a.b.c vs parsedBody.b.c ?

@jeremymeng
Copy link
Member

Would it be useful to take a list of mappings?

I thought about chaining calls too:

deserializedResult = new Transformer(result.body)
    .reshape("created_at", "createAt")
    .reshape("fine_tuned_model", "fineTunedModel")
    // ...
    .result();

But neither unlikely matters much for generated code.

Copy link
Member Author

@xirzec xirzec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to comments and pushed some updates. Thanks for the feedback!


type SelectorPart = SelectorArrayIterator | SelectorPropertyName;

function* selectorParts(selector: string): Iterable<SelectorPart> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was looking at this a bit, and it seems that jsonpath never quite made it into an RFC: https://datatracker.ietf.org/wg/jsonpath/about/

The original description appears to be hosted here: https://goessner.net/articles/JsonPath/index.html#e2

I think my main concern with trying to support it is that it is more verbose and has additional capabilities that I'm not sure we would want, like the array slice operator and traversal expressions.

sdk/core/core-util/src/reshape.ts Show resolved Hide resolved
expect(() => reshape(input, "a.b[]", "oops")).toThrowError("cannot rename array indices");
});

it("can rename properties of elements of an array", function () {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write something like this:

reshape({ properties: { a: 1, b: 2 } }, "", (input: any) => input.properties);

(This helped me find a bug where I wasn't handling if you returned something different when selecting the root object)

@xirzec
Copy link
Member Author

xirzec commented Dec 27, 2023

Would it be useful to take a list of mappings?

I thought about chaining calls too:

My main consideration for not taking an array or doing a fluent pattern was I wanted to make it really easy to set a breakpoint at each step and log before/after so you could quickly narrow down where the shape wasn't what you expected.

import { describe, it, assert, expect } from "vitest";
import { reshape } from "../../src/reshape";

describe("reshape", function () {
Copy link
Member

@MaryGao MaryGao Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that input could be an array?

What if the input is not an object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work:

it("can handle arrays as the initial input", function () {
    const input = [1, 2, 3];
    const actual = reshape(input, "[]", String);
    const expected = ["1", "2", "3"];
    assert.deepEqual(actual, expected);
  });

Copy link
Member

@qiaozha qiaozha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have another question, should we add this to the non branded core ?

expect(() => reshape(input, "a.b[]", "oops")).toThrowError("cannot rename array indices");
});

it("can rename properties of elements of an array", function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does TypeSpec have a way to move properties around on models?

No, TypeSpec doesn't and they don't intend to provide this ability if I understand correctly. Though @bterlson could comment to confirm.

If our intention for reshape is just for rename properties or changing the type of values, then reshape would be a little weird name to me ?

Per the other discussion, I think our point for flatten is to only support flatten for those RPs with high usage. which means we still need to support flatten, let's say we have other way to support flatten in codegen, the flatten order still have influence on the reshape.

I still prefer to have a reshape helper that can more generally do the convert on models basis instead of properties basis, maybe not for this PR.


type SelectorPart = SelectorArrayIterator | SelectorPropertyName;

function* selectorParts(selector: string): Iterable<SelectorPart> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder a case like { a: {b: {c: 1}}, b: {c: 2} }, if we pass b.c, would it be confusing which part we are referring to ?

Is this a scenario we support today? And if so, why would they necessarily have to use this helper?

Can't argue with that, customer can definitely write customization helper as they like.

expect(() => reshape(input, "a.b[]", "oops")).toThrowError("cannot rename array indices");
});

it("can rename properties of elements of an array", function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point is if the purpose of reshape is limited to rename the properties and change the properties value, I am actually not very convinced that to how much extent it could simply the generated code, the current generated code for serialize and deserialize has a lot of lines because most of the lines are just copy values, which is a problem that can somehow resolved a little if we can have better modularization issue Azure/autorest.typescript#1971 resolved first ?

@xirzec
Copy link
Member Author

xirzec commented Jan 2, 2024

Abandoning this helper in favor of improving the codegen logic directly. The complexity of specifying the property to change being the most undesirable feature.

@xirzec xirzec closed this Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants