Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Body extractor #11

Open
bradleybeddoes opened this issue Aug 5, 2017 · 22 comments
Open

Implement Body extractor #11

bradleybeddoes opened this issue Aug 5, 2017 · 22 comments

Comments

@bradleybeddoes
Copy link
Contributor

Must support:

  • application/x-www-form-urlencoded
  • multipart/form-data
  • text/plain (?)

Per other extractors for Request path and Query string, we've already shipped this must be type safe and support optional fields.

@abonander
Copy link

I've been working on multipart-async for a bit now but I'm not sure what I want the final API to look like yet. I'm open to collaboration!

@heronhaye
Copy link

+1. Data validation is central in Rocket and very powerful (https://rocket.rs/guide/requests/#field-validation), works for form data, query strings, JSON, etc.

@bradleybeddoes
Copy link
Contributor Author

Hey @abonander sorry it has taken me a few days to get back to you, very hectic this end.

I certainly would like to get your help in figuring this out for Gotham as it seems you've already gotten quite a ways down the path of implementing this with Hyper natively.

I'm of the opinion (and @smangelsdorf may disagree!) that this kind of functionality is important enough that we need to have a solution for it in Gotham core and then support it long term.

Did you happen to get a chance to look at the Gotham API yet? I'm wondering how well you think our extractors concept might fit with what you've already done, especially with how we do proc_macro_derives for request path and query string structs that are defined by the application using Gotham.

@bradleybeddoes
Copy link
Contributor Author

Hi @modalduality, I agree that validation is really important and Extractors can, at least partially, do this.

Right now an Extractor makes sure that Request data is type valid for the path and query string. So if your app is expecting to get a u8 but the Request presents anything else the Request is aborted with a 400 Bad Request, assuming you derive StaticResponseExtender. You can easily customize the specific Response with your own StaticResponseExtender implementation instead of deriving.

If you have a custom type you could implement FromRequestPath or FromQueryString and make any validations you need during the conversion process, simply creating an Err if something isn't right.

However this discussion probably highlights something that would be quite useful for Extractors, an extension to PathExtractor and QueryStringExtractor along the lines of fn validate(state: &State) -> Result<(), Err> that the Router would call if fn extract is successful. This would allow the application to do further validation if it so desired, e.g. ensuring that a String value which is type valid also meets some regexp or a u8 has been provided within some desired range.

Thoughts?

@abonander
Copy link

abonander commented Aug 15, 2017

@bradleybeddoes I like the concept of extractors, but I was talking a bit lower-level. At its core, multipart-async just parses the request body into sections which are still essentially futures::Stream<Item = Vec<u8>>, so it would be necessary to either collect those into a datastructure before touching user code, or create an extractor trait which can return a Future.

If collecting fields to a datastructure, we're going to have to touch the filesystem because fields in a multipart request can come from files and possibly be too big to fit into memory. In multipart (the original implementation for synchronous APIs), I created an abstraction which can save all files in a given request to a temp directory so they can be accessed arbitrarily.

Because the only really supported file operations on any given OS are blocking (there have been some attempts at asynchronous disk I/O but they're not really satisfactory), recreating this abstraction for asynchronous requests would require a pool of background worker threads. I want to keep multipart-async simple but I was thinking of building a separate crate which provides cross-platform asynchronous disk I/O using background worker threads, built around the futures::Stream abstraction. It could perform various tricks to increase throughput, like mem-mapping and reading single aligned blocks at a time.

The question becomes: how much of this do we want to expose to the user? Do we want to expose files as Stream<Item = Vec<u8>> or give them the form-data as a map of strings and tempfiles (which is, from what I've seen, somewhat common in web stacks)?

@smangelsdorf smangelsdorf added this to the 0.3 milestone Feb 28, 2018
@abonander
Copy link

abonander commented Mar 1, 2018

@bradleybeddoes congrats on the 0.2 release. I see this issue is on the 0.3 milestone now, have you given any thought to my previous comment?

I'd be happy to work with integrating multipart-async, I just need to know what direction you want to take with it.

@bradleybeddoes
Copy link
Contributor Author

@abonander, thanks!. We have and we're certainly very keen to work with you.

I've run out of time and brain power today but I have half a response formulated in my head already, I'll get something more substantial to you in the next day or so on this so we can get moving 👍.

@bradleybeddoes
Copy link
Contributor Author

Hi again @abonander, thanks for sticking with us, busy times.

So @smangelsdorf and I have thrown this around a little bit offline and whilst we have a pretty reasonable idea on what application/x-www-form-urlencoded will look like, which is essentially the same as what path/query extractors are today just applied to the body, we're unsure how multipart/form-data should integrate.

Which usually means it is time to experiment 🎉 🤓 👍.

We could:

  1. Get application/x-www-form-urlencoded support into some sort of usable state first and then look at multipart or;
  2. Jump in, and hack on the Router to see what can be achieved with multipart and worry about unifying the implementations later on. This could lead to a dead end of course but we'd at least learn a lot.

You mentioned you might have some time to start looking into this, does 1 or 2 seem like a better approach for you?

We also briefly discussed if adding a dependency on the multipart-async crate is a good idea, purely around long term maintenance/API stability. Nothing we need to decide right now but something to chat about in the future in terms of goals/plans/support etc for the crate.

@abonander
Copy link

@bradleybeddoes It all depends on whether or not you want to treat urlencoded and multipart forms with the same interface. I haven't been exposed to a large number of web frameworks (except PHP 5 and the Yii framework, and I can't say that they set great examples) so I don't really know what the precedents are.

If you expect an endpoint to only accept multipart uploads then a router-based solution works okay; I've provided some integrations to this end in the synchronous multipart crate for Hyper and Iron.

@abonander
Copy link

Express.js appears to do the middleware thing, recommending multer as a bodyparser. Multer integrates some basic form validation, expecting specific field names and types and only saving that much. I don't know if it throws away unexpected fields or returns an error though.

@bradleybeddoes bradleybeddoes changed the title Implement Form data extractor Implement Body extractor Mar 15, 2018
@abonander
Copy link

abonander commented Mar 16, 2018

@bradleybeddoes I don't have a working prototype yet but I'm conceiving of a derive-based solution to handing form input that would cover both x-www-form-urlencoded as well as multipart/form-data and provide validation as well. It would be quite similar to Serde but specialized for form input types and designed around push-based deserialization to support streaming async bodies.

Something like:

#[derive(FormInput)]
pub struct RegistrationForm {
    #[validate(regex = "[a-z,A-Z,0-9,_]", error = "usernames may only contain letters, numbers, or underscores")]
    username: String,
    #[validate(min_len = "8")]
    password: String,
    email: Email, // derefs to `str`, auto-validates for emails
    dateOfBirth: Date,
    #[validate(expected_val = "true", error = "you must accept our terms of service to register")]
    acceptedTos: bool,
    #[validate(content_type = "image/*", error = "avatar must be an image")] // with const-generics this could be a type parameter instead
    avatar: Option<UploadedFile>, // optional field
    // user doesn't have to handle multipart at all
}

The derive itself or a macro could generate a BodyExtractor impl for this type so the user would just have to declare it.

The actual core FormInput trait wouldn't be aware of sync/async or the specific http/web stack, it would just spit out a list of fields and the implementation would parse them and then provide them to it.

@abonander
Copy link

(the example is just a strawman, the user would probably want to check validation errors and emit messages programmatically with however they plan on returning them to the user)

@whitfin
Copy link
Contributor

whitfin commented Apr 19, 2018

Hey all; couple of questions because I'm working on a Gotham project and I'm hitting some stuff related to this (I think). I've only been looking at Gotham for a week or two, so if any of my understanding is wrong, apologies in advance!

I think a good place to start is to get parsing handled by an extractor at all, even without validation. I'm currently working on an API where I'm having to manually parse bodies in almost every handler. I'm working with JSON, and doing something like this. While this works, it's quite noisy to deal with. An API like this would be much nicer to use, and also jive with the existing extraction syntax:

route
        .post("/user/create")
        .with_body_extractor::<JsonExtractor<User>>>()
        .to(create_user)

This is essentially just plugging in an extractor for JSON and specifies the struct we want to parse to (via User). I'm not sure how feasible this is, but as a user, this seems pretty clean. You probably don't have to do much more than wrap up serde_json and you'd be good to go (incidentally, there is also a serde_urlencoded).

As for the validation aspects, has anyone considered a project like validator for this functionality? I've never used it myself, but it seems quite simple to integrate with. At a high level it looks quite similar to the suggestion from @abonander.

This might not be anything additional to what's already in this thread, but I just wanted to drop my feedback since I happened to come across this issue whilst looking for an OOTB solution.

@willi-kappler
Copy link

@whitfin: your JSON crate and the validator sound really nice.
@abonander and @bradleybeddoes: what do you think ?
I would like to help but I'm new to gotham and async. Is there any progress on "2. hack on Router" ?

@sezna
Copy link
Collaborator

sezna commented Aug 25, 2020

Is there any progress on the status of body post body extraction? I'm unsure what the currently advised way of extracting post bodies is -- the examples are using to_async but the latest version on crates.io doesn't have that method, and the examples are not versioned and released to crates.io.

@technic
Copy link

technic commented Aug 25, 2020

There was some progress with async fn as handlers, but probably that's not done yet.

I had my own implementation of this here

Then you can read full body to memory and parse it in any way. Like here

P.S. In the end I migrated to actix ;)

@msrd0

This comment has been minimized.

@sezna

This comment has been minimized.

@msrd0

This comment has been minimized.

@bradleybeddoes

This comment has been minimized.

@sezna

This comment has been minimized.

@colinbankier

This comment has been minimized.

@msrd0 msrd0 modified the milestones: 0.3, 0.6 Aug 26, 2020
@msrd0 msrd0 removed this from the 0.6 milestone Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants