Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gateway listing only 1000 first object of prefix/dir #150

Open
anisimovdk opened this issue Jun 30, 2023 · 10 comments
Open

Gateway listing only 1000 first object of prefix/dir #150

anisimovdk opened this issue Jun 30, 2023 · 10 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@anisimovdk
Copy link

Describe the bug
Listing only 1000 first object of prefix/dir.

To Reproduce
Steps to reproduce the behavior:

  1. Create more than 1000 object in bucket prefix/dir
  2. Try to list files tis nginx-s3-proxy
  3. You will see only first 1000 objects

Expected behavior
List all files in current prefix/dir. Or display max files based on some settings w/wo pagination.

Your environment

  • Version of the container used (if downloaded from Docker Hub or Github) - ghcr.io/nginxinc/nginx-s3-gateway/nginx-oss-s3-gateway:unprivileged-oss-20230623
  • S3 backend implementation you are using (AWS, Ceph, NetApp StorageGrid, etc) - Openstack Swift
  • How you are deploying Docker/Stand-alone, etc - Docker
  • NGINX type (OSS/Plus) - OSS
  • Authentication method (IAM, IAM with Fargate, IAM with K8S, AWS Credentials, etc) - Access Key, Secret Key.
@anisimovdk anisimovdk changed the title Display only 1000 object in prefix/dir Listing only 1000 first object of prefix/dir Jun 30, 2023
@anisimovdk anisimovdk changed the title Listing only 1000 first object of prefix/dir Gateway listing only 1000 first object of prefix/dir Jun 30, 2023
@dekobon
Copy link
Collaborator

dekobon commented Jul 5, 2023

I just confirmed with the AWS documentation that the maximum number of results that can be returned is 1k. I imagine that the only way to solve this is to add paging support to the directory listing style sheet and list proxy javascript. Perhaps, the XSL stylesheet could grab the key name of the last key returned and pass it as part of a link on a query parameter, whereby nginx will then issue a request to S3 with that key name set as the marker.

What comes to mind is how to sanitize that input such that it does not become a vector for injection attacks.

This would be a great PR.

@dekobon dekobon added enhancement New feature or request help wanted Extra attention is needed labels Jul 5, 2023
@xquek
Copy link

xquek commented Mar 12, 2024

Hi folks, I am actually interested in this feature and is happy to give it a try with its implementation.

I was wondering what are your recommendation for santaizing the inputs?

Will storing the nextToken from the response from S3 in a cache and comparing it to the token in the request as well as request paramaters be sufficient?

EDITED: i just realized that keyval cache is only available for paid version of nginx.

@4141done
Copy link
Collaborator

4141done commented Mar 14, 2024

Hi @xquek , thank you very much for your interest in helping out!

In terms of storing references to the nextToken or ContinuationToken cursor returned by s3, since this issue was filed, njs has introduced the [SharedDict](https://nginx.org/en/docs/njs/reference.html#ngx_shared) (more info here) object that can be used on both the paid and free version.

As a new(ish) maintainer I haven't dug too deeply into the xslt side of things but if we wanted "next" and "previous" functionality we could do something like pushing and popping the cursor into the shared dict. xslt_string_param may be helpful in getting things from javascript land into the xslt template.

As for sanitization, maybe @dekobon could comment more on their concerns. Populating the "next" and "previous" links in the template using values from the s3 object listing response seems safe enough to me but it's possible I'm missing something.

Happy to continue discussing as you get into the change but these are just my initial thoughts.

Ref:
Another comment I made on a related discussion highlighting some files to look at:
#207 (reply in thread)

@dekobon
Copy link
Collaborator

dekobon commented Mar 15, 2024

My thoughts on sanitation is need to be sure that only your expected inputs make it to the eventual S3 URL. Otherwise, arbitrary access to the S3 bucket may be possible. I think characters like the following may be the most problematic: %@?+. As such, we need to make sure things are properly URI encoded and that we do not accept unencoded special characters.

As for storing the cursor for the next or previous token, it seems like you could write that to the HTML output generated by the XSLT page. I may be missing something, but I'm not sure you need to use a shared dictionary for that.

I imagine something like a link that goes: http://mys3gatewayhost.foo/dir1/dir2/index.html?next=marker
If that link was written to the html output generated as part of an index page, you would have stored the state sufficiently between requests.

@4141done
Copy link
Collaborator

@dekobon thank you for your perspective. The assumption around a cache comes from the fact that s3 does not send any kind of cursor representing the previous page, only the next page. So if you wanted to enable "previous" button functionality you'd need to keep track of at least one token for the previous page.

@dekobon
Copy link
Collaborator

dekobon commented Mar 15, 2024

Couldn't the token for the previous link be passed as a URL parameter when you click the next button? This value could then be potentially passed to the XSL sheet.

@4141done
Copy link
Collaborator

Couldn't the token for the previous link be passed as a URL parameter when you click the next button? This value could then be potentially passed to the XSL sheet.

Ah right because the next link will go through the gateway as well so we'd be able to pull it out maybe using a js_set and template it back in using something like xslt_string_param? That makes sense to me. We'd have to test whether s3 will put up with us passing an unknown param along or whether we want to go to the trouble of massaging the querystring we pass on to s3.

@4141done
Copy link
Collaborator

Final update, I spoke with @dekobon offline and wanted to clarify a misunderstanding I had regarding sanitization:

Currently the s3 gateway strips any query parameters before passing requests on to S3 (this was the part I didn't know).

So in order to pass the continuationToken or nextToken we would need to create an allowlist of query parameters to be passed on, and properly uri-encode any links we generate that contain query parameters.

@xquek
Copy link

xquek commented Mar 21, 2024

Thanks! @4141done , just wondering if you would like me to pick this up ! thank you!

@4141done
Copy link
Collaborator

Please do! Excited to see someone tackle this one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants