Consider using a standardized API for file access #135

Open
jdries opened this issue Oct 11, 2018 · 15 comments · May be fixed by #518
Labels
breaking Breaking changes, requires a major-version (2.0.0 for example) extension feedback required file management help wanted
Milestone

Comments

@jdries

jdries commented Oct 11, 2018

We have currently defined our own API for sharing files with openEO.
The S3 API is also a well-known HTTP-based file API (object storage).
I'm not an expert, so this is really more of a question, to investigate whether it would be usable.
If S3 covers all of our requirements, using it would simplify our own API, and also back-end implementations, as it is very widely adopted and supported by existing software.

@m-mohr
Member

m-mohr commented Nov 7, 2018

I'd really like to adopt a well-known/well-defined API for file management. I'm also not an expert in S3 or other potential file-related HTTP APIs. Does anybody here have experience? A first look at the S3 REST API makes me feel that it is a bit too complex for our "simple" implementations. I'm not sure yet whether that API could be stripped down to a minimal subset, which I'd say is mandatory to keep the openEO API simple in this regard. If it needs to be fully implemented, I could only see it being added as an extension. Are there other file APIs we could adopt? I found Azure and Google, of course.

@edzer
Member

edzer commented Nov 7, 2018

Ideally this would be back-end heterogeneity that we would abstract away in the openEO API.

@m-mohr
Member

m-mohr commented Nov 7, 2018

@edzer That is what we are trying at the moment with our file API, but it is a bit limited and proprietary. If there is a standard with a good ecosystem, it would be a good idea to adopt it. I'm not sure whether the existing cloud services such as S3, Azure, and GCS could handle that, as they usually have service-specific things in their APIs. So we are basically looking for an existing standard that has already done the abstraction. If there is none, we will probably continue with what we have at the moment.

@edzer
Member

edzer commented Nov 7, 2018

GDAL supports the /vsi prefixes (/vsizip/, /vsis3/, /vsigs/, etc.; see here), which abstract over many cases operationally, i.e. it is a working implementation. It does mean that a script needs to be adapted when porting from AWS to GCS.

@edzer
Member

edzer commented Nov 7, 2018

But that might be OK (and could even be automated).
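
A minimal sketch of how that adaptation could be automated, assuming files are referenced by `s3://` or `gs://` URLs. The function name `to_vsi_path` and the mapping table are hypothetical; only the `/vsis3/` and `/vsigs/` prefixes come from GDAL itself (note that GDAL's prefix for Google Cloud Storage is `/vsigs/`).

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to GDAL virtual file system prefix.
VSI_PREFIXES = {
    "s3": "/vsis3/",  # Amazon S3
    "gs": "/vsigs/",  # Google Cloud Storage
}


def to_vsi_path(url: str) -> str:
    """Translate a cloud storage URL into a GDAL /vsi path."""
    parsed = urlparse(url)
    prefix = VSI_PREFIXES.get(parsed.scheme)
    if prefix is None:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    # netloc is the bucket name, path is the object key with a leading slash.
    return f"{prefix}{parsed.netloc}{parsed.path}"


if __name__ == "__main__":
    print(to_vsi_path("s3://my-bucket/scenes/a.tif"))
    print(to_vsi_path("gs://my-bucket/scenes/a.tif"))
```

With a helper like this, a script would keep provider-neutral URLs and only the mapping table would change between AWS and GCS.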

@m-mohr
Member

m-mohr commented Nov 7, 2018

@edzer As discussed, that could be useful for back-end implementations, but I don't see a direct benefit for the API specification. I'm looking more for something like a simple and "modern" WebDAV.

@m-mohr
Member

m-mohr commented Nov 7, 2018

Maybe remoteStorage is what we are looking for: https://remotestorage.io/
Some thoughts: https://unterwaditzer.net/2015/kill-webdav.html

The only thing that is more complex in remoteStorage than in WebDAV is authentication. RemoteStorage requires the server to support a subset of OAuth, and that's the only kind of authentication supported. It also requires WebFinger support instead of making it optional (like in WebDAV, where it's almost a luxury if the DAV client actually finds the HTTP endpoints it's supposed to use).

Sounds great, but I'm wondering how we can integrate that, given that we would need to merge the openEO and remoteStorage authentication procedures somehow.

Another interesting repo to look at is https://github.com/scality/cloudserver

@m-mohr m-mohr changed the title from "Consider using S3 API for file access" to "Consider using a standardized API for file access" Nov 7, 2018
@m-mohr m-mohr added this to the v1.0 milestone Dec 6, 2018
@mkadunc
Member

mkadunc commented Jul 2, 2019

Maybe remoteStorage is what we are looking for: https://remotestorage.io/

I also like remoteStorage a lot, but it has a long way to go before it replaces the S3 API as the go-to REST interface.

The industry seems to have settled on S3's interface for object storage. In addition to scality/cloudserver, many other solutions use the same API or provide an S3-compatible proxy to GCS and others, e.g. MinIO, Ceph, OpenStack Swift.

@m-mohr
Member

m-mohr commented Sep 13, 2019

Conclusion from 3rd year planning:

  • remoteStorage.io is problematic as it requires you to use OAuth and it is still in draft state. It may be reconsidered later, depending on the S3 integration (see below).
  • All back-ends will explore the feasibility of implementing S3 and report back in late Oct. If nobody has objections, we'll try to go for S3, although we should have somebody with S3 expertise have a look over it (I have never worked with S3). Is anybody aware of a public S3 OpenAPI document? I'll (hopefully) send an e-mail to remind people later.

If S3 is not manageable for back-ends to implement, we'll fall back to what we have at the moment.

@m-mohr m-mohr modified the milestones: future, v1.0 Sep 13, 2019
@mkadunc
Member

mkadunc commented Sep 13, 2019

For Sinergise, S3 (or a subset thereof) would be the preferred interface for file access and management.

Swagger 2.0 spec. generated using https://github.com/APIs-guru/aws2openapi (looks quite current): https://github.com/APIs-guru/openapi-directory/blob/master/APIs/amazonaws.com/s3/2006-03-01/swagger.yaml

@m-mohr
Member

m-mohr commented Sep 16, 2019

Thanks @mkadunc, I appreciate the links!

The swagger file looks quite complicated (the file is 8000 lines; the openEO API is not even half as long). Also, the generated version seems to have some issues regarding compatibility with OpenAPI.
S3 has many endpoints and it's not quite clear to me what they are all about, especially as they use fragments (e.g. /{Bucket}#publicAccessBlock), which doesn't look very "RESTish". Which of them do we actually need? GET/PUT/DELETE for /{Bucket} and /{Bucket}/{Key}? In the end I (we?) would need advice on how to integrate S3 in a way that is compatible with their ecosystem.

@mkadunc
Member

mkadunc commented Sep 16, 2019

I suggest we focus mostly on the Object operations, and leave management of buckets up to the back-end (it seems that's how we started anyway). From the Bucket operations we'll probably only need GET (list objects).

I suggest we keep the openEO-mandated subset of supported API calls as small as possible, i.e. only the minimum required for basic functioning of openEO web editor.
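
To make the size of such a subset concrete, here is a rough in-memory sketch of the four operations discussed above: PUT, GET, and DELETE on `/{Bucket}/{Key}`, plus GET on `/{Bucket}` to list objects. The class and method names are hypothetical and nothing here is part of the openEO or S3 specifications; it only illustrates how small the surface would be for a single back-end-managed bucket.

```python
from typing import Dict, List


class MinimalObjectStore:
    """Hypothetical in-memory stand-in for one back-end-managed bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put_object(self, key: str, body: bytes) -> None:
        # Corresponds to PUT /{Bucket}/{Key}: create or overwrite an object.
        self._objects[key] = body

    def get_object(self, key: str) -> bytes:
        # Corresponds to GET /{Bucket}/{Key}: raises KeyError if missing.
        return self._objects[key]

    def delete_object(self, key: str) -> None:
        # Corresponds to DELETE /{Bucket}/{Key}: a missing key is ignored.
        self._objects.pop(key, None)

    def list_objects(self, prefix: str = "") -> List[str]:
        # Corresponds to GET /{Bucket}: keys in sorted order, filtered by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


if __name__ == "__main__":
    store = MinimalObjectStore()
    store.put_object("results/ndvi.tif", b"raster bytes")
    store.put_object("scripts/udf.py", b"print('hi')")
    print(store.list_objects("results/"))
```

Everything beyond these four calls (bucket creation, ACLs, versioning, multipart uploads) would stay outside the mandated subset in this sketch.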

@m-mohr
Member

m-mohr commented Sep 16, 2019

Makes sense. Still need to figure out what is the minimum set of endpoints you need to implement.

What I don't like at all about S3 is that it mandates a different authentication procedure (HMAC?) than the one we currently use, which is the same reason for which we rejected remoteStorage.io. Also, the endpoints use XML, which we tried to avoid mixing with JSON at all costs. So I have more concerns about implementing it after having had a (quick) look at it.
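
For context on the HMAC question, here is a rough sketch of the signing-key derivation used by AWS Signature Version 4, which S3 requests are commonly signed with. It covers only the chained HMAC-SHA256 step; building the canonical request and the string to sign is omitted, and the function names, credentials, and example string are placeholders chosen for illustration.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str,
                       service: str = "s3") -> bytes:
    """Chain four HMACs: date (YYYYMMDD), region, service, terminator."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder credentials and string to sign, not real values.
    key = derive_signing_key("EXAMPLE-SECRET", "20190916", "eu-central-1")
    print(sign(key, "placeholder string to sign"))
```

A back-end accepting such requests would have to recompute this signature per request, which is a different flow from validating a bearer token.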

@m-mohr
Member

m-mohr commented Oct 17, 2019

No updates yet according to the dev telco today.

@m-mohr
Member

m-mohr commented Jan 17, 2020

@jdries Any news on this? I'll move to "future" until there are new insights posted here.

@m-mohr m-mohr modified the milestones: v1.0-rc1, future Jan 17, 2020
@m-mohr m-mohr added the breaking Breaking changes, requires a major-version (2.0.0 for example) label Jul 20, 2020

4 participants