This repository has been archived by the owner on Mar 25, 2022. It is now read-only.

Create S3 bucket for npm-on-ipfs #432

Closed
achingbrain opened this issue Sep 5, 2018 · 11 comments

@achingbrain
Member

achingbrain commented Sep 5, 2018

So that we don't store repo data on the Docker images and/or app servers, could a bucket with the appropriate access permissions please be created? Then we can use ipfs/js-datastore-s3 to store the data.

My thinking is that we'd use one bucket with a prefix per worker, though I'm happy to use multiple buckets if there's a preference.
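A minimal sketch of how the single-bucket layout could namespace each worker's data. It assumes a hypothetical bucket name (`npm-on-ipfs-repos`) and a hypothetical `registry-mirror/<worker-id>/` prefix scheme, and it only shows the key mapping. It does not use or reproduce the ipfs/js-datastore-s3 API.

```typescript
// Hypothetical shared bucket; the name is an assumption for illustration.
const SHARED_BUCKET = "npm-on-ipfs-repos";

// Hypothetical prefix scheme: one "directory" per registry-mirror worker.
function workerPrefix(workerId: string): string {
  // Only allow simple IDs so one worker's prefix can't contain "/" and
  // reach into another worker's namespace.
  if (!/^[a-z0-9-]+$/.test(workerId)) {
    throw new Error(`invalid worker id: ${workerId}`);
  }
  return `registry-mirror/${workerId}/`;
}

// Map a datastore key such as "/blocks/CIQABC" to an S3 object key.
function objectKey(workerId: string, datastoreKey: string): string {
  // Strip leading slashes so the S3 key has no empty path segment.
  const relative = datastoreKey.replace(/^\/+/, "");
  return workerPrefix(workerId) + relative;
}

// True if an S3 object key lives under the given worker's prefix.
function ownsKey(workerId: string, key: string): boolean {
  return key.startsWith(workerPrefix(workerId));
}

console.log(SHARED_BUCKET, objectKey("worker-1", "/blocks/CIQABC"));
```

The trailing slash on each prefix is what stops `worker-1` from matching keys that belong to `worker-10`.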

@achingbrain
Member Author

Pinging @lgierth @victorbjelkholm

@victorb
Member

victorb commented Sep 13, 2018

Could we possibly do some benchmarks before going this route? There might be a big difference in access times between an S3 bucket and a normal disk, but I'm not sure of the magnitude.
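A benchmark along these lines could compare the candidate backends. This is a rough sketch under stated assumptions: `BlockStore` is a hypothetical interface, each real backend (local disk, S3, EFS) would need its own adapter, and the 256 KiB payload is chosen to match the default IPFS chunk size. The in-memory store below only exercises the harness itself.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical minimal store interface that each backend adapter would implement.
interface BlockStore {
  put(key: string, value: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | undefined>;
}

// In-memory baseline: useful only for checking the harness, not as a result.
class MemoryStore implements BlockStore {
  private data = new Map<string, Uint8Array>();
  async put(key: string, value: Uint8Array): Promise<void> {
    this.data.set(key, value);
  }
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.data.get(key);
  }
}

interface LatencyStats {
  count: number;
  meanMs: number;
  p95Ms: number;
}

function summarize(samplesMs: number[]): LatencyStats {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  // Nearest-rank 95th percentile.
  const rank = Math.max(Math.ceil(0.95 * sorted.length) - 1, 0);
  return { count: sorted.length, meanMs: mean, p95Ms: sorted[rank] };
}

// Time n sequential put/get round trips of a fixed-size payload.
async function benchmark(store: BlockStore, n: number, size: number) {
  const payload = new Uint8Array(size);
  const puts: number[] = [];
  const gets: number[] = [];
  for (let i = 0; i < n; i++) {
    const key = `/blocks/bench-${i}`;
    let t0 = performance.now();
    await store.put(key, payload);
    puts.push(performance.now() - t0);
    t0 = performance.now();
    await store.get(key);
    gets.push(performance.now() - t0);
  }
  return { put: summarize(puts), get: summarize(gets) };
}

benchmark(new MemoryStore(), 100, 256 * 1024).then((r) => console.log(r));
```

Sequential requests measure per-request latency; throughput under concurrent load would need a separate run.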

@victorb victorb self-assigned this Sep 13, 2018
@achingbrain
Member Author

Happy to use EFS or EBS if you prefer. At the moment each registry-mirror worker is storing the repo (and potentially all of npm) inside its Docker container, which is not going to work for very long.

@achingbrain
Member Author

I can do some benchmarking if the relevant environments are set up, or if I can get access to the AWS account to do it myself.

Failing that there are some ballpark figures here: https://stackoverflow.com/a/49188286

EFS looks like a good solution if we want decent throughput, but its cost compared to S3 is likely to be eye-watering.

@daviddias
Member

Ping @eefahy

@eefahy
Contributor

eefahy commented Sep 18, 2018

Starting with S3 and going from there (if the performance is 💩) makes sense to me. Who or what would need access to the S3 bucket? From the example seen here, I think an IAM user is also required to mint an access key and secret key, and that user is then granted read/write access to the bucket. Does that seem correct?
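If the IAM-user approach is right, the user's policy might look roughly like the document built below. This is a sketch: the bucket name is hypothetical, and the optional prefix argument assumes the one-bucket, prefix-per-worker layout suggested at the top of the issue. `s3:ListBucket` applies to the bucket ARN, while the object actions apply to keys under it.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
  Condition?: Record<string, Record<string, string[]>>;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build a read/write policy for one bucket, optionally limited to a key prefix.
function bucketReadWritePolicy(bucket: string, prefix = ""): PolicyDocument {
  const bucketArn = `arn:aws:s3:::${bucket}`;
  const listStatement: PolicyStatement = {
    Effect: "Allow",
    Action: ["s3:ListBucket"],
    Resource: [bucketArn],
  };
  if (prefix) {
    // Only allow listing keys under the prefix.
    listStatement.Condition = { StringLike: { "s3:prefix": [`${prefix}*`] } };
  }
  return {
    Version: "2012-10-17",
    Statement: [
      listStatement,
      {
        Effect: "Allow",
        Action: ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        // Object-level actions target keys, hence the "/..." suffix.
        Resource: [`${bucketArn}/${prefix}*`],
      },
    ],
  };
}

// Hypothetical bucket name.
console.log(JSON.stringify(bucketReadWritePolicy("npm-on-ipfs-repos"), null, 2));
```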

The use case for this is a little fuzzy for me. @achingbrain would you mind chatting with me briefly about this so I get what you're trying to do?

@achingbrain
Member Author

achingbrain commented Sep 18, 2018

The use case is for npm-on-ipfs - a service that lets you install npm dependencies (essentially tarballs of js modules and supporting files) which are fetched from IPFS instead of the central npm registry.

This is what we have: an EC2 instance running Docker. Within Docker there are a number of npm-on-ipfs workers (denoted registry-mirror in the diagrams below) waiting for requests. Each worker maintains part or all of npm in its MFS and should be able to fetch dependencies it doesn't have from sibling workers or from the main npm registry itself. Each worker also runs a watch process that updates its copy of npm with new modules as they are published to the main npm registry.
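The lookup order described above (local copy, then sibling workers, then the main registry) could be sketched like this. Sources are modelled as hypothetical async functions; none of these names come from npm-on-ipfs itself, and a real worker would write fetched tarballs into its MFS rather than into a `Map`.

```typescript
// Each source returns the file's bytes, or undefined if it doesn't have it.
type Fetcher = (path: string) => Promise<Uint8Array | undefined>;

interface Source {
  name: string;
  fetch: Fetcher;
}

// Try sources in order; cache anything that wasn't found locally (index 0).
async function resolve(
  path: string,
  sources: Source[],
  cache: (path: string, data: Uint8Array) => Promise<void>,
): Promise<{ from: string; data: Uint8Array }> {
  for (let i = 0; i < sources.length; i++) {
    let data: Uint8Array | undefined;
    try {
      data = await sources[i].fetch(path);
    } catch {
      continue; // An unreachable sibling shouldn't fail the request.
    }
    if (data === undefined) continue;
    if (i > 0) await cache(path, data); // Next request becomes a local hit.
    return { from: sources[i].name, data };
  }
  throw new Error(`not found in any source: ${path}`);
}

// Usage with in-memory stand-ins for the local repo and one sibling.
const local = new Map<string, Uint8Array>();
const sibling = new Map([["/lodash/-/lodash-4.17.10.tgz", new Uint8Array([1, 2, 3])]]);
const sources: Source[] = [
  { name: "local", fetch: async (p) => local.get(p) },
  { name: "sibling", fetch: async (p) => sibling.get(p) },
  { name: "registry", fetch: async () => undefined },
];
resolve("/lodash/-/lodash-4.17.10.tgz", sources, async (p, d) => {
  local.set(p, d);
}).then((r) => console.log(r.from, local.has("/lodash/-/lodash-4.17.10.tgz"))); // logs: sibling true
```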

In front of them, also managed by Docker, is an nginx container that does SSL termination using Let's Encrypt and load-balances requests between the registry-mirror workers. Each worker has an IPFS repo stored locally in its container. This obviously won't work long-term, as the EC2 instance has limited disk space.
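For reference, nginx distributes requests across the servers in an `upstream` block using weighted round-robin by default. With equal weights that reduces to the simple rotation sketched below, which ignores weights and failed-server handling; the worker addresses are hypothetical.

```typescript
// Rotate through backends in order, wrapping around at the end.
class RoundRobin<T> {
  private next = 0;

  constructor(private readonly backends: T[]) {
    if (backends.length === 0) throw new Error("no backends");
  }

  pick(): T {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// Hypothetical worker addresses.
const pool = new RoundRobin([
  "registry-mirror-1:8080",
  "registry-mirror-2:8080",
  "registry-mirror-3:8080",
]);
// prints mirror-1, mirror-2, mirror-3, then mirror-1 again
for (let i = 0; i < 4; i++) console.log(pool.pick());
```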

[Diagram: current setup]

What we want is this: an ECS cluster running the registry-mirror workers, sitting behind an ELB, with the repos stored somewhere other than the containers, probably S3 in the first instance. We could have a bucket per worker, but we should probably use a single bucket with prefixes instead so we don't have to create (potentially) loads and loads of buckets.

[Diagram: target setup]

To get there, the stepping stone proposed in this issue is this: the same EC2 instance running Docker with nginx and the registry-mirror workers, except that the workers store their repos on S3.

[Diagram: proposed interim setup]

Hopefully that's a bit clearer. Let me know if it's not.

@eefahy
Contributor

eefahy commented Sep 22, 2018

That helps a lot @achingbrain, thank you for that.

How is this service currently being managed and deployed? I have a lot of experience with ECS and am happy to help.

@achingbrain
Member Author

In the interest of expediency, an EC2 instance was created to deploy this on, and I SSH in and manually update the app like an animal.

I'd love to have a better pipeline!

@achingbrain
Member Author

achingbrain commented Sep 25, 2018

@eefahy do you think you'll be able to set up the S3 bucket soon? People are using npm-on-ipfs, but it isn't picking up module versions published since the modules were initially stored on IPFS. Picking them up would require turning on the registry watcher, which would significantly increase the rate at which it uses up storage.

@victorb
Member

victorb commented Sep 25, 2018

Done!

@victorb victorb closed this as completed Sep 25, 2018
@ghost ghost removed the status/ready Ready to be worked label Sep 25, 2018
4 participants