This repository has been archived by the owner on Mar 25, 2022. It is now read-only.

Question: Infra support for go-ipfs CI #441

Closed
momack2 opened this issue Oct 19, 2018 · 13 comments

Comments

@momack2

momack2 commented Oct 19, 2018

The go-ipfs team had a hack week this week and spent some time looking at the Q4 OKRs we already picked up. We had an item that's hurting our dev velocity that seems to overlap a bit with one of the P1 OKRs the infra team (@mburns) is focusing on - and were wondering if it'd be reasonable to just make this a shared OKR that coordinates effort.

Our OKR: Unit tests and code coverage tests are run efficiently using CI

  • Ideally this would be both fast and reliable (Jenkins is faster, but we've had reliability issues)
  • It'd be nice to add the status to our packages page on GitHub

Your OKR: Supported CI service for public and private repos is known

Is resolving the open issues with Jenkins something on your guys' roadmap? What sort of CI support are you hoping to offer to teams like go-ipfs?

Questions from @eefahy :

  • what does "fast" mean in terms of minutes as an acceptable build time?
    • I think "fast" is somewhere in the <10min time frame (including sharness, etc)
  • What's the state of the art on running unit tests vs end to end tests?
    • Both unit and e2e tests exist and get run for each PR to the main repo (but tests on sub-packages don't run e2e tests - which can be a problem)
    • We currently run both Travis and Jenkins for CI. We've had trouble with breaks in Jenkins and AFAIK don't have permissions to change/fix things. Travis is more reliable, but really slow.
  • What's the state of building statically linked binaries for supported OSs?
    • We build artifacts but barely anyone is using them
  • How can CI interplay with the existing release workflow?
    • Releases are completely manual and we actually don't intend to combine CI infra with release infra (wouldn't want a hacked CI to lead to a hacked major release)
  • Would the ability to self service changes to CI be a benefit to the team?
    • Being able to self service changes to CI would likely be helpful - when Jenkins breaks we only have the option to complain to Victor right now =/

Looping in @Kubuxu and @magik6k for continuing this conversation.

@victorb
Member

victorb commented Oct 19, 2018

Being able to self service changes to CI would likely be helpful - when Jenkins breaks we only have the option to complain to Victor right now

I would love to enable more people to help out with this. What's currently missing? I'm overhauling the docs to make the current setup easier to understand. Is there anything else I can do to help out with this?

@magik6k
Member

magik6k commented Oct 19, 2018

Yeah, docs would be awesome.

@eefahy
Contributor

eefahy commented Oct 19, 2018

@victorbjelkholm could you give us a summary of the reliability issues you are seeing in Jenkins? In particular, what have you already tried?

We currently run both Travis and Jenkins for CI. We've had trouble with breaks in Jenkins and AFAIK don't have permissions to change/fix things. Travis is more reliable, but really slow.

Would it make sense to prototype the run in CircleCI, where we are able to bump the VM size for Linux builds? That could potentially give us a benchmark that's reliable and fast.

@magik6k
Member

magik6k commented Oct 20, 2018

CircleCI doesn't support Windows, and this is a must-have for our CI

@magik6k
Member

magik6k commented Oct 20, 2018

Also, most issues with Jenkins can potentially be fixed by implementing ipfs-inactive/dev-team-enablement#161

@eefahy
Contributor

eefahy commented Oct 22, 2018

and were wondering if it'd be reasonable to just make this a shared OKR that coordinates effort.

I think a shared OKR is a wonderful idea! Perhaps something like:

OB: Unit tests and code coverage tests are run efficiently using CI
KR: Hard requirements for CI are documented (i.e. build time, unit test environment specs, e2e OSs, cost of service + cost to maintain)
KR: CI Implementation meets all documented requirements
KR: packages page tracks CI status

We build artifacts but barely anyone is using them

Where can I see stats on what binaries people are using? Does the popularity of binaries per OS have any implications on CI requirements?

Releases are completely manual and we actually don't intend to combine CI infra with release infra (wouldn't want a hacked CI to lead to a hacked major release)

This sounds like we don't trust access to our CI builds and workflow? We should change that if that's the case.

CircleCI doesn't support Windows, and this is a must-have for our CI

The CircleCI folks tell me they are rolling out a Windows option this quarter/early 2019. Does that change anything for us?

A couple of points from the infra perspective:

  • Running and maintaining CI/Jenkins is moving from the dev enablement team to infra, as decided during IPFS days in Scotland
  • Our current Jenkins deployment is expensive, unreliable, and time-consuming to maintain (but "Consider using JClouds for jenkins", ipfs-inactive/dev-team-enablement#161, could help)
    • If this happens, it should be accomplished with the infra team's input (specifically @mburns), since the dev enablement team is winding down and maintenance is moving to infra

eefahy added the status/in-progress label Oct 22, 2018
@victorb
Member

victorb commented Oct 22, 2018

I think I've asked a couple of times but don't seem to get any response. Has CircleCI solved the issue where GitHub organizations share a queue? Last time we used Circle/Travis, we ended up queueing builds for multiple hours because we couldn't (1) pay them to add more workers for us or (2) run our own workers, which was one of the main reasons for the move to Jenkins.

If this happens, it should be accomplished with the infra team's input

I think the expensive and unreliable parts can be fixed, but it will still take some time to maintain. However, if we decide to drop Jenkins, I'd love to have that made explicitly clear, so I don't spend any more time solving issues in Jenkins, except in build-or-death situations.

@eefahy
Contributor

eefahy commented Oct 22, 2018

CircleCI will take our money to add more containers, run them in parallel, and boost VM sizes for Linux builds. macOS is limited to one size, but with a different payment plan we get a bigger VM. We can't run our own workers though, so there's probably still some need for Jenkins if we require builds for OSs beyond what they provide. A hybrid approach might be the most cost-effective way to run CI.

I should be clear that I'm not interested in forcing a particular CI solution but would like to have the requirements of CI well known so we know the most cost effective (including people cost) way to solve the problem. IMHO Jenkins is still very much on the table but the numbers need to work for it to make sense.

eefahy assigned mburns and unassigned mburns Oct 23, 2018
@momack2
Author

momack2 commented Oct 25, 2018

So my synopsis of the current state is the infra team is onboard to run/maintain/improve our CI solution (awesome! thanks!), however there are open questions about requirements and which CI solution is going to be the best fit. These depend on what support the CI tools can offer (for platforms / scalability / etc), our team requirements, and how costly they are in $ / people time. What's the best way to define exactly where those dependencies lie and make a call? Happy to help schedule time between Erin/Victor/ to make a final decision if that'd be useful.

@eefahy
Contributor

eefahy commented Dec 1, 2018

This issue seems to have stalled a bit but I wanted to point out how the js team is approaching the issue: #442

Would the go team be interested in doing similar prototyping?

@magik6k
Member

magik6k commented Dec 1, 2018

I'm open to experimenting with new solutions. We actually have a somewhat usable setup for Circle and a slightly less usable one for Travis already, but Jenkins still wins for me with its scripted pipelines, and it has many features Travis/Circle are missing. I'd definitely want to give GitLab CI a try.

@eefahy
Contributor

eefahy commented Dec 1, 2018

@magik6k can you say more about features Travis and Circle are missing? I'm also curious if you're seeing the issue described in ipfs-inactive/dev-team-enablement#113? As I understand it, that's the main reason the js team feels it cannot use Jenkins.

eefahy added the need/community-input label and removed the status/in-progress label Dec 6, 2018
@scout
Contributor

scout commented Mar 14, 2019

Closing in favor of https://github.com/protocol/infra/issues/432

scout closed this as completed Mar 14, 2019
ghost removed the need/community-input label Mar 14, 2019