RFC: Move i686 CI testing from Travis to CircleCI #18007

Closed
wants to merge 5 commits into from

ararslan
Member

@ararslan ararslan commented Aug 13, 2016

This will hopefully lessen the Travis CI queue by moving the i686 Linux tests from Travis to CircleCI.

CircleCI will need to be manually enabled and configured by a JuliaLang owner, but much of the build configuration is in the circle.yml file added in this PR. I've been testing this on my fork of Julia and it seems to work okay, though the first successful run took about 2.5 hours because it had to build dependencies. CircleCI caches directories just as Travis does, so subsequent builds are shorter. Edit: I initially wasn't sure how to configure fast fails for queued commits on the same PR, but it turns out to be just a project setting in CircleCI.

CircleCI gives you the option to use Ubuntu 12.04 or 14.04. I opted for the latter in my fork after reading their documentation about the difference. I can't guarantee that the YAML I set up here will work with Circle's 12.04 but I don't know why it wouldn't.

cc @tkelman

@tkelman
Contributor

tkelman commented Aug 13, 2016

Nice! I think it will be worth running both in parallel for a little while, only shutting off the job in the travis matrix when we're happy that circle is working and will handle our load level on the free tier (which is only one concurrent worker, right?).

Is there a timeout on the initial cache population run time that you've seen? I'd almost rather try to use a non ubuntu distro if we can, I thought Circle lets you run in an arbitrary docker image of your choice?

ARCH: "i686"
BUILDOPTS: "-j3 VERBOSE=1 FORCE_ASSERTIONS=1 LLVM_ASSERTIONS=1"
TESTSTORUN: "all"
JULIA_CPU_CORES: 2
Contributor

do we know how many cores the circle ci vm's have available?

Member Author

They have 2 vCPUs. But you can run up to 4 images in parallel in a single build and can manually split your tests across them. Not sure how helpful that will be for us given the current structure of the tests.

Member Author

Actually, we might be able to set up the parallelism by using a different TESTSTORUN for each container. So for example, one could be running the linear algebra tests while another does libgit2, etc. I'm not really familiar with how the choosetests thing works but that seems like it could be doable, right?
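A rough sketch of that idea, assuming the split is keyed on `CIRCLE_NODE_INDEX` (the per-container index CircleCI sets in parallel builds). The function name `tests_for_node` and the group names `linalg` and `libgit2` are placeholders for illustration, not the grouping used in this PR.

```shell
#!/bin/bash
# Hypothetical mapping from container index to a test group.
# The group names are placeholders, not a tuned split.
tests_for_node() {
    case "$1" in
        0) echo "linalg" ;;
        1) echo "libgit2" ;;
        *) echo "unknown container index: $1" >&2; return 1 ;;
    esac
}

# CIRCLE_NODE_INDEX is unset outside CircleCI, so default to container 0.
node="${CIRCLE_NODE_INDEX:-0}"
if TESTSTORUN=$(tests_for_node "$node"); then
    echo "Container ${node} would run: ${TESTSTORUN}"
fi
```

A real split would need the groups to take roughly equal time, otherwise the slowest container decides the build time.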

Contributor

the free plan only gives you one build worker to start right? it just has 2 cores inside that one job

Member Author

So I'm a little confused as to what you actually get in the free plan. I was able to select up to 4x parallelism, and the way Circle does parallelism is by running completely separate containers, each with 2 vCPUs, in parallel, and keeping each step of the build and test in sync between containers. Now, in the settings they also make it sound like you trade parallel containers for concurrent jobs. I'm not clear on the specifics of that. I should throw a bunch of PRs from different branches at it (to avoid auto-canceling) and see what happens.

Member Author

I did some science and if my analysis is correct, you get a total of 4 containers at any given time on a free account. So with 4x parallelism, there are no containers left for other jobs. With 2x, there are 2 available containers left, so you get 1 other job with 2x parallelism. With 1x, there are 4 containers available for 4 jobs. 1x seems to take too long for it to be a viable replacement for Travis.
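The arithmetic behind those numbers, as a small sketch. The budget of 4 containers is the figure observed in this thread rather than something taken from Circle's documentation, and `concurrent_builds` is a made-up helper name.

```shell
#!/bin/bash
# Observed container budget on a free account (from the experiment above,
# not from Circle's docs).
TOTAL_CONTAINERS=4

# Each build uses `parallelism` containers, so the number of builds that
# fit at once is the integer quotient.
concurrent_builds() {
    local parallelism=$1
    echo $(( TOTAL_CONTAINERS / parallelism ))
}

for p in 1 2 4; do
    echo "${p}x parallelism: $(concurrent_builds "$p") build(s) at a time"
done
```

Under that assumption, 2x leaves room for the current build plus one other, which matches the observation above.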

@ararslan
Member Author

I actually couldn't find any information in the Circle docs as to the number of concurrent workers on the free plan but I think it's just one.

> Is there a timeout on the initial cache population run time that you've seen?

You mean a timeout for actually building the stuff in the cached directories? It doesn't seem like it; that part finished successfully in over an hour and a half.

> I thought Circle lets you run in an arbitrary docker image of your choice?

Yes, theoretically you can set it up to use Docker. Buuuuut I don't know how to use Docker at all, so I stuck with their regular Ubuntu images, which are i686. Why would you rather avoid Ubuntu? For variety since Travis Linux is also Ubuntu?

@tkelman
Contributor

tkelman commented Aug 13, 2016

> Why would you rather avoid Ubuntu? For variety since Travis Linux is also Ubuntu?

Yeah, mostly for test coverage's sake, since it's a little too easy to do things that end up only working on debian-shaped distributions if you don't test otherwise. If this already works it's a good step. We have some CentOS buildbots that serve this purpose too, it's just slightly less visible than pre-merge CI.

@ararslan
Member Author

I'm having some issues getting the YAML configured properly (CircleCI is really annoying with how they do directory changing) so until I get that sorted out I'll close this to avoid spamming Travis and AppVeyor.

@ararslan ararslan closed this Aug 13, 2016
@tkelman
Contributor

tkelman commented Aug 13, 2016

Appveyor has an [av skip], I've been meaning to check if travis has an equivalent. Might be worth iterating on this on a branch in your fork for a little bit, since github won't let you reopen a closed PR if you push any new commits to the same branch (unless they've fixed that limitation)

@ararslan
Member Author

ararslan commented Aug 13, 2016

> github won't let you reopen a closed PR if you push any new commits to the same branch

Ah crap, I just pushed a commit. :/ Edit: I still have the "Reopen pull request" button though, so maybe that's okay.

> Appveyor has an [av skip], I've been meaning to check if travis has an equivalent

Nope, checked before closing this. Travis has [ci skip] which would also skip Circle.

@Keno Keno reopened this Aug 13, 2016
@Keno
Member

Keno commented Aug 13, 2016

Seems fine to me ;)

@Keno
Member

Keno commented Aug 13, 2016

In any case, maybe just do the experiments on a different branch and leave this open just in case GitHub gets confused?

@ararslan
Member Author

Good idea 👍

@tkelman
Contributor

tkelman commented Aug 13, 2016

maybe it was force pushing after a rebase that causes the github problem

@ararslan
Member Author

At any rate, I seem to have gotten it working now (see https://circleci.com/gh/ararslan/julia/27), though the tests run pretty slowly. Not sure how the speed compares to i686 on Travis.

@kshyatt kshyatt added the testsystem label (The unit testing framework and Test stdlib) Aug 14, 2016
@ararslan
Member Author

ararslan commented Aug 17, 2016

I was able to get it down to about 40 minutes using 3x parallelism the other day. It's still failing a libgit2 test due to SSH weirdness (I'm hoping #18066 will help with that), but it's otherwise working well. For the interested: https://circleci.com/gh/ararslan/julia/37.

@ararslan
Member Author

ararslan commented Aug 20, 2016

Okay, I think I have it just about as good as it's going to get. Some notes:

  • Just under 50 minutes at 2x parallelism
  • One concurrent PR can be built alongside the current one
  • The time could probably be reduced with more careful balancing
  • Every step in the YAML runs in its own shell, so a cd applies to a single line only
  • The git config is set up to replace https from GitHub with an SSH URL, so that had to be manually reset in the YAML
  • Automatic build canceling for newer commits on a PR has to be set up in the project settings
  • All tests pass 🎉
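The note about each YAML step running in its own shell can be reproduced locally. In this sketch every `bash -c` call stands in for one step; that is an analogy for illustration, not Circle's actual runner.

```shell
#!/bin/bash
# Scratch directory to cd into.
workdir=$(mktemp -d)

# Step 1: the cd only affects this child shell, which then exits.
bash -c "cd '$workdir'"

# Step 2: a fresh shell starts from the original directory again.
step2_dir=$(bash -c 'pwd')

# Putting the cd and the command in the same step keeps them in one shell.
combined_dir=$(bash -c "cd '$workdir' && pwd")

echo "separate steps: ${step2_dir}"
echo "combined step:  ${combined_dir}"
rmdir "$workdir"
```

This is why a step that needs a particular directory has to do its `cd` and its command on the same line.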

@ararslan
Member Author

@tkelman Think we're ready to give this a go?

@@ -0,0 +1,15 @@
#!/bin/bash
# Balance the testing load between 2 CircleCI parallel containers

Contributor

add set -e just in case things go horribly wrong?
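For context, a small illustration of what `set -e` changes. The two-command scripts here are throwaway examples, not the balancing script from this PR.

```shell
#!/bin/bash
# Without set -e the script carries on after the failing command.
without=$(bash -c 'false; echo "kept going"')

# With set -e the shell exits at the first failure, so nothing is printed.
# The `|| true` only keeps this demo itself from stopping.
with=$(bash -c 'set -e; false; echo "kept going"' || true)

echo "without set -e: '${without}'"
echo "with set -e:    '${with}'"
```

In a test-balancing script this means a failed command stops that container's run instead of being masked by later commands.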

Member Author

The deed is done

@ararslan
Member Author

@tkelman Looks like the Circle webhook is using my fork

@tkelman
Contributor

tkelman commented Aug 22, 2016

That may be because I didn't have "Permissive building of fork pull requests" turned on when you pushed that last commit, and since you have circle enabled for your fork. Not sure, but I'll turn that setting on now.

@ararslan
Member Author

I'll turn Circle off for my fork. Maybe that'll confuse it less.

test:
  override:
    - /tmp/julia/bin/julia --precompile=no -e 'true'
    - /tmp/julia/bin/julia-debug --precompile=no -e 'true'
Contributor

--precompiled

Member Author

Derp. Thanks.

Contributor

should that have errored?

Member Author

Once it got to that point it would have. I think you caught it before Circle got to the tests.

Contributor

committed a day ago

It didn't run on your fork then?

Member Author

@ararslan ararslan Aug 22, 2016

Oh. Weird. No, I guess it worked fine. Does Julia silently ignore invalid arguments? Edit: It does not.

Member Author

It works for me locally as well.

Contributor

hm, I thought we would check for that

@ararslan
Member Author

Your tests failed on CircleCI

Dangit!! (╯°□°)╯︵ ┻━┻

Container 1:

/home/ubuntu/julia/usr/tools/llvm-size: '/home/ubuntu/julia/usr/bin/julia': No such file

Containers 2-4:

/home/ubuntu/julia/usr/tools/llvm-size: Command not found

The fact that it's different between containers is more than a little strange. I remember seeing the "no llvm-size" error on a VM once but I can't recall what I did to fix that. Any ideas, @tkelman? If not, would you mind just restarting the build and we'll see if it was a fluke?

@tkelman
Contributor

tkelman commented Aug 22, 2016

maybe because it's installing into a /tmp prefix?

nevermind, something failed to build in deps - not sure what

@ararslan
Member Author

It was doing that before and was working when it was on my account though. Did something change in a makefile?

@tkelman
Contributor

tkelman commented Aug 22, 2016

Ah you might be hitting the issue that I fixed on master with curl not being able to find libssh. Does circle not build the merge commit for PR's?

@ararslan
Member Author

ararslan commented Aug 22, 2016

> Does circle not build the merge commit for PR's?

No clue.

Speaking of Circle, the webhook seems to be MIA...

@ararslan
Member Author

Ah.

'pull/18007' is not configured as a white-listed branch. Please see configuration docs for further details.

@tkelman
Contributor

tkelman commented Aug 22, 2016

So I guess disabling "Only build pull requests" actually means don't build pull requests at all if you have a branch whitelist?

@ararslan
Member Author

ararslan commented Aug 22, 2016

Circle is still upset about not being able to find llvm-size. I remember I had that problem once when I was building Julia on ElementaryOS but I can't for the life of me remember what I did to fix it... I thought it was make -C deps distcleanall but that appears not to have helped, at least on Circle.

Edit: Oh, now that we've cleaned up what gets shown in the log, I've found this:

CMake Error at /usr/share/cmake-2.8/Modules/CMakeTestCCompiler.cmake:61 (message):
  The C compiler "/home/ubuntu/bin/gcc" is not able to compile a simple test
  program.

That's... hmm. I think I was getting that too at one point on eOS.

@tkelman
Contributor

tkelman commented Aug 23, 2016

That may have been an issue with #18164 that #18194 fixed? Not positive. There was a complaint about the red status so I've disabled the webhook for now, so will have to go back to trying on your fork? I couldn't find anywhere in their web UI to manually clear the cache, which may be making results from your fork not 100% representative.

@ararslan
Member Author

ararslan commented Aug 23, 2016

Yeah, Circle being angry on PRs that don't have Circle tests is understandably pretty annoying. It's weird though, I wasn't getting that error on my fork. 😕 I'll rebase and try on the fork again and see what happens.

@ararslan ararslan closed this Jan 16, 2017
@ararslan ararslan deleted the aa/circleci branch January 16, 2017 03:55
@tkelman tkelman mentioned this pull request Aug 8, 2017