CI failure #873

Closed
hannesm opened this issue Sep 12, 2023 · 3 comments

Comments

@hannesm
Contributor

hannesm commented Sep 12, 2023

As you mentioned in #858, the CI service is considered to be stable.

Now, I just observed a failure at: https://ocaml.ci.dev/github/robur-coop/albatross/commit/2f316d2e49866fe08b9e12c12194062bbbaa2329/variant/debian-12-5.0_opam-2.1 -- and I've seen similar logs before, so maybe there's a way to tackle the root cause.

Since the CI sometimes removes all the logs, I paste below the entire log from the link above:

2023-09-12 11:12.42: New job: test robur-coop/albatross https://github.com/robur-coop/albatross.git#refs/heads/main (2f316d2e49866fe08b9e12c12194062bbbaa2329) (linux-x86_64:debian-12-5.0_opam-2.1)

Base: ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea

Opam project build


To reproduce locally:


git clone --recursive "https://github.com/robur-coop/albatross.git" -b "main" && cd "albatross" && git reset --hard 2f316d2e

cat > Dockerfile <<'END-OF-DOCKERFILE'

FROM ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea

# debian-12-5.0_opam-2.1

USER 1000:1000

ENV CLICOLOR_FORCE="1"

ENV OPAMCOLOR="always"

WORKDIR /src

RUN sudo ln -f /usr/bin/opam-2.1 /usr/bin/opam

RUN opam init --reinit -ni

RUN opam exec -- ocaml -version && opam --version

WORKDIR /src

RUN sudo chown opam /src

RUN cd ~/opam-repository && (git cat-file -e 95ff62cd8c4b49edfe81945606a015c8005774ae || git fetch origin master) && git reset -q --hard 95ff62cd8c4b49edfe81945606a015c8005774ae && git log --no-decorate -n1 --oneline && opam update -u

COPY --chown=1000:1000 albatross.opam ./

RUN opam pin add -yn albatross.dev './'

ENV DEPS="alcotest.1.7.0 angstrom.0.15.0 asn1-combinators.0.2.6 astring.0.8.5 base-bigarray.base base-bytes.base base-domains.base base-nnp.base base-threads.base base-unix.base base64.3.5.1 bigstringaf.0.9.1 bos.0.2.1 ca-certs.0.2.3 checkseum.0.5.1 cmdliner.1.2.0 conf-gmp.4 conf-gmp-powm-sec.3 conf-libnl3.1 conf-pkg-config.3 cppo.1.6.9 csexp.1.5.2 cstruct.6.2.0 decompress.1.5.2 dns.7.0.3 dns-client.7.0.3 dns-client-lwt.7.0.3 domain-name.0.4.0 dune.3.10.0 dune-configurator.3.10.0 duration.0.2.1 eqaf.0.9 faraday.0.8.2 faraday-lwt.0.8.2 faraday-lwt-unix.0.8.2 fmt.0.9.0 fpath.0.7.3 gmap.0.3.0 h2.0.10.0 happy-eyeballs.0.6.0 happy-eyeballs-lwt.0.6.0 hex.1.5.0 hkdf.1.0.4 hpack.0.10.0 http-lwt-client.0.2.5 httpaf.0.7.1 ipaddr.5.5.0 logs.0.7.0 lru.0.3.1 lwt.5.7.0 macaddr.5.5.0 metrics.0.4.1 metrics-influx.0.4.1 metrics-lwt.0.4.1 metrics-rusage.0.4.1 mirage-crypto.0.11.1 mirage-crypto-ec.0.11.1 mirage-crypto-pk.0.11.1 mirage-crypto-rng.0.11.1 mirage-crypto-rng-lwt.0.11.1 mtime.2.0.0 ocaml.5.0.0 ocaml-base-compiler.5.0.0 ocaml-config.3 ocaml-options-vanilla.1 ocaml-syntax-shims.1.0.0 ocamlbuild.0.14.2 ocamlfind.1.9.6 ocplib-endian.1.2 optint.0.3.0 owee.0.7 pbkdf.1.2.0 psq.0.2.1 ptime.1.1.0 randomconv.0.1.3 re.1.11.0 result.1.5 rresult.0.7.0 seq.base sexplib0.v0.16.0 solo5-elftool.0.3.1 stdlib-shims.0.3.0 tls.0.17.1 tls-lwt.0.17.1 topkg.1.0.7 uutf.1.0.3 x509.0.16.5 zarith.1.13"

ENV CI="true"

ENV OCAMLCI="true"

RUN opam update --depexts && opam install --cli=2.1 --depext-only -y albatross.dev $DEPS

RUN opam install $DEPS

COPY --chown=1000:1000 . /src

RUN opam exec -- dune build @install @check @runtest && rm -rf _build


END-OF-DOCKERFILE

docker build .

END-REPRO-BLOCK


2023-09-12 11:12.42: Using cache hint "robur-coop/albatross-ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea-debian-12-5.0_opam-2.1-d7790a8e8307b6c95dc48b4264ef6628"

2023-09-12 11:12.42: Using OBuilder spec:

((from ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea)

 (comment debian-12-5.0_opam-2.1)

 (user (uid 1000) (gid 1000))

 (env CLICOLOR_FORCE 1)

 (env OPAMCOLOR always)

 (workdir /src)

 (run (shell "sudo ln -f /usr/bin/opam-2.1 /usr/bin/opam"))

 (run (shell "opam init --reinit -ni"))

 (run (shell "opam exec -- ocaml -version && opam --version"))

 (workdir /src)

 (run (shell "sudo chown opam /src"))

 (run (cache (opam-archives (target /home/opam/.opam/download-cache)))

      (network host)

      (shell "cd ~/opam-repository && (git cat-file -e 95ff62cd8c4b49edfe81945606a015c8005774ae || git fetch origin master) && git reset -q --hard 95ff62cd8c4b49edfe81945606a015c8005774ae && git log --no-decorate -n1 --oneline && opam update -u"))

 (copy (src albatross.opam) (dst ./))

 (run (network host)

      (shell "opam pin add -yn albatross.dev './'"))

 (env DEPS "alcotest.1.7.0 angstrom.0.15.0 asn1-combinators.0.2.6 astring.0.8.5 base-bigarray.base base-bytes.base base-domains.base base-nnp.base base-threads.base base-unix.base base64.3.5.1 bigstringaf.0.9.1 bos.0.2.1 ca-certs.0.2.3 checkseum.0.5.1 cmdliner.1.2.0 conf-gmp.4 conf-gmp-powm-sec.3 conf-libnl3.1 conf-pkg-config.3 cppo.1.6.9 csexp.1.5.2 cstruct.6.2.0 decompress.1.5.2 dns.7.0.3 dns-client.7.0.3 dns-client-lwt.7.0.3 domain-name.0.4.0 dune.3.10.0 dune-configurator.3.10.0 duration.0.2.1 eqaf.0.9 faraday.0.8.2 faraday-lwt.0.8.2 faraday-lwt-unix.0.8.2 fmt.0.9.0 fpath.0.7.3 gmap.0.3.0 h2.0.10.0 happy-eyeballs.0.6.0 happy-eyeballs-lwt.0.6.0 hex.1.5.0 hkdf.1.0.4 hpack.0.10.0 http-lwt-client.0.2.5 httpaf.0.7.1 ipaddr.5.5.0 logs.0.7.0 lru.0.3.1 lwt.5.7.0 macaddr.5.5.0 metrics.0.4.1 metrics-influx.0.4.1 metrics-lwt.0.4.1 metrics-rusage.0.4.1 mirage-crypto.0.11.1 mirage-crypto-ec.0.11.1 mirage-crypto-pk.0.11.1 mirage-crypto-rng.0.11.1 mirage-crypto-rng-lwt.0.11.1 mtime.2.0.0 ocaml.5.0.0 ocaml-base-compiler.5.0.0 ocaml-config.3 ocaml-options-vanilla.1 ocaml-syntax-shims.1.0.0 ocamlbuild.0.14.2 ocamlfind.1.9.6 ocplib-endian.1.2 optint.0.3.0 owee.0.7 pbkdf.1.2.0 psq.0.2.1 ptime.1.1.0 randomconv.0.1.3 re.1.11.0 result.1.5 rresult.0.7.0 seq.base sexplib0.v0.16.0 solo5-elftool.0.3.1 stdlib-shims.0.3.0 tls.0.17.1 tls-lwt.0.17.1 topkg.1.0.7 uutf.1.0.3 x509.0.16.5 zarith.1.13")

 (env CI true)

 (env OCAMLCI true)

 (run (cache (opam-archives (target /home/opam/.opam/download-cache)))

      (network host)

      (shell "opam update --depexts && opam install --cli=2.1 --depext-only -y albatross.dev $DEPS"))

 (run (cache (opam-archives (target /home/opam/.opam/download-cache)))

      (network host)

      (shell "opam install $DEPS"))

 (copy (src .) (dst /src))

 (run (shell "opam exec -- dune build @install @check @runtest && rm -rf _build"))

)


2023-09-12 11:12.42: Waiting for resource in pool OCluster

2023-09-12 11:12.42: Waiting for worker…

2023-09-12 11:14.23: Got resource from pool OCluster

Building on x86-bm-c4.sw.ocaml.org

All commits already cached

HEAD is now at 2f316d2 opam: add fpath dependency explicitly


(from ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea)

2023-09-12 11:14.24 ---> using "f0d5e9b94774e5249d9626ec509c9051e626e9fa00cb03ff73f2cb6e7eb5228b" from cache

Uncaught exception: Sys_error("/var/cache/obuilder/result/f0d5e9b94774e5249d9626ec509c9051e626e9fa00cb03ff73f2cb6e7eb5228b/env: No such file or directory")

2023-09-12 11:14.24: Job failed: Failed: Internal error

Also, as reported earlier, pasting from the Web UI is bad (it injects lots of newlines). I thought you had fixed that issue, but it looks like there's a regression.

@mtelvers
Member

Thank you for reporting this issue. I have made a preliminary investigation. The env file contains the environment variables; it is extracted from the Docker base image by running docker image inspect and saving .Config.Env to a file. This file is missing because the worker exited with a fatal exception while the image was being extracted about half an hour earlier. Investigating that issue showed that the worker was running low on disk space and needed to prune virtually everything from the cache. The prune operation removed a cached layer that was a dependency of a running job. The delete cascaded to the child layers, which could not be removed while in use, and that caused the exception. Items are selected for pruning by considering all cache layers that were last used more than 10 minutes ago, ordered by time last used. In this case, the 10-minute window was insufficient. I will look into this further tomorrow.
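
For readers following along, the selection rule described above (layers ordered by time last used, restricted to those idle for longer than a minimum age) can be sketched roughly as follows. This is not OBuilder's code: the function name select_prune_candidates and the input format (one "last-used-epoch layer-id" pair per line on stdin) are assumptions made purely for illustration.

```shell
# Hypothetical sketch of age-based prune selection; not taken from OBuilder.
# Usage: select_prune_candidates NOW MIN_AGE_SECONDS LIMIT < layers.txt
# Each stdin line is assumed to be: "<last-used epoch seconds> <layer id>"
select_prune_candidates() {
  # Keep only layers last used at least MIN_AGE_SECONDS before NOW,
  # order them oldest first, cap the count, and print just the ids.
  awk -v now="$1" -v age="$2" '$1 <= now - age { print }' |
    sort -n |
    head -n "$3" |
    awk '{ print $2 }'
}
```

Note that a rule like this only looks at each layer's own last-used time. It says nothing about whether a running job still depends on a layer or on one of its children, which is the gap described in the comment above.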

@edwintorok
Contributor

FWIW I had a similar failure on my repo, and I worked around it by pushing an empty commit: git commit -am "bump for ci" --allow-empty. This still took advantage of most existing caching but got the broken worker out of this situation. Simply restarting builds didn't help; it kept failing with the same error. (And yes, I did notice an out-of-space condition earlier, which affected both opam CI and ocaml CI.)
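
The empty-commit workaround can be wrapped in a small helper, sketched below. The function name bump_ci is hypothetical. Unlike the command quoted above, this variant omits -a, so unstaged edits to tracked files are not swept into the commit.

```shell
# Hypothetical helper: record a commit that changes no files, so that
# pushing it makes the CI schedule a build for a new commit hash.
bump_ci() {
  # --allow-empty lets git create a commit even when nothing is staged.
  git commit --allow-empty -m "${1:-bump for ci}"
}
```

Run it inside the repository and then push as usual, for example bump_ci && git push.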

@hannesm
Contributor Author

hannesm commented Oct 7, 2023

I will close this issue, since there has been some commit to "ocurrent/obuilder" that may solve this issue.

@hannesm hannesm closed this as completed Oct 7, 2023