[RFC 0112] Demote x86_64-darwin support to Tier 3 #112

Closed

piegamesde
Copy link
Member

@piegamesde piegamesde commented Oct 28, 2021

Rendered

🔗 Shepherd discussion: click here! 🔗

Discussion notice: please try to attach all discussions to a thread by using the code review feature. If your comment doesn't refer to a specific line to attach to, use the first line instead.

@@ -0,0 +1,91 @@
---
Copy link
Member

I'm in favor of this. Darwin support really requires a dedicated team to manage all of its idiosyncrasies and breakages, and we don't have one.

Copy link

Unless I'm misunderstanding, we do have one - @NixOS/darwin-maintainers

Copy link
Member

The main point here is that this team lacks the necessary manpower. Whenever I've run into a non-trivial darwin problem and pinged the team, I was usually able to get some hints via Matrix, but the intended (?) thing never happened, namely someone with experience and a stake in darwin taking it upon themselves to fix the problem. I always ended up finding and testing the fix by myself, something I'm only able to do because I have a darwin machine I can use for (multi-)day-long rebuilds.

Slightly unrelated: This problem seems to be even worse for aarch64-darwin, where even fewer people have access to the necessary hardware. For example, ever since GHC 8.10.5 was released in early June, a lot of people, both users and core contributors, expressed interest in GHC for aarch64-darwin; however, it took until late August for someone to undertake adding the actual support to nixpkgs.

Copy link
Member Author

@piegamesde piegamesde Oct 29, 2021

RE the side note: aarch64-darwin is listed as Tier 7 in RFC 46 (~= no support), so it is out of scope for this RFC. I've adjusted the title accordingly. Edit: apparently, it has been Tier 3 for a few weeks now.

@@ -0,0 +1,91 @@
---
Copy link

As an active darwin user and maintainer, I'm sad to see it come to this, but I have to agree with the motivations and issues listed.

A less drastic step I would like to see tried first is to move any darwin-specific channel blockers to separate channels, so that darwin issues do not hold up linux security updates and can be dealt with out-of-band by darwin maintainers.

Copy link
Member

Indeed, all grievances are about the channels, not about Darwin-only patches making expressions more complicated (which they do, but not by much, without breaking Linux builds, and macOS is popular enough to tolerate some extra verbosity here and there in the packages). I support going half-way and suspending channel blocker status (with extra Darwin-only channels that are blocked and build the same things anyway) until the SDK upgrade and Hydra/OfBorg maintenance questions are fully figured out (which they hopefully will be).

Copy link
Member

@vcunat vcunat Oct 29, 2021

Channels? EDIT: I moved this comment to a better place: https://github.com/NixOS/rfcs/pull/112/files#r739342226

@@ -319,7 +319,7 @@ Define the preferences about the amount of time to wait for Tier-2 platform
fixes in various situations, and about interim resolution in case of failure
(keep old version on one platform, mark as broken, something else).

-# Appendix A. Non-normative description of platforms in November 2019
+# Appendix A. Non-normative description of platforms in October 2021
Copy link

I don't think this section of the RFC was intended as a living document to be updated by other RFCs, it was just an example listing of the platform tier list at the time.

Copy link
Member Author

Yes. I noted it in the "Unresolved questions" section. Do you have alternative proposals?

Copy link
Member

We probably want a list of platforms and their tiers in Nixpkgs.

Copy link
Member Author

@alyssais thanks for the link. I didn't know about this file. If this is our new source of truth on the topic, I think it should be prominently linked from within RFC 46. Furthermore, we probably want to remove the appendix list from RFC 46 and instead add all support tiers to supported.nix.

CC @zimbatm and @7c6f434c as the authors of the respective files.

Copy link
Member

+1 for linking from #46 header as a minor edit to the RFC, +1 on adding further tiers, -1 on removing large chunks of clearly timestamped text from an accepted RFC.

Copy link
Contributor

RFC #46 posits that the official reference for the platform list would be the manual. I wanted to add a link to the source and rendered section in the manual but it doesn't seem to exist yet?

Copy link
Member Author

What shall we do about this? My proposal would be to simply replace that section of RFC 46 with a pointer to the supported.nix so that we keep a single source of truth. But I can also add a section to the manual if you prefer, risking that both files become out of sync over time.

Copy link
Contributor

I just opened #113 to discuss this. Imo, the RFC shouldn't be altered significantly. A source of truth should be selected and pointed out from the RFC, be that documentation or another part of Nixpkgs.

Copy link
Member

@domenkozar domenkozar left a comment

Reading through the motivation list, I noticed there's a better proactive approach to fixing this.

We're already working on the SDK bump as part of https://opencollective.com/nix-macos

We currently lack access to ofborg and hydra to add Darwin builders and to react when there are troubles blocking macOS machines.

It's not a problem of an understaffed macOS maintainer group, but a discussion about what the process is for someone to get access to our core infrastructure and help out.

The way I see it, we can resolve both of these for the 22.05 release without demoting the tier for macOS.

We have a lot of people using Nix with macOS, and while I agree there are issues, a lot of those listed either do not have much to do with tier 1 or lack some kind of quantitative measure that would let us say what the thresholds are for keeping things in tier 1.

I'd like to suggest:

  • we add people to Hydra/ofborg to help out with issues
  • wait for SDK to be ready (@toonn is working on this each month)
  • communicate better when macOS is blocking things to lower the feedback loop
  • set up specific metrics for a platform to have tier 1 support


Darwin is constantly adding additional maintainer burden. Especially, it does not live up to the requirements:

- "A lot of packages built by Hydra, full ofBorg support."
Copy link
Member

Can you add some numbers to back this up?

Darwin is constantly adding additional maintainer burden. Especially, it does not live up to the requirements:

- "A lot of packages built by Hydra, full ofBorg support."
- There is only one ofBorg builder. It can take several hours or days for an ofBorg build to finish for Darwin, if it finishes at all.
Copy link
Member

That's not due to a lack of builders, but due to ofborg being understaffed. We have 5 machines that have been waiting for a few months to be added to the pool, and there's just one person who can do it.


- "A lot of packages built by Hydra, full ofBorg support."
- There is only one ofBorg builder. It can take several hours or days for an ofBorg build to finish for Darwin, if it finishes at all.
- Hydra builders are greatly understaffed.
Copy link
Member

Again, this is because there's only one person maintaining Hydra and there's no way for people to apply and help out.

See NixOS/infra#182 as an example

Copy link
Member

  • We have lacked build capacity for darwin in the past in terms of number of builders and this may very well be a problem in the future again. For darwin we have to use physical machines ordered by the foundation and set up manually, due to legal and organizational issues. This approach scales much worse than what we use for linux currently and would be much harder to solve since it requires physical access.
  • There seem to be specific issues with either darwin or the darwin builders we have that are not down to a lack of people in the infra team. The darwin builders tend to get stuck for still unknown (?) reasons, which requires reboots or restarting the queue runners. Such issues rarely happen with aarch64-linux and x86_64-linux. I don't find it very compelling to solve such an issue by throwing people at a problem that shouldn't exist in the first place.

Copy link
Contributor

Re your second bullet. You say the problem is not due to a lack of people on the infra team but how do you expect the problem to be solved if no one can look into it? AFAIK the only people who can trigger the stop-gap "solution" of restarting builders and queue runners are Eelco and Graham. It seems like they either don't have time to do so or have another reason for not doing so (does it mess with progress of Linux evaluations?).

These same people are also the only ones with enough access to look into the problem, figure out these unknown reasons. If they don't have the time to restart the queue runner how can we expect them to have enough time to look into this? Many people have asked publicly whether they can help, I've also asked privately. Responses haven't been forthcoming. I believe the main problem here is there's no established way to deal with the trust required, maybe limited or temporary access could help with this?

We are trying to improve the Darwin situation to make it less of a burden to core maintainers and not having reliable CI makes this way harder on all parties involved.

> For darwin we have to use physical machines ordered by the foundation and set up manually, due to legal and organizational issues.

Is there more context available on this? AWS and MacStadium provide hosted machines for example, and they have enough support from Apple that they had access to M1 machines before they were generally available, so I can't imagine they're doing something Apple doesn't approve of. Maybe the situation changed since darwin ofborg was initially set up?

Copy link
Contributor

AWS macs are actually physical machines and are really very expensive.

Copy link

Does the fact that the x86_64-darwin builders are running in VMs, rather than macOS on bare metal, have anything to do with why they get stuck? Assuming the M1 builders aren't virtualized, are they hanging with the same frequency as the Intel builders?

Copy link
Contributor

I don't think it's the builders getting stuck. They usually become idle. It's the queue runner that stops scheduling jobs for some reason.

Copy link

And this only affects the macOS builders, and not the NixOS builders, is that right?

Copy link
Contributor

That's my understanding. Though this is all gleaned from messages here and in various Matrix rooms. Maybe @grahamc or @vcunat or others have more accurate information?

Copy link
Member

The most common issue IIRC is that the builders get stuck on a single build step for days, the log often just showing “Sending Inputs”. One fix has been rebooting these machines.

- "A lot of packages built by Hydra, full ofBorg support."
- There is only one ofBorg builder. It can take several hours or days for an ofBorg build to finish for Darwin, if it finishes at all.
- Hydra builders are greatly understaffed.
- If Apple follows through on [their announcement to discontinue](https://www.businessinsider.com/apple-macbook-pro-discontinued) x86-based devices, the situation won't improve due to a physical lack of hardware.
Copy link
Member

Can you explain what this means? Apple is moving to aarch64 and we have 6 builders that are the fastest to finish in the whole hydra builder group.

Copy link
Member

Quite a few people will have outdated x86_64-darwin hardware lying around, which can reasonably be used to test changes that would time out on ofborg. This is at least what I have been doing, although it is quite painful (testing changes against staging usually takes a few days and you often run into some new darwin regression). aarch64-darwin hardware is much less available, and I don't think many people now using NixOS will invest in hardware of that platform going forward.

Copy link
Member

It's possible to get access to x86_64-darwin with https://github.com/lhotari/action-upterm and other GitHub Actions for up to 6h at a time.

Copy link
Member

“It is possible” is anything but “it is easy and well supported”. Do you have detailed instructions on how to do that with little overhead? e.g. how to integrate it into nix as a remote builder?

Copy link
Member

If the solution is (ab)using GitHub actions in a way that is clearly not intended and may well be prohibited in the future, then it may rather prove the point above?

I think in the future x86_64-darwin will definitely become hard to support, but that will happen as aarch64-darwin will become more common and easier to support. We shouldn't justify demoting the highest tier darwin platform with architecture specific reasoning alone. Demoting x86_64-darwin to tier 3 while aarch64-darwin is still tier 3 is completely different from doing that when aarch64-darwin is tier 2.

Copy link
Contributor

I have a quite straightforward solution to get an x86_64 macos machine spun up for ~$0.12/h if anyone wants to ping me about it.

Copy link
Member

@risicle ping

- If Apple follows through on [their announcement to discontinue](https://www.businessinsider.com/apple-macbook-pro-discontinued) x86-based devices, the situation won't improve due to a physical lack of hardware.
- "Most packages work, credible ambition to reach Tier 1 coverage at some point."
- Tier 1 support is far out of reach.
- We lack expertise in the greater community to support Darwin.
Copy link
Member

Again, can you provide some proof?

Copy link
Member

Knowledge about code signing seems to be centralized in 1-5 people at the moment, for example.

Copy link
Member

@domenkozar domenkozar Oct 29, 2021

Ofborg and Hydra are essentially run by a single person, yet it is our core infrastructure (which is also affecting our darwin capacity).

I don't think the number of people is the trouble but the amount of time those maintainers can dedicate.

Tier 2 states that darwin maintainers are to be given time to address the issue, but it's not like tier 1, where that time is indefinite.

Copy link
Member

So you are saying that the single point of failure in the case of darwin/ofborg is a problem, but not in the case of darwin knowledge?

Maybe the issue here is not whether we lack expertise in the greater nix community, which might or might not be an issue (it's definitely much less than the Linux expertise, but maybe it's enough?).

The issue here is that we lack expertise in the very small subset of the community that has expertise in ofborg and hydra.


An excerpt of these issues are:

- The only way to debug issues without owning a Mac is to run Darwin in a VM, of which the legal situation is unclear.
Copy link
Member

There are many ways to do this, but we don't communicate it clearly.

There's https://github.com/lhotari/action-upterm, with which you can gain access to an SSH session on a macOS machine for up to 6h.

We only need to make this clear in our documentation and contributing docs.

Copy link
Member

If you want to have a long lived setup there is a legally questionable way to run Darwin inside of qemu. I don't want to rely on that and get into legal trouble in the long run.

Also, I couldn't set up the nix-daemon inside the VM even after trying for weeks. Not sure if I was doing something wrong or the VM was special in some way. Also, if I remember correctly, no one could help me with the problem at the time.

Copy link
Member

There's a difference between whether you want to and whether you can :)

GitHub Actions makes it easy and legal to get access to a macOS machine for periods of up to 6h at a time.

Either way, the point of supporting a platform is that there are enough maintainers using that platform, not that everyone should be able to run it on their machine of choice.

macOS is widely used in the world and that's why we support it and attract new maintainers :)

- The MacOS SDK cannot be updated and is stuck on 10.12 (released 2016/09) because Apple does not publish the required sources.
- https://github.com/NixOS/nixpkgs/issues/101229#issuecomment-938747052
- It is highly unlikely that we can report bugs to projects with such an outdated and unsupported SDK version
- The MacOS SDK being outdated blocks nixpkgs moving to go 1.17:
Copy link
Member

That's why we're working on it and it's going to happen: https://discourse.nixos.org/t/nix-macos-monthly/12330

Copy link
Member

From the September entry:

> As for the SDK bump, configd remains a hard nut to crack. Somewhere around macOS 10.13 Apple has stopped releasing all the headers that are necessary to build it. XPC no longer seems avoidable as a dependency, CoreFoundation seems incomplete (we rely on Darling for the missing bits now), there’s some packages hosted on opensource.apple.com but not listed in the macOS releases, like neon and OpenBSM. I don’t see an alternative to getting the XPC headers from the SDK and likewise for some other missing headers. There’s several projects online where missing headers are worked around by stubbing, like OSXPrivateSDK and GoVPN, but it doesn’t seem like a good way to deal with the issue. Fabricating a constant can lead to unexpected behavior. That’s why I intend to use binaries from the SDK whenever dependencies aren’t available. This will be a step back with regards to building from source unfortunately.

That doesn’t sound to me like it’s going to get better in the future, or whether it’s even doable in the first place.

Copy link
Member

It's very much doable, we just don't know how much of the stdenv will contain binary blobs. That's the case with darwin anyway, since it's linking against system frameworks and libSystem.

@toonn is working on this and it's indeed a lot of work, but it will get better :)

- The MacOS SDK being outdated blocks nixpkgs moving to go 1.17:
- https://github.com/NixOS/nixpkgs/pull/127519#issuecomment-864926149
- This will become a growing pain point once more packages become go 1.17-only (tailscale, talosctl, …)
- Multiple Packages were marked as broken, because they require symbols from newer MacOS SDK versions.
Copy link
Member

All of these are instances of the same problem: fixing the SDK.

- Multiple Packages were marked as broken, because they require symbols from newer MacOS SDK versions.
- https://github.com/NixOS/nixpkgs/commit/3ceb5ab5ed0d1fcbe53ef00621f44d61dc524796
- https://github.com/NixOS/nixpkgs/commit/c9a3ac5d3cb5e910238d01e534d74d5d50e4b6b7
- Curl added a SystemConfiguration dependency for NAT64 support, which introduced a reference loop, requiring a downstream patch to work around.
Copy link
Member

This is how software development between platforms works; we resolved this bug in a few days.

I see this as proof that macOS is a worthy platform, rather than the opposite.

Copy link
Member

@SuperSandro2000 SuperSandro2000 Oct 29, 2021

It took about 5 months in total (NixOS/nixpkgs#124502), and the Darwin maintainer team was pinged after about a week.

Copy link
Member

Seems I was looking at the wrong pull request, would appreciate if there was a link :)

Looking closely at the fixes, it seems like we've been waiting on upstream to fix their breakage.

Tier 2 states that the platform should be given time to address the situation and then tier 1 can be prioritized. There's no reason to downgrade to Tier 3, as this case is already covered by Tier 2.

Copy link

> It took in total about 5 months and the Darwin maintainer team was pinged after about a week.

I mean reading that thread again, that's not really my takeaway. Looks to me like the maintainers proposed some candidate solutions in a matter of days, followed by some brainstorming, and then months of no activity.

If there was a reasonable and well-defined deadline for the darwin maintainers to address the problem, we would have easily met it with one of the discussed workarounds.

Copy link
Member

As the author of said curl bump I have to say that the reality is that darwin did block curl updates in that case. What is the definition of blocked if not that nobody has the willingness/time/expertise to work on a solution for five months?

Copy link
Member

> If there was a reasonable and well-defined deadline for the darwin maintainers to address the problem, we would have easily met it with one of the discussed workarounds.

Isn’t that the whole point of being Tier 2? Having a team which can fix blockers within days, and is organized enough to make that happen? That’s the main point why MacOS should be downgraded to Tier 3, and then it’s on the @NixOS/darwin-maintainers team to demonstrate that they can do it (e.g. through doing it over the course of 6 months).

Copy link

I'm just saying that this case seems like more of a #88 problem. Darwin certainly did block this update, and for far too long - but there was no established contract for how long is acceptable.

Copy link
Member

@domenkozar domenkozar Nov 9, 2021

> If there was a reasonable and well-defined deadline for the darwin maintainers to address the problem, we would have easily met it with one of the discussed workarounds.

> Isn’t that the whole point of being Tier 2? Having a team which can fix blockers within days, and is organized enough to make that happen? That’s the main point why MacOS should be downgraded to Tier 3, and then it’s on the @NixOS/darwin-maintainers team to demonstrate that they can do it (e.g. through doing it over the course of 6 months).

That's not the point of tier 2 as I understand it.

Tier 2 says:

> If no solution is easily found, the problems should be reported to the platform maintainers with a reasonable amount of time provided for fixing the issue.

That can be understood two ways:

  1. after some time passes, the platform doesn't get the fix
  2. after some time passes, the platform risks being demoted to tier 3

If you look at Tier 1 requirements:

> Problems on these platforms can block updates for as long as necessary to resolve the issue.

It's clear that it's 1), but there's unfortunately no time specified.

- Curl added a SystemConfiguration dependency for NAT64 support, which introduced a reference loop, requiring a downstream patch to work around.
- This blocked updating to more recent curl versions for most of the last release cycle.
- https://github.com/NixOS/nixpkgs/pull/124502#issuecomment-850834981
- Enabling brotli support by default in curl broke the Darwin stdenv, which draws in a great number of packages.
Copy link
Member

Looking at those changes it seems the original author forgot to include references to darwin.

I don't understand how this affects the tiers?

Copy link
Member

I would have included it had I known about it, but because the Darwin stdenv is very different from Linux's and ofborg was, once again, too slow, it was forgotten.

Copy link
Member

See all my other responses about ofborg/hydra situation.

@piegamesde piegamesde changed the title from "[RFC 0112] Demote Darwin support to Tier 3" to "[RFC 0112] Demote x86_64-darwin support to Tier 3" Oct 29, 2021
# Motivation
[motivation]: #motivation

Darwin is constantly adding additional maintainer burden. Especially, it does not live up to the requirements:
Copy link
Member

Maybe another data point from my experience maintaining haskellPackages:

  • Hydra has been unreliable for getting darwin builds done. This has been a major pain point and seriously limited our ability to finish haskell-updates rotations in a sensible time frame. As a result we have decided to ignore queued darwin builds when merging haskell-updates. This is, however, only possible because nothing in haskellPackages relates to a darwin channel blocker, so this “just” slows down trunk evaluations building. For staging-next darwin has been the problem slowing down rotations.
  • Darwin is not just another platform: Maybe a bit obvious, but I want to point out that portability to darwin is much worse than for other Linux architectures: Upstream support is usually worse, it's harder to get help from upstream and the actual portability issues are often non-trivial to solve.

Copy link
Member

@vcunat vcunat Oct 29, 2021

Some recent staging-next rotations also decided not to wait on x86_64-darwin. And it looks imminent for the current one, too.

Copy link
Contributor

For the 21.05 release, I decided to move forward when darwin rebuilds were ~6k in staging-next jobset, as it was mostly haskell packages. I'm not sure how many nix-darwin-haskell users we have, but having to lag behind unstable for a few days seems reasonable when it causes risk to release dates.

Copy link
Member

Maybe another data point from my experience maintaining haskellPackages:

  • Hydra has been unreliable for getting darwin builds done. This has been a major pain point and seriously limited our ability to finish haskell-updates rotations in a sensible time frame. As a result we have decided to ignore queued darwin builds when merging haskell-updates. This is, however, only possible because nothing in haskellPackages relates to a darwin channel blocker, so this “just” slows down trunk evaluations building. For staging-next darwin has been the problem slowing down rotations.

I've noted in other comments what's going on with Hydra/ofborg and darwin workers, could we aim to have one thread for that?

  • Darwin is not just another platform: Maybe a bit obvious, but I want to point out that portability to darwin is much worse than for other Linux architectures: Upstream support is usually worse, it's harder to get help from upstream and the actual portability issues are often non-trivial to solve.

The reason we care so much about macOS is that it's so widely used among the developers - depending on what statistic you take a look at, it's somewhere around 45% of market share, barely a few percent after Linux.

It's a lot harder to support, but that shouldn't be the metric of itself, but rather for how long it's allowed to Tier 1 platforms.

Copy link
Member

I've noted in other comments what's going on with Hydra/ofborg and darwin workers, could we aim to have one thread for that?

I'll reply there, sure. However my comment was more about how this impacts development and maintenance in nixpkgs.

The reason we care so much about macOS is that it's so widely used among the developers - depending on what statistic you take a look at, it's somewhere around 45% of market share, barely a few percent after Linux.

I find it quite instructive that this comment reads like a sales pitch: While I do subscribe to nixpkgs' emerging “build all the things” philosophy and would love to make nixpkgs available to every conceivable user, it is not my primary concern. The primary concern needs to be the ability to maintain whatever we support in a sustainable fashion. In my experience this is not the case and I don't want to burden contributors that don't use darwin with the extra work involved, like having to invest a full day into fixing a darwin regression introduced by a change trivial to implement for Linux.

Also I would like to ask you who the “we” you are referring to is. I think this RFC demonstrates how unenthusiastic some core contributors are about supporting darwin as a Tier 2 platform.

It's a lot harder to support, but that shouldn't be the metric of itself, but rather for how long it's allowed to Tier 1 platforms.

Not sure I quite understand what you mean by the second part of this sentence, but I do think it is the most important metric as outlined above. nixpkgs is an open source project, so everything it is should be dictated by what its contributors want and can pull off.

Copy link

Darwin is not just another platform: Maybe a bit obvious, but I want to point out that portability to darwin is much worse than for other Linux architectures: Upstream support is usually worse, it's harder to get help from upstream and the actual portability issues are often non-trivial to solve.

Worth also noting that our darwin platform is unusual even to upstreams which support Linux + macOS - they typically expect XCode + BSD coreutils + Apple Clang, while we use "fake" xcbuild, GNU coreutils, and mainline LLVM Clang. So the portability issues are nontrivial even when macOS is ostensibly supported upstream.

Copy link
Contributor

At the risk of further diverging this tangent...

I think linux users do actually gain something from nixpkgs' darwin support which we tend to ignore - much better clang support almost across the board.

Copy link
Member

I've noted in other comments what's going on with Hydra/ofborg and darwin workers, could we aim to have one thread for that?

I'll reply there, sure. However my comment was more about how this impacts development and maintenance in nixpkgs.

The reason we care so much about macOS is that it's so widely used among the developers - depending on what statistic you take a look at, it's somewhere around 45% of market share, barely a few percent after Linux.

I find it quite instructive that this comment reads like a sales pitch: While I do subscribe to nixpkgs' emerging “build all the things” philosophy and would love to make nixpkgs available to every conceivable user, it is not my primary concern. The primary concern needs to be the ability to maintain whatever we support in a sustainable fashion. In my experience this is not the case and I don't want to burden contributors that don't use darwin with the extra work involved, like having to invest a full day into fixing a darwin regression introduced by a change trivial to implement for Linux.

I don't think anyone should be forced into fixing darwin breakages. We should organize better to have macOS contributors around that help out, but as part of tier 2 darwin has a limited time to response (which is unfortunately unspecified).

My comment wasn't meant as a sales pitch, but rather to build empathy to our users and why it's important that we try to fix things rather than make things worse (which is what this tier demotion would do).

Also I would like to ask you who the “we” you are referring to is. I think this RFC demonstrates how unenthusiastic some core contributors are about supporting darwin as a Tier 2 platform.

As I've said, no one is forced to contribute darwin support. What is happening is a misunderstanding that someone is expected to fix their change to build on darwin. Tier 2 allows merging changes if there are no darwin fixes after a certain time (that time is yet to be determined, but that's a topic for the tiers RFC).

It's a lot harder to support, but that shouldn't be the metric of itself, but rather for how long it's allowed to Tier 1 platforms.

Not sure I quite understand what you mean by the second part of this sentence, but I do think it is the most important metric as outlined above. nixpkgs is an open source project, so everything it is should be dictated by what its contributors want and can pull off.

100%.

[design]: #detailed-design

- Together with this RFC, `0046-platform-support-tiers.md` will be updated accordingly (simply move `x86_64-darwin` down one section).
- Whatever needs to be done to make Darwin not block any channels anymore (TODO).
Copy link
Member

@vcunat vcunat Oct 29, 2021

Channels
  • our Tier 3 says "No channel-blocking jobs on Hydra."
  • the NixOS channels never had anything to do with darwin at all, and those are to be used by (almost?) every *-linux user
  • we have some darwin-specific channels (e.g. nixpkgs-21.05-darwin), and there's no motivation to remove them; combination with Tier 3 wording would be weird, but that's a detail we might amend somehow
  • the nixpkgs-unstable channel... is just weird. I consider it mainly useful for x86_64-darwin. Even if you don't use NixOS, I believe the NixOS tests are a useful gating that discover also problems applicable to non-NixOS Linux usage.

Copy link
Member

About arch-specific channels: I wrote this sentence in #46 badly, sorry. The context is impact on the other development, so there is no justification in RFC to restrict configuration of channels not used for any higher-tier architecture.

About nixpkgs-unstable: I think there are cases where some annoying issues with boot process break NixOS tests and some non-NixOS Nix-on-GNU/Linux users switch to nixpkgs channels? Unsure, as I usually use git checkouts.

Copy link
Member Author

How could that line be improved then?

Copy link
Member

The line in the RFC? Maybe «No jobs on Hydra block any channel also used by a higher-tier platform.» would be an improvement and should not get too many objections. But maybe it can be improved even further.

[design]: #detailed-design

- Together with this RFC, `0046-platform-support-tiers.md` will be updated accordingly (simply move `x86_64-darwin` down one section).
- Whatever needs to be done to make Darwin not block any channels anymore (TODO).
Copy link
Contributor

Something I'm missing is a list of concrete requirements to reinstate x86_64-darwin to tier 2. Considering there is enough of a burden to demote the platform to tier 3, I want to avoid unclear pushback when efforts are under way towards reinstatement. Maintainers have reasons to be wary but "No extra support burden," isn't a realistic goal. Can we quantify what would need to happen?

As I understand it, there are two sets of problems. One can only be resolved with more hardware and people to manage Darwin infra:

  • "A lot of packages built by Hydra, full ofBorg support."
  • "Some ordinary packages are channel blockers on Hydra."

This is a problem the community of Darwin users at large can't really help with AFAICT, but maybe I'm mistaken?

The other has to do with the number of supported packages and seems like something Darwin users can help out with:

  • "Most packages work, credible ambition to reach Tier 1 coverage at some point."

Looking at Hydra, I put the number of packages available on Darwin (both architectures) at around 28k (from one of my own jobsets); for Linux it looks like about 42k (filter on x86_64-linux for staging-next). That's roughly two thirds (~67%), so I'm not sure why tier 1 support is considered far out of reach based on this. (Anecdotally, as a Darwin user, package availability hasn't been much of an issue for me.)
Is there a specific number of packages we can aim to support or have I misunderstood and this point isn't actually about package availability?
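
As a quick sanity check on the figures quoted above, here is a minimal Python sketch. The counts are the rough ones from this comment (not exact Hydra figures), and the function name `coverage` is purely illustrative. For these counts the ratio comes out at roughly two thirds.

```python
def coverage(platform_count: int, reference_count: int) -> float:
    """Share of the reference platform's package count that the platform reaches."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return platform_count / reference_count


# Rough counts quoted in the comment above (assumed, not exact Hydra figures).
darwin_packages = 28_000
linux_packages = 42_000

ratio = coverage(darwin_packages, linux_packages)
print(f"Darwin/Linux package ratio: {ratio:.0%}")
```

Any concrete target ("a specific number of packages") could be compared against this ratio in the same way.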

Copy link
Member Author

Something I'm missing is a list of concrete requirements to reinstate x86_64-darwin to tier 2. […] Can we quantify what would need to happen?

The argument is "Darwin does not meet the Tier 2 requirements right now", so the requirements to go back to Tier 2 are laid out in RFC 46.

Is there a specific number of packages we can aim to support or have I misunderstood and this point isn't actually about package availability?

I think there are two aspects to the "most packages work". The one is the user perspective, i.e. how many packages are available in the repository and not marked as broken. I'll trust your judgement on that. But the other one is the developer perspective, i.e. how often do packages break in Darwin-specific ways. Which goes back to your point mentioned above:

Maintainers have reasons to be wary but "No extra support burden," isn't a realistic goal.

  • There seems to be a general consensus that the current extra support burden is "too much"
  • Note that due to the nature of the platform (proprietary, further away from Linux, etc.), expect the amount of friction developers will be accepting to put up with to be lower than for example with aarch64-linux.

Copy link
Member

I think this comes back to defining a set of e.g. 100 core packages that should be maintained at all times, in addition to the tiers. Tier 1 and 2 should quickly fix any breakage of the core packages.

Copy link
Contributor

I'm asking for clarification of "Darwin does not meet the Tier 2 requirements." RFC #46 isn't very normative and I don't believe that was even the intent behind it.

These are the tier 2 requirements, as I read RFC 46:

  • A lot of packages built by Hydra, full ofBorg support.
  • Most packages work, credible ambition to reach Tier 1 coverage at some point.

A lot of packages built by Hydra,

Can we put a number on this? Say, 75% of x86_64-Linux packages?

full ofBorg support.

What does this mean? ofBorg supports Darwin builds, so is that not enough? Can we put a time on this, like within 24h? Sidenote: can we change maintainer policy, temporarily, to avoid having patches applied across all platforms, excluding Darwin if the patch isn't needed there and would cause a stdenv rebuild, so as to avoid overpowering the CI capacity we do have?

Most packages work,

Can we put a number on this, like "less than 10% of packages are marked broken?"

credible ambition to reach Tier 1 coverage at some point.

How can we, as Darwin maintainers, demonstrate whether there is "credible ambition" to reach tier 1 and how far in the future can "some point" be?
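
To make the questions above concrete, here is a minimal Python sketch of what a quantified check could look like. The thresholds (75% coverage, under 10% marked broken, ofBorg results within 24h) are only the numbers floated as questions in this comment, not agreed policy, and every name in the code (`PlatformStats`, `unmet_tier2_criteria`, the field names) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlatformStats:
    built_packages: int          # packages Hydra builds for the platform
    reference_packages: int      # packages Hydra builds for x86_64-linux
    broken_packages: int         # packages marked broken on the platform
    total_packages: int          # packages that list the platform at all
    ofborg_latency_hours: float  # typical time until ofBorg reports a result


# Thresholds floated as questions in this thread; none of them is agreed policy.
MIN_BUILT_SHARE = 0.75
MAX_BROKEN_SHARE = 0.10
MAX_OFBORG_LATENCY_HOURS = 24.0


def unmet_tier2_criteria(stats: PlatformStats) -> List[str]:
    """Return the names of the (hypothetical) criteria the platform does not meet."""
    unmet = []
    if stats.built_packages / stats.reference_packages < MIN_BUILT_SHARE:
        unmet.append("hydra-coverage")
    if stats.broken_packages / stats.total_packages >= MAX_BROKEN_SHARE:
        unmet.append("broken-share")
    if stats.ofborg_latency_hours > MAX_OFBORG_LATENCY_HOURS:
        unmet.append("ofborg-latency")
    return unmet


# Made-up numbers, purely to exercise the function.
example = PlatformStats(
    built_packages=28_000,
    reference_packages=42_000,
    broken_packages=2_000,
    total_packages=30_000,
    ofborg_latency_hours=36.0,
)
print(unmet_tier2_criteria(example))
```

The "credible ambition" criterion is deliberately left out of the sketch, since it is not obvious how it could be reduced to a number.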


I agree the current burden indeed seems like too much, only the core maintainers can really judge this and I don't see many voices in opposition of the sentiment. That's why I'm not arguing against demoting Darwin (though I do feel like grouping x86_64-darwin with aarch64-darwin feels overly harsh). What I am asking for is a clear way back to tier 2. If any maintainer can just shoot down attempts to reinstate Darwin as a tier 2 platform on the basis of the negative sentiment already incurred by the current situation with unclear statements like "it doesn't meet RFC 46 requirements," then how can we be expected to work towards this and not get demotivated?

It's very unfortunate that the crux of this issue (CI/ofBorg) is precisely something the community of Darwin users is relatively powerless to do much about.

Copy link
Member

@7c6f434c 7c6f434c Oct 30, 2021

I'm asking for clarification of "Darwin does not meet the Tier 2 requirements." RFC #46 isn't very normative and I don't believe that was even the intent behind it.

It's very unfortunate that the crux of this issue (CI/ofBorg) is precisely something the community of Darwin users is relatively powerless to do much about.

Unfortunately, these two things are in fact related: as Tier 2 includes, among other things, commitments around maintaining the capacity for Hydra/OfBorg, Tier 2 and Tier 1 are not questions of a purely technical status review, but also of organisational coordination; RFC 46 intentionally did not try to predict how access and, if applicable, financial issues will be negotiated…

(Even Tier 3 description acknowledges a possibility of similar coordination risks, but so far we have avoided those as cross toolchains are not that heavy compared to the entirety of native Tier 1 package set)

Copy link
Member Author

though I do feel like grouping x86_64-darwin with aarch64-darwin feels overly harsh

Yeah, I'm a bit sorry about that. When I wrote the RFC I still thought aarch64-darwin was in Tier 7 lol. The problem there seems to be that hydra building packages is bound to the support tier right now. We could change that and move aarch64-darwin to Tier 4, or create a new tier in between or create new rules or something. But in the end I had to deem this out of scope for this specific RFC.

Generally, this discussion has shown us a lot of flaws in our current support tier handling. I would be in favor of an RFC that improves that.

It's very unfortunate that the crux of this issue (CI/ofBorg) is precisely something the community of Darwin users is relatively powerless to do much about.

Yes, it really is. I hope that at the very least, this whole discussion gives us new energy to bring these issues forward.

@toonn toonn mentioned this pull request Oct 29, 2021
@@ -0,0 +1,91 @@
---
Copy link
Member

I wonder if #88 could help, by leading to a workflow «these things cannot be fixed quickly, so we go forward on Linux — then we pin old versions for Darwin»

Copy link
Member

(By the way, #88 has been accepted and merged in the meantime)

@Ericson2314
Copy link
Member

Yeah I think we shouldn't do this just yet. It's nice to separate out what's a macOS issue vs what's a CPU ISA issue. We lose the ability to troubleshoot that at a glance with this!

@jonringer
Copy link
Contributor

jonringer commented Oct 30, 2021

It's nice to separate out what's a macOS issue vs what's a CPU ISA issue.

Another take on this is: What if macOS was generally more stable, and linux was the platform which lagged behind in successful builds and jobs? I think it's reasonable to have a lower standard of polish if the contributor base is smaller.

For the 20.09 and 21.05 release, the main reason to lengthen staging-next cycles was due to queued darwin builds. I think it's awkward to try and satisfy both platforms; but we currently have the NixOS release being burdened with an unrelated platform.

I'm not in favor of getting rid of darwin support, I think it's a distinct strength of nix that we are able to extend the usability of nixpkgs to more than one platform. But I think it's realistic to pull back our expectations until the contributor activity matches.

We lose the ability to troubleshoot that at a glance with this!

Unless I missed something, there won't be removal of darwin jobsets. All the tooling will still be available. Just progress will not be hindered by other platforms.

Overall this feels similar in spirit to #88, which is trying to unburden progress of changes.

@Ericson2314
Copy link
Member

@jonringer is the problem total build capacity or critical path length?

@jonringer
Copy link
Contributor

@jonringer is the problem total build capacity or critical path length?

for the 21.05 release, I think it was both. It would always seem like the last ~6k builds were haskellPackages.

Then again, I don't have any concrete supporting evidence.

My main concern is still that when there is a failing build, there are very few people to call upon. We did have one individual (name is escaping me) who helped with a few hundred mac builds during the 21.05 ZHF, but we really need that level of activity to be consistent across multiple contributors for me to feel good about darwin being a blocking platform.

@Ericson2314
Copy link
Member

Fair enough. I hope that CA derivations will help expand capacity a lot with this sort of thing, but until then, yes, fighting more fires while understaffed, even without worse critical paths, is hard.

@Mic92 Mic92 added the status: open for nominations Open for shepherding team nominations label Nov 3, 2021
start-date: 2021-10-27
author: piegames
co-authors: many
shepherd-team: (names, to be nominated and accepted by RFC steering committee)
Copy link
Member Author

The RFC is now open for nominations! ⬇️

Copy link
Member

@mweinelt mweinelt Nov 3, 2021

I'd like to nominate @risicle, who is very aware of the issues and pain points around Darwin in nixpkgs.

Copy link
Contributor

I think I'd be very conflicted.

As much as part of me wants to cast it into the pits of hell, I've come to the realization that what with homebrew coming to linux it's becoming a squash-or-be-squashed situation. Broad strokes here, but either the linux crowd present the macos developer community with a more palatable alternative or linux users will find themselves expected to use homebrew. I've already encountered situations joining teams where all the instructions and tooling for getting a development environment up and running expect homebrew, and it's only something that's going to grow. So my interest would be in how we can grow the number of darwin developers instead of demoting darwin.

That said, I think this is going to become academic because Apple will likely make macos an unworkable platform within a release or two.

Copy link
Member Author

I'd like to nominate @andir

This comment was marked as resolved.

Copy link
Member Author

I've seen this misconception a few times now, so I'd like to clarify: You do not need to be in favor of the proposed changes in order to shepherd this RFC. Ideally, the set of shepherds would be as representative for the different positions and opinions voiced during the discussion as possible.

If the shepherds decide that the RFC needs an overhaul or rewrite, or that we should try a different approach than the one proposed here, then this is totally fine.

Copy link
Member

I am happy to accept the nomination and participate in the shepherding of this RFC, in whatever direction this moves the discussion and, eventually, towards an actionable RFC. My position on this is twofold: I'm just as much in favor of this RFC (primarily for the discussion's sake), but I see how Darwin has a special role. My primary objective would be to find a better formal way to deal with platform incompatibilities and perhaps a sunsetting plan for x86_64-darwin as the end of hardware sales is near.

Copy link

@dhess dhess Nov 18, 2021

Is x86-linux a Tier 1 platform? Ahh, I see that it is not: https://github.com/NixOS/rfcs/blob/master/rfcs/0046-platform-support-tiers.md#tier-3-1

I was asking rhetorically, in any case. There are literally millions of x86_64-darwin Macs in the world and given how long people hang on to laptop and desktop computers these days, there will be for long after Apple stops selling them. So long as Apple supports Rosetta 2, I don't think it makes sense to drop support for x86_64-darwin. (Perhaps that is what you meant by, "sunsetting plan.")

Copy link
Member

@dhess i686-linux is not strictly speaking a Tier-1 platform but large parts of it are effectively treated at almost Tier-1 standards because they are dependencies of Wine build for the Tier-1 x86_64-linux platform.

Copy link
Member Author

It's gotten a bit quiet in here. But we are still looking for more shepherds in order to move forward.

- The existing testing infrastructure will remain unchanged. Tests will continue to be run to the extent the infrastructure can provide.

# Drawbacks
[drawbacks]: #drawbacks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might cause a bit of a feedback loop:

  1. More darwin breakages introduced.
  2. Bigger gap between capacity of darwin contributors and capacity requirements of darwin breakages.
  3. Quality of darwin support lowers.
  4. Influx of new darwin users decreases / outflux of darwin users increases.
  5. Influx of new darwin contributors decreases / outflux of darwin contributors increases.
  6. Go to 2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darwin is currently the only non-linux platform supported in tier 1 and 2. It is likely that by demoting it support for all non-linux platforms will suffer. That is partly by design: a lot of the pain points raised in this RFC are related to multiplatform development workflows, so having to support only a single platform will improve all those points, but it's worth highlighting this consequence explicitly.

Copy link
Member Author

This might cause a bit of a feedback loop

I agree with the first three points, but I expect that the difference for end users will be small enough that 4. won't happen. Especially because Nix on Darwin is always a gradual opt-in instead of being a total buy-in like NixOS.

It is likely that by demoting it support for all non-linux platforms will suffer.

Which non-Linux platforms do you have in mind and what does Darwin support do for them? Scrolling through the RFC 46 platform list, I see some embedded and mingw targets in Tier 4 which are hardly related to Darwin, and all others are unsupported (Tier 7).

Copy link
Member

NetBSD situation has apparently improved since, and I guess FreeBSD/NetBSD support would benefit from better clang support, but maybe Firefox + musl-Linux would force us to keep up anyway?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which non-Linux platforms do you have in mind and what does Darwin support do for them?

I checked the supported platforms and I was surprised at finding out what they were, but I didn't have any specific platform in mind, my point is not that darwin support would be helpful to other platforms because it has something in common with them, it's that it stops us from assuming linux is the only platform we need to support.

In particular note this part of rfc 46:

Tier 3:
Completely platform-specific fixes are expected to be rare and non-intrusive.

If we consider that only linux will be in tiers 1 and 2, that might result in any fix that requires a big change (because the existing code assumes only linux) being rejected as too intrusive.

Maybe supporting only linux is what we'd prefer anyway, but it would be better to make that explicit then.

Copy link
Member

The word «completely» is meaningful there — the next paragraph reads

General cleanups of non-standard assumptions (e.g. «everything that is not x86 is a kind of ARM» or «malloc(0) behaviour is a reliable indicator of other malloc features») useful for these platforms are welcome.

So currently-musl-only fixes have been justified before as «unconditionally assuming this or that is a bug, and enough of musl work has been shown usable that our fix should be accepted».

Specifically Darwin is of course more at risk, as it often needs source code patches (not just extra inputs or whatever) that are specific not even simply to macOS but also to which trade-offs have to be made for Nixpkgs to work on macOS.

- Hydra builders are greatly understaffed.
- If Apple follows through on [their announcement to discontinue](https://www.businessinsider.com/apple-macbook-pro-discontinued) x86-based devices, the situation won't improve due to a physical lack of hardware.
- "Most packages work, credible ambition to reach Tier 1 coverage at some point."
- Tier 1 support is far out of reach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a number we need to hit here? The RFC uses most, many, almost everything, and those adjectives match my personal experience: I haven't had any issue with getting the packages I needed on darwin, sometimes the package was marked as broken on darwin but the fix was very simple, sometimes it was marked as linux only but it worked without any changes on darwin.

If we have a specific % of packages that is needed to address this, we might be able to achieve it by going for those low hanging fruits.

Copy link
Member Author

Do we have a number we need to hit here?

Again, we haven't. You can consider this a flaw in RFC 46. More generally though, Support Tier 1 is much more than simply the number of supported packages. Again, the phrasing of RFC 46 is misleading: these should have been two separate independent points.

sometimes it was marked as linux only but it worked without any changes on darwin

Then the package is a testimony of Darwin having been broken at one point in time in its history.

Copy link
Member

Then the package is a testimony of Darwin having been broken at one point in time in its history.

This is probably an overstatement, hard to be sure whether the expression author said «looks like it could be Linux-specific, it's not that I can cheaply check if it works on Darwin, whatever»

Copy link
Member Author

As far as I can tell, many nixpkgs developers don't care about Darwin to the point of not even thinking about explicitly disabling Darwin support. I consider the following far more likely: "CI fails for Darwin and I don't see an easy way to fix this, whatever".

Copy link
Member

Well, as Nixpkgs is consistently bad at using tristate «yes/no/don't know» instead of boolean, I would expect just as many people say «so, where I know it works? Linux? let's put that». Knowing the nomenclature beyond Linux requires care already…

(It's usually about platforms, not broken, so you list positive claims there)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes the package was marked as broken on darwin but the fix was very simple, sometimes it was marked as linux only but it worked without any changes on darwin

I meant to specify both cases:

  1. `meta.broken` was true on darwin; that definitely points to it having been broken on darwin at some point.
  2. `platforms` was set to linux, probably because whoever added the package didn't test it on any other platform.

I'm definitely guilty of the same thing: I set `platforms` to `platforms.linux ++ platforms.darwin` when I add new packages, even though they might support other platforms as well. I just don't know how to test those and don't want to invest the effort.
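
For reference, the two cases look roughly like this in a package expression. This is only a minimal sketch: the package name, version, and build steps are placeholders and not taken from any real nixpkgs package.

```nix
# Hypothetical package expression; name, version and build steps are placeholders.
{ lib, stdenv }:

stdenv.mkDerivation {
  pname = "example";
  version = "0.1";

  # No real source here; the sketch is only about the meta attributes.
  dontUnpack = true;
  installPhase = "mkdir -p $out";

  meta = {
    # Positive claim: the platforms the author is willing to vouch for.
    # Anything not listed is "don't know" rather than "known not to work".
    platforms = lib.platforms.linux ++ lib.platforms.darwin;

    # Negative claim: the build is known to fail on darwin right now.
    broken = stdenv.hostPlatform.isDarwin;
  };
}
```

The first attribute only records where the package is expected to work; the second is the one that actually testifies to a known failure.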

There are also many packages that are transitively broken on darwin: packages where eval fails because they have a nixpkgs dependency on e.g. systemd or other software that doesn't support darwin, even though the package itself could compile just fine on darwin if that dependency were removed there.
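
A common way to avoid this kind of transitive breakage is to make the Linux-only dependency conditional. The following is a sketch under assumptions: the package itself is hypothetical, and real software would usually also need a matching configure flag to disable the systemd integration.

```nix
# Hypothetical package expression; name, version and build steps are placeholders.
{ lib, stdenv, systemd }:

stdenv.mkDerivation {
  pname = "example-daemon";
  version = "0.1";

  dontUnpack = true;
  installPhase = "mkdir -p $out";

  # Only pull in systemd where it exists. On darwin the list is empty, so the
  # Linux-only dependency is never evaluated.
  buildInputs = lib.optionals stdenv.hostPlatform.isLinux [ systemd ];

  meta.platforms = lib.platforms.unix;
}
```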


## Exemplary collection of issues with Darwin

The issues that are constantly encountered in relation to the Darwin platform have a genuine negative impact on the `nixpkgs` development experience. People are forced to deal with these problems even though they have no hardware to test on, lack the proper expertise, and are reluctant when it comes to investing additional time and energy to patch for the failing platform. Demoting Darwin to Tier 3 will allow more developers to refocus their attention on better supported platforms, care less about Darwin issues, and push that burden onto the Darwin maintainers.

I think this is the major issue with darwin support in nixpkgs, and it's not captured in how the current support tiers in rfc 46 are formulated.

The support tiers are currently summarised as

  1. If the platform achieves the required tooling and package coverage
  2. then it can have the impact on nixpkgs

but in the case of darwin the impact on nixpkgs as a whole is so big, it sounds like we don't want to give it tier 2 impact, regardless of whether it actually achieves the current tier 2 tooling and package coverage.

I think I agree with that, but if that's the case, let's update the tiers so that it is clear what the requirements for each tier are. For example if we think not being able to spin up a VM for the platform on all major OSes is a problem, let's list that out as a requirement.

This might have been brought up in rfc 46 too because I see:

It is recommended to provide a derivation to test the software on this platform
(e.g. a Qemu-based derivation with all the necessary scripts).
As it is impossible to provide a legal testing setup for a Tier-2 platform
(macOS), this requirement is not strictly mandatory for Tier-3 tooling.

Please correct me if I'm wrong, I was not involved in rfc 46 so I might be missing some context there and misinterpreted it.

Member

@7c6f434c 7c6f434c Nov 4, 2021

RFC 46 got deeply rewritten in the process, so I did try to limit its scope, as well as the change it brought to Nixpkgs (and it says explicitly that movement between Tiers 1-2-3 is always a policy discussion not a purely technical one, that's also for a reason).

I support a public resolution of «if update breaks Darwin, follow #88 with the expectation that the fix will be prepared or not by people personally using Nix on macOS, but the timer starts from the mention of the relevant Darwin-specific team», modulo a low-effort way to get the «measure the impact» data for Darwin with zero knowledge or care about Darwin (well, if Darwin capacity at OfBorg becomes truly plentiful — beyond simply reasonably sufficient for current use — maybe poking it with a subset of rebuild report would become an OK option).

I believe that would mostly resolve the issue described in the specific paragraph (assuming the CI bottlenecks get resolved). It would also never ask me to think about Darwin as anything but a blackbox, which is good enough for me and hopefully for many others.

@domenkozar
Member

domenkozar commented Nov 4, 2021

@jonringer is the problem total build capacity or critical path length?

for the 21.05 release, I think it was both. It would always seem like the last ~6k builds were haskellPackages.

Then again, I don't have any concrete supporting evidence.

My main concern is still that when there is a failing build, there are very few people to call upon. We did have one individual (name is escaping me) who helped with a few hundred mac builds during the 21.05 ZHF, but we really need that level of activity to be consistent across multiple contributors for me to feel good about darwin being a blocking platform.

Could you clarify if the lack of people or lack of computing power was the problem for 21.05? The above reads to me as both.

Reading the post mortem on 21.05, I understood it was a lack of computing power, so a few weeks later I got us 5 macOS stadium machines to help mitigate that (they have yet to be added to our infra pool).

@jonringer
Contributor

Could you clarify if the lack of people or lack of computing power was the problem for 21.05?

For the release, lack of computing power. The x86_64-darwin build queue would be blocking the ability to move forward with more staging-next iterations. I even made a comment on a staging-next PR about darwin compute, @vcunat also was concerned. Another iteration comment

For nixpkgs in general, there's still a lack of people stabilizing the darwin job sets. @stephank really stepped up in the 21.05 release and tackled a lot of failing darwin builds. My issue is that darwin gets much less love on average. @r-burns, @risicle, @marsam, and a few others do some great work, but I would really like to see both more contributors and more activity.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/a-call-for-darwin-maintainers/15985/1

@domenkozar
Member

domenkozar commented Nov 9, 2021

Reading through the whole RFC, there are a few issues to be addressed in order to fix the motivation section of this RFC.

a) It's not clear what RFC 46 means in practice.

There are really two issues here.

The first one is that the limits aren't quantified, for example Tier 2 says:

should be reported to the platform maintainers with a reasonable amount of time provided for fixing the issue.

We need to specify what a reasonable time is and what happens once that time has passed.

The whole RFC lacks quantification; for example, what does "full ofborg support" mean?

If we don't specify those, the only alternative is to vote or express who has stronger feelings, which I'd say is unneeded politics.

b) Expand the darwin maintainer team

I've made a call for darwin maintainers: https://discourse.nixos.org/t/a-call-for-darwin-maintainers/15985

c) Darwin builders

Follow NixOS/infra#179 for status updates

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-macos-monthly/12330/12

@domenkozar
Member

domenkozar commented Nov 16, 2021

A week later:

b) Expand the darwin maintainer team

I've made a call for darwin maintainers: https://discourse.nixos.org/t/a-call-for-darwin-maintainers/15985

We went from 7 members to 32.

c) Darwin builders

Follow NixOS/nixos-org-configurations#179 for status updates

@grahamc & @cole-h added 5 macOS machines to ofborg, which now also supports aarch64-darwin!

I don't have the time to propose changes to RFC 46; if someone wants to do that, it would be great!

@spacekookie
Member

This RFC is open for nominations btw! Feel free to nominate anyone you think would be a good shepherd for this RFC (including yourself)

@piegamesde
Member Author

@spacekookie It seems like you missed the discussion thread for the nominations: #112 (comment) it's linked in the original post, but I'll edit it to make it more visible.

@spacekookie
Member

@spacekookie It seems like you missed the discussion thread for the nominations: #112 (comment) it's linked in the original post, but I'll edit it to make it more visible.

I did see that thread, but so far no-one has accepted their nomination (unless I missed other comments?) The comment I posted here was to remind people, because so far there have not been a lot of accepted nominations, and we do close RFCs if they don't get enough interest/ shepherd nominations.

I am however confused why I thought that you had been nominated as a shepherd, when you are clearly the author :) I'll go edit NixOS/rfc-steering-committee#77 to reflect that

@Mic92
Member

Mic92 commented Dec 1, 2021

@domenkozar would you like to be shepherd in this RFC?

@domenkozar
Member

Thanks, but I'd rather spend my energy improving macOS support.

@piegamesde
Member Author

I think it is safe to say that this RFC is rather dead now, since not enough people are willing to shepherd it. There is no point in keeping it open any longer. I think the RFC has still had some positive impact on the situation, and I'd like to thank you all for the discussion.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/darwin-again/29331/1

philiptaron added a commit to philiptaron/nixpkgs that referenced this pull request Mar 11, 2024
…x to lib/systems/doubles.nix

This allows flakes which depend on `nixpkgs` to reference this list from
`nixpkgs.lib.systems.doubles.builtOnNixosHydra`, which is a long name but very true.
These are the doubles that are built on [Hydra](https://hydra.nixos.org/)!

See also:
- https://github.com/NixOS/rfcs/blob/master/rfcs/0046-platform-support-tiers.md
- NixOS/rfcs#112 (closed)
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/aux-computer-an-alternative-to-the-nix-ecosystem/44420/47

philiptaron added a commit to philiptaron/nixpkgs that referenced this pull request Jul 26, 2024
…x to lib/systems/doubles.nix

This allows flakes which depend on `nixpkgs` to reference this list from
`nixpkgs.lib.systems.doubles.builtOnNixosHydra`, which is a long name but very true.
These are the doubles that are built on [Hydra](https://hydra.nixos.org/)!

See also:
- https://github.com/NixOS/rfcs/blob/master/rfcs/0046-platform-support-tiers.md
- NixOS/rfcs#112 (closed)
Labels
status: open for nominations Open for shepherding team nominations