register NNlibCUDA as a subpackage of NNlib #300

CarloLucibello · 2021-03-16T16:25:20Z

No description provided.

CarloLucibello · 2021-03-16T16:25:53Z

@JuliaRegistrator register() subdir=lib/NNlibCUDA

JuliaRegistrator · 2021-03-16T16:26:04Z

Registration pull request created: JuliaRegistries/General/32119

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a NNlibCUDA-v0.1.0 -m "<description of version>" ca82fb23928c7ee7d08afb722718cf93be13f81c
git push origin NNlibCUDA-v0.1.0

DilumAluthge · 2021-03-16T17:27:06Z

@JuliaRegistrator register() subdir=lib/NNlibCUDA

JuliaRegistrator · 2021-03-16T17:27:13Z

Registration pull request updated: JuliaRegistries/General/32119

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a NNlibCUDA-v0.1.0 -m "<description of version>" fb5377811b3bd37d87a504d5506cc676826b9648
git push origin NNlibCUDA-v0.1.0

CarloLucibello · 2021-03-16T17:34:25Z

@DilumAluthge there is also failing import NNlibCUDA with CUDA 2.6, while CUDA 3.0.0 not being released yet. I just wanted to start the registration process to make testing against CUDA#master easier, but if that's a problem we can just wait for CUDA 3.0 release

DilumAluthge · 2021-03-16T17:36:52Z

Yeah no worries at all! We can merge it manually in 3 days.

DhairyaLGandhi · 2021-03-16T17:45:31Z

It might make sense to register https://github.com/FluxML/NNlibCUDA.jl instead. We wouldn't have to deal with registering it oddly and it keeps the projects separate in their own little silos. Currently NNlibCUDA doesn't get used by NNlib internally and developing it means getting unintended code alongside.

CarloLucibello · 2021-03-16T18:09:09Z

We wouldn't have to deal with registering it oddly

don't see any oddity here, this subpackage thing is actually quite cool and already well supported by julia's toolings

it keeps the projects separate in their own little silos

That's exactly what we don't want since the two packages are tightly coupled. When we change NNlib we want to make sure everything is fine with NNlibCUDA by running its tests, and sometimes we have to make changes that involve both packages (adding some function requiring also a cuda kernel). Having NNlibCUDA as a subpackage is much safer since it allows combined CI very easily (I already put that in place), makes contributing easier, and relieves a lot of the maintenance burden.

Currently NNlibCUDA doesn't get used by NNlib internally and developing it means getting unintended code alongside.

I don't see what's the problem in also seeing the folder lib/NNlibCUDA when developing, if anyone gets confused by that he probably shouldn't be developing NNlib at all. Actually, as I said above, the possibility to make changes to both packages in a single PR is great.

DhairyaLGandhi · 2021-03-16T18:17:30Z

Not really, since most of the work in NNlib proper is not focused on CuDNN at all, in fact, the code in the two achieve very different goals. You typically either work on the kernels or the plumbing (NNlibCUDA is a bridge to CUDNN), rarely both. Most new kernels shouldn't be written twice anyway. Waiting around for GPU ci when not changing any GPU code is also a thing here, since most nnlib code isn't meant for GPUs.

if anyone gets confused by that he probably shouldn't be developing NNlib at all.

I don't think that's the message we want to send to contributors...

ToucheSir · 2021-03-16T18:58:54Z

I'm a little late to this, but can someone explain why NNlibCUDA exists as a subdir and a separate repo and why there wasn't a consensus made about which one it should be before now? Getting a whiff of some communications breakdown or personal enmity being involved and I don't like it.

DhairyaLGandhi · 2021-03-16T19:10:38Z

I needed something to get testing with CUDA#master, Julia 1.6 and share that work easily and quickly. See also https://github.com/FluxML/Flux.jl/tree/dg/cuda16 which gets flux+ cuda + 1.6 usable.

Either way is fine in the long run as long as we get our stack functional :)

DhairyaLGandhi · 2021-03-16T19:14:04Z

I like to be able to do ] add NNlibCUDA#branch can we do that with subdirs as well?

CarloLucibello · 2021-03-17T03:17:21Z

@ToucheSir Dhairya and I disagree on a lot of stuff, which is of course unfortunate for two people working on the same project. There is nothing about "personal" and "enmity" involved here though.

can someone explain why NNlibCUDA exists as a subdir and a separate repo and why there wasn't a consensus made about which one it should be before now?

The first comment pointing about the possibility of using multi-package repos for this stuff was
#224 (comment)
and then a follow-up at the end of that thread.
Then there was discussion in CUDA.jl's PRs/issues and zulip, then there was #286 and here we are.
So discussion was scattered but there were some entry points to it,
no one at any point before now expressed negative feedback against multi-package repo,
so I volunteered a few hours of my time and did this very boring work that needed to be done
in the best way I could think of. I did make some decisions and I could have communicated more or better along the way,
but I think the process was perfectly reasonable by community-driven project standards.

As someone doing a lot of maintenance here, I know very well that having a separate NNlibCUDA.jl would be an additional maintenance hassle (a finite and non-small fraction of which weighing on me), not a big deal but something I'd rather avoid if possible.

I see a few pros in having the multi-repo in one package approach.
The only potential con I see is

I like to be able to do ] add NNlibCUDA#branch can we do that with subdirs as well?

but probably that actually works.

Finally, as Dhairyia said, we can split out a repo at any point, nothing here is irreversible, but the fact that someone already put the things in a functional state and with a nicely integrated CI (so better situation than before) is something that should be exploited.

Most new kernels shouldn't be written twice anyway. Waiting around for GPU ci when not changing any GPU code is also a thing here, since most nnlib code isn't meant for GPUs.

We have a relevant fraction of code (e.g. activation functions) that works on both gpu and cpu and should be tested on both

GunnarFarneback · 2021-04-01T20:07:12Z

I like to be able to do ] add NNlibCUDA#branch can we do that with subdirs as well?

Yes.

DhairyaLGandhi · 2021-04-01T20:09:54Z

I think setting up ci isn't something new. We also shouldn't have nnlib depending on so many heavy deps like cuda. That makes it very heavy for the rest of the ecosystem to use.

GunnarFarneback · 2021-04-01T20:20:37Z

For what it's worth, having a package in a subdir doesn't imply a dependency. Personally I wouldn't place a subdir package inside another package (rather place them as siblings in the repo) since it means that the subdir package code will be distributed with all copies of the main package, but the dependencies of the subdir package still don't affect the main package at all. Well, unless the subdir package is a dependency of the main package, but then you would get exactly the same effect in a separate repository.

DhairyaLGandhi · 2021-04-01T20:23:22Z

I don't think the cuda version is a dep, so it's good that those are separate

CarloLucibello · 2021-04-02T20:25:40Z

I think setting up ci isn't something new. We also shouldn't have nnlib depending on so many heavy deps like cuda. That makes it very heavy for the rest of the ecosystem to use.

As @GunnarFarneback said, NNlib's doesn't depend on anything CUDA-related, NNlibCUDA is its own package (depending on NNlib).

Here we just have the same scaffolding of KernelAbstractions with CUDAKernels and ROCKernels https://github.com/JuliaGPU/KernelAbstractions.jl/tree/master/lib
which seems nicely scalable, since we may soon have NNlibCUDA and NNlibAMDGPU.

CarloLucibello · 2021-04-10T12:31:12Z

So I´d like to give this package-in-a-subdir a try, there is no problem in pulling out at any moment anyway. Maybe the people from KernelAbstractions @vchuravy @jpsamaroo can tell us whether their experience with this has been positive or not?

DilumAluthge · 2021-04-10T12:42:16Z

there is no problem in pulling out at any moment anyway

As I've mentioned elsewhere, moving the location of a subdirectory package is nontrivial. You have to make sure that the new location of the package includes all of the Git tree SHAs that have already been registered in the General registry.

So I would recommend that you first make a definitive decision on which of the two options you want:

Subdirectory package
Package in its own repo

And then register based on that decision, with the knowledge that switching from option 1 to option 2 requires a nontrivial amount of work.

CarloLucibello · 2021-04-10T12:50:53Z

@DilumAluthge there is something I don´t understand, from the answer to this question, the relocation process seemed quite easy...

DilumAluthge · 2021-04-10T13:09:51Z

Idk, I've never done it myself, but here's one data point: SnoopCompileBot was originally a subdirectory package. At one point, the maintainer wanted to move it to its own repo. The maintainer found the process difficult enough that instead of moving it to its own repo, they just made a new package called CompileBot.

You can probably find that conversation by searching for SnoopCompileBot and CompileBot in PRs to the General registry.

Again, I've never done this personally, but if it was painful enough that one package author gave up and just made a new package, I find that concerning.

I would recommend at least looking over the historical discussion for SnoopCompileBot/CompileBot.

CarloLucibello · 2021-04-10T13:24:36Z

ok, that seems something to consider. Let´s wait for some feedback from @vchuravy and @jpsamaroo

jpsamaroo · 2021-04-10T17:20:09Z

I personally don't think we've had enough time to evaluate the merits of the subpackage approach, since KA 0.6 was only released 11 days ago. I'm personally in favor of the separate package approach, which you can always "promote" to a subpackage later without much difficulty (there are tools to combine git histories of different repos too).

CarloLucibello · 2021-04-12T09:47:17Z

it seems I'm the only one hyped by this subdir approach, let's move everything to the separate repo https://github.com/FluxML/NNlibCUDA.jl then. I'll make a PR removing NNlibCUDA from here soon

maxfreu · 2021-04-21T09:04:43Z

Oh! Unfortunately I didn't see this discussion in time. Now it's probably too late (hopefully not?), but I am very much in favor of keeping things as a single package with sub-packages. I maybe don't overlook the whole picture, but here are two reasons / examples:

I just ported a C cuda kernel to julia into the NNlibCUDA sub-package. Before this was 1 fork, 1 branch, 1 PR. Now it's two each, which is annoying.
This gets even worse further down the road: Consider we have NNlib, NNlibCUDA, NNlibAMD & NNlibKA. Let's suppose I have written two special GPU kernels for CUDA & AMD and now want to merge them into a single one using kernel abstractions. Maybe I even wrote a KA kernel for CPU, so I can remove the core CPU code from NNlib. So I'll have to make 4 forks, 4 PRs etc and each of the packages has its own version. I fear this will be versioning and maintenance hell. You'd have to merge everything more or less at once and tag new versions & compat entries or the packages become incompatible.

Did I get sth completely wrong?

At the danger of riding a dead horse here, but maybe @timholy can share his experiences first hand? Maybe his reasons for moving it out don't apply here.

DhairyaLGandhi · 2021-04-21T09:50:06Z

As the maintainer of this package and others, I understand these concerns, deal with the consequences and think this was the right move. You see:

setting up the development environment is nontrivial esp over subpackages with different or conflicting needs (thus discouraging contributions) (see Add scatter for CUDA support #296)
The disruptions to NNlib itself over something trivial (New package: NNlibCUDA v0.1.0 JuliaRegistries/General#32119 (comment))
That recent NNlib releases contained GPU dependencies (thus saying the setup is prone to errors)(5c2690b#commitcomment-49262320)
its fair to say that this kind of change requires some very strong gains of whether it is worth sacrificing the stability of over 250 packages in the ecosystem.

Remeber too that a design that expects an NNlibX necessitates some amount of code/ logic repetition. It's also fair to say that NNlib devs shouldn't be expected to maintain every single spin-off, thus also making the monorepo difficult to scale and QA.

The GPU backend really does achieve different end goals so it makes sense to be separate. In fact, when the time comes, it would be better to remove the CUDA moniker and have it depend on GPUCompiler only, thus obviating the need for an AMD and KA flavour.

The versioning and following proper semver can be handled like we do with so many dependencies in the ecosystem. Yes it would mean working over multiple repos, but the versioning standard isn't any different for sub packages, so it's the same regardless of where the code lives.

Recently, a faulty NNlib was also the reason behind the breakage of the tutorials that involve plotting, which also includes all the SciML ones, so I would want to ensure some amount of stability, which does require us to separate the concerns of different sections of code.

maxfreu · 2021-04-21T10:21:54Z

Ok, thank you for the write-up. I'm still not convinced, but let's see where it takes us :)

CarloLucibello · 2021-07-11T10:42:56Z

reopening this after a few months of working with NNlibCUDA as a separate package. In the work for gather/scatter we add to ping-pong between the two libraries multiple times, it has been very annoying, so I encourage people to reconsider the option of having NNlibCUDA as a subpackage. Also see #332.

Moreover, according to https://github.com/FluxML/Flux.jl/pull/1566/files we are going to have FluxCUDA, FluxAMDGPU etc.. as subpackages of Flux, so NNlib should reflect that.

DhairyaLGandhi · 2021-07-11T11:48:09Z

No decision has been made on that front. The issue seems to be that there isn't so much a need for the cuda package at all but that the code for gather should have been vetted to work with the Flux counterpart before merging.

JuliaRegistrator mentioned this issue Mar 16, 2021

New package: NNlibCUDA v0.1.0 JuliaRegistries/General#32119

Merged

DilumAluthge mentioned this issue Mar 16, 2021

Add a LICENSE.md file for the NNlibCUDA.jl package #301

Merged

CarloLucibello closed this as completed Apr 12, 2021

DhairyaLGandhi mentioned this issue Apr 14, 2021

Move NNlibCUDA #314

Merged

CarloLucibello mentioned this issue Apr 21, 2021

introduce trilinear upsampling #315

Merged

CarloLucibello mentioned this issue Jul 11, 2021

gather shouldn't fallback to getindex #332

Merged

CarloLucibello changed the title ~~register NNlibCUDA~~ register NNlibCUDA as a subpackage of NNlib Jul 11, 2021

CarloLucibello reopened this Jul 11, 2021

CarloLucibello mentioned this issue Sep 7, 2021

simplify scatter mean #350

Merged

mcabbott mentioned this issue Mar 5, 2022

Simplify softmax, test second derivatives #393

Merged

maxfreu mentioned this issue Jun 2, 2022

partly revert changes to stay compatible w NNlibCUDA #414

Merged

ToucheSir mentioned this issue Jul 22, 2022

Move ctc_loss from Flux to NNlibCUDA FluxML/NNlibCUDA.jl#55

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

register NNlibCUDA as a subpackage of NNlib #300

register NNlibCUDA as a subpackage of NNlib #300

CarloLucibello commented Mar 16, 2021

CarloLucibello commented Mar 16, 2021

JuliaRegistrator commented Mar 16, 2021

DilumAluthge commented Mar 16, 2021

JuliaRegistrator commented Mar 16, 2021

CarloLucibello commented Mar 16, 2021

DilumAluthge commented Mar 16, 2021

DhairyaLGandhi commented Mar 16, 2021

CarloLucibello commented Mar 16, 2021

DhairyaLGandhi commented Mar 16, 2021

ToucheSir commented Mar 16, 2021 •

edited

Loading

DhairyaLGandhi commented Mar 16, 2021 •

edited

Loading

DhairyaLGandhi commented Mar 16, 2021

CarloLucibello commented Mar 17, 2021

GunnarFarneback commented Apr 1, 2021

DhairyaLGandhi commented Apr 1, 2021

GunnarFarneback commented Apr 1, 2021

DhairyaLGandhi commented Apr 1, 2021

CarloLucibello commented Apr 2, 2021

CarloLucibello commented Apr 10, 2021

DilumAluthge commented Apr 10, 2021

CarloLucibello commented Apr 10, 2021

DilumAluthge commented Apr 10, 2021 •

edited

Loading

CarloLucibello commented Apr 10, 2021

jpsamaroo commented Apr 10, 2021

CarloLucibello commented Apr 12, 2021

maxfreu commented Apr 21, 2021

DhairyaLGandhi commented Apr 21, 2021 •

edited

Loading

maxfreu commented Apr 21, 2021

CarloLucibello commented Jul 11, 2021

DhairyaLGandhi commented Jul 11, 2021

register NNlibCUDA as a subpackage of NNlib #300

register NNlibCUDA as a subpackage of NNlib #300

Comments

CarloLucibello commented Mar 16, 2021

CarloLucibello commented Mar 16, 2021

JuliaRegistrator commented Mar 16, 2021

DilumAluthge commented Mar 16, 2021

JuliaRegistrator commented Mar 16, 2021

CarloLucibello commented Mar 16, 2021

DilumAluthge commented Mar 16, 2021

DhairyaLGandhi commented Mar 16, 2021

CarloLucibello commented Mar 16, 2021

DhairyaLGandhi commented Mar 16, 2021

ToucheSir commented Mar 16, 2021 • edited Loading

DhairyaLGandhi commented Mar 16, 2021 • edited Loading

DhairyaLGandhi commented Mar 16, 2021

CarloLucibello commented Mar 17, 2021

GunnarFarneback commented Apr 1, 2021

DhairyaLGandhi commented Apr 1, 2021

GunnarFarneback commented Apr 1, 2021

DhairyaLGandhi commented Apr 1, 2021

CarloLucibello commented Apr 2, 2021

CarloLucibello commented Apr 10, 2021

DilumAluthge commented Apr 10, 2021

CarloLucibello commented Apr 10, 2021

DilumAluthge commented Apr 10, 2021 • edited Loading

CarloLucibello commented Apr 10, 2021

jpsamaroo commented Apr 10, 2021

CarloLucibello commented Apr 12, 2021

maxfreu commented Apr 21, 2021

DhairyaLGandhi commented Apr 21, 2021 • edited Loading

maxfreu commented Apr 21, 2021

CarloLucibello commented Jul 11, 2021

DhairyaLGandhi commented Jul 11, 2021

ToucheSir commented Mar 16, 2021 •

edited

Loading

DhairyaLGandhi commented Mar 16, 2021 •

edited

Loading

DilumAluthge commented Apr 10, 2021 •

edited

Loading

DhairyaLGandhi commented Apr 21, 2021 •

edited

Loading