Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change to pre-generated bindings, add feature for generating at compile-time #21

Merged
merged 30 commits into from
Feb 14, 2024

Conversation

pheki
Copy link
Member

@pheki pheki commented Oct 29, 2023

Changed to use pre-generated bindings by default, with the option of generating bindings at compile-time using the headers at $VITASDK by enabling the bindgen feature.

I tried to keep approximately the same style in build-util. I did not add error handling. To re-generate the bindings you can just run:

$ cargo run -p vitasdk-sys-build-util -- bindgen

Also, one curious side effect, the cargo package (compressed) is now less than half the size!

# Before
Packaged 417 files, 1.5MiB (328.2KiB compressed)
# After
Packaged 17 files, 1.1MiB (123.3KiB compressed)

Ordering

Part of the bindings were being generated in an essentially random order, due "random" HashMap iteration order, affecting mostly things related to the stubs/features. Changing those to BTreeMaps made it predictable (possibly deterministic).

One problem that remained is that when updating the vita-headers submodule, many of the bindings would move due to bindings being generated in inclusion order. For that I created a function to sort the bindings based on the identifier, and related header are still usually close due to naming conventions. Tried to keep the sorting code simple, so I depended on some things already being generated in the correct order and the stability of sort_by_cached_index. I'm fine with changing that.

To Do (Done!)

  • Update update-bindings.yml to use pre-generated bindings. Will probably be very similar as it was on 0.2: old update-bindings.yml
  • Change build.yml builds/checks to not install LLVM/Clang anymore
  • Create a new CI check to check if bindings have been regenerated (just regenerate and check if diff is empty)
  • Update CHANGELOG.md
  • Update docs on both the root and the example to specify that bindgen is only required if the bindgen feature is renamed

These are not necessarily blocking release:

  • Automate updating Cargo.toml's feature list when running cargo run -p vitasdk-sys-build-util -- bindgen
  • Create a new job to check building with the bindgen feature and LLVM installed (maybe we could do this on another PR, when creating actual tests)

@zetanumbers
Copy link
Contributor

Why?

@pheki
Copy link
Member Author

pheki commented Nov 17, 2023

  • Faster build times.
  • Makes it possible to actually review the impact of PRs by having diffs on generated bindings, including the impact of updating dependencies.
  • No need for clang to be installed on the host, which is currently imposed on all transitive dependents
  • Because of the points above, we'll be able to have simpler, faster and more consistent CI.
  • We won't have to impose these limitations to the CI of all transitive dependents, like winit and gilrs.

@pheki pheki marked this pull request as ready for review November 17, 2023 15:12
@zetanumbers
Copy link
Contributor

zetanumbers commented Nov 27, 2023

I will check on my weekend. I want to ask if anyone has 30k bindgen file like this in their projects? Still scared to make this our precedent. But if this is the only way, then fine. Do you think we should consider getting rid of redundant code like combining extern "C" blocks, do not use full paths for crate::ctypes::c_type, doc_auto_cfg (people have some issues with it, maybe not)?

@pheki
Copy link
Member Author

pheki commented Nov 30, 2023

I will check on my weekend.

Thanks!

I want to ask if anyone has 30k bindgen file like this in their projects?

I decided to research a bit on some random projects I thought of. I tried to find big commited plaintext files, but I also searched for big bindgen files as that's what you asked, altough I don't know if that matters all.

As a reference, src/bindings.rs has 1.17 MB.

The commands I used are:

To get the biggest files:

git ls-tree -r -t -l --full-name HEAD | sort -n -k 4 | tail -n 10

To get file count:

git ls-files | wc -l

Repositories:

1. libsqlite3-sys / rusqlite

https://github.com/rusqlite/rusqlite

repo file count: 94

commit count: 2259

.git size: 11.4 MB

Biggest file: 8.9 MB libsqlite3-sys/sqlite3/sqlite3.c, with bonus of a 8.75 MB libsqlite3-sys/sqlcipher/sqlite3.c

Biggest auto-generated by bindgen: 339 kB libsqlite3-sys/sqlite3/bindgen_bundled_version_ext.rs

Total size of the 5 files auto-generated by bindgen: 968 kB

2. linux-raw-sys

https://github.com/sunfishcode/linux-raw-sys

commit count: 246

.git size: 3.5 MB

Biggest file: 645 kB gen/modules/ioctl.h

20 bindgen-generated 100+ kB files src/x86_64/netlink.rs, for many architectures

3. Zulip

https://github.com/zulip/zulip

repo file count: 6593

commit count: 55023

.git size: 453.2 MB

Biggest plaintext file: 977 kB zerver/openapi/zulip.yaml

4. Synapse

https://github.com/matrix-org/synapse

repo file count: 1635

commit count: 23391

.git size: 431 MB

Biggest plaintext file: 375 kB [contrib/grafana/synapse.json]

14 100+ kB files

Conclusion

I don't think there's a very big risk, depending on what you consider a big .git. Rusqlite is in a similar situation while also including some VERY big C files and the .git is at 11.4 MB. The bindgen file should be highly compressable and git is able to store different revisions as deltas. My gut feeling is that we're better off than vita-headers, which has about the same amount of code (1.1 MB) but scattered across multiple files, with lots of definitions moving from one file to another (which means more deltas and I don't know how efficient are git packfiles for that).

Do you think we should consider getting rid of redundant code like combining extern "C" blocks

Mostly yes, if we moved the cfgs to outside of the extern block, removed all doc(cfg()) and merged all extern blocks with the same features I'm guessing we could reduce the raw file size by about 10~30%. A simple implementation is just sorting all of the extern blocks by what's in the CFG, then merging neighbors.

do not use full paths for crate::ctypes::c_type

We can do that, but I'm not certain it would be that valuable.

doc_auto_cfg (people have some issues with it, maybe not)

The problems I found were in related to multiple different implementations of the same function, each using their own cfg. We don't have anything similar AFAIR, so it may not apply to us, unless there are other problems. I think it's reasonable to give it a try.

@zetanumbers
Copy link
Contributor

That was definitely thorough research. Wow. Then it's ok.

@zetanumbers
Copy link
Contributor

Hm, i think we can just split up the file into multiple parts?

Copy link
Contributor

@zetanumbers zetanumbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of this PR, but couple of issues should be resolved.

build.rs Outdated Show resolved Hide resolved
build.rs Show resolved Hide resolved
build-util/src/bindgen.rs Outdated Show resolved Hide resolved
build-util/src/bindgen.rs Outdated Show resolved Hide resolved
build-util/src/bindgen.rs Outdated Show resolved Hide resolved
build-util/src/bindgen.rs Outdated Show resolved Hide resolved
@zetanumbers
Copy link
Contributor

Sorry for a big delay. I kinda forgot about reviewing cause i didn't leave the note for myself. 😅

I promise I'll try help anytime I have from now on.

@pheki
Copy link
Member Author

pheki commented Dec 22, 2023

No problem! Most changes are good to be reviewed again (with pending discussions about the first 2)

@pheki
Copy link
Member Author

pheki commented Dec 22, 2023

Do you think we should consider getting rid of redundant code like combining extern "C" blocks

I gave it a try on another branch: 41c6f6b

The file size reduced from 1,170,678 to 880,974 bytes and the code didn't get more complex IMO.

Copy link
Contributor

@zetanumbers zetanumbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Added some functionality, feel free to modify if you don't like something.

@pheki
Copy link
Member Author

pheki commented Jan 2, 2024

All changes LGTM.

One question though, did you take a look at commit 41c6f6b on branch group-items-by-feature and do you have an opinion whether we should use that?

@zetanumbers
Copy link
Contributor

One question though, did you take a look at commit 41c6f6b on branch group-items-by-feature and do you have an opinion whether we should use that?

Probably should, since we already preprocess them via syn. I liked new Sort visitor (rustc calls that a pass), while Link became kinda hard to read and less uniform IMO. I wonder if I was to introduce you to mutating locals from inside of an iterator.

These things look like they have a simpler approach. Maybe we should just render items as text, sort and then reparse them back? Well, current sort visitor is fine tho.

@zetanumbers
Copy link
Contributor

zetanumbers commented Jan 2, 2024

Also code doesn't use empty extern blocks with link argument to merge collect items within it. And i don't like that you made Link visitor as not implementing VisitMut trait but adds naked visit method, speaking of uniformity.

@pheki
Copy link
Member Author

pheki commented Jan 7, 2024

I wonder if I was to introduce you to mutating locals from inside of an iterator.

I'm not sure what you mean by this, so maybe? I am mutating items_by_features from inside of the iterator, and I usually avoid that and mutable state in general (idk if that's your point), but in this case I think it made the code simpler. Just change it to your preferred style if you want to.

And i don't like that you made Link visitor as not implementing VisitMut trait but adds naked visit method, speaking of uniformity.

Kinda agree with you here, the reason I made it that way is because Link doesn't really need to visit recursively while it seems like VisitMut is for recursive traversal. If you still think it would be better that way, go ahead.

I made a change to sort foreign mods by features instead of the first item, to make it more consistent. The code is complicated and can surely be improved (e.g. by using Attribute::parse_nested_meta and/or others from syn), but it seems to work fine.

When we merge this, we should probably squash and merge to avoid having all of the different bindings.rs versions on main's history.

PS: The CI issue seems related to rust-lang/rust#119026

@zetanumbers
Copy link
Contributor

When I used bindgen with LLVM 15, I got this:

Снимок экрана 2024-01-07 в 14 29 44

LLVM 17 does not introduce any changes tho. Not sure how this should be handled. Should we use bindgen/static? tho I have trouble with it enabled. Maybe filter it from build-util's bindgen? What do you think?

@zetanumbers
Copy link
Contributor

I'm not sure what you mean by this, so maybe? I am mutating items_by_features from inside of the iterator, and I usually avoid that and mutable state in general (idk if that's your point), but in this case I think it made the code simpler. Just change it to your preferred style if you want to.

Eh, it works so whatever. Maybe a task for someone that would find this code irritating enough.

@pheki
Copy link
Member Author

pheki commented Jan 10, 2024

Hmmmm unfortunately I don't have a more general solution, but we can blocklist_item("(?i)__gnu.*") (source) to avoid that definition and also this other unused one:

pub type __gnuc_va_list = u32;

@zetanumbers
Copy link
Contributor

Hmmmm unfortunately I don't have a more general solution, but we can blocklist_item("(?i)__gnu.*") (source) to avoid that definition and also this other unused one:

pub type __gnuc_va_list = u32;

We could put a check for llvm version only if it is not build.rs, filter out these items (transforming the code) so it would be just like you have said, or add a CI check that would run build-util generate --check with this new flag (--check flag just like in the rustfmt) and specify used LLVM version in the readme? I don't think we should ignore it. You decide.

@pheki
Copy link
Member Author

pheki commented Jan 10, 2024

or add a CI check that would run build-util generate --check with this new flag (--check flag just like in the rustfmt)

Oh, I'm already adding a check that re-generates the bindings and checks if there's no diff:

# Checks if there's no diff when regenerating bindings
check-bindings:
name: Check bindings
runs-on: ubuntu-latest
needs: install-vitasdk
timeout-minutes: 10
steps:
- name: Set commit status as pending
uses: myrotvorets/set-commit-status-action@v1.1.7
with:
token: ${{ secrets.GITHUB_TOKEN }}
status: pending
context: Check bindings
sha: ${{ github.sha }}
- uses: actions/checkout@v3
with:
submodules: true
- name: Restore vitasdk cache
uses: actions/cache/restore@v3
with:
path: /opt/vitasdk
key: ${{ runner.os }}-vitasdk
fail-on-cache-miss: true
- name: Cache LLVM and Clang
id: cache-llvm
uses: actions/cache@v3
with:
path: |
${{ runner.temp }}/llvm
key: llvm
- name: Install LLVM and Clang
uses: KyleMayes/install-llvm-action@v1
with:
version: ${{ env.LLVM_VERSION }}
directory: ${{ runner.temp }}/llvm
cached: ${{ steps.cache-llvm.outputs.cache-hit }}
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
profile: minimal
override: true
- name: Regenerate bindings
env:
VITASDK: /opt/vitasdk
# From clang-sys
LIBCLANG_PATH: ${{ runner.temp }}/llvm/lib
LLVM_CONFIG_PATH: ${{ runner.temp }}/llvm/bin/llvm-config
run: |
cargo run --profile build-util -p vitasdk-sys-build-util -- bindgen
- name: Check diff
run: |
git add . && git diff --quiet && git diff --cached --quiet
- name: Set final commit status
uses: myrotvorets/set-commit-status-action@v1.1.7
if: always()
with:
token: ${{ secrets.GITHUB_TOKEN }}
status: ${{ job.status }}
context: Check bindings
sha: ${{ github.sha }}

You can see it failing for this commit: cee2090

I like the --check idea, we can do it in the future

specify used LLVM version in the readme

Yeah, I'll add a note there

@zetanumbers
Copy link
Contributor

Well, it seems someone broke std or libc for armv7-sony-vita-newlibeabihf target.

@pheki
Copy link
Member Author

pheki commented Jan 12, 2024

Well, it seems someone broke std or libc for armv7-sony-vita-newlibeabihf target.

Yep, see rust-lang/libc#3538

@pheki pheki merged commit e357f6e into main Feb 14, 2024
7 checks passed
@pheki pheki deleted the pregenerated-bindings branch February 14, 2024 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants