Add a section to the documentation explaining that PGO can help substantially (25%) and maybe offer some tips for users to use it? #9561

Open
alamb opened this issue Mar 11, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

Contributor

alamb commented Mar 11, 2024

This ticket tracks adding a profile-guided optimization (PGO) section to the documentation and linking to #9507

Many thanks to @zamazan4ik for this wonderful content

Add a section to the documentation explaining that PGO can help substantially (25%) and maybe offer some tips for users to use it?

Yes, it would be a great option. It requires almost no resources to maintain (write once and link to this discussion for the results). Users who are interested in optimizing arrow-datafusion further will then be able to use this information as an additional optimization opportunity. I have several examples of how such documentation can be written (they are for applications, but documentation for a library should look similar):

Provide pre-gathered PGO data somehow, so users could build DataFusion with profiles guided from TPCH (or clickbench).

Unfortunately, this approach is trickier in practice. Pre-gathered PGO profiles have several issues, e.g. incompatibilities between different compiler versions and profile skew: a profile gathered for an older version of the code becomes less and less effective as the code changes, so some kind of regular PGO profile regeneration is required.
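
One way to picture these two failure modes is a freshness check that a build script or CI job might run before reusing a stored profile. The sketch below is purely illustrative: the `ProfileInfo` record, its fields, and the `max_commits_behind` threshold (an arbitrary number) are hypothetical and not part of DataFusion or any PGO tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileInfo:
    """Hypothetical metadata stored next to a pre-gathered PGO profile."""
    rustc_version: str   # compiler version that produced the profile
    commits_behind: int  # commits merged since the profile was gathered


def should_regenerate(info: ProfileInfo, current_rustc: str,
                      max_commits_behind: int = 200) -> bool:
    """Return True when the stored profile should not be reused."""
    # A profile written by a different compiler version may be incompatible.
    if info.rustc_version != current_rustc:
        return True
    # Profile skew: the further the code has moved, the less the profile helps.
    return info.commits_behind > max_commits_behind
```

A real policy would likely be measured rather than guessed, for example by regenerating whenever the benchmark gain from the stored profile drops noticeably.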

I can suggest a similar alternative: integrate a PGO-enabled build of the library into the build scripts (based on some workload like TPCH, Clickbench, any other target workload, or any combination of them; this is up for discussion). On the one hand, users will be able to build a PGO-optimized version of the library. On the other hand, you won't spend maintenance resources on keeping pre-gathered PGO profiles up to date (although that process can be simplified with CI).
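
For reference, the manual flow that such a build-script integration would automate can be sketched roughly as below. This is a minimal sketch, not DataFusion's actual build tooling: the helper names `pgo_commands` and `run_pgo`, the profile directory, and the `workload_cmd` placeholder are assumptions made for illustration. It also assumes an `llvm-profdata` binary matching the compiler's LLVM version is on `PATH`. The `-Cprofile-generate` and `-Cprofile-use` flags are the rustc codegen options for PGO.

```python
import os
import subprocess
from typing import Dict, List, Tuple

Step = Tuple[Dict[str, str], List[str]]  # (extra environment variables, argv)


def pgo_commands(profile_dir: str, workload_cmd: List[str]) -> List[Step]:
    """Return the four PGO steps as (env, argv) pairs without running them."""
    profile_dir = os.path.abspath(profile_dir)  # rustc expects absolute paths here
    merged = os.path.join(profile_dir, "merged.profdata")
    return [
        # 1. Instrumented build; running the result writes .profraw files.
        ({"RUSTFLAGS": f"-Cprofile-generate={profile_dir}"},
         ["cargo", "build", "--release"]),
        # 2. Run a representative workload (placeholder, e.g. a TPCH run).
        ({}, list(workload_cmd)),
        # 3. Merge the raw profiles into a single .profdata file.
        ({}, ["llvm-profdata", "merge", "-o", merged, profile_dir]),
        # 4. Rebuild, letting the compiler use the merged profile.
        ({"RUSTFLAGS": f"-Cprofile-use={merged}"},
         ["cargo", "build", "--release"]),
    ]


def run_pgo(profile_dir: str, workload_cmd: List[str]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for extra_env, argv in pgo_commands(profile_dir, workload_cmd):
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```

Since DataFusion is a library, the instrumented build in step 1 would be of whatever binary links it (a benchmark runner or the user's own application), and the workload in step 2 determines what the profile actually optimizes for.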

Some examples of PGO build integration into the build scripts:

If you have some prebuilt versions of the library (e.g. a Python wheel), you can think about pre-optimizing these prebuilt binaries with PGO (based on TPCH, Clickbench, etc.). As an example - Pydantic-core: GitHub PR.

Originally posted by @zamazan4ik in #9507 (reply in thread)

@alamb added the documentation label on Mar 11, 2024