Template/expval #489

Merged: 56 commits, Sep 7, 2023

@vincentmr (Contributor) commented Aug 28, 2023

Before submitting

Please complete the following checklist when submitting a PR:

  • All new features must include a unit test.
    If you've fixed a bug or added code that should be tested, add a test to the
    tests directory!

  • All new functions and code must be clearly commented and documented.
    If you do make documentation changes, make sure that the docs build and
    render correctly by running make docs.

  • Ensure that the test suite passes, by running make test.

  • Add a new entry to the .github/CHANGELOG.md file, summarizing the
    change, and including a link back to the PR.

  • Ensure that code is properly formatted by running make format.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


Context:
This PR is a follow-up to #481. That PR suggested that reducing expval on the fly is generally faster than computing it through inner products. Another factor is how the observable-statevector product is computed and parallelized. The general scheme uses three layers of parallelism with team policies, which introduces several parameters that should be tuned for optimal performance but are currently left to Kokkos' heuristics. By contrast, the straightforward range-policy scheme used by the 1- and 2-qubit kernels significantly outperforms the general scheme.

Since the increase in flop intensity between the 2-qubit and 3+-qubit kernels does not appear to explain this discrepancy, I introduce specialized 3- to 5-qubit kernels. I draw the following conclusions:

  • On-the-fly expval kernels are generally faster.
  • Range-policy kernels are faster than the team-policy kernel up to 4 qubits on the OPENMP and HIP backends and up to 5 qubits on CUDA.
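
To make the first conclusion concrete, here is a minimal serial sketch of the two strategies for a single-qubit observable. It is plain C++ rather than the Kokkos kernels from this PR. The function names, the row-major `Matrix2` layout and the bit-position convention (0 = least significant) are assumptions made for illustration only.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Matrix2 = std::array<Complex, 4>; // row-major 2x2 observable (assumed layout)

// Strategy A: materialize |phi> = O|psi>, then reduce the inner product <psi|phi>.
// Needs a full-size temporary and two passes over the state.
double expvalInnerProduct(const std::vector<Complex> &psi, const Matrix2 &obs,
                          std::size_t bit) {
    const std::size_t mask = std::size_t{1} << bit;
    std::vector<Complex> phi(psi.size());
    for (std::size_t i = 0; i < psi.size(); ++i) {
        const std::size_t i0 = i & ~mask; // same index with target bit = 0
        const std::size_t i1 = i | mask;  // same index with target bit = 1
        const std::size_t row = (i & mask) ? 1 : 0;
        phi[i] = obs[2 * row] * psi[i0] + obs[2 * row + 1] * psi[i1];
    }
    Complex acc{0.0, 0.0};
    for (std::size_t i = 0; i < psi.size(); ++i) {
        acc += std::conj(psi[i]) * phi[i];
    }
    return acc.real();
}

// Strategy B: reduce on the fly. Each iteration handles one (i0, i1) pair and
// adds its contribution directly, so no temporary statevector is needed.
double expvalOnTheFly(const std::vector<Complex> &psi, const Matrix2 &obs,
                      std::size_t bit) {
    const std::size_t mask = std::size_t{1} << bit;
    double acc = 0.0;
    for (std::size_t k = 0; k < psi.size() / 2; ++k) {
        // Insert a 0 at position `bit` of k to get the pair's base index.
        const std::size_t low = k & (mask - 1);
        const std::size_t i0 = ((k >> bit) << (bit + 1)) | low;
        const std::size_t i1 = i0 | mask;
        const Complex v0 = obs[0] * psi[i0] + obs[1] * psi[i1];
        const Complex v1 = obs[2] * psi[i0] + obs[3] * psi[i1];
        acc += std::real(std::conj(psi[i0]) * v0 + std::conj(psi[i1]) * v1);
    }
    return acc;
}
```

Both functions return the same value for a Hermitian `obs`. The on-the-fly version reads each amplitude once and writes nothing, which is consistent with it being the faster option in the benchmarks, although this sketch says nothing about the actual parallel performance.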

The following figures show timings to get the expectation value of a Hermitian observable for OPENMP, CUDA and HIP respectively.

[Figures: benchmarks_CPU (OPENMP), benchmarks_GPU (CUDA), benchmarks_HIP (HIP)]

Description of the Change:
Introduce specialized 3- to 5-qubit kernels. Refactor the getExpValMatrix wrapper in MeasurementsKokkos.hpp. Add a few tests.
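
As a rough illustration of what a flat, range-policy-style multi-qubit kernel computes, the sketch below uses a serial loop in place of the parallel reduction: one flat index per group of amplitudes, with no nested team structure. It is not the code in ExpValFunctors.hpp. `insertZeroBits`, `expvalFlatLoop`, the row-major `obs` layout and the convention that `bits[j]` carries bit `j` of the local index are all hypothetical choices.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Expand k by inserting a 0 bit at each position in `sorted_bits` (ascending).
std::size_t insertZeroBits(std::size_t k,
                           const std::vector<std::size_t> &sorted_bits) {
    for (const std::size_t b : sorted_bits) {
        const std::size_t low = k & ((std::size_t{1} << b) - 1);
        k = ((k >> b) << (b + 1)) | low;
    }
    return k;
}

// On-the-fly expectation value of a (2^nq x 2^nq) row-major observable acting
// on the given bit positions. `obs` is assumed to hold dim * dim entries.
double expvalFlatLoop(const std::vector<Complex> &psi,
                      const std::vector<Complex> &obs,
                      const std::vector<std::size_t> &bits) {
    const std::size_t nq = bits.size();
    const std::size_t dim = std::size_t{1} << nq;

    // offsets[r]: global index offset of local basis state r.
    std::vector<std::size_t> offsets(dim, 0);
    for (std::size_t r = 0; r < dim; ++r) {
        for (std::size_t j = 0; j < nq; ++j) {
            if ((r >> j) & 1U) {
                offsets[r] |= std::size_t{1} << bits[j];
            }
        }
    }

    std::vector<std::size_t> sorted_bits = bits;
    std::sort(sorted_bits.begin(), sorted_bits.end());

    double acc = 0.0;
    // One flat index per work item: this is the loop a range policy would split.
    for (std::size_t k = 0; k < (psi.size() >> nq); ++k) {
        const std::size_t base = insertZeroBits(k, sorted_bits);
        for (std::size_t r = 0; r < dim; ++r) {
            Complex row_sum{0.0, 0.0};
            for (std::size_t c = 0; c < dim; ++c) {
                row_sum += obs[r * dim + c] * psi[base | offsets[c]];
            }
            acc += std::real(std::conj(psi[base | offsets[r]]) * row_sum);
        }
    }
    return acc;
}
```

The work per flat index grows as 4^nq, which is one plausible reason a flat loop stops paying off beyond 4 or 5 qubits, in line with the crossover reported above.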

Benefits:
Faster expval on all platforms, especially for 3+-qubit observables.

Possible Drawbacks:
None

Related GitHub Issues:
#481

vincentmr and others added 30 commits August 21, 2023 10:52
…ata` to work with devices.

M  pennylane_lightning/core/src/simulators/lightning_kokkos/StateVectorKokkos.hpp; `applyMatrix` bugfix: use intermediate hostview to copy matrix data; same bugfix for `getDataVector`.
M  pennylane_lightning/core/src/simulators/lightning_kokkos/algorithms/AdjointJacobianKokkos.hpp; use copy constructor.
M  pennylane_lightning/core/src/simulators/lightning_kokkos/measurements/MeasurementsKokkos.hpp; use copy constructor.
M  pennylane_lightning/core/src/simulators/lightning_kokkos/observables/ObservablesKokkos.hpp; use copy constructor.
M  requirements-dev.txt; add clang-format-14.
…calls into two templated methods. Call specialized expval methods when possible. Remove obsolete 'Apply directly' tests.

codecov bot commented Aug 30, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +6.04% 🎉

Comparison is base (869bbb8) 93.04% compared to head (5123082) 99.09%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #489      +/-   ##
==========================================
+ Coverage   93.04%   99.09%   +6.04%     
==========================================
  Files         142      142              
  Lines       16278    16693     +415     
==========================================
+ Hits        15146    16542    +1396     
+ Misses       1132      151     -981     
Files Changed Coverage Δ
...tning_qubit/gates/tests/Test_OpToMemberFuncPtr.cpp 18.46% <ø> (ø)
pennylane_lightning/core/_version.py 100.00% <100.00%> (ø)
.../simulators/lightning_kokkos/StateVectorKokkos.hpp 99.76% <100.00%> (+5.99%) ⬆️
...s/gates/tests/Test_StateVectorKokkos_Generator.cpp 100.00% <100.00%> (ø)
...os/gates/tests/Test_StateVectorKokkos_NonParam.cpp 100.00% <100.00%> (ø)
...okkos/gates/tests/Test_StateVectorKokkos_Param.cpp 100.00% <100.00%> (ø)
...s/lightning_kokkos/measurements/ExpValFunctors.hpp 100.00% <100.00%> (+43.06%) ⬆️
...ghtning_kokkos/measurements/MeasurementsKokkos.hpp 98.26% <100.00%> (+3.79%) ⬆️
...asurements/tests/Test_StateVectorKokkos_Expval.cpp 100.00% <100.00%> (ø)
...surements/tests/Test_StateVectorKokkos_Measure.cpp 100.00% <100.00%> (ø)
... and 2 more

... and 9 files with indirect coverage changes


@vincentmr vincentmr mentioned this pull request Aug 31, 2023
5 tasks
@vincentmr vincentmr marked this pull request as ready for review August 31, 2023 17:06
@AmintorDusko (Contributor) left a comment

I left only a few comments for now.
I see that we still have some work to do in terms of coverage.

@vincentmr (Contributor, Author)

> I left only a few comments for now. I see that we still have some work to do in terms of coverage.

I would like to merge #485 first to assess the coverage situation.

@AmintorDusko (Contributor)

> > I left only a few comments for now. I see that we still have some work to do in terms of coverage.
>
> I would like to merge #485 first to assess the coverage situation.

Absolutely, I think it is only sensible to do so.

@AmintorDusko (Contributor) left a comment

Nothing more to add. Thank you for that!

@mlxd (Member) left a comment

Nothing more to add, thanks a bunch @vincentmr.
I'm happy with the macro approach for now, but we can revisit it later to see whether it can become a compile-time generated, parameter-pack-based solution.
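
For readers wondering what a parameter-pack alternative to per-size macros could look like, here is one possible shape. It is only a sketch and does not reflect the macros in this PR: `expvalLowBits` and `dispatchExpval` are hypothetical names, and the kernel assumes, for brevity, that the observable acts on the NQ lowest bits of the index. The pack is expanded inside a braced initializer so that it compiles as C++11; a C++17 fold expression would be tidier.

```cpp
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Complex = std::complex<double>;

// One kernel per qubit count, generated by the compiler from a single template.
// Simplifying assumption: the observable acts on the NQ lowest index bits, and
// `obs` is a row-major (2^NQ x 2^NQ) matrix.
template <std::size_t NQ>
double expvalLowBits(const std::vector<Complex> &psi,
                     const std::vector<Complex> &obs) {
    constexpr std::size_t dim = std::size_t{1} << NQ;
    double acc = 0.0;
    for (std::size_t base = 0; base < psi.size(); base += dim) {
        for (std::size_t r = 0; r < dim; ++r) {
            Complex row_sum{0.0, 0.0};
            for (std::size_t c = 0; c < dim; ++c) {
                row_sum += obs[r * dim + c] * psi[base + c];
            }
            acc += std::real(std::conj(psi[base + r]) * row_sum);
        }
    }
    return acc;
}

// Runtime-to-compile-time dispatch over the listed qubit counts.
template <std::size_t... NQs>
double dispatchExpval(std::size_t nq, const std::vector<Complex> &psi,
                      const std::vector<Complex> &obs) {
    double result = 0.0;
    bool found = false;
    // Pack expansion: exactly one entry runs its kernel when nq matches.
    const int expand[] = {
        0, (nq == NQs
                ? (result = expvalLowBits<NQs>(psi, obs), found = true, 0)
                : 0)...};
    (void)expand;
    if (!found) {
        throw std::invalid_argument("no specialized kernel for this size");
    }
    return result;
}
```

A call such as `dispatchExpval<1, 2, 3, 4, 5>(nq, psi, obs)` would then instantiate the five kernels from a single list of sizes, and a caller could fall back to a general kernel when the exception (or an equivalent status flag) signals an unsupported size.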

@vincentmr vincentmr merged commit e96a53f into master Sep 7, 2023
61 checks passed
@vincentmr vincentmr deleted the template/expval branch September 7, 2023 14:35
3 participants