Add sample(wires) support in LightningQubit (#809)

**Context:**
`sample` calls `generate_samples`, which computes the full probabilities and uses the alias method to generate samples for all wires. This is wasteful whenever samples are required only for a subset of the wires.

**Description of the Change:**
Move alias method logic to the `discrete_random_variable` class.
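To make the refactored logic concrete, here is a minimal Python sketch of the alias (Walker) method as the new `DiscreteRandomVariable` class implements it: build a table of (scaled probability, alias index) pairs, then sample with two uniform draws. The class and variable names here are illustrative only, not the actual C++ or Python API.

```python
import random

class DiscreteRandomVariable:
    """Alias-method sampler over indices 0..N-1 (illustrative sketch)."""

    def __init__(self, probs, rng=random.random):
        self.n = len(probs)
        self.rng = rng
        # partners[i] = [scaled probability, alias index or None]
        partners = [[p * self.n, None] for p in probs]
        under = [i for i, b in enumerate(partners) if b[0] < 1.0]
        over = [i for i, b in enumerate(partners) if b[0] >= 1.0]
        # Pair each underfull bucket with an overfull one, moving the
        # excess mass until every bucket holds exactly one unit.
        while under and over:
            i = over.pop()
            j = under.pop()
            partners[j][1] = i                      # j's alias is i
            partners[i][0] += partners[j][0] - 1.0  # donate excess mass
            (under if partners[i][0] < 1.0 else over).append(i)
        self.partners = partners

    def __call__(self):
        # First draw picks a bucket, second decides bucket vs. alias.
        idx = int(self.rng() * self.n)
        prob, alias = self.partners[idx]
        if self.rng() >= prob and alias is not None:
            return alias
        return idx
```

Sampling is O(1) per draw after the O(N) table build, which is what makes it worthwhile to build the table only over the 2^k outcomes of the requested wires.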

**Benefits:**
Only the minimal probabilities and samples are computed.
We benchmark the current changes against `master`, which already benefits from the speed-ups introduced in #795 and #800.

We use ISAIC's AMD EPYC-Milan processor with a single core/thread. The
times are obtained from at least 5 experiments, each running for at least
250 milliseconds. We begin by comparing `master`'s
`generate_samples(num_samples)` with our `generate_samples({0},
num_samples)`. For 4-12 qubits, overheads dominate the calculation (the
absolute times range from 6 microseconds to 18 milliseconds, which is
not a lot). Already at 12 qubits, however, a trend appears where our
implementation is significantly faster. This is to be expected for two
reasons: `probs(wires)` itself is faster than `probs()` (for enough
qubits), and `sample(wires)` also requires significantly less work than
`sample()`.


![speedup_vs_nthreads_1w](https://github.com/user-attachments/assets/472748e9-d812-489c-a00f-2b2b74c7e682)
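A quick way to see why `probs(wires)` is cheaper is the size of the probability table involved: restricting to k wires means marginalizing down to 2^k entries instead of 2^n, and the alias table is then built over that much smaller vector. A small NumPy sketch of the marginalization (illustrative only, not the Lightning kernels):

```python
import numpy as np

rng = np.random.default_rng(42)
num_qubits = 10

# Random normalized state vector over 2**num_qubits amplitudes.
state = rng.normal(size=2**num_qubits) + 1j * rng.normal(size=2**num_qubits)
state /= np.linalg.norm(state)

full_probs = np.abs(state) ** 2  # 2**10 = 1024 entries

# Marginal over a single wire (most-significant bit in this toy
# indexing convention): sum out the other 9 wires.
marginal = full_probs.reshape(2, -1).sum(axis=1)  # 2 entries

assert np.isclose(full_probs.sum(), 1.0)
assert np.isclose(marginal.sum(), 1.0)
```

Sampling wire 0 then only requires a 2-entry table rather than a 1024-entry one, and the gap widens exponentially with the number of omitted wires.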

Next we turn to comparing `master`'s `generate_samples(num_samples)`
with our `generate_samples({0..num_qubits/2}, num_samples)`. The
situation is similar, with speed-ups close to 1 for the smaller
qubit counts and (sometimes) beyond 20x for qubit counts above 20.


![speedup_vs_nthreads_hfw](https://github.com/user-attachments/assets/f39e3ccd-8051-4a57-a857-9cd13f547865)

Finally we compare `master`'s `generate_samples(num_samples)` with our
`generate_samples({0..num_qubits-1}, num_samples)` (i.e. computing
samples on all wires). We expect similar performance, since the main
difference comes from the caching mechanism in `master`'s discrete
random variable generator. The data suggests caching samples is
counter-productive compared with calculating the sample values on the
fly.


![speedup_vs_nthreads_fullw](https://github.com/user-attachments/assets/2c70ed21-2236-479e-be3d-6017b42fdc5e)
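The "on the fly" calculation referred to above is just unpacking each sampled index into bits and writing it into the row-major (num_samples x n_wires) layout the Python layer expects. A hypothetical pure-Python rendering of that inner loop:

```python
def unpack_samples(indices, n_wires):
    """Unpack sampled indices into a flat row-major bit array.

    Row s holds the bits of indices[s], most-significant wire first,
    matching the linear index s * n_wires + (n_wires - 1 - j), where
    s is the "slow" row index and j the "fast" column index.
    """
    samples = [0] * (len(indices) * n_wires)
    for s, idx in enumerate(indices):
        for j in range(n_wires):
            samples[s * n_wires + (n_wires - 1 - j)] = (idx >> j) & 1
    return samples

print(unpack_samples([5], 3))  # 5 = 0b101 -> [1, 0, 1]
```

Since this is a handful of shifts and masks per sample, it is plausible that recomputing it beats a hash-map cache lookup, consistent with the benchmark above.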

Turning OMP on, using 16 threads, and comparing `master`'s
`generate_samples(num_samples)` with our `generate_samples({0},
num_samples)`, we get good speed-ups above 12 qubits. Below that, the
overhead of spawning threads isn't repaid, but the absolute times remain
low.


![speedup_vs_omp16_1w](https://github.com/user-attachments/assets/e3e90a55-399f-4a5b-b90e-7059a0486228)

**Possible Drawbacks:**

**Related GitHub Issues:**
[sc-65127]

---------

Co-authored-by: ringo-but-quantum <github-ringo-but-quantum@xanadu.ai>
Co-authored-by: Ali Asadi <10773383+maliasadi@users.noreply.github.com>
3 people committed Jul 24, 2024
1 parent 9a83ff4 commit de7beb7
Showing 9 changed files with 180 additions and 148 deletions.
3 changes: 3 additions & 0 deletions .github/CHANGELOG.md
@@ -15,6 +15,9 @@

### Improvements

* Add `generate_samples(wires)` support in LightningQubit.
[(#809)](https://github.com/PennyLaneAI/pennylane-lightning/pull/809)

* Optimize the OpenMP parallelization of Lightning-Qubit's `probs` for all number of targets.
[(#807)](https://github.com/PennyLaneAI/pennylane-lightning/pull/807)

2 changes: 1 addition & 1 deletion pennylane_lightning/core/_version.py
@@ -16,4 +16,4 @@
Version number (major.minor.patch[-label])
"""

__version__ = "0.38.0-dev12"
__version__ = "0.38.0-dev13"
@@ -292,6 +292,27 @@ void registerBackendSpecificMeasurements(PyClass &pyclass) {
static_cast<sparse_index_type>(values.request().size));
},
"Variance of a sparse Hamiltonian.")
.def("generate_samples",
[](Measurements<StateVectorT> &M,
const std::vector<std::size_t> &wires,
const std::size_t num_shots) {
const std::size_t num_wires = wires.size();
auto &&result = M.generate_samples(wires, num_shots);
const std::size_t ndim = 2;
const std::vector<std::size_t> shape{num_shots, num_wires};
constexpr auto sz = sizeof(std::size_t);
const std::vector<std::size_t> strides{sz * num_wires, sz};
// return 2-D NumPy array
return py::array(py::buffer_info(
result.data(), /* data as contiguous array */
sz, /* size of one scalar */
                    py::format_descriptor<std::size_t>::format(), /* data type */
ndim, /* number of dimensions */
shape, /* shape of the matrix */
strides /* strides for each axis */
));
})
.def("generate_mcmc_samples", [](Measurements<StateVectorT> &M,
std::size_t num_wires,
const std::string &kernelname,
@@ -19,6 +19,9 @@
#pragma once

#include <complex>
#include <random>
#include <stack>
#include <utility>
#include <vector>

#include "BitUtil.hpp"
@@ -296,6 +299,86 @@ namespace PUtil = Pennylane::Util;

namespace Pennylane::LightningQubit::Measures {

/**
* @brief Generate samples using the alias method.
*
* @tparam PrecisionT Precision data type
*/
template <typename PrecisionT> class DiscreteRandomVariable {
private:
static constexpr std::size_t default_index =
std::numeric_limits<std::size_t>::max();
std::mt19937 &gen_;
const std::size_t n_probs_;
const std::vector<std::pair<double, std::size_t>> bucket_partners_;
mutable std::uniform_real_distribution<PrecisionT> distribution_{0.0, 1.0};

public:
/**
* @brief Create a DiscreteRandomVariable object.
*
* @param gen Random number generator reference.
* @param probs Probabilities for values 0 up to N - 1, where N =
* probs.size().
*/
DiscreteRandomVariable(std::mt19937 &gen,
const std::vector<PrecisionT> &probs)
: gen_{gen}, n_probs_{probs.size()},
bucket_partners_(init_bucket_partners_(probs)) {}

/**
* @brief Return a discrete random value.
*/
std::size_t operator()() const {
const auto idx =
static_cast<std::size_t>(distribution_(gen_) * n_probs_);
if (distribution_(gen_) >= bucket_partners_[idx].first &&
bucket_partners_[idx].second != default_index) {
return bucket_partners_[idx].second;
}
return idx;
}

private:
/**
* @brief Initialize the probability table of the alias method.
*/
std::vector<std::pair<double, std::size_t>>
init_bucket_partners_(const std::vector<PrecisionT> &probs) {
std::vector<std::pair<double, std::size_t>> bucket_partners(
n_probs_, {0.0, default_index});
std::stack<std::size_t> underfull_bucket_ids;
std::stack<std::size_t> overfull_bucket_ids;

for (std::size_t i = 0; i < n_probs_; i++) {
bucket_partners[i].first = n_probs_ * probs[i];
if (bucket_partners[i].first < 1.0) {
underfull_bucket_ids.push(i);
} else {
overfull_bucket_ids.push(i);
}
}

while (!underfull_bucket_ids.empty() && !overfull_bucket_ids.empty()) {
auto i = overfull_bucket_ids.top();
overfull_bucket_ids.pop();
auto j = underfull_bucket_ids.top();
underfull_bucket_ids.pop();

bucket_partners[j].second = i;
bucket_partners[i].first += bucket_partners[j].first - 1.0;

if (bucket_partners[i].first < 1.0) {
underfull_bucket_ids.push(i);
} else {
overfull_bucket_ids.push(i);
}
}

return bucket_partners;
}
};

/**
* @brief Probabilities for a subset of the full system.
*
@@ -24,7 +24,6 @@
#include <complex>
#include <cstdio>
#include <random>
#include <stack>
#include <type_traits>
#include <unordered_map>
#include <vector>
@@ -124,7 +123,7 @@ class Measurements final

const ComplexT *arr_data = this->_statevector.getData();

// Templated 1-4 wire cases; return probs
// Templated 1-4 wire cases; return probs
PROBS_SPECIAL_CASE(1);
PROBS_SPECIAL_CASE(2);
PROBS_SPECIAL_CASE(3);
@@ -175,7 +174,7 @@
*
* @return Floating point std::vector with probabilities.
*/
auto probs(size_t num_shots) -> std::vector<PrecisionT> {
auto probs(std::size_t num_shots) -> std::vector<PrecisionT> {
return BaseType::probs(num_shots);
}

@@ -258,7 +257,7 @@
const index_type *entries_ptr, const ComplexT *values_ptr,
const index_type numNNZ) -> PrecisionT {
PL_ABORT_IF(
(this->_statevector.getLength() != (size_t(row_map_size) - 1)),
(this->_statevector.getLength() != (std::size_t(row_map_size) - 1)),
"Statevector and Hamiltonian have incompatible sizes.");
auto operator_vector = Util::apply_Sparse_Matrix(
this->_statevector.getData(),
@@ -289,7 +288,7 @@
"The lengths of the list of operations and wires do not match.");
std::vector<PrecisionT> expected_value_list;

for (size_t index = 0; index < operations_list.size(); index++) {
for (std::size_t index = 0; index < operations_list.size(); index++) {
expected_value_list.emplace_back(
expval(operations_list[index], wires_list[index]));
}
@@ -461,7 +460,7 @@

std::vector<PrecisionT> expected_value_list;

for (size_t index = 0; index < operations_list.size(); index++) {
for (std::size_t index = 0; index < operations_list.size(); index++) {
expected_value_list.emplace_back(
var(operations_list[index], wires_list[index]));
}
@@ -488,7 +487,7 @@
std::size_t num_qubits = this->_statevector.getNumQubits();
std::uniform_real_distribution<PrecisionT> distrib(0.0, 1.0);
std::vector<std::size_t> samples(num_samples * num_qubits, 0);
std::unordered_map<size_t, std::size_t> cache;
std::unordered_map<std::size_t, std::size_t> cache;
this->setRandomSeed();

TransitionKernelType transition_kernel = TransitionKernelType::Local;
@@ -502,13 +501,13 @@
std::size_t idx = 0;

// Burn In
for (size_t i = 0; i < num_burnin; i++) {
for (std::size_t i = 0; i < num_burnin; i++) {
idx = metropolis_step(this->_statevector, tk, this->rng, distrib,
idx); // Burn-in.
}

// Sample
for (size_t i = 0; i < num_samples; i++) {
for (std::size_t i = 0; i < num_samples; i++) {
idx = metropolis_step(this->_statevector, tk, this->rng, distrib,
idx);

@@ -521,7 +520,7 @@

// If not cached, compute
else {
for (size_t j = 0; j < num_qubits; j++) {
for (std::size_t j = 0; j < num_qubits; j++) {
samples[i * num_qubits + (num_qubits - 1 - j)] =
(idx >> j) & 1U;
}
@@ -551,7 +550,7 @@
const index_type *entries_ptr, const ComplexT *values_ptr,
const index_type numNNZ) {
PL_ABORT_IF(
(this->_statevector.getLength() != (size_t(row_map_size) - 1)),
(this->_statevector.getLength() != (std::size_t(row_map_size) - 1)),
"Statevector and Hamiltonian have incompatible sizes.");
auto operator_vector = Util::apply_Sparse_Matrix(
this->_statevector.getData(),
@@ -577,79 +576,36 @@
* @return 1-D vector of samples in binary, each sample is
* separated by a stride equal to the number of qubits.
*/
std::vector<std::size_t> generate_samples(size_t num_samples) {
std::vector<std::size_t> generate_samples(const std::size_t num_samples) {
const std::size_t num_qubits = this->_statevector.getNumQubits();
auto &&probabilities = probs();
std::vector<std::size_t> wires(num_qubits);
std::iota(wires.begin(), wires.end(), 0);
return generate_samples(wires, num_samples);
}

std::vector<std::size_t> samples(num_samples * num_qubits, 0);
std::uniform_real_distribution<PrecisionT> distribution(0.0, 1.0);
std::unordered_map<size_t, std::size_t> cache;
/**
* @brief Generate samples.
*
* @param wires Sample are generated for the specified wires.
* @param num_samples The number of samples to generate.
* @return 1-D vector of samples in binary, each sample is
* separated by a stride equal to the number of qubits.
*/
std::vector<std::size_t>
generate_samples(const std::vector<std::size_t> &wires,
const std::size_t num_samples) {
const std::size_t n_wires = wires.size();
std::vector<std::size_t> samples(num_samples * n_wires);
this->setRandomSeed();

const std::size_t N = probabilities.size();
std::vector<double> bucket(N);
std::vector<std::size_t> bucket_partner(N);
std::stack<std::size_t> overfull_bucket_ids;
std::stack<std::size_t> underfull_bucket_ids;

for (size_t i = 0; i < N; i++) {
bucket[i] = N * probabilities[i];
bucket_partner[i] = i;
if (bucket[i] > 1.0) {
overfull_bucket_ids.push(i);
}
if (bucket[i] < 1.0) {
underfull_bucket_ids.push(i);
}
}

// Run alias algorithm
while (!underfull_bucket_ids.empty() && !overfull_bucket_ids.empty()) {
// get an overfull bucket
std::size_t i = overfull_bucket_ids.top();

// get an underfull bucket
std::size_t j = underfull_bucket_ids.top();
underfull_bucket_ids.pop();

// underfull bucket is partned with an overfull bucket
bucket_partner[j] = i;
bucket[i] = bucket[i] + bucket[j] - 1;

// if overfull bucket is now underfull
// put in underfull stack
if (bucket[i] < 1) {
overfull_bucket_ids.pop();
underfull_bucket_ids.push(i);
}

// if overfull bucket is full -> remove
else if (bucket[i] == 1.0) {
overfull_bucket_ids.pop();
}
}

// Pick samples
for (size_t i = 0; i < num_samples; i++) {
PrecisionT pct = distribution(this->rng) * N;
auto idx = static_cast<std::size_t>(pct);
if (pct - idx > bucket[idx]) {
idx = bucket_partner[idx];
}
// If cached, retrieve sample from cache
if (cache.contains(idx)) {
std::size_t cache_id = cache[idx];
auto it_temp = samples.begin() + cache_id * num_qubits;
std::copy(it_temp, it_temp + num_qubits,
samples.begin() + i * num_qubits);
}
// If not cached, compute
else {
for (size_t j = 0; j < num_qubits; j++) {
samples[i * num_qubits + (num_qubits - 1 - j)] =
(idx >> j) & 1U;
}
cache[idx] = i;
DiscreteRandomVariable<PrecisionT> drv{this->rng, probs(wires)};
// The Python layer expects a 2D array with dimensions (n_samples x
// n_wires) and hence the linear index is `s * n_wires + (n_wires - 1 -
// j)` `s` being the "slow" row index and `j` being the "fast" column
// index
for (std::size_t s = 0; s < num_samples; s++) {
const std::size_t idx = drv();
for (std::size_t j = 0; j < n_wires; j++) {
samples[s * n_wires + (n_wires - 1 - j)] = (idx >> j) & 1U;
}
}
return samples;
