Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where quantization framework does not work with training #2100

Merged
merged 9 commits into from
Sep 12, 2024

Conversation

ryanbogan
Copy link
Member

@ryanbogan ryanbogan commented Sep 12, 2024

Description

Currently, the quantization framework is broken and does not work when training. This PR fixes this bug, so the vectors are properly quantized when the framework is enabled

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
@ryanbogan
Copy link
Member Author

Rolling upgrade failure not related to this PR

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
jmazanec15
jmazanec15 previously approved these changes Sep 12, 2024
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me thank!

Copy link
Contributor

@shatejas shatejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify null checks, you can adress other comments in a follow up PR

searchParameters = &ivfParams;
} else {
auto hnswReader = dynamic_cast<const faiss::IndexBinaryHNSW*>(indexReader->index);
if(hnswReader != nullptr && (methodParamsJ != nullptr || parentIdsJ != nullptr)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do you need the condition after &&

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was already there, I'm not sure if it's necessary or not

if (parentIdsJ != nullptr) {
idGrouper = buildIDGrouperBitmap(jniUtil, env, parentIdsJ, &idGrouperBitmap);
hnswParams.grp = idGrouper.get();
if (ivfReader) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a good idea to throw in a unit test in this or as an immediate follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do in follow-up

return ArrayUtils.toPrimitive((Float[]) vectors.get(position));
}
};
QuantizationState quantizationState = quantizer.train(trainingRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I am assuming QuantizationServic doesn't fit here?

try {
List<byte[]> byteVectors = quantizeVectors(floats);
long memoryAddress = trainingDataAllocation.getMemoryAddress();
memoryAddress = JNICommons.storeBinaryVectorData(memoryAddress, byteVectors.toArray(new byte[0][0]), byteVectors.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do an argument capture for byte array and make sure they aren't the same in unit test

Comment on lines +56 to +57
verify(trainingDataAllocation).setMemoryAddress(valueCapture.capture());
when(trainingDataAllocation.getMemoryAddress()).thenReturn(valueCapture.getValue());
Copy link
Contributor

@shatejas shatejas Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: verify the capture

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
@ryanbogan ryanbogan changed the title Enable training with quantization framework Fix bug where quantization framework does not work with training Sep 12, 2024
@ryanbogan ryanbogan merged commit 5d10d64 into opensearch-project:main Sep 12, 2024
38 of 39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 12, 2024
* Initial implementation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Modify integration test and fix bugs in jni

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix integration test after merge

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add changelog (release notes)

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove entry for release notes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add null checks

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit 5d10d64)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 12, 2024
* Initial implementation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Modify integration test and fix bugs in jni

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix integration test after merge

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add changelog (release notes)

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove entry for release notes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add null checks

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit 5d10d64)
ryanbogan added a commit that referenced this pull request Sep 12, 2024
…) (#2102)

* Initial implementation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Modify integration test and fix bugs in jni

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix integration test after merge

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add changelog (release notes)

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove entry for release notes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add null checks

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit 5d10d64)

Co-authored-by: Ryan Bogan <rbogan@amazon.com>
navneet1v pushed a commit that referenced this pull request Sep 14, 2024
…) (#2101)

* Initial implementation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Modify integration test and fix bugs in jni

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix integration test after merge

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add changelog (release notes)

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add unit test

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove entry for release notes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add null checks

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit 5d10d64)

Co-authored-by: Ryan Bogan <rbogan@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants