Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support script score when doc value is disabled #1573

Merged
merged 4 commits into from
Mar 28, 2024

Conversation

bugmakerrrrrr
Copy link
Contributor

Description

Today, script score is not supported if using lucene engine and doc value is disabled, because k-NN plugin uses doc value to execute exact query with script score. But actually, we can use ByteVectorValues/FloatVectorValues to execute this type of query.

In addition to , I think that we can totally disable doc value when using lucene engine, vector values can replace the function of doc value. Further, we can use FlatVectorsFormat to store vector values and implement our own native index functionality, so that we can avoid extra de/serialization and unify the script doc value interface. Any thoughts?

Btw, This PR has not added relevant tests yet.

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@bugmakerrrrrr
Copy link
Contributor Author

Currently, if we disable the doc value and use script score, the following exception will be thrown:

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_state_exception",
                "reason": "unexpected docvalues type NONE for field 'item_vector' (expected=BINARY). Re-index with correct docvalues type."
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "products-shirts",
                "node": "M-rVnb_0SiqtyJw2-04XNw",
                "reason": {
                    "type": "illegal_state_exception",
                    "reason": "unexpected docvalues type NONE for field 'item_vector' (expected=BINARY). Re-index with correct docvalues type."
                }
            }
        ],
        "caused_by": {
            "type": "illegal_state_exception",
            "reason": "unexpected docvalues type NONE for field 'item_vector' (expected=BINARY). Re-index with correct docvalues type.",
            "caused_by": {
                "type": "illegal_state_exception",
                "reason": "unexpected docvalues type NONE for field 'item_vector' (expected=BINARY). Re-index with correct docvalues type."
            }
        }
    },
    "status": 500
}

@navneet1v
Copy link
Collaborator

Further, we can use FlatVectorsFormat to store vector values and implement our own native index functionality, so that we can avoid extra de/serialization and unify the script doc value interface.

This is a great idea. There is a github issue created for the same. There some challenges around the same. Ref: #1087 (comment)

@navneet1v
Copy link
Collaborator

@bugmakerrrrrr can you fix the gh actions.

@navneet1v
Copy link
Collaborator

I think that we can totally disable doc value when using lucene engine

Yes we should do this if possible. @jmazanec15 what your thought?

@navneet1v
Copy link
Collaborator

I did a very high level look on the PR, the logic seems to be pretty neat. But please remove all sysout statements and add docs on public functions and new interfaces which are created.

Along with fix the gh actions.

@jmazanec15
Copy link
Member

jmazanec15 commented Mar 25, 2024

Further, we can use FlatVectorsFormat to store vector values and implement our own native index functionality, so that we can avoid extra de/serialization and unify the script doc value interface.

Right, #1087 calls this out. I think it makes sense to implement our own KnnVectorsFormat that uses the correct FlatVectorsFormat.

Edit: I meant at some point, not necessarily right now.

@jmazanec15
Copy link
Member

Yes we should do this if possible. @jmazanec15 what your thought?

Yes, I agree this can be disabled - no sense in storing both.

@jmazanec15
Copy link
Member

This looks pretty good to me. I think we will need to add a few test cases.

@navneet1v
Copy link
Collaborator

I think that we can totally disable doc value when using lucene engine

On this I would think we should do some BWC to see if for an index we suddenly stop sending doc values will it cause any issue.

@bugmakerrrrrr can we add a BWC tests for the same.

@bugmakerrrrrr
Copy link
Contributor Author

I think that we can totally disable doc value when using lucene engine

On this I would think we should do some BWC to see if for an index we suddenly stop sending doc values will it cause any issue.

@bugmakerrrrrr can we add a BWC tests for the same.

@navneet1v Of course. But in this PR, I will focus on supporting script score, and open another PR to disable doc value.

Copy link

codecov bot commented Mar 27, 2024

Codecov Report

Attention: Patch coverage is 90.24390% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 84.97%. Comparing base (c861966) to head (46e9883).

Files Patch % Lines
...opensearch/knn/index/KNNVectorDVLeafFieldData.java 76.92% 2 Missing and 1 partial ⚠️
...opensearch/knn/index/KNNVectorScriptDocValues.java 96.42% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1573      +/-   ##
============================================
+ Coverage     84.96%   84.97%   +0.01%     
- Complexity     1366     1374       +8     
============================================
  Files           172      172              
  Lines          5566     5599      +33     
  Branches        546      553       +7     
============================================
+ Hits           4729     4758      +29     
- Misses          605      607       +2     
- Partials        232      234       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code looks good to me.

jmazanec15
jmazanec15 previously approved these changes Mar 27, 2024
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: panguixin <panguixin@bytedance.com>
Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for fixing the tests.

@navneet1v
Copy link
Collaborator

@jmazanec15 need approval from your side too on the PR.

@navneet1v navneet1v merged commit 771c4b5 into opensearch-project:main Mar 28, 2024
49 of 51 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1573-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 771c4b54a74b7c4406c71a8bf758378329cfe4d5
# Push it to GitHub
git push --set-upstream origin backport/backport-1573-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1573-to-2.x.

@navneet1v
Copy link
Collaborator

@bugmakerrrrrr can you manually raise the backport PR for this change.

@bugmakerrrrrr
Copy link
Contributor Author

@bugmakerrrrrr can you manually raise the backport PR for this change.

@navneet1v Sure

bugmakerrrrrr added a commit to bugmakerrrrrr/k-NN that referenced this pull request Apr 2, 2024
)

* support script score when doc value is disabled

Signed-off-by: panguixin <panguixin@bytedance.com>

* add test

Signed-off-by: panguixin <panguixin@bytedance.com>

* apply review comments

Signed-off-by: panguixin <panguixin@bytedance.com>

* fix test

Signed-off-by: panguixin <panguixin@bytedance.com>

---------

Signed-off-by: panguixin <panguixin@bytedance.com>
(cherry picked from commit 771c4b5)
bugmakerrrrrr added a commit to bugmakerrrrrr/k-NN that referenced this pull request Apr 4, 2024
)

* support script score when doc value is disabled

Signed-off-by: panguixin <panguixin@bytedance.com>

* add test

Signed-off-by: panguixin <panguixin@bytedance.com>

* apply review comments

Signed-off-by: panguixin <panguixin@bytedance.com>

* fix test

Signed-off-by: panguixin <panguixin@bytedance.com>

---------

Signed-off-by: panguixin <panguixin@bytedance.com>
(cherry picked from commit 771c4b5)
navneet1v pushed a commit that referenced this pull request Apr 9, 2024
* support script score when doc value is disabled

Signed-off-by: panguixin <panguixin@bytedance.com>

* add test

Signed-off-by: panguixin <panguixin@bytedance.com>

* apply review comments

Signed-off-by: panguixin <panguixin@bytedance.com>

* fix test

Signed-off-by: panguixin <panguixin@bytedance.com>

---------

Signed-off-by: panguixin <panguixin@bytedance.com>
(cherry picked from commit 771c4b5)
jmazanec15 pushed a commit to jmazanec15/k-NN-1 that referenced this pull request May 7, 2024
) (opensearch-project#1587)

* support script score when doc value is disabled

Signed-off-by: panguixin <panguixin@bytedance.com>

* add test

Signed-off-by: panguixin <panguixin@bytedance.com>

* apply review comments

Signed-off-by: panguixin <panguixin@bytedance.com>

* fix test

Signed-off-by: panguixin <panguixin@bytedance.com>

---------

Signed-off-by: panguixin <panguixin@bytedance.com>
(cherry picked from commit 771c4b5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants