[Backport 2.x] Supporting sparse semantic retrieval in neural search #343

Merged
zane-neo merged 2 commits into 2.x from backport/backport-333-to-2.x on Sep 27, 2023

Conversation

opensearch-trigger-bot[bot]
Contributor

Backport 7bef7a0 from #333

* sparse mapper field and query builder

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix typo

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Add map result support in neural search for non text embedding models

Signed-off-by: zane-neo <zaniu@amazon.com>

* Fix compilation failure issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add more UTs

Signed-off-by: zane-neo <zaniu@amazon.com>

* add sparse encoding processor

Signed-off-by: xinyual <xinyual@amazon.com>

* add sparse encoding processor

Signed-off-by: xinyual <xinyual@amazon.com>

* remove guava in gradle

Signed-off-by: xinyual <xinyual@amazon.com>

* modify access control

Signed-off-by: xinyual <xinyual@amazon.com>

* Add map result support in neural search for non text embedding models

Signed-off-by: zane-neo <zaniu@amazon.com>

* Fix compilation failure issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* change output logic

Signed-off-by: xinyual <xinyual@amazon.com>

* create abstract

Signed-off-by: xinyual <xinyual@amazon.com>

* create abstract processor

Signed-off-by: xinyual <xinyual@amazon.com>

* add abstract class

Signed-off-by: xinyual <xinyual@amazon.com>

* remove duplicate code

Signed-off-by: xinyual <xinyual@amazon.com>

* remove duplicate code

Signed-off-by: xinyual <xinyual@amazon.com>

* remove dl process

Signed-off-by: xinyual <xinyual@amazon.com>

* move static to abstract class

Signed-off-by: xinyual <xinyual@amazon.com>

* update query rewrite logic

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* modify header

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* merge conflict

Signed-off-by: xinyual <xinyual@amazon.com>

* delete index mapper, change to rank_features

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove unused import

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* list return result

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* refactor type and listTypeNestedMapKey, tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* forbid nested input. tidy.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* enable nested

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix test

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Add ut it to sparse encoding processor (#6)

* fix original UT problem

Signed-off-by: xinyual <xinyual@amazon.com>

* add UT IT

Signed-off-by: xinyual <xinyual@amazon.com>

* add more UT

Signed-off-by: xinyual <xinyual@amazon.com>

* add more ut

Signed-off-by: xinyual <xinyual@amazon.com>

* fix typo error

Signed-off-by: xinyual <xinyual@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>

* utils, tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename to sparse_encoding query

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add validation and ut

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* sparse encoding query builder ut

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* UT for utils

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* enrich sparse encoding IT mappings

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add integ test

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename resource file

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove BoundedLinearQuery and TokenScoreUpperBound

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add delta to loosen the equality check

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* move SparseEncodingQueryBuilder to upper level path

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Update src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java

Co-authored-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Update src/main/java/org/opensearch/neuralsearch/util/TokenWeightUtil.java

Co-authored-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* restore gradle.properties

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add release notes

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* change field modifier to private for NLPProcessor

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add comments

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* use StringUtils to check

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* null check

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* modify changelog

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* nit

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* nit

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove query tokens from user interface

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix test

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* update function name

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add javadoc

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove debug log including inference result

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* make query text and model id required

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* minor changes based on comments

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add locale to String.format

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* update mock model url

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

---------

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Co-authored-by: zane-neo <zaniu@amazon.com>
Co-authored-by: xinyual <xinyual@amazon.com>
(cherry picked from commit 7bef7a0)
* fix apache http version

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add import

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

---------

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zhichao-aws
Member

Waiting for opensearch-project/ml-commons#1398 to be merged before running the sparse encoding integration test.

         listener.onResponse(vector);
     }, e -> {
         if (RetryUtil.shouldRetry(e, retryTime)) {
             final int retryTimeAdd = retryTime + 1;
-            inferenceSentencesWithRetry(targetResponseFilters, modelId, inputText, retryTimeAdd, listener);
+            retryableInferenceSentencesWithVectorResult(targetResponseFilters, modelId, inputText, retryTimeAdd, listener);
Collaborator

Why did you name the function like that? A function name should be a verb phrase.
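
For readers unfamiliar with the pattern in the snippet under review, here is a minimal, self-contained sketch of a callback-based retry loop. It is not the plugin's code: `RetrySketch`, `AsyncCall`, `callWithRetry`, `MAX_RETRIES`, and the rule that only `IllegalStateException` is retryable are all assumptions made for illustration, and the real `RetryUtil.shouldRetry` may use different criteria. The helper is named with a verb phrase, in line with the review comment above.

```java
import java.util.function.Consumer;

public class RetrySketch {
    // Hypothetical retry limit; the real plugin may use a different value.
    static final int MAX_RETRIES = 3;

    /** Hypothetical stand-in for an asynchronous inference call. */
    interface AsyncCall<T> {
        void run(Consumer<T> onResponse, Consumer<Exception> onFailure);
    }

    /** Assumed policy: retry only "transient" failures, and only while under the limit. */
    static boolean shouldRetry(Exception e, int retryTime) {
        return e instanceof IllegalStateException && retryTime < MAX_RETRIES;
    }

    /** Runs the call; on a retryable failure, re-invokes itself with an incremented counter. */
    static <T> void callWithRetry(
        AsyncCall<T> call,
        int retryTime,
        Consumer<T> onResponse,
        Consumer<Exception> onFailure
    ) {
        call.run(onResponse, e -> {
            if (shouldRetry(e, retryTime)) {
                final int retryTimeAdd = retryTime + 1;
                callWithRetry(call, retryTimeAdd, onResponse, onFailure);
            } else {
                // Not retryable, or retries exhausted: surface the error to the caller.
                onFailure.accept(e);
            }
        });
    }

    public static void main(String[] args) {
        int[] attempts = { 0 };
        // Fails twice with a "transient" error, then succeeds on the third attempt.
        AsyncCall<String> flaky = (ok, fail) -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                fail.accept(new IllegalStateException("transient"));
            } else {
                ok.accept("vector");
            }
        };
        callWithRetry(
            flaky,
            0,
            r -> System.out.println(r + " after " + attempts[0] + " attempts"),
            e -> System.out.println("failed: " + e.getMessage())
        );
    }
}
```

Passing the retry counter as an argument, as the reviewed snippet does with `retryTimeAdd`, keeps the method stateless, so concurrent calls do not share a counter.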

@zane-neo zane-neo merged commit 415082e into 2.x Sep 27, 2023
12 checks passed
@github-actions github-actions bot deleted the backport/backport-333-to-2.x branch September 27, 2023 09:39
3 participants