-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bring back sparse_vector mapping #98996
Bring back sparse_vector mapping #98996
Conversation
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied and pasted from RankFeaturesFieldMapper
with some checks on index version.
public void parse(DocumentParserContext context) throws IOException { | ||
|
||
// No support for indexing / searching 7.x sparse_vector field types | ||
if (context.indexSettings().getIndexVersionCreated().before(IndexVersion.V_8_0_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - checking that we're not operating on indices created before 8.0. Return the same errors that the previous implementation had.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from RankFeaturesFieldMapperTests, with some additions
return Strings.concatStringArrays(super.getParseMinimalWarnings(indexVersion), additionalWarnings); | ||
} | ||
|
||
public void testSparseVectorWith7xIndex() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added from the previous version of the test. The main idea is to allow indices to exist with the previous sparse_vector
implementation (<8.0), but no new docs can be added to it.
@@ -441,6 +441,10 @@ protected String[] getParseMinimalWarnings() { | |||
return Strings.EMPTY_ARRAY; | |||
} | |||
|
|||
protected String[] getParseMinimalWarnings(IndexVersion indexVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this method so we can specify warnings that will be included when using mappings that go back to specific index versions.
|
||
public Builder(String name) { | ||
super(name); | ||
} | ||
|
||
@Override | ||
protected Parameter<?>[] getParameters() { | ||
return new Parameter<?>[] { meta }; | ||
return new Parameter<?>[] { positiveScoreImpact, meta }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove positiveScoreImpact
we can simplify this feature.
@@ -0,0 +1,21 @@ | |||
setup: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a simple set of tests for exercising the mapped type & queries and such. No need to have a single file only for mapping tests.
This file should test:
- Creating the mapping (in the correct versions)
- Ensure creation fails in older 8.x versions
- That queries work
- That indexing works
@@ -0,0 +1,143 @@ | |||
--- | |||
"Indexing and searching sparse vectors": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test that checks indexing and performs a simulated text_expansion query.
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
|
||
public void testBoostNotAllowed() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've separated boost checking in two, for deprecated warnings and boosts not allowed.
@@ -521,6 +527,10 @@ public final void testDeprecatedBoost() throws IOException { | |||
assertParseMinimalWarnings(); | |||
} | |||
|
|||
protected IndexVersion boostNotAllowedIndexVersion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the problem that checking for boosts not allowed caused sparse_vector mappings to error because sparse_vector is not allowed for 8.0.0.
I've created an overrridable method that returns the index version to test boosting for.
I'm happy to hear about other ideas for solving this in a simpler way 👍
@@ -103,8 +103,16 @@ public class TextExpansionQueryIT extends PyTorchModelRestTestCase { | |||
RAW_MODEL_SIZE = Base64.getDecoder().decode(BASE_64_ENCODED_MODEL).length; | |||
} | |||
|
|||
public void testRankFeaturesTextExpansionQuery() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've duped the tests on text_expansion queries so we check they can be used both with rank_features
and sparse_vector
field types.
I could as well have created separate classes for this, LMK if you can think of better ways to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not do this. Thank you for taking initiative here. But, all the ML related changes should be done after sparse_vector is merged. It will be good to separate the changes to make the change log cleaner and reviews easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - I'll revert this change and the text_expansion_search_sparse_vector.yml
test as well.
@@ -0,0 +1,109 @@ | |||
# This test uses the simple model defined in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a separate YAML test that performs the same checks for text_expansion
queries for sparse_vector
field type
@elasticsearchmachine run elasticsearch-ci/part-1 |
Hi @carlosdelest, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Carlos!
reason: "sparse_vector field type supported in 7.x" | ||
- do: | ||
allowed_warnings: | ||
- "The [sparse_vector] field type is deprecated and will be removed in 8.0." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep the 8.0 warning here, since the test is 7.x only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8.0.0 -
indicates to me that this test only runs in 7.x as well. I am not sure why both warnings would be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have received both warnings on BWC tests, depending on the BWC test being performed:
- All nodes under test are on 7.x
- A mixed cluster test like this one. I understand that a node could have been updated to 8.x as part of this test, and thus return this warning.
public static final String CONTENT_TYPE = "sparse_vector"; | ||
|
||
static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicky question: Should we use the CONTENT_TYPE
constant in error messages instead of hardcoding sparse_vector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied / pasted from the previous sparse_vector
error messages. As content type constant is unlikely to change, I personally think this is more readable as well (and in line with how error messages are created as well)
public DocValueFormat docValueFormat(String format, ZoneId timeZone) { | ||
throw new UnsupportedOperationException(ERROR_MESSAGE_7X); | ||
public Query existsQuery(SearchExecutionContext context) { | ||
throw new IllegalArgumentException("[sparse_vector] fields do not support [exists] queries"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we decide to move from UnsupportedOperationException
to IllegalArgumentException
? Both are fine, just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really a change in the exception, as the method is different. Keep in mind that I copied / pasted RankFeaturesFieldMapper
, so there are a lot of changes and the diff is misleading.
); | ||
@Override | ||
protected boolean allowsNullValues() { | ||
return false; // TODO should this allow null values? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Remove TODO before merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied / pasted from RankFeaturesFieldMapper
, so I think it makes sense for re-evaluating in the future.
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
@elasticsearchmachine update branch |
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
static final String ERROR_MESSAGE_8X = "The [sparse_vector] field type is not supported from 8.0 to 8.10 versions."; | ||
static final IndexVersion PREVIOUS_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_0_0; | ||
|
||
static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_500_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have here V_8_500_001
?
// No support for indexing / searching 7.x sparse_vector field types | ||
if (context.indexSettings().getIndexVersionCreated().before(PREVIOUS_SPARSE_VECTOR_INDEX_VERSION)) { | ||
throw new UnsupportedOperationException(ERROR_MESSAGE_7X); | ||
} else if (context.indexSettings().getIndexVersionCreated().before(IndexVersion.V_8_11_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here instead of 8_11
we should also put a new V_8_500_001
version
if (context.doc().getByKey(key) != null) { | ||
throw new IllegalArgumentException( | ||
"[sparse_vector] fields do not support indexing multiple values for the same " | ||
+ "rank feature [" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "rank feature", I think we can just say "feature"
|
||
static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_500_000; | ||
|
||
private static SparseVectorFieldType ft(FieldMapper in) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that it is never used. Should we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlosdelest Thanks Carlos, I left minor comments, but the change overall LGTM!
…ector_type_comeback
…ector_type_comeback
@elasticsearchmachine update branch |
@elasticsearchmachine run elasticsearch-ci/full-bwc |
This PR has been merged. However, I would like to ask some questions to improve my understanding of this work. 😄
|
Yes
No
No |
Adds back
sparse_vector
field type, as a copy ofrank_features
.The main goal is to have the
sparse_vector
field type available so we can switch ELSER queries to use the new type.