Add match_only_text, a space-efficient variant of text. (#66172)
This adds a new `match_only_text` field, which indexes the same data as a `text`
field that has `index_options: docs` and `norms: false` and uses the `_source`
for positional queries like `match_phrase`. Unlike `text`, this field doesn't
support scoring.
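
The trade-off described above can be sketched with a toy in-memory index. This is an illustration only, not the code added by this commit and not how Lucene stores data: the class `MatchOnlySketch`, its methods, and the simplistic `analyze` tokenizer are all hypothetical. The sketch keeps a docs-only postings map (no positions, no frequencies), answers term queries from the postings alone, and answers phrase queries by re-analyzing the stored source text of each candidate document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only; hypothetical names, not Elasticsearch or Lucene code.
class MatchOnlySketch {
    // Original text per document, standing in for `_source`.
    private final List<String> sources = new ArrayList<>();
    // term -> sorted doc IDs; no positions or frequencies are kept.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Assumed analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Indexes a document and returns its doc ID.
    int add(String text) {
        int docId = sources.size();
        sources.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Term query: answered from the postings alone.
    List<Integer> term(String term) {
        return new ArrayList<>(postings.getOrDefault(term, new TreeSet<>()));
    }

    // Phrase query: intersect postings to find candidates, then verify each
    // candidate by re-analyzing its source text and looking for the phrase.
    List<Integer> phrase(String phrase) {
        List<String> terms = analyze(phrase);
        List<Integer> hits = new ArrayList<>();
        if (terms.isEmpty()) {
            return hits;
        }
        TreeSet<Integer> candidates = new TreeSet<>(postings.getOrDefault(terms.get(0), new TreeSet<>()));
        for (String term : terms) {
            candidates.retainAll(postings.getOrDefault(term, new TreeSet<>()));
        }
        for (int docId : candidates) {
            if (Collections.indexOfSubList(analyze(sources.get(docId)), terms) >= 0) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MatchOnlySketch index = new MatchOnlySketch();
        index.add("Connection refused by host");
        index.add("host refused the connection");
        System.out.println(index.term("refused"));              // prints [0, 1]
        System.out.println(index.phrase("connection refused")); // prints [0]
    }
}
```

In this sketch the phrase query pays for re-tokenizing every candidate document, which mirrors why positional queries are slower on `match_only_text`, while term queries never touch the source text.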
jpountz authored Apr 22, 2021
1 parent 5af6931 commit 83113ec
Showing 24 changed files with 2,241 additions and 44 deletions.
3 changes: 2 additions & 1 deletion docs/reference/mapping/types.asciidoc
Original file line number Diff line number Diff line change
@@ -69,7 +69,8 @@ values.
[[text-search-types]]
==== Text search types

<<text,`text`>>:: Analyzed, unstructured text.
<<text,`text` fields>>:: The text family, including `text` and `match_only_text`.
Analyzed, unstructured text.
{plugins}/mapper-annotated-text.html[`annotated-text`]:: Text containing special
markup. Used for identifying named entities.
<<completion-suggester,`completion`>>:: Used for auto-complete suggestions.
59 changes: 59 additions & 0 deletions docs/reference/mapping/types/match-only-text.asciidoc
@@ -0,0 +1,59 @@
[discrete]
[[match-only-text-field-type]]
=== Match-only text field type

A variant of <<text-field-type,`text`>> that trades scoring and efficiency of
positional queries for space efficiency. This field effectively stores data the
same way as a `text` field that only indexes documents (`index_options: docs`)
and disables norms (`norms: false`). Term queries perform as fast as, if not
faster than, on `text` fields. However, queries that need positions, such as the
<<query-dsl-match-query-phrase,`match_phrase` query>>, perform slower because
they need to look at the `_source` document to verify whether a phrase matches. All
queries return constant scores that are equal to 1.0.

Analysis is not configurable: text is always analyzed with the
<<specify-index-time-default-analyzer,default analyzer>>
(<<analysis-standard-analyzer,`standard`>> by default).

<<span-queries,Span queries>> are not supported with this field. Use
<<query-dsl-intervals-query,interval queries>> instead, or the
<<text-field-type,`text`>> field type if you absolutely need span queries.

Other than that, `match_only_text` supports the same queries as `text`. And
like `text`, it doesn't support sorting or aggregating.

[source,console]
--------------------------------
PUT logs
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}
--------------------------------

[discrete]
[[match-only-text-params]]
==== Parameters for match-only text fields

The following mapping parameters are accepted:

[horizontal]

<<multi-fields,`fields`>>::

Multi-fields allow the same string value to be indexed in multiple ways for
different purposes, such as one field for search and a multi-field for
sorting and aggregations, or the same string value analyzed by different
analyzers.

<<mapping-field-meta,`meta`>>::

Metadata about the field.
18 changes: 17 additions & 1 deletion docs/reference/mapping/types/text.asciidoc
@@ -1,9 +1,23 @@
[testenv="basic"]
[[text]]
=== Text field type
=== Text type family
++++
<titleabbrev>Text</titleabbrev>
++++

The text family includes the following field types:

* <<text-field-type,`text`>>, the traditional field type for full-text content
such as the body of an email or the description of a product.
* <<match-only-text-field-type,`match_only_text`>>, a space-optimized variant
of `text` that disables scoring and performs slower on queries that need
positions. It is best suited for indexing log messages.


[discrete]
[[text-field-type]]
=== Text field type

A field to index full-text values, such as the body of an email or the
description of a product. These fields are `analyzed`, that is they are passed through an
<<analysis,analyzer>> to convert the string into a list of individual terms
@@ -253,3 +267,5 @@ PUT my-index-000001
}
}
--------------------------------------------------

include::match-only-text.asciidoc[]
2 changes: 1 addition & 1 deletion modules/mapper-extras/build.gradle
@@ -16,6 +16,6 @@ esplugin {

restResources {
restApi {
include '_common', 'cluster', 'nodes', 'indices', 'index', 'search', 'get'
include '_common', 'cluster', 'field_caps', 'nodes', 'indices', 'index', 'search', 'get'
}
}
@@ -0,0 +1,156 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

package org.elasticsearch.index.mapper;

import org.apache.lucene.analysis.CannedTokenStream;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.IndexableFieldType;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.plugins.Plugin;
import org.hamcrest.Matchers;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.instanceOf;

public class MatchOnlyTextFieldMapperTests extends MapperTestCase {

    @Override
    protected Collection<Plugin> getPlugins() {
        return List.of(new MapperExtrasPlugin());
    }

    @Override
    protected Object getSampleValueForDocument() {
        return "value";
    }

    public final void testExists() throws IOException {
        MapperService mapperService = createMapperService(fieldMapping(b -> { minimalMapping(b); }));
        assertExistsQuery(mapperService);
        assertParseMinimalWarnings();
    }

    @Override
    protected void registerParameters(ParameterChecker checker) throws IOException {
        checker.registerUpdateCheck(b -> {
            b.field("meta", Collections.singletonMap("format", "mysql.access"));
        }, m -> assertEquals(Collections.singletonMap("format", "mysql.access"), m.fieldType().meta()));
    }

    @Override
    protected void minimalMapping(XContentBuilder b) throws IOException {
        b.field("type", "match_only_text");
    }

    public void testDefaults() throws IOException {
        DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
        assertEquals(Strings.toString(fieldMapping(this::minimalMapping)), mapper.mappingSource().toString());

        ParsedDocument doc = mapper.parse(source(b -> b.field("field", "1234")));
        IndexableField[] fields = doc.rootDoc().getFields("field");
        assertEquals(1, fields.length);
        assertEquals("1234", fields[0].stringValue());
        IndexableFieldType fieldType = fields[0].fieldType();
        assertThat(fieldType.omitNorms(), equalTo(true));
        assertTrue(fieldType.tokenized());
        assertFalse(fieldType.stored());
        assertThat(fieldType.indexOptions(), equalTo(IndexOptions.DOCS));
        assertThat(fieldType.storeTermVectors(), equalTo(false));
        assertThat(fieldType.storeTermVectorOffsets(), equalTo(false));
        assertThat(fieldType.storeTermVectorPositions(), equalTo(false));
        assertThat(fieldType.storeTermVectorPayloads(), equalTo(false));
        assertEquals(DocValuesType.NONE, fieldType.docValuesType());
    }

    public void testNullConfigValuesFail() throws MapperParsingException {
        Exception e = expectThrows(
            MapperParsingException.class,
            () -> createDocumentMapper(fieldMapping(b -> b.field("type", "match_only_text").field("meta", (String) null)))
        );
        assertThat(e.getMessage(), containsString("[meta] on mapper [field] of type [match_only_text] must not have a [null] value"));
    }

    public void testSimpleMerge() throws IOException {
        XContentBuilder startingMapping = fieldMapping(b -> b.field("type", "match_only_text"));
        MapperService mapperService = createMapperService(startingMapping);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));

        merge(mapperService, startingMapping);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));

        XContentBuilder newField = mapping(b -> {
            b.startObject("field")
                .field("type", "match_only_text")
                .startObject("meta")
                .field("key", "value")
                .endObject()
                .endObject();
            b.startObject("other_field").field("type", "keyword").endObject();
        });
        merge(mapperService, newField);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));
        assertThat(mapperService.documentMapper().mappers().getMapper("other_field"), instanceOf(KeywordFieldMapper.class));
    }

    public void testDisabledSource() throws IOException {
        XContentBuilder mapping = XContentFactory.jsonBuilder().startObject().startObject("_doc");
        {
            mapping.startObject("properties");
            {
                mapping.startObject("foo");
                {
                    mapping.field("type", "match_only_text");
                }
                mapping.endObject();
            }
            mapping.endObject();

            mapping.startObject("_source");
            {
                mapping.field("enabled", false);
            }
            mapping.endObject();
        }
        mapping.endObject().endObject();

        MapperService mapperService = createMapperService(mapping);
        MappedFieldType ft = mapperService.fieldType("foo");
        SearchExecutionContext context = createSearchExecutionContext(mapperService);
        TokenStream ts = new CannedTokenStream(new Token("a", 0, 3), new Token("b", 4, 7));
        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> ft.phraseQuery(ts, 0, true, context));
        assertThat(e.getMessage(), Matchers.containsString("cannot run positional queries since [_source] is disabled"));

        // Term queries are ok
        ft.termQuery("a", context); // no exception
    }

    @Override
    protected Object generateRandomInputValue(MappedFieldType ft) {
        assumeFalse("We don't have a way to assert things here", true);
        return null;
    }

    @Override
    protected void randomFetchTestFieldConfig(XContentBuilder b) throws IOException {
        assumeFalse("We don't have a way to assert things here", true);
    }
}
@@ -29,6 +29,7 @@ public Map<String, Mapper.TypeParser> getMappers() {
mappers.put(RankFeatureFieldMapper.CONTENT_TYPE, RankFeatureFieldMapper.PARSER);
mappers.put(RankFeaturesFieldMapper.CONTENT_TYPE, RankFeaturesFieldMapper.PARSER);
mappers.put(SearchAsYouTypeFieldMapper.CONTENT_TYPE, SearchAsYouTypeFieldMapper.PARSER);
mappers.put(MatchOnlyTextFieldMapper.CONTENT_TYPE, MatchOnlyTextFieldMapper.PARSER);
return Collections.unmodifiableMap(mappers);
}
