Mappings: disallow exotic options on meta fields #8143

Closed
jpountz opened this issue Oct 17, 2014 · 10 comments
Labels
>breaking, :Search Foundations/Mapping, Team:Search Foundations, v2.0.0-beta1

Comments

@jpountz
Contributor

jpountz commented Oct 17, 2014

We have some mapping options that sound interesting but are actually almost useless or dangerous. I propose to remove them:

_type: { index: no }

Not indexing _type sounds appealing: the documentation says everything will keep working, so it should just save space. However, Elasticsearch will then internally run a prefix query on the _uid field instead, which is going to be super slow. I think we should remove this option.
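To illustrate why the prefix query hurts, here is a toy Python sketch (not Elasticsearch or Lucene code) contrasting a single term lookup on an indexed _type with a prefix query over _uid terms. It assumes _uid terms have the form `type#id`, as in Elasticsearch 1.x; the index contents and function names are hypothetical.

```python
import bisect

# Toy inverted index: term -> sorted doc ids (a "postings list").
# Hypothetical data; _uid terms are modeled as "type#id" (assumed format).
uid_postings = {
    "blog#1": [0],
    "blog#2": [2],
    "tweet#1": [1],
    "tweet#7": [3],
}
type_postings = {
    "blog": [0, 2],
    "tweet": [1, 3],
}


def term_query(postings, term):
    """Indexed _type: one dictionary lookup returns the whole postings list."""
    return postings.get(term, [])


def prefix_query(postings, prefix):
    """No indexed _type: enumerate every _uid term starting with `prefix`
    and union their postings.

    Terms are sorted (like a terms dictionary), so we can seek straight to
    the first candidate, but we still visit one term per matching document.
    Returns (matching doc ids, number of terms visited).
    """
    terms = sorted(postings)
    start = bisect.bisect_left(terms, prefix)
    docs = []
    visited = 0
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        visited += 1
        docs.extend(postings[term])
    return sorted(docs), visited
```

Both `term_query(type_postings, "tweet")` and `prefix_query(uid_postings, "tweet#")` find docs `[1, 3]`, but the prefix version touches one term per tweet, so its cost grows with the number of documents of that type rather than staying at a single lookup.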

_type: { store: yes } and _id: { store: yes }

Storing _type and _id is useless since we already force _uid to be stored, and _uid contains this information.
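A minimal sketch of why a stored _uid makes stored _type and _id redundant: both can be recovered from it. This assumes the Elasticsearch 1.x `type#id` layout and that type names cannot contain `#` while ids may; `create_uid` and `split_uid` are hypothetical helper names, not Elasticsearch APIs.

```python
# Assumption: _uid values are "type#id" strings (the Elasticsearch 1.x layout).
UID_DELIMITER = "#"


def create_uid(doc_type: str, doc_id: str) -> str:
    """Compose a _uid from a type and an id."""
    return f"{doc_type}{UID_DELIMITER}{doc_id}"


def split_uid(uid: str) -> tuple:
    """Recover (type, id) from a _uid.

    Split on the first delimiter only: type names are assumed not to
    contain '#', but ids may, so everything after the first '#' is the id.
    """
    doc_type, sep, doc_id = uid.partition(UID_DELIMITER)
    if not sep:
        raise ValueError(f"not a _uid: {uid!r}")
    return doc_type, doc_id
```

For example, `split_uid("blog#a#b")` yields `("blog", "a#b")`, which is also how an id portion can be extracted from _uid when only the id is needed.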

_id: { index: not_analyzed }

The _id field is the same for all documents so we should not need to index, store or doc-value it. (Can be done now thanks to #6073 and #7965)

In general I'm wondering if we shouldn't go further and completely lock down how data is indexed/stored/docvalued for meta fields. Only high-level configuration options would remain, such as enabled on the _timestamp mapper or type on _parent.

Relates to #8870

@clintongormley

+1. I get the feeling a number of these options were added just in case the current settings didn't work out so well, but I think we can safely declare them battle-tested now.

@rjernst
Member

rjernst commented Oct 18, 2014

> In general I'm wondering if we shouldn't go further and completely lock down how data is indexed/stored/docvalued for meta fields.

+1

@jpountz
Contributor Author

jpountz commented Feb 20, 2015

Removed the discuss label, let's do it!

@rjernst
Member

rjernst commented Feb 23, 2015

I started work on this, but realized the branch was becoming massive with all the test fixes for different meta fields. I'm now attempting to have a separate PR for each meta field. The first one is simple, just _uid: #9836

rjernst added a commit to rjernst/elasticsearch that referenced this issue Feb 23, 2015
Also, cleanup writePre20Settings so it is shared across all field
mappers.

see elastic#8143
rjernst added a commit to rjernst/elasticsearch that referenced this issue Feb 24, 2015
There are two implications to this change.
First, percolator now uses _uid internally, extracting the id portion
when needed. Second, sorting on _id is no longer possible, since you
can no longer index _id. However, _uid can still be used to sort, and
is better anyways as indexing _id just to make it available to
fielddata for sorting is wasteful.

see elastic#8143
closes elastic#9842
rjernst added a commit to rjernst/elasticsearch that referenced this issue Feb 25, 2015
rjernst added a commit to rjernst/elasticsearch that referenced this issue Feb 25, 2015
rjernst added a commit to rjernst/elasticsearch that referenced this issue Feb 27, 2015
This also changes the stored setting for _size to true (for
indexes created in 2.x).

see elastic#8143
closes elastic#9913
rjernst added a commit to rjernst/elasticsearch that referenced this issue Feb 27, 2015
While the parser allowed changing field type settings, these would never
have been serialized.  So this change simply removes parsing using
parseField. Backcompat will still work if a user uploads old settings
(they just would never have worked anyways, so we continue ignoring
them with 1.x, and 2.x will now error).

see elastic#8143
closes elastic#9914
@rjernst
Member

rjernst commented Feb 27, 2015

_timestamp and _all are the only meta fields left whose field type can be modified. I'm going to close this issue: _timestamp has its own discussion going in #9679, and for _all I'm not sure what we can do to limit it down (more discussion is probably needed). We probably still want to lock out some settings or improve error messages there (right now, if you try to set _all to not indexed, it just silently indexes it anyway...).

@rjernst rjernst closed this as completed Feb 27, 2015
@jpountz
Contributor Author

jpountz commented Feb 27, 2015

Thanks @rjernst, these changes are already awesome!

rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 30, 2015
Meta fields were locked down to not allow exotic options to the
underlying field types in elastic#8143. This change fixes the docs
to no longer refer to the old settings.

closes elastic#10879
rjernst added a commit to rjernst/elasticsearch that referenced this issue May 7, 2015
Meta fields were locked down to not allow exotic options to the
underlying field types in elastic#8143. This change fixes the docs
to no longer refer to the old settings.

closes elastic#10879
rjernst added a commit to rjernst/elasticsearch that referenced this issue Jun 22, 2015
This is a follow up to elastic#8143 and elastic#6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0.  Users who need to
adjust these settings can use a date field.

@djschny
Contributor

djschny commented Sep 9, 2015

Regarding disabling _type indexing: I've used this historically when my documents already had a field with the same value as _type, so indexing _type was unnecessary since I would craft queries to filter on that field instead. For folks with billions of very small documents, my understanding was that not indexing _type helped, because the _type data makes up a larger fraction of each document's overall size.

@rjernst
Member

rjernst commented Sep 9, 2015

@djschny if there is a field that duplicates a meta field, why can't the user just not send that field? _type is essentially a virtual field on _uid, so it is actually not using any more memory than would be used normally.

@djschny
Contributor

djschny commented Sep 10, 2015

@rjernst Sure, a user could omit that field, but that may not be ideal for them, since they want their document to contain that field explicitly when pulling data back out of ES. I understand that _type is essentially a virtual field on _uid, but my understanding is that unless it is indexed separately, queries to find all docs of a particular type use a prefix query on _uid, which is not as efficient as a filter.

I bring this up because there are valid use cases for configuring _type to not be indexed, and it's not an "exotic" use case. I think it's very important to draw a distinction between a configuration that is invalid or unnecessary (like "store" on _type) and one that has a practical use case.

@rjernst
Member

rjernst commented Sep 10, 2015

I said "essentially" because it wasn't quite true. _type is indexed (the virtual field over _uid is for returning the type as a stored field), but that is the cost of having the type system. Having types not indexed is not a valid use case: indexing _type is necessary for the basic operation of elasticsearch (doing a search like GET /myindex/type1,type2,type3/_search).

If a user doesn't want to pay that cost, they can avoid using types. By that I mean: send all their documents with the same ES type (the type will still be indexed, but the posting list will be highly compressed because all docs will have the same value). Then they can add their own type field (note: no underscore) and do with it as they please.

javanna added the Team:Search Foundations label Jul 16, 2024
5 participants