Define valid index, type, field, id, routing values #6736

Closed
clintongormley opened this issue Jul 4, 2014 · 16 comments

@clintongormley

Currently we have no specification of allowed values for index names, type names, IDs, field names or routing values.

This issue is an attempt to document and improve the existing specs to prevent inconsistencies.

Index names

Index names are limited by the file system. They may only be lower case, and may not start with an underscore. While we don't prevent index names starting with a ., we reserve those for internal use. Clearly, . and .. cannot be used.

These characters are already illegal: \, /, *, ?, ", <, >, |, space and , (comma). We should also add the null byte.

There are other filenames which are illegal in Windows, but we probably don't need to check for those.
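A minimal Python sketch of the index-name rules above, for illustration only: validate_index_name and ILLEGAL_INDEX_CHARS are hypothetical names, not Elasticsearch code. It reads the illegal-character list as \, /, *, ?, ", <, >, |, space and comma, and adds the proposed null byte. Names starting with . are accepted, matching the current behaviour of reserving rather than rejecting them.

```python
# Hypothetical sketch of the index-name rules; not Elasticsearch's implementation.
# Characters treated as illegal: \ / * ? " < > | space comma, plus the proposed null byte.
ILLEGAL_INDEX_CHARS = frozenset('\\/*?"<>| ,\0')


def validate_index_name(name: str) -> None:
    """Raise ValueError if `name` breaks one of the index-name rules."""
    if name in (".", ".."):
        raise ValueError(f"{name!r} is not a valid index name")
    if name != name.lower():
        raise ValueError(f"{name!r} must be lower case")
    if name.startswith("_"):
        raise ValueError(f"{name!r} must not start with an underscore")
    bad = sorted(set(name) & ILLEGAL_INDEX_CHARS)
    if bad:
        raise ValueError(f"{name!r} contains illegal characters: {bad!r}")
    # Names starting with "." pass: they are reserved for internal use, not rejected.


validate_index_name("logs-2014-07")  # passes silently
```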

Type names

Type names can contain any character (except null bytes, which we currently don't check for) but may not start with an underscore.

IDs

IDs can contain any character (except null bytes, which we currently don't check for). IDs should not begin with an underscore.

Currently IDs are not checked for a leading underscore, so IDs beginning with an underscore may already exist. These can clash with endpoints such as _mapping and so should be prevented. This is a backwards incompatible change.
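The type-name and ID rules reduce to two checks. The sketch below is hypothetical (validate_type_or_id is not an Elasticsearch function) and includes the proposed leading-underscore check for IDs, which is the backwards incompatible part.

```python
# Hypothetical sketch: the proposed rules for type names and IDs.
def validate_type_or_id(value: str, kind: str = "id") -> None:
    """Raise ValueError if `value` is not an acceptable type name or ID."""
    # Any character is allowed except the null byte (not checked today).
    if "\0" in value:
        raise ValueError(f"{kind} {value!r} must not contain a null byte")
    # No leading underscore: type names already follow this rule; for IDs it
    # is the proposed, backwards incompatible addition.
    if value.startswith("_"):
        raise ValueError(f"{kind} {value!r} must not start with an underscore")


validate_type_or_id("my_doc-1")       # fine: underscores are allowed after the first character
validate_type_or_id("tweet", "type")  # fine
```

With this check in place, an ID such as _mapping is rejected up front instead of silently colliding with GET index/type/_mapping.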

Routing & Parent

Routing and parent values should follow the same rules as IDs, i.e. any characters except the null byte. The problem is that multiple routing values are passed in the query string as comma-separated values, e.g. ?routing=foo,bar.

If a single routing value contains a comma, it will be misinterpreted as two routing values. One idea is to pass multiple routing values as repeated parameters, e.g. ?routing=foo&routing=bar,baz. Unfortunately, this is not backwards compatible and isn't supported by a number of client libraries.

The only solution I can think of is to support escaping of commas, e.g. foo\,bar. This would mean that \ would need to be escaped as well, i.e. foo\bar -> foo\\bar. Support for this escaping would need to be added to Elasticsearch and to the client libraries.
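The escaping scheme proposed above could look roughly like this on the client side. join_routing and split_routing are hypothetical helper names; the sketch escapes \ before , so that foo\bar becomes foo\\bar and a literal comma becomes \,.

```python
# Hypothetical client-side helpers for the proposed routing escapes:
# "\" is written as "\\" and a literal "," as "\,".
def join_routing(values: list[str]) -> str:
    """Encode routing values as one comma-separated query-string value."""
    # Escape backslashes first so the escapes added for commas stay intact.
    return ",".join(v.replace("\\", "\\\\").replace(",", "\\,") for v in values)


def split_routing(param: str) -> list[str]:
    """Split on unescaped commas and undo the escapes.

    A backslash makes the next character literal; a trailing lone backslash
    is an error. An empty parameter decodes to a single empty value.
    """
    values: list[str] = []
    current: list[str] = []
    chars = iter(param)
    for c in chars:
        if c == "\\":
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError(f"dangling escape in routing {param!r}")
            current.append(nxt)
        elif c == ",":
            values.append("".join(current))
            current = []
        else:
            current.append(c)
    values.append("".join(current))
    return values


print(join_routing(["foo,bar", "baz"]))  # foo\,bar,baz
print(split_routing("foo\\,bar,baz"))    # ['foo,bar', 'baz']
```

Values without commas or backslashes encode exactly as they do today, so ?routing=foo,bar keeps its meaning; only values that already contain a backslash would be read differently, which is the compatibility cost of this approach.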

@colings86
Contributor

We should ensure that if any global decisions are made regarding naming, aggregation names are also included.

@bleskes
Contributor

bleskes commented Jul 4, 2014

One more relevant point is that some of our endpoints mask what would otherwise be a valid get-document-by-ID REST request (according to the above spec). For example, GET index/type/_mapping masks a document whose ID is _mapping. IMHO this is not a problem, but we should mention it for completeness.

@clintongormley
Author

@bleskes I think it is a problem, as there is no workaround. I've added this sentence to the original issue: "IDs should not begin with an underscore."

@clintongormley
Author

Adding field names to the specs (see #5972). Field names should not begin with an underscore, or contain . or null bytes.

If a fieldname contains . when creating a mapping, we have two choices:

  • throw an error
  • convert it to an object, e.g. foo.bar: 5 -> { foo: { bar: 5 }}

Throwing an error seems a more transparent way of dealing with this.
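The two options for a dotted field name can be sketched as follows. check_field_name implements option 1 (throw an error) together with the underscore and null-byte rules; expand_dotted implements option 2. Both are hypothetical helpers, and the way expand_dotted handles clashes (merging objects, rejecting anything else) is an assumption the comment above does not specify.

```python
# Hypothetical sketch of the two options for dotted field names.


def check_field_name(name: str) -> None:
    """Option 1: reject names that start with "_" or contain "." or a null byte."""
    if name.startswith("_"):
        raise ValueError(f"field {name!r} must not start with an underscore")
    if "\0" in name:
        raise ValueError(f"field {name!r} must not contain a null byte")
    if "." in name:
        raise ValueError(f"field {name!r} must not contain '.'")


def _merge(node: dict, key: str, value) -> None:
    """Insert value at node[key]; merge objects, reject any other clash."""
    if key not in node:
        node[key] = value
    elif isinstance(node[key], dict) and isinstance(value, dict):
        for sub_key, sub_value in value.items():
            _merge(node[key], sub_key, sub_value)
    else:
        raise ValueError(f"conflicting values for field {key!r}")


def expand_dotted(doc: dict) -> dict:
    """Option 2: turn {"foo.bar": 5} into {"foo": {"bar": 5}}, recursively."""
    out: dict = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            value = expand_dotted(value)
        parts = key.split(".")
        if "" in parts:
            raise ValueError(f"empty path segment in field {key!r}")
        # Wrap the value from the innermost segment outwards: a.b.c -> {"b": {"c": v}}.
        for part in reversed(parts[1:]):
            value = {part: value}
        _merge(out, parts[0], value)
    return out


print(expand_dotted({"foo.bar": 5}))  # {'foo': {'bar': 5}}
```

The clash case, e.g. {"foo": 1, "foo.bar": 2}, shows why an error is the more transparent choice: option 2 has to invent a rule for it.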

@danielcweeks

Having a dot in the field name is actually very useful. Would it be possible to use an escape for referencing a field name instead of path?

@clintongormley
Copy link
Author

@dcw-netflix An escape? Do you mean foo\.bar? Yes, we can probably support that.

@danielcweeks

Yes, that would be perfect. The reason is that there are a lot of use cases where property/config files get indexed, which results in many dot separated keys.
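One way to support the foo\.bar escape is to treat \. as a literal dot and \\ as a literal backslash when splitting a field path. escape_field_name and split_field_path are hypothetical helpers, sketched here with regular expressions under those two assumptions.

```python
import re

# Hypothetical escape syntax: "\." is a literal dot, "\\" a literal backslash.
_TOKEN = re.compile(
    r"\\(.)"        # group 1: escaped character
    r"|(\\\Z)"      # group 2: lone backslash at the end (an error)
    r"|(\.)"        # group 3: unescaped dot, i.e. a path separator
    r"|([^\\.]+)",  # group 4: run of ordinary characters
    re.DOTALL,
)


def escape_field_name(name: str) -> str:
    """Escape backslashes and dots so `name` is read as one path segment."""
    return re.sub(r"([\\.])", r"\\\1", name)


def split_field_path(path: str) -> list[str]:
    """Split a field path on unescaped dots, undoing the escapes."""
    parts: list[str] = []
    current = ""
    for escaped, dangling, dot, literal in _TOKEN.findall(path):
        if dangling:
            raise ValueError(f"dangling escape in field path {path!r}")
        if dot:
            parts.append(current)
            current = ""
        else:
            current += escaped or literal
    parts.append(current)
    return parts


print(split_field_path(r"foo\.bar.baz"))  # ['foo.bar', 'baz']
```

For indexed property files, a key such as log.level would be referenced as log\.level, and split_field_path returns it as a single field name rather than the path log -> level.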

@clintongormley
Author

Colons in index names are also invalid. See #7148

@uboness
Contributor

uboness commented Aug 13, 2014

This is a great start for having formal input validation rules in elasticsearch. I believe we need to centralize all these rules in one place. I also think we should have validation rules for every input in es (not just those listed above), for example field names, repository names, snapshot names, etc. Basically everything that in one way or another can compromise the consistent state of the cluster.

We currently have a lot of this logic (probably incomplete) scattered in different places. It's definitely time to formalize it (both in docs and code).

@dadoonet
Member

Note that for repository names, we also need to delegate the validation to plugins as there can be other rules with some cloud providers (azure for example). See also #7096
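A rough sketch of the centralized, pluggable validation suggested in the last two comments. NameValidators, the rule functions and the 63-character limit are all hypothetical; the limit only stands in for whatever a cloud-repository plugin might need and is not Azure's actual rule.

```python
from typing import Callable, Dict, List

# A rule raises ValueError when a name is invalid.
Rule = Callable[[str], None]


class NameValidators:
    """Hypothetical central registry of naming rules, keyed by kind of name."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def register(self, kind: str, rule: Rule) -> None:
        """Add a rule for `kind`; core code and plugins both call this."""
        self._rules.setdefault(kind, []).append(rule)

    def validate(self, kind: str, name: str) -> None:
        """Run every rule for `kind`; unknown kinds fail closed."""
        rules = self._rules.get(kind)
        if not rules:
            raise KeyError(f"no naming rules registered for {kind!r}")
        for rule in rules:
            rule(name)


def no_null_byte(name: str) -> None:
    if "\0" in name:
        raise ValueError(f"{name!r} contains a null byte")


def no_leading_underscore(name: str) -> None:
    if name.startswith("_"):
        raise ValueError(f"{name!r} must not start with an underscore")


def example_plugin_max_length(name: str) -> None:
    # Hypothetical stand-in for a cloud-provider rule; not Azure's real limit.
    if len(name) > 63:
        raise ValueError(f"{name!r} is longer than 63 characters")


validators = NameValidators()
for kind in ("type", "id", "field"):
    validators.register(kind, no_null_byte)
    validators.register(kind, no_leading_underscore)
validators.register("repository", no_null_byte)
validators.register("repository", example_plugin_max_length)  # contributed by a plugin

validators.validate("id", "tweet-1")          # passes
validators.validate("repository", "backups")  # passes
```

validate fails closed on an unregistered kind, so a new kind of name cannot silently go unchecked.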

@ron-totango

We have an issue with routing values that contain a comma. Is there any workaround we should use? Thanks

@rpedela

rpedela commented Oct 30, 2014

A common use case for ES (and my use case) is to index a DB table which may have column names that start with an underscore. Renaming the columns is also not an option in my case. Currently this requires storing a mapping between DB column names and ES field names, which adds complexity.

Is it possible to escape an underscore in a field name? Or more generally is it possible to escape any special character in a field name? A more general escaping solution would be optimal in my opinion because then a field name could have any arbitrary characters just like a quoted SQL identifier.

@clintongormley
Author

Closing in favour of #9059

@mcayland

mcayland commented Dec 7, 2015

I've just come across this problem in the past week with ES 2.1 whilst trying to create documents with "."s in the field names. Am I correct that even the field-name escaping isn't included in ES 2.1? This is sadly a showstopper for our application, as the field names we use are equipment serial codes and we've recently added a supplier whose serial codes include "."s.

@clintongormley
Author

@mcayland using serial numbers for field names is a bad design choice as you will end up with sparse fields, and much more disk usage than you actually need.

@roisin-jin

roisin-jin commented Jun 9, 2016

Hi, I'm wondering whether any wildcard characters other than the star (*) are allowed in template names. We have several indexes named with the same pattern, e.g. ap-YYYY-MM, bg-YYYY-MM, cm-YYYY-MM, etc. They all have the same mapping; we just want to separate the data into different indexes. Is there any way to create a single template with an index name pattern like '??-*'?

10 participants