Dynamic template with match_mapping_type ignored when there's a match "*" after it #2401

Closed
andrewclegg opened this issue Nov 12, 2012 · 13 comments
Assignees
Labels
>bug good first issue low hanging fruit help wanted adoptme :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@andrewclegg

I have the following dynamic mappings set up in default-mapping.json:

{
    "_default_": {
        "_source": {
            "enabled": false
        },
        "_all": {
            "enabled": false
        },
        "dynamic_templates": [
            {
                "strings": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "multi_field",
                        "fields": {
                            "{name}": {
                                "type": "string",
                                "index": "not_analyzed",
                                "omit_norms": true,
                                "omit_term_freq_and_positions": true
                            },
                            "lower": {
                                "type": "string",
                                "index": "analyzed",
                                "analyzer": "lowercase",
                                "omit_norms": true,
                                "omit_term_freq_and_positions": true
                            }
                        }
                    }
                }
            },
            {
                "everything": {
                    "match": "*",
                    "mapping": {
                        "omit_norms": true,
                        "omit_term_freq_and_positions": true
                    }
                }
            }
        ]
    }
}

If I understand this page correctly (at the very bottom):

http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html

the first mapping that matches will be applied -- i.e. "everything" should only be applied to fields that don't match "strings".

However, this isn't what I see (in 0.19.8 anyway) -- no matter which order I put the mappings in, all fields have "everything" applied, including string fields.

(By the way, "lowercase" is a simple custom analyzer with a keyword tokenizer and lowercase token filter.)

@jprante
Contributor

jprante commented Nov 12, 2012

Unfortunately, it seems the dynamic template list in org.elasticsearch.index.mapper.object.RootObjectMapper is derived from a map, fetched as Map<String,Object> from the given JSON source. So the keys in the map are not guaranteed to be ordered sequentially. I guess, internally, the entry "everything" is ordered before the entry "strings".

My suggestion is to add a positional attribute ("position") to the dynamic_templates entries so ES can order them more reliably according to the user's preference. Patch wanted?

@jprante
Contributor

jprante commented Nov 12, 2012

Another cause may be that equals() and hashcode() methods in org.elasticsearch.index.mapper.object.DynamicTemplate do not work as expected.

@andrewclegg
Author

It occurs to me that my original example is a bit bogus, since omit_norms and omit_term_freq_and_positions are only valid for string types anyway. But the general point still stands...

@kimchy
Member

kimchy commented Nov 13, 2012

Hi, this one is tricky... The order of the dynamic templates is actually properly maintained, so that's not the problem (the array in the dynamic templates denotes the order, and we respect that).

The problem is with how we resolve dynamic templates, specifically with match_mapping_type. When we encounter a string value, we first try to match a dynamic template by name, without the type. Then, if nothing matches, we try to guess the type of the string value (it can be a date, an attachment, or a number if numeric auto-detection is turned on, or something like that). Only if it's not of any specialized non-string type do we try to match a dynamic template on both the name and the string type.

The reason for this behavior actually comes down to JSON and binary values. Because binary values in JSON are strings, trying to auto-detect a date, for example, by converting the value to a string ends up screwing up the parser's internal binary value (I need to check if that's still the case). So we first need to try to match on name without actually knowing the type, and then match on the type...

What you see happens because the initial match on name (without type) ends up matching the catch-all "everything" template, which is then used.
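
The lookup order described above can be sketched in Python. This is a simplified model under stated assumptions, not Elasticsearch's actual code: the function names are hypothetical, `fnmatchcase` stands in for the internal wildcard matcher, and the type-detection step between the two passes is left out.

```python
from fnmatch import fnmatchcase


def template_matches(template, name, dynamic_type):
    """Simplified model of the template check (names are hypothetical)."""
    if not fnmatchcase(name, template.get("match", "*")):
        return False
    wanted = template.get("match_mapping_type")
    if wanted is not None:
        # A typed template cannot match while the type is still unknown.
        if dynamic_type is None:
            return False
        if not fnmatchcase(dynamic_type, wanted):
            return False
    # An untyped template matches on name alone, even with no known type.
    return True


def find_template(templates, name, dynamic_type):
    """Return the name of the first template that matches, in array order."""
    for template_name, template in templates:
        if template_matches(template, name, dynamic_type):
            return template_name
    return None


def resolve_string_field(templates, name):
    # Pass 1: match on the field name only; the type is not known yet.
    hit = find_template(templates, name, None)
    if hit is not None:
        return hit
    # (Date/number detection would happen here; omitted in this sketch.)
    # Pass 2: match on the field name and the "string" type.
    return find_template(templates, name, "string")


TEMPLATES = [
    ("strings", {"match": "*", "match_mapping_type": "string"}),
    ("everything", {"match": "*"}),
]

print(resolve_string_field(TEMPLATES, "title"))  # everything
```

Even though "strings" comes first in the array, the name-only pass skips it and lands on the untyped catch-all, which is the behavior reported in this issue.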

This one requires some thinking, not an easy one to solve...

@haizaar

haizaar commented Aug 8, 2013

Is there any plan to fix this? It still happens on recent Elasticsearch versions. Is there any other way to provide a specific dynamic mapping template for strings and another template for all other types?

@InfinitiesLoop

I've just run into this as well. I want strings within a subpath to be analyzed with a specific analyzer, and for all other types in the same subpath to be not_analyzed (but with some other changes for which I need a mapping defined -- for example, a set index_name). It seems because of this behavior I may need to put my strings into a different subpath. I was hoping to avoid making structure choices in my documents just so I can map it correctly.

@clintongormley

Wondering if an unmatch_mapping_type will help here? Possibly combined with rules without match_mapping_type being placed below rules with a specified (or wildcard) match_mapping_type?

@ppf2
Member

ppf2 commented Mar 23, 2015

Recently came across this. The following is the use case:

https://gist.github.com/ppf2/6da223f9517ddc0e9465

In this case, what appears to work (Test 2 in the gist) is if I add "match_mapping_type": "*" in addition to "match": "*" in the dynamic template mapping for the default/everything fields.

@yanjunh
Contributor

yanjunh commented Mar 26, 2015

This is a sweet workaround. It appears to work for me. Thanks.

@clintongormley clintongormley added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Apr 5, 2015
@clintongormley

Given @ppf2's workaround in #2401 (comment), it seems that we just need to default match_mapping_type to *?

@clintongormley clintongormley added the good first issue low hanging fruit label Apr 5, 2015
@erikringsmuth
Contributor

+1

The workaround of adding "match_mapping_type": "*" to all fields works in the meantime.

PUT /_template/log_template
{
  "template": "log*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "timestamp": {
            "match": "@timestamp",
            "match_mapping_type": "*",
            "mapping": {
              "type": "date",
              "index": "not_analyzed",
              "doc_values": true
            }
          }
        },
        {
          "string_multifield": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "doc_values": true
                }
              }
            }
          }
        },
        {
          "catch_all": {
            "match": "*",
            "match_mapping_type": "*",
            "mapping": {
              "index": "not_analyzed",
              "doc_values": true
            }
          }
        }
      ]
    }
  }
}
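
The workaround above can also be applied mechanically before sending a mapping to the cluster. The following is a minimal sketch that only manipulates a plain Python dict; the helper name `add_wildcard_mapping_type` is hypothetical and nothing here calls Elasticsearch.

```python
import copy


def add_wildcard_mapping_type(mappings):
    """Return a copy of `mappings` in which every dynamic template that has
    no match_mapping_type gets "match_mapping_type": "*" (hypothetical helper)."""
    patched = copy.deepcopy(mappings)
    for type_mapping in patched.values():
        for entry in type_mapping.get("dynamic_templates", []):
            # Each array entry is a single-key object: {template_name: body}.
            for body in entry.values():
                # setdefault leaves templates with an explicit type untouched.
                body.setdefault("match_mapping_type", "*")
    return patched


mappings = {
    "_default_": {
        "dynamic_templates": [
            {"strings": {"match": "*", "match_mapping_type": "string",
                         "mapping": {"type": "string"}}},
            {"everything": {"match": "*", "mapping": {"doc_values": True}}},
        ]
    }
}

patched = add_wildcard_mapping_type(mappings)
print(patched["_default_"]["dynamic_templates"][1]["everything"]["match_mapping_type"])  # *
```

The input dict is left unmodified, so the original and patched mappings can be compared before either one is used.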

@clintongormley

@kimchy can you expand on what you mean here:

The reason for this behavior actually comes down to JSON and binary values. Because binary values in JSON are strings, trying to auto-detect a date, for example, by converting the value to a string ends up screwing up the parser's internal binary value (I need to check if that's still the case). So we first need to try to match on name without actually knowing the type, and then match on the type...

This patch:

         if (unmatch != null && patternMatch(unmatch, name)) {
             return false;
         }
-        if (matchMappingType != null) {
-            if (dynamicType == null) {
-                return false;
-            }
-            if (!patternMatch(matchMappingType, dynamicType)) {
-                return false;
-            }
+        if (dynamicType == null) {
+            return false;
+        }
+        if (matchMappingType != null && !patternMatch(matchMappingType, dynamicType)) {
+            return false;
         }
         return true;
     }

seems to work fine with binary strings, eg:

DELETE test

PUT test
{
  "mappings": {
    "_default_": {
      "_source": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "dates": {
            "match": "*",
            "match_mapping_type": "date",
            "mapping": {
              "type": "date",
              "format": "YYYY-mm-dd"
            }
          }
        },
        {
          "everything": {
            "match": "*",
            "mapping": {
              "type": "binary",
              "store": true
            }
          }
        }
      ]
    }
  }
}

PUT test/test/1
{
  "binary": "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVoB",
  "date": "2014-01-01"
}

GET /test/test/_mapping

returns:

        "_source": {
           "enabled": false
        },
        "properties": {
           "binary": {
              "type": "binary",
              "store": true
           },
           "date": {
              "type": "date",
              "format": "YYYY-mm-dd"
           }
        }

and

GET test/test/_search?fields=*

returns:

        "fields": {
           "binary": [
              "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVoB"
           ]
        }

And to update the original example, this seems to work correctly:

DELETE test

PUT test
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        },
        {
          "everything": {
            "match": "*",
            "mapping": {
              "type": "{dynamic_type}",
              "doc_values": true
            }
          }
        }
      ]
    }
  }
}

PUT test/test/1
{
  "string": "bar",
  "bool": true,
  "date": "2014-01-01",
  "int": 5
}

GET test/test/_mapping

returns:

        "properties": {
           "bool": {
              "type": "boolean",
              "doc_values": true
           },
           "date": {
              "type": "date",
              "doc_values": true,
              "format": "dateOptionalTime"
           },
           "int": {
              "type": "long",
              "doc_values": true
           },
           "string": {
              "type": "string",
              "fields": {
                 "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                 }
              }
           }
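
As a rough cross-check of the result above, the patched condition can be modeled in Python. This is a sketch under assumptions, not the real implementation: the function names are hypothetical, `fnmatchcase` stands in for `patternMatch`, the handling of `match` is assumed (the diff only shows the `unmatch` and type checks), and the detected type names are taken from the returned mapping.

```python
from fnmatch import fnmatchcase


def matches_patched(template, name, dynamic_type):
    """Model of the patched check: an unknown type never matches."""
    # Assumed: the name pattern is checked first (not shown in the diff).
    if not fnmatchcase(name, template.get("match", "*")):
        return False
    unmatch = template.get("unmatch")
    if unmatch is not None and fnmatchcase(name, unmatch):
        return False
    # The patch moves this check out of the match_mapping_type branch.
    if dynamic_type is None:
        return False
    wanted = template.get("match_mapping_type")
    if wanted is not None and not fnmatchcase(dynamic_type, wanted):
        return False
    return True


def first_match(templates, name, dynamic_type):
    """Return the name of the first matching template, in array order."""
    for template_name, template in templates:
        if matches_patched(template, name, dynamic_type):
            return template_name
    return None


TEMPLATES = [
    ("strings", {"match": "*", "match_mapping_type": "string"}),
    ("everything", {"match": "*"}),
]

# Field names and detected types from the example document above.
FIELDS = [("string", "string"), ("bool", "boolean"), ("date", "date"), ("int", "long")]

chosen = {name: first_match(TEMPLATES, name, dtype) for name, dtype in FIELDS}
print(chosen)
# {'string': 'strings', 'bool': 'everything', 'date': 'everything', 'int': 'everything'}
print(first_match(TEMPLATES, "string", None))  # None
```

With the type-unknown pass no longer matching anything, the untyped catch-all is only considered once the type is known, so array order decides between "strings" and "everything".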

@clintongormley clintongormley self-assigned this Apr 10, 2015
Alex-Ikanow added a commit to IKANOW/Aleph2-contrib that referenced this issue Aug 17, 2015
there's a "bug" in ES
(elastic/elasticsearch#2401) that not doing so
means that gets tested first
@clintongormley

Closed by #18638

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024