Strange problem with searching across multiple types of one index. #2218

Closed
unikoid opened this issue Aug 30, 2012 · 15 comments

@unikoid

unikoid commented Aug 30, 2012

I have two Elasticsearch indexes with many types in them, and I'm seeing some strange behaviour.
My index "fts" has the following types: 'category', 'product', 'blog_entry', 'comment', 'forum'.
I am searching for a keyword, and I know that some documents of the 'comment' type contain that keyword or phrase.
If I send a request like:
curl -XGET "http://127.0.0.1:9200/fts/comment/_search" -d "{my_query}"
OR
curl -XGET "http://127.0.0.1:9200/fts/category,comment/_search" -d "{my_query}"
I receive the expected documents.

But if I do:
curl -XGET "http://127.0.0.1:9200/fts/category,product,comment/_search" -d "{my_query}"
OR
curl -XGET "http://127.0.0.1:9200/fts/_all/_search" -d "{my_query}"
OR EVEN
curl -XGET "http://127.0.0.1:9200/fts/_search" -d "{my_query}"
I receive: {"took":32,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

This problem appeared recently and suddenly, while I was using Elasticsearch 0.19.2. I upgraded to 0.19.9, but it didn't help.
Also, the behaviour occurs on our production server, but on my local system everything works as expected.
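
To narrow down which type (or which ordering of types) makes the result set go empty, one option is to enumerate the type combinations and run the same query against each. The sketch below only builds the request paths; the helper name `search_paths` is hypothetical, and the index and type names are the ones listed above.

```python
from itertools import permutations

# Types of the "fts" index, as listed in the report above.
TYPES = ["category", "product", "blog_entry", "comment", "forum"]

def search_paths(index, types, size=2):
    """Return one _search path per ordered combination of `size` types."""
    return ["/{}/{}/_search".format(index, ",".join(combo))
            for combo in permutations(types, size)]

paths = search_paths("fts", TYPES)
print(len(paths))   # 20 ordered pairs of 5 types
print(paths[0])     # /fts/category,product/_search
```

Sending the same query body to each path and comparing `hits.total` would show whether a specific type, or a specific order, triggers the empty result.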

@unikoid
Author

unikoid commented Aug 31, 2012

A minute ago I tried deleting the index completely and reindexing all my data from the db. It helped, but I'm afraid the issue may appear again soon.

@kimchy
Member

kimchy commented Sep 5, 2012

If you can get a recreation, it would help us try to solve this. What you are doing should work, i.e. searching across all types. One possible reason is that some types share the same field name but with a different field type. For example, you have an age field in two types, where in one it's a string and in the other it's numeric. In this case, without specifying the type, ES will pick one of those to build the query.
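
One way to check for this kind of conflict is to fetch the index mapping (for example with `curl -XGET 'http://127.0.0.1:9200/fts/_mapping'`) and compare field types across types. The following is a minimal sketch, assuming the mapping has already been loaded into a dict shaped like `{type_name: {"properties": {...}}}`. The helpers `flatten` and `find_conflicts` and the sample data are hypothetical, not part of Elasticsearch.

```python
from collections import defaultdict

def flatten(properties, prefix=""):
    """Yield (full_path, field_type) pairs for every leaf field."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            yield from flatten(spec["properties"], path + ".")
        else:
            yield path, spec.get("type", "object")

def find_conflicts(type_mappings):
    """Return {field_path: {type_name: field_type}} for fields mapped differently."""
    seen = defaultdict(dict)
    for type_name, mapping in type_mappings.items():
        for path, field_type in flatten(mapping.get("properties", {})):
            seen[path][type_name] = field_type
    return {path: by_type for path, by_type in seen.items()
            if len(set(by_type.values())) > 1}

# Hypothetical mappings: "age" is a string in one type and numeric in the other.
sample = {
    "person":  {"properties": {"age": {"type": "string"}}},
    "product": {"properties": {"age": {"type": "integer"}}},
}
print(find_conflicts(sample))
# {'age': {'person': 'string', 'product': 'integer'}}
```

This sketch compares only the `type` of each field; other mapping differences would need a similar comparison.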

@unikoid
Author

unikoid commented Sep 7, 2012

No, there are no types that share the same field name with a different field type. I will try to provide a test case, but right now I can't share all our data; it's hundreds of thousands of docs.

@steeve

steeve commented Oct 30, 2012

Same issue here :(

type1,type2 works as expected, but type2,type1 returns no results.

Note that each type has its own analyzer (multilang).

@kimchy
Member

kimchy commented Nov 1, 2012

When you do a "cross" type search, we take the first matching field we find among the types and use its mapping definition to construct the query. There have been requests for a different behavior, using boolean logic between fields of different types, but for now, you can simply search on all types, and have your query be type specific on the same field (i.e. type1.field1, type2.field1).
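
As a rough illustration of that suggestion, the sketch below builds a query string with one type-prefixed clause per type and a URL that searches the whole index. The helper names and the `fts` index and `foo` term are placeholders. A later comment in this thread reports a case where the prefixed form did not behave as hoped, so treat this as a starting point rather than a guaranteed fix.

```python
from urllib.parse import urlencode

def type_specific_query(types, field, term):
    """Build one `type.field:term` clause per type, joined with OR."""
    clauses = ["{}.{}:{}".format(t, field, term) for t in types]
    return " OR ".join(clauses)

def search_url(base, index, query):
    """Build a URI search against every type of `index`."""
    return "{}/{}/_search?{}".format(base, index, urlencode({"q": query}))

q = type_specific_query(["type1", "type2"], "field1", "foo")
print(q)  # type1.field1:foo OR type2.field1:foo
print(search_url("http://127.0.0.1:9200", "fts", q))
```

The term is inserted as-is; input containing query-syntax characters would need escaping first.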

@bclozel

bclozel commented Feb 19, 2013

Got caught by this one.
Had a different mapping configuration on types (with same field name).

Getting different results for type1,type2 and type2,type1 feels wrong, but it makes sense once you figure it out.

Fixing mappings should do the trick.

@jbarata

jbarata commented Mar 7, 2013

Hi,
I think I'm having the same problem.

We have a simple search page (with an input field) where a user can search for anything in our 2 types of docs (client and invoice).
A client doc is like {"name":"john"} and an invoice doc is like {"number":"123", "client":{"name":"john"}}

The output of the search, besides the docs, consists of a grand total and per-type totals.

If the search consists of only the text to be found, e.g. john, then all the docs are shown and all the totals are OK.
If the search is like name:john then the results are NOT OK (it returns only the client doc or only the invoice doc, depending on the order of the types sent to Elasticsearch); the totals are also incorrect.

To reproduce the situation execute the following curls against your ElasticSearch:

Create the index `testindex`
curl -XPUT 'http://127.0.0.1:9602/testindex'

Create a type `client` where docs have a root field `name`
curl -XPUT 'http://127.0.0.1:9602/testindex/client/1' -d '{"name":"john"}'

Create a type `invoice` where docs have a root field `number` and an object `client` with a `name` field
curl -XPUT 'http://127.0.0.1:9602/testindex/invoice/1' -d '{"number":"123", "client":{"name":"john"}}'



Now, if one searches for "name:john" in the `client` type we get the `client` doc
curl -XGET 'http://127.0.0.1:9602/testindex/client/_search?q=name:john'

if the search is made in the `invoice` type we get the `invoice` doc
curl -XGET 'http://127.0.0.1:9602/testindex/invoice/_search?q=name:john'

if the search is made in both types with `client` before `invoice` we get only the `client` doc
curl -XGET 'http://127.0.0.1:9602/testindex/client,invoice/_search?q=name:john'

if we swap the types order (`invoice` before `client`) we get only the `invoice` doc
curl -XGET 'http://127.0.0.1:9602/testindex/invoice,client/_search?q=name:john'

I would expect the searches made on both types to return both docs, but that's not happening.

I guess it may have to do with the mappings.

Is there a way to overcome this problem? (without manually extracting the fields from the user query and building a multi-match query as suggested in https://groups.google.com/forum/#!topic/elasticsearch/-RZtZykZq5o)

Thanks in advance!

@dadoonet
Member

dadoonet commented Mar 7, 2013

@jbarata you can search on all types:

curl -XGET 'http://localhost:9200/testindex/_search?q=john&pretty'

produces:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5945348,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "client",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {"name":"john"}
    }, {
      "_index" : "testindex",
      "_type" : "invoice",
      "_id" : "1",
      "_score" : 0.37158427, "_source" : {"number":"123", "client":{"name":"john"}}
    } ]
  }
}

Here is a full recreation: https://gist.github.com/dadoonet/5111167

@jbarata

jbarata commented Mar 7, 2013

@dadoonet
thank you very much for taking the time to test this.

You might not have noticed it at the beginning of my post, but I have already done that test successfully (when the query has no field filter everything works fine):

If the search consists of only the text to be found, e.g. john, then all the docs are shown and all the totals are OK.

The problem is when the query has a field filter on a field that exists at different levels of the JSON documents, as the name field does in my example.

cheers!

@jbarata

jbarata commented Mar 12, 2013

Also, @kimchy's solution would do just fine for now, but it does not seem to work (at least in this case):

you can simply search on all types, and have your query be type specific on the same field (i.e. type1.field1, type2.field1).

If I use client.name as in
curl -XGET 'http://127.0.0.1:9602/testindex/client,invoice/_search?q=client.name:john'
I get the invoice doc regardless of the order of the invoice and client types.
With client.name I only get the client doc if the search is made on the client type alone (I would be happy if the result were only the client doc whatever the order of the types).

If I use invoice.client.name then it always returns the invoice, as expected.

Is there a problem with the subpath to the field being equal in both types?
Is there a way to force the full field path by configuration or query parameter? (I searched a lot but could not find a way to do this.)

Thanks

@jbarata

jbarata commented Mar 12, 2013

Hi again!
Just to let you guys know that I managed to get it to return the correct results by "promoting" the types to separate indexes and ignoring the types altogether, i.e. having one type per index.
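
A sketch of what that layout could look like as requests, using the host and port from the reproduction above. The index names `clients` and `invoices` are assumptions (the thread does not name them), and the helpers only build `(method, url, body)` tuples; they do not send anything.

```python
import json
from urllib.parse import urlencode

BASE = "http://127.0.0.1:9602"  # host and port used earlier in this thread

def index_request(index, doc_type, doc_id, doc):
    """PUT request that stores `doc` in its own per-type index."""
    url = "{}/{}/{}/{}".format(BASE, index, doc_type, doc_id)
    return ("PUT", url, json.dumps(doc))

def search_request(indexes, query):
    """GET request that searches several indexes at once."""
    url = "{}/{}/_search?{}".format(BASE, ",".join(indexes), urlencode({"q": query}))
    return ("GET", url, None)

# Hypothetical index names: one index per former type.
requests = [
    index_request("clients", "client", "1", {"name": "john"}),
    index_request("invoices", "invoice", "1",
                  {"number": "123", "client": {"name": "john"}}),
    search_request(["clients", "invoices"], "name:john"),
]
for method, url, body in requests:
    print(method, url, body or "")
```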

Do you think there will be any performance problems with this approach?
Thanks

@kimchy
Member

kimchy commented Mar 12, 2013

@jbarata I don't think you will notice a performance difference. And yes, the reason it happens is that ES needs to decide what to search, either name or customer.name, and that's picked based on the first type match.
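
The "first type match" behaviour can be pictured with a toy model. This is not Elasticsearch code: it is a deliberately simplified sketch that resolves a short field name against the types in the order given and then searches only that one path. The data mirrors the client/invoice reproduction earlier in the thread, with nested fields written as dotted keys.

```python
# Toy model only; names and structure are illustrative assumptions.
MAPPINGS = {
    "client":  ["name"],
    "invoice": ["number", "client.name"],
}
DOCS = [
    ("client", "1", {"name": "john"}),
    ("invoice", "1", {"number": "123", "client.name": "john"}),
]

def resolve(field, types):
    """Return the first full path, in type order, whose last segment is `field`."""
    for type_name in types:
        for path in MAPPINGS[type_name]:
            if path.split(".")[-1] == field:
                return path
    return None

def search(field, value, types):
    """Match docs of the given types against the single resolved path."""
    path = resolve(field, types)
    return [(t, doc_id) for t, doc_id, source in DOCS
            if t in types and source.get(path) == value]

print(search("name", "john", ["client", "invoice"]))  # [('client', '1')]
print(search("name", "john", ["invoice", "client"]))  # [('invoice', '1')]
```

Because only one path is chosen, swapping the type order changes which document matches, which is the pattern reported above.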

@jbarata

jbarata commented Mar 12, 2013

Ok @kimchy 👍

If I find any problems implementing this approach I'll report them here in case it helps others :)

Thanks a lot for your help.

@clintongormley

I'm wondering if we should make the field "chooser" more predictable, eg choosing the field that starts with a type name before a field that doesn't, so customer.name should match type customer, field name before it matches type invoice, field customer.name.

@clintongormley

Closed in favour of #4081
