ExtendedStats with one document -> UnexpectedElasticsearchClientException #5007

Closed
meriturva opened this issue Sep 10, 2020 · 13 comments · Fixed by #5404

@meriturva

NEST/Elasticsearch.Net version:
7.9.0
Elasticsearch version:
7.9.1

Description of the problem including expected versus actual behavior:
Using ExtendedStats on a numeric field when only one document matches always generates an exception:

    "message": "An error has occurred.",
    "exceptionMessage": "expected:'Number Token', actual:'\"NaN\"', at offset:179002",
    "exceptionType": "Elasticsearch.Net.UnexpectedElasticsearchClientException",

PS: I don't know whether the issue was present before. We usually work with a lot of documents, so this is the first time only one document has remained after filtering our index, and the first time in years that we have seen this exception.

Expected behavior
No exception but an empty result.

@meriturva meriturva added the Bug label Sep 10, 2020
@meriturva
Author

For now I'm filtering my results based on a document count calculated in a preliminary query.
My first question is about the issue itself: is it confirmed? Is there any news from the NEST team (@russcam, for example)?

Secondly, how can I avoid a preliminary query to filter the data? Is it possible to calculate ExtendedStats only when the document count is greater than 1, within the same query?

@russcam
Contributor

russcam commented Sep 15, 2020

Hi @meriturva, this looks like a bug.

NaN is a peculiarity in JSON: the JSON specification has no representation for it, so serializers handle it in different ways. The server is sending it as the string "NaN", but the client's internal serializer will not accept a string in that position; it would only accept a bare NaN token. I think we'll need to address this in the client.
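
As a sketch of what accepting the string form looks like in general .NET code: System.Text.Json (.NET 5 and later) can be told to read "NaN" strings into `double` members with `JsonNumberHandling.AllowNamedFloatingPointLiterals`. This is not the Utf8Json-based serializer NEST 7.x uses internally and does not change how NEST deserializes responses; it only applies if you read a raw response body yourself. `ExtendedStatsDto` and `ExtendedStatsParsing` are hypothetical names that map just a few of the extended_stats response fields.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO mapping a few extended_stats response fields (not a NEST type).
public sealed class ExtendedStatsDto
{
    [JsonPropertyName("count")] public long Count { get; set; }
    [JsonPropertyName("avg")] public double Avg { get; set; }
    [JsonPropertyName("std_deviation")] public double StdDeviation { get; set; }
    [JsonPropertyName("variance_sampling")] public double VarianceSampling { get; set; }
    [JsonPropertyName("std_deviation_sampling")] public double StdDeviationSampling { get; set; }
}

public static class ExtendedStatsParsing
{
    // AllowNamedFloatingPointLiterals (System.Text.Json, .NET 5+) lets double members be read
    // from the strings "NaN", "Infinity" and "-Infinity" as well as from ordinary numbers.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        NumberHandling = JsonNumberHandling.AllowNamedFloatingPointLiterals
    };

    public static ExtendedStatsDto Parse(string json) =>
        JsonSerializer.Deserialize<ExtendedStatsDto>(json, Options)
        ?? throw new JsonException("extended_stats payload was null");
}
```

Unknown JSON properties are ignored by default, so fields the DTO does not declare (such as the nested `std_deviation_bounds` object) can simply be left out.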

> ps: I really don't know if the issue was present before,

I've not seen Elasticsearch return NaN before. This may have come in with the addition of sampling-based standard deviation and variance to extended stats in elastic/elasticsearch#49782.

@meriturva
Author

In the meantime, is there any workaround that avoids a double query (to filter out aggregations without enough documents)?
I haven't found any way to enable an aggregation based on the document count of its parent aggregation, so for now I have to execute an extra query just to check which documents are present.

@meriturva
Author

Hi @russcam, I'd just like to ask for news about this issue, or perhaps a neat workaround to avoid double queries.
Thanks a lot.

@meriturva
Author

Just to check, @russcam: is it fixed in version 7.10?

@russcam
Contributor

russcam commented Nov 30, 2020

Hi @meriturva

there's no change in NEST for this yet. There's been some discussion about whether Elasticsearch should be returning NaN at all, given the ambiguity of representing it in JSON. No conclusion has been reached yet though.

@Mpdreamz, @stevejgordon perhaps we can check for "NaN" in the offending fields? I think it comes from the addition of the sampling fields in extended stats, but this would need double-checking.
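
A minimal sketch of the "check for NaN" idea, written against System.Text.Json's `Utf8JsonReader` rather than NEST's internal `Elasticsearch.Net.Utf8Json.JsonReader`, whose API is not reproduced here. `LenientDouble` is a hypothetical helper and an illustration only, not the client's actual fix: it accepts a number token or the string "NaN" and rejects anything else. Accepting "Infinity" and "-Infinity" as well is an assumption added for symmetry.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical helper (not part of NEST): reads a double that may arrive either as a
// JSON number or as one of the strings "NaN", "Infinity" or "-Infinity".
public static class LenientDouble
{
    public static double Read(ref Utf8JsonReader reader)
    {
        switch (reader.TokenType)
        {
            case JsonTokenType.Number:
                return reader.GetDouble();
            case JsonTokenType.String:
                var text = reader.GetString();
                return text switch
                {
                    "NaN" => double.NaN,
                    "Infinity" => double.PositiveInfinity,   // assumption: handled for symmetry
                    "-Infinity" => double.NegativeInfinity,  // assumption: handled for symmetry
                    _ => throw new JsonException($"Unexpected string '{text}' for a double value")
                };
            default:
                throw new JsonException($"Unexpected token {reader.TokenType} for a double value");
        }
    }

    // Scans the document and returns the value of the first property with the given name.
    public static double ReadProperty(string json, string propertyName)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals(propertyName))
            {
                reader.Read(); // move from the property name to its value
                return Read(ref reader);
            }
        }
        throw new KeyNotFoundException($"Property '{propertyName}' not found");
    }
}
```

For example, `LenientDouble.ReadProperty("{\"variance_sampling\":\"NaN\"}", "variance_sampling")` yields `double.NaN` instead of throwing, while an unexpected string such as `"abc"` still raises a `JsonException`.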

@yBother2

yBother2 commented Jan 6, 2021

I just updated from 7.6 to the latest version (7.10) and encountered a very similar problem
when making the following call, which worked before:

```csharp
using System.Collections.Generic;
using System.Globalization;
using Nest;

var searchRequest = new SearchRequest(Indices.Parse(ElasticsearchConstants.GetIndexNameFromBatchId(batchId)))
{
    Size = 0, // We just need the aggregation data. Returned documents of the top level query are not required.
    Query = new BoolQuery
    {
        Must = new List<QueryContainer>
        {
            new TermQuery { Field = ElasticsearchConstants.DataPointFieldBatchId, Value = batchId },
            new TermQuery { Field = ElasticsearchConstants.DataPointFieldParameterId, Value = parameterId }
        },
        Filter = new List<QueryContainer>
        {
            new DateRangeQuery
            {
                Field = ElasticsearchConstants.DataPointFieldTimestamp,
                GreaterThanOrEqualTo = startDateTime.ToString("O", CultureInfo.InvariantCulture),
                LessThanOrEqualTo = endDateTime.ToString("O", CultureInfo.InvariantCulture)
            }
        }
    },
    Aggregations = new DateHistogramAggregation(ElasticsearchConstants.DataPointsHistogramAggregationKeyString)
    {
        Field = ElasticsearchConstants.DataPointFieldTimestamp,
        FixedInterval = histogramTimeInterval,
        Order = HistogramOrder.CountAscending,
        Aggregations = new ExtendedStatsAggregation(ElasticsearchConstants.DataPointsHistogramStatsKeyString, ElasticsearchConstants.DataPointFieldValue),
        MinimumDocumentCount = 1 // Just returns buckets which contain documents.
    }
};
```

The error message is pretty much the same:

    Elasticsearch.Net.UnexpectedElasticsearchClientException: expected:'Number Token', actual:'"NaN"', at offset:493 ---> Elasticsearch.Net.Utf8Json.JsonParsingException: expected:'Number Token', actual:'"NaN"', at offset:493
       at Elasticsearch.Net.Utf8Json.JsonReader.ReadDouble()
       at Nest.AggregateFormatter.GetExtendedStatsAggregate(JsonReader& reader, IJsonFormatterResolver formatterResolver, StatsAggregate statsMetric, IReadOnlyDictionary`2 meta)
       at Nest.AggregateFormatter.GetStatsAggregate(JsonReader& reader, IJsonFormatterResolver formatterResolver, IReadOnlyDictionary`2 meta)
       at Nest.AggregateFormatter.ReadAggregate(JsonReader& reader, IJsonFormatterResolver formatterResolver)
       at Nest.AggregateFormatter.GetSubAggregates(JsonReader& reader, String name, IJsonFormatterResolver formatterResolver)
       at Nest.AggregateFormatter.GetDateHistogramBucket(JsonReader& reader, IJsonFormatterResolver formatterResolver)
       at Nest.AggregateFormatter.ReadBucket(JsonReader& reader, IJsonFormatterResolver formatterResolver)
       at Nest.AggregateFormatter.GetMultiBucketAggregate(JsonReader& reader, IJsonFormatterResolver formatterResolver, ArraySegment`1& propertyName, IReadOnlyDictionary`2 meta)
       at Nest.AggregateFormatter.ReadAggregate(JsonReader& reader, IJsonFormatterResolver formatterResolver)
       at Nest.AggregateDictionaryFormatter.ReadAggregate(JsonReader& reader, IJsonFormatterResolver formatterResolver, String[] tokens, Dictionary`2 dictionary)
       at Nest.AggregateDictionaryFormatter.Deserialize(JsonReader& reader, IJsonFormatterResolver formatterResolver)
       at Deserialize(Object[] , JsonReader& , IJsonFormatterResolver )

@yBother2

yBother2 commented Jan 6, 2021

> Hi @meriturva
>
> there's no change in NEST for this yet. There's been some discussion about whether Elasticsearch should be returning NaN at all, given the ambiguity of representing it in JSON. No conclusion has been reached yet though.

With ES 7.6, "NaN" was already being returned, at least when the index is set to ignore malformed values.
Not returning "NaN" values that are already stored is not acceptable.

@stevejgordon stevejgordon self-assigned this Jan 7, 2021
@meriturva
Author

Same issue in 7.11.1.
Any news here, @russcam?

@jmisaxobank

I am experiencing the very same problem (in my case it's on 7.10.0), and can confirm that the NaN comes from the computed "sampling" fields, not from my documents.

    "extended_stats#my_extendedstats_agg" : {
      "count" : 1,
      "min" : -269.0,
      "max" : -269.0,
      "avg" : -269.0,
      "sum" : -269.0,
      "sum_of_squares" : 72361.0,
      "variance" : 0.0,
      "variance_population" : 0.0,
      "variance_sampling" : "NaN",
      "std_deviation" : 0.0,
      "std_deviation_population" : 0.0,
      "std_deviation_sampling" : "NaN",
      "std_deviation_bounds" : {
        "upper" : -269.0,
        "lower" : -269.0,
        "upper_population" : -269.0,
        "lower_population" : -269.0,
        "upper_sampling" : "NaN",
        "lower_sampling" : "NaN"
      }
    }

I really just need to compute the standard deviation; I don't even know what the "sampling" fields are used for.
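
For context on the "sampling" fields: they correspond to the sample (Bessel-corrected) variance and standard deviation, which divide by n - 1 rather than n. The worked example uses the values from the response above, writing $S_1$ for `sum` and $S_2$ for `sum_of_squares`, and assumes the variance is derived from these running sums (an assumption; the Elasticsearch source is not quoted here). The identity $\sum (x_i - \bar{x})^2 = S_2 - S_1^2/n$ holds regardless.

```math
\begin{aligned}
\sigma^2_{\text{population}} &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{S_2 - S_1^2/n}{n} \\
s^2_{\text{sampling}} &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{S_2 - S_1^2/n}{n-1} \\
n = 1:\quad S_2 - S_1^2/n &= 72361 - (-269)^2/1 = 0 \\
\sigma^2_{\text{population}} &= 0/1 = 0, \qquad s^2_{\text{sampling}} = 0/0 = \text{NaN}
\end{aligned}
```

The sampling standard deviation is the square root of that NaN variance, so it is NaN as well, and the `*_sampling` bounds (avg plus or minus `sigma` times `std_deviation_sampling`, with `sigma` defaulting to 2) inherit the NaN. The population fields divide by n and stay at 0, which matches the `variance_population` and `std_deviation_population` values in the response.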

@russcam
Contributor

russcam commented Mar 15, 2021

I've opened #5404 to address this.

stevejgordon pushed a commit that referenced this issue Mar 15, 2021
github-actions bot pushed a commit that referenced this issue Mar 15, 2021
github-actions bot pushed a commit that referenced this issue Mar 15, 2021
@jmisaxobank

Awesome, thanks!!!

stevejgordon pushed a commit that referenced this issue Mar 22, 2021
Fixes #5007

Co-authored-by: Russ Cam <russ.cam@elastic.co>
stevejgordon pushed a commit that referenced this issue Mar 22, 2021
Fixes #5007

Co-authored-by: Russ Cam <russ.cam@elastic.co>
@Vycka

Vycka commented May 18, 2021

NEST/Elasticsearch.Net version:
7.12.1
Elasticsearch version:
7.10.1

Hello, FYI: the same type of issue still exists, but for me it comes from a different line of code, in both master and 7.13:

6 participants