Add support for 64-bit unsigned integers #32434

Closed
jpountz opened this issue Jul 27, 2018 · 13 comments · Fixed by #60050
Labels
>feature · :Search Foundations/Mapping (Index mappings, including merging and defining field types) · Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)

Comments

@jpountz
Contributor

jpountz commented Jul 27, 2018

As a follow-up to discussions in #17006, we agreed in FixitFriday to add support for 64-bit unsigned integers. These numbers will support accurate term and range queries, but aggregations will work on the nearest double value.
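
A minimal Java sketch of that "nearest double" caveat (class name and values are illustrative): near 2^64 the gap between adjacent doubles is 2^12 = 4096, so distinct u64 values collapse to the same double and a double-based aggregation cannot tell them apart.

```java
import java.math.BigInteger;

public class NearestDoubleDemo {
    public static void main(String[] args) {
        BigInteger a = new BigInteger("18446744073709551615"); // 2^64 - 1
        BigInteger b = new BigInteger("18446744073709550615"); // 1000 smaller
        // Both round to the same nearest double, 1.8446744073709552E19.
        System.out.println(a.doubleValue() == b.doubleValue()); // true
    }
}
```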

@jpountz jpountz added >feature help wanted adoptme :Search Foundations/Mapping Index mappings, including merging and defining field types labels Jul 27, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@rw-access
Contributor

Just adding this note here for reference.

We ran into a use case for this in elastic/ecs#673. The Windows registry only has a concept of unsigned numbers, so we saw two paths to take for u64 values greater than 2^63. We could either cast to s64, which is hard to enforce and too easy to get wrong, or store only the string representation. Since we have little need for math, we opted for a keyword string field in the interim.
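
A small Java sketch of those two paths (illustrative values): reinterpreting the bits as s64 preserves the data but surfaces negative numbers, while the string form stays exact at the cost of numeric semantics.

```java
import java.math.BigInteger;

public class U64PathsDemo {
    public static void main(String[] args) {
        String raw = "18446744073709551615"; // e.g. a registry QWORD of 2^64 - 1

        // Path 1: cast to s64 -- the bit pattern survives but the value
        // prints as -1, which is exactly the "too easy to do wrong" part.
        long asSigned = Long.parseUnsignedLong(raw);
        System.out.println(asSigned);                        // -1
        System.out.println(Long.toUnsignedString(asSigned)); // 18446744073709551615

        // Path 2: keep the string (the keyword interim solution) -- exact
        // and unambiguous, but no range queries or math on it.
        BigInteger exact = new BigInteger(raw);
        System.out.println(exact);
    }
}
```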

@marshallmain
Contributor

Following up on @rw-access's comment, there are a fair number of data fields coming from the Elastic Endpoint that we would like to store as u64 if possible. Is this issue being actively worked on?

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@ktetzlaff

Same problem here during processing of network performance data from routers as part of a network management system. This data contains counters (e.g. received octets) which are 64-bit unsigned values. It would be great if we could store them properly.

@mayya-sharipova mayya-sharipova removed the help wanted adoptme label Jul 6, 2020
@mayya-sharipova mayya-sharipova self-assigned this Jul 6, 2020
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Jul 22, 2020
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- sorting and aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Closes elastic#32434
@mayya-sharipova
Contributor

We, within the search team, had a discussion about the following points:

I. How to represent unsigned long internally, in JSON, and in the Elasticsearch clients for:

  • Queries
  • Sort values
  • Aggs
  • Scripts

Several options here are:

  1. Long for queries, sort values and scripts; Double for aggs
    • easy to handle
    • values > Long.MAX_VALUE are presented as negative, which is not a nice experience for users, especially when unsigned long is used for IDs
  2. BigInteger for query field values, sort values, scripts; Double for aggs
    • difficult to handle internally
    • some elasticsearch clients may not support them (need to confirm with the lang-client team)
  3. String
    • easy to handle
    • easy to support
    • no support for numeric metrics aggs (do we need them on unsigned long field?)

II. Should we accept mixing longs and unsigned longs when sorting on a field?
- Do we need this mixed sorting on longs and unsigned longs?
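
To make the sorting question concrete, here is a Java sketch (values illustrative): if u64 values were kept as raw bit patterns in signed longs, the natural long order would misplace everything at or above 2^63, and precise sorting needs an unsigned comparison.

```java
import java.util.Arrays;

public class UnsignedSortDemo {
    public static void main(String[] args) {
        // u64 values kept as raw bit patterns: 5, 2^63, and 2^64 - 1.
        Long[] vals = { 5L, Long.MIN_VALUE, -1L };

        // Signed order puts the largest unsigned values first:
        Arrays.sort(vals);
        System.out.println(Arrays.toString(vals)); // [-9223372036854775808, -1, 5]

        // Unsigned order is what a user sorting on the field expects:
        Arrays.sort(vals, Long::compareUnsigned);
        for (long v : vals) System.out.println(Long.toUnsignedString(v)); // 5, 2^63, 2^64-1
    }
}
```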

@robcowart

Option 3 is a non-starter for any use case that I have worked on. The most common scenario that I see is metrics which are counters. Besides the network equipment mentioned by @ktetzlaff, a significant amount of server and application monitoring data is unsigned 64-bit counters. For example, this is the package that Metricbeat uses to gather disk I/O stats (https://github.com/shirou/gopsutil/blob/master/disk/disk.go#L14), and as you can see there are a number of uint64 values. Beats actually sends these to Elasticsearch as a long, which is technically wrong, but to date simply unavoidable. So this feature is needed to "fix" many Beats modules as well.

Given that counters will make up the majority of unsigned values, it is important (mandatory, really) that at least the derivative aggregation works well and accurately. Whether that is best achieved with option 1 or 2... I would say that 2 is probably preferred. However, if 1 is the only way to get the necessary support, and it allows for derivative aggregations, I could live with it.

@Dumbaz

Dumbaz commented Aug 5, 2020

I am in the same situation as @robcowart: option 3 would not achieve anything for my use cases. The main reason I want 64-bit integers is that metric values are sometimes greater than the maximum the current data types offer (mostly long-running tasks, counted in milliseconds).

@jimczi
Contributor

jimczi commented Aug 6, 2020

I think we should eliminate option 1 from the picture. It's confusing to rely on a convention that requires translating a signed long. Option 2 is more natural since we already have an abstraction for numerics in scripts and sort values: they return a Number, and the user can choose the resolution they need when consuming the value. Unsigned longs could, for instance, return a BigInteger for values greater than Long.MAX_VALUE.
@elastic/es-clients would you be able to comment on the proposal? The JSON specification accepts numbers without restrictions, so we were wondering if we need to do anything special to ease the usage of these 64-bit numbers in clients?

@mayya-sharipova
Contributor

mayya-sharipova commented Aug 6, 2020

Thanks for comments everyone, very valuable.

@jimczi I have chatted with the es-clients team; all the clients (PHP, Perl, Ruby, Python, .NET) support BigInteger except JavaScript. Also, some clients on 32-bit platforms (which should be quite outdated by now) don't support BigInteger, e.g. PHP and Ruby.

For JavaScript, the language does support BigInts, but the native JSON parser does not. This should not be a blocker for us, though; the es-clients team is willing to write a custom JSON parser to handle this case.

One thing left is to confirm with the Kibana team that they can handle 64-bit integers.

Update:
I have chatted with the Kibana team, and Kibana has problems with handling big integers (relevant issue: elastic/kibana#40183), and this is not going to be fixed any time soon.
But considering that Kibana currently also has issues with large longs, and citing @timroes: "I don't think adding a new type to the list that will cause exactly the same issues will be a problem", I guess we can still proceed with our option 2.
@jimczi what do you think?

Updated option 2:

  1. Sort: return long for values < 2^63, BigInteger for values >= 2^63
  2. Scripts: return long for values < 2^63, BigInteger for values >= 2^63
  3. Aggregations:
    • terms agg: return long or BigInteger for buckets (if this field is used for "ids", it is valuable to have precise bucket keys)
    • numeric metric aggs: return doubles
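
A sketch of what that long-below-2^63 / BigInteger-above contract could look like in Java, assuming for illustration only that the u64 is held internally as its raw bit pattern in a signed long (the actual implementation may encode it differently):

```java
import java.math.BigInteger;

public class UnsignedLongValues {
    private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    // Values < 2^63 have a non-negative bit pattern and come back as a
    // plain Long; values >= 2^63 would print negative as a long, so they
    // are widened to BigInteger and the user never sees a sign flip.
    static Number decode(long bits) {
        return bits >= 0 ? Long.valueOf(bits) : BigInteger.valueOf(bits).add(TWO_POW_64);
    }

    public static void main(String[] args) {
        System.out.println(decode(42L));            // 42
        System.out.println(decode(Long.MIN_VALUE)); // 9223372036854775808 (2^63)
        System.out.println(decode(-1L));            // 18446744073709551615 (2^64 - 1)
    }
}
```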

@Mpdreamz
Member

Just to confirm: are we planning a maximum of a 64-bit unsigned long, or an arbitrary-size BigInteger? The implied changes are the same, but in terms of (de)serialization performance it's good to know what to optimize for.

There are some potential client breaking changes.

JavaScript client: will need a dedicated generic serializer, which is going to be slower for everything. The workaround here is that the default WON'T support ulong/bigints OOTB, but the client can be injected with a serializer that supports it, either globally or per request.

Java HLRC: the Java HLRC relies heavily on boxing, e.g. the to and from on RangeQueryBuilder are Objects, so it is future proof 👍 (at the cost of boxing). Not sure if this is the case everywhere the change is intended to be supported.
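
For illustration, that boxing point means a BigInteger already fits through the existing HLRC signatures (a sketch, assuming the high-level REST client is on the classpath; whether the server accepts such values is the separate question this issue tracks):

```java
import java.math.BigInteger;
import org.elasticsearch.index.query.RangeQueryBuilder;

public class HlrcRangeSketch {
    public static void main(String[] args) {
        // from/to take Object, so no signature change is needed:
        RangeQueryBuilder query = new RangeQueryBuilder("counter")
                .from(new BigInteger("9223372036854775808"))  // 2^63
                .to(new BigInteger("18446744073709551615"));  // 2^64 - 1
        System.out.println(query);
    }
}
```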

.NET NEST client: in most cases we have dedicated types and interfaces, e.g. the range query is backed by an interface IRangeQuery with implementations like LongRangeQuery and DateRangeQuery that strongly type the to/from parameters. We can extend this with e.g. a BigIntegerRangeQuery quite easily. In some cases (especially responses) where values are now typed as long or double, changing them to BigInteger or BigDecimal is technically a breaking change.

I suggest we proceed with option 2 and:

  • Try to solve any breaking changes as best as we can.
  • Accept a technical breaking change that in practice is hard to hit (e.g. long implicitly converting to BigInteger and back might be good enough?).
  • If all else fails, document the limitations of a particular client and set it up for success when 8.0 comes around.

@ezimuel
Contributor

ezimuel commented Aug 12, 2020

PHP client: it supports 64-bit integers but not unsigned. We can support BigInteger using the GMP extension of PHP (not enabled by default in the language).
Perl: it supports 64-bit integers by default and BigInteger using bigint.

Because some clients may need to support 64-bit and BigInteger values in different ways, I suggest providing some facilities, for instance:

  • add a bigint_as_string=1 parameter in the query string (0 by default) to return 64-bit and BigInteger values as strings; this will prevent the JSON decoder from throwing errors if a numeric value is > 32-bit.
  • add a numeric_type field in the response that alerts the client to use 64 (bit) or big (BigInteger). This alert will not carry a 32 (bit) value, since that is the default. It will be triggered if at least one numeric value in the response is 64 or big.
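
A sketch of what the first suggestion would mean for a Java client (the bigint_as_string parameter is only proposed here, and the response shape is hypothetical; Jackson is used for parsing):

```java
import java.math.BigInteger;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigintAsStringSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Default (bigint_as_string=0): the value is a JSON number. Jackson
        // widens it to BigInteger, but a binary64-based parser would
        // silently round it.
        JsonNode n = mapper.readTree("{\"count\":18446744073709551615}");
        System.out.println(n.get("count").bigIntegerValue());

        // Proposed bigint_as_string=1: the value arrives as a string that
        // any JSON parser carries losslessly; the client converts explicitly.
        JsonNode s = mapper.readTree("{\"count\":\"18446744073709551615\"}");
        System.out.println(new BigInteger(s.get("count").asText()));
    }
}
```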

@delvedor
Member

I've been thinking a lot about this, and I fear that sending numbers above the interop range for JSON could cause problems for some implementations.

From the json spec:

   This specification allows implementations to set limits on the range
   and precision of numbers accepted.  Since software that implements
   IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
   generally available and widely used, good interoperability can be
   achieved by implementations that expect no more precision or range
   than these provide, in the sense that implementations will
   approximate JSON numbers within the expected precision.  A JSON
   number such as 1E400 or 3.141592653589793238462643383279 may indicate
   potential interoperability problems, since it suggests that the
   software that created it expects receiving software to have greater
   capabilities for numeric magnitude and precision than is widely
   available.

   Note that when such software is used, numbers that are integers and
   are in the range [-(2**53)+1, (2**53)-1] are interoperable in the
   sense that implementations will agree exactly on their numeric
   values.

For instance, if JavaScript gets a value outside that range, it can no longer guarantee arithmetic precision (MDN docs).
Furthermore, if JS tries to parse JSON containing a bigint value, it will round it without warning the user, causing a lot of debugging pain.
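
That boundary is easy to reproduce in Java, since double is the same IEEE 754 binary64 type the spec describes (minimal sketch):

```java
public class InteropRangeDemo {
    public static void main(String[] args) {
        long maxSafe = (1L << 53) - 1; // 9007199254740991, top of the interop range

        // Inside the range, integers round-trip through a double exactly:
        System.out.println((long) (double) maxSafe == maxSafe); // true

        // Just outside, adjacent integers collapse to the same double --
        // the silent rounding a binary64 JSON parser performs:
        System.out.println((double) (maxSafe + 1) == (double) (maxSafe + 2)); // true
    }
}
```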

Given that a custom serializer would introduce significant overhead, and it should be the user's choice whether to accept that, I agree with @ezimuel that introducing a query parameter such as bigint_as_string, which instructs Elasticsearch on how those values are sent and read, could be a good tradeoff.
By doing so, we can guarantee interoperability and still support all the use cases that need bigint as numeric values.

The JS client could use this query parameter as follows: send it by default on all requests, so we are sure not to lose precision, and let the user decide how to handle the bigint values; but also give users the ability to disable this default and use a custom serializer to handle BigInt automatically. (The latter cannot be the default for the JS client for two reasons: the JSON bigint parser is significantly slower, and the number and bigint types are not compatible, so the user must know what they are doing.)

Unfortunately, this is not only a JS problem; the issue occurs in other languages and tools as well. For example, if you use jq, any value above 1.79E+308 is rounded to 1.79E+308:

```console
$ echo '{"n":1.7976931348623157e+310}' | jq .
{
  "n": 1.7976931348623157e+308
}
```

By supporting something like bigint_as_string, we delegate this choice to the user, who can make it based on the platform they need to support.

@mayya-sharipova
Contributor

mayya-sharipova commented Aug 13, 2020

@ezimuel @delvedor Thanks a lot for the comments and explanations. We've discussed this and decided that we should find a systematic way for the REST layer to handle big integers (and big longs), and that this should not be part of this issue/PR. I have created a separate issue for this and put your ideas there.

For 64-bit unsigned integers, we have decided to proceed with our current plan to return Long/BigInteger.

mayya-sharipova added a commit that referenced this issue Sep 23, 2020
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Closes #32434
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Sep 24, 2020
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Backport for elastic#60050
Closes elastic#32434
mayya-sharipova added a commit that referenced this issue Sep 24, 2020
Introduce 64-bit unsigned long field type

This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Backport for #60050
Closes #32434
@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024