Ingest: Support integer and long hex values in convert #32213

Merged (5 commits) on Jul 24, 2018

rjernst (Member) commented Jul 19, 2018

This commit adds checks for hex formatted strings in the convert
processor, allowing strings like 0x1 to be parsed as integer 1.

closes #32182

rjernst added the >enhancement, :Data Management/Ingest Node, v7.0.0, and v6.5.0 labels Jul 19, 2018
elasticmachine (Collaborator) commented

Pinging @elastic/es-core-infra

rjernst requested a review from talevy July 19, 2018 20:43

-            return Integer.parseInt(value.toString());
+            String str = value.toString();
+            if (str.startsWith("0x")) {
+                return Integer.decode(str);
Contributor

Should we add a test for the case where Integer.decode throws the NumberFormatException?

The same goes for Long.decode.
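For illustration only (this is not code from the PR): a minimal sketch of the failure cases such a test would exercise. The class and method names are hypothetical; only `Integer.decode` and `Long.decode` are real JDK calls. Both throw `NumberFormatException` when the text after the `0x` prefix is not valid hex or does not fit the target type.

```java
// Hypothetical helper class; not part of the Elasticsearch code base.
class DecodeFailureSketch {

    /** Returns true if Integer.decode rejects the input with a NumberFormatException. */
    static boolean intDecodeRejects(String input) {
        try {
            Integer.decode(input);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    /** Returns true if Long.decode rejects the input with a NumberFormatException. */
    static boolean longDecodeRejects(String input) {
        try {
            Long.decode(input);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(intDecodeRejects("0x1"));          // valid hex
        System.out.println(intDecodeRejects("0xZZ"));         // not hex digits
        System.out.println(intDecodeRejects("0x100000000"));  // too large for an int
        System.out.println(longDecodeRejects("0x100000000")); // fits in a long
        System.out.println(longDecodeRejects("0xZZ"));        // not hex digits
    }
}
```

In the processor itself, a test would presumably assert on whatever exception the convert processor wraps the `NumberFormatException` in; that wrapping is not shown in this thread, so it is left out here.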

rjernst (Member, Author) commented Jul 19, 2018

@talevy I switched to always use decode, since that also handles decimal.

jasontedor (Member) left a comment

I left a comment.

@@ -42,7 +42,7 @@
             @Override
             public Object convert(Object value) {
                 try {
-                    return Integer.parseInt(value.toString());
+                    return Integer.decode(value.toString());
jasontedor (Member) commented Jul 19, 2018

I worry about using Integer#decode; I think this is a silent, dangerous breaking change. Today, with Integer#parseInt, 010 is parsed as 10. With Integer#decode, it would be parsed as octal, giving 8!
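To make the difference concrete, here is a small standalone sketch (the class and helper names are made up for illustration) that runs the same strings through both JDK methods. `Integer.parseInt` always reads base 10 and ignores leading zeros, while `Integer.decode` treats a leading `0` as an octal prefix and `0x` as a hex prefix.

```java
// Hypothetical demo class; only Integer.parseInt and Integer.decode are real JDK APIs.
class ParseVersusDecodeSketch {

    /** Parses with Integer.parseInt, reporting failures as text instead of throwing. */
    static String viaParseInt(String s) {
        try {
            return String.valueOf(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return "NumberFormatException";
        }
    }

    /** Parses with Integer.decode, reporting failures as text instead of throwing. */
    static String viaDecode(String s) {
        try {
            return String.valueOf(Integer.decode(s));
        } catch (NumberFormatException e) {
            return "NumberFormatException";
        }
    }

    public static void main(String[] args) {
        // "010": parseInt gives 10, decode gives 8 (octal).
        // "08":  parseInt gives 8, decode fails because 8 is not an octal digit.
        // "0x10": parseInt fails, decode gives 16.
        for (String s : new String[] {"10", "010", "08", "0x10"}) {
            System.out.println(s + " -> parseInt: " + viaParseInt(s) + ", decode: " + viaDecode(s));
        }
    }
}
```

Note that zero-padded values containing the digits 8 or 9 do not just change value under `decode`; they stop parsing altogether.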

jasontedor (Member) commented Jul 20, 2018

Here is a test case that passes today and fails after this change:

diff --git a/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/ConvertProcessorTests.java b/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/ConvertProcessorTests.java
index 4a6ce21b2dc..2dece6d1b9a 100644
--- a/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/ConvertProcessorTests.java
+++ b/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/ConvertProcessorTests.java
@@ -59,6 +59,14 @@ public class ConvertProcessorTests extends ESTestCase {
         assertThat(ingestDocument.getFieldValue(fieldName, Integer.class), equalTo(randomInt));
     }
 
+    public void testConvertIntWithLeadingZero() throws Exception {
+        IngestDocument ingestDocument = RandomDocumentPicks.randomIngestDocument(random());
+        String fieldName = RandomDocumentPicks.addRandomField(random(), ingestDocument, "010");
+        Processor processor = new ConvertProcessor(randomAlphaOfLength(10), fieldName, fieldName, Type.INTEGER, false);
+        processor.execute(ingestDocument);
+        assertThat(ingestDocument.getFieldValue(fieldName, Integer.class), equalTo(10));
+    }
+
     public void testConvertIntList() throws Exception {
         IngestDocument ingestDocument = RandomDocumentPicks.randomIngestDocument(random());
         int numItems = randomIntBetween(1, 10);

Now this gives:

FAILURE 0.14s | ConvertProcessorTests.testConvertIntWithLeadingZero <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: <10>
   >      but: was <8>
   >    at __randomizedtesting.SeedInfo.seed([535A28EEEA332364:F868934FEE583676]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.ingest.common.ConvertProcessorTests.testConvertIntWithLeadingZero(ConvertProcessorTests.java:67)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >    at java.base/java.lang.Thread.run(Thread.java:844)
  2> NOTE: leaving temporary files on disk at: /home/jason/src/elastic/elasticsearch/modules/ingest-common/build/testrun/test/J0/temp/org.elasticsearch.ingest.common.ConvertProcessorTests_535A28EEEA332364-001
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=616, maxMBSortInHeap=6.377649114725773, sim=RandomSimilarity(queryNorm=true): {}, locale=ckb-IQ, timezone=EST
  2> NOTE: Linux 4.17.5-200.fc28.x86_64 amd64/Oracle Corporation 10.0.1 (64-bit)/cpus=20,threads=1,free=492873192,total=536870912
  2> NOTE: All tests run in this JVM: [ConvertProcessorTests]
Completed [1/1] in 0.82s, 1 test, 1 failure <<< FAILURES!


> Task :modules:ingest-common:test FAILED
   [junit4] <JUnit4> says hello! Master seed: 535A28EEEA332364
==> Test Info: seed=535A28EEEA332364; jvm=1; suite=1
Tests with failures:
  - org.elasticsearch.ingest.common.ConvertProcessorTests.testConvertIntWithLeadingZero

   [junit4] JVM J0:     0.31 ..     1.71 =     1.39s
   [junit4] Execution time total: 1.72 sec.
   [junit4] Tests summary: 1 suite, 1 test, 1 failure

rjernst (Member, Author) commented Jul 20, 2018

I see the concern. Would you be ok with having a sysprop that controls this in 6.x (defaulting to the old behavior), and logging a deprecation warning in 6.x for the old behavior? While I understand the concern, having numbers zero-padded is very rare IMO, so the likelihood that this affects many users seems low.

Alternatively, I can switch back to my first implementation and test for "0x", passing only values with that prefix to decode.
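As a rough sketch of what the system-property option could look like (this approach was not the one merged, and the property name below is invented purely for illustration), the parsing mode might be gated like this. `Boolean.getBoolean` is the standard JDK call that reads a system property as a boolean and returns false when the property is unset.

```java
// Hypothetical sketch of the sysprop alternative; nothing here exists in Elasticsearch.
class DecodeTogglePropertySketch {

    // Invented property name, used only for this example.
    static final String USE_DECODE_PROPERTY = "example.ingest.convert.use_decode";

    /** Converts using decode when requested, otherwise the old parseInt behavior. */
    static int convert(String value, boolean useDecode) {
        return useDecode ? Integer.decode(value) : Integer.parseInt(value);
    }

    /** Reads the toggle from the (hypothetical) system property; defaults to the old behavior. */
    static int convert(String value) {
        return convert(value, Boolean.getBoolean(USE_DECODE_PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println(convert("010", false)); // old behavior
        System.out.println(convert("010", true));  // decode behavior (octal)
        System.out.println(convert("0x1", true));  // hex is only accepted with decode
    }
}
```

The deprecation warning mentioned in the comment is omitted, since the logging mechanism is not shown in this thread.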

Member

A system property, or alternatively a converter setting on the processor, is okay with me! The nice thing with the latter is we can introduce this with no behavior change today. I’m open to either approach so curious to hear your preference.

rjernst (Member, Author) commented Jul 23, 2018

@talevy @jasontedor I switched back to explicitly checking for 0x.
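A minimal sketch of the explicit-prefix approach, modeled on the first diff hunk in this thread rather than copied from the merged commit: only strings starting with `0x` go through `decode`, and everything else keeps the `parseInt`/`parseLong` behavior. The class and method names are hypothetical, and other prefixes that `decode` accepts (such as `0X`, `#`, or a leading sign before `0x`) are deliberately not handled here because the thread does not say how the merged code treats them.

```java
// Hypothetical helper; illustrates the prefix check discussed in the thread.
class HexAwareConvertSketch {

    /** Parses "0x..." as hex via Integer.decode; everything else as base 10. */
    static int toInt(String str) {
        if (str.startsWith("0x")) {
            return Integer.decode(str);
        }
        return Integer.parseInt(str);
    }

    /** Same idea for long values, using Long.decode and Long.parseLong. */
    static long toLong(String str) {
        if (str.startsWith("0x")) {
            return Long.decode(str);
        }
        return Long.parseLong(str);
    }

    public static void main(String[] args) {
        System.out.println(toInt("0x1"));          // hex
        System.out.println(toInt("010"));          // still base 10, not octal
        System.out.println(toLong("0x100000000")); // hex value beyond int range
    }
}
```

With this shape, zero-padded decimal input such as `010` never reaches `decode`, so the octal interpretation raised in the review cannot occur.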

jasontedor (Member) left a comment

Looks good. Still, would you add a test in this PR for the case I mentioned (010 -> 10), so that we don't have to worry about someone refactoring this in the future and inadvertently introducing the same break that we avoided here?

rjernst (Member, Author) commented Jul 24, 2018

@jasontedor I added a test in 54c30ca

jasontedor (Member) left a comment

LGTM.

talevy (Contributor) left a comment

LGTM

rjernst merged commit 49d4b26 into elastic:master Jul 24, 2018
rjernst deleted the ingest_hexparse branch July 24, 2018 19:05
dnhatn added a commit that referenced this pull request Jul 25, 2018
* master:
  Security: revert to old way of merging automata (#32254)
  Networking: Fix test leaking buffer (#32296)
  Undo a debugging change that snuck in during the field aliases merge.
  Painless: Update More Methods to New Naming Scheme (#32305)
  [TEST] Fix assumeFalse -> assumeTrue in SSLReloadIntegTests
  Ingest: Support integer and long hex values in convert (#32213)
  Introduce fips_mode setting and associated checks (#32326)
  Add V_6_3_3 version constant
  [DOCS] Removed extraneous callout number.
  Rest HL client: Add put license action (#32214)
  Add ERR to ranking evaluation documentation (#32314)
  Introduce Application Privileges with support for Kibana RBAC (#32309)
  Build: Shadow x-pack:protocol into x-pack:plugin:core (#32240)
  [Kerberos] Add Kerberos authentication support (#32263)
  [ML] Extract persistent task methods from MlMetadata (#32319)
  Add Restore Snapshot High Level REST API
  Register ERR metric with NamedXContentRegistry (#32320)
  fixes broken build for third-party-tests (#32315)
  Allow Integ Tests to run in a FIPS-140 JVM (#31989)
  [DOCS] Rollup Caps API incorrectly mentions GET Jobs API (#32280)
  awaitsfix testRandomClusterStateUpdates
  [TEST] add version skip to weighted_avg tests
  Consistent encoder names (#29492)
  Add WeightedAvg metric aggregation (#31037)
  Switch monitoring to new style Requests (#32255)
  Rename ranking evaluation `quality_level` to `metric_score` (#32168)
  Fix a test bug around nested aggregations and field aliases. (#32287)
  Add new permission for JDK11 to load JAAS libraries (#32132)
  Silence SSL reload test that fails on JDK 11
  [test] package pre-install java check (#32259)
  specify subdirs of lib, bin, modules in package (#32253)
  Switch x-pack:core to new style Requests (#32252)
  awaitsfix SSLConfigurationReloaderTests
  Painless: Clean up add methods in PainlessLookup (#32258)
  Fail shard if IndexShard#storeStats runs into an IOException (#32241)
  AwaitsFix RecoveryIT#testHistoryUUIDIsGenerated
  Remove unnecessary warning supressions (#32250)
  CCE when re-throwing "shard not available" exception in TransportShardMultiGetAction (#32185)
  Add new fields to monitoring template for Beats state (#32085)
rjernst added a commit that referenced this pull request Jul 25, 2018
dnhatn added a commit that referenced this pull request Jul 27, 2018
* 6.x:
  Only enforce password hashing check if FIPS enabled (#32383)
  Introduce fips_mode setting and associated checks (#32326)
  [DOCS] Fix formatting error in Slack action
  Ingest: Support integer and long hex values in convert (#32213)
  Release pipelined request in netty server tests (#32368)
  Add opaque_id to index audit logging (#32260)
  Painless: Fix documentation links to use existing refs (#32335)
  Painless: Decouple PainlessLookupBuilder and Whitelists (#32346)
  [DOCS] Adds recommendation for xpack.security.enabled (#32345)
  [test] package pre-install java check (#32259)
  [DOCS] Adds link from bucket_span property to common time units
  [DOCS] Fixes typo in ML aggregations page
  [ML][DOCS] Add documentation for detector rules and filters (#32013)
  Bump the 6.x branch to 6.5.0 (#32361)
  fixes broken build repository-s3 for third-party-tests
Labels: :Data Management/Ingest Node, >enhancement, v6.5.0, v7.0.0-beta1
Linked issue: Support converting hex strings to integers