ingest-attachment module tika dependency versions #93755

Merged (4 commits) on Feb 13, 2023

@joegallo (Contributor) commented Feb 13, 2023

In the course of #93608, I discovered that we rely on some outdated tika dependencies, and @rjernst noticed that we use global dependency versions in places where the versions should be local to the module. This PR addresses exactly and only those two issues.

I've bumped the versions of several dependencies based on what the binary tika app jar itself declares in its META-INF/DEPENDENCIES file:

joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep poi
  - Apache POI - Common (https://poi.apache.org/) org.apache.poi:poi:jar:5.2.3
  - Apache POI - API based on OPC and OOXML schemas (https://poi.apache.org/) org.apache.poi:poi-ooxml:jar:5.2.3
  - Apache POI (https://poi.apache.org/) org.apache.poi:poi-ooxml-lite:jar:5.2.3
  - Apache POI (https://poi.apache.org/) org.apache.poi:poi-scratchpad:jar:5.2.3
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep slf4j
  - JCL 1.2 implemented over SLF4J (http://www.slf4j.org) org.slf4j:jcl-over-slf4j:jar:2.0.3
  - SLF4J API Module (http://www.slf4j.org) org.slf4j:slf4j-api:jar:2.0.3
  - Apache Log4j SLF4J 2.0 Binding (https://logging.apache.org/log4j/2.x/log4j-slf4j2-impl/) org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.19.0
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep commons-logging
  - Apache Commons Logging (http://commons.apache.org/proper/commons-logging/) commons-logging:commons-logging:jar:1.2
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep xz
  - XZ for Java (https://tukaani.org/xz/java.html) org.tukaani:xz:jar:1.9
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep commons-codec
  - Apache Commons Codec (https://commons.apache.org/proper/commons-codec/) commons-codec:commons-codec:jar:1.15
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep xmlbeans
From: 'XmlBeans' (https://xmlbeans.apache.org/)
  - XmlBeans (https://xmlbeans.apache.org/) org.apache.xmlbeans:xmlbeans:jar:5.1.1
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep commons-collections
  - Apache Commons Collections (https://commons.apache.org/proper/commons-collections/) org.apache.commons:commons-collections4:jar:4.4
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep commons-compress
  - Apache Commons Compress (https://commons.apache.org/proper/commons-compress/) org.apache.commons:commons-compress:jar:1.22
joegallo@galactic:~/Desktop/tika versions/2.6.0 $ cat META-INF/DEPENDENCIES | grep commons-lang
  - Apache Commons Lang (https://commons.apache.org/proper/commons-lang/) org.apache.commons:commons-lang3:jar:3.12.0
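
The per-library greps above can be collapsed into a single pass. The following is a minimal sketch, assuming META-INF/DEPENDENCIES has already been extracted from the jar and that each dependency line ends in a `group:artifact:jar:version` coordinate, as in the output above. `parse_dependency_versions` is a hypothetical helper name, not part of tika or Elasticsearch.

```python
import re

# Matches a trailing Maven coordinate of the form group:artifact:jar:version.
# Assumption: coordinates carrying an extra classifier segment are skipped.
COORDINATE = re.compile(
    r"(?P<group>[\w.\-]+):(?P<artifact>[\w.\-]+):jar:(?P<version>[\w.\-]+)\s*$"
)


def parse_dependency_versions(text):
    """Map 'group:artifact' to its version for every line ending in a coordinate."""
    versions = {}
    for line in text.splitlines():
        match = COORDINATE.search(line)
        if match:
            key = f"{match['group']}:{match['artifact']}"
            versions[key] = match["version"]
    return versions
```

Reading the extracted file and passing its contents to `parse_dependency_versions` would then give, for example, the version recorded for `org.apache.poi:poi` without a separate grep per library.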

The net effect on the contents of the modules/ingest-attachment/ directory in an untarred Elasticsearch distribution is:

 SparseBitSet-1.2.jar
 apache-mime4j-core-0.8.5.jar
 apache-mime4j-dom-0.8.5.jar
 commons-codec-1.15.jar
-commons-collections4-4.1.jar
-commons-compress-1.21.jar
+commons-collections4-4.4.jar
+commons-compress-1.22.jar
 commons-io-2.11.0.jar
-commons-lang3-3.9.jar
+commons-lang3-3.12.0.jar
 commons-logging-1.2.jar
 commons-math3-3.6.1.jar
 fontbox-2.0.27.jar
@@ -16,11 +16,11 @@
 pdfbox-2.0.27.jar
 plugin-descriptor.properties
 plugin-security.policy
-poi-5.2.2.jar
-poi-ooxml-5.2.2.jar
-poi-ooxml-lite-5.2.2.jar
-poi-scratchpad-5.2.2.jar
-slf4j-api-1.6.2.jar
+poi-5.2.3.jar
+poi-ooxml-5.2.3.jar
+poi-ooxml-lite-5.2.3.jar
+poi-scratchpad-5.2.3.jar
+slf4j-api-2.0.3.jar
 tagsoup-1.2.1.jar
 tika-core-2.6.0.jar
 tika-langdetect-tika-2.6.0.jar
@@ -33,5 +33,5 @@
 tika-parser-xml-module-2.6.0.jar
 tika-parser-xmp-commons-2.6.0.jar
 tika-parser-zip-commons-2.6.0.jar
-xmlbeans-5.0.3.jar
-xz-1.8.jar
+xmlbeans-5.1.1.jar
+xz-1.9.jar
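
A listing diff like the one above can be reproduced from the file names of two unpacked module directories. The following is a rough sketch using Python's standard `difflib`; `listing_diff` is a hypothetical helper, and how the two listings are obtained (for example with `os.listdir` on each directory) is left to the caller.

```python
import difflib


def listing_diff(before_names, after_names):
    """Return unified-diff lines comparing two directory listings.

    Both listings are sorted first so the result does not depend on the
    order in which the file system returned the names.
    """
    return list(
        difflib.unified_diff(
            sorted(before_names),
            sorted(after_names),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )
```

Lines prefixed with `-` are jars present only in the old distribution, and lines prefixed with `+` are jars present only in the new one.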

Most of these changes seem unremarkable. The exception is slf4j-api, which jumps a major version, from 1.6.2 to 2.0.3.
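
To check that claim mechanically, the removed and added jar names from the diff can be compared by their leading version component. The following is a small sketch, assuming jar files are named `<artifact>-<version>.jar` with a version that starts with a digit; `split_jar` and `major_version_bumps` are hypothetical helper names.

```python
import re

# Assumption: the version is the last hyphen-separated part and starts with a digit.
JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def split_jar(filename):
    """Split 'poi-ooxml-5.2.3.jar' into ('poi-ooxml', '5.2.3'); None if no match."""
    match = JAR_NAME.match(filename)
    return (match["name"], match["version"]) if match else None


def major_version_bumps(removed, added):
    """Return {artifact: (old, new)} where the first version component changed."""
    old = dict(pair for pair in map(split_jar, removed) if pair)
    new = dict(pair for pair in map(split_jar, added) if pair)
    bumps = {}
    for name, old_version in old.items():
        new_version = new.get(name)
        if new_version and old_version.split(".")[0] != new_version.split(".")[0]:
            bumps[name] = (old_version, new_version)
    return bumps
```

Fed the `-` and `+` jar names from the diff above, this flags only slf4j-api (1.6.2 to 2.0.3). It compares the leading component as a string and does not interpret version qualifiers.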

This commit doesn't change any dependencies; it only changes
where those dependency versions come from.
@joegallo added the >bug, WIP, :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), Team:Data Management (Meta label for data/management team), v8.7.0, and v8.8.0 labels Feb 13, 2023
@joegallo marked this pull request as ready for review February 13, 2023 19:56
@joegallo removed the WIP label Feb 13, 2023
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @joegallo, I've created a changelog YAML for you.

@rjernst (Member) left a comment

LGTM

@joegallo merged commit e907d89 into elastic:main Feb 13, 2023
@joegallo deleted the ingest-attachment-tika-version-260 branch February 13, 2023 20:41
carlosdelest pushed a commit to carlosdelest/elasticsearch that referenced this pull request Feb 21, 2023