Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: update dependencies and remove pins #3147

Merged
merged 9 commits into from
Sep 5, 2022

Conversation

danielbichuetti
Copy link
Contributor

Related Issues

Proposed Changes:

After a recent issue opened on the repository I decided to take a look at the dependencies that are pinned. After discovering the reasons why each one has been pinned, I started looking over the dependency chain and dependency releases. Indeed, some contain bug fixes, better memory management, performance modifications, and some refactored code.

Containers versions used in samples and tests have been looked over for newer versions, even if this doesn't affect haystack usage, some users may erroneously think that the outdated versions are a need. Despise been not.

So, I updated all possible versions that will fit in the whole project dependency chain.

How did you test it?

Repository tests have been run here: https://github.com/danielbichuetti/haystack/actions/runs/2988323504

Notes for the reviewer

Checklist

@danielbichuetti danielbichuetti requested review from a team as code owners September 4, 2022 13:51
@danielbichuetti danielbichuetti requested review from ZanSara and removed request for a team September 4, 2022 13:51
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thanks for another important change like this 😊 However there are some things to fix.

  1. Please revert the doc changes. It's mostly indentation, and in some places it looks really odd. I'd prefer to keep the old version.

  2. Some of the removed pins are a bit dangerous, so let's dig into how you tested them. I'll do my tests separately to verify it's all good before approving.

  3. I've seen you upgraded some containers, but the upgrades don't match well across the tutorials, the tests and Haystack utils. Let's make sure all versions are all in sync everywhere.

I've added some more specific questions in the comments.

pyproject.toml Outdated Show resolved Hide resolved
@@ -62,9 +62,9 @@ dependencies = [
"mmh3", # fast hashing function (murmurhash3)
"quantulum3", # quantities extraction from text
"posthog", # telemetry
"azure-ai-formrecognizer==3.2.0b2", # forms reader
"azure-ai-formrecognizer>=3.2.0b2", # forms reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, have you actually tried to install a later version of this package and run tests and tutorials with that? I have a vague memory that we pinned this beta release for a good reason, even though I can't recall why right now. @bogdankostic do you know more about this?

Copy link
Contributor Author

@danielbichuetti danielbichuetti Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZanSara When the code has been last updated, the azure-ai-formrecognizer latest version available was b2. The other versions would not have the DocumentAnalysisClient available.

from azure.ai.formrecognizer import DocumentAnalysisClient, AnalyzeResult

pip would not force a beta over a GA. So, without the pin, 3.1.2 was being chosen. The new versions all have DocumentAnalysisClient available and some improvements (support for new API version). Despise MS released latest as GA, pip has yet the b6 tag there.

# audio's espnet-model-zoo requires huggingface-hub version <0.8 while we need >=0.5 to be able to use create_repo in FARMReader
"huggingface-hub<0.8.0,>=0.5.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you test this, specifically? Did the audio nodes install and work properly once you installed without this pin?

Copy link
Contributor Author

@danielbichuetti danielbichuetti Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Audio tests are had been run manually. Furthermore, going to the espnet-model-zoo, you will see that there is no need for lower hub versions anymore. Isn't there on CI any audio tests also? I don't remember.

Anyway, I have run then locally.

test/conftest.py Outdated Show resolved Hide resolved
@masci
Copy link
Contributor

masci commented Sep 5, 2022

Please revert the doc changes. It's mostly indentation, and in some places it looks really odd. I'd prefer to keep the old version.

This comes along with the latest versions of pydoc-markdown, it's something we need to live with eventually.

@danielbichuetti
Copy link
Contributor Author

danielbichuetti commented Sep 5, 2022

@ZanSara The documentation update is the new pydoc-markdown standard, it wasn't done by me (directly). It's intention, I suppose, is to make documentation easier to read as we will get one parameter per line. User can see easily know how the methods are called.

@ZanSara
Copy link
Contributor

ZanSara commented Sep 5, 2022

@masci @danielbichuetti thank you for the clarification on the docs format change. I didn't know about it. Let's keep it then! 👍

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary for future reference:

  • Tika: upgraded to 1.28.4
  • Opensearch: upgraded to 1.3.5
  • Weaviate: upgrade leftover from 1.11 to 1.14

Looks good to me! 👍 As soon as the CI completes this can be merged.

@ZanSara ZanSara merged commit e1f3992 into deepset-ai:main Sep 5, 2022
@danielbichuetti danielbichuetti deleted the remove_pins branch September 5, 2022 13:02
@nickchomey
Copy link
Contributor

nickchomey commented Sep 5, 2022

@danielbichuetti can I ask why Tika wasn't changed to 2.x/latest? If there's an incompatibility, perhaps a note can be left somewhere in the code or documentation to prevent changing it? Or maybe an issue opened to invite/track work to make Haystack compatible with it? Surely there's a massive difference between 1.28 and 2.x

@danielbichuetti
Copy link
Contributor Author

For further notes regarding Tika 2.x update:

  • The client module, tika-python, doesn't support Tika 2.x. First, it was because there were security concerns on the Tika implementation, Tika then updated, but the module has not been updated yet.

Here is the issue that tracks the update on the module repo:

chrismattmann/tika-python#359

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pinned outdated dependencies
5 participants