Improve dependency management #1979
Comments
@tholor @oryx1729 @julian-risch @tstadel @askainet @brandenchan @bogdankostic @lalitpagaria: Let me know what you think about the dependency groups and whether you have any opinions on the topic 🙂
I feel many groups may confuse the users. Small options might be good like
Not related to this task, but how about a CLI utility that takes a pipeline YAML as input and lists or installs the dependencies required to run that pipeline smoothly?
I agree with @lalitpagaria and would reduce the groups a bit:
minimal: basic Haystack on CPU with one single document store (inMemory maybe)
Not sure how many dependencies we have for "preprocessing / conversion", but there might also be an extra category "preprocessing".
Thinking about which version is the "default", we might also consider something between "minimal" and "all". Maybe really just calling it internally
Also: I think once we have the basic structure implemented it will be rather easy to extend the list of options here if we see the need.
Thank you both for the feedback! I'm ok reducing the groups of course. I like @tholor's list except for the demo related deps: I think many people would like to use the REST API with their own frontend, so I'd rather keep Proposed list:
That makes for 15(16) categories. Many indeed, but most users will not need to know about them anyway. Note also that this syntax
Hello, it would make sense to me for transformers and torch to be among the optional dependencies: one might want to use their own code to test certain embeddings with incompatible versions of torch, while still respecting the fact that you need to pass numpy arrays to the document store query, for instance.
Unfortunately I don't think it is an option. This doesn't have to do with your idea (in principle it's not bad), but with the way pip and Python's dependency management work right now. The issue here is that, currently, there is no way to specify "opt-out" dependencies in a
If by any chance you know a good way to implement opt-out dependencies, I'll be glad to learn about it! Unfortunately, after a few days of research I came back empty-handed (see pypa/setuptools#1503, pypa/setuptools#1139). I even experimented with custom
The current handling of dependencies is quite monolithic: users must install them all regardless of the subset of features they want to use. We should make Haystack more modular at install time.
Options
Nowadays there are several ways to properly handle dependency groups:
- `requirements.txt` files: quite old-fashioned by now and a bit harder to manage
- `extras_require` in `setup.py`: the "traditional" way, safe and widely used
- `pyproject.toml`: the new way, as recommended by PEP 517 and PEP 660.
Proposed dependency groups
- `minimal`: basic Haystack on CPU with one single document store (inMemory maybe)
- `gpu`: for running Haystack on GPU
- `rest`: also install the REST server API deps
- `ui`: install Streamlit deps
- `demo`: `rest` + `ui`
- `ci`: for GitHub runners
- `win`: for Windows installs (if possible)
- `colab`: to work around Colab-specific issues when necessary
- `all_doc_stores`: install all possible dependencies from document stores
- `test`: for the test dependencies
- `docs`: for building documentation
- `code`: black, linter and possible extra tools if/when we introduce them
- `all` (or `dev`): complete dependency list for development and contributing. Includes all of the above.
We can also consider adding smaller groups for special components with exotic dependencies, like `crawler`, `ocr`, etc.
Default install
It's up for debate what the default install (`pip install haystack`) should look like.
The important point is that the dependencies installed in this case are necessarily mandatory. This is at least the case for `extras_require` in `setup.py`, and might have changed in `pyproject.toml`. If it still holds, the default install should effectively be a minimal install. For example, if we include GPU deps in this group, they will become mandatory, and a pure CPU install will be impossible.
I will investigate the options and update this section with new information.
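To make the mandatory vs. optional distinction concrete, here is a minimal sketch of how the proposed groups could be expressed for `extras_require`. The group names follow the proposal above; the package lists, `INSTALL_REQUIRES`, `BASE_EXTRAS` and the `build_extras` helper are hypothetical and only for illustration, not the actual Haystack dependency list.

```python
from typing import Dict, List

# Hypothetical mandatory dependencies: everything listed here is always
# installed, so this list should stay minimal (CPU-only, no GPU packages).
INSTALL_REQUIRES: List[str] = ["numpy"]

# Hypothetical optional groups; the package names are illustrative only.
BASE_EXTRAS: Dict[str, List[str]] = {
    "rest": ["fastapi", "uvicorn"],
    "ui": ["streamlit"],
    "test": ["pytest"],
    "docs": ["mkdocs"],
}


def build_extras(base: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of `base` with the composite groups `demo` and `all` added."""
    extras = {name: list(deps) for name, deps in base.items()}
    # `demo` is defined in the proposal as `rest` + `ui`.
    extras["demo"] = sorted(set(base["rest"]) | set(base["ui"]))
    # `all` is the deduplicated union of every base group.
    extras["all"] = sorted({dep for deps in base.values() for dep in deps})
    return extras


EXTRAS_REQUIRE = build_extras(BASE_EXTRAS)

# In setup.py these would be passed to setuptools roughly as:
# setup(..., install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

With a layout like this, `pip install haystack` would pull in only `INSTALL_REQUIRES`, while `pip install haystack[rest]` would add the `rest` group on top. Building `demo` and `all` from the base groups avoids maintaining the same package list in several places.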
Related issues
Related to #1291, #1716, #1826, #1806
Closes #1070
Next steps
`pyproject.toml` and whether all of our dependencies can actually work with it. As of last year there were still some issues with large libraries that needed complex build steps.