feat: MultiModalRetriever #2891

Merged: 164 commits, merged Oct 17, 2022


ZanSara (Contributor) commented on Jul 27, 2022

Related Issue(s):

Proposed changes:

  • Create a multi-modal retriever by generalizing the concepts introduced by TableTextRetriever.
  • Introduce a stack of new supporting subclasses for this retriever, such as MultiModalEmbedder.
  • Note that this Retriever will NOT be tested in pipelines, only in isolation. It will also most likely stay undocumented. See Add support for images #2418 for the rationale.
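
To make "generalizing TableTextRetriever" concrete, here is a minimal sketch of an embedder that holds one encoder per content type and embeds a mixed batch of documents by routing each document to its encoder. `ToyDocument`, `MultiModalEmbedderSketch` and the toy encoders are hypothetical stand-ins, not Haystack's `MultiModalEmbedder` API; real encoders would be transformer models rather than character statistics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical document type: raw content plus a content type tag such as "text" or "table".
@dataclass
class ToyDocument:
    content: str
    content_type: str

# An encoder turns a batch of raw contents into a batch of embedding vectors.
Encoder = Callable[[Sequence[str]], List[List[float]]]


class MultiModalEmbedderSketch:
    """Hypothetical sketch: one encoder per content type instead of a fixed trio of models."""

    def __init__(self, encoders: Dict[str, Encoder]):
        self.encoders = encoders

    def embed_documents(self, docs: Sequence[ToyDocument]) -> List[List[float]]:
        # Group document positions by content type so each encoder receives a single batch.
        groups: Dict[str, List[int]] = {}
        for i, doc in enumerate(docs):
            if doc.content_type not in self.encoders:
                raise ValueError(f"No encoder registered for content type '{doc.content_type}'")
            groups.setdefault(doc.content_type, []).append(i)

        # Encode each group, then write the vectors back in the original document order.
        embeddings: List[List[float]] = [[] for _ in docs]
        for content_type, positions in groups.items():
            vectors = self.encoders[content_type]([docs[i].content for i in positions])
            for i, vector in zip(positions, vectors):
                embeddings[i] = vector
        return embeddings


# Toy encoders standing in for real text and table models (character statistics, not semantics).
def toy_text_encoder(batch: Sequence[str]) -> List[List[float]]:
    return [[float(len(s)), float(s.count(" "))] for s in batch]


def toy_table_encoder(batch: Sequence[str]) -> List[List[float]]:
    return [[float(s.count("|")), float(s.count("\n"))] for s in batch]


embedder = MultiModalEmbedderSketch({"text": toy_text_encoder, "table": toy_table_encoder})
docs = [
    ToyDocument("a b", "text"),
    ToyDocument("x|y\n1|2", "table"),
    ToyDocument("hello", "text"),
]
vectors = embedder.embed_documents(docs)
print(vectors)  # [[3.0, 1.0], [2.0, 1.0], [5.0, 0.0]]
```

In this sketch, supporting another content type only means adding one more entry to the encoder mapping.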

Additional context:

  • As mentioned in the original issue, an attempt to generalize TableTextRetriever quickly proved too complex for the scope of this PR.
  • Rather than modifying an existing Retriever and risking breaking working code, I opted to clone the class and its stack of supporting classes, then make the changes needed to support N models rather than just 3.
  • A later goal is to perform table retrieval with MultiModalRetriever and use its stack to dispose of TriAdaptiveModel, BiAdaptiveModel and possibly AdaptiveModel itself, along with their respective helpers (custom prediction heads, custom processors, etc.).
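
Once every content type is embedded into the same vector space as the query (an assumption of dense retrieval, not something this PR spells out), the scoring step does not need to know how many encoders exist: it scores every document embedding against the query embedding. The sketch below assumes dot-product similarity over precomputed embeddings; `dot`, `retrieve_top_k` and the toy vectors are hypothetical, not Haystack's API.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("Embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    query_embedding: Sequence[float],
    document_embeddings: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the top_k highest-scoring documents."""
    scored = ((i, dot(query_embedding, emb)) for i, emb in enumerate(document_embeddings))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy embeddings: the content types differ, but all vectors live in one shared space.
doc_types = ["text", "table", "image"]
doc_embeddings = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
results = retrieve_top_k([1.0, 0.0], doc_embeddings, top_k=2)
print([(doc_types[i], score) for i, score in results])  # [('text', 0.9), ('image', 0.5)]
```

Here the number of content types only affects how the embeddings were produced, not how they are ranked.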

Additional changes:

  • I soon realized that image support requires generalizing the concept of a tokenizer. So I renamed haystack/modeling/model/tokenization.py -> haystack/modeling/model/feature_extraction.py, created a class called FeatureExtractor, and used it as a uniform interface over AutoTokenizer and AutoFeatureExtractor.
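
The point of a uniform interface is that callers make the same call whether the underlying preprocessor handles text or images. A minimal sketch of that delegation pattern, assuming the backend is chosen by modality: `UniformFeatureExtractor`, `ToyTokenizer` and `ToyImageProcessor` are hypothetical and do not call transformers' AutoTokenizer or AutoFeatureExtractor; they only mimic the dict-shaped outputs (`input_ids`, `pixel_values`) such preprocessors typically return.

```python
from typing import Any, Dict, List


class ToyTokenizer:
    """Stand-in for a text tokenizer: splits on whitespace and assigns ids to new words."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def __call__(self, text: str) -> Dict[str, Any]:
        ids = [self.vocab.setdefault(word, len(self.vocab)) for word in text.split()]
        return {"input_ids": ids}


class ToyImageProcessor:
    """Stand-in for an image feature extractor: rescales 0-255 pixel values to [0, 1]."""

    def __call__(self, pixels: List[int]) -> Dict[str, Any]:
        return {"pixel_values": [p / 255 for p in pixels]}


class UniformFeatureExtractor:
    """Hypothetical sketch: a single entry point that delegates to a modality-specific backend."""

    BACKENDS: Dict[str, type] = {"text": ToyTokenizer, "image": ToyImageProcessor}

    def __init__(self, modality: str) -> None:
        if modality not in self.BACKENDS:
            raise ValueError(f"Unsupported modality '{modality}'")
        self.modality = modality
        self.backend = self.BACKENDS[modality]()

    def __call__(self, inputs: Any) -> Dict[str, Any]:
        # Callers use the same call regardless of whether the backend handles text or images.
        return self.backend(inputs)


text_extractor = UniformFeatureExtractor("text")
image_extractor = UniformFeatureExtractor("image")
print(text_extractor("a b a"))  # {'input_ids': [0, 1, 0]}
print(image_extractor([0, 255]))  # {'pixel_values': [0.0, 1.0]}
```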

Pre-flight checklist

  • I have read the contributors guidelines
  • If this is a code change, I added tests or updated existing ones
  • If this is a code change, I updated the docstrings

Review threads (resolved) on:
  • haystack/modeling/model/feature_extraction.py (outdated)
  • haystack/modeling/model/multimodal/transformers.py (outdated)
  • haystack/modeling/model/multimodal/base.py
  • haystack/nodes/retriever/multimodal/embedder.py (outdated)
  • haystack/document_stores/memory.py
vblagoje self-requested a review on Oct 14, 2022
4 participants