Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign primitives #1398

Merged
merged 140 commits into from
Oct 13, 2021
Merged

Redesign primitives #1398

merged 140 commits into from
Oct 13, 2021

Conversation

tholor
Copy link
Member

@tholor tholor commented Sep 1, 2021

Proposed changes:

Tasks:

  • Move Answer to schema
  • Review Answer fields
  • Rename "question" to "query" everywhere
  • Review Label + Multilabel
  • Re-use labels in API instead of ExtractiveQAFeedback
  • Decide if to go for plain class, dataclass, BaseModel or TypedDict
  • Review Document
  • Fix REST API serialization
  • Adjust feedback endpoint in REST API
  • Make behavior of write_labels / indices consistent across doc stores
  • Update Game-of-thrones example docker image to include new doc structure

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

Breaking changes

There are many and we should probably do a "migration guide" when we release 1.0
Here are the central ones:

Document

  • Renamed Document.text -> Document.content
  • Removed Document.question (was only used a while ago in FAQ search cases)
  • Rename text_field -> content_field in ElasticsearchDocumentStore & Weaviate init
  • Remove faq_question_field in ElasticsearchDocumentStore & Weaviate init

Label

  • Rename Label.question -> Label.query
  • Make Label.answer an Answer obj rather than plain str
  • Remove Label.document_id (can now be accessed via Label.document.id)
  • Rename Label.model_id -> Label.pipeline_id
  • Remove Label.offset_start_in_doc (can now be accessed via label.answer.offsets_in_document[0].start`

Answer

The reader returns now an Answer object rather than a dict.
It follows this new structure:

Particularly the handling of offsets has changed to be more explicit and allow for other multiple spans (e.g. TableQA):

# Old -> New 

answer["offset_start_in_doc"] -> answer.offsets_in_document[0].start
answer["offset_end_in_doc"] -> answer.offsets_in_document[0].end

answer["offset_start"] -> answer.offset_in_context[0].start
answer["offset_end"] -> answer.offset_in_context[0].end

Future work

@tholor tholor marked this pull request as draft September 1, 2021 11:51
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
@tholor
Copy link
Member Author

tholor commented Sep 2, 2021

@lalitpagaria Thanks for your comments! This is still a super early draft. We will need more discussions and iterations in the next week to achieve a nice design. It's one of the few remaining things we want to get straight before the 1.0 release :)

@lalitpagaria
Copy link
Contributor

Oh great! so Haystack 1.0 is around 🎉

haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

haystack/pipeline.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I focused a bit more on small things this time because the diff is huge! I found nothing that I believe might be a bug, all comments are on details, style, and such. BTW on many of them I can help fixing if you need

haystack/document_store/faiss.py Show resolved Hide resolved
haystack/document_store/sql.py Show resolved Hide resolved
haystack/document_store/weaviate.py Outdated Show resolved Hide resolved
haystack/document_store/weaviate.py Show resolved Hide resolved
haystack/eval.py Show resolved Hide resolved
haystack/schema.py Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Show resolved Hide resolved
haystack/translator/transformers.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change type:refactor Not necessarily visible to the users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants