Redesign primitives #1398
@lalitpagaria Thanks for your comments! This is still a super early draft. We will need more discussion and iteration over the next week to arrive at a nice design. It's one of the few remaining things we want to get right before the 1.0 release :)
Oh great! So Haystack 1.0 is around the corner 🎉
…content_field in docstores. Update tutorials for content field
I focused a bit more on small things this time because the diff is huge! I found nothing that I believe might be a bug; all comments are about details, style, and such. BTW, I can help fix many of them if you need.
Proposed changes:
Tasks:
- `Answer` to schema
- `Answer` fields
- `Label` + `Multilabel`
- `Document`
Status (please check what you already did):
Breaking changes
There are many, so we should probably write a "migration guide" when we release 1.0.
Here are the central ones:
Document
- `Document.text` -> `Document.content`
- removed `Document.question` (was only used a while ago in FAQ search cases)
- `text_field` -> `content_field` in ElasticsearchDocumentStore & Weaviate init
- removed `faq_question_field` in ElasticsearchDocumentStore & Weaviate init

Label
- `Label.question` -> `Label.query`
- `Label.answer` is now an `Answer` object rather than a plain `str`
- removed `Label.document_id` (can now be accessed via `Label.document.id`)
- `Label.model_id` -> `Label.pipeline_id`
- removed `Label.offset_start_in_doc` (can now be accessed via `label.answer.offsets_in_document[0].start`)

Answer
The reader now returns an `Answer` object rather than a dict. It follows this new structure:
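A minimal sketch of what such a structure could look like, written as plain Python dataclasses. Only `Answer`, `offsets_in_document` and the `.start` attribute are taken from this PR; the `Span` class and the remaining fields (`type`, `score`, `context`, `offsets_in_context`, `document_id`, `meta`) are assumptions for illustration, not the actual Haystack definition.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    # Assumed convention: start is inclusive, end is exclusive (Python slicing).
    start: int
    end: int


@dataclass
class Answer:
    answer: str                                        # the answer string itself
    type: str = "extractive"                           # assumed values: "extractive" / "generative" / "other"
    score: Optional[float] = None
    context: Optional[str] = None                      # text surrounding the answer
    offsets_in_document: Optional[List[Span]] = None   # positions in the full document
    offsets_in_context: Optional[List[Span]] = None    # positions within `context`
    document_id: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


doc_text = "Berlin is the capital of Germany."
ans = Answer(
    answer="Berlin",
    score=0.93,
    context=doc_text,
    offsets_in_document=[Span(start=0, end=6)],
    offsets_in_context=[Span(start=0, end=6)],
    document_id="doc-1",
)
span = ans.offsets_in_document[0]
print(doc_text[span.start:span.end])  # Berlin
```

Storing offsets as a list of spans, rather than a single start integer, is what lets one answer point at several places in a document.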
In particular, the handling of offsets has changed to be more explicit and to allow for multiple spans (e.g. TableQA):
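To illustrate the multiple-span case, here is a hedged sketch using a simplified, assumed `Answer`/`Span` pair. `span_for` and `legacy_offset_start` are hypothetical helpers; the latter shows how code that relied on the removed `Label.offset_start_in_doc` could read `answer.offsets_in_document[0].start` instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    start: int  # inclusive character offset (assumed convention)
    end: int    # exclusive character offset (assumed convention)


@dataclass
class Answer:
    answer: str
    offsets_in_document: Optional[List[Span]] = None


def span_for(text: str, substring: str) -> Span:
    """Hypothetical helper: span of the first occurrence of substring in text."""
    start = text.find(substring)
    if start < 0:
        raise ValueError(f"{substring!r} not found")
    return Span(start=start, end=start + len(substring))


def legacy_offset_start(answer: Answer) -> Optional[int]:
    """Hypothetical shim: the value the old offset_start_in_doc field would have held."""
    if not answer.offsets_in_document:
        return None
    return answer.offsets_in_document[0].start


doc_text = "Founders: Alice (CEO) and Bob (CTO)."
# One answer backed by two non-contiguous spans, as an answer
# assembled from several table cells might be.
ans = Answer(
    answer="Alice, Bob",
    offsets_in_document=[span_for(doc_text, "Alice"), span_for(doc_text, "Bob")],
)
print([doc_text[s.start:s.end] for s in ans.offsets_in_document])  # ['Alice', 'Bob']
print(legacy_offset_start(ans))  # 10
```

A single start offset can only describe the first span; every span after it would be lost, which is the motivation for an explicit list.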
Future work
- `Answer` primitive #1582
- Fix autocomplete for `Answer` (dataclasses-json seems to break it -> probably switch to manual serialization)
- `Document`s instead of Dicts
- `Document`s instead of Dicts from PreProcessor
- `Answer`
- `index` to group Labels and Documents more closely (#1563)
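For the autocomplete item, one option is to drop the dataclasses-json decorator and write the (de)serialization by hand. A minimal sketch, assuming a reduced `Answer` with a nested `Span` and using only the standard library (`dataclasses.asdict`, `json`); the method names `to_dict`, `from_dict`, `to_json` and `from_json` are illustrative choices, not a confirmed Haystack API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    start: int
    end: int


@dataclass
class Answer:
    answer: str
    score: Optional[float] = None
    offsets_in_document: Optional[List[Span]] = None
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses inside lists, so Spans become plain dicts.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Answer":
        # Rebuild nested Span objects explicitly instead of relying on a decorator.
        offsets = data.get("offsets_in_document")
        return cls(
            answer=data["answer"],
            score=data.get("score"),
            offsets_in_document=[Span(**o) for o in offsets] if offsets is not None else None,
            meta=dict(data.get("meta") or {}),
        )

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, payload: str) -> "Answer":
        return cls.from_dict(json.loads(payload))


original = Answer(answer="Berlin", score=0.9, offsets_in_document=[Span(0, 6)])
restored = Answer.from_json(original.to_json())
print(restored == original)  # True
```

Because these methods are ordinary definitions in the class body rather than attributes attached by a decorator, static analysis tools can see them, which is presumably what autocomplete needs.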