-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FARMClassifier node for Document Classification #1265
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool addition! I can see many use cases at query - but also at indexing time. Left a few minor comments.
haystack/classifier/base.py
Outdated
return wrapper | ||
|
||
def print_time(self): | ||
print("Ranker (Speed)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left overs from "Ranker copy"
haystack/classifier/base.py
Outdated
return_preds: bool = False, | ||
) -> dict: | ||
""" | ||
Performs evaluation of the Ranker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the eval
method from the base class if we don't have it implemented in any child class
p = Pipeline() | ||
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"]) | ||
p.add_node(component=classifier, name="Classifier", inputs=["ESRetriever"]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it actually also work to use this node in an indexing pipeline? I mean something like FileConverter->Preprocessor->Classifier->DocStore
So we would basically append meta data to the docs at indexing time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't have to be part of this PR if it requires bigger changes, but maybe you can document what's missing for that use case and create a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue here: #1281
|
||
- Take a plain language model (e.g. `bert-base-cased`) and train it for TextClassification | ||
- Take a TextClassification model and fine-tune it for your domain | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some info about the expected format of the train file (csv, what columns ...)
haystack/classifier/farm.py
Outdated
dev_split=dev_split, | ||
test_filename=test_filename, | ||
data_dir=Path(data_dir), | ||
delimiter="\t" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should also add the delimiter
as an option to the init?
haystack/classifier/farm.py
Outdated
""" | ||
Use loaded classification model to classify the supplied list of Document. | ||
|
||
Returns list of Document enriched with classification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add here the info where the classification result is stored (Document.meta["classification"])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Proposed changes:
meta
field**Status **: