Add FARMClassifier node for Document Classification #1265

julian-risch · 2021-07-09T08:53:19Z

Proposed changes:

Add a FARM classification node that enriches a list of Documents with class probabilities in the meta field.

**Status**:

- First draft (up for discussions & feedback)
- Final code
- Added tests
- Implement evaluation (or remove methods copied from Ranker for now)
- Updated documentation

tholor

Cool addition! I can see many use cases at query time, but also at indexing time. Left a few minor comments.

tholor · 2021-07-13T14:50:03Z

haystack/classifier/base.py

+        return wrapper
+
+    def print_time(self):
+        print("Ranker (Speed)")


Leftovers from the "Ranker copy".

tholor · 2021-07-13T14:51:13Z

haystack/classifier/base.py

+        return_preds: bool = False,
+    ) -> dict:
+        """
+        Performs evaluation of the Ranker.


Let's remove the eval method from the base class if we don't have it implemented in any child class

tholor · 2021-07-13T14:52:37Z

haystack/classifier/farm.py

+    p = Pipeline()
+    p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
+    p.add_node(component=classifier, name="Classifier", inputs=["ESRetriever"])
+


Would it also work to use this node in an indexing pipeline? I mean something like FileConverter -> Preprocessor -> Classifier -> DocStore.

So we would basically append metadata to the docs at indexing time...

It doesn't have to be part of this PR if it requires bigger changes, but maybe you can document what's missing for that use case and create a separate issue.

Created an issue here: #1281
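
A rough sketch of the indexing-time flow discussed in this thread. It does not use the Haystack API: `Document`, `convert`, `preprocess`, `classify`, and the in-memory `doc_store` list are all hypothetical stand-ins, and the keyword rule is a placeholder for a real model. It only illustrates attaching a classification result to `meta` before documents are written to a store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    # Hypothetical stand-in for a Haystack Document
    text: str
    meta: Dict[str, Any] = field(default_factory=dict)


def convert(raw_files: Dict[str, str]) -> List[Document]:
    # Stand-in for a FileConverter: one Document per "file"
    return [Document(text=content, meta={"name": name}) for name, content in raw_files.items()]


def preprocess(docs: List[Document]) -> List[Document]:
    # Stand-in for a Preprocessor: collapse repeated whitespace
    return [Document(text=" ".join(d.text.split()), meta=dict(d.meta)) for d in docs]


def classify(docs: List[Document]) -> List[Document]:
    # Stand-in for the classifier node: a toy keyword rule, not a real model
    for d in docs:
        label = "sports" if "match" in d.text.lower() else "other"
        d.meta["classification"] = {"label": label}
    return docs


doc_store: List[Document] = []  # stand-in for a DocumentStore
raw = {"a.txt": "The  match ended 2:1.", "b.txt": "Quarterly   earnings rose."}
doc_store.extend(classify(preprocess(convert(raw))))
```

With this shape, documents arrive in the store already carrying their label, so filtering on it at query time would need no extra classification step.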

tholor · 2021-07-13T14:54:53Z

haystack/classifier/farm.py

+
+        - Take a plain language model (e.g. `bert-base-cased`) and train it for TextClassification
+        - Take a TextClassification model and fine-tune it for your domain
+


Please add some info about the expected format of the train file (CSV, which columns, ...).
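
To illustrate the kind of information meant here, a minimal sketch that writes and reads back a tab-separated training file. The column names `text` and `label` and the file name `train.tsv` are assumptions for illustration only; the columns actually expected should be confirmed against the FARM processor used in this PR.

```python
import csv
import tempfile
from pathlib import Path

# Two example rows; column names are assumed, not confirmed by this PR
rows = [
    {"text": "The match ended 2:1.", "label": "sports"},
    {"text": "Quarterly earnings rose.", "label": "business"},
]

data_dir = Path(tempfile.mkdtemp())
train_path = data_dir / "train.tsv"

# Write a header line followed by one tab-separated row per example
with open(train_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "label"], delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check that it round-trips
with open(train_path, newline="", encoding="utf-8") as f:
    loaded = [dict(row) for row in csv.DictReader(f, delimiter="\t")]
```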

tholor · 2021-07-13T14:56:05Z

haystack/classifier/farm.py

+            dev_split=dev_split,
+            test_filename=test_filename,
+            data_dir=Path(data_dir),
+            delimiter="\t"


Maybe we should also add the delimiter as an option to the init?
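
A sketch of what that could look like. `ClassifierSketch` and its `processor_kwargs` helper are hypothetical, not the class in this PR; the point is only that the delimiter is accepted at init, defaults to tab, and is forwarded instead of being hard-coded.

```python
from pathlib import Path
from typing import Any, Dict


class ClassifierSketch:
    # Hypothetical class illustrating the suggestion, not the actual FARMClassifier
    def __init__(self, model_name_or_path: str, delimiter: str = "\t"):
        self.model_name_or_path = model_name_or_path
        # Defaulting to tab keeps the current tab-separated behavior
        self.delimiter = delimiter

    def processor_kwargs(self, data_dir: str, train_filename: str) -> Dict[str, Any]:
        # Collect the arguments that would be handed on to the data processor
        return {
            "data_dir": Path(data_dir),
            "train_filename": train_filename,
            "delimiter": self.delimiter,
        }


default_kwargs = ClassifierSketch("bert-base-cased").processor_kwargs("data", "train.tsv")
csv_kwargs = ClassifierSketch("bert-base-cased", delimiter=",").processor_kwargs("data", "train.csv")
```

Existing callers would be unaffected by the default, while comma-separated files would only need `delimiter=","`.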

tholor · 2021-07-13T14:57:37Z

haystack/classifier/farm.py

+        """
+        Use loaded classification model to classify the supplied list of Document.
+
+        Returns list of Document enriched with classification.


Please add the info here on where the classification result is stored (Document.meta["classification"]).

tholor

LGTM

julian-risch added 5 commits July 9, 2021 10:50

Add FARM classification node

0f0b271

Add classification output to meta field of document

c42c9e3

Update usage example

13b9e85

Add test case for FARMClassifier

8020ba2

Replace FARMRanker with FARMClassifier in documentation strings

72badb0

julian-risch changed the title ~~WIP: Add FARM classification node~~ Add FARMClassifier node for Document Classification Jul 9, 2021

julian-risch marked this pull request as ready for review July 9, 2021 15:26

tholor requested changes Jul 13, 2021

Remove base method not implemented by any child class, etc.

decfc96

julian-risch mentioned this pull request Jul 13, 2021

Indexing Pipeline with Document Classifier #1281

Closed

julian-risch requested a review from tholor July 13, 2021 17:55

tholor approved these changes Jul 13, 2021

julian-risch merged commit 4e6f7f3 into master Jul 13, 2021

julian-risch deleted the classification-node branch July 13, 2021 19:44
