Refactor REST APIs to use Pipelines #922
Conversation
@oryx1729 Really great changes 🚀
    export_data.append(feedback)

    if document is None:
        raise HTTPException(
            status_code=500, detail=f"Could not find document with id {label.document_id} for label id {label.id}"
        )
Just a nit: an HTTPStatus.NOT_FOUND status would be more suitable here.
HTTP 4xx error codes are client-side errors, i.e., a client requesting a resource with an ID that does not exist. This means the client can ensure the correctness of the resource identifier and try again.
In this case, it's a server-side error where data is inconsistent, i.e., a label has an incorrect document ID.
I misunderstood; I thought the doc id was user-supplied. We can replace this with HTTPStatus.INTERNAL_SERVER_ERROR for more readability.
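For context on the suggestion above: HTTPStatus members are IntEnum values that compare equal to the bare integers, so swapping them in is purely a readability change. A minimal sketch (names here are illustrative, not from the PR):

```python
from http import HTTPStatus

# HTTPStatus members behave like ints, so existing comparisons and
# FastAPI's status_code parameter keep working unchanged.
server_error = HTTPStatus.INTERNAL_SERVER_ERROR
not_found = HTTPStatus.NOT_FOUND

# The enum also carries a human-readable phrase, handy for logging.
phrase = server_error.phrase
```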
    export = {"data": export_data}

    with open("feedback_squad_direct.json", "w", encoding="utf8") as f:
        json.dump(export_data, f, ensure_ascii=False, sort_keys=True, indent=4)
The API caller/client can sort the keys and add indentation if they like; we can free the server from spending time on sorting and reduce the payload size by not adding indentation :)
I guess it's a relatively small performance penalty for the added readability & convenience.
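To illustrate the trade-off being discussed, a small sketch (with made-up feedback records) comparing compact vs. pretty-printed JSON; both forms deserialize to the same data, only size and readability differ:

```python
import json

# Made-up feedback records, purely illustrative.
export_data = [
    {"question": "What is Haystack?", "answer": "A QA framework."},
    {"question": "Who maintains it?", "answer": "deepset."},
]

# Compact form: minimal separators, no indentation.
compact = json.dumps(export_data, ensure_ascii=False, separators=(",", ":"))

# Pretty form: sorted keys and 4-space indent, as in the code above.
pretty = json.dumps(export_data, ensure_ascii=False, sort_keys=True, indent=4)
```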
    ):
        if not INDEXING_PIPELINE:
            raise HTTPException(status_code=501, detail="Indexing Pipeline is not configured.")
HTTPStatus.NOT_IMPLEMENTED
HTTP 501 represents "not implemented". Do you mean changing 501 to NOT_IMPLEMENTED for code readability?
Yes
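A minimal sketch of the guard above rewritten with the named constant; `INDEXING_PIPELINE` and the helper function are stubbed here for illustration only:

```python
from http import HTTPStatus

# Stub standing in for the real module-level pipeline object.
INDEXING_PIPELINE = None

def indexing_status():
    # HTTPStatus.NOT_IMPLEMENTED == 501, matching the bare integer above.
    if not INDEXING_PIPELINE:
        return HTTPStatus.NOT_IMPLEMENTED, "Indexing Pipeline is not configured."
    return HTTPStatus.OK, "ok"
```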
        results.append(result)
        elasticapm.set_custom_context({"results": results})

    class Request(BaseModel):
We can add schema_extra to show a sample example for the request and response in Swagger: https://fastapi.tiangolo.com/tutorial/schema-extra-example/
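A minimal sketch of that suggestion, following the linked FastAPI tutorial (pydantic v1-style Config; the example payload is made up, not from the PR):

```python
from pydantic import BaseModel

class Request(BaseModel):
    query: str

    class Config:
        # Embedded verbatim in the generated OpenAPI schema, so Swagger UI
        # can pre-fill a sample request body for the endpoint.
        schema_extra = {"example": {"query": "Who is the father of Arya Stark?"}}
```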
    version: '0.7'
I don't see anywhere that we are checking the version.
The plan is to implement that in the next iterations. It's still useful to have it for documentation purposes, or to correlate issues with the Haystack version.
I cannot comment in detail about the API changes, but conceptually and in naming it all makes sense.
I think I discovered a bug about feedback extraction.
[optional] What about the TODO in our tests: "Add integration tests for other APIs"? What about a test for feedback export? Since this PR will change quite a lot we might want to increase our testing.
Tested this pipeline way of doing ExtractiveQA locally, sounds good ✅ to me:

    - name: Reader    # custom-name for the component; helpful for visualization & debugging
      type: TransformersReader    # Haystack Class name for the component
      params:
        model_name_or_path: etalab-ia/camembert-base-squadFR-fquad-piaf
        tokenizer: etalab-ia/camembert-base-squadFR-fquad-piaf
        top_k_per_candidate: 4
        return_no_answers: False
        use_gpu: -1

A single testing note: in docker-compose I needed

    elasticsearch:
      container_name: elasticsearch

Why? Because the connection to the ES container needs to match the exact same name as in

    components:    # define all the building-blocks for Pipeline
      - name: ElasticsearchDocumentStore
        type: ElasticsearchDocumentStore
        params:
          host: elasticsearch

which is not the default "localhost" in my case.
Offsets are now corrected.
It is difficult for me to judge all the changes. I see you removed some tests altogether (ES DSL tests). Do you think we have enough tests for this rather complex change? If you think so, let's merge soon and adjust if problems occur before our release next week.
Hi @guillim, thank you for reviewing this PR and spotting the docker-compose error. It should now be resolved.
This PR introduces significant refactoring of the REST APIs. Under the hood, the APIs now use Pipelines for both Query & Indexing (upload of files).

Changes

- The /doc-qa & /faq-qa endpoints are deprecated and replaced with the new /query endpoint. Query processing can be configured in the rest_api/pipeline.yaml file.
- The /query endpoint accepts a single query string in the request and returns a list of answers (or documents) for the given query. Multiple queries in a single request are not supported.
- A FileTypeClassifier node is introduced to route files to the corresponding file converters. This node can be extended to use different converters tailored for specific use cases.
- The /doc-qa-feedback & /faq-qa-feedback endpoints are replaced with a new generic /feedback endpoint.
- The Feedback APIs now allow exporting negative labels as well.
- The Streamlit UI is updated to work with the new Query APIs.
- The ElasticDSL endpoints are removed. They could potentially be reimplemented as a node in the query pipeline.
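Based on the description above, a client would send a single query string to the new /query endpoint. A minimal sketch of such a request body (field names other than "query" are assumptions, not taken from the PR):

```python
import json

# Hypothetical request body for the new /query endpoint.
request_body = {"query": "Who is the father of Arya Stark?"}

# A client would POST this JSON to /query and receive a list of answers
# (or documents) back; here we only check the payload round-trips cleanly.
serialized = json.dumps(request_body)
roundtrip = json.loads(serialized)
```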