Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor REST APIs to use Pipelines #922

Merged
merged 18 commits into from
Apr 7, 2021
Merged

Refactor REST APIs to use Pipelines #922

merged 18 commits into from
Apr 7, 2021

Conversation

oryx1729
Copy link
Contributor

@oryx1729 oryx1729 commented Mar 25, 2021

This PR introduces significant refactoring of the REST APIs.

Under-the-hood, the APIs are now using Pipelines for both Query & Indexing(upload of files).

Changes

  1. /doc-qa & /faq-qa endpoints are deprecated. It is replaced with the new /query endpoint. The query processing
    can be configured in the rest_api/pipeline.yaml file.

  2. The /query endpoint request & response accepts a single query string and returns a list of answers(or documents)
    for the given query. Multiple queries in a single request are not supported with the /query endpoint.

  3. A FileTypeClassifier node is introduced to route files to the corresponding file converters. This node can be
    extended to use different converters tailored for specific use cases.

  4. /doc-qa-feedback & /faq-qa-feedback endpoints are replaced with a new generic /feedback endpoint.

  5. Feedback APIs now allow exporting negative labels as well.

  6. Streamlit UI is updated to work with the new Query APIs.

  7. ElasticDSL endpoints are removed. They could potentially be reimplemented as a node in the query pipeline.

@lalitpagaria
Copy link
Contributor

@oryx1729 Really great changes 🚀

export_data.append(feedback)
if document is None:
raise HTTPException(
status_code=500, detail="Could not find document with id {label.document_id} for label id {label.id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nip: HTTPStatus.NOT_FOUND status would be more suitable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP 4x error codes are client-side, i.e., a client requesting a resource with an ID that does not exist. This means that the client can ensure the correctness of the resource identifier and try again.

In this case, it's a server-side error where data is inconsistent, i.e., a label has an incorrect document ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood that doc id is user supplied. We can replace this with HTTPStatus .INTERNAL_SERVER_ERROR for more readability.


export = {"data": export_data}

with open("feedback_squad_direct.json", "w", encoding="utf8") as f:
json.dump(export_data, f, ensure_ascii=False, sort_keys=True, indent=4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API caller/client can sort the keys and add indent if they like, we can free server from spending time on sorting and reduce the payload size by not adding indent :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's a relatively small performance penalty for the added readability & convenience.

):
if not INDEXING_PIPELINE:
raise HTTPException(status_code=501, detail="Indexing Pipeline is not configured.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTPStatus.NOT_IMPLEMENTED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP 501 represents "not implemented". Do you mean changing 501 to NOT_IMPLEMENTED for code readability?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

results.append(result)
elasticapm.set_custom_context({"results": results})

class Request(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add schema_extra to add sample example for request and response on swagger
https://fastapi.tiangolo.com/tutorial/schema-extra-example/

@@ -0,0 +1,45 @@
version: '0.7'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't seeing anywhere we are checking version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to implement in the next iterations. It's still useful to have it for documentation purprose or issues to be able to correlate with the Haystack version.

@oryx1729 oryx1729 changed the title WIP: Refactor REST APIs to use Pipelines Refactor REST APIs to use Pipelines Mar 29, 2021
Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot comment in detail about api changes but from concept and naming all makes sense.

I think I discovered a bug about feedback extraction.

[optional] What about the TODO in our tests: "Add integration tests for other APIs"? What about a test for feedback export? Since this PR will change quite a lot we might want to increase our testing.

rest_api/controller/feedback.py Outdated Show resolved Hide resolved
@Timoeller
Copy link
Contributor

@Armbruj might have encounter a problem in our docker-compose setup in combination with the new rest api pipelines. See #932 for some docker logs errors.

@oryx1729 oryx1729 changed the title Refactor REST APIs to use Pipelines WIP: Refactor REST APIs to use Pipelines Mar 31, 2021
@guillim
Copy link
Contributor

guillim commented Apr 1, 2021

Tested this pipeline way of doing ExtractiveQA locally, sounds good ✅ to me
For info, I had this config for the Reader in the rest_api/pipelines.yaml file :

- name: Reader       # custom-name for the component; helpful for visualization & debugging
    type: TransformersReader    # Haystack Class name for the component
    params:
      model_name_or_path: etalab-ia/camembert-base-squadFR-fquad-piaf
      tokenizer: etalab-ia/camembert-base-squadFR-fquad-piaf
      top_k_per_candidate: 4
      return_no_answers: False
      use_gpu: -1

@guillim
Copy link
Contributor

guillim commented Apr 1, 2021

A single testing note :
I had to add another variable to the Elasticsearch container definition (in the docker-compose.yml file).

  elasticsearch:
    container_name: elasticsearch

Why ? Because the connection to the ES container needs to match the exact same name as in the host in the rest_api/pipelines.yaml file as follows:

components:    # define all the building-blocks for Pipeline
  - name: ElasticsearchDocumentStore
    type: ElasticsearchDocumentStore
    params:
      host: elasticsearch

Which is not the default "localhost" in my case

@oryx1729 oryx1729 requested a review from Timoeller April 7, 2021 11:50
@oryx1729 oryx1729 changed the title WIP: Refactor REST APIs to use Pipelines Refactor REST APIs to use Pipelines Apr 7, 2021
Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offsets are now corrected.

It is difficult for me to judge all the changes. I see you removed some tests alltogether (ES dsl tests). Do you think we have enough tests for this rather complex change? If you think so, lets merge soon and adjust if problems occur before our release next week.

@oryx1729
Copy link
Contributor Author

oryx1729 commented Apr 7, 2021

Hi @guillim, thank you for reviewing this PR and spotting the docker-compose error. It should now be resolved.

@oryx1729 oryx1729 merged commit 8c68699 into master Apr 7, 2021
@oryx1729 oryx1729 deleted the revamp-api branch April 7, 2021 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants