Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query response without answers #2161

Merged
merged 18 commits into from
Feb 16, 2022
Merged

Query response without answers #2161

merged 18 commits into from
Feb 16, 2022

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Feb 10, 2022

So far, pipelines returning no answers or no documents in response to a query would break Pydantic's validation of the response in the REST API and break them.

This PR makes sure both fields contain at least an empty list (not None) before returning the response.

Closes #1863

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ZanSara ZanSara marked this pull request as ready for review February 10, 2022 14:54
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some open questions written down in the comments. Other than that, could you please remove the commits about tutorial 6 from this PR?

@@ -384,11 +384,11 @@
"source": [
"# !pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[milvus]\n",
"\n",
"#from haystack.utils import launch_milvus\n",
"#from haystack.document_stores import MilvusDocumentStore\n",
"# from haystack.utils import launch_milvus\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tutorial 6 changes from #2148 made it in here. We shouldn't do that. Could you please remove them?

@@ -116,13 +116,13 @@ See [their docs](https://milvus.io/docs/v1.0.0/milvus_docker-cpu.md) for more de


```python
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[milvus]
# !pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[milvus]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tutorial 6 changes from #2148 made it in here. We shouldn't do that. Could you please remove them?

{"openapi": "3.0.2", "info": {"title": "Haystack-API", "version": "1.0.0"}, "paths": {"/initialized": {"get": {"tags": ["search"], "summary": "Check Status", "description": "This endpoint can be used during startup to understand if the\nserver is ready to take any requests, or is still loading.\n\nThe recommended approach is to call this endpoint with a short timeout,\nlike 500ms, and in case of no reply, consider the server busy.", "operationId": "check_status_initialized_get", "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}}}}, "/hs_version": {"get": {"tags": ["search"], "summary": "Haystack Version", "description": "Get the running Haystack version.", "operationId": "haystack_version_hs_version_get", "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}}}}, "/query": {"post": {"tags": ["search"], "summary": "Query", "description": "This endpoint receives the question as a string and allows the requester to set\nadditional parameters that will be passed on to the Haystack pipeline.", "operationId": "query_query_post", "requestBody": {"content": {"application/json": {"schema": {"$ref": "#/components/schemas/QueryRequest"}}}, "required": true}, "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/QueryResponse"}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}, "/feedback": {"get": {"tags": ["feedback"], "summary": "Get Feedback", "description": "This endpoint allows the API user to retrieve all the\nfeedback that has been sumbitted through the\n`POST /feedback` endpoint", "operationId": "get_feedback_feedback_get", "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}}}, "post": {"tags": ["feedback"], "summary": "Post Feedback", "description": "This endpoint allows the API user to submit feedback on\nan answer for a particular query. For example, the user\ncan send feedback on whether the answer was correct and\nwhether the right snippet was identified as the answer.\nInformation submitted through this endpoint is used to\ntrain the underlying QA model.", "operationId": "post_feedback_feedback_post", "requestBody": {"content": {"application/json": {"schema": {"$ref": "#/components/schemas/LabelSerialized"}}}, "required": true}, "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}, "/eval-feedback": {"post": {"tags": ["feedback"], "summary": "Get Feedback Metrics", "description": "This endpoint returns basic accuracy metrics based on user feedback,\ne.g., the ratio of correct answers or correctly identified documents.\nYou can filter the output by document or label.\n\nExample:\n`curl --location --request POST 'http://127.0.0.1:8000/eval-doc-qa-feedback' --header 'Content-Type: application/json' --data-raw '{ \"filters\": {\"document_id\": [\"XRR3xnEBCYVTkbTystOB\"]} }'`", "operationId": "get_feedback_metrics_eval_feedback_post", "requestBody": {"content": {"application/json": {"schema": {"$ref": "#/components/schemas/FilterRequest"}}}}, "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}, "/export-feedback": {"get": {"tags": ["feedback"], "summary": "Export Feedback", "description": "This endpoint returns JSON output in the SQuAD format for question/answer pairs\nthat were marked as \"relevant\" by user feedback through the `POST /feedback` endpoint.\n\nThe context_size param can be used to limit response size for large documents.", "operationId": "export_feedback_export_feedback_get", "parameters": [{"required": false, "schema": {"title": "Context Size", "type": "integer", "default": 100000}, "name": "context_size", "in": "query"}, {"required": false, "schema": {"title": "Full Document Context", "type": "boolean", "default": true}, "name": "full_document_context", "in": "query"}, {"required": false, "schema": {"title": "Only Positive Labels", "type": "boolean", "default": false}, "name": "only_positive_labels", "in": "query"}], "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}, "/file-upload": {"post": {"tags": ["file-upload"], "summary": "Upload File", "description": "You can use this endpoint to upload a file for indexing\n(see [http://localhost:3000/guides/rest-api#indexing-documents-in-the-haystack-rest-api-document-store]).", "operationId": "upload_file_file_upload_post", "requestBody": {"content": {"multipart/form-data": {"schema": {"$ref": "#/components/schemas/Body_upload_file_file_upload_post"}}}, "required": true}, "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}, "/documents/get_by_filters": {"post": {"tags": ["document"], "summary": "Get Documents", "description": "This endpoint allows you to retrieve documents contained in your document store.\nYou can filter the documents to delete by metadata (like the document's name),\nor provide an empty JSON object to clear the document store.\n\nExample of filters:\n`'{\"filters\": {{\"name\": [\"some\", \"more\"], \"category\": [\"only_one\"]}}'`\n\nTo get all documents you should provide an empty dict, like:\n`'{\"filters\": {}}'`", "operationId": "get_documents_documents_get_by_filters_post", "requestBody": {"content": {"application/json": {"schema": {"$ref": "#/components/schemas/FilterRequest"}}}, "required": true}, "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {"title": "Response Get Documents Documents Get By Filters Post", "type": "array", "items": {"$ref": "#/components/schemas/DocumentSerialized"}}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}, "/documents/delete_by_filters": {"post": {"tags": ["document"], "summary": "Delete Documents", "description": "This endpoint allows you to delete documents contained in your document store.\nYou can filter the documents to delete by metadata (like the document's name),\nor provide an empty JSON object to clear the document store.\n\nExample of filters:\n`'{\"filters\": {{\"name\": [\"some\", \"more\"], \"category\": [\"only_one\"]}}'`\n\nTo get all documents you should provide an empty dict, like:\n`'{\"filters\": {}}'`", "operationId": "delete_documents_documents_delete_by_filters_post", "requestBody": {"content": {"application/json": {"schema": {"$ref": "#/components/schemas/FilterRequest"}}}, "required": true}, "responses": {"200": {"description": "Successful Response", "content": {"application/json": {"schema": {"title": "Response Delete Documents Documents Delete By Filters Post", "type": "boolean"}}}}, "422": {"description": "Validation Error", "content": {"application/json": {"schema": {"$ref": "#/components/schemas/HTTPValidationError"}}}}}}}}, "components": {"schemas": {"AnswerSerialized": {"title": "AnswerSerialized", "required": ["answer"], "type": "object", "properties": {"answer": {"title": "Answer", "type": "string"}, "type": {"title": "Type", "enum": ["generative", "extractive", "other"], "type": "string", "default": "extractive"}, "score": {"title": "Score", "type": "number"}, "context": {"title": "Context", "type": "string"}, "offsets_in_document": {"title": "Offsets In Document", "type": "array", "items": {"$ref": "#/components/schemas/Span"}}, "offsets_in_context": {"title": "Offsets In Context", "type": "array", "items": {"$ref": "#/components/schemas/Span"}}, "document_id": {"title": "Document Id", "type": "string"}, "meta": {"title": "Meta", "type": "object"}}}, "Body_upload_file_file_upload_post": {"title": "Body_upload_file_file_upload_post", "required": ["files"], "type": "object", "properties": {"files": {"title": "Files", "type": "array", "items": {"type": "string", "format": "binary"}}, "meta": {"title": "Meta", "type": "string", "default": "null"}, "remove_numeric_tables": {"title": "Remove Numeric Tables"}, "valid_languages": {"title": "Valid Languages"}, "clean_whitespace": {"title": "Clean Whitespace"}, "clean_empty_lines": {"title": "Clean Empty Lines"}, "clean_header_footer": {"title": "Clean Header Footer"}, "split_by": {"title": "Split By"}, "split_length": {"title": "Split Length"}, "split_overlap": {"title": "Split Overlap"}, "split_respect_sentence_boundary": {"title": "Split Respect Sentence Boundary"}}}, "DocumentSerialized": {"title": "DocumentSerialized", "required": ["content", "content_type", "id", "meta"], "type": "object", "properties": {"content": {"title": "Content", "type": "string"}, "content_type": {"title": "Content Type", "enum": ["text", "table", "image"], "type": "string"}, "id": {"title": "Id", "type": "string"}, "meta": {"title": "Meta", "type": "object"}, "score": {"title": "Score", "type": "number"}, "embedding": {"title": "Embedding", "type": "array", "items": {"type": "number"}}, "id_hash_keys": {"title": "Id Hash Keys", "type": "array", "items": {"type": "string"}}}}, "FilterRequest": {"title": "FilterRequest", "type": "object", "properties": {"filters": {"title": "Filters", "type": "object", "additionalProperties": {"anyOf": [{"type": "string"}, {"type": "array", "items": {"type": "string"}}]}}}}, "HTTPValidationError": {"title": "HTTPValidationError", "type": "object", "properties": {"detail": {"title": "Detail", "type": "array", "items": {"$ref": "#/components/schemas/ValidationError"}}}}, "LabelSerialized": {"title": "LabelSerialized", "required": ["id", "query", "document", "is_correct_answer", "is_correct_document", "origin"], "type": "object", "properties": {"id": {"title": "Id", "type": "string"}, "query": {"title": "Query", "type": "string"}, "document": {"$ref": "#/components/schemas/DocumentSerialized"}, "is_correct_answer": {"title": "Is Correct Answer", "type": "boolean"}, "is_correct_document": {"title": "Is Correct Document", "type": "boolean"}, "origin": {"title": "Origin", "enum": ["user-feedback", "gold-label"], "type": "string"}, "answer": {"$ref": "#/components/schemas/AnswerSerialized"}, "no_answer": {"title": "No Answer", "type": "boolean"}, "pipeline_id": {"title": "Pipeline Id", "type": "string"}, "created_at": {"title": "Created At", "type": "string"}, "updated_at": {"title": "Updated At", "type": "string"}, "meta": {"title": "Meta", "type": "object"}, "filters": {"title": "Filters", "type": "object"}}}, "QueryRequest": {"title": "QueryRequest", "required": ["query"], "type": "object", "properties": {"query": {"title": "Query", "type": "string"}, "params": {"title": "Params", "type": "object"}, "debug": {"title": "Debug", "type": "boolean", "default": false}}, "additionalProperties": false}, "QueryResponse": {"title": "QueryResponse", "required": ["query"], "type": "object", "properties": {"query": {"title": "Query", "type": "string"}, "answers": {"title": "Answers", "type": "array", "items": {"$ref": "#/components/schemas/AnswerSerialized"}, "default": []}, "documents": {"title": "Documents", "type": "array", "items": {"$ref": "#/components/schemas/DocumentSerialized"}, "default": []}, "_debug": {"title": " Debug", "type": "object"}}}, "Span": {"title": "Span", "required": ["start", "end"], "type": "object", "properties": {"start": {"title": "Start", "type": "integer"}, "end": {"title": "End", "type": "integer"}}}, "ValidationError": {"title": "ValidationError", "required": ["loc", "msg", "type"], "type": "object", "properties": {"loc": {"title": "Location", "type": "array", "items": {"type": "string"}}, "msg": {"title": "Message", "type": "string"}, "type": {"title": "Error Type", "type": "string"}}}}}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we already beautified the formatting of this file in one of the previous PRs, didn't we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. Just found it in https://github.com/deepset-ai/haystack/pull/2164/files Just make sure that merging this file gives the expected result in the end. I doubt that this will work automatically.

@@ -215,6 +215,23 @@ def test_query_with_invalid_filter(populated_client: TestClient):
assert len(response_json["answers"]) == 0


def test_query_with_no_documents():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say let's rename to test_query_with_no_documents_and_answers() because this test checks both.

@@ -65,7 +67,7 @@ def query(request: QueryRequest):
return result


def _process_request(pipeline, request) -> QueryResponse:
def _process_request(pipeline, request) -> Dict[str, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we keep QueryResponse here? Line 85 looks like we are making sure it's a QueryResponse.

Copy link
Contributor Author

@ZanSara ZanSara Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I thought the same. However, in practice we were returning a dictionary all the time and worse yet, returning a real QueryResponse causes a Pydantic validation error. So I decided to change the type to clarify this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to improve on this. The Dict[str, Any] is also inconsistent with the response model in @router.post("/query", response_model=QueryResponse, response_model_exclude_none=True).
@tholor do you have some advice how to make use of QueryResponsehere?

Copy link
Contributor Author

@ZanSara ZanSara Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at the FastAPI docs and it looks like it's designed to handle dictionaries without any issue: https://fastapi.tiangolo.com/tutorial/response-model/ However I agree it's odd that it was failing validation when returning a QueryResponse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important thing is that we declare QueryResponse as the response model on the real endpoints (enables openapi/swagger documentation, validation ...).

On this "helper method" _process_request: I agree that the old version didn't make sense (returning a dict, but annotating with QueryResponse). I think it would make most sense to change the actual returned object to a QueryResponse - What pydantic error did you get there when trying it?

If this is a rabbit hole, switching the return type annotation to Dict is also not too bad here IMO as we still enforce the response model on the higher-level endpoint.

I don't see much value in adding this "conversion-and-back-to-dict step" in line 85. This will happen anyway on-the-fly in the endpoint and is just extra computation time 🤔

Copy link
Contributor Author

@ZanSara ZanSara Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic errors are always hard to figure out. If I implement the method this way:

def _process_request(pipeline, request) -> QueryResponse:
    ...
    result = pipeline.run(query=request.query, params=params, debug=request.debug)
    response = QueryResponse(**result)
    ...
    return response

I get this error on every tests that calls /query:

2022-02-11_14:28:44

I pushed a commit with the method implemented this way so you can see yourself how the error looks like. For me is really hard to parse 😅 QueryResponse should expect DocumentSerialized and AnswerSerialized objects, or equivalent dicts, not necessarily dicts only... and yet it fails validation. let me know if you've seen this before and what causes it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back to the dictionary type because I still can't find a way to make QueryResponse work, and I think it's a minor issue anyway. Let's keep it like this for now.

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. As discussed, Dict[str, Any] instead of QueryResponse is okay as we don't expose it to the user.

@ZanSara ZanSara merged commit 13a9bc6 into master Feb 16, 2022
@ZanSara ZanSara deleted the query_response_without_answers branch February 16, 2022 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REST API for Query breaks for document retrieval Pipelines
3 participants