Refactor REST APIs to use Pipelines #922
Conversation
@oryx1729 Really great changes 🚀
    export_data.append(feedback)

    if document is None:
        raise HTTPException(
            status_code=500, detail=f"Could not find document with id {label.document_id} for label id {label.id}"
        )
Just a nit: an HTTPStatus.NOT_FOUND status would be more suitable here.
HTTP 4xx error codes are client-side errors, i.e., a client requesting a resource with an ID that does not exist. This means the client can ensure the correctness of the resource identifier and try again.
In this case, it's a server-side error where data is inconsistent, i.e., a label has an incorrect document ID.
I misunderstood; I thought the doc id was user-supplied. We can replace this with HTTPStatus.INTERNAL_SERVER_ERROR for more readability.
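For context on the suggestion above: HTTPStatus members are IntEnum values that compare equal to the bare integers, so swapping them in is purely a readability change. A minimal sketch (names here are illustrative, not from the PR):

```python
from http import HTTPStatus

# HTTPStatus members behave like ints, so existing comparisons and
# FastAPI's status_code parameter keep working unchanged.
server_error = HTTPStatus.INTERNAL_SERVER_ERROR
not_found = HTTPStatus.NOT_FOUND

# The enum also carries a human-readable phrase, handy for logging.
phrase = server_error.phrase
```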
    export = {"data": export_data}

    with open("feedback_squad_direct.json", "w", encoding="utf8") as f:
        json.dump(export_data, f, ensure_ascii=False, sort_keys=True, indent=4)
The API caller/client can sort the keys and add indentation if they like; we can free the server from spending time on sorting and reduce the payload size by not adding indentation :)
I guess it's a relatively small performance penalty for the added readability & convenience.
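To illustrate the trade-off being discussed, a small sketch (with made-up feedback records) comparing compact vs. pretty-printed JSON; both forms deserialize to the same data, only size and readability differ:

```python
import json

# Made-up feedback records, purely illustrative.
export_data = [
    {"question": "What is Haystack?", "answer": "A QA framework."},
    {"question": "Who maintains it?", "answer": "deepset."},
]

# Compact form: minimal separators, no indentation.
compact = json.dumps(export_data, ensure_ascii=False, separators=(",", ":"))

# Pretty form: sorted keys and 4-space indent, as in the code above.
pretty = json.dumps(export_data, ensure_ascii=False, sort_keys=True, indent=4)
```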
    ):
        if not INDEXING_PIPELINE:
            raise HTTPException(status_code=501, detail="Indexing Pipeline is not configured.")
HTTPStatus.NOT_IMPLEMENTED
HTTP 501 represents "not implemented". Do you mean changing 501 to NOT_IMPLEMENTED for code readability?
Yes
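A minimal sketch of the guard above rewritten with the named constant; `INDEXING_PIPELINE` and the helper function are stubbed here for illustration only:

```python
from http import HTTPStatus

# Stub standing in for the real module-level pipeline object.
INDEXING_PIPELINE = None

def indexing_status():
    # HTTPStatus.NOT_IMPLEMENTED == 501, matching the bare integer above.
    if not INDEXING_PIPELINE:
        return HTTPStatus.NOT_IMPLEMENTED, "Indexing Pipeline is not configured."
    return HTTPStatus.OK, "ok"
```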
        results.append(result)
        elasticapm.set_custom_context({"results": results})

    class Request(BaseModel):
We can add schema_extra to show a sample example for the request and response in Swagger: https://fastapi.tiangolo.com/tutorial/schema-extra-example/
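A minimal sketch of that suggestion, following the linked FastAPI tutorial (pydantic v1-style Config; the example payload is made up, not from the PR):

```python
from pydantic import BaseModel

class Request(BaseModel):
    query: str

    class Config:
        # Embedded verbatim in the generated OpenAPI schema, so Swagger UI
        # can pre-fill a sample request body for the endpoint.
        schema_extra = {"example": {"query": "Who is the father of Arya Stark?"}}
```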
    version: '0.7'
I don't see anywhere that we are checking the version.
The plan is to implement that in the next iterations. It's still useful to have it for documentation purposes, or to correlate issues with the Haystack version.
I cannot comment in detail about the API changes, but conceptually and in naming it all makes sense.
I think I discovered a bug about feedback extraction.
[optional] What about the TODO in our tests: "Add integration tests for other APIs"? What about a test for feedback export? Since this PR will change quite a lot we might want to increase our testing.
Tested this pipeline way of doing ExtractiveQA locally, sounds good ✅ to me:

    - name: Reader    # custom-name for the component; helpful for visualization & debugging
      type: TransformersReader    # Haystack Class name for the component
      params:
        model_name_or_path: etalab-ia/camembert-base-squadFR-fquad-piaf
        tokenizer: etalab-ia/camembert-base-squadFR-fquad-piaf
        top_k_per_candidate: 4
        return_no_answers: False
        use_gpu: -1

A single testing note: in docker-compose I needed

    elasticsearch:
      container_name: elasticsearch

Why? Because the connection to the ES container needs to match the exact same name as in

    components:    # define all the building-blocks for Pipeline
      - name: ElasticsearchDocumentStore
        type: ElasticsearchDocumentStore
        params:
          host: elasticsearch

which is not the default "localhost" in my case.
Offsets are now corrected.
It is difficult for me to judge all the changes. I see you removed some tests altogether (ES DSL tests). Do you think we have enough tests for this rather complex change? If you think so, let's merge soon and adjust if problems occur before our release next week.
Hi @guillim, thank you for reviewing this PR and spotting the docker-compose error. It should now be resolved.
This PR introduces significant refactoring of the REST APIs. Under the hood, the APIs now use Pipelines for both Query & Indexing (upload of files).

Changes

- The /doc-qa & /faq-qa endpoints are deprecated and replaced with the new /query endpoint. Query processing can be configured in the rest_api/pipeline.yaml file.
- The /query endpoint accepts a single query string in the request and returns a list of answers (or documents) for the given query. Multiple queries in a single request are not supported.
- A FileTypeClassifier node is introduced to route files to the corresponding file converters. This node can be extended to use different converters tailored for specific use cases.
- The /doc-qa-feedback & /faq-qa-feedback endpoints are replaced with a new generic /feedback endpoint.
- The Feedback APIs now allow exporting negative labels as well.
- The Streamlit UI is updated to work with the new Query APIs.
- The ElasticDSL endpoints are removed. They could potentially be reimplemented as a node in the query pipeline.
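Based on the description above, a client would send a single query string to the new /query endpoint. A minimal sketch of such a request body (field names other than "query" are assumptions, not taken from the PR):

```python
import json

# Hypothetical request body for the new /query endpoint.
request_body = {"query": "Who is the father of Arya Stark?"}

# A client would POST this JSON to /query and receive a list of answers
# (or documents) back; here we only check the payload round-trips cleanly.
serialized = json.dumps(request_body)
roundtrip = json.loads(serialized)
```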