Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Feature/api list document #880

Closed
wants to merge 9 commits into from
Closed

WIP: Feature/api list document #880

wants to merge 9 commits into from

Conversation

ankh6
Copy link
Contributor

@ankh6 ankh6 commented Mar 8, 2021

Hey !
Here's the pull request regarding issue #797
I am eagerly waiting for your feedback !

from fastapi import APIRouter, HTTPException
import logging

sql = sql.SQLDocumentStore()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already initialising document store at https://github.com/deepset-ai/haystack/blob/master/rest_api/controller/search.py#L38 , so we can skip initialisation here or we can refactor that piece of code a bit

sql = sql.SQLDocumentStore()
now = datetime.now()
SUCCESS: int = 200
CREATED: int = 201
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can directly use HTTPStatus enum.

:param payload: input from the client to create a document.
"""
logger.info("Constructing document object")
created_document = Document.from_dict(payload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you yet to add implementation to add document

"""
try:
logger.info("Retrieving documents")
retrieved_documents = sql.get_all_documents()
Copy link
Contributor

@lalitpagaria lalitpagaria Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can build pagination on top of generator.

def get_all_documents_generator(

raise Exception(f"Something went wrong while retrieving documents", e.message)


@router.get("/documents/{id}", status_code=SUCCESS)
Copy link
Contributor

@lalitpagaria lalitpagaria Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return response_model so open api spec will render correctly also to other APIs as well

try:
retrieved_document = sql.get_document_by_id(id)
if not retrieved_document:
raise HTTPException(status_code=404, detail=f"No document with id {id} could be found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception raise here will be caught by line no 63, and that will convert it into normal Exception at line no 67

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTPException is not a generic Python exception, meaning an exception will be raised and the code will terminate with exit code != 0. We won't go further in the calling chain

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:param payload: input from the client to create a Document
"""
try:
if not id:
Copy link
Contributor

@lalitpagaria lalitpagaria Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id can never be None so no need to check for that here.

:param id: UUID pointing to the document to retrieve
"""
try:
if not id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise HTTPException(status_code=404, detail=f"No document with id {id} could be found")
sql.delete_document_by_id(id)
cursor = sql.SQLDocumentStore(update_existing_documents=True)
document_to_be_updated = cursor.write_documents(payload.to_dict())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add update_existing_documents to write_documents as an optional param.
Also we need to make sure payload["id"] == id otherwise document store will generate random UUID

Copy link
Contributor Author

@ankh6 ankh6 Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_document() does not accept update_existing_document as an input parameter. That is the reason why I had to instantiate another instance of the SQLDocumentStore. Plus, setting it to True makes sure that any existing document with an id gets updated (per SQLDocumentStore docstring).
Is that correct ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I mean how about updating write_documents function which accept update_existing_documents as optional param

@lalitpagaria
Copy link
Contributor

@ankh6 Thank you for working on it. I have few review comments.
Few general are -

  • We can add delete_document_by_id as abstract function in BaseDocumentStore
  • Can you please remove formatting changes from this PR as difficult to understand the actual code changes
  • We can add tests as well

@tholor We can fix coding style, which user can integrate with IDE. Also CI can also run style check. https://realpython.com/python-code-quality/

@ankh6
Copy link
Contributor Author

ankh6 commented Mar 9, 2021

An error is raised when trying to run the test suite with pytest test -m "not elasticsearch"

ImportError while loading conftest '/home/andy/opensource/deepset_ai_haystack/haystack/test/conftest.py'.
test/conftest.py:8: in
from elasticsearch import Elasticsearch
E ModuleNotFoundError: No module named 'elasticsearch'

@Timoeller
Copy link
Contributor

Hey @ankh6 thanks for the PR, this is a really useful feature.
Also big thanks to @lalitpagaria for the thorough review, the points make total sense.

Do you, ankh6 want to work on those suggestions? Otherwise @oryx1729 would take over soon.

@ankh6
Copy link
Contributor Author

ankh6 commented Mar 17, 2021

@Timoeller I am working on it. I'll make another PR this evening.
I am writing some unit tests. I'll be done today :)

@ankh6
Copy link
Contributor Author

ankh6 commented Mar 17, 2021

@Timoeller Changes have been pushed.
I am waiting for your feedback

@Timoeller
Copy link
Contributor

Hey cool, thanks for the changes.
There are still a couple of Lalits comments left, would you also like to work on those?
Also the API test is failing related to some pydantic problem. Do you need help there?

If you work on those comments we can resolve them and let Lalit or @oryx1729 do another round of review. But we are getting into good shape here. Thanks again for the work!

@ankh6
Copy link
Contributor Author

ankh6 commented Mar 18, 2021

@Timoeller. Yes, I'll keep working on that. I want my branch to be merged :)

All comments written by Lalit have been tackled in the latest commit. Which points are you referring to ?
Yes, the pipeline did break for a minor issue. I'd appreciate having guidance
As pointed in previous comment (9 days ago) I am not able to run the automated test suite locally. I pulled the latest code but still. Errors are raised. These error come from module I have not worked on:
Screenshot from 2021-03-18 22-25-54

@Timoeller
Copy link
Contributor

There are still a couple of Lalits comments left, would you also like to work on those?

Hey, yes you are right, I must have looked at an older version of your code.

@oryx1729 could you please take a look at the pydantic error and rest api changes? We will need your review before we can merge here.

@Timoeller Timoeller requested a review from oryx1729 March 19, 2021 12:26
@lalitpagaria
Copy link
Contributor

An error is raised when trying to run the test suite with pytest test -m "not elasticsearch"

ImportError while loading conftest '/home/andy/opensource/deepset_ai_haystack/haystack/test/conftest.py'.
test/conftest.py:8: in
from elasticsearch import Elasticsearch
E ModuleNotFoundError: No module named 'elasticsearch'

@ankh6 what error you get after you install elasticsearch>=7.7,<=7.10?
not elasticsearch here means you are not running any tests which need elasticsearch server. But you still need elasticsearch dependency

@oryx1729 oryx1729 changed the title Feature/api list document WIP: Feature/api list document Mar 24, 2021
@ankh6
Copy link
Contributor Author

ankh6 commented Mar 27, 2021

@lalitpagaria I'll install the elasticsearch server and create another PR.
Hope the previous PR will be reviewed in the meanwhile so that potential changes can be squeezed in one commit

@lalitpagaria
Copy link
Contributor

@ankh6 Some test are failing in this PR. Otherwise looks fine to me.
Better to get review from @oryx1729, not sure whether it will conflict with this PR #922

@ankh6
Copy link
Contributor Author

ankh6 commented Mar 30, 2021

@oryx1729 Have you had the time to review the PR ?
Tests are failing because Document class does not have validator:
est_rest_api.py:None (test_rest_api.py)
../venv/lib/python3.8/site-packages/fastapi/utils.py:65: in create_response_field
return response_field(field_info=field_info)
pydantic/fields.py:327: in pydantic.fields.ModelField.init
???
pydantic/fields.py:438: in pydantic.fields.ModelField.prepare
???
pydantic/fields.py:651: in pydantic.fields.ModelField.populate_validators
???
pydantic/validators.py:715: in find_validators
???
E RuntimeError: no validator found for <class 'haystack.schema.Document'>, see arbitrary_types_allowed in Config

@oryx1729
Copy link
Contributor

oryx1729 commented Apr 9, 2021

Hi @ankh6, thank you for working on this PR, and apologies for the delay in getting back on it.

This already looks in very good shape. It needs minor adjustments(mostly related to recently merged #922) and fixing the CI - I'm happy to take over from here to take it to the finish line in the next days!

@ankh6
Copy link
Contributor Author

ankh6 commented Apr 10, 2021

@oryx1729 Sure thing!
I always look to better myself. Any advice/feedback on code implementation ?

@oryx1729
Copy link
Contributor

oryx1729 commented May 3, 2021

Hi @ankh6, a quick update. I revisited the PR and realized it currently only has an implementation for the SQLDocumentStore. There is a bit more work to be done — extending it to other document stores, ensuring document embeddings(if present) gets updated when documents are updated/deleted. Please feel free if you'd like to contribute more to the PR, but in any case, we'll get back on it in the next few days 🙂

@lalitpagaria
Copy link
Contributor

@ankh6 just checking if you able to work on it.

@ZanSara
Copy link
Contributor

ZanSara commented Oct 19, 2021

Getting and deleting documents through the REST API is now possible (#1546, #1580). Closing this. If anybody wants to continue working on the remaining features proposed here, let's open a new issue/PR and start fresh.

@ZanSara ZanSara closed this Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants