Util functions to print output from a pipeline #1374

Closed
predoctech opened this issue Aug 27, 2021 · 8 comments

@predoctech

Question
When deploying a SentenceTransformersRanker to re-rank the predictions from a FAQPipeline, what would be the suitable utils function to print the results? It seems that the labels in the returned dictionary have changed and print_answers is no longer suitable. I tried using print_documents, but it broke with:
"TypeError: 'Document' object is not subscriptable" on line 143

FAQ Check
I followed the documentation and GitHub issues on the usage of SentenceTransformersRanker, but there is no mention of a suitable utils function for printing the corresponding result.

@predoctech
Author

Hi Haystack team, can anyone shed some light on this question? Many thanks.

@julian-risch
Member

Hi @predoctech, I will look into this issue later today and will start by reproducing your error message. If you have some code ready to share, we can speed up the process a bit. Did you work with a Jupyter notebook or some other code snippet that you could share? I would expect the Ranker to return a list of Document objects. Line 143 that you are referring to is this line here, correct?

new_text = d["text"][:max_text_len]

@julian-risch
Member

@predoctech I think the problem is that the returned type is Document but the type expected by print_documents() is a dictionary.
Could you please check whether the following helps (assuming that the variable res contains the result of your pipeline)?

document_dicts = [doc.to_dict() for doc in res["documents"]]
res["documents"] = document_dicts
print_documents(res, max_text_len=100)

I took it from here

# document_dicts = [doc.to_dict() for doc in res["documents"]]

For your information, we are working on refactoring Document and other primitives in #1232
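
To illustrate why the conversion helps, here is a minimal, self-contained sketch. `FakeDocument` is a hypothetical stand-in for Haystack's `Document`: it only mimics attribute access and a `to_dict()` method so the snippet runs without Haystack installed, and the real class has more fields. The example text is made up.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class FakeDocument:
    """Hypothetical stand-in for haystack's Document, for illustration only."""
    text: str
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


# Assumed shape of the pipeline result: a dict with a "documents" list.
res = {"documents": [FakeDocument(text="How do I reset my password?")]}

# Indexing the object like a dict raises TypeError, as in the report above.
try:
    res["documents"][0]["text"]
    subscript_failed = False
except TypeError:
    subscript_failed = True

# After conversion each entry is a plain dict, so d["text"] works.
document_dicts = [doc.to_dict() for doc in res["documents"]]
first_text = document_dicts[0]["text"][:100]
```

The point is only that objects without `__getitem__` cannot be indexed with `["text"]`, while the dicts produced by `to_dict()` can.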

@predoctech
Author

@julian-risch Thanks for the suggestion. Your code does seem to take care of the type problem, but print_documents() refers to a key "name" inside the meta dict of "res", which does not exist. The error dump follows. Any suggestions?

[image: screenshot of the error dump]

@julian-risch
Member

The print_documents() method assumes that the meta field contains the document's name in the field name. When documents are added to the document store, we typically add a name to the document. For example, here is a line of code that does that in Tutorial 1:

documents.append({"text": para, "meta": {"name": path.name}})

or a line of code that does that when evaluation data is loaded:

cur_meta = {"name": document_dict.get("title", None)}

If your documents don't have a name, I would suggest that you implement a slightly different version of print_documents() yourself that does not require the name field. Alternatively, you could add a name to each document, which makes sense for many applications, I believe.

Does that solve your problem for now?

For the sake of completeness, there are two other options that I see: 1) we could modify the print_documents() method so that it checks whether the key name exists before accessing it. 2) we could make sure that objects of type Document always have a name in the meta field. This could be tackled in issue #1232 with the redesign of the different primitives.
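
Option 1 above, a printing helper that checks for the name before accessing it, could be sketched roughly as follows. This is not Haystack's implementation: `summarize_documents` and `print_documents_lenient` are hypothetical names, and the code assumes the documents have already been converted to dicts with a "text" key and an optional "meta" dict.

```python
from pprint import pprint
from typing import Any, Dict, List, Optional


def summarize_documents(
    results: Dict[str, Any], max_text_len: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Build one printable dict per document; a missing name becomes None."""
    summaries = []
    for d in results["documents"]:
        meta = d.get("meta") or {}
        summaries.append(
            {
                # .get() avoids the KeyError raised when "name" is absent.
                "name": meta.get("name"),
                "text": d["text"][:max_text_len],
            }
        )
    return summaries


def print_documents_lenient(
    results: Dict[str, Any], max_text_len: Optional[int] = None
) -> None:
    """Hypothetical lenient counterpart to print_documents()."""
    for summary in summarize_documents(results, max_text_len):
        pprint(summary)
        print()
```

Calling `print_documents_lenient(res, max_text_len=100)` on the converted result would then print documents with and without a name, showing `None` where the name is missing.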

@julian-risch
Member

Hi @predoctech have you found a solution for your problem?

@predoctech
Author

Well sort of @julian-risch.
Neither of the suggested approaches works for me. Instead, I wish to add the answer data from the retriever to the document produced by the ranker (the question is already in the "text" value). Once the value for "name" is filled, print_documents() will output as suggested.
Thanks for the advice.
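
The workaround described here might look something like the sketch below. It assumes the documents are already plain dicts and that the FAQ answer is stored under a meta key called "answer"; that key name, the helper name `fill_name_from_answer`, and the sample data are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Any, Dict, List


def fill_name_from_answer(
    document_dicts: List[Dict[str, Any]], answer_key: str = "answer"
) -> List[Dict[str, Any]]:
    """Return copies of the document dicts with meta["name"] filled in.

    An existing name is kept; otherwise the value under `answer_key`
    (assumed to hold the FAQ answer) is used, or None if it is absent.
    """
    filled = []
    for d in document_dicts:
        meta = dict(d.get("meta") or {})  # copy so the input is not mutated
        meta.setdefault("name", meta.get(answer_key))
        filled.append({**d, "meta": meta})
    return filled


docs = [
    {"text": "How do I reset my password?", "meta": {"answer": "Use the reset link."}},
]
filled_docs = fill_name_from_answer(docs)
```

After this step every document dict has a "name" entry in its meta, so a printing helper that reads `d["meta"]["name"]` no longer raises a KeyError.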

@julian-risch
Member

Okay, I understand. As mentioned before, we are working on refactoring Document and other primitives in #1232. This refactoring should simplify the interfaces between pipeline nodes, including how documents are passed on and how they are printed. Stay tuned!
