Util functions to print output from a pipeline #1374

predoctech · 2021-08-27T09:23:52Z

Question
After deploying a SentenceTransformersRanker to re-rank the predictions from a FAQPipeline, which utils function is suitable for printing the results? It seems the keys in the returned dictionary have changed, so print_answers no longer works. I tried print_documents, but it failed with:
"TypeError: 'Document' object is not subscriptable" on line 143


FAQ Check
I followed the documentation and the GitHub issues on using SentenceTransformersRanker, but none of them mention which utils function is suitable for printing the corresponding results.

predoctech · 2021-08-30T12:08:32Z

Hi Haystack team, can anyone shed some light on this question? Many thanks.

julian-risch · 2021-08-30T12:37:45Z

Hi @predoctech, I will look into this issue later today and will start by reproducing your error message. If you have some code ready to share, we can speed up the process a bit. Did you maybe work with a Jupyter notebook or some other code snippet that you could share? I would expect the Ranker to return a list of Document objects. Is line 143 the one you are referring to?

haystack/haystack/utils.py, line 143 (commit c3d8aa0):

new_text = d["text"][:max_text_len]
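To illustrate why this line fails, here is a minimal sketch using a hypothetical stand-in `Document` dataclass (not Haystack's real class): dict-style access such as `d["text"]` on an object that does not define `__getitem__` raises exactly this kind of `TypeError`, while the same lookup on a plain dict works.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a pipeline document object (not Haystack's class)."""
    text: str
    meta: dict = field(default_factory=dict)


def first_chars(d, max_text_len=None):
    # Mirrors the dict-style access at line 143: d["text"][:max_text_len]
    return d["text"][:max_text_len]


doc = Document(text="How do I reset my password?")

try:
    first_chars(doc, 10)
except TypeError as err:
    print(err)  # 'Document' object is not subscriptable

# Converting the object to a dict first makes the same lookup work
print(first_chars(asdict(doc), 10))  # How do I r
```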

julian-risch · 2021-08-30T12:49:45Z

@predoctech I think the problem is that the returned type is Document but the type expected by print_documents() is a dictionary.
Could you please check whether the following helps (assuming that the variable res contains the result of your pipeline)?

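from haystack.utils import print_documents

# print_documents() expects plain dicts, so convert each Document object first
document_dicts = [doc.to_dict() for doc in res["documents"]]
res["documents"] = document_dicts
print_documents(res, max_text_len=100)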
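If the pipeline output might contain either document objects or plain dicts, a small helper can normalize the list before printing. This is a minimal sketch that assumes only that document objects expose a `to_dict()` method, as used in the snippet above; `normalize_documents` and the `FakeDocument` stand-in are hypothetical names for illustration.

```python
from typing import Any, Dict, List


class FakeDocument:
    """Hypothetical stand-in for a document object that offers to_dict()."""

    def __init__(self, text: str, meta: Dict[str, Any]):
        self.text = text
        self.meta = meta

    def to_dict(self) -> Dict[str, Any]:
        return {"text": self.text, "meta": dict(self.meta)}


def normalize_documents(res: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of res whose "documents" entries are all plain dicts."""
    converted: List[Dict[str, Any]] = []
    for doc in res.get("documents", []):
        # Objects with to_dict() are converted; dicts pass through unchanged
        converted.append(doc.to_dict() if hasattr(doc, "to_dict") else doc)
    return {**res, "documents": converted}


res = {
    "query": "reset password",
    "documents": [
        FakeDocument("How do I reset my password?", {"answer": "Use the link."}),
        {"text": "Already a dict", "meta": {}},
    ],
}
normalized = normalize_documents(res)
print([type(d).__name__ for d in normalized["documents"]])  # ['dict', 'dict']
```

Returning a copy instead of overwriting `res["documents"]` in place keeps the original pipeline output available for other uses.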

I took it from here

haystack/haystack/classifier/farm.py, line 45 (commit 4e6f7f3):

# document_dicts = [doc.to_dict() for doc in res["documents"]]

For your information, we are working on refactoring Document and other primitives in #1232

predoctech · 2021-08-30T15:03:16Z

@julian-risch Thanks for the suggestion. Your code does take care of the type problem, but print_documents() references a key "name" inside the meta dict of each document in res, and that key does not exist. The error dump follows; any suggestions?

julian-risch · 2021-08-30T15:25:26Z

The print_documents() method assumes that the meta field contains the document's name under the key "name". When documents are added to the document store, we typically add a name to the document. For example, here is a line of code that does that in Tutorial 1:

haystack/haystack/preprocessor/utils.py, line 267 (commit 1c8a03a):

documents.append({"text": para, "meta": {"name": path.name}})

or a line of code that does that when evaluation data is loaded:

haystack/haystack/preprocessor/utils.py, line 120 (commit 1c8a03a):

cur_meta = {"name": document_dict.get("title", None)}

If your documents don't have a name, I would suggest that you implement a slightly different version of print_documents() yourself that does not require the name field. Or you add a name to each document, which makes sense for many applications, I believe.
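A minimal sketch of such a variant, assuming `res` has the shape used above (a "query" string and a "documents" list of dicts with "text" and "meta"). `print_documents_without_name` is a hypothetical name, not part of Haystack; it falls back to a placeholder when meta has no "name" key instead of raising a `KeyError`.

```python
import pprint
from typing import Any, Dict, List, Optional


def print_documents_without_name(results: Dict[str, Any],
                                 max_text_len: Optional[int] = None,
                                 print_meta: bool = False) -> List[Dict[str, Any]]:
    """Print each document, tolerating a missing meta["name"]; return what was printed."""
    pp = pprint.PrettyPrinter(indent=4)
    print(f"Query: {results.get('query')}")
    printed = []
    for d in results.get("documents", []):
        text = d.get("text", "")
        short = text[:max_text_len]
        if len(short) != len(text):
            short += "..."  # mark truncated text
        meta = d.get("meta") or {}
        # Placeholder instead of a KeyError when no name was stored
        entry = {"name": meta.get("name", "<no name>"), "text": short}
        if print_meta:
            entry["meta"] = meta
        pp.pprint(entry)
        printed.append(entry)
    return printed


res = {"query": "reset password",
       "documents": [{"text": "How do I reset my password?", "meta": {}}]}
print_documents_without_name(res, max_text_len=10)
```

Returning the printed entries is a design choice that makes the function easy to test; it can be ignored when only console output is needed.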

Does that solve your problem for now?

For the sake of completeness, I see two other options:
1) We could modify the print_documents() method so that it checks whether the key name exists before accessing it.
2) We could make sure that objects of type Document always have a name in the meta field. This could be tackled in issue #1232 with the redesign of the different primitives.
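Until something like the second option exists, a user-side approximation is to backfill a name before documents are written to the document store. This is a minimal sketch assuming the dict format shown earlier (`{"text": ..., "meta": {...}}`); `ensure_names` and the `doc-<index>` fallback scheme are hypothetical choices. It also treats a `None` name as missing, which covers the case where no title was available.

```python
from typing import Any, Dict, List


def ensure_names(documents: List[Dict[str, Any]], prefix: str = "doc") -> List[Dict[str, Any]]:
    """Return copies of documents where every meta dict has a non-empty "name"."""
    result = []
    for i, doc in enumerate(documents):
        meta = dict(doc.get("meta") or {})  # copy so the input is not mutated
        if not meta.get("name"):
            meta["name"] = f"{prefix}-{i}"  # hypothetical fallback naming scheme
        result.append({**doc, "meta": meta})
    return result


docs = [{"text": "How do I reset my password?", "meta": {}},
        {"text": "Where is my invoice?", "meta": {"name": "billing.txt"}}]
print([d["meta"]["name"] for d in ensure_names(docs)])  # ['doc-0', 'billing.txt']
```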

julian-risch · 2021-09-07T07:00:23Z

Hi @predoctech, have you found a solution to your problem?

predoctech · 2021-09-09T14:47:45Z

Well, sort of, @julian-risch.
Neither of the suggested approaches works for me. Instead, I want to add the answer data from the retriever to the documents produced by the ranker (the question is already in the "text" value). Once the value for "name" is filled in, print_documents() produces output as expected.
Thanks for the advice.
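The approach described above can be sketched as follows. It assumes the FAQ answer is stored in each document's `meta["answer"]` (FAQ-style data where the question is the document text); the key `answer` and the helper `use_answer_as_name` are assumptions for illustration. It operates on dicts, so document objects need to be converted with `to_dict()` first.

```python
from typing import Any, Dict


def use_answer_as_name(res: Dict[str, Any], answer_key: str = "answer") -> Dict[str, Any]:
    """Copy meta[answer_key] into meta["name"] so a name-based printer can find it.

    Assumes res["documents"] is a list of dicts with a "meta" dict; the
    answer_key default is an assumption about how the FAQ data was indexed.
    """
    docs = []
    for d in res["documents"]:
        meta = dict(d.get("meta") or {})  # copy so the input is not mutated
        # Keep an existing name; otherwise use the answer (or "" if absent)
        meta.setdefault("name", meta.get(answer_key, ""))
        docs.append({**d, "meta": meta})
    return {**res, "documents": docs}


res = {"query": "reset password",
       "documents": [{"text": "How do I reset my password?",
                      "meta": {"answer": "Click 'Forgot password' on the login page."}}]}
print(use_answer_as_name(res)["documents"][0]["meta"]["name"])
```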

julian-risch · 2021-09-09T15:13:42Z

Okay, I understand. As mentioned before we are working on refactoring Document and other primitives in #1232 and this refactoring should simplify interfaces between pipeline nodes, how to pass on documents and how to print documents. Stay tuned!

julian-risch self-assigned this Aug 31, 2021

julian-risch added the type:question label Aug 31, 2021

julian-risch closed this as completed Sep 9, 2021
