Improve output of the tutorials #1675

Closed
28 commits
00b3908
Improve output of Tutorial11 (.py version only)
ZanSara Oct 28, 2021
67baf0f
Improve output of Tutorial10 (.py only)
ZanSara Oct 28, 2021
2bc14e6
Slightly improve Tutorial8 (.py only)
ZanSara Oct 28, 2021
638d4f4
Reduce level of detail of printed answers in Tutorial11 (.ipynb)
ZanSara Oct 28, 2021
54cbf18
Add latest docstring and tutorial changes
github-actions[bot] Oct 28, 2021
abc29ff
Improve output printing of tutorial13 (.py only)
Oct 29, 2021
ce3fc38
Improve output of tutorial13 (.ipynb)
ZanSara Oct 29, 2021
c9dff47
Add latest docstring and tutorial changes
github-actions[bot] Oct 29, 2021
e421a2e
Improve output of Tutorial14 (.py only)
ZanSara Oct 29, 2021
fe179e9
Merge branch 'tutorials_output' of github.com:deepset-ai/haystack int…
ZanSara Oct 29, 2021
0148e10
Add the same modifications to the ipynb version of Tutorial14
ZanSara Oct 29, 2021
a2973df
Add the same modifications to the ipynb version of Tutorial14
ZanSara Oct 29, 2021
08ed6c5
Add latest docstring and tutorial changes
github-actions[bot] Oct 29, 2021
e77b98a
Add a clear message to print_answers in case there are no answers to …
ZanSara Nov 1, 2021
c6c0d3e
Merge branch 'tutorials_output' of github.com:deepset-ai/haystack int…
ZanSara Nov 1, 2021
289ef13
Clean up Tutorial14 and rename QueryClassifier to MyQueryClassifier i…
ZanSara Nov 1, 2021
293096c
Add latest docstring and tutorial changes
github-actions[bot] Nov 1, 2021
27d9ff3
Clear all notebooks' output
ZanSara Nov 1, 2021
cb84923
Merge branch 'tutorials_output' of github.com:deepset-ai/haystack int…
ZanSara Nov 1, 2021
621a53e
Add latest docstring and tutorial changes
github-actions[bot] Nov 1, 2021
6ea249a
Add more details about how to print the output in Tutorial1
ZanSara Nov 1, 2021
77cfee0
Merge branch 'tutorials_output' of github.com:deepset-ai/haystack int…
ZanSara Nov 1, 2021
fb8f662
Add latest docstring and tutorial changes
github-actions[bot] Nov 1, 2021
8cf8627
Add output to the first tutorial's last cell
ZanSara Nov 3, 2021
52d4a0f
Merge branch 'tutorials_output' of github.com:deepset-ai/haystack int…
ZanSara Nov 3, 2021
d3c7735
Add latest docstring and tutorial changes
github-actions[bot] Nov 3, 2021
fade01f
Modify repr and str for Document and Answer class
ZanSara Nov 4, 2021
843bf24
Merge branch 'tutorials_output' of github.com:deepset-ai/haystack int…
ZanSara Nov 4, 2021
49 changes: 49 additions & 0 deletions docs/_src/tutorials/tutorials/1.md
@@ -237,9 +237,58 @@ prediction = pipe.run(


 ```python
+# Change `minimal` to `medium` or `all` to see a more detailed output
 print_answers(prediction, details="minimal")
 ```
+
+
+```python
+# Alternative: print the object
+
+# print(prediction)
+
+# or:
+
+# import json
+# print(json.dumps(prediction, indent=4, default=str))
+
+# Sample output:
+# {
+#     "query": "Who is the father of Arya Stark?",
+#     "no_ans_gap": 11.688868522644043,
+#     "answers": [
+#         "Answer(answer: 'Eddard', score: 0.9919578731060028, context: 's Nymeria after a legendary warrior queen. She travels with her father, Eddard, to King's Landing wh...')",
+#         "Answer(answer: 'Ned', score: 0.9767240881919861, context: '\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before the...')",
+#         "Answer(answer: 'Lord Eddard Stark', score: 0.8930400013923645, context: 'ark daughters.\nDuring the Tourney of the Hand to honour her father Lord Eddard Stark, Sansa Stark is...')",
+#         "Answer(answer: 'Joffrey', score: 0.6753827035427094, context: 'laying with one of his wooden toys.\nAfter Eddard discovers the truth of Joffrey's paternity, he tell...')",
+#         "Answer(answer: 'Robb', score: 0.6665983200073242, context: 'allow the army to cross the river and to commit his troops in return for Robb and Arya Stark marryin...')"
+#     ],
+#     "documents": [
+#         "Document(id: 6b181174d1237878b706e6a12d69e92, content: '\n===In the Riverlands===\nThe Stark army reaches the Twins, a bridge stronghold controlled by Walder ...')",
+#         "Document(id: a4d2cc51d351b785c6effddd3345bb39, content: '\n===On the Kingsroad===\nCity Watchmen search the caravan for Gendry but are turned away by Yoren. Ge ...')",
+#         "Document(id: d1f36ec7170e4c46cde65787fe125dfe, content: '\n===''A Game of Thrones''===\nSansa Stark begins the novel by being betrothed to Crown Prince Joffrey ...')",
+#         "Document(id: dd4e070a22896afa81748d6510006d2, content: '\n===Season 2===\nGendry travels North with Yoren and other Night's Watch recruits, including Arya Sta ...')",
+#         "Document(id: 956aa2b653c6debcb6cb217531a6be58, content: '\n===In King's Landing===\nAfter Varys tells him that Sansa Stark's life is also at stake, Eddard \"Ned ...')",
+#         "Document(id: 180c2a6b36369712b361a80842e79356, content: '\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before the ...')",
+#         "Document(id: fc56eb160221cbdc74d223383680dbeb, content: '\n==== ''A Storm of Swords'' and ''A Feast for Crows'' ====\nPrior to the Red Wedding, Roose Bolton pr ...')",
+#         "Document(id: e60cb63e43a5a01694aea3a5cab14281, content: '\n===House Frey===\n* '''Walder Frey''' (seasons 1, 3, 6\u20137) portrayed by David Bradley. David Bradley ...')",
+#         "Document(id: ba2a8e87ddd95e380bec55983ee7d55f, content: '\n==== ''A Game of Thrones'' ====\nArya adopts a direwolf cub, which she names Nymeria after a legenda ...')",
+#         "Document(id: 212af6309da6cb02c3c0e1da9f6fdf71, content: '\n== Character description ==\nGendry was conceived and born in King's Landing after Robert's Rebellio ...')"
+#     ],
+#     "root_node": "Query",
+#     "params": {
+#         "Retriever": {
+#             "top_k": 10
+#         },
+#         "Reader": {
+#             "top_k": 5
+#         }
+#     },
+#     "node_id": "Reader"
+# }
+
+```

 ## About us

 This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany
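
The `default=str` argument in the commented-out `json.dumps` call is what lets the dump succeed: the `answers` and `documents` lists hold objects that `json` cannot serialize by itself, so each one is passed through `str()` instead. A minimal sketch of that behaviour, assuming a hypothetical stand-in `Answer` dataclass rather than Haystack's own class:

```python
import json
from dataclasses import dataclass


@dataclass
class Answer:
    """Hypothetical stand-in for a pipeline answer object (not Haystack's class)."""
    answer: str
    score: float

    def __str__(self) -> str:
        return f"Answer(answer: '{self.answer}', score: {self.score})"


# Illustrative prediction dict; the key names mirror the sample output above.
prediction = {
    "query": "Who is the father of Arya Stark?",
    "answers": [Answer("Eddard", 0.99), Answer("Ned", 0.97)],
}

# Without default=str, json.dumps raises TypeError on the Answer objects.
try:
    json.dumps(prediction)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# With default=str, every non-serializable object is replaced by str(obj).
dumped = json.dumps(prediction, indent=4, default=str)
print(dumped)
```

The trade-off is that the dump is for reading only: the objects become strings, so loading the JSON back does not restore them.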
8 changes: 4 additions & 4 deletions docs/_src/tutorials/tutorials/11.md
@@ -296,7 +296,7 @@ Below, we define a very naive `QueryClassifier` and show how to use it:


 ```python
-class QueryClassifier(BaseComponent):
+class MyQueryClassifier(BaseComponent):
     outgoing_edges = 2

     def run(self, query: str):
@@ -307,7 +307,7 @@ class QueryClassifier(BaseComponent):

 # Here we build the pipeline
 p_classifier = Pipeline()
-p_classifier.add_node(component=QueryClassifier(), name="QueryClassifier", inputs=["Query"])
+p_classifier.add_node(component=MyQueryClassifier(), name="QueryClassifier", inputs=["Query"])
 p_classifier.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_1"])
 p_classifier.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_2"])
 p_classifier.add_node(component=reader, name="QAReader", inputs=["ESRetriever", "DPRRetriever"])
@@ -316,12 +316,12 @@ p_classifier.draw("pipeline_classifier.png")
 # Run only the dense retriever on the full sentence query
 res_1 = p_classifier.run(query="Who is the father of Arya Stark?")
 print("DPR Results" + "\n" + "="*15)
-print_answers(res_1)
+print_answers(res_1, details="minimal")

 # Run only the sparse retriever on a keyword based query
 res_2 = p_classifier.run(query="Arya Stark father")
 print("ES Results" + "\n" + "="*15)
-print_answers(res_2)
+print_answers(res_2, details="minimal")
 ```

 ## Evaluation Nodes
30 changes: 24 additions & 6 deletions docs/_src/tutorials/tutorials/13.md
@@ -98,9 +98,15 @@ which the document can answer.

 ```python
 question_generation_pipeline = QuestionGenerationPipeline(question_generator)
-for document in document_store:
-    result = question_generation_pipeline.run(documents=[document])
-    pprint(result)
+for idx, document in enumerate(document_store):
+
+    print(f"\n * Generating questions for document {idx}: {document.content[:50]}...")
+    result = question_generation_pipeline.run(documents=[document])
+
+    print("Generated questions:")
+    for result in result["generated_questions"]:
+        for question in result["questions"]:
+            print(f" - {question}")
 ```

 ## Retriever Question Generation Pipeline
@@ -111,8 +117,14 @@ This pipeline takes a query as input. It retrieves relevant documents and then g
 ```python
 retriever = ElasticsearchRetriever(document_store=document_store)
 rqg_pipeline = RetrieverQuestionGenerationPipeline(retriever, question_generator)
+
+print(f"\n * Generating questions for documents matching the query 'Arya Stark'")
 result = rqg_pipeline.run(query="Arya Stark")
-pprint(result)
+
+print("Generated questions:")
+for result in result["generated_questions"]:
+    for question in result["questions"]:
+        print(f" - {question}")
 ```

 ## Question Answer Generation Pipeline
@@ -124,9 +136,15 @@ a Reader model
 ```python
 reader = FARMReader("deepset/roberta-base-squad2")
 qag_pipeline = QuestionAnswerGenerationPipeline(question_generator, reader)
-for document in tqdm(document_store):
+for idx, document in enumerate(tqdm(document_store)):
+
+    print(f"\n * Generating questions and answers for document {idx}: {document.content[:20]}...")
     result = qag_pipeline.run(documents=[document])
-    pprint(result)
+
+    for pair in result["results"]:
+        print(f" - Q:{pair['query']}")
+        for answer in pair["answers"]:
+            print(f" A: {answer.answer}")
 ```

 ## About us
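
The printing loops in this file walk a nested result: `result["generated_questions"]` is a list of entries, each carrying its own `questions` list. The diff reuses the name `result` for the inner loop variable, which runs but rebinds the outer dictionary. A sketch with distinct names, run on made-up data shaped like that structure (the helper `format_generated_questions` and the sample questions are illustrative assumptions, not part of Haystack):

```python
from typing import Dict, List


def format_generated_questions(result: Dict) -> List[str]:
    """Flatten the nested question lists into printable lines."""
    lines = []
    for entry in result["generated_questions"]:
        for question in entry["questions"]:
            lines.append(f" - {question}")
    return lines


# Made-up result; only the key names come from the tutorial code above.
sample_result = {
    "generated_questions": [
        {"questions": ["Who is Arya Stark's father?", "Where does Arya travel?"]},
        {"questions": ["What is Nymeria?"]},
    ]
}

lines = format_generated_questions(sample_result)
print("Generated questions:")
print("\n".join(lines))
```

Keeping the outer dictionary bound to `result` means it is still available after the loop, for example for logging or further processing.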
8 changes: 4 additions & 4 deletions haystack/schema.py
@@ -186,10 +186,10 @@ def __eq__(self, other):
                 getattr(other, 'id_hash_keys', None) == self.id_hash_keys)

     def __repr__(self):
-        return str(self.to_dict())
+        return f"<Document: {str(self.to_dict())}>"

     def __str__(self):
-        return f"content: {self.content[:100]} {'[...]' if len(self.content) > 100 else ''}"
+        return f"<Document: id={self.id}, content='{self.content[:100]} {'...' if len(self.content) > 100 else ''}'>"

     def __lt__(self, other):
         """ Enable sorting of Documents by score """
@@ -262,10 +262,10 @@ def __lt__(self, other):
         return self.score < other.score

     def __str__(self):
-        return f"answer: {self.answer} \nscore: {self.score} \ncontext: {self.context}"
+        return f"<Answer: answer='{self.answer}', score={self.score}, context='{self.context[:50]}{'...' if len(self.context) > 50 else ''}'>"

     def to_dict(self):
-        return asdict(self)
+        return f"<Answer {asdict(self)}>"

     @classmethod
     def from_dict(cls, dict:dict):
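
These hunks give `Document` and `Answer` a short, bracketed `__str__` and a fuller `__repr__`. The pattern can be sketched on a small stand-alone dataclass. `Snippet` and its 50-character limit are illustrative choices, not code from `haystack/schema.py`, and the sketch assumes `to_dict` should keep returning a plain dictionary:

```python
from dataclasses import asdict, dataclass


@dataclass(repr=False)
class Snippet:
    """Illustrative class, not part of haystack/schema.py."""
    id: str
    content: str

    def to_dict(self) -> dict:
        return asdict(self)

    def __repr__(self) -> str:
        # Full, unambiguous form for debugging.
        return f"<Snippet: {self.to_dict()}>"

    def __str__(self) -> str:
        # Short form for printing: truncate long content and mark the cut.
        shortened = self.content[:50]
        ellipsis = "..." if len(self.content) > 50 else ""
        return f"<Snippet: id={self.id}, content='{shortened}{ellipsis}'>"


short = Snippet(id="a1", content="Arya adopts a direwolf cub.")
long = Snippet(id="b2", content="x" * 80)

print(str(short))
print(str(long))
print(repr(short))
```

`print()` and f-strings use `__str__`, while containers and the interactive prompt use `__repr__`, so the two can differ in verbosity without affecting serialization.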
4 changes: 3 additions & 1 deletion haystack/utils/export_utils.py
@@ -21,7 +21,7 @@ def print_answers(results: dict, details: str = "all"):
     """
     Utility function to print results of Haystack pipelines
     :param results: Results from a pipeline
-    :param details: One of ["minimum", "medium", "all]. Defining the level of details to print.
+    :param details: One of ["minimal", "medium", "all"]. Defines the level of detail to print.
     :return: None
     """
     # TODO: unify the output format of Generator and Reader so that this function doesn't have the try/except
@@ -47,6 +47,8 @@ def print_answers(results: dict, details: str = "all"):
     except:
         if details == "minimal":
             print(f"Query: {results['query']}")
+            if not "answers" in results.keys():
+                print("No answers!")
             for a in results["answers"]:
                 print(f"Answer: {a['answer']}")
         else:
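
The new check prints "No answers!" when the `answers` key is missing, but the loop that follows still indexes `results["answers"]`, which would raise a `KeyError` in exactly that case. One way to guard against both a missing key and an empty list is sketched below. `print_minimal_answers` is a hypothetical helper covering only the `minimal` branch; it is not the library function:

```python
def print_minimal_answers(results: dict) -> list:
    """Print the query and its answers; return the printed lines for inspection."""
    lines = [f"Query: {results['query']}"]
    # .get() avoids a KeyError when the pipeline returned no "answers" key.
    answers = results.get("answers") or []
    if not answers:
        lines.append("No answers!")
    for a in answers:
        lines.append(f"Answer: {a['answer']}")
    for line in lines:
        print(line)
    return lines


with_answers = print_minimal_answers(
    {"query": "Who is the father of Arya Stark?", "answers": [{"answer": "Eddard"}]}
)
without_answers = print_minimal_answers({"query": "Arya Stark father"})
```

Returning the lines as well as printing them is only a convenience for checking the sketch; the real function returns `None`.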
43 changes: 27 additions & 16 deletions tutorials/Tutorial10_Knowledge_Graph.py
@@ -2,6 +2,7 @@
 import subprocess
 import time
 from pathlib import Path
+from pprint import pprint

 from haystack.nodes import Text2SparqlRetriever
 from haystack.document_stores import GraphDBKnowledgeGraph
@@ -26,7 +27,7 @@ def tutorial10_knowledge_graph():

     # Start a GraphDB server
     if LAUNCH_GRAPHDB:
-        logging.info("Starting GraphDB ...")
+        print("Starting GraphDB ...\n")
         status = subprocess.run(
             ['docker run -d -p 7200:7200 --name graphdb-instance-tutorial docker-registry.ontotext.com/graphdb-free:9.4.1-adoptopenjdk11'], shell=True
         )
@@ -52,8 +53,13 @@ def tutorial10_knowledge_graph():

     # Import triples of subject, predicate, and object statements from a ttl file
     kg.import_from_ttl_file(index="tutorial_10_index", path=Path(graph_dir+"triples.ttl"))
-    logging.info(f"The last triple stored in the knowledge graph is: {kg.get_all_triples()[-1]}")
-    logging.info(f"There are {len(kg.get_all_triples())} triples stored in the knowledge graph.")
+
+    print()
+    print("# KNOWLEDGE GRAPH CONTENT")
+    print("#########################")
+    print(f"There are {len(kg.get_all_triples())} triples stored in the knowledge graph.")
+    print(f"The last triple stored in the knowledge graph is:\n{kg.get_all_triples()[-1]}")
+    print()

     # Define prefixes for names of resources so that we can use shorter resource names in queries
     prefixes = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
@@ -69,25 +75,30 @@ def tutorial10_knowledge_graph():
     # One limitation though: our pre-trained model can only generate questions about resources it has seen during training.
     # Otherwise, it cannot translate the name of the resource to the identifier used in the knowledge graph.
     # E.g. "Harry" -> "hp:Harry_potter"
-
     query = "In which house is Harry Potter?"
-    logging.info(f"Translating the text query \"{query}\" to a SPARQL query and executing it on the knowledge graph...")
+    print(f"\nTranslating the text query \"{query}\" to a SPARQL query and executing it on the knowledge graph...")
+    print(" -> Correct SPARQL query: select ?a { hp:Harry_potter hp:house ?a . }")
+    print(" -> Correct answer: Gryffindor")
     result = kgqa_retriever.retrieve(query=query)
-    logging.info(result)
-    # Correct SPARQL query: select ?a { hp:Harry_potter hp:house ?a . }
-    # Correct answer: Gryffindor
+    print("Results: ")
+    for r in result:
+        pprint(r)

-    logging.info("Executing a SPARQL query with prefixed names of resources...")
+    print("\nExecuting a SPARQL query with prefixed names of resources...")
+    print(" -> Paraphrased question: Who is the keeper of keys and grounds?")
+    print(" -> Correct answer: Rubeus Hagrid")
     result = kgqa_retriever._query_kg(sparql_query="select distinct ?sbj where { ?sbj hp:job hp:Keeper_of_keys_and_grounds . }")
-    logging.info(result)
-    # Paraphrased question: Who is the keeper of keys and grounds?
-    # Correct answer: Rubeus Hagrid
+    print(" * Results: ")
+    for r in result:
+        pprint(r)

-    logging.info("Executing a SPARQL query with full names of resources...")
+    print("\nExecuting a SPARQL query with full names of resources...")
+    print(" -> Paraphrased question: What is the patronus of Hermione?")
+    print(" -> Correct answer: Otter")
     result = kgqa_retriever._query_kg(sparql_query="select distinct ?obj where { <https://deepset.ai/harry_potter/Hermione_granger> <https://deepset.ai/harry_potter/patronus> ?obj . }")
-    logging.info(result)
-    # Paraphrased question: What is the patronus of Hermione?
-    # Correct answer: Otter
+    print("Results: ")
+    for r in result:
+        pprint(r)


 if __name__ == "__main__":
8 changes: 4 additions & 4 deletions tutorials/Tutorial11_Pipelines.ipynb
@@ -547,7 +547,7 @@
 "cell_type": "code",
 "execution_count": null,
 "source": [
-"class QueryClassifier(BaseComponent):\n",
+"class MyQueryClassifier(BaseComponent):\n",
 "    outgoing_edges = 2\n",
 "\n",
 "    def run(self, query: str):\n",
@@ -558,7 +558,7 @@
 "\n",
 "# Here we build the pipeline\n",
 "p_classifier = Pipeline()\n",
-"p_classifier.add_node(component=QueryClassifier(), name=\"QueryClassifier\", inputs=[\"Query\"])\n",
+"p_classifier.add_node(component=MyQueryClassifier(), name=\"QueryClassifier\", inputs=[\"Query\"])\n",
 "p_classifier.add_node(component=es_retriever, name=\"ESRetriever\", inputs=[\"QueryClassifier.output_1\"])\n",
 "p_classifier.add_node(component=dpr_retriever, name=\"DPRRetriever\", inputs=[\"QueryClassifier.output_2\"])\n",
 "p_classifier.add_node(component=reader, name=\"QAReader\", inputs=[\"ESRetriever\", \"DPRRetriever\"])\n",
@@ -567,12 +567,12 @@
 "# Run only the dense retriever on the full sentence query\n",
 "res_1 = p_classifier.run(query=\"Who is the father of Arya Stark?\")\n",
 "print(\"DPR Results\" + \"\\n\" + \"=\"*15)\n",
-"print_answers(res_1)\n",
+"print_answers(res_1, details=\"minimal\")\n",
 "\n",
 "# Run only the sparse retriever on a keyword based query\n",
 "res_2 = p_classifier.run(query=\"Arya Stark father\")\n",
 "print(\"ES Results\" + \"\\n\" + \"=\"*15)\n",
-"print_answers(res_2)"
+"print_answers(res_2, details=\"minimal\")"
 ],
 "outputs": [],
 "metadata": {