
Add execute_eval_run example to Tutorial 5 #2459

Merged: 6 commits merged into master from execute_eval_run_nb on Jun 13, 2022

Conversation

@tstadel tstadel (Member) commented Apr 26, 2022

Proposed changes:

  • Add section Storing results in MLflow to Tutorial 5

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code

@review-notebook-app

Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks.

@tstadel tstadel marked this pull request as ready for review June 9, 2022 17:21
@tstadel tstadel requested a review from julian-risch June 9, 2022 17:21
@julian-risch julian-risch (Member) left a comment

I tested the changes and the results got stored here: https://public-mlflow.deepset.ai/#/experiments/698. That looks good to me! 👍 @tstadel I will go ahead and merge this PR, but I'll also leave some comments here on how I think we could improve the tutorial further.

One thing I noticed is that the comparison of scores returned by pipeline.eval(add_isolated_node_eval=True) and by reader.eval() is difficult.

reader.eval output is:

Reader Top-4-Accuracy: 99.09208819714657
Reader Top-1-Exact Match: 95.71984435797665
Reader Top-1-F1-Score: 95.73510337987335
Reader Top-4-Accuracy (without no_answers): 72.0
Reader Top-4-Exact Match (without no_answers): 44.0
Reader Top-4-F1-Score (without no_answers): 59.71344537815126

and pipeline.eval output is:

0.48 #print(metrics["Reader"]["exact_match"])
0.6027426153741944 #print(metrics["Reader"]["f1"])

Here, we could add a sentence to explain that the "without no_answers" metrics are the ones that are expected to be similar.
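
For reference, a minimal sketch of how the two sets of scores above can be produced. It assumes the tutorial's `reader`, `pipeline`, `document_store`, and `eval_labels` objects from earlier steps; the index names and the `eval_mode` argument are assumptions, not verified against the notebook:

```python
# Minimal sketch (assumed names from earlier tutorial steps).

# Isolated reader evaluation: yields the "Reader Top-k ..." metrics shown above.
reader_metrics = reader.eval(
    document_store=document_store,
    label_index="tutorial5_labels",  # placeholder label index name
    doc_index="tutorial5_docs",      # placeholder document index name
)

# Integrated pipeline evaluation: reader metrics here depend on what the retriever returned.
eval_result = pipeline.eval(labels=eval_labels, add_isolated_node_eval=True)
metrics = eval_result.calculate_metrics()
print(metrics["Reader"]["exact_match"])
print(metrics["Reader"]["f1"])

# If the installed Haystack version supports it, isolated metrics (reader evaluated on
# the gold documents only) can be pulled from the same result for a fairer comparison:
isolated_metrics = eval_result.calculate_metrics(eval_mode="isolated")
```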
As another small improvement, we could add a sentence in the tutorial after the headline "Run experiments". We should briefly explain here that an experiment consists of several executions of pipeline.eval() (evaluation runs) so that the user knows what to expect.
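
To make the "Run experiments" suggestion concrete, here is a hedged sketch of the execute_eval_run call this PR documents. The pipeline objects, file paths, and experiment names are placeholders, and the parameter list should be checked against the Haystack version used in the tutorial:

```python
from haystack import Pipeline

# Sketch only: one call like this is a single evaluation run; an experiment consists
# of several such runs (e.g. with different parameters), all tracked under the same
# experiment name in MLflow.
eval_result = Pipeline.execute_eval_run(
    index_pipeline=index_pipeline,      # placeholder: pipeline that indexes the corpus
    query_pipeline=query_pipeline,      # placeholder: QA pipeline under evaluation
    evaluation_set_labels=eval_labels,  # placeholder: labels of the evaluation set
    corpus_file_paths=file_paths,       # placeholder: files to index as the corpus
    experiment_name="tutorial5-qa",     # placeholder experiment name
    experiment_run_name="run_1",        # placeholder run name
    experiment_tracking_tool="mlflow",
    experiment_tracking_uri="https://public-mlflow.deepset.ai",
    reuse_index=True,                   # assumed flag to skip re-indexing on later runs
)
metrics = eval_result.calculate_metrics()
```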

@julian-risch julian-risch merged commit 66c7d1a into master Jun 13, 2022
@julian-risch julian-risch deleted the execute_eval_run_nb branch June 13, 2022 07:19