Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

infra: add CI job for running tutorial notebooks #26944

Merged
merged 49 commits into from
Sep 27, 2024
Merged

Conversation

ccurme
Copy link
Collaborator

@ccurme ccurme commented Sep 27, 2024

No description provided.

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 27, 2024
@ccurme ccurme marked this pull request as draft September 27, 2024 15:25
@ccurme ccurme marked this pull request as ready for review September 27, 2024 16:38
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Sep 27, 2024
- cron: '0 13 * * *'

env:
PYTHON_VERSION: "3.11"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth parameterizing this to make sure they run in 3.9 + 3.12 mostly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do separately as it's easier to debug workflow dispatch after merging the new workflow to master.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added python version as a parameter in #26956

Comment on lines 50 to 58
env:
# these won't actually be used because of the VCR cassettes
# but need to set them to avoid triggering getpass()
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
TAVILY_API_KEY: ${{ secrets.TAVILY_API_KEY }}
run: |
./docs/scripts/execute_notebooks.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we know they won't be needed, could we skip populating the real secrets and instead just run export ANTHROPIC_API_KEY=fake-api-key commands?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this comment as these actually do get used for notebooks where we skip cassettes.


NOTEBOOKS_NO_EXECUTION = [
"docs/docs/tutorials/graph.ipynb", # Requires local graph db running
"docs/docs/tutorials/local_rag.ipynb", # Local LLMs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooc why can't this be run off a cassette?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local LLM notebook I think may not make http calls, there's just significant overhead running local models (e.g., ollama). LMK if I've misunderstood.

file="$1"
echo "Starting execution of $file"
start_time=$(date +%s)
if ! output=$(time poetry run jupyter execute "$file" 2>&1); then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth switching to jupyter nbconvert --to notebook --execute $file for later customizability if we ever want to process the outputs / look at the file

doesn't need to block this PR - this looks like it only gets errors from the jupyter runtime and might be tricky to get warnings/incorrect outputs from cells?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

export -f execute_notebook

# Find all notebooks and filter out those in the skip list
notebooks=$(find docs/docs/tutorials -name "*.ipynb" | grep -v ".ipynb_checkpoints" | grep -vFf <(echo "$SKIP_NOTEBOOKS"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's a way to parameterize this, might make this CI step more customizable in future if we want to use it to run e.g. individual notebooks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into this after merging so we can debug workflow dispatch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in #26956

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Sep 27, 2024
@ccurme ccurme enabled auto-merge (squash) September 27, 2024 18:25
@ccurme ccurme merged commit 67df944 into master Sep 27, 2024
87 checks passed
@ccurme ccurme deleted the cc/run_notebooks branch September 27, 2024 18:29
ccurme added a commit that referenced this pull request Sep 27, 2024
Addressing some lingering comments from
#26944, adding parameters
for
- python version
- working directory

![Screenshot 2024-09-27 at 3 33
21 PM](https://github.com/user-attachments/assets/dfa45772-fddb-4489-a148-c9ed83d844d0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm PR looks good. Use to confirm that a PR is ready for merging. size:M This PR changes 30-99 lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants