New notebook-level metadata indicating whether outputs should be saved #299

Open
jasongrout opened this issue Aug 27, 2022 · 3 comments

@jasongrout
Member

For a long time, people have struggled with Jupyter notebooks and version control. One complicating factor that causes churn in notebooks is saved output. For example, in ipywidgets we finally insisted that our example notebooks always be saved only after clearing outputs, to prevent churn in the repo. Saving output can also raise security or business concerns in certain situations. There are many situations in which a user would like to tell the system that a particular notebook should be saved with only its inputs, with outputs stripped out.

What do people think of having a new notebook-level metadata key that indicates the user wishes to only save inputs, i.e., the user wishes to effectively clear the outputs before saving? Perhaps jupyter.exclude_outputs, which if true, is an explicit user hint to the tool saving the notebook that it should strip outputs before saving.
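
A minimal sketch of how a saving tool might honor such a hint, assuming the key is stored as a nested `metadata["jupyter"]["exclude_outputs"]` entry (the exact location is not settled in this proposal) and that the notebook is a plain nbformat v4 dictionary. The function name `strip_outputs_if_requested` is hypothetical.

```python
import copy


def strip_outputs_if_requested(nb: dict) -> dict:
    """Return a deep copy of an nbformat v4 notebook dict.

    If metadata["jupyter"]["exclude_outputs"] is truthy (assumed location of
    the proposed key), outputs and execution counts are cleared in the copy.
    """
    result = copy.deepcopy(nb)
    jupyter_meta = result.get("metadata", {}).get("jupyter", {})
    if not jupyter_meta.get("exclude_outputs", False):
        return result  # no hint set: keep outputs as they are

    for cell in result.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return result


# Tiny example notebook with one executed code cell.
example_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"jupyter": {"exclude_outputs": True}},
    "cells": [
        {
            "cell_type": "code",
            "source": "1 + 1",
            "metadata": {},
            "execution_count": 3,
            "outputs": [{"output_type": "execute_result",
                         "data": {"text/plain": "2"},
                         "metadata": {},
                         "execution_count": 3}],
        }
    ],
}
saved_nb = strip_outputs_if_requested(example_nb)
```

Returning a copy leaves the in-memory notebook untouched, so a frontend could keep showing outputs while writing only the inputs.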

Not all tools would obey this hint. For example, I imagine that nbconvert would save outputs if the appropriate options were given, regardless of this hint. However, I think it would be great if JupyterLab/Jupyter Notebook and other frontends could respect the setting.

Disclaimer: we are also looking at how outputs are saved in Jupyter notebook exports at Databricks, where users may have a business need to indicate outputs should not be saved in a notebook.

@Carreau
Member

Carreau commented Aug 30, 2022

I don't think that should be in the notebook metadata.
I think it should live in a local folder like a .jupyter_config, the same way you have a .gitignore, maybe with file patterns and options.

@vidartf
Contributor

vidartf commented Sep 27, 2022

Note that there might be two different user needs here:

  • The user does not want to send the outputs to the server (or, if the outputs are going to be discarded anyway, we might want to save network traffic by not sending them in the first place). This would then be something implemented in the front-end.
  • The user is fine with sending outputs, but doesn't want them written to disk. This would then be implemented in the contents manager. This also allows an admin to enforce the stripping of outputs regardless of user preference (or to respect it, if wanted).

In both cases, the question is where the setting would be best stored (ideally somewhere visible to both the client and the contents manager). You could have a server-wide setting to enable/disable/filter, but there would also be a benefit to tying a field to a specific notebook: either "I know that this notebook has safe outputs, please save them" or "I know that this notebook has code that can produce sensitive outputs (like PII), please never save them". In that case, notebook metadata is the best option, since the setting persists even if the notebook is copied, moved, or shared with others.
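
The contents-manager variant could be sketched as a pre-save hook, assuming Jupyter Server's `FileContentsManager.pre_save_hook` mechanism. The `policy` values and the `make_pre_save_hook` helper are hypothetical, and the flag location `metadata["jupyter"]["exclude_outputs"]` is an assumption about where the proposed key would live.

```python
def make_pre_save_hook(policy: str = "respect"):
    """Build a pre-save hook that strips notebook outputs.

    policy (hypothetical values):
      "always"  - admin enforces stripping for every notebook
      "never"   - outputs are always kept
      "respect" - strip only when the notebook's own flag asks for it
    """
    if policy not in {"always", "never", "respect"}:
        raise ValueError(f"unknown policy: {policy!r}")

    def pre_save_hook(model, **kwargs):
        # Only act on nbformat v4 notebooks that carry content.
        if model.get("type") != "notebook":
            return
        nb = model.get("content")
        if not nb or nb.get("nbformat") != 4:
            return

        # Assumed location of the proposed notebook-level hint.
        flag = nb.get("metadata", {}).get("jupyter", {}).get(
            "exclude_outputs", False
        )
        if policy == "always" or (policy == "respect" and flag):
            for cell in nb.get("cells", []):
                if cell.get("cell_type") == "code":
                    cell["outputs"] = []
                    cell["execution_count"] = None

    return pre_save_hook


# In a server config file this might be wired up as (not executed here):
# c.FileContentsManager.pre_save_hook = make_pre_save_hook("respect")
```

The hook edits the model in place before it is written to disk, so outputs still travel from the client to the server; avoiding that traffic would need the front-end variant.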

@bchiang2

Following up on @Carreau's proposal for managing Jupyter notebook results, I would like to suggest creating a .jupyter_ignore_results file similar to .gitignore. This file would allow us to specify whether or not we want to include the results of our Jupyter notebooks when sharing or committing our code.

As an example, consider the following .jupyter_ignore_results file contents:

# This is a comment
folder/*
!folder/secrets/*
folder/secrets/not-so-secret/*

This would result in the following handling of results for notebooks:

folder/subfolder/notebook.ipynb - includes results
folder/secrets/notebook.ipynb - does not include results, because the exclusion pattern matches it.
folder/secrets/not-so-secret/subfolder/notebook.ipynb - includes results, because the later, more specific pattern overrides the exclusion pattern.
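
A rough sketch of a matcher that reproduces the three outcomes listed above. It assumes "last matching pattern wins", that a plain pattern means "include results" and a `!` pattern means "do not include results" (as the example describes), and it uses Python's `fnmatch`, whose `*` also matches `/`, so it is not full .gitignore semantics. The function names and the `default` for unmatched paths are hypothetical.

```python
from fnmatch import fnmatchcase

EXAMPLE_RULES = """\
# This is a comment
folder/*
!folder/secrets/*
folder/secrets/not-so-secret/*
"""


def parse_rules(text: str) -> list[tuple[str, bool]]:
    """Parse rule lines into (pattern, include_results) pairs."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("!"):
            rules.append((line[1:], False))  # exclusion pattern
        else:
            rules.append((line, True))
    return rules


def includes_results(path: str, rules, default: bool = True) -> bool:
    """Return whether results are kept for `path`; the last match wins.

    `default` applies when no pattern matches (an assumption, since the
    proposal does not say what happens to unmatched notebooks).
    """
    decision = default
    for pattern, include in rules:
        if fnmatchcase(path, pattern):
            decision = include
    return decision


rules = parse_rules(EXAMPLE_RULES)
for nb_path in (
    "folder/subfolder/notebook.ipynb",
    "folder/secrets/notebook.ipynb",
    "folder/secrets/not-so-secret/subfolder/notebook.ipynb",
):
    print(nb_path, "->", includes_results(nb_path, rules))
```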

For context, we are actively working on a prototype implementation of this at Databricks for exporting Jupyter notebooks, and we’d like to have a conversation about standardizing it if it is useful to the wider community. Please let me know if you have any further questions or suggestions.


4 participants