From 79d39c9e90c158b39feaf3134659cc377f794c07 Mon Sep 17 00:00:00 2001
From: Facundo Sapienza
Date: Wed, 19 Jul 2023 23:15:23 +0200
Subject: [PATCH] Persistent Environment Documentation (#66)

* small changes to Python setup
* draft for persistent environment in the CryoCloud
* small changes based on https://github.com/CryoInTheCloud/CryoCloudWebsite/issues/65
* grammar check
* Added best environment practices
* grammar check
---
 book/how_tos/background/python.md | 98 +++++++++++++++++++++++++++----
 1 file changed, 87 insertions(+), 11 deletions(-)

diff --git a/book/how_tos/background/python.md b/book/how_tos/background/python.md
index b4b0c68..3b3bf98 100644
--- a/book/how_tos/background/python.md
+++ b/book/how_tos/background/python.md
@@ -1,4 +1,4 @@
-# Python Installation
+# Python Installation and Environments
 
 ## Overview
 
@@ -17,7 +17,8 @@ and have a fully functioning environment after.
 Python software is distributed as a series of *libraries* that are called within your code to perform certain tasks. There are many different collections, or *distributions* of Python software. Generally you install a specific distribution of Python and then add additional libraries as you need them. There are also several different *versions* of Python. Support for Python 2 ended in 2020, so you should use Python>=3!
 
 ```{note}
-If you open a terminal on your computer, chances are if you type 'python' you will find it is already installed! But it is best-practice to create separate environments or 'virtual environments' to not interfere with existing installations. You can use {term}`conda` for this.
+If you open a terminal on your computer, chances are if you type `python` you will find it is already installed! But it is best practice to create separate environments or _virtual environments_ to avoid interfering with existing installations.
This also allows you to have different projects in
+different workspaces, each one of them with different Python versions and different packages installed. You can use {term}`conda` for this (see the next sections).
 ```
 
 ## What is Conda?
@@ -45,6 +46,7 @@ url=https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
 wget $url -O miniconda.sh
 bash miniconda.sh -b -p $HOME/miniconda
 ```
+Notice that this last command uses `bash`. If bash is not already your default shell, you need to set it to be so (use the `chsh -s /bin/bash` command to change your default shell to bash).
 
 ### Installing Anaconda (Optional)
 
@@ -60,23 +62,97 @@ Python 3.7.3|Anaconda custom (x86_64)| (default, Mar 27 2019, 22:11:17)
 ...
 ```
 
-### Installing a specific python version
+### Installing Mamba (Optional)
 
-We will be using Python 3 during the week. Since Anaconda (on Linux) expects you to work in the `bash` shell, if this is not already your default shell, you need to set it to be so (use the `chsh -s /bin/bash` command to change your default shell to bash), then you can create an isolated Python environment with the following commands:
-
-``` bash
-conda create --name py39 python=3.9
+Setting up an environment with package dependencies using `conda` can be quite slow.
+A better option is to use `mamba` to resolve version dependencies: it is much faster and achieves the same result as `conda`.
+In order to use mamba, first install `mamba` in your base environment using `conda`:
+```bash
+conda install mamba
 ```
+Now, every `conda` command can be replaced by `mamba`.
+For example, you can use `mamba install ` instead of `conda install ` (except for `conda activate`, which is the only command for which you still need to use `conda`).
+
+
+## Working in an Environment
 
-To use Python 3.9:
+Once miniconda/conda/mamba has been installed, we can use it to create new virtual environments with different Python versions and packages.
A good practice is to have different environments for different projects when these have different dependencies.
+````{admonition} Persistent Environments
+By default, conda environments are not persistent in the CryoCloud Hub.
+This means that every time you open a new CryoCloud session, all the installations you made in previous sessions will be gone.
+In order to be able to work in the same computational environment across sessions without re-installing the same packages, we encourage users to create a folder in their home directory to store all their customized environments.
+By doing this, your environments will stay in your account when you come back to work in the future.
+In order to do this, create a folder called `envs` in your home directory (that is, `/home/jovyan`).
+You can do this directly from the terminal:
+```bash
+mkdir envs
+```
+Then, also in your home directory, create a new text file called `.condarc` (the name is important! Don't forget the initial dot `.`) with the following content:
+```
+# .condarc
 
-``` bash
-conda activate py39
+envs_dirs:
+  - ~/envs
 ```
+This tells conda that all new environments have to live inside `~/envs` (`~` is the Unix shorthand for your home directory).
+````
 
-To check if you have the correct version activated
+You can now create a new environment with `conda`.
+If you are just starting to work on a new project or want to test something, you can create an environment from scratch with
+```bash
+conda create --name python=3.9
+```
+where you have to replace `` with the name you want to give your new environment.
+Alternatively, you can create an environment directly from an `environment.yml` file
+```bash
+conda env create -f environment.yml
+```
+This second option is particularly practical for reproducibility and collaboration in a team.
+You can check that the new environment was created and installed correctly by first activating it
+```bash
+conda activate
+```
+and then checking that you have the correct Python version installed
 
 ```bash
 which python
 python --version
 ```
+
+The best practice for reproducibility and collaboration is to have an updated `environment.yml` file in your workspace (e.g., in the GitHub repository of your project).
+This allows members of the team to keep the environment updated and shared among users.
+You can create an `environment.yml` file associated with the environment with the command
+```bash
+conda env export --from-history > environment.yml
+```
+This writes the dependencies required to reproduce the environment to the `environment.yml` file.
+Conversely, if you have new packages listed in your `environment.yml` file (because you edited it or changes were made to it by a colleague and you got these changes over git, for example), you can apply these updates with this command
+```bash
+conda env update --file environment.yml --prune
+```
+By doing this, you will keep the conda environment and the `environment.yml` file synchronized.
+Sharing the `environment.yml` file ensures that other users will have the set of instructions to reproduce your virtual environment.
+Notice that the environment and the `environment.yml` are not the same thing! The latter is just a text file that allows the creation
+of a conda environment by using the instructions in this section.
+
+## Making the Environment Accessible to the IPython Kernel
+
+In order to access the kernel associated with our new environment from a Jupyter Notebook, we need to install `ipykernel`. We first activate the
+new environment,
+```bash
+conda activate
+```
+and then install `ipykernel`:
+```bash
+conda install ipykernel
+```
+These steps are not needed if you created the environment directly from a `.yml` file that includes `ipykernel` as a dependency.
+
+Then we run
+```bash
+python -m ipykernel install --user --name --display-name "IPython - "
+```
+to create the associated kernel.
+Replace `` with the name you want to give the IPython kernel and `` with the name of the respective environment.
+
+You are done! Next time you start your JupyterHub session, you will see the new kernel available from your launcher or from the upper-right corner of any Jupyter Notebook.
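
The persistent-environment setup described in the patch (an `envs` folder plus a `.condarc` file) can be rehearsed without touching a real home directory. The following is only a minimal sketch under stated assumptions: `demo_home` is a hypothetical scratch directory standing in for `/home/jovyan`, and the optional last step assumes that conda honours the `CONDARC` environment variable as an extra configuration file, which is how recent conda releases behave.

```shell
# Hypothetical scratch directory standing in for the real home directory
# (/home/jovyan on the hub); nothing outside it is modified.
demo_home="$(mktemp -d)"

# Folder that would hold the persistent environments.
mkdir -p "$demo_home/envs"

# Write the .condarc content described in the patch.
cat > "$demo_home/.condarc" <<'EOF'
# .condarc

envs_dirs:
  - ~/envs
EOF

# Show what was created.
ls -A "$demo_home"
cat "$demo_home/.condarc"

# Optional, only when conda is installed: print the envs_dirs value that
# conda resolves with this file on its configuration search path.
if command -v conda >/dev/null 2>&1; then
  CONDARC="$demo_home/.condarc" conda config --show envs_dirs || true
fi
```

On the hub itself the same two steps would be run directly in the home directory, as the patch describes; the scratch directory is used here only so the sketch is safe to run anywhere.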