
🌸 Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies in Jupyter Notebooks

This is the accompanying repo for our ICSE'22 paper. We also provide a Zenodo archive containing a set of sample Jupyter notebooks and the embedding file, which are too large to be included in this repository.

There are two main components of the approach:

  1. Obtain runtime data using dynamic analysis. The data here is assignments encountered during execution.
  2. Train a classifier and find name-value inconsistencies.

TL;DR 🪜

Simply run or follow the 📌 marked instructions from the root directory of this repository.

Requirements & Setup

📌 We have tested on Ubuntu 18.04 LTS with Python 3.8.12. Additionally, the dynamic analysis runs inside a Docker container, so Docker also needs to be installed.

Directory Structure

The directory structure is as follows:

src/ # The root directory of all source files
benchmark/ # This may contain the input Python files & the Jupyter Notebooks
dynamic_analysis_runner/ # Code for running Dynamic Analysis
src/dynamic_analysis_tracker_local_package/ # Python package for saving the assignments encountered during execution
src/get_scripts_and_instrument/ # Code for getting Jupyter Notebooks, converting them to Python scripts and instrumenting
src/nn/ # Code for running the Classifier
results/ # The results generated by running the experiments are written here

Python Packages

The required packages are listed in requirements.txt and may be installed with pip install -r requirements.txt. Additionally, install PyTorch (we have tested with PyTorch 1.10.1).

📌

pip install -r requirements.txt
pip install torch==1.10.1+cpu torchvision==0.11.2+cpu torchaudio==0.10.1+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html

💡 The above command installs the CPU version of PyTorch. If you want CUDA support, adjust the command as described at the link.

Jupyter Notebook Dataset

We use the dataset from a CHI’18 paper that analyzed more than 1.3 million publicly available Jupyter notebooks from GitHub. Download the dataset using the link. For testing, we provide a sample of about 2000 Jupyter notebooks (benchmark/jupyter_notebook_datasets/sample.zip) obtained from this dataset (download the sample from the Zenodo archive).

Embedding

📌 Download the embedding file benchmark/python_embeddings.bin from the Zenodo archive and put it in the benchmark folder.


1. Dynamic Analysis ⚙️

How to execute Jupyter notebooks from the command line?

We want to execute a large number of Jupyter notebooks. We take the following steps (a sketch of the conversion step follows this list):

  • Convert the Jupyter notebooks to Python scripts.
  • Instrument the Python scripts individually.
  • Execute the instrumented scripts.
  • Collect the runtime data as either:
    • JSON files, where a string representation of the data is saved, OR
    • pickle files, where the value is stored in a binary format that may be read later. WARNING: this takes a lot of disk space.
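The conversion step can be done with nbconvert. The repository's run_get_scripts_and_instrument.py automates this (together with the instrumentation) over a whole archive; the snippet below is only a minimal sketch for a single, hypothetical notebook path.

```python
# Minimal sketch of the notebook-to-script conversion step using nbconvert.
# The paths below are hypothetical; the repository script handles this in bulk.
import nbformat
from nbconvert import PythonExporter

notebook_path = "benchmark/example_notebook.ipynb"  # hypothetical input

nb = nbformat.read(notebook_path, as_version=4)
source, _resources = PythonExporter().from_notebook_node(nb)

with open("benchmark/python_scripts/example_notebook.py", "w") as out:
    out.write(source)
```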

Instrument Python files for tracking assignments

The directory is src/get_scripts_and_instrument

Run the following command from the root folder.

📌

python src/get_scripts_and_instrument/run_get_scripts_and_instrument.py

By default, this script should:

  1. Extract the Jupyter notebooks present in 'benchmark/jupyter_notebook_datasets/sample.zip' to 'benchmark/python_scripts'
  2. Convert the extracted notebooks to Python scripts
  3. Delete the extracted notebooks
  4. Instrument the converted Python scripts

Not all Jupyter notebooks present in sample.zip get instrumented: some encounter errors during conversion to Python scripts and some during instrumentation.

Execute the instrumented Python files in a Docker container

On many occasions, we have found that the executed files make unsolicited network requests and download large datasets, which can fill up the disk quickly. We avoid this completely by running the instrumented Python files in a Docker container. More specifically, we execute the instrumented files using ipython3. Executing each file generates a number of JSON/pickle files if it contains any in-scope assignments (we do not track assignments of the form a.b.c = m or a[b] = c, or aug-assignments of the form a[b] += 2; see the illustration below). Each generated file corresponds to one assignment.
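To make the scoping rule concrete, the hypothetical snippet below marks which assignments the dynamic analysis would and would not record:

```python
# Illustrative only: which kinds of assignments are tracked by the dynamic analysis.
nums = [1, 2, 3]   # tracked: plain name = value assignment
total = sum(nums)  # tracked: plain name = value assignment

class Point:
    pass

p = Point()        # tracked: plain name = value assignment
p.x = 5            # NOT tracked: attribute assignment (a.b = c)
nums[0] = 10       # NOT tracked: subscript assignment (a[b] = c)
nums[0] += 2       # NOT tracked: aug-assignment on a subscript (a[b] += 2)
```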

Dockerfile

The following Dockerfile is included in the root directory. Notice the last line of the Dockerfile: when the container runs, this is the command that gets executed.

FROM python:3.7.5
COPY src/dynamic_analysis_tracker_local_package /home/dynamic_analysis_tracker_local_package

WORKDIR /home

RUN python3 -m pip install -e dynamic_analysis_tracker_local_package
RUN python3 -m pip install --upgrade pip
# Install some required packages
RUN python3 -m pip install \
        tqdm \
        jupyter \
        ipython 
# Install the most frequent packages
COPY dynamic_analysis_runner/most_frequent_packages.json /home
COPY dynamic_analysis_runner/install_freq_packages_python3.py /home
RUN python3 install_freq_packages_python3.py

# Create the Directories that will be mounted during Running the docker container
RUN mkdir -p /home/dynamic_analysis_runner
# We will mount the scripts that we want to execute here
RUN mkdir -p /home/python_scripts
# We will mount the folder where the dynamic analysis outputs get written
RUN mkdir -p /home/dynamic_analysis_outputs
RUN mkdir -p /home/profile_default

# Create a working directory
RUN mkdir -p /home/temp_working_dir_inside_docker

# For debugging, check if all directories have been created properly
# RUN ls -al > /home/dynamic_analysis_outputs/directories_in_docker.txt
WORKDIR /home/temp_working_dir_inside_docker


CMD python3 /home/dynamic_analysis_runner/execute_files_py3.py

Build the Docker image using the following command from the root of the project directory:

📌

sudo docker build -t nalin_dynamic_analysis_runner .

Building the image creates the required folders inside the image. Some of these folders serve as mount points for local folders (e.g., the folder containing the instrumented Python scripts) when running the container.

Additionally, building the image installs the 100 most frequently used packages in benchmark/python_scripts. This list may be obtained by running python src/get_scripts_and_instrument/utils/get_most_frequent_packages.py. If you do not want to re-run it, we provide a pre-computed list at dynamic_analysis_runner/most_frequent_packages.json.
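For reference, the idea behind that list can be sketched as counting top-level imports in the extracted scripts. get_most_frequent_packages.py is the authoritative implementation; the snippet below is only an approximation, and its output format may differ from the provided JSON file.

```python
# Rough sketch: count top-level imports across the extracted scripts and report the
# 100 most frequent package names (an approximation of get_most_frequent_packages.py).
import ast
from collections import Counter
from pathlib import Path

counts = Counter()
for script in Path("benchmark/python_scripts").glob("*.py"):
    try:
        tree = ast.parse(script.read_text(errors="ignore"))
    except (SyntaxError, ValueError):
        continue  # skip scripts that do not parse
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            counts.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            counts.update([node.module.split(".")[0]])

for package, count in counts.most_common(100):
    print(package, count)
```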

Run the Docker image:

📌

sudo docker run --network none \
-v "$(pwd)"/dynamic_analysis_runner:/home/dynamic_analysis_runner:ro \
-v "$(pwd)"/benchmark/python_scripts:/home/python_scripts:ro \
-v "$(pwd)"/results/dynamic_analysis_outputs:/home/dynamic_analysis_outputs \
-v "$(pwd)"/benchmark/profile_default:/home/profile_default:Z \
-it --rm nalin_dynamic_analysis_runner

What happens on running the above command?

  1. By default, it should write the dynamic analysis results to the directory results/dynamic_analysis_outputs.

  2. Mount two folders (read-only). One contains our own scripts, while the other contains the Python files we want to execute.

    • The dynamic_analysis_runner folder containing the runner script gets mounted at the home directory of the Docker container.
    • The benchmark/python_scripts that contain the instrumented Python files also gets mounted at the home directory of the Docker container.
  3. Mount another folder (writable) where the data is written by the executing scripts.

    • The /results/dynamic_analysis_outputs folder gets mounted at the home directory of the Docker container.
  4. The benchmark/profile_default folder and its content need to be mounted to avoid some IPython-specific errors.

Dynamic Analysis Output

By default, the dynamic analysis outputs are written to the 'results/dynamic_analysis_outputs' folder as pickle files. Make sure this path exists.
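A quick way to sanity-check the outputs is to load a few of them, as sketched below. The record structure (and whether a given file is pickle or JSON) is an assumption here, so inspect real files and adapt as needed.

```python
# Sketch: peek at a few dynamic analysis outputs written by the Docker run.
import pickle
from pathlib import Path

out_dir = Path("results/dynamic_analysis_outputs")
for path in sorted(p for p in out_dir.iterdir() if p.is_file())[:5]:
    try:
        with open(path, "rb") as f:
            record = pickle.load(f)
    except pickle.UnpicklingError:
        continue  # not a pickle file (e.g., a JSON output)
    print(path.name, type(record), record)
```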


2. Classifier 🦞

Hopefully, the dynamic analysis step generated many JSON/pickle files. Each generated file represents one assignment (e.g., num=calc(1,2)) encountered during execution. The next step is to combine the individual assignment files into a single file, which we call positive_examples.

To be effective, a classifier needs both positive and negative examples. The negative examples are name-value pairs that typically do not occur together. We have two ways of creating negative examples: one generates them randomly (sketched below), while the other uses a heuristic.
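As a rough illustration of the random strategy, the sketch below pairs each variable name with a value taken from a different assignment. The field names and record layout are assumptions; the actual logic lives in process() of src/nn/run_classification.py.

```python
# Minimal sketch of randomly generated negative examples: keep the variable name but
# swap in the runtime value of a *different* assignment. Field names are assumptions.
import random

positive_examples = [
    {"name": "age", "value": 42},
    {"name": "names", "value": ["alice", "bob"]},
    {"name": "is_valid", "value": True},
]

def make_negative_examples(positives, seed=0):
    rng = random.Random(seed)
    negatives = []
    for ex in positives:
        other = rng.choice([p for p in positives if p is not ex])
        negatives.append({"name": ex["name"], "value": other["value"], "label": 0})
    return negatives

print(make_negative_examples(positive_examples))
```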

All classifier experiments are run using the command python src/nn/run_classification.py.

Pre-Processing and Creating Negative Examples

All pre-processing and the creation of the negative examples happen in the process() call of run_classification.py. You may refer to the documentation of process to understand how it works.

💡 Both training and testing need a pre-trained embedding file. As mentioned earlier in this README, you may download the one we provide or train your own using fastText.
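If you prefer to train your own embedding, a minimal fastText sketch could look like the following. The corpus file and hyperparameters are placeholders, not the settings used for the provided python_embeddings.bin.

```python
# Sketch: train a skip-gram embedding with fastText and save it in the place the
# classifier expects. The corpus path and dimensions below are placeholders.
import fasttext  # pip install fasttext

# One token sequence (e.g., identifiers from the Python scripts) per line.
model = fasttext.train_unsupervised("benchmark/identifier_corpus.txt", model="skipgram", dim=100)
model.save_model("benchmark/python_embeddings.bin")
```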

Run training

📌

python src/nn/run_classification.py --train --num-epochs=5 --name='Nalin'

Run testing

During training, the best models are saved to the results/saved_models directory. The next step is to use a saved model on the test dataset.

📌 Provide the path to a saved model and run the following command.

python src/nn/run_classification.py --test --saved-model=results/saved_models/RNNClassifier_Nalin.pt --test-dataset=results/test_examples.pkl

💡 The names of saved models have the format RNNClassifier_Nalin_RUN_ON__FSCORE.pt. For example, the saved model RNNClassifier_Nalin_22-12-2021--10/47/00_0.817.pt was run on 22-Dec-2021 and its F-score was 0.817.

Correspondence with the paper 📃

  • The testing phase corresponds to Section 4.2
  • The user study results of Section 4.3 are present in results/Variable Names vs. Runtime Values.csv
  • The results of Section 4.4 are present in a JSON file called results/test_dataset_predictions.json. Each entry contains the file from which the example was obtained, the name of the variable, the value assigned to it at runtime, and predicted_p_buggy, the probability with which the model considers the example to be buggy (a loading sketch follows this list). The files containing the top-30 predictions are listed in results/list_of_files_top30_predictions.txt.
  • The files used for comparing with other bug detecting approaches presented in Section 4.5 are listed at results/list_of_files_top30_predictions.txt.
  • The ablation study results of Section 4.6 are present at src/utils/ablation_study_results.ipynb
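Below is a minimal sketch for ranking the entries of results/test_dataset_predictions.json by predicted_p_buggy. Apart from predicted_p_buggy, the key names and the assumption that the file holds a list of records are guesses, so adjust them to the actual JSON structure.

```python
# Sketch: print the 30 predictions the model considers most likely to be buggy.
import json

with open("results/test_dataset_predictions.json") as f:
    predictions = json.load(f)  # assumed to be a list of records

top = sorted(predictions, key=lambda p: p["predicted_p_buggy"], reverse=True)[:30]
for p in top:
    print(p)
```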