Final Projects for CS224N: Natural Language Processing w/ Deep Learning & CS224U: Natural Language Understanding

Sarcasm Detection

This repository contains code for two projects on sarcasm detection. The code at the top level of this repo was developed for the final project of Stanford's CS 224N (Natural Language Processing with Deep Learning); it extends a previous course project and explores additional ways to extract contextual information using deeper models and attention. Code for the previous project, a final project for Stanford's CS 224U (Natural Language Understanding), is in the old_project folder. It explores the value of including context at the embedding level (comparing GloVe vs. ELMo) and at the model level (comparing a feed-forward NN with a bidirectional LSTM).

Getting Started

These instructions will get you a copy of the project up and running on your local machine.


Prerequisites

The software you need to install, and how to install it:

1. Create a new Python3 virtual environment

Run python3 -m venv [your virtual environment name]
To activate your virtual environment, run source [your virtual environment name]/bin/activate

2. Python >= 3.6

To check your Python version, run python -V. If the version is lower than 3.6, you will need to install a newer version of Python. You can follow this tutorial if you need to upgrade your Python version.

3. numpy

If you have pip, you can simply run pip install numpy


4. nltk

If you have pip, simply run pip install --user -U nltk

5. PyTorch 1.4.0

If you have pip, simply run pip install torch==1.4.0

6. scikit-learn

If you have pip, simply run pip install scikit-learn

7. allennlp (requires Python >= 3.6)

If you have pip, simply run pip install allennlp
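
Once everything above is installed, a quick check from inside the virtual environment can confirm that the interpreter and packages are visible. The following is a minimal sketch, not part of this repository: the helper names python_ok and missing_modules are hypothetical, and the module list assumes the usual import names for the packages listed above (torch for PyTorch, sklearn for scikit-learn). It only checks that each package can be located, not which version is installed.

```python
import importlib.util
import sys

# Import names for the packages listed above (assumed, not taken from the repo).
# Note that scikit-learn is imported as "sklearn" and PyTorch as "torch".
REQUIRED_MODULES = ["numpy", "nltk", "torch", "sklearn", "allennlp"]
MIN_PYTHON = (3, 6)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python >= 3.6 is required, found", sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

Any package reported as missing can be installed with the matching pip command above.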

Getting Up and Running

  1. Pull the data.
If you want to pull the full dataset (~128k examples), run sh dataset_scripts/
If you want to pull a smaller version of the dataset (~6k examples, all political comments), run sh dataset_scripts/
Either script should take several minutes to run.
  2. Generate ELMo embeddings for the data.
You'll need to run the script that generates ELMo embeddings for the data. If you pulled the full dataset, you can run the script as is with python. If you pulled the smaller dataset, you'll first need to edit the script, changing the 'main' on line 17 to 'pol'. Generating the embeddings may take several hours for the full dataset.
  3. Start training models.
With the appropriate environment set up and a processed dataset, you're ready to start training models.
To run the training script, run python -m [model_type] -e [error_file], specifying one of the existing model types and a file to output errors for error analysis.
Example: python -m bilstm -e bilstm_errors.txt
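
For readers unfamiliar with this style of command line, the -m and -e flags can be handled with Python's standard argparse module. The sketch below is an illustration under stated assumptions, not the repository's actual training script: the function name build_parser, the long option names --model-type and --error-file, and the choice to make both flags required are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the two flags described above."""
    parser = argparse.ArgumentParser(description="Train a sarcasm detection model.")
    # -m and -e come from the README; the long option names are assumptions.
    parser.add_argument("-m", "--model-type", required=True,
                        help="name of the model type to train, e.g. bilstm")
    parser.add_argument("-e", "--error-file", required=True,
                        help="file to write errors to for error analysis")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("model type:", args.model_type)
    print("error file:", args.error_file)
```

With this sketch, passing -m bilstm -e bilstm_errors.txt yields args.model_type == "bilstm" and args.error_file == "bilstm_errors.txt".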

Built With

  • PyTorch - Deep Learning Framework
  • AllenNLP - NLP package used to generate ELMo embeddings


Authors

  • Nicholas Benavides - Wrote most of the code for the projects

See also the list of contributors who participated in this project.


Acknowledgments

  • Thanks to kolchinski, yunjey, and MLWhiz for code used in various parts of these projects.
  • Thanks to Chris Manning and the CS 224N teaching staff for their guidance and instruction in Winter 2020.
  • Thanks to Chris Potts and the CS 224U teaching staff for their guidance and instruction in Spring 2019.

