COMP 562 Emoji Classification

Group Members: Angel Karafas, Dana Rubin, and Emily Stahle

As people increasingly use emojis to express themselves, Natural Language Processing methods need to adapt to handle them. Our final project for COMP 562: Machine Learning applies logistic regression and Naive Bayes to a dataset of tweets that contain emojis.

Our project writeup is included in this repository as Emoji_Sentiment_Analysis.pdf.

Before Running the Code...

Note that we are using data sourced from Twitter, so some tweets may contain explicit content.
To run the code, download the emoji tweets dataset (https://www.kaggle.com/rexhaif/emojifydata-en?select=emojitweets-01-04-2018.txt) and place it in the project directory.
You may also need to pip install the project dependencies listed in the first cell. To do so, open a terminal and run 'python -m pip install -U matplotlib', 'python -m pip install -U numpy', 'python -m pip install --user -U nltk', and 'python -m pip install -U tweet-preprocessor'. You may not need every command: once the first cell runs without errors, all of the imports are resolved.
The gradient descent function can run slowly, depending on the processing speed of your computer. If it does, lowering the number of iterations below 1500 should speed it up without significantly changing the accuracy of the model.
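To illustrate the trade-off between iteration count and accuracy, here is a minimal sketch of batch gradient descent for logistic regression on synthetic two-class data. It is not the project's actual function: the names `gradient_descent`, `num_iters`, `learning_rate`, and `make_toy_data` are hypothetical, the toy data stands in for the tweet features, and the code uses only the standard library so it runs without the dependencies above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Probability of class 1 for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def gradient_descent(X, y, num_iters=1500, learning_rate=0.1):
    """Batch gradient descent on the mean logistic loss (hypothetical signature)."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(num_iters):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the gradient, averaged over all samples.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def accuracy(X, y, weights, bias):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(
        (predict_proba(features, weights, bias) >= 0.5) == (label == 1)
        for features, label in zip(X, y)
    )
    return hits / len(X)


def make_toy_data(n_per_class=50, seed=0):
    """Two Gaussian clusters; a stand-in for real features, not the tweet data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
acc_full = accuracy(X, y, *gradient_descent(X, y, num_iters=1500))
acc_short = accuracy(X, y, *gradient_descent(X, y, num_iters=300))
print(f"1500 iterations: {acc_full:.3f}, 300 iterations: {acc_short:.3f}")
```

On easily separable data like this, the two runs should report similar accuracy, which is the behavior described above. How far the iteration count can be lowered on the real tweet data is something to check empirically.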
