Wei and Zou 2019, EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (GitHub)
Run chmod +x download.sh and then ./download.sh.
from DataLoader import DataLoader
dl = DataLoader()
X, y = dl.load_subreddits(subreddits=['leagueoflegends', 'AdviceAnimals'])
dl.export_for_eda(X, y) # exports eda_nlp/data/reddit.txt
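EDA's augment.py reads one example per line, with the label and the text separated by a tab (check the EDA repo for the authoritative format). The sketch below shows roughly what writing a file in that shape involves. It assumes X is a list of post texts and y a parallel list of labels; write_eda_file is a hypothetical helper and is not necessarily how export_for_eda is implemented.

```python
from typing import Sequence


def write_eda_file(X: Sequence[str], y: Sequence[object], path: str) -> int:
    """Write (text, label) pairs as 'label<TAB>text' lines and return the count.

    Hypothetical helper for illustration; not part of DataLoader.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for text, label in zip(X, y):
            # Tabs or newlines inside a post would break the
            # one-example-per-line format, so collapse all whitespace
            # runs into single spaces.
            clean = " ".join(str(text).split())
            f.write(f"{label}\t{clean}\n")
            count += 1
    return count
```

Collapsing whitespace is the main design choice here: Reddit posts often span several lines, and an embedded newline would otherwise be read as a separate, unlabeled example.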
Then run python eda_nlp/code/augment.py --input=eda_nlp/data/reddit.txt. The augmented data is written to eda_nlp/data/eda_reddit.txt. Refer to the EDA repo for details about the file format.
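Wei and Zou describe four EDA operations: synonym replacement, random insertion, random swap, and random deletion. The first two need a synonym source such as WordNet, but the last two can be sketched in plain Python to show what kind of variation augment.py introduces. This is an illustrative re-implementation, not the repo's code; the function names and the alpha/num_aug parameters are assumptions loosely mirroring the paper's fraction of words changed and number of augmented sentences per original.

```python
import random
from typing import List, Optional


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Swap the words at two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word independently with probability p."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    # Never return an empty sentence: fall back to one random original word.
    return kept if kept else [rng.choice(words)]


def augment(sentence: str, alpha: float = 0.1, num_aug: int = 4,
            seed: Optional[int] = None) -> List[str]:
    """Produce num_aug variants, alternating between swap and deletion."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of swaps per variant
    variants = []
    for k in range(num_aug):
        if k % 2 == 0:
            variants.append(" ".join(random_swap(words, n, rng)))
        else:
            variants.append(" ".join(random_deletion(words, alpha, rng)))
    return variants
```

Passing a seed makes the output reproducible; swapped variants keep every original word, while deleted variants keep only a subset.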
To use the augmented data, load it back with:
X, y = dl.import_from_eda()
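If you want to inspect the augmented file outside DataLoader, a minimal reader only has to split each line at the first tab. The sketch assumes the 'label<TAB>text' layout described in the EDA repo; read_eda_file is a hypothetical helper, not the implementation of import_from_eda, and it returns labels as strings.

```python
from typing import List, Tuple


def read_eda_file(path: str) -> Tuple[List[str], List[str]]:
    """Parse 'label<TAB>text' lines into parallel lists (X, y).

    Hypothetical helper for illustration; labels are returned as strings.
    """
    X: List[str] = []
    y: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split only at the first tab so any later tabs stay in the text.
            label, _, text = line.partition("\t")
            X.append(text)
            y.append(label)
    return X, y
```

Convert y to integers (or map it back to subreddit names) as needed before training.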