j-n-t/natural_language_processing

Natural Language Processing

This repository illustrates how natural language processing techniques can be applied in a variety of scenarios. The projects follow a tutorial-like approach, discussing the implementation details in depth alongside the code.

Projects

  • POS Tagging, Syntactic Dependency Parsing and NER

    Part-of-speech tagging (POS), syntactic dependency parsing and named entity recognition (NER) with spaCy.

    This notebook renders better on nbviewer.

  • Training the NER pipeline component

    Update the Named Entity Recognition (NER) pipeline component using spaCy and INCEpTION.

    This notebook renders better on nbviewer.

  • Sentiment Analysis

    Sentiment analysis of 10 000 Amazon reviews with a rule-based algorithm (VADER) and a machine learning model.

  • Text Classification with Classical ML

    Text classification of movie reviews from the polarity dataset v2.0 using several approaches. A custom text normalization transformer and a custom gensim vectorization transformer are built for use in a scikit-learn pipeline, and several classifiers are tested.

  • Text Classification with Neural Networks

    Text classification of movie reviews from the Large Movie Review Dataset using artificial neural networks. Nine different architectures are built with Keras, and the performance of the resulting classifiers is evaluated and compared.

  • Topic Modeling

    Assigning over 400 000 Quora questions to different categories, or topics, with three methods: LDA, LSA and NMF. LDA is implemented twice, once with gensim and once with scikit-learn. Topics are visualized with pyLDAvis.

  • Question Answering

    Answering some simple questions with Keras.
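
The Sentiment Analysis project relies on VADER, a rule-based approach that scores text from a sentiment lexicon plus heuristics. The sketch below only illustrates that general idea and is not VADER itself. The tiny lexicon, the negation list, the three-token negation window and the names `compound` and `label` are all hypothetical. The squashing step x / sqrt(x² + α) and the ±0.05 thresholds are modelled on how VADER is commonly described, but treat them as assumptions. The real analyzer is available as `SentimentIntensityAnalyzer` in NLTK or the `vaderSentiment` package.

```python
import math
import re

# Hypothetical toy lexicon (word -> valence). VADER ships a much larger,
# human-rated lexicon and extra rules (intensifiers, capitals, "but", punctuation).
LEXICON = {"good": 1.9, "great": 3.1, "love": 3.2, "bad": -2.5, "terrible": -2.1, "broken": -1.8}
NEGATIONS = {"not", "no", "never", "isn't", "don't", "doesn't", "wasn't"}


def compound(text: str, alpha: float = 15.0) -> float:
    """Sum word valences, flipping a word's sign when a negation precedes it,
    then squash the total into the open interval (-1, 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        # Simplified negation rule (assumption): a negation among the previous 3 tokens flips the sign.
        if any(t in NEGATIONS for t in tokens[max(0, i - 3):i]):
            valence = -valence
        total += valence
    return total / math.sqrt(total * total + alpha)


def label(text: str, threshold: float = 0.05) -> str:
    """Map the squashed score to a positive / negative / neutral label."""
    score = compound(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


for review in ["I love it, great value", "Not good, and the screen is terrible", "It arrived on Tuesday"]:
    print(review, "->", label(review))
```

Because no training data is involved, a lexicon-based scorer like this works out of the box, which is why it makes a useful baseline against a trained machine learning model.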
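
The Text Classification with Classical ML project builds a custom text normalization transformer for a scikit-learn pipeline. As an illustration of the pattern only (not the project's own code), the sketch below follows scikit-learn's `fit`/`transform` convention without importing scikit-learn. The class name `TextNormalizer`, its parameters, and the short stopword list are hypothetical. When scikit-learn is installed, such a class would typically also subclass `sklearn.base.BaseEstimator` and `TransformerMixin`.

```python
import string

# Hypothetical minimal stopword list; a real project would use a fuller one.
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "the", "of", "is", "it", "this", "to"})


class TextNormalizer:
    """Hypothetical normalization step with scikit-learn's fit/transform interface.

    Lowercases each document, strips ASCII punctuation, drops stopwords and
    short tokens, and returns one token list per document.
    """

    def __init__(self, stopwords=None, min_length=2):
        # Parameters are stored unchanged, as scikit-learn's estimator convention expects.
        self.stopwords = stopwords
        self.min_length = min_length

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        stopwords = DEFAULT_STOPWORDS if self.stopwords is None else set(self.stopwords)
        table = str.maketrans("", "", string.punctuation)
        return [
            [tok for tok in doc.lower().translate(table).split()
             if tok not in stopwords and len(tok) >= self.min_length]
            for doc in X
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


normalizer = TextNormalizer()
print(normalizer.fit_transform(["This movie is GREAT!", "A dull, slow plot."]))
# [['movie', 'great'], ['dull', 'slow', 'plot']]
```

Returning token lists rather than joined strings is a design choice that suits a downstream step expecting tokenized input, such as a gensim-based vectorizer. A step built on scikit-learn's `CountVectorizer` would instead expect strings.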
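
Of the three methods in the Topic Modeling project, NMF is the easiest to show without dependencies. The sketch below factorizes a document-term count matrix V into non-negative factors W (documents × topics) and H (topics × terms) using Lee and Seung's multiplicative updates for the Frobenius norm. The toy questions, raw counts (instead of, for example, TF-IDF weights), the number of topics and all function names are assumptions for illustration. scikit-learn provides a production implementation as `sklearn.decomposition.NMF`.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH, the quantity the updates decrease."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (docs x terms) into W (docs x k) and H (k x terms)
    with Lee and Seung's multiplicative updates for the Frobenius norm."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


# Hypothetical toy "questions"; the project itself works on Quora questions.
docs = [
    "how do i learn python programming",
    "best way to learn programming in python",
    "how to lose weight fast",
    "best diet to lose weight",
]
vocab = sorted({w for d in docs for w in d.split()})
V = [[d.split().count(w) for w in vocab] for d in docs]  # raw term counts

W, H = nmf(V, k=2)
for t, weights in enumerate(H):
    top = sorted(range(len(vocab)), key=lambda j: weights[j], reverse=True)[:3]
    print(f"topic {t}:", [vocab[j] for j in top])
# Each question is assigned to the topic with the largest weight in its row of W.
print([max(range(2), key=lambda t: row[t]) for row in W])
```

Plain Python lists keep the example self-contained, but they scale poorly. A corpus of hundreds of thousands of questions calls for sparse matrices and a library implementation.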