Individual Assignment - Bioinformatics Course
This repository contains the code, data and report for a project conducted as part of a master's bioinformatics course. The project aimed to enhance protein secondary structure prediction using neural networks. The code is implemented in Python and executed in Jupyter Notebook.
The project utilized neural networks to predict protein secondary structures from amino acid sequences. It incorporated multiple sequence alignment data and employed a sliding-window approach to improve prediction accuracy. Various techniques, including dropout and ensemble modeling, were explored to optimize model performance and mitigate overfitting.
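
The sliding-window idea can be illustrated with a minimal NumPy sketch. It assumes each protein is represented as an `(L, 20)` profile matrix derived from the multiple sequence alignment, and that each residue is described by the profile rows in a window centred on it, with zero padding at the termini. The function name `sliding_windows`, the default window size of 17, and the toy profile are illustrative assumptions and are not taken from the notebooks.

```python
import numpy as np


def sliding_windows(profile, window=17):
    """Return one flattened window of profile rows per residue.

    profile : array-like of shape (L, 20), one row per residue (assumed layout).
    window  : odd window size; 17 is an assumed default, not the project's setting.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    profile = np.asarray(profile, dtype=float)
    half = window // 2
    # Zero-pad both ends so that terminal residues also get a full window.
    padded = np.pad(profile, ((half, half), (0, 0)), mode="constant")
    length = profile.shape[0]
    # Window i covers padded rows i .. i + window - 1, centred on residue i.
    windows = np.stack([padded[i:i + window] for i in range(length)])
    # Flatten each window into a single feature vector for a dense network.
    return windows.reshape(length, -1)


# Toy profile: 5 residues x 20 amino-acid columns (hypothetical values).
toy_profile = np.arange(5 * 20, dtype=float).reshape(5, 20)
features = sliding_windows(toy_profile, window=3)
print(features.shape)
```

Each row of `features` can then serve as the input vector for one residue, with that residue's secondary structure class as the target.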
- Jupyter Notebook: Contains the Python code for the project.
- Data: Includes datasets used for training, cross-validation, and testing the models. The data were sourced from Katarina Elez's GitHub repository at https://github.com/katarinaelez/protein-ss-pred.
- Results: Contains visualizations and metrics evaluating model performance.
- Documentation: Additional documentation and resources related to the project.
To run the code, install Python and the required libraries: NumPy, pandas, Matplotlib, seaborn, Keras, Keras Tuner, TensorFlow, scikit-learn, and Jupyter Notebook. Then clone the repository and run the notebooks in the 'Notebooks' directory.
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install keras
pip install keras-tuner
pip install tensorflow
pip install scikit-learn
pip install notebook
git clone https://github.com/kantonopoulos/ProteinSS-Prediction-Keras
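
After installation, a quick check can confirm that the libraries are importable before the notebooks are opened. The sketch below uses only the standard library. The `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and the import names (for example `sklearn` for scikit-learn and `keras_tuner` for keras-tuner) assume recent releases of those packages.

```python
from importlib.util import find_spec

# pip package name -> import name (assumed for recent package versions).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "keras": "keras",
    "keras-tuner": "keras_tuner",
    "tensorflow": "tensorflow",
    "scikit-learn": "sklearn",
    "notebook": "notebook",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    )


# An empty list means every listed package was found.
print(missing_packages())
```

Any name printed here can be installed with the corresponding `pip install` command.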
Ensemble Fully Connected Neural Network, blind test results:

| Metric | Class | Blind test |
|---|---|---|
| Sensitivity | Helix | 0.7415±0.0147 |
| Sensitivity | Sheet | 0.6436±0.0121 |
| Sensitivity | Coil | 0.8373±0.0137 |
| Specificity | Helix | 0.9209±0.0147 |
| Specificity | Sheet | 0.9451±0.0038 |
| Specificity | Coil | 0.7513±0.0134 |
| Accuracy | Helix | 0.8511±0.0013 |
| Accuracy | Sheet | 0.8724±0.0009 |
| Accuracy | Coil | 0.7831±0.0035 |
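
The table does not state how the per-class figures were computed. A common convention for three-class secondary structure prediction is one-vs-rest scoring, in which each class in turn is treated as the positive class. The sketch below assumes that convention. The labels `H`, `E`, and `C` for helix, sheet, and coil, the function name `one_vs_rest_metrics`, and the toy sequences are illustrative assumptions rather than the project's code or data.

```python
def one_vs_rest_metrics(true_labels, predicted_labels, classes=("H", "E", "C")):
    """Compute per-class sensitivity, specificity and accuracy (one-vs-rest)."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    nan = float("nan")
    metrics = {}
    for cls in classes:
        tp = fp = tn = fn = 0
        for true, pred in zip(true_labels, predicted_labels):
            if true == cls and pred == cls:
                tp += 1  # residue of this class, predicted correctly
            elif true != cls and pred == cls:
                fp += 1  # other class predicted as this class
            elif true == cls and pred != cls:
                fn += 1  # this class predicted as something else
            else:
                tn += 1  # neither the true nor the predicted class
        metrics[cls] = {
            "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
            "specificity": tn / (tn + fp) if (tn + fp) else nan,
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
        }
    return metrics


# Toy example: 8 residues with hypothetical true and predicted labels.
true_ss = "HHHEECCC"
pred_ss = "HHCEECCH"
scores = one_vs_rest_metrics(true_ss, pred_ss)
for cls, values in scores.items():
    print(cls, values)
```

Under this convention the accuracy of a class counts true negatives as correct, which is why a per-class accuracy can exceed that class's sensitivity.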