Zezima is a gene prediction tool that uses Transformer models to analyze DNA sequences.
It is designed to predict gene-related features within DNA sequences, supporting genomic analysis.
- Transformer Model: Uses a Transformer-based deep learning model for gene prediction.
- Custom DNA Sequence Handling: Processes DNA sequences supplied in the project's pre-processed vector format.
- Configurable Parameters: Model parameters can be adjusted in config.py.
Before beginning the setup, ensure your system meets the following requirements:
- Operating System: Linux
- Python Version: Python 3.10 or higher
A Python virtual environment is recommended for managing the project's dependencies. Follow these steps to create and activate it, then install the dependencies:
- Create a Virtual Environment (if python3.10 is not available, use your installed Python 3.10+ interpreter, e.g. python3.11):
  python3.10 -m venv venv
- Activate the Virtual Environment:
  source venv/bin/activate
- Install Required Python Packages:
  pip3 install -Ur requirements.txt
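Before running the application, it can help to confirm that the interpreter and environment match the requirements above. The following is a minimal, hypothetical helper (check_environment is not part of Zezima) that uses only the standard library; it relies on the fact that inside a virtual environment sys.prefix differs from sys.base_prefix.

```python
import sys


def check_environment(min_version=(3, 10)):
    """Return (version_ok, in_venv) for the interpreter running this code."""
    version_ok = sys.version_info >= min_version
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


if __name__ == "__main__":
    version_ok, in_venv = check_environment()
    print(f"Python {sys.version.split()[0]}:", "OK" if version_ok else "requires 3.10 or higher")
    print("Virtual environment active:", in_venv)
```

Run it with the interpreter from the activated environment (for example, save it under any name and run it with python3.10).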
Each input data file consists of a header section followed by data vectors, for example:
#HEADER#
#DATE=2024-02-25T12:02:29.238927
#pre_processing_version=[0, 1, 0]
#bp_vector_schema=['A', 'C', 'G', 'T', 'PROMOTOR_MOTIF', 'ORF', 'POLY_ADENYL', 'miRNA', 'rRNA', 'gene']
#description of nucleotide:A=[1, 0, 0, 0], C=[0, 1, 0, 0], G=[0, 0, 1, 0], T=[0, 0, 0, 1]
#description of feature:0=no_present, 1=start, 2=continuation/ongoing, 3=end
#max_feature_overlap=0
####END####
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0] [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
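A file in this layout can be read with a few lines of Python. The sketch below is a hypothetical reader, not Zezima's own loader. It assumes that header lines start with "#", that the header ends at the "####END####" line, and that vectors are whitespace-separated bracketed lists, possibly several per line as in the example. Header values are kept as raw strings.

```python
import ast
import re

# One bracketed vector, e.g. "[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]".
VECTOR_RE = re.compile(r"\[[^\[\]]*\]")


def _store_header_line(body, header):
    """Store one header entry; the key ends at the first ':' or '='."""
    cuts = [i for i in (body.find(":"), body.find("=")) if i != -1]
    if not cuts:
        return  # marker lines such as "#HEADER#" carry no key/value
    cut = min(cuts)
    header[body[:cut]] = body[cut + 1:]


def parse_input(text):
    """Split input-file text into (header dict, list of vectors)."""
    header, vectors = {}, []
    in_header = True
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if in_header:
            if line == "####END####":
                in_header = False
            elif line.startswith("#"):
                _store_header_line(line[1:], header)
            continue
        # literal_eval safely turns "[0, 1, ...]" into a Python list.
        vectors.extend(ast.literal_eval(m) for m in VECTOR_RE.findall(line))
    return header, vectors
```

For example, ast.literal_eval(header["bp_vector_schema"]) turns the schema string back into the list of column names.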
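Each vector describes one position in the DNA sequence, with its entries ordered as defined by bp_vector_schema: the first four entries one-hot encode the nucleotide (A, C, G, or T), and each remaining entry gives the status of one genetic feature at that position (0 = not present, 1 = start, 2 = continuation/ongoing, 3 = end).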
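To make the encoding concrete, here is a minimal sketch that decodes a single vector against the schema. decode_vector is a hypothetical helper, not part of Zezima; it assumes, as in the example header, that the first four schema columns are the one-hot nucleotides and every later column holds a feature status code.

```python
# Status codes as given in the "description of feature" header line.
FEATURE_STATUS = {0: "no_present", 1: "start", 2: "continuation/ongoing", 3: "end"}


def decode_vector(vector, schema):
    """Decode one position vector into (nucleotide, {feature: status})."""
    if len(vector) != len(schema):
        raise ValueError(f"expected {len(schema)} values, got {len(vector)}")
    one_hot = list(vector[:4])
    if sorted(one_hot) != [0, 0, 0, 1]:
        raise ValueError(f"nucleotide columns are not one-hot: {one_hot}")
    nucleotide = schema[one_hot.index(1)]
    # Keep only features that are present (non-zero) at this position.
    features = {
        name: FEATURE_STATUS[code]
        for name, code in zip(schema[4:], vector[4:])
        if code != 0
    }
    return nucleotide, features
```

Applied to the second vector of the example, [0, 1, 0, 0, 0, 1, 0, 0, 0, 0], this returns ("C", {"ORF": "start"}): a cytosine at which an ORF begins.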
For more detailed information about the input data structure and the pre-processing steps, please refer to the pre-processing documentation.
- Prepare your DNA sequence data files in the format described above and place them in the designated input directory. If you don't have your own pipeline, you can use the pre-processing stage as-is or adapt it to your needs.
- Configure the model and execution parameters in the config.py file.
- Run the application:
  python3.10 run.py
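For a quick experiment without the full pre-processing stage, a minimal sketch like the following could render a raw DNA string into the format described above. It is hypothetical and not part of Zezima: encode_sequence, its features argument (feature name mapped to 0-based inclusive (start, end) intervals with start < end), and writing all vectors on one space-separated line (mirroring the example) are assumptions. Whether run.py accepts files produced this way is not confirmed here.

```python
from datetime import datetime

SCHEMA = ["A", "C", "G", "T", "PROMOTOR_MOTIF", "ORF", "POLY_ADENYL", "miRNA", "rRNA", "gene"]
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
START, ONGOING, END = 1, 2, 3  # feature status codes from the header


def encode_sequence(sequence, features=None, version=(0, 1, 0)):
    """Return input-file text for `sequence` (hypothetical helper)."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(ONE_HOT)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    # One row per base: one-hot nucleotide followed by zeroed feature columns.
    rows = [ONE_HOT[base] + [0] * (len(SCHEMA) - 4) for base in sequence]
    for name, intervals in (features or {}).items():
        col = SCHEMA.index(name)
        for start, end in intervals:
            if not 0 <= start < end < len(rows):
                raise ValueError(f"bad interval {(start, end)} for {name}")
            rows[start][col] = START
            for i in range(start + 1, end):
                rows[i][col] = ONGOING
            rows[end][col] = END
    header = [
        "#HEADER#",
        f"#DATE={datetime.now().isoformat()}",
        f"#pre_processing_version={list(version)}",
        f"#bp_vector_schema={SCHEMA}",
        "#description of nucleotide:" + ", ".join(f"{b}={v}" for b, v in ONE_HOT.items()),
        "#description of feature:0=no_present, 1=start, 2=continuation/ongoing, 3=end",
        "#max_feature_overlap=0",
        "####END####",
    ]
    body = " ".join(str(row) for row in rows)
    return "\n".join(header + [body]) + "\n"
```

For an unannotated sequence (no features argument), every feature column is 0, which is what a prediction-only input would plausibly look like.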
Zezima is open-source software licensed under the MIT License.
For more details, see the LICENSE file.