locimend
locimend is a tool that corrects DNA sequencing errors using Deep Learning.
Given a DNA sequence that contains sequencing errors, it aims to output the corrected sequence.
It provides both a command-line program and a REST API.
Technologies
- Tensorflow
- Biopython
- FastAPI
Installation
This project uses Nix to ensure reproducible builds.
- Install Nix (compatible with macOS, Linux and WSL):
curl -L https://nixos.org/nix/install | sh
- Clone the repository:
git clone https://git.coolneng.duckdns.org/coolneng/locimend
- Change the working directory to the project:
cd locimend
- Enter the nix-shell:
nix-shell
- Install the dependencies via poetry:
poetry install
After running these commands, you will find yourself in a shell that contains all the needed dependencies.
Usage
Training the model
The following command trains the Deep Learning model and reports its accuracy and AUC:
poetry run python locimend/main.py train <data file> <label file>
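- <data file>: FASTQ file containing the sequences with errors
- <label file>: FASTQ file containing the canonical (error-free) sequences
Both files must list corresponding sequences in the same order: the read-simulated sequence in a given row of the data file must match the canonical sequence in the same row of the label file.
A sample dataset is included to train the model. To train on it, run the following command: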
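As a rough illustration of this row-by-row correspondence, the sketch below reads both FASTQ files and pairs each erroneous read with its canonical sequence. It is not locimend's own loader (the project lists Biopython among its dependencies); the function names `read_fastq_sequences` and `pair_sequences` are hypothetical, and the parser assumes plain four-line FASTQ records without line wrapping.

```python
def read_fastq_sequences(path):
    """Return the sequence line of every record in a four-line FASTQ file."""
    with open(path) as handle:
        # Drop blank lines (e.g. a trailing newline at the end of the file).
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError(f"{path}: truncated FASTQ record")
    sequences = []
    for i in range(0, len(lines), 4):
        if not lines[i].startswith("@"):
            raise ValueError(f"{path}: record {i // 4} lacks an '@' header")
        sequences.append(lines[i + 1])  # line 2 of each record is the sequence
    return sequences


def pair_sequences(data_file, label_file):
    """Pair each erroneous read with the canonical sequence in the same row."""
    reads = read_fastq_sequences(data_file)
    labels = read_fastq_sequences(label_file)
    if len(reads) != len(labels):
        raise ValueError(
            f"record count mismatch: {len(reads)} reads vs {len(labels)} labels"
        )
    # zip() relies on both files listing records in the same order.
    return list(zip(reads, labels))
```

For example, `pair_sequences("data/curesim-HVR.fastq", "data/HVR.fastq")` would return a list of `(read, canonical)` tuples, assuming the bundled files follow the four-line layout. A count mismatch is treated as an error because, with positional pairing, a single missing record would misalign every row after it.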
poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq
Inference
A pre-trained model is included and can be used to infer the corrected sequences. There are two ways to interact with it:
- Command-line execution
- REST API
Command-line
The following command infers the corrected sequence and prints it:
poetry run python locimend/main.py infer "<DNA sequence>"
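To correct several sequences from a script, one option is to call the documented `infer` command through `subprocess`. This is a minimal sketch, assuming it is run from the repository root inside the nix-shell; the helper names `build_infer_command` and `infer_many` are hypothetical, and the printed output is collected as-is because its exact format is not specified here.

```python
import subprocess


def build_infer_command(sequence):
    """Build the argument list for locimend's `infer` command."""
    return ["poetry", "run", "python", "locimend/main.py", "infer", sequence]


def infer_many(sequences, cwd="."):
    """Run the CLI once per sequence and collect what each run prints."""
    outputs = []
    for sequence in sequences:
        result = subprocess.run(
            build_infer_command(sequence),
            cwd=cwd,  # assumed to be the repository root
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        outputs.append(result.stdout.strip())
    return outputs
```

Passing the sequence as a list element avoids shell quoting entirely. Note that every call starts a fresh Python process, so for many sequences the REST API may be the more convenient route.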
REST API
It is also possible to serve the model via a REST API. To start the web server, run the following command:
poetry run api
The API can be accessed at http://localhost:8000, with either a GET or POST request:
Request | Endpoint | Payload |
---|---|---|
GET | / | Sequence as a path parameter (in the URL) |
POST | / | JSON |
For a POST request, the JSON body must have the following structure:
{"sequence": "<DNA sequence>"}
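A minimal client sketch using only the Python standard library is shown below. It assumes the server is running at the default address, that the GET form places the sequence directly after the root path (`http://localhost:8000/<sequence>`), and that responses are JSON (FastAPI's default when an endpoint returns a dict); the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # default address from this README


def get_request(sequence, base_url=BASE_URL):
    """GET with the sequence as a path parameter (assumed URL: /<sequence>)."""
    return Request(f"{base_url}/{quote(sequence)}", method="GET")


def post_request(sequence, base_url=BASE_URL):
    """POST a JSON body of the form {"sequence": "<DNA sequence>"}."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return Request(
        f"{base_url}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request and decode the reply, assuming a JSON response."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `poetry run api` running, `send(post_request("<DNA sequence>"))` would return the decoded reply; the shape of that reply depends on locimend's endpoint and is not assumed here.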