
locimend

locimend is a tool that corrects DNA sequencing errors using Deep Learning.

Given a DNA sequence that contains sequencing errors, the goal is to return the correct sequence.

It provides both a command-line program and a REST API.

Technologies

  • TensorFlow
  • Biopython
  • FastAPI

Installation

This project uses Nix to ensure reproducible builds.

  1. Install Nix (compatible with macOS, Linux and WSL):
     curl -L https://nixos.org/nix/install | sh
  2. Clone the repository:
     git clone https://git.coolneng.duckdns.org/coolneng/locimend
  3. Change the working directory to the project:
     cd locimend
  4. Enter the nix-shell:
     nix-shell
  5. Install the dependencies via poetry:
     poetry install

After running these commands, you will find yourself in a shell that contains all the needed dependencies.

Usage

Training the model

The following command creates and trains the Deep Learning model, then reports its accuracy and AUC:

poetry run python locimend/main.py train <data file> <label file>
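  • <data file>: FASTQ file containing the sequences with errors
  • <label file>: FASTQ file containing the corresponding canonical (error-free) sequences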

Both files must list their sequences in the same order: each read-simulated sequence in the data file must occupy the same position (record) as its canonical sequence in the label file.
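
To illustrate this positional pairing, the sketch below reads both FASTQ files record by record and pairs the i-th read with the i-th canonical sequence, failing if the record counts differ. It is a hypothetical standalone helper, not locimend's own data loader, and it assumes unwrapped FASTQ where every record spans exactly four lines; Biopython's `SeqIO.parse(handle, "fastq")` handles the general case.

```python
from __future__ import annotations

from itertools import zip_longest
from typing import Iterator, TextIO


def read_fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence of each FASTQ record (header, sequence, '+', quality).

    Assumes unwrapped FASTQ: every record spans exactly four lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        separator = handle.readline()
        handle.readline()  # quality line, not needed for pairing
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record: {header.strip()!r}")
        yield sequence


def pair_records(data_handle: TextIO, label_handle: TextIO) -> list[tuple[str, str]]:
    """Pair the i-th read (data file) with the i-th canonical sequence (label file)."""
    missing = object()  # sentinel marking that one file ran out of records
    pairs = []
    for read, label in zip_longest(
        read_fastq_sequences(data_handle),
        read_fastq_sequences(label_handle),
        fillvalue=missing,
    ):
        if read is missing or label is missing:
            raise ValueError("Data and label files have a different number of records")
        pairs.append((read, label))
    return pairs
```

For the bundled dataset, `pair_records` would be called with handles opened on `data/curesim-HVR.fastq` and `data/HVR.fastq`.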

A dataset for training the model is included; to train on it, run the following command:

poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq

Inference

A trained model is provided, which can be used to infer the correct sequences. There are two ways to interact with it:

  • Command-line execution
  • REST API

Command-line

The following command infers the corrected sequence and prints it:

poetry run python locimend/main.py infer "<DNA sequence>"
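
To call the command-line interface from another Python program (for example, to correct a batch of reads), one option is to wrap the documented `infer` command with `subprocess`. This is a minimal sketch: `build_infer_command` and `infer` are hypothetical helpers, not part of locimend, and the sketch assumes it runs from the repository root inside the nix-shell and that the corrected sequence is the only thing printed to standard output.

```python
import subprocess
from typing import List


def build_infer_command(sequence: str) -> List[str]:
    """Argument list equivalent to: poetry run python locimend/main.py infer "<sequence>"."""
    # Passing a list (no shell) means the sequence needs no quoting or escaping.
    return ["poetry", "run", "python", "locimend/main.py", "infer", sequence]


def infer(sequence: str, repo_dir: str = ".") -> str:
    """Run the CLI from the repository root and return its standard output."""
    result = subprocess.run(
        build_infer_command(sequence),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    # Assumption: the corrected sequence is the only thing printed to stdout.
    return result.stdout.strip()
```

Each call starts a new Python process (and presumably loads the model again), so for many sequences the REST API is likely the cheaper option.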

REST API

The model can also be served via a REST API. To start the web server, run the following command:

poetry run api

The API is available at http://localhost:8000 and accepts either a GET or a POST request:

Request   Endpoint   Payload
GET       /          Sequence as a path parameter (in the URL)
POST      /          JSON

For a POST request, the JSON body must have the following structure:

{"sequence": "<DNA sequence>"}
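
A minimal client sketch for both request types, using only the Python standard library. It assumes the server started with `poetry run api` is listening on http://localhost:8000 and that the GET path parameter is the sequence appended directly after `/`; the function names are hypothetical, and `send` returns the raw response body because the response format is not described here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # address of the server started with `poetry run api`


def build_get_request(sequence: str) -> urllib.request.Request:
    """GET with the sequence as a path parameter, e.g. http://localhost:8000/ACGT (assumed URL shape)."""
    url = f"{BASE_URL}/{urllib.parse.quote(sequence)}"
    return urllib.request.Request(url, method="GET")


def build_post_request(sequence: str) -> urllib.request.Request:
    """POST to / with the JSON body {"sequence": "<DNA sequence>"}."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_post_request("<DNA sequence>"))` returns whatever the API responds with for that sequence.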