locigenesis
locigenesis is a tool that generates a human T-cell receptor (TCR), runs it through a sequencing read simulator and extracts the CDR3 region.
The goal of this project is to generate hypervariable region (HVR) sequences both with and without sequencing errors, in order to create datasets for a machine learning algorithm.
Technologies
- immuneSIM: in silico generation of human and mouse BCR and TCR repertoires
- CuReSim: read simulator that mimics Ion Torrent sequencing
Installation
This project uses Nix to ensure reproducible builds.
- Install Nix (compatible with macOS, Linux and WSL):
curl -L https://nixos.org/nix/install | sh
- Clone the repository:
git clone https://git.coolneng.duckdns.org/coolneng/locigenesis
- Change the working directory to the project:
cd locigenesis
- Enter the nix-shell:
nix-shell
After running these commands, you will find yourself in a shell that contains all the needed dependencies.
Usage
The provided script, generation.sh, accepts two parameters. Invoke it as follows:
./generation.sh <number of sequences> <number of reads>
- <number of sequences>: an integer that specifies the number of different sequences to generate
- <number of reads>: an integer that specifies the number of reads to perform on each sequence
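If the script is driven from Python (for example, to produce several datasets in a loop), the invocation can be assembled as in the following minimal sketch. The helper name `build_generation_command` is hypothetical, and the requirement that both values be positive is an assumption; the README only states that they are integers.

```python
from typing import List


def build_generation_command(
    num_sequences: int,
    num_reads: int,
    script: str = "./generation.sh",
) -> List[str]:
    """Return the argument list for generation.sh.

    Assumption: both parameters must be positive integers. The README
    only says they are integers, so this check is a precaution.
    """
    for name, value in (("num_sequences", num_sequences), ("num_reads", num_reads)):
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return [script, str(num_sequences), str(num_reads)]
```

From the project directory, inside the nix-shell, the result can be passed to `subprocess.run(build_generation_command(100, 5), check=True)`.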
The script will generate 2 files under the data directory:
HVR.fastq | curesim-HVR.fastq |
---|---|
Contains the original CDR3 sequence | Contains CDR3 after the read simulation, with sequencing errors |
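To feed these files to a machine learning pipeline, the sequences first have to be read from the FASTQ output. The sketch below is one possible way to do that with the Python standard library only. It assumes the common 4-line FASTQ layout (header, sequence, `+` separator, quality) with no line wrapping; the names `FastqRecord`, `read_fastq` and `load_sequences` are illustrative and are not part of this project.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # header line without the leading '@'
    sequence: str    # nucleotide sequence
    quality: str     # per-base quality string


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text, skipping blank lines."""
    # Strip newlines and drop empty lines (e.g. a trailing blank line)
    cleaned = (line.rstrip("\r\n") for line in lines if line.strip())
    while True:
        header = next(cleaned, None)
        if header is None:
            return  # end of input
        sequence = next(cleaned, None)
        separator = next(cleaned, None)
        quality = next(cleaned, None)
        if (
            quality is None
            or not header.startswith("@")
            or not separator.startswith("+")
        ):
            raise ValueError(f"Malformed FASTQ record starting at: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def load_sequences(path: str) -> List[str]:
    """Return only the sequences stored in a FASTQ file."""
    with open(path) as handle:
        return [record.sequence for record in read_fastq(handle)]
```

For example, `load_sequences("data/HVR.fastq")` and `load_sequences("data/curesim-HVR.fastq")` would return the original and the simulated sequences, respectively. How records in the two files correspond to each other depends on the read identifiers written by the simulator, which this sketch does not assume.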