coolneng 8ffa86a965
Elaborate on the project description in the README
2021-05-04 02:01:10 +02:00

locigenesis

locigenesis is a tool that generates a human T-cell receptor (TCR) repertoire, runs it through a sequencing read simulator and extracts the CDR3 (complementarity-determining region 3) sequences.

The goal of this project is to generate hypervariable region (HVR) sequences both with and without sequencing errors, in order to create datasets for a machine learning algorithm.

Technologies

  • immuneSIM: in silico generation of human and mouse BCR and TCR repertoires
  • CuReSim: read simulator that mimics Ion Torrent sequencing

Installation

This project uses Nix to ensure reproducible builds.

  1. Install Nix (compatible with macOS, Linux and WSL):
       curl -L https://nixos.org/nix/install | sh
  2. Clone the repository:
       git clone https://git.coolneng.duckdns.org/coolneng/locigenesis
  3. Change the working directory to the project:
       cd locigenesis
  4. Enter the nix-shell:
       nix-shell

After running these commands, you will be in a shell that provides all the required dependencies.

Usage

An execution script, generation.sh, is provided. It accepts two parameters and is invoked with the following command:

./generation.sh <number of sequences> <number of reads>
  • <number of sequences>: an integer that specifies the number of different sequences to generate
  • <number of reads>: an integer that specifies the number of reads to simulate for each sequence
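Before launching a long run, it can help to check the arguments first. The following is a minimal POSIX shell sketch and is not part of the repository: the helper names is_positive_int and validate_args are hypothetical, and the sketch assumes both parameters must be positive integers written without leading zeros. The repository's generation.sh may or may not perform its own checks.

```shell
#!/bin/sh
# Hypothetical helpers (not part of locigenesis): validate the two
# arguments expected by ./generation.sh before running it.

# Succeed only for positive integers without leading zeros (e.g. 1, 250).
is_positive_int() {
  case "$1" in
    '' | *[!0-9]* | 0*) return 1 ;;  # empty, non-digit, zero or leading zero
    *) return 0 ;;
  esac
}

# Expect exactly two positive integers:
# <number of sequences> <number of reads>
validate_args() {
  if [ "$#" -ne 2 ]; then
    echo "usage: ./generation.sh <number of sequences> <number of reads>" >&2
    return 1
  fi
  for arg in "$@"; do
    if ! is_positive_int "$arg"; then
      echo "error: '$arg' is not a positive integer" >&2
      return 1
    fi
  done
  return 0
}
```

A wrapper could then run validate_args "$@" && ./generation.sh "$@", so that invalid input is rejected before any sequences are generated.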

The script generates two files in the data directory:

  • HVR.fastq: the original CDR3 sequences
  • CuReSim-HVR.fastq: the CDR3 sequences after read simulation, including sequencing errors
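Both outputs are FASTQ files. The helpers below are a hypothetical sketch for inspecting them; the function names are illustrative and not part of the repository, and the code assumes standard FASTQ records of exactly four lines (header, sequence, + separator, quality string) with no line wrapping.

```shell
#!/bin/sh
# Hypothetical helpers (not part of locigenesis) for inspecting the FASTQ
# files in the data directory. Assumes every record spans exactly 4 lines:
# @header, nucleotide sequence, '+' separator, quality string.

# Print the number of records in a FASTQ file (total lines / 4).
count_fastq_records() {
  awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Print only the nucleotide sequence of each record (line 2 of every 4).
fastq_sequences() {
  awk 'NR % 4 == 2' "$1"
}
```

For example, count_fastq_records data/HVR.fastq prints the number of records, and fastq_sequences data/CuReSim-HVR.fastq | head lists the first simulated reads (paths assumed from the data directory described above). Counting lines instead of @ headers avoids miscounting, since a quality string may itself begin with @.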