

locigenesis is a tool that generates human T-cell receptor (TCR) sequences, runs them through a sequencing read simulator, and extracts the CDR3 (complementarity-determining region 3).

The goal of this project is to generate HVR (hypervariable region) sequences both with and without sequencing errors, in order to create datasets for a machine learning algorithm.
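As a rough illustration of the extraction step, the sketch below locates CDR3 in a simulated read by shifting a known CDR3 start, taken here to be the conserved cysteine (Cys) codon, by the offset at which the read begins in the full sequence. It is a hypothetical sketch, not the project's implementation: the `Cdr3Location` record, its fields, and the assumption that the read has no insertions or deletions upstream of CDR3 are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cdr3Location:
    """Hypothetical annotation of one generated TCR sequence.

    cys_start: 0-based position of the conserved Cys codon in the full sequence
    length: CDR3 length in nucleotides, counted from the Cys codon
    """
    cys_start: int
    length: int


def extract_cdr3(read: str, read_start: int, loc: Cdr3Location) -> Optional[str]:
    """Return the CDR3 slice of `read`, or None if the read does not cover it.

    `read_start` is the 0-based position in the full sequence where the read
    begins. Assumes no indels before the CDR3, so read coordinates are the
    full-sequence coordinates shifted by `read_start`.
    """
    start = loc.cys_start - read_start  # move the Cys position into read coordinates
    end = start + loc.length
    if start < 0 or end > len(read):
        return None  # CDR3 is only partially covered by this read
    return read[start:end]


# Usage: a read that starts 3 bases into the full sequence.
full = "GGGAAA" + "TGTGCCAGC" + "TTTGGG"
loc = Cdr3Location(cys_start=6, length=9)
print(extract_cdr3(full[3:], read_start=3, loc=loc))  # TGTGCCAGC
```

Returning `None` for partially covered reads keeps truncated CDR3s out of a training set; a real pipeline would also need to account for indels introduced by the simulator.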


locigenesis relies on the following tools:

  • immuneSIM: in silico generation of human and mouse BCR and TCR repertoires
  • CuReSim: read simulator that mimics Ion Torrent sequencing
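To illustrate what a read simulator does, here is a toy error model that applies random substitutions, deletions, and insertions to a sequence. It is not CuReSim's algorithm or parameterisation; the `simulate_read` function and its per-base rates are hypothetical. Ion Torrent reads are dominated by insertion and deletion errors, so the toy model treats indels separately from substitutions.

```python
import random

BASES = "ACGT"


def simulate_read(seq: str, sub_rate: float, ins_rate: float, del_rate: float,
                  rng: random.Random) -> str:
    """Toy read simulator with independent per-base error events.

    For each base, a deletion happens with probability del_rate and a
    substitution with probability sub_rate (mutually exclusive, so
    sub_rate + del_rate must not exceed 1). If the base was kept, a random
    base is inserted after it with probability ins_rate.
    """
    out = []
    for base in seq:
        r = rng.random()
        if r < del_rate:
            continue  # deletion: the base never reaches the read
        if r < del_rate + sub_rate:
            # substitution: pick any base different from the true one
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < ins_rate:
            out.append(rng.choice(BASES))  # insertion of a spurious base
    return "".join(out)


rng = random.Random(42)  # fixed seed for reproducible toy reads
clean = "TGTGCCAGCAGTTTAGG"
reads = [simulate_read(clean, sub_rate=0.01, ins_rate=0.02, del_rate=0.02, rng=rng)
         for _ in range(3)]
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when the noisy reads feed a machine learning dataset.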


This project uses Nix to ensure reproducible builds.

  1. Install Nix (compatible with macOS, Linux and WSL):
curl -L | sh
  2. Clone the repository:
git clone
  3. Change the working directory to the project:
cd locigenesis
  4. Enter the nix-shell:

After running these commands, you will find yourself in a shell that contains all the needed dependencies.


An execution script that takes two parameters is provided. Invoke it as follows:

./ <number of sequences> <number of reads>
  • <number of sequences>: an integer that specifies the number of different sequences to generate
  • <number of reads>: an integer that specifies the number of reads to perform on each sequence
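The two parameters amount to a small command-line interface. The Python sketch below only mirrors that interface for illustration and is a hypothetical stand-in, not the project's script: the `parse_args` and `positive_int` helpers are introduced here, and rejecting zero or negative values is an assumption, since the parameters are only described as integers.

```python
import argparse
from typing import Optional, Sequence


def positive_int(text: str) -> int:
    """argparse type: accept only integers greater than zero (assumed constraint)."""
    value = int(text)  # a ValueError here is reported by argparse as a usage error
    if value <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {text!r}")
    return value


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse <number of sequences> <number of reads> from the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical locigenesis-style CLI")
    parser.add_argument("number_of_sequences", type=positive_int,
                        help="number of different sequences to generate")
    parser.add_argument("number_of_reads", type=positive_int,
                        help="number of reads to perform on each sequence")
    return parser.parse_args(argv)


args = parse_args(["100", "5"])  # e.g. 100 sequences, 5 reads each
print(args.number_of_sequences, args.number_of_reads)  # 100 5
```

On invalid input, argparse prints a usage message and exits with status 2 instead of raising a traceback.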

The script generates two files in the data directory:

  • HVR.fastq: the original CDR3 sequences
  • curesim-HVR.fastq: the CDR3 sequences after the read simulation, including sequencing errors
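Both outputs are FASTQ files, so a dataset for the machine learning step can be assembled by reading their sequence lines. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines) and simply labels error-free CDR3s 0 and simulated reads 1; the `read_fastq` and `build_dataset` helpers and this labelling scheme are illustrative assumptions, not part of the project.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def build_dataset(clean_path: str, noisy_path: str) -> List[Tuple[str, int]]:
    """Return (sequence, label) pairs: 0 = error-free CDR3, 1 = simulated read."""
    dataset: List[Tuple[str, int]] = []
    for path, label in ((clean_path, 0), (noisy_path, 1)):
        with open(path) as handle:
            dataset.extend((seq, label) for _, seq, _ in read_fastq(handle))
    return dataset
```

For example, `build_dataset("data/HVR.fastq", "data/curesim-HVR.fastq")` would return every sequence from both files with its label. Pairing each noisy read with its specific source sequence would additionally require knowing how read identifiers map back to the originals.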