# locigenesis

locigenesis is a tool that generates a human T-cell receptor (TCR), runs it through a sequence reader simulation tool and extracts CDR3.

The goal of this project is to generate HVR sequences both with and without sequencing errors, in order to create datasets for a Machine Learning algorithm.

## Technologies

- [immuneSIM](https://github.com/GreiffLab/immuneSIM/): in silico generation of human and mouse BCR and TCR repertoires
- [CuReSim](http://www.pegase-biosciences.com/curesim-a-customized-read-simulator/): read simulator that mimics Ion Torrent sequencing

## Installation

This project uses [Nix](https://nixos.org/) to ensure reproducible builds.

1. Install Nix (compatible with macOS, Linux and [WSL](https://docs.microsoft.com/en-us/windows/wsl/about)):

   ```bash
   curl -L https://nixos.org/nix/install | sh
   ```

2. Clone the repository:

   ```bash
   git clone https://git.coolneng.duckdns.org/coolneng/locigenesis
   ```

3. Change the working directory to the project:

   ```bash
   cd locigenesis
   ```

4. Enter the nix-shell:

   ```bash
   nix-shell
   ```

After running these commands, you will be in a shell that contains all the needed dependencies.

## Usage

An execution script that accepts 2 parameters is provided. The following command invokes it:

```bash
./generation.sh
```

The 2 parameters are:

- an integer that specifies the number of different sequences to generate
- an integer that specifies the number of reads to perform on each sequence

The script will generate 2 files under the data directory:

| File              | Contents                                                        |
|-------------------|-----------------------------------------------------------------|
| HVR.fastq         | Contains the original CDR3 sequence                             |
| CuReSim-HVR.fastq | Contains CDR3 after the read simulation, with sequencing errors |
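
As a quick sanity check on the two output files, it can help to count the records each one holds. The following is a minimal sketch, not part of the project: `count_fastq_records` is a hypothetical helper, it assumes the common four-line FASTQ layout (header, sequence, `+` separator, quality string), and it assumes the files are reachable at the relative path `data/`.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not shipped with locigenesis): count the records in a
# FASTQ file, assuming every record spans exactly four lines.
count_fastq_records() {
    local file="$1"
    local lines
    lines=$(wc -l < "$file")
    echo $(( lines / 4 ))
}

# Assumption: the output files live under ./data relative to the current directory.
for f in data/HVR.fastq data/CuReSim-HVR.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fastq_records "$f") records"
    fi
done
```

Files that do not exist are skipped silently, so the snippet can be run before or after `generation.sh` without failing.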