* locigenesis

locigenesis is a tool that generates a human T-cell receptor (TCR), runs it through a sequence reader simulation tool and extracts CDR3.

The goal of this project is to generate HVR sequences both with and without sequencing errors, in order to create datasets for a Machine Learning algorithm.

** Technologies

- [[https://github.com/GreiffLab/immuneSIM/][immuneSIM]]: in silico generation of human and mouse BCR and TCR repertoires
- [[http://www.pegase-biosciences.com/curesim-a-customized-read-simulator/][CuReSim]]: read simulator that mimics Ion Torrent sequencing

** Installation

This project uses [[https://nixos.org/][Nix]] to ensure reproducible builds.

1. Install Nix (compatible with macOS, Linux and [[https://docs.microsoft.com/en-us/windows/wsl/about][WSL]]):

#+begin_src shell
curl -L https://nixos.org/nix/install | sh
#+end_src

2. Clone the repository:

#+begin_src shell
git clone https://git.coolneng.duckdns.org/coolneng/locigenesis
#+end_src

3. Change the working directory to the project:

#+begin_src shell
cd locigenesis
#+end_src

4. Enter the nix-shell:

#+begin_src shell
nix-shell
#+end_src

After running these commands, you will be in a shell that contains all the needed dependencies.

** Usage

An execution script that accepts 2 parameters is provided. The following command invokes it:

#+begin_src shell
./generation.sh
#+end_src

The parameters are:

- an integer that specifies the number of different sequences to generate
- an integer that specifies the number of reads to perform on each sequence

The script will generate 2 files under the data directory:

| HVR.fastq         | Contains the original CDR3 sequence                             |
| CuReSim-HVR.fastq | Contains CDR3 after the read simulation, with sequencing errors |
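
As a quick sanity check after running the script, the two output files can be inspected from the shell. The following is a minimal sketch, not part of the project: =count_fastq_records= is a hypothetical helper, the relative =data/= path is an assumption based on the description above, and the count assumes the standard FASTQ layout of 4 lines per record.

#+begin_src shell
# Hypothetical helper: count records in a FASTQ file,
# assuming the standard 4-lines-per-record layout.
count_fastq_records() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# File names come from the table above; the data/ prefix is an assumption.
for file in data/HVR.fastq data/CuReSim-HVR.fastq; do
    if [ -f "$file" ]; then
        echo "$file: $(count_fastq_records "$file") records"
    else
        echo "$file: not found (run ./generation.sh first)"
    fi
done
#+end_src

If either file is reported as not found, the generation script has probably not been run yet, or it was run from a different working directory.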