#+TITLE: locigenesis #+AUTHOR: Amin Kasrou Aouam #+DATE: 2021-03-10 * Sequence alignment Our generated sequences contain the full VJ region, but we are only interested in the CDR3 (Complementarity-determining region). We will proceed by delimiting CDR3, using the known sequences of V and J. #+begin_src R :results value silent v_segments <- readRDS("data/v_segments.rds") j_segments <- readRDS("data/j_segments_phe.rds") #+end_src #+begin_src R print(v_segments) print(j_segments) #+end_src #+RESULTS: #+begin_example A DNAStringSet instance of length 147 width seq names [1] 326 GATACTGGAATTACCCAGACAC...ATCTCTGCACCAGCAGCCAAGA TRBV1*01_P [2] 326 GATGCTGAAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-1*01_F [3] 326 GATGCTGAAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-1*02_F [4] 326 GATGCTGGAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-2*01_F [5] 326 GATGCTGGAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-2*02_F ... ... ... [143] 324 GATACTGGAGTCTCCCAGAACC...GTATCTCTGTGCCAGCACGTTG TRBV7-9*06_(F) [144] 323 .........................TGTATCTCTGTGCCAGCAGCAG TRBV7-9*07_(F) [145] 325 GATTCTGGAGTCACACAAACCC...TATTTCTGTGCCAGCAGCGTAG TRBV9*01_F [146] 325 GATTCTGGAGTCACACAAACCC...TATTTCTGTGCCAGCAGCGTAG TRBV9*02_F [147] 321 GATTCTGGAGTCACACAAACCC...TTTGTATTTCTGTGCCAGCAGC TRBV9*03_(F) A DNAStringSet instance of length 16 width seq names [1] 32 TGGGCGTCTGGGCGGAGGACTCCTGGTTCTGG TRBJ2-2P*01_ORF [2] 31 TTTGGAGAGGGAAGTTGGCTCACTGTTGTAG TRBJ1-3*01_F [3] 31 TTTGGTGATGGGACTCGACTCTCCATCCTAG TRBJ1-5*01_F [4] 31 TTTGGCAGTGGAACCCAGCTCTCTGTCTTGG TRBJ1-4*01_F [5] 31 TTCGGTTCGGGGACCAGGTTAACCGTTGTAG TRBJ1-2*01_F ... ... ... [12] 31 TTTGGCCCAGGCACCCGGCTGACAGTGCTCG TRBJ2-3*01_F [13] 31 TTCGGGCCAGGCACGCGGCTCCTGGTGCTCG TRBJ2-5*01_F [14] 31 TTCGGGCCAGGGACACGGCTCACCGTGCTAG TRBJ2-1*01_F [15] 31 TTCGGGCCGGGCACCAGGCTCACGGTCACAG TRBJ2-7*01_F [16] 31 GTCGGGCCGGGCACCAGGCTCACGGTCACAG TRBJ2-7*02_ORF #+end_example