locigenesis

Sequence alignment

Our generated sequences contain the full VJ region, but we are only interested in the CDR3 (complementarity-determining region 3). To delimit the CDR3 within each sequence, we use the known sequences of the V and J segments.

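# Load the reference V and J segment sequences (DNAStringSet objects)
v_segments <- readRDS("data/v_segments.rds")
j_segments <- readRDS("data/j_segments_phe.rds")
# Inspect both reference sets
print(v_segments)
print(j_segments)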
  A DNAStringSet instance of length 147
      width seq                                             names
  [1]   326 GATACTGGAATTACCCAGACAC...ATCTCTGCACCAGCAGCCAAGA TRBV1*01_P
  [2]   326 GATGCTGAAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-1*01_F
  [3]   326 GATGCTGAAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-1*02_F
  [4]   326 GATGCTGGAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-2*01_F
  [5]   326 GATGCTGGAATCACCCAGAGCC...ATTTCTGCGCCAGCAGTGAGTC TRBV10-2*02_F
  ...   ... ...
[143]   324 GATACTGGAGTCTCCCAGAACC...GTATCTCTGTGCCAGCACGTTG TRBV7-9*06_(F)
[144]   323 .........................TGTATCTCTGTGCCAGCAGCAG TRBV7-9*07_(F)
[145]   325 GATTCTGGAGTCACACAAACCC...TATTTCTGTGCCAGCAGCGTAG TRBV9*01_F
[146]   325 GATTCTGGAGTCACACAAACCC...TATTTCTGTGCCAGCAGCGTAG TRBV9*02_F
[147]   321 GATTCTGGAGTCACACAAACCC...TTTGTATTTCTGTGCCAGCAGC TRBV9*03_(F)
  A DNAStringSet instance of length 16
     width seq                                              names
 [1]    32 TGGGCGTCTGGGCGGAGGACTCCTGGTTCTGG                 TRBJ2-2P*01_ORF
 [2]    31 TTTGGAGAGGGAAGTTGGCTCACTGTTGTAG                  TRBJ1-3*01_F
 [3]    31 TTTGGTGATGGGACTCGACTCTCCATCCTAG                  TRBJ1-5*01_F
 [4]    31 TTTGGCAGTGGAACCCAGCTCTCTGTCTTGG                  TRBJ1-4*01_F
 [5]    31 TTCGGTTCGGGGACCAGGTTAACCGTTGTAG                  TRBJ1-2*01_F
 ...   ... ...
[12]    31 TTTGGCCCAGGCACCCGGCTGACAGTGCTCG                  TRBJ2-3*01_F
[13]    31 TTCGGGCCAGGCACGCGGCTCCTGGTGCTCG                  TRBJ2-5*01_F
[14]    31 TTCGGGCCAGGGACACGGCTCACCGTGCTAG                  TRBJ2-1*01_F
[15]    31 TTCGGGCCGGGCACCAGGCTCACGGTCACAG                  TRBJ2-7*01_F
[16]    31 GTCGGGCCGGGCACCAGGCTCACGGTCACAG                  TRBJ2-7*02_ORF
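
As a rough illustration of how known V and J sequences can delimit a region, the following base-R sketch extracts the stretch between the end of a V segment and the start of a J segment by exact string matching. The helper `delimit_cdr3`, the toy read, and the choice of anchors (the last 12 bases printed for TRBV9*03 and the first 14 bases of TRBJ1-3*01) are assumptions made for this example, not necessarily the procedure this notebook follows. Whether the conserved codons at either boundary count as part of the CDR3 is a convention left open here.

```r
# Hypothetical helper: return the bases lying between a V-segment anchor
# and a J-segment anchor, or NA if either anchor is not found exactly.
delimit_cdr3 <- function(read, v_end, j_start) {
  # Position of the first exact occurrence of the V anchor (-1 if absent)
  v_pos <- as.integer(regexpr(v_end, read, fixed = TRUE))
  if (v_pos == -1L) return(NA_character_)

  # Everything downstream of the V anchor
  rest <- substring(read, v_pos + nchar(v_end))

  # Position of the J anchor within the downstream part
  j_pos <- as.integer(regexpr(j_start, rest, fixed = TRUE))
  if (j_pos == -1L) return(NA_character_)

  # Bases between the two anchors
  substr(rest, 1L, j_pos - 1L)
}

# Toy data (assumed for illustration only)
v_end   <- "TGTGCCAGCAGC"    # tail of TRBV9*03 as printed above
j_start <- "TTTGGAGAGGGAAG"  # head of TRBJ1-3*01 as printed above
read    <- paste0("CTGGAG", v_end, "GTAGCGGGACAG", j_start, "TTGGCTC")

cdr3 <- delimit_cdr3(read, v_end, j_start)
print(cdr3)
```

Exact matching is only adequate for error-free toy input. Reads carrying mismatches or trimmed segment ends would need an approximate search instead, for example Biostrings' `matchPattern` with a non-zero `max.mismatch`, or a pairwise alignment.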