In this notebook we'll extract knowledge from our generated dataset. First, let's import our dependencies:
#+begin_src python
from tensorflow_io import genome
#+end_src
TensorFlow I/O is an extension library for TensorFlow that includes a module for parsing genomic data; we'll use it to read the sequences stored in our FASTQ files:
#+begin_src python :results silent
def parse_data(filepath):
    """Read a FASTQ file and return its sequences and raw quality strings."""
    HVR = genome.read_fastq(filename=filepath)
    return HVR.sequences, HVR.raw_quality
#+end_src
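For readers new to the format: each FASTQ record spans four lines, namely an =@= identifier, the base sequence, a =+= separator and a quality string with one character per base. The pure-Python sketch below only illustrates what a parser pulls out of each record; it is not how TensorFlow I/O implements =read_fastq=, and the helper names =parse_fastq_text= and =phred33_scores= are hypothetical. It also assumes the common Phred+33 quality encoding, which we have not checked against the immuneSIM or CuReSim output.

#+begin_src python
import io


def parse_fastq_text(handle):
    """Yield (identifier, sequence, raw_quality) tuples from a FASTQ text stream.

    Hypothetical helper: assumes well-formed, unwrapped 4-line records.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # readline() returns "" at end of stream
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def phred33_scores(raw_quality):
    """Decode an ASCII quality string into integer Phred scores (assumed offset 33)."""
    return [ord(ch) - 33 for ch in raw_quality]


# A single toy record, not taken from our dataset.
example = io.StringIO("@read_1\nACGTAC\n+\nIIIII#\n")
records = list(parse_fastq_text(example))
scores = phred33_scores(records[0][2])
#+end_src

Here =records= holds one tuple, =("read_1", "ACGTAC", "IIIII#")=, and the quality characters decode to scores of 40 for =I= and 2 for =#=. Our =parse_data= returns the raw quality strings as well, but the calls that follow discard them.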
Let's load both the HVR dataset generated by immuneSIM and its CuReSim-processed counterpart, which contains simulated sequencing errors (mostly indels):
#+begin_src python
# Only the sequences are needed here, so the quality strings are discarded.
original_HVR, _ = parse_data("data/HVR.fastq")
processed_HVR, _ = parse_data("data/CuReSim_HVR.fastq")
#+end_src
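To make "indels" concrete: an insertion or deletion shifts every downstream base, so a position-by-position comparison overstates how different two reads are, whereas an edit distance counts it as a single operation. The sketch below is a standard dynamic-programming Levenshtein distance; the helper name =edit_distance= is hypothetical and not part of this notebook, and it is applied to toy strings rather than to our data.

#+begin_src python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1.

    Hypothetical helper; keeps only two rows of the DP table in memory.
    """
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca from a
                current[j - 1] + 1,            # insert cb into a
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Toy reads: one deleted base, one inserted base, one substitution.
deletion = edit_distance("ACGTACGT", "ACGACGT")
insertion = edit_distance("ACGT", "ACGGT")
substitution = edit_distance("ACGT", "AGGT")
#+end_src

Each toy comparison yields a distance of 1, even though the deletion example disagrees with the original at every position after the dropped base. Applying this to the loaded data would mean converting TensorFlow string tensors to Python strings (for an eager 1-D string tensor, e.g. =original_HVR[0].numpy().decode()=) and assuming that CuReSim preserves read order so records can be paired by index, which we have not verified.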