Conclude State of the Art with bioinformatics

2021-07-04 03:38:37 +02:00 · 2021-07-04 03:38:37 +02:00 · eaef4004ce
parent 3d03b276f8
commit eaef4004ce
3 changed files with 171 additions and 46 deletions
--- a/Dissertation.org
+++ b/Dissertation.org
@@ -196,6 +196,8 @@ La bioinformática es un campo interdisciplinar en el que intervienen principalm
 It is tempting to attribute the origins of bioinformatics to the recent convergence of DNA sequencing, large-scale genome projects, the Internet, and supercomputers. However, some scientists who claim that bioinformatics is in its infancy acknowledge that computers were already important tools in molecular biology in the 1960s, a decade before DNA sequencing became feasible cite:Hagen2000.
 In biological and medical research, the role of computational analysis has grown dramatically. The first wave of the discipline focused on sequence analysis, where many important problems remain unsolved; current and future needs center on the sophisticated integration of extremely diverse data sets. These new types of data come from a variety of experimental techniques, many of which can produce data at the level of whole cells, organs, organisms, or even populations. The main driving force behind these changes has been the arrival of new, efficient experimental techniques, chiefly DNA sequencing, which have led to exponential growth in the descriptions of protein, DNA, and RNA molecules cite:book:930.
 * State of the art
 We survey current methodologies in the previously introduced fields of /Deep Learning/ and bioinformatics, with the aim of identifying the techniques used in academia and in industry.
@@ -261,6 +263,17 @@ Las capas de convolución (capas C) se utilizan para extraer características y
 Today, CNNs are used in /computer vision/, for both image classification cite:howard17_mobil and segmentation cite:ronneberger15_u_net, as well as in recommender systems cite:yuan18_simpl_convol_gener_networ_next_item_recom and sentiment analysis cite:sadr21_novel_deep_learn_method_textual_sentim_analy.
 ** Bioinformatics
 The term omics covers a set of new technologies that can help explain cellular pathways, networks, and processes, both normal and abnormal, by simultaneously tracking thousands of molecular components. Omics spans a growing set of branches, from *genomics* (the quantitative study of protein-coding genes, regulatory elements, and non-coding sequences), *transcriptomics* (RNA and gene expression), *proteomics* (focused, for example, on protein abundance), and *metabolomics* (metabolites and metabolic networks) to advances in the era of post-genomic biology and medicine: pharmacogenomics (the quantitative study of how genetics affects the host's response to drugs) and physiomics (the physiological dynamics and functions of whole organisms) cite:Schneider_2011.
 Bioinformatics methods have proven effective at solving a range of omics problems, specifically obtaining the transcriptomic state of a cell (RNA-seq) cite:Peri_2020, reconstructing DNA sequences cite:Zerbino_2008, annotating genomes cite:Spudich_2007, and predicting the three-dimensional structure of proteins cite:Liu_2018.
 The non-negligible error rates of second- and third-generation DNA sequencing technologies have driven the development of multiple bioinformatics techniques to mitigate this problem.
 Error correction at a specific genomic position can be achieved by laying out all the reads horizontally, one after another, and examining the base at that specific position in each of them. Because errors are infrequent and random, reads that contain an error at a given position can be corrected by selecting the most probable base inferred from the other reads cite:Yang_2012. Among these methods, we highlight /Coral/ cite:Salmela_2011, /Quake/ cite:Kelley_2010, and /MEC/ cite:Zhao_2017.
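The column-wise correction described above can be illustrated with a minimal Python sketch. It assumes the reads are already aligned and of equal length, and it uses a plain majority vote per position. The function name `consensus_correct` and the `min_support` threshold are hypothetical, introduced only for illustration. Tools such as Coral, Quake, and MEC are considerably more elaborate, so this is not a description of how any of them works.

```python
from collections import Counter


def consensus_correct(reads, min_support=0.6):
    """Correct aligned, equal-length reads by a per-column majority vote.

    Illustrative simplification: `min_support` (hypothetical parameter) is the
    minimum fraction of reads that must agree before a column is corrected.
    """
    if not reads:
        return []
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")

    # For every column, keep the most frequent base if it has enough support;
    # otherwise store None so that the column is left untouched.
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(reads) >= min_support else None)

    # Replace each base by the consensus base wherever one was accepted.
    return [
        "".join(cons if cons is not None else base
                for base, cons in zip(read, consensus))
        for read in reads
    ]


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC"]
    print(consensus_correct(reads))
```

In this toy input, the third read differs from the others at a single position, and the vote replaces that base with the one supported by the remaining reads. Columns without a clear majority are left as they are, which avoids "correcting" genuine variation when the evidence is weak.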
 Using /Deep Learning/ to correct sequencing errors is a novel research area, in which /NanoReviser/ is worth mentioning cite:10.3389/fgene.2020.00900.
 * Objectives
 1. Introduction to the domain of a molecular biology problem: DNA sequencing and analysis of T-cell receptors (TCR)
--- a/Dissertation.pdf
+++ b/Dissertation.pdf
--- a/assets/bibliography.bib
+++ b/assets/bibliography.bib
@@ -1,49 +1,3 @@
@article{10.1093/molbev/msy224,
  author          = {Flagel, Lex and Brandvain, Yaniv and Schrider, Daniel R},
  title           = "{The Unreasonable Effectiveness of Convolutional Neural
                  Networks in Population Genetic Inference}",
  journal         = {Molecular Biology and Evolution},
  volume          = 36,
  number          = 2,
  pages           = {220-238},
  year            = 2018,
  month           = 12,
  abstract        = "{Population-scale genomic data sets have given researchers
                  incredible amounts of information from which to infer
                  evolutionary histories. Concomitant with this flood of data,
                  theoretical and methodological advances have sought to extract
                  information from genomic sequences to infer demographic events
                  such as population size changes and gene flow among closely
                  related populations/species, construct recombination maps, and
                  uncover loci underlying recent adaptation. To date, most
                  methods make use of only one or a few summaries of the input
                  sequences and therefore ignore potentially useful information
                  encoded in the data. The most sophisticated of these
                  approaches involve likelihood calculations, which require
                  theoretical advances for each new problem, and often focus on
                  a single aspect of the data (e.g., only allele frequency
                  information) in the interest of mathematical and computational
                  tractability. Directly interrogating the entirety of the input
                  sequence data in a likelihood-free manner would thus offer a
                  fruitful alternative. Here, we accomplish this by representing
                  DNA sequence alignments as images and using a class of deep
                  learning methods called convolutional neural networks (CNNs)
                  to make population genetic inferences from these images. We
                  apply CNNs to a number of evolutionary questions and find that
                  they frequently match or exceed the accuracy of current
                  methods. Importantly, we show that CNNs perform accurate
                  evolutionary model selection and parameter estimation, even on
                  problems that have not received detailed theoretical
                  treatments. Thus, when applied to population genetic
                  alignments, CNNs are capable of outperforming expert-derived
                  statistical methods and offer a new path forward in cases
                  where no likelihood approach exists.}",
  issn            = {0737-4038},
  doi             = {10.1093/molbev/msy224},
  url             = {https://doi.org/10.1093/molbev/msy224},
  eprint          = {https://academic.oup.com/mbe/article-pdf/36/2/220/27736968/msy224.pdf},
 }
@Article{pmid19706884,
  Author          = "Robins, H. S. and Campregher, P. V. and Srivastava, S. K.
                  and Wacher, A. and Turtle, C. J. and Kahsai, O. and Riddell,
@@ -1025,6 +979,7 @@
  year            = 2020,
  month           = {Apr}
 }
@article{howard17_mobil,
  author          = {Howard, Andrew G. and Zhu, Menglong and Chen, Bo and
                  Kalenichenko, Dmitry and Wang, Weijun and Weyand, Tobias and
@@ -1053,6 +1008,7 @@
  eprint          = {1704.04861v1},
  primaryClass    = {cs.CV},
 }
@article{ronneberger15_u_net,
  author          = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
  title           = {U-Net: Convolutional Networks for Biomedical Image
@@ -1083,6 +1039,7 @@
  eprint          = {1505.04597v1},
  primaryClass    = {cs.CV},
 }
@article{yuan18_simpl_convol_gener_networ_next_item_recom,
  author          = {Yuan, Fajie and Karatzoglou, Alexandros and Arapakis,
                  Ioannis and Jose, Joemon M and He, Xiangnan},
@@ -1119,6 +1076,7 @@
  eprint          = {1808.05163v4},
  primaryClass    = {cs.IR},
 }
@article{sadr21_novel_deep_learn_method_textual_sentim_analy,
  author          = {Sadr, Hossein and Solimandarabi, Mozhdeh Nazari and Pedram,
                  Mir Mohsen and Teshnehlab, Mohammad},
@@ -1153,3 +1111,157 @@
  eprint          = {2102.11651},
  primaryClass    = {cs.CL},
 }
@book{book:930,
  title           = {Bioinformatics: the machine learning approach},
  author          = {Baldi, Pierre and Brunak, Søren},
  publisher       = {The MIT Press},
  isbn            = {026202506X,9780585444666,9780262025065},
  year            = 2001,
  series          = {Adaptive Computation and Machine Learning},
  edition         = 2,
  pages           = 12,
 }
@InCollection{Schneider_2011,
  author          = {Schneider, Maria V. and Orchard, Sandra},
  title           = {Omics Technologies, Data and Bioinformatics Principles},
  booktitle       = {Bioinformatics for Omics Data},
  year            = 2011,
  pages           = {3--30},
  issn            = {1940-6029},
  doi             = {10.1007/978-1-61779-027-0_1},
  url             = {http://dx.doi.org/10.1007/978-1-61779-027-0_1},
  ISBN            = 9781617790270,
  publisher       = {Humana Press},
 }
@Article{Peri_2020,
  author          = {Peri, Sateesh and Roberts, Sarah and Kreko, Isabella R. and
                  McHan, Lauren B. and Naron, Alexandra and Ram, Archana and
                  Murphy, Rebecca L. and Lyons, Eric and Gregory, Brian D. and
                  Devisetty, Upendra K. and others},
  title           = {Read Mapping and Transcript Assembly: A Scalable and
                  High-Throughput Workflow for the Processing and Analysis of
                  Ribonucleic Acid Sequencing Data},
  journal         = {Frontiers in Genetics},
  year            = 2020,
  volume          = 10,
  month           = {Jan},
  issn            = {1664-8021},
  doi             = {10.3389/fgene.2019.01361},
  url             = {http://dx.doi.org/10.3389/fgene.2019.01361},
  publisher       = {Frontiers Media SA}
 }
@Article{Zerbino_2008,
  author          = {Zerbino, D. R. and Birney, E.},
  title           = {Velvet: Algorithms for de novo short read assembly using de
                  Bruijn graphs},
  journal         = {Genome Research},
  year            = 2008,
  volume          = 18,
  number          = 5,
  month           = {Feb},
  pages           = {821--829},
  issn            = {1088-9051},
  doi             = {10.1101/gr.074492.107},
  url             = {http://dx.doi.org/10.1101/gr.074492.107},
  publisher       = {Cold Spring Harbor Laboratory}
 }
@Article{Spudich_2007,
  author          = {Spudich, G. and Fernandez-Suarez, X. M. and Birney, E.},
  title           = {Genome browsing with Ensembl: a practical overview},
  journal         = {Briefings in Functional Genomics and Proteomics},
  year            = 2007,
  volume          = 6,
  number          = 3,
  month           = {Aug},
  pages           = {202–219},
  issn            = {1477-4062},
  doi             = {10.1093/bfgp/elm025},
  url             = {http://dx.doi.org/10.1093/bfgp/elm025},
  publisher       = {Oxford University Press (OUP)}
 }
@Article{Liu_2018,
  author          = {Liu, Yang and Ye, Qing and Wang, Liwei and Peng, Jian},
  title           = {Learning structural motif representations for efficient
                  protein structure search},
  journal         = {Bioinformatics},
  year            = 2018,
  volume          = 34,
  number          = 17,
  month           = {Sep},
  pages           = {i773--i780},
  issn            = {1460-2059},
  doi             = {10.1093/bioinformatics/bty585},
  url             = {http://dx.doi.org/10.1093/bioinformatics/bty585},
  publisher       = {Oxford University Press (OUP)}
 }
@Article{Salmela_2011,
  author          = {Salmela, L. and Schroder, J.},
  title           = {Correcting errors in short reads by multiple alignments},
  journal         = {Bioinformatics},
  year            = 2011,
  volume          = 27,
  number          = 11,
  month           = {Apr},
  pages           = {1455--1461},
  issn            = {1460-2059},
  doi             = {10.1093/bioinformatics/btr170},
  url             = {http://dx.doi.org/10.1093/bioinformatics/btr170},
  publisher       = {Oxford University Press (OUP)}
 }
@Article{Yang_2012,
  author          = {Yang, X. and Chockalingam, S. P. and Aluru, S.},
  title           = {A survey of error-correction methods for next-generation
                  sequencing},
  journal         = {Briefings in Bioinformatics},
  year            = 2012,
  volume          = 14,
  number          = 1,
  month           = {Apr},
  pages           = {56--66},
  issn            = {1477-4054},
  doi             = {10.1093/bib/bbs015},
  url             = {http://dx.doi.org/10.1093/bib/bbs015},
  publisher       = {Oxford University Press (OUP)}
 }
@Article{Kelley_2010,
  author          = {Kelley, David R and Schatz, Michael C and Salzberg, Steven
                  L},
  title           = {Quake: quality-aware detection and correction of sequencing
                  errors},
  journal         = {Genome Biology},
  year            = 2010,
  volume          = 11,
  number          = 11,
  pages           = {R116},
  issn            = {1465-6906},
  doi             = {10.1186/gb-2010-11-11-r116},
  url             = {http://dx.doi.org/10.1186/gb-2010-11-11-r116},
  publisher       = {Springer Science and Business Media LLC}
 }
@Article{Zhao_2017,
  author          = {Zhao, Liang and Chen, Qingfeng and Li, Wencui and Jiang,
                  Peng and Wong, Limsoon and Li, Jinyan},
  title           = {MapReduce for accurate error correction of next-generation
                  sequencing data},
  journal         = {Bioinformatics},
  year            = 2017,
  editor          = {Sahinalp, Cenk},
  volume          = 33,
  number          = 23,
  month           = {Feb},
  pages           = {3844--3851},
  issn            = {1460-2059},
  doi             = {10.1093/bioinformatics/btx089},
  url             = {http://dx.doi.org/10.1093/bioinformatics/btx089},
  publisher       = {Oxford University Press (OUP)}
 }