Add CNN section in State of the Art

This commit is contained in:
coolneng 2021-07-03 19:30:14 +02:00
parent 9bf910f33a
commit 3d03b276f8
Signed by: coolneng
GPG Key ID: 9893DA236405AF57
5 changed files with 161 additions and 1 deletions


@ -236,9 +236,30 @@ Where $L$ is the loss function that penalizes $g(f(x))$ for being different from
Traditionally, /autoencoders/ were used for dimensionality reduction or /feature learning/. Recently, certain theories connecting AEs with latent-variable models have brought autoencoders to the forefront of generative modeling cite:Goodfellow-et-al-2016.
Nowadays, /autoencoders/ are used for noise reduction, both in text cite:Lewis_2020 and in images cite:bigdeli17_image_restor_using_autoen_prior, unsupervised /clustering/ cite:makhzani15_adver_autoen, synthetic image generation cite:Yoo_2020, dimensionality reduction cite:makhzani15_adver_autoen and sequence-to-sequence prediction for machine translation cite:kaiser18_discr_autoen_sequen_model.
*** Convolutional neural networks (CNN)
A convolutional neural network (CNN) is a type of neural network specialized in processing data with a grid-like topology. The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolutional networks have been enormously successful in practical applications cite:Goodfellow-et-al-2016.
#+CAPTION: Diagram of a CNN. A CNN is a multilayer neural network composed of two different types of layers, namely convolution layers (C-layers) and subsampling layers (S-layers) cite:LIU201711
#+ATTR_HTML: :height 20% :width 70%
#+NAME: fig:CNN
[[./assets/figures/CNN.png]]
In the context of a convolutional neural network, a convolution is a linear operation that involves multiplying a set of weights with the input, just like in a traditional neural network. The multiplication is performed between a matrix of input data and a two-dimensional matrix of weights, called a filter or kernel. The filter is smaller than the input data, which allows the same filter (set of weights) to be multiplied with the input matrix several times, at different positions of the input cite:brownlee_2020. The operation is depicted graphically in the following figure:
\clearpage
#+CAPTION: Representation of a two-dimensional convolution. We restrict the output to only those positions where the kernel lies entirely inside the image. The boxes with arrows indicate how the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor cite:Goodfellow-et-al-2016
#+ATTR_HTML: :height 30% :width 70%
#+NAME: fig:convolution
[[./assets/figures/convolution.png]]
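
The convolution restricted to valid positions, as depicted in the figure above, can be sketched in a few lines of Python. This is a minimal illustrative sketch assuming a NumPy-based implementation; like most deep learning libraries, it actually computes a cross-correlation (the kernel is not flipped), which is what is conventionally called convolution in this context:

```python
import numpy as np

def conv2d_valid(x, k):
    """2D convolution restricted to 'valid' positions: the kernel is
    applied only where it lies entirely inside the input (no padding).
    As in most deep learning libraries, the kernel is not flipped."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the kernel and the input patch
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 input grid
k = np.ones((2, 2))                           # 2x2 filter (set of weights)
y = conv2d_valid(x, k)
print(y.shape)  # (3, 3): one output per valid kernel position
```

A 4x4 input convolved with a 2x2 kernel yields a 3x3 output, since the kernel fits at $(4-2+1)^2$ positions.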
Convolution layers (C-layers) are used to extract features, while subsampling layers (S-layers) are essentially feature-mapping layers. However, when the dimensionality of the filter output equals that of the input, applying a classifier can cause /overfitting/ due to the high dimensionality. To solve this problem, a /pooling/ step, \ie subsampling or /down-sampling/, is introduced to reduce the overall size of the signal cite:LIU201711.
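
The /pooling/ step can likewise be sketched in Python. This is a minimal sketch assuming non-overlapping max pooling with NumPy; the 2x2 window and the divisibility of the input dimensions by the window size are assumptions made here for illustration:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: downsamples the signal by keeping
    only the maximum of each size x size block (assumes both input
    dimensions are divisible by size)."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size)
    # take the maximum over each block's rows and columns
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
y = max_pool2d(x)
print(y)  # the 4x4 input is reduced to a 2x2 map of block maxima
```

Each output value summarizes a whole 2x2 region, so the signal size shrinks by a factor of four while the strongest activations are preserved.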
Nowadays, CNNs are used for /computer vision/, both for image classification cite:howard17_mobil and for segmentation cite:ronneberger15_u_net, as well as for recommender systems cite:yuan18_simpl_convol_gener_networ_next_item_recom and sentiment analysis cite:sadr21_novel_deep_learn_method_textual_sentim_analy.
** Bioinformatics
* Objectives

Binary file not shown.


@ -986,6 +986,7 @@
ISBN = 9781728163956,
publisher = {IEEE}
}
@article{kaiser18_discr_autoen_sequen_model,
author = {Kaiser, Łukasz and Bengio, Samy},
title = {Discrete Autoencoders for Sequence Models},
@ -1014,3 +1015,141 @@
eprint = {1801.09797v1},
primaryClass = {cs.LG},
}
@misc{brownlee_2020,
title = {How Do Convolutional Layers Work in Deep Learning Neural
Networks?},
url = {https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/},
journal = {Machine Learning Mastery},
author = {Brownlee, Jason},
year = 2020,
month = {Apr}
}
@article{howard17_mobil,
author = {Howard, Andrew G. and Zhu, Menglong and Chen, Bo and
Kalenichenko, Dmitry and Wang, Weijun and Weyand, Tobias and
Andreetto, Marco and Adam, Hartwig},
title = {Mobilenets: Efficient Convolutional Neural Networks for
Mobile Vision Applications},
journal = {CoRR},
year = 2017,
url = {http://arxiv.org/abs/1704.04861v1},
abstract = {We present a class of efficient models called MobileNets
for mobile and embedded vision applications. MobileNets are
based on a streamlined architecture that uses depth-wise
separable convolutions to build light weight deep neural
networks. We introduce two simple global hyper-parameters that
efficiently trade off between latency and accuracy. These
hyper-parameters allow the model builder to choose the right
sized model for their application based on the constraints of
the problem. We present extensive experiments on resource and
accuracy tradeoffs and show strong performance compared to
other popular models on ImageNet classification. We then
demonstrate the effectiveness of MobileNets across a wide
range of applications and use cases including object
detection, finegrain classification, face attributes and large
scale geo-localization.},
archivePrefix = {arXiv},
eprint = {1704.04861v1},
primaryClass = {cs.CV},
}
@article{ronneberger15_u_net,
author = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
title = {U-Net: Convolutional Networks for Biomedical Image
Segmentation},
journal = {CoRR},
year = 2015,
url = {http://arxiv.org/abs/1505.04597v1},
abstract = {There is large consent that successful training of deep
networks requires many thousand annotated training samples. In
this paper, we present a network and training strategy that
relies on the strong use of data augmentation to use the
available annotated samples more efficiently. The architecture
consists of a contracting path to capture context and a
symmetric expanding path that enables precise localization. We
show that such a network can be trained end-to-end from very
few images and outperforms the prior best method (a
sliding-window convolutional network) on the ISBI challenge
for segmentation of neuronal structures in electron
microscopic stacks. Using the same network trained on
transmitted light microscopy images (phase contrast and DIC)
we won the ISBI cell tracking challenge 2015 in these
categories by a large margin. Moreover, the network is fast.
Segmentation of a 512x512 image takes less than a second on a
recent GPU. The full implementation (based on Caffe) and the
trained networks are available at
http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .},
archivePrefix = {arXiv},
eprint = {1505.04597v1},
primaryClass = {cs.CV},
}
@article{yuan18_simpl_convol_gener_networ_next_item_recom,
author = {Yuan, Fajie and Karatzoglou, Alexandros and Arapakis,
Ioannis and Jose, Joemon M and He, Xiangnan},
title = {A Simple Convolutional Generative Network for Next Item
Recommendation},
journal = {CoRR},
year = 2018,
url = {http://arxiv.org/abs/1808.05163v4},
abstract = {Convolutional Neural Networks (CNNs) have been recently
introduced in the domain of session-based next item
recommendation. An ordered collection of past items the user
has interacted with in a session (or sequence) are embedded
into a 2-dimensional latent matrix, and treated as an image.
The convolution and pooling operations are then applied to the
mapped item embeddings. In this paper, we first examine the
typical session-based CNN recommender and show that both the
generative model and network architecture are suboptimal when
modeling long-range dependencies in the item sequence. To
address the issues, we introduce a simple, but very effective
generative model that is capable of learning high-level
representation from both short- and long-range item
dependencies. The network architecture of the proposed model
is formed of a stack of \emph{holed} convolutional layers,
which can efficiently increase the receptive fields without
relying on the pooling operation. Another contribution is the
effective use of residual block structure in recommender
systems, which can ease the optimization for much deeper
networks. The proposed generative model attains
state-of-the-art accuracy with less training time in the next
item recommendation task. It accordingly can be used as a
powerful recommendation baseline to beat in future, especially
when there are long sequences of user feedback.},
archivePrefix = {arXiv},
eprint = {1808.05163v4},
primaryClass = {cs.IR},
}
@article{sadr21_novel_deep_learn_method_textual_sentim_analy,
author = {Sadr, Hossein and Solimandarabi, Mozhdeh Nazari and Pedram,
Mir Mohsen and Teshnehlab, Mohammad},
title = {A Novel Deep Learning Method for Textual Sentiment
Analysis},
journal = {CoRR},
year = 2021,
url = {http://arxiv.org/abs/2102.11651v1},
abstract = {Sentiment analysis is known as one of the most crucial
tasks in the field of natural language processing and
Convolutional Neural Network (CNN) is one of those prominent
models that is commonly used for this aim. Although
convolutional neural networks have obtained remarkable results
in recent years, they are still confronted with some
limitations. Firstly, they consider that all words in a
sentence have equal contributions in the sentence meaning
representation and are not able to extract informative words.
Secondly, they require a large number of training data to
obtain considerable results while they have many parameters
that must be accurately adjusted. To this end, a convolutional
neural network integrated with a hierarchical attention layer
is proposed which is able to extract informative words and
assign them higher weight. Moreover, the effect of transfer
learning that transfers knowledge learned in the source domain
to the target domain with the aim of improving the performance
is also explored. Based on the empirical results, the proposed
model not only has higher classification accuracy and can
extract informative words but also applying incremental
transfer learning can significantly enhance the classification
performance.},
archivePrefix = {arXiv},
eprint = {2102.11651v1},
primaryClass = {cs.CL},
}

BIN  assets/figures/CNN.png (new file, 43 KiB)

BIN  (new binary file, 20 KiB)