40 Commits

Author SHA1 Message Date
coolneng 1311b9b945
Apply isort to the project 2021-07-06 03:01:43 +02:00
coolneng 92c6b54966
Implement model inference of sequences 2021-07-06 02:59:37 +02:00
coolneng 1333a9256b
Remove logs directory 2021-07-06 02:12:42 +02:00
coolneng eabb7f0285
Change model architecture to an MLP 2021-07-06 01:44:58 +02:00
coolneng 1a1262b0b1
Pad and mask the sequences in each batch 2021-07-05 19:55:31 +02:00
coolneng 70363a82a0
Refactor sequence preprocessing 2021-07-05 19:54:48 +02:00
coolneng 72e3de945a
Add type hints to the main module 2021-07-05 03:52:26 +02:00
coolneng bcc4f4b4d4
Parse data and label files from CLI arguments 2021-07-05 03:49:14 +02:00
coolneng a3780c9761
Move hyperparameters to a class 2021-07-05 03:24:54 +02:00
coolneng e07d0dcdbf
Change Flatten layer, loss function and add Input 2021-06-26 17:52:20 +02:00
coolneng 1237394bb1
Perform one hot encoding on the sequences 2021-06-25 00:05:14 +02:00
coolneng e9582d0883
Parallelize dataset transformations 2021-06-24 19:54:19 +02:00
coolneng b2f20f2070
Revert "Remove dense Tensor transformation" (reverts commit 0912600fdc) 2021-06-24 17:10:07 +02:00
coolneng c9466baa68
Align altered sequence with the reference sequence 2021-06-23 18:29:16 +02:00
coolneng 0912600fdc
Remove dense Tensor transformation 2021-06-23 18:28:09 +02:00
coolneng 1e433c123f
Remove base counts from the dataset 2021-06-16 13:02:49 +02:00
coolneng 7029b64906
Refactor the casting function using a loop 2021-06-15 00:22:55 +02:00
coolneng 379303b440
Cast the parsed features to int32 2021-06-15 00:18:38 +02:00
coolneng d2e5fd0fa3
Build model incrementally 2021-06-14 23:32:49 +02:00
coolneng 19ed847d12
Convert sequence and label to VarLenFeature 2021-06-14 19:33:42 +02:00
coolneng 498d93de2a
Execute the training loop in the model module 2021-06-10 13:27:55 +02:00
coolneng 08611de8e6
Fix TensorFlow seed assignment 2021-06-07 19:26:21 +02:00
coolneng 0ce582250d
Implement the training loop and metrics evaluation 2021-06-06 00:20:03 +02:00
coolneng 168a68b50d
Update documentation about data splits 2021-06-06 00:13:37 +02:00
coolneng 8870da8543
Create a validation set 2021-06-06 00:04:18 +02:00
coolneng 38903c5737
Rename ref_sequence to label 2021-06-06 00:03:15 +02:00
coolneng 035162bd8d
Fix position weight matrix assignment 2021-06-05 20:40:13 +02:00
coolneng 02d20d4e72
Add reference sequence to each dataset instance 2021-06-05 20:34:59 +02:00
coolneng c9de0c8320
Add learning rate and l2 regularizer constants 2021-06-03 18:52:26 +02:00
coolneng ccaa8484c7
Document read_dataset and process_input 2021-06-03 18:51:49 +02:00
coolneng f8c1a54be3
Apply index-based encoding to the DNA sequence 2021-06-03 18:29:43 +02:00
coolneng d34e291085
Generate a dataset from the TFRecords files 2021-06-01 23:06:25 +02:00
coolneng 220c0482f1
Move hardcoded data to a constants module 2021-06-01 19:27:10 +02:00
coolneng 44ff69dc9e
Document the preprocessing module 2021-06-01 18:46:17 +02:00
coolneng 5ac81c049f
Change BASES constant to a local variable 2021-06-01 18:34:29 +02:00
coolneng 16c01afbe7
Create a dataset and write it to TFRecords files 2021-06-01 18:26:13 +02:00
coolneng 59aa61112e
Create a basic CNN model 2021-05-31 20:02:44 +02:00
coolneng 731b76a0af
Remove redundant modules 2021-05-31 20:00:43 +02:00
coolneng 6201e35e99
Document the data parsing function 2021-05-11 20:41:54 +02:00
coolneng eb072836a1
Parse a FASTQ file into a Tensor 2021-05-06 20:34:39 +02:00