Commit Graph

60 Commits

Author SHA1 Message Date
coolneng c24f528484
Update trained model and dataset 2021-07-06 03:37:36 +02:00
coolneng 6dd0d7e0ba
Add trained model 2021-07-06 03:07:54 +02:00
coolneng 1311b9b945
Apply isort to the project 2021-07-06 03:01:43 +02:00
coolneng 92c6b54966
Implement model inference of sequences 2021-07-06 02:59:37 +02:00
coolneng 1333a9256b
Remove logs directory 2021-07-06 02:12:42 +02:00
coolneng eabb7f0285
Change model architecture to a MLP 2021-07-06 01:44:58 +02:00
coolneng 1a1262b0b1
Pad and mask the sequences in each batch 2021-07-05 19:55:31 +02:00
coolneng 70363a82a0
Refactor sequence preprocessing 2021-07-05 19:54:48 +02:00
coolneng 72e3de945a
Add type hints to the main module 2021-07-05 03:52:26 +02:00
coolneng bcc4f4b4d4
Parse data and label files from CLI arguments 2021-07-05 03:49:14 +02:00
coolneng a3780c9761
Move hyperparameters to a class 2021-07-05 03:24:54 +02:00
coolneng e07d0dcdbf
Change Flatten layer, loss function and add Input 2021-06-26 17:52:20 +02:00
coolneng 4d67bdac30
Add poetry installation step to README 2021-06-26 04:35:59 +02:00
coolneng 1237394bb1
Perform one hot encoding on the sequences 2021-06-25 00:05:14 +02:00
coolneng e9582d0883
Parallelize dataset transformations 2021-06-24 19:54:19 +02:00
coolneng b2f20f2070
Revert "Remove dense Tensor transformation"
This reverts commit 0912600fdc.
2021-06-24 17:10:07 +02:00
coolneng c9466baa68
Align altered sequence with the reference sequence 2021-06-23 18:29:16 +02:00
coolneng 0912600fdc
Remove dense Tensor transformation 2021-06-23 18:28:09 +02:00
coolneng 1e433c123f
Remove base counts from the dataset 2021-06-16 13:02:49 +02:00
coolneng a2ae7bbe11
Add the Jupyter notebook 2021-06-15 01:00:45 +02:00
coolneng 7a568f4f98
Create logs directory 2021-06-15 00:38:09 +02:00
coolneng 7029b64906
Refactor the casting function using a loop 2021-06-15 00:22:55 +02:00
coolneng 379303b440
Cast the parsed features to int32 2021-06-15 00:18:38 +02:00
coolneng d2e5fd0fa3
Build model incrementally 2021-06-14 23:32:49 +02:00
coolneng 19ed847d12
Convert sequence and label to VarLenFeature 2021-06-14 19:33:42 +02:00
coolneng c6d0d5959d
Update gitignore 2021-06-10 19:23:05 +02:00
coolneng 2c07c5975f
Add usage instructions 2021-06-10 19:22:41 +02:00
coolneng 498d93de2a
Execute the training loop in the model module 2021-06-10 13:27:55 +02:00
coolneng 3b2b6c4af9
Remove deprecated org notebook 2021-06-10 13:19:03 +02:00
coolneng 00e3389f5b
Add datasets 2021-06-10 13:18:25 +02:00
coolneng 08611de8e6
Fix Tensorflow seed assignment 2021-06-07 19:26:21 +02:00
coolneng 0ce582250d
Implement the training loop and metrics evaluation 2021-06-06 00:20:03 +02:00
coolneng 168a68b50d
Update documentation about data splits 2021-06-06 00:13:37 +02:00
coolneng 8870da8543
Create a validation set 2021-06-06 00:04:18 +02:00
coolneng 38903c5737
Rename ref_sequence to label 2021-06-06 00:03:15 +02:00
coolneng 035162bd8d
Fix position weight matrix assignment 2021-06-05 20:40:13 +02:00
coolneng 02d20d4e72
Add reference sequence to each dataset instance 2021-06-05 20:34:59 +02:00
coolneng f30fc31c29
Update README 2021-06-04 12:18:44 +02:00
coolneng c9de0c8320
Add learning rate and l2 regularizer constants 2021-06-03 18:52:26 +02:00
coolneng ccaa8484c7
Document read_dataset and process_input 2021-06-03 18:51:49 +02:00
coolneng f8c1a54be3
Apply index-based encoding to the DNA sequence 2021-06-03 18:29:43 +02:00
coolneng d34e291085
Generate a dataset from the TFRecords files 2021-06-01 23:06:25 +02:00
coolneng 220c0482f1
Move hardcoded data to a constants module 2021-06-01 19:27:10 +02:00
coolneng 44ff69dc9e
Document the preprocessing module 2021-06-01 18:46:17 +02:00
coolneng 5ac81c049f
Change BASES constant to a local variable 2021-06-01 18:34:29 +02:00
coolneng ad49e598db
Update gitignore 2021-06-01 18:27:16 +02:00
coolneng 16c01afbe7
Create a dataset and write it to TFRecords files 2021-06-01 18:26:13 +02:00
coolneng 59aa61112e
Create a basic CNN model 2021-05-31 20:02:44 +02:00
coolneng 731b76a0af
Remove redundant modules 2021-05-31 20:00:43 +02:00
coolneng e957e714e6
Replace tensorflow-io with biopython 2021-05-31 12:30:57 +02:00