Commit 72080b67 authored by ZHENQIN WU

Updating index splitting number

parent 76b76fde
+336 −307
# DeepChem

DeepChem aims to provide a high-quality open-source toolchain that
democratizes the use of deep learning in drug discovery, materials science, and quantum
@@ -205,27 +205,30 @@ different subclasses of ``Featurizer`` for convenience:
### Performances
Index splitting

|Dataset    |Model               |Train score/ROC-AUC|Valid score/ROC-AUC|Time(loading)/s |Time(running)/s|
|-----------|--------------------|-------------------|-------------------|----------------|---------------| 
|tox21      |logistic regression |0.910              |0.759              |30              |60             |
|           |tensorflow(MT-NN)   |0.987              |0.800              |30              |60             |
|           |robust MT-NN        |0.979              |0.741              |30              |90             |
|           |graph convolution   |0.930              |0.819              |40              |160            |
|muv        |logistic regression |0.910              |0.744              |600             |450            |
|           |tensorflow(MT-NN)   |0.980              |0.710              |600             |400            |
|           |robust MT-NN        |0.986              |0.672              |600             |550            |
|           |graph convolution   |0.881              |0.832              |800             |1800           |
|pcba       |logistic regression |0.794              |0.762              |1800            |10000          |
|           |tensorflow(MT-NN)   |0.949              |0.791              |1800            |9000           |
|           |graph convolution   |0.866              |0.836              |2200            |14000          |
|sider      |logistic regression |0.900              |0.620              |15              |80             |
|           |tensorflow(MT-NN)   |0.931              |0.647              |15              |75             |
|           |graph convolution   |0.845              |0.646              |20              |50             |
|toxcast    |logistic regression |0.762              |0.622              |80              |2600           |
|           |tensorflow(MT-NN)   |0.926              |0.705              |80              |2300           |
|           |graph convolution   |0.906              |0.725              |80              |900            |

Random splitting(Time omitted)
|Dataset    |Model               |Train score/ROC-AUC|Valid score/ROC-AUC|
|-----------|--------------------|-------------------|-------------------|
|tox21      |logistic regression |0.903              |0.705              |
|           |tensorflow(MT-NN)   |0.856              |0.763              |
|           |robust MT-NN        |0.857              |0.767              |
|           |graph convolution   |0.872              |0.798              |
|muv        |logistic regression |0.963              |0.766              |
|           |tensorflow(MT-NN)   |0.904              |0.764              |
|           |robust MT-NN        |0.934              |0.781              |
|           |graph convolution   |0.840              |0.823              |
|pcba       |logistic regression |0.809              |0.776              |
|           |tensorflow(MT-NN)   |0.826              |0.802              |
|           |robust MT-NN        |0.809              |0.783              |
|           |graph convolution   |0.876              |0.852              |
|sider      |logistic regression |0.933              |0.620              |
|           |tensorflow(MT-NN)   |0.775              |0.634              |
|           |robust MT-NN        |0.803              |0.632              |
|           |graph convolution   |0.708              |0.594              |
|toxcast    |logistic regression |0.721              |0.575              |
|           |tensorflow(MT-NN)   |0.830              |0.678              |
|           |robust MT-NN        |0.825              |0.680              |
|           |graph convolution   |0.821              |0.720              |
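
For readers unfamiliar with the split names used in these tables, the difference between index and random splitting can be sketched in plain Python. This is only an illustration, not DeepChem's own implementation: the function names `index_split` and `random_split` are hypothetical, and the 80/10/10 train/valid/test fractions are an assumption. Scaffold splitting is left out because it requires computing molecular scaffolds first.

```python
import random


def index_split(n, frac_train=0.8, frac_valid=0.1):
    """Split indices 0..n-1 in stored order (hypothetical helper).

    The 80/10/10 default fractions are an assumption for illustration.
    """
    train_end = int(frac_train * n)
    valid_end = int((frac_train + frac_valid) * n)
    indices = list(range(n))
    return indices[:train_end], indices[train_end:valid_end], indices[valid_end:]


def random_split(n, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle the indices with a fixed seed, then cut them the same way."""
    train_end = int(frac_train * n)
    valid_end = int((frac_train + frac_valid) * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices[:train_end], indices[train_end:valid_end], indices[valid_end:]


if __name__ == "__main__":
    train, valid, test = index_split(10)
    print(len(train), len(valid), len(test))
```

Because an index split depends on the order in which examples are stored, its scores can differ from those of a random split on the same dataset.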

Random splitting

|Dataset    |Model               |Train score/ROC-AUC|Valid score/ROC-AUC|
|-----------|--------------------|-------------------|-------------------|
@@ -250,7 +253,7 @@ Random splitting(Time omitted)
|           |robust MT-NN        |0.814              |0.692              |
|           |graph convolution   |0.820              |0.692              |

Scaffold splitting(Time omitted)
Scaffold splitting

|Dataset    |Model               |Train score/ROC-AUC|Valid score/ROC-AUC|
|-----------|--------------------|-------------------|-------------------|
@@ -285,6 +288,32 @@ Number of tasks and examples in the datasets
|sider      |27         |1427       |
|toxcast    |617        |8615       |

Time needed for benchmark test(~20h in total)

|Dataset    |Model               |Time(loading)/s |Time(running)/s|
|-----------|--------------------|----------------|---------------| 
|tox21      |logistic regression |30              |60             |
|           |tensorflow(MT-NN)   |30              |60             |
|           |robust MT-NN        |30              |90             |
|           |graph convolution   |40              |160            |
|muv        |logistic regression |600             |450            |
|           |tensorflow(MT-NN)   |600             |400            |
|           |robust MT-NN        |600             |550            |
|           |graph convolution   |800             |1800           |
|pcba       |logistic regression |1800            |10000          |
|           |tensorflow(MT-NN)   |1800            |9000           |
|           |robust MT-NN        |1800            |14000          |
|           |graph convolution   |2200            |14000          |
|sider      |logistic regression |15              |80             |
|           |tensorflow(MT-NN)   |15              |75             |
|           |robust MT-NN        |15              |150            |
|           |graph convolution   |20              |50             |
|toxcast    |logistic regression |80              |2600           |
|           |tensorflow(MT-NN)   |80              |2300           |
|           |robust MT-NN        |80              |4000           |
|           |graph convolution   |80              |900            |
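
The "~20h in total" figure can be checked against the rows of the timing table. The sketch below copies the table values and adds them up, assuming every row is run independently so that the loading time is paid once per row. The names `TIMINGS` and `total_hours` are illustrative only.

```python
# (dataset, model, loading_s, running_s), copied from the timing table.
TIMINGS = [
    ("tox21", "logistic regression", 30, 60),
    ("tox21", "tensorflow(MT-NN)", 30, 60),
    ("tox21", "robust MT-NN", 30, 90),
    ("tox21", "graph convolution", 40, 160),
    ("muv", "logistic regression", 600, 450),
    ("muv", "tensorflow(MT-NN)", 600, 400),
    ("muv", "robust MT-NN", 600, 550),
    ("muv", "graph convolution", 800, 1800),
    ("pcba", "logistic regression", 1800, 10000),
    ("pcba", "tensorflow(MT-NN)", 1800, 9000),
    ("pcba", "robust MT-NN", 1800, 14000),
    ("pcba", "graph convolution", 2200, 14000),
    ("sider", "logistic regression", 15, 80),
    ("sider", "tensorflow(MT-NN)", 15, 75),
    ("sider", "robust MT-NN", 15, 150),
    ("sider", "graph convolution", 20, 50),
    ("toxcast", "logistic regression", 80, 2600),
    ("toxcast", "tensorflow(MT-NN)", 80, 2300),
    ("toxcast", "robust MT-NN", 80, 4000),
    ("toxcast", "graph convolution", 80, 900),
]


def total_hours(rows):
    """Sum loading and running seconds over all rows and convert to hours."""
    total_seconds = sum(load + run for _, _, load, run in rows)
    return total_seconds / 3600.0


if __name__ == "__main__":
    print("total: %.1f h" % total_hours(TIMINGS))
```

Under that assumption the rows sum to 71,440 s, or about 19.8 hours, which agrees with the stated estimate. The pcba rows alone account for roughly three quarters of the total.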


## Contributing to DeepChem

We actively encourage community contributions to DeepChem. The best way to get started is to run our examples locally. After that, we encourage contributors to try improving our documentation. While we work hard to provide good docs, there is plenty of room for improvement. All docs are hosted on GitHub, either in this `README.md` file or in the `docs/` directory.
+3 −14
@@ -13,20 +13,9 @@ Giving performances of: Random forest(rf), MultitaskDNN(tf),
                    
on datasets: muv, nci, pcba, tox21, sider, toxcast

time estimation (on an NVIDIA Tesla K20 GPU):
tox21   - dataloading: 30s
        - tf: 40s
muv     - dataloading: 400s
        - tf: 250s
pcba    - dataloading: 30min
        - tf: 2h
sider   - dataloading: 10s
        - tf: 60s
toxcast - dataloading: 70s
        - tf: 40min
(will include more)

Total time of running a benchmark test: 30h
time estimation listed in README file

Total time of running a benchmark test(for one splitting function): 20h
"""
from __future__ import print_function
from __future__ import division