Updating index splitting number (72080b67) · Commits · 钟慕尧 / deepchem

README.md

+336 −307

		# DeepChem
		# DeepChem

		DeepChem aims to provide a high quality open-source toolchain that
		democratizes the use of deep-learning in drug discovery, materials science, and quantum
		@@ -205,27 +205,30 @@ different subclasses of ``Featurizer`` for convenience:
		### Performances
		Index splitting

		\|Dataset \|Model \|Train score/ROC-AUC\|Valid score/ROC-AUC\|Time(loading)/s \|Time(running)/s\|
		\|-----------\|--------------------\|-------------------\|-------------------\|----------------\|---------------\|
		\|tox21 \|logistic regression \|0.910 \|0.759 \|30 \|60 \|
		\| \|tensorflow(MT-NN) \|0.987 \|0.800 \|30 \|60 \|
		\| \|robust MT-NN \|0.979 \|0.741 \|30 \|90 \|
		\| \|graph convolution \|0.930 \|0.819 \|40 \|160 \|
		\|muv \|logistic regression \|0.910 \|0.744 \|600 \|450 \|
		\| \|tensorflow(MT-NN) \|0.980 \|0.710 \|600 \|400 \|
		\| \|robust MT-NN \|0.986 \|0.672 \|600 \|550 \|
		\| \|graph convolution \|0.881 \|0.832 \|800 \|1800 \|
		\|pcba \|logistic regression \|0.794 \|0.762 \|1800 \|10000 \|
		\| \|tensorflow(MT-NN) \|0.949 \|0.791 \|1800 \|9000 \|
		\| \|graph convolution \|0.866 \|0.836 \|2200 \|14000 \|
		\|sider \|logistic regression \|0.900 \|0.620 \|15 \|80 \|
		\| \|tensorflow(MT-NN) \|0.931 \|0.647 \|15 \|75 \|
		\| \|graph convolution \|0.845 \|0.646 \|20 \|50 \|
		\|toxcast \|logistic regression \|0.762 \|0.622 \|80 \|2600 \|
		\| \|tensorflow(MT-NN) \|0.926 \|0.705 \|80 \|2300 \|
		\| \|graph convolution \|0.906 \|0.725 \|80 \|900 \|

		Random splitting(Time omitted)
		\|Dataset \|Model \|Train score/ROC-AUC\|Valid score/ROC-AUC\|
		\|-----------\|--------------------\|-------------------\|-------------------\|
		\|tox21 \|logistic regression \|0.903 \|0.705 \|
		\| \|tensorflow(MT-NN) \|0.856 \|0.763 \|
		\| \|robust MT-NN \|0.857 \|0.767 \|
		\| \|graph convolution \|0.872 \|0.798 \|
		\|muv \|logistic regression \|0.963 \|0.766 \|
		\| \|tensorflow(MT-NN) \|0.904 \|0.764 \|
		\| \|robust MT-NN \|0.934 \|0.781 \|
		\| \|graph convolution \|0.840 \|0.823 \|
		\|pcba \|logistic regression \|0.809 \|0.776 \|
		\| \|tensorflow(MT-NN) \|0.826 \|0.802 \|
		\| \|robust MT-NN \|0.809 \|0.783 \|
		\| \|graph convolution \|0.876 \|0.852 \|
		\|sider \|logistic regression \|0.933 \|0.620 \|
		\| \|tensorflow(MT-NN) \|0.775 \|0.634 \|
		\| \|robust MT-NN \|0.803 \|0.632 \|
		\| \|graph convolution \|0.708 \|0.594 \|
		\|toxcast \|logistic regression \|0.721 \|0.575 \|
		\| \|tensorflow(MT-NN) \|0.830 \|0.678 \|
		\| \|robust MT-NN \|0.825 \|0.680 \|
		\| \|graph convolution \|0.821 \|0.720 \|

		Random splitting

		\|Dataset \|Model \|Train score/ROC-AUC\|Valid score/ROC-AUC\|
		\|-----------\|--------------------\|-------------------\|-------------------\|
		@@ -250,7 +253,7 @@ Random splitting(Time omitted)
		\| \|robust MT-NN \|0.814 \|0.692 \|
		\| \|graph convolution \|0.820 \|0.692 \|

		Scaffold splitting(Time omitted)
		Scaffold splitting

		\|Dataset \|Model \|Train score/ROC-AUC\|Valid score/ROC-AUC\|
		\|-----------\|--------------------\|-------------------\|-------------------\|
		@@ -285,6 +288,32 @@ Number of tasks and examples in the datasets
		\|sider \|27 \|1427 \|
		\|toxcast \|617 \|8615 \|

		Time needed for benchmark test(~20h in total)

		\|Dataset \|Model \|Time(loading)/s \|Time(running)/s\|
		\|-----------\|--------------------\|----------------\|---------------\|
		\|tox21 \|logistic regression \|30 \|60 \|
		\| \|tensorflow(MT-NN) \|30 \|60 \|
		\| \|robust MT-NN \|30 \|90 \|
		\| \|graph convolution \|40 \|160 \|
		\|muv \|logistic regression \|600 \|450 \|
		\| \|tensorflow(MT-NN) \|600 \|400 \|
		\| \|robust MT-NN \|600 \|550 \|
		\| \|graph convolution \|800 \|1800 \|
		\|pcba \|logistic regression \|1800 \|10000 \|
		\| \|tensorflow(MT-NN) \|1800 \|9000 \|
		\| \|robust MT-NN \|1800 \|14000 \|
		\| \|graph convolution \|2200 \|14000 \|
		\|sider \|logistic regression \|15 \|80 \|
		\| \|tensorflow(MT-NN) \|15 \|75 \|
		\| \|robust MT-NN \|15 \|150 \|
		\| \|graph convolution \|20 \|50 \|
		\|toxcast \|logistic regression \|80 \|2600 \|
		\| \|tensorflow(MT-NN) \|80 \|2300 \|
		\| \|robust MT-NN \|80 \|4000 \|
		\| \|graph convolution \|80 \|900 \|
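
The "~20h in total" figure can be checked against the rows of the timing table. The sketch below simply adds up the loading and running times listed above. It assumes that each model run pays its dataset's loading time again, which the table does not state. The names `TIMINGS` and `total_seconds` are illustrative only and are not part of DeepChem.

```python
# Timings in seconds, copied from the table above as (loading, running) pairs.
# Row order per dataset: logistic regression, tensorflow(MT-NN),
# robust MT-NN, graph convolution.
TIMINGS = {
    "tox21": [(30, 60), (30, 60), (30, 90), (40, 160)],
    "muv": [(600, 450), (600, 400), (600, 550), (800, 1800)],
    "pcba": [(1800, 10000), (1800, 9000), (1800, 14000), (2200, 14000)],
    "sider": [(15, 80), (15, 75), (15, 150), (20, 50)],
    "toxcast": [(80, 2600), (80, 2300), (80, 4000), (80, 900)],
}


def total_seconds(timings):
    """Sum loading + running time over every (dataset, model) row.

    Assumption: loading time is counted once per model run, not once
    per dataset.
    """
    return sum(load + run for rows in timings.values() for load, run in rows)


total = total_seconds(TIMINGS)
print("{} s = {:.1f} h".format(total, total / 3600.0))
# prints "71440 s = 19.8 h"
```

Under that assumption the rows sum to 71440 s, or about 19.8 hours, which is consistent with the stated estimate. The pcba rows account for roughly three quarters of the total.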


		## Contributing to DeepChem

		We actively encourage community contributions to DeepChem. The best way to get involved is to start by running our examples locally. After that, we encourage contributors to try improving our documentation. While we make an effort to provide good docs, there's plenty of room for improvement. All docs are hosted on GitHub, either in this `README.md` file or in the `docs/` directory.

examples/benchmark.py

+3 −14

		@@ -13,20 +13,9 @@ Giving performances of: Random forest(rf), MultitaskDNN(tf),

		on datasets: muv, nci, pcba, tox21, sider, toxcast

		time estimation(on a nvidia tesla K20 GPU):
		tox21 - dataloading: 30s
		- tf: 40s
		muv - dataloading: 400s
		- tf: 250s
		pcba - dataloading: 30min
		- tf: 2h
		sider - dataloading: 10s
		- tf: 60s
		toxcast - dataloading: 70s
		- tf: 40min
		(will include more)

		Total time of running a benchmark test: 30h
		time estimation listed in README file

		Total time of running a benchmark test(for one splitting function): 20h
		"""
		from __future__ import print_function
		from __future__ import division
