Merge branch 'master' of https://github.com/deepchem/deepchem into binding_pocket_feat (afe44833) · Commits · 钟慕尧 / deepchem

README.md

+62 −79

		@@ -46,48 +46,28 @@ Installation from source is the only currently supported format. ```deepchem```
		conda install -c omnia openbabel=2.4.0
		```

		3. `pandas`
		```bash
		conda install pandas
		```

		4. `rdkit`
		3. `rdkit`
		```bash
		conda install -c omnia rdkit
		```

		5. `boost`
		```bash
		conda install -c omnia boost=1.59.0
		```

		6. `joblib`
		4. `joblib`
		```bash
		conda install joblib
		```

		7. `keras`
		```bash
		pip install keras --user
		```
		`deepchem` only supports the `tensorflow` backend for keras. To set the backend to `tensorflow`,
		add the following line to your `~/.bashrc`
		5. `keras`
		```bash
		export KERAS_BACKEND=tensorflow
		pip install keras
		```
		See [keras docs](https://keras.io/backend/) for more details and alternate methods of setting backend.
		`deepchem` only supports the `tensorflow` (default) backend for keras.

		8. `mdtraj`
		6. `mdtraj`
		```bash
		conda install -c omnia mdtraj
		```

		9. `scikit-learn`
		```bash
		conda install scikit-learn
		```

		10. `tensorflow`: Installing `tensorflow` on older versions of Linux (which
		7. `tensorflow`: Installing `tensorflow` on older versions of Linux (which
		have glibc < 2.17) can be very challenging. For these older Linux versions,
		contact your local sysadmin to work out a custom installation. If your
		version of Linux is recent, then the following command will work:
		@@ -95,12 +75,7 @@ Installation from source is the only currently supported format. ```deepchem```
		conda install -c https://conda.anaconda.org/jjhelmus tensorflow
		```

		11. `h5py`:
		```
		conda install h5py
		```

		12. `deepchem`: Clone the `deepchem` github repo:
		8. `deepchem`: Clone the `deepchem` github repo:
		```bash
		git clone https://github.com/deepchem/deepchem.git
		```
		@@ -109,9 +84,9 @@ Installation from source is the only currently supported format. ```deepchem```
		python setup.py install
		```

		13. To run test suite, install `nosetests`:
		9. To run test suite, install `nosetests`:
		```bash
		pip install nose --user
		pip install nose
		```
		Make sure that the correct version of `nosetests` is active by running
		```bash
		@@ -120,7 +95,7 @@ Installation from source is the only currently supported format. ```deepchem```
		You might need to uninstall a system install of `nosetests` if
		there is a conflict.

		14. If installation has been successful, all tests in test suite should pass:
		10. If installation has been successful, all tests in test suite should pass:
		```bash
		nosetests -v deepchem --nologcapture
		```
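The `tensorflow` step above depends on whether the system glibc is at least 2.17. A minimal sketch of that check follows, using the standard library's `platform.libc_ver()`. The helper names `parse_version` and `glibc_is_recent` are hypothetical and not part of `deepchem`; on systems that do not use glibc the detected version may simply be empty.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_recent(version_text, minimum=(2, 17)):
    """Return True when version_text is at least `minimum`.

    An empty string (no glibc detected) is treated as "not recent".
    """
    if not version_text:
        return False
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    # platform.libc_ver() returns a (library, version) pair; both fields
    # may be empty strings when glibc is not detected.
    lib, version = platform.libc_ver()
    print(lib or "unknown libc", version or "unknown version",
          "recent enough:", lib == "glibc" and glibc_is_recent(version))
```

Versions are compared as integer tuples because a plain string comparison would rank "2.5" above "2.17".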
		@@ -181,10 +156,12 @@ Environmental Protection Agency, Environmental Research Laboratory, 1987.
		Most machine learning algorithms require that input data form vectors.
		However, input data for drug-discovery datasets routinely come in the
		format of lists of molecules and associated experimental readouts. To
		transform lists of molecules into vectors, we need to use the DeepChem
		loader class ``dc.load.DataLoader``. Instances of this class must be
		passed a ``Featurizer`` object. DeepChem provides a number of
		different subclasses of ``Featurizer`` for convenience:
		transform lists of molecules into vectors, we need to use subclasses of the DeepChem
		loader class ```dc.data.DataLoader``` such as ```dc.data.CSVLoader``` or
		```dc.data.SDFLoader```. Users can subclass ```dc.data.DataLoader``` to
		load arbitrary file formats. All loaders must be
		passed a ```dc.feat.Featurizer``` object. DeepChem provides a number of
		different subclasses of ```dc.feat.Featurizer``` for convenience.
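The loader and featurizer relationship described above can be illustrated with a toy, standard-library-only sketch. Every class here (`Featurizer`, `AtomCountFeaturizer`, `ToyCSVLoader`) is a hypothetical stand-in written for this example, not DeepChem's `dc.data` or `dc.feat` API, and the character-counting featurizer does no real chemistry.

```python
import csv
import io


class Featurizer(object):
    """Stand-in base class: turns one molecule string into a numeric vector."""

    def featurize(self, smiles):
        raise NotImplementedError


class AtomCountFeaturizer(Featurizer):
    """Toy featurizer: counts occurrences of a few element symbols."""

    def __init__(self, symbols=("C", "N", "O")):
        self.symbols = symbols

    def featurize(self, smiles):
        # Naive character counts, for illustration only (nothing is parsed).
        return [smiles.count(symbol) for symbol in self.symbols]


class ToyCSVLoader(object):
    """Reads rows from CSV text and applies the featurizer it was given."""

    def __init__(self, tasks, smiles_field, featurizer):
        self.tasks = tasks
        self.smiles_field = smiles_field
        self.featurizer = featurizer

    def featurize(self, csv_text):
        X, y = [], []
        for row in csv.DictReader(io.StringIO(csv_text)):
            # One feature vector per molecule, one label list per row.
            X.append(self.featurizer.featurize(row[self.smiles_field]))
            y.append([float(row[task]) for task in self.tasks])
        return X, y


csv_text = "smiles,solubility\nCCO,-0.77\nCC(=O)N,0.5\n"
loader = ToyCSVLoader(["solubility"], "smiles", AtomCountFeaturizer())
X, y = loader.featurize(csv_text)
print(X)
print(y)
```

Passing the featurizer into the loader keeps file parsing and vectorization separate, so a new file format only needs a new loader and a new molecular representation only needs a new featurizer.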

		### Performances
		* Classification
		@@ -218,26 +195,26 @@ Random splitting

		|Dataset |Model |Train score/ROC-AUC|Valid score/ROC-AUC|
		|-----------|--------------------|-------------------|-------------------|
		|tox21 |logistic regression |0.903 |0.741 |
		| |Multitask network |0.846 |0.812 |
		| |robust MT-NN |0.844 |0.793 |
		| |graph convolution |0.872 |0.816 |
		|muv |logistic regression |0.961 |0.696 |
		| |Multitask network |0.895 |0.740 |
		| |robust MT-NN |0.914 |0.667 |
		| |graph convolution |0.846 |0.776 |
		|pcba |logistic regression |0.807 |0.772 |
		| |Multitask network |0.811 |0.787 |
		| |robust MT-NN |0.809 |0.778 |
		| |graph convolution |0.875 |0.844 |
		|sider |logistic regression |0.932 |0.628 |
		| |Multitask network |0.779 |0.665 |
		| |robust MT-NN |0.761 |0.621 |
		| |graph convolution |0.706 |0.638 |
		|toxcast |logistic regression |0.737 |0.543 |
		| |Multitask network |0.831 |0.684 |
		| |robust MT-NN |0.814 |0.692 |
		| |graph convolution |0.820 |0.692 |
		|tox21 |logistic regression |0.903 |0.735 |
		| |Multitask network |0.856 |0.783 |
		| |robust MT-NN |0.855 |0.773 |
		| |graph convolution |0.865 |0.827 |
		|muv |logistic regression |0.957 |0.719 |
		| |Multitask network |0.902 |0.734 |
		| |robust MT-NN |0.933 |0.732 |
		| |graph convolution |0.860 |0.730 |
		|pcba |logistic regression |0.808 |0.776 |
		| |Multitask network |0.811 |0.778 |
		| |robust MT-NN |0.811 |0.771 |
		| |graph convolution |0.872 |0.844 |
		|sider |logistic regression |0.929 |0.656 |
		| |Multitask network |0.777 |0.655 |
		| |robust MT-NN |0.804 |0.630 |
		| |graph convolution |0.705 |0.618 |
		|toxcast |logistic regression |0.725 |0.586 |
		| |Multitask network |0.836 |0.684 |
		| |robust MT-NN |0.822 |0.681 |
		| |graph convolution |0.820 |0.717 |

		Scaffold splitting

		@@ -269,11 +246,14 @@ Scaffold splitting
		|Dataset |Model |Splitting |Train score/R2|Valid score/R2|
		|-----------|--------------------|------------|--------------|--------------|
		|delaney |MT-NN regression |Index |0.773 |0.574 |
		| |graphconv regression|Index |0.964 |0.829 |
		| |graphconv regression|Index |0.991 |0.825 |
		| |MT-NN regression |Random |0.769 |0.591 |
		| |graphconv regression|Random |0.959 |0.821 |
		| |graphconv regression|Random |0.996 |0.873 |
		| |MT-NN regression |Scaffold |0.782 |0.426 |
		| |graphconv regression|Scaffold |0.976 |0.581 |
		| |graphconv regression|Scaffold |0.994 |0.606 |
		|nci |MT-NN regression |Index |0.890 |0.890 |
		| |MT-NN regression |Random |0.891 |0.888 |
		| |MT-NN regression |Scaffold |0.912 |0.020 |
		|kaggle |MT-NN regression |User-defined|0.748 |0.452 |

		* General features
		@@ -289,6 +269,7 @@ Number of tasks and examples in the datasets
		|toxcast |617 |8615 |
		|delaney |1 |1128 |
		|kaggle |15 |173065 |
		|nci |60 |1057371 |

		Time needed for benchmark test(~20h in total)

		@@ -315,6 +296,8 @@ Time needed for benchmark test(~20h in total)
		| |robust MT-NN |80 |4000 |
		| |graph convolution |80 |900 |
		|delaney |MT-NN regression |10 |40 |
		| |graphconv regression|10 |40 |
		|nci |MT-NN regression |2000 |30000 |
		|kaggle |MT-NN regression |2200 |3200 |

deepchem/dock/docking.py

+2 −2

		@@ -34,7 +34,7 @@ class VinaGridRFDocker(Docker):
		"""Builds model."""
		self.base_dir = tempfile.mkdtemp()
		print("About to download trained model.")
		call(("wget http://deepchem.io.s3-website-us-west-1.amazonaws.com/trained_models/random_full_RF.tar.gz").split())
		call(("wget -c http://deepchem.io.s3-website-us-west-1.amazonaws.com/trained_models/random_full_RF.tar.gz").split())
		call(("tar -zxvf random_full_RF.tar.gz").split())
		call(("mv random_full_RF %s" % (self.base_dir)).split())
		self.model_dir = os.path.join(self.base_dir, "random_full_RF")
		@@ -60,7 +60,7 @@ class VinaGridDNNDocker(object):
		"""Builds model."""
		self.base_dir = tempfile.mkdtemp()
		print("About to download trained model.")
		call(("wget http://deepchem.io.s3-website-us-west-1.amazonaws.com/trained_models/random_full_DNN.tar.gz").split())
		call(("wget -c http://deepchem.io.s3-website-us-west-1.amazonaws.com/trained_models/random_full_DNN.tar.gz").split())
		call(("tar -zxvf random_full_DNN.tar.gz").split())
		call(("mv random_full_DNN %s" % (self.base_dir)).split())
		self.model_dir = os.path.join(self.base_dir, "random_full_DNN")
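Both hunks above add `-c` to the `wget` call. That flag asks wget to continue an existing partial download instead of saving a second copy of the file. A minimal sketch of building such commands as argument lists follows; `wget_command` and `move_command` are hypothetical helpers, and the URL is a placeholder, not one of the project's download locations. The last line shows why splitting a formatted string is fragile when a path contains spaces.

```python
def wget_command(url, resume=True):
    """Build the argument list for a wget download.

    With resume=True the list includes -c, which asks wget to continue a
    partially downloaded file instead of starting a fresh copy.
    """
    command = ["wget"]
    if resume:
        command.append("-c")
    command.append(url)
    return command


def move_command(source, destination_dir):
    """Build an mv argument list that keeps each path as one argument."""
    return ["mv", source, destination_dir]


url = "http://example.com/trained_models/model.tar.gz"  # placeholder URL
print(wget_command(url))
print(move_command("model", "/tmp/dir with space"))
# Splitting a formatted string breaks the same path into several arguments.
print(("mv model %s" % "/tmp/dir with space").split())
```

Either list can be passed directly to `subprocess.call`, which accepts a sequence of arguments and does not need a shell to split them.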

deepchem/dock/pose_generation.py

+1 −1

		@@ -67,7 +67,7 @@ class VinaPoseGenerator(PoseGenerator):
		print("Vina not available. Downloading")
		# TODO(rbharath): May want to move this file to S3 so we can ensure it's
		# always available.
		wget_cmd = "wget http://vina.scripps.edu/download/autodock_vina_1_1_2_linux_x86.tgz"
		wget_cmd = "wget -c http://vina.scripps.edu/download/autodock_vina_1_1_2_linux_x86.tgz"
		call(wget_cmd.split())
		print("Downloaded Vina. Extracting")
		download_cmd = "tar xzvf autodock_vina_1_1_2_linux_x86.tgz"

deepchem/dock/tests/test_pose_scoring.py

+1 −1

		@@ -25,7 +25,7 @@ class TestPoseScoring(unittest.TestCase):
		"""
		def setUp(self):
		"""Downloads dataset."""
		call("wget http://deepchem.io.s3-website-us-west-1.amazonaws.com/featurized_datasets/core_grid.tar.gz".split())
		call("wget -c http://deepchem.io.s3-website-us-west-1.amazonaws.com/featurized_datasets/core_grid.tar.gz".split())
		call("tar -zxvf core_grid.tar.gz".split())
		self.core_dataset = dc.data.DiskDataset("core_grid/")

deepchem/feat/graph_features.py

+1 −1

		@@ -110,7 +110,7 @@ def atom_features(atom, bool_id_feat=False):
		'Sb', 'Sn', 'Ag', 'Pd', 'Co', 'Se', 'Ti', 'Zn', 'H', # H?
		'Li', 'Ge', 'Cu', 'Au', 'Ni', 'Cd', 'In', 'Mn', 'Zr',
		'Cr', 'Pt', 'Hg', 'Pb', 'Unknown']) +
		one_of_k_encoding(atom.GetDegree(), [0, 1, 2, 3, 4, 5, 6]) +
		one_of_k_encoding(atom.GetDegree(), [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) +
		one_of_k_encoding_unk(atom.GetTotalNumHs(), [0, 1, 2, 3, 4]) +
		one_of_k_encoding_unk(atom.GetImplicitValence(), [0, 1, 2, 3, 4, 5, 6]) +
		[atom.GetFormalCharge(), atom.GetNumRadicalElectrons()] +
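The hunk above widens the allowable degree set from 0..6 to 0..10. The sketch below shows why that can matter, assuming the strict encoder rejects values outside its set while the `_unk` variant maps them onto the last entry. These two functions are local reimplementations written for illustration under that assumption; the project's own helpers may differ in detail.

```python
def one_of_k_encoding(x, allowable_set):
    """One-hot encode x; reject values outside allowable_set."""
    if x not in allowable_set:
        raise ValueError(
            "input {0} not in allowable set {1}".format(x, allowable_set))
    return [x == s for s in allowable_set]


def one_of_k_encoding_unk(x, allowable_set):
    """One-hot encode x; map unknown values onto the last element."""
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == s for s in allowable_set]


old_degrees = list(range(7))   # 0 through 6
new_degrees = list(range(11))  # 0 through 10

# A degree of 8 is rejected by the old set under the assumed strict behavior.
try:
    one_of_k_encoding(8, old_degrees)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
print(len(one_of_k_encoding(8, new_degrees)))
print(one_of_k_encoding_unk(7, [0, 1, 2, 3, 4]))
```

Note that widening the set also lengthens this part of the atom feature vector from 7 to 11 entries, so anything that depends on a fixed feature width would need to account for the change.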
