Fixed imports (2091afd6) · Commits · 钟慕尧 / deepchem

deepchem/models/__init__.py

+1 −1

Original line number	Diff line number	Diff line
		@@ -35,4 +35,4 @@ from deepchem.models.tensorgraph.models.chemnet_models import Smiles2Vec, ChemCe
		#################### Compatibility imports for renamed TensorGraph models. Remove below with DeepChem 3.0. ####################

		from deepchem.models.tensorgraph.models.text_cnn import TextCNNTensorGraph
		-from deepchem.models.tensorgraph.models.graph_models import WeaveTensorGraph, DTNNTensorGraph, DAGTensorGraph, GraphConvTensorGraph, MPNNTensorGraph
		+from deepchem.models.graph_models import WeaveTensorGraph, DTNNTensorGraph, DAGTensorGraph, GraphConvTensorGraph, MPNNTensorGraph
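For scripts that need to run on both sides of this rename, one option is to try the new module path first and fall back to the old one. The helper below is a minimal sketch, not part of this commit: `import_first_available` is a hypothetical name, and the module paths in the usage note are simply the two paths that appear in this diff.

```python
import importlib


def import_first_available(attribute, module_paths):
    """Return `attribute` from the first module in `module_paths` that provides it."""
    last_error = None
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as error:
            # Remember the failure and try the next candidate path.
            last_error = error
    raise ImportError(
        "could not import %r from any of %r" % (attribute, list(module_paths))
    ) from last_error
```

With DeepChem installed, a call such as `import_first_available("GraphConvModel", ["deepchem.models.graph_models", "deepchem.models.tensorgraph.models.graph_models"])` would return the class from whichever path exists in the installed version.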

examples/adme/run_benchmarks.py

+1 −1

Original line number	Diff line number	Diff line
		@@ -14,7 +14,7 @@ from sklearn import svm
		import tensorflow as tf
		tf.set_random_seed(123)
		import deepchem as dc
		-from deepchem.models.tensorgraph.models.graph_models import GraphConvModel
		+from deepchem.models.graph_models import GraphConvModel

		BATCH_SIZE = 128
		# Set to higher values to get better numbers

examples/low_data/sider_graph_conv_one_fold.py

+1 −1

Original line number	Diff line number	Diff line
		@@ -9,7 +9,7 @@ import tempfile
		import numpy as np
		import tensorflow as tf
		import deepchem as dc
		-from deepchem.models.tensorgraph.models.graph_models import GraphConvModel
		+from deepchem.models.graph_models import GraphConvModel

		# 4-fold splits
		K = 4

examples/notebooks/Using_Tensorboard.ipynb

+1 −1

Original line number	Diff line number	Diff line
		%% Cell type:markdown id: tags:

		# Using Tensorboard in DeepChem

		%% Cell type:markdown id: tags:

		DeepChem Neural Networks models are built on top of [tensorflow](https://www.tensorflow.org). [Tensorboard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) is a powerful visualization tool in tensorflow for viewing your model architecture and performance.

		In this tutorial we will show how to turn on tensorboard logging for our models, and then show the network architecture for some of our more popular models.

		%% Cell type:markdown id: tags:

		The first thing we have to do is load a dataset that we will monitor model performance over.

		%% Cell type:code id: tags:

		``` python
		from IPython.display import Image, display
		import deepchem as dc
		from deepchem.molnet import load_tox21
		-from deepchem.models.tensorgraph.models.graph_models import GraphConvModel
		+from deepchem.models.graph_models import GraphConvModel

		# Load Tox21 dataset
		tox21_tasks, tox21_datasets, transformers = load_tox21(featurizer='GraphConv')
		train_dataset, valid_dataset, test_dataset = tox21_datasets

		```

		%% Output

		/home/leswing/miniconda3/envs/l25dfIi9oAFfmEN6/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
		from ._conv import register_converters as _register_converters

		Loading dataset from disk.
		Loading dataset from disk.
		Loading dataset from disk.

		%% Cell type:markdown id: tags:

		Now we will create our model with tensorboard on. All we have to do to turn tensorboard on is pass the tensorboard=True flag to the constructor of our model.

		%% Cell type:code id: tags:

		``` python
		# Construct the model with tensorboard on
		model = GraphConvModel(len(tox21_tasks), mode='classification', tensorboard=True, model_dir='models')

		# Fit the model
		model.fit(train_dataset, nb_epoch=10)
		```

		%% Output

		/home/leswing/miniconda3/envs/l25dfIi9oAFfmEN6/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

		Starting epoch 0
		Starting epoch 1
		Starting epoch 2
		Starting epoch 3
		Starting epoch 4
		Starting epoch 5
		Starting epoch 6
		Starting epoch 7
		Starting epoch 8
		Starting epoch 9
		Ending global_step 630: Average loss 932.451
		TIMING: model fitting took 52.909 s

		932.4511722625248

		%% Cell type:markdown id: tags:

		### Viewing the Tensorboard output
		When tensorboard is turned on we log all the files needed for tensorboard in model.model_dir. To launch the tensorboard webserver we have to call in a terminal
		```bash
		tensorboard --logdir models/ --port 6006
		```

		This will launch the tensorboard web server on your local computer on port 6006. Go to http://localhost:6006 in your web browser to look through tensorboard's UI.

		The first thing you will see is a graph of the loss vs mini-batches. You can use this data to determine if your model is still improving its loss function over time or to find out if your gradients are exploding!

		%% Cell type:code id: tags:

		``` python
		display(Image(filename='assets/tensorboard_landing.png'))
		```

		%% Output



		%% Cell type:markdown id: tags:

		If you click "GRAPHS" at the top you can see a visual layout of the model. Here is what our GraphConvModel looks like:

		%% Cell type:code id: tags:

		``` python
		display(Image(filename='assets/GraphConvArch.png'))
		```

		%% Output



		%% Cell type:markdown id: tags:

		The "GraphGather" box is the "Neural Fingerprint" developed by learning features of the molecule via graph convolutions.

		Using tensorboard to visualize the model is a fast way to get a high level understanding of what a model is made of and how it works!

		%% Cell type:code id: tags:

		``` python
		```

examples/notebooks/graph_convolutional_networks_for_tox21.ipynb

+1 −1

Original line number	Diff line number	Diff line
		%% Cell type:markdown id: tags:

		# Graph Convolutions For Tox21
		In this notebook, we will explore the use of TensorGraph to create graph convolutional models with DeepChem. In particular, we will build a graph convolutional network on the Tox21 dataset.

		Let's start with some basic imports.

		%% Cell type:code id: tags:

		``` python
		from __future__ import division
		from __future__ import print_function
		from __future__ import unicode_literals

		import numpy as np
		import tensorflow as tf
		import deepchem as dc
		-from deepchem.models.tensorgraph.models.graph_models import GraphConvModel
		+from deepchem.models.graph_models import GraphConvModel
		```

		%% Cell type:markdown id: tags:

		Now, let's use MoleculeNet to load the Tox21 dataset. We need to make sure to process the data in a way that graph convolutional networks can use. For that, we set the featurizer option to 'GraphConv'. The MoleculeNet call will return a training set, a validation set, and a test set for us to use. The call also returns `transformers`, a list of data transformations that were applied to preprocess the dataset. (Most deep networks are quite finicky and require a set of data transformations to ensure that training proceeds stably.)

		%% Cell type:code id: tags:

		``` python
		# Load Tox21 dataset
		tox21_tasks, tox21_datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv')
		train_dataset, valid_dataset, test_dataset = tox21_datasets
		```

		%% Output

		Loading dataset from disk.
		Loading dataset from disk.
		Loading dataset from disk.

		%% Cell type:markdown id: tags:

		Let's now train a graph convolutional network on this dataset. DeepChem has the class `GraphConvModel` that wraps a standard graph convolutional architecture underneath the hood for user convenience. Let's instantiate an object of this class and train it on our dataset.

		%% Cell type:code id: tags:

		``` python
		model = GraphConvModel(
		    len(tox21_tasks), batch_size=50, mode='classification')
		# Set nb_epoch=10 for better results.
		model.fit(train_dataset, nb_epoch=1)
		```

		%% Output

		/home/leswing/miniconda3/envs/deepchem/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py:95: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

		Starting epoch 0
		Ending global_step 126: Average loss 590.739
		TIMING: model fitting took 7.317 s

		590.7391258118645

		%% Cell type:markdown id: tags:

		Let's try to evaluate the performance of the model we've trained. For this, we need to define a metric, a measure of model performance. `dc.metrics` holds a collection of metrics already. For this dataset, it is standard to use the ROC-AUC score, the area under the receiver operating characteristic curve (which measures the tradeoff between the true positive rate and the false positive rate as the decision threshold varies). Luckily, the ROC-AUC score is already available in DeepChem.

		To measure the performance of the model under this metric, we can use the convenience function `model.evaluate()`.
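To make the metric concrete: ROC-AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. The sketch below computes it that way in plain Python and averages over tasks, mirroring the `np.mean` aggregation used in this notebook. The function names `roc_auc` and `mean_roc_auc` are illustrative, and this is not how `dc.metrics` implements the score.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_roc_auc(labels_by_task, scores_by_task):
    """Average the per-task ROC-AUC scores, one entry per task."""
    per_task = [roc_auc(y, s) for y, s in zip(labels_by_task, scores_by_task)]
    return sum(per_task) / len(per_task)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but too slow for large datasets; library implementations sort the scores instead.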

		%% Cell type:code id: tags:

		``` python
		metric = dc.metrics.Metric(
		    dc.metrics.roc_auc_score, np.mean, mode="classification")

		print("Evaluating model")
		train_scores = model.evaluate(train_dataset, [metric], transformers)
		print("Training ROC-AUC Score: %f" % train_scores["mean-roc_auc_score"])
		valid_scores = model.evaluate(valid_dataset, [metric], transformers)
		print("Validation ROC-AUC Score: %f" % valid_scores["mean-roc_auc_score"])
		```

		%% Output

		Evaluating model
		computed_metrics: [0.80045699830862893, 0.83618637604367374, 0.83908539936708681, 0.77873855933094183, 0.67692252993044244, 0.75578036941489168, 0.75895796821704797, 0.70234314980793855, 0.76387081283102387, 0.65924917162534913, 0.78448201448364341, 0.76675448900822296]
		Training ROC-AUC Score: 0.760236
		computed_metrics: [0.71533171721169553, 0.74090608465608465, 0.81106357802757933, 0.70627859684799188, 0.63177272727272715, 0.6326016835811501, 0.61491865697473169, 0.71286314850043442, 0.67676006592889104, 0.51656328658755846, 0.75414979999520937, 0.6603359173126615]
		Validation ROC-AUC Score: 0.681129

		%% Cell type:markdown id: tags:

		What's going on under the hood? Could we build `GraphConvModel` ourselves? Of course! The first step is to create a `TensorGraph` object. This object will hold the "computational graph" that defines the computation that a graph convolutional network will perform.

		%% Cell type:code id: tags:

		``` python
		from deepchem.models.tensorgraph.tensor_graph import TensorGraph

		tg = TensorGraph(use_queue=False)
		```

		%% Cell type:markdown id: tags:

		Let's now define the inputs to our model. Conceptually, graph convolutions just require the structure of the molecule in question and a vector of features for every atom that describes the local chemical environment. However, in practice, due to TensorFlow's limitations as a general programming environment, we also have to preprocess some auxiliary information.

		`atom_features` holds a feature vector of length 75 for each atom. The other feature inputs are required to support minibatching in TensorFlow. `degree_slice` is an indexing convenience that makes it easy to locate atoms from all molecules with a given degree. `membership` determines the membership of atoms in molecules (atom `i` belongs to molecule `membership[i]`). `deg_adjs` is a list that contains adjacency lists grouped by atom degree. For more details, check out the [code](https://github.com/deepchem/deepchem/blob/master/deepchem/feat/mol_graphs.py).
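As a toy illustration of these index structures, the sketch below merges the atoms of several molecules into one degree-sorted list and derives a `membership` list and a `degree_slice` table from it. It assumes each `degree_slice` row is a `(start, count)` pair, which matches the `(None, 2)` shape used below but is otherwise an assumption; `build_batch_index` is a hypothetical helper, not DeepChem's implementation.

```python
def build_batch_index(molecule_degrees, max_degree=10):
    """molecule_degrees[m] lists the degree of every atom in molecule m."""
    # Tag every atom with (degree, molecule index), then group atoms by degree.
    atoms = [(deg, mol)
             for mol, degrees in enumerate(molecule_degrees)
             for deg in degrees]
    atoms.sort(key=lambda atom: atom[0])  # stable sort keeps molecule order

    # membership[i] is the molecule that atom i (in sorted order) belongs to.
    membership = [mol for _, mol in atoms]

    # degree_slice[d] = (start, count) of the atoms with degree d.
    degree_slice = []
    start = 0
    for deg in range(max_degree + 1):
        count = sum(1 for d, _ in atoms if d == deg)
        degree_slice.append((start, count))
        start += count
    return membership, degree_slice
```

For a three-atom chain and a two-atom molecule, `build_batch_index([[1, 2, 1], [1, 1]], max_degree=2)` groups the four degree-1 atoms first and the single degree-2 atom last.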

		To define feature inputs in `TensorGraph`, we use the `Feature` layer. Conceptually, a `TensorGraph` is a mathematical graph composed of layer objects. `Feature` layers have to be the root nodes of the graph since they constitute inputs.

		%% Cell type:code id: tags:

		``` python
		from deepchem.models.tensorgraph.layers import Feature

		atom_features = Feature(shape=(None, 75))
		degree_slice = Feature(shape=(None, 2), dtype=tf.int32)
		membership = Feature(shape=(None,), dtype=tf.int32)

		deg_adjs = []
		for i in range(0, 10 + 1):
		    deg_adj = Feature(shape=(None, i + 1), dtype=tf.int32)
		    deg_adjs.append(deg_adj)
		```

		%% Cell type:markdown id: tags:

		Let's now implement the body of the graph convolutional network. `TensorGraph` has a number of layers that encode various graph operations. Namely, the `GraphConv`, `GraphPool` and `GraphGather` layers. We will also apply standard neural network layers such as `Dense` and `BatchNorm`.

		The layers we're adding effect a "feature transformation" that will create one vector for each molecule.

		%% Cell type:code id: tags:

		``` python
		from deepchem.models.tensorgraph.layers import Dense, GraphConv, BatchNorm
		from deepchem.models.tensorgraph.layers import GraphPool, GraphGather

		batch_size = 50

		gc1 = GraphConv(
		    64,
		    activation_fn=tf.nn.relu,
		    in_layers=[atom_features, degree_slice, membership] + deg_adjs)
		batch_norm1 = BatchNorm(in_layers=[gc1])
		gp1 = GraphPool(in_layers=[batch_norm1, degree_slice, membership] + deg_adjs)
		gc2 = GraphConv(
		    64,
		    activation_fn=tf.nn.relu,
		    in_layers=[gp1, degree_slice, membership] + deg_adjs)
		batch_norm2 = BatchNorm(in_layers=[gc2])
		gp2 = GraphPool(in_layers=[batch_norm2, degree_slice, membership] + deg_adjs)
		dense = Dense(out_channels=128, activation_fn=tf.nn.relu, in_layers=[gp2])
		batch_norm3 = BatchNorm(in_layers=[dense])
		readout = GraphGather(
		    batch_size=batch_size,
		    activation_fn=tf.nn.tanh,
		    in_layers=[batch_norm3, degree_slice, membership] + deg_adjs)
		```

		%% Cell type:markdown id: tags:

		Let's now make predictions from the `TensorGraph` model. Tox21 is a multitask dataset. That is, there are 12 different datasets grouped together, which share many common molecules, but with different outputs for each. As a result, we have to add a separate output layer for each task. We will use a `for` loop over the `tox21_tasks` list to make this happen. We need to add labels for each task.

		We also have to define a loss for the model which tells the network the objective to minimize during training.

		We have to tell `TensorGraph` which layers are outputs with `TensorGraph.add_output(layer)`. Similarly, we tell the network its loss with `TensorGraph.set_loss(loss)`.
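To show what such a loss computes, here is a plain-Python sketch of a weighted multitask softmax cross-entropy. It assumes the weighted error is the sum over samples and tasks of the per-task cost times its weight; that reduction is an assumption about `WeightedError`, and the helper names are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Softmax cross-entropy between a one-hot label and raw logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


def weighted_loss(labels, logits, weights):
    """labels[i][t] is one-hot, logits[i][t] holds two logits, weights[i][t] is a float."""
    total = 0.0
    for sample_labels, sample_logits, sample_weights in zip(labels, logits, weights):
        for y, z, w in zip(sample_labels, sample_logits, sample_weights):
            # A zero weight removes a task from the loss, e.g. a missing label.
            total += w * cross_entropy(y, z)
    return total
```

Setting a weight to zero is how a missing measurement for one task can be ignored without dropping the molecule from the other tasks.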

		%% Cell type:code id: tags:

		``` python
		from deepchem.models.tensorgraph.layers import Dense, SoftMax, \
		SoftMaxCrossEntropy, WeightedError, Stack
		from deepchem.models.tensorgraph.layers import Label, Weights

		costs = []
		labels = []
		for task in range(len(tox21_tasks)):
		    classification = Dense(
		        out_channels=2, activation_fn=None, in_layers=[readout])

		    softmax = SoftMax(in_layers=[classification])
		    tg.add_output(softmax)

		    label = Label(shape=(None, 2))
		    labels.append(label)
		    cost = SoftMaxCrossEntropy(in_layers=[label, classification])
		    costs.append(cost)
		all_cost = Stack(in_layers=costs, axis=1)
		weights = Weights(shape=(None, len(tox21_tasks)))
		loss = WeightedError(in_layers=[all_cost, weights])
		tg.set_loss(loss)
		```

		%% Cell type:markdown id: tags:

		Now that we've successfully defined our graph convolutional model in `TensorGraph`, we need to train it. We can call `fit()`, but we need to make sure that each minibatch of data populates all four `Feature` objects that we've created. For this, we need to create a Python generator that, given a batch of data, yields a dictionary whose keys are the `Feature` layers and whose values are Numpy arrays we'd like to use for this step of training.
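The generator pattern can be sketched without DeepChem. The example below uses plain string keys as stand-ins for layer objects, converts integer labels to one-hot rows, and pads the final batch to a fixed size with zero-weight copies. The padding policy and both function names are assumptions made for illustration; DeepChem's `pad_batches` may behave differently.

```python
def to_one_hot(labels, n_classes=2):
    """Turn integer class labels into one-hot rows."""
    rows = []
    for y in labels:
        row = [0.0] * n_classes
        row[int(y)] = 1.0
        rows.append(row)
    return rows


def iter_feed_dicts(samples, labels, weights, batch_size):
    """Yield one dictionary per minibatch, padding the last batch if needed."""
    for start in range(0, len(samples), batch_size):
        X = list(samples[start:start + batch_size])
        y = list(labels[start:start + batch_size])
        w = list(weights[start:start + batch_size])
        pad = batch_size - len(X)
        if pad:
            # Padded entries carry zero weight, so they do not affect the loss.
            X += [X[0]] * pad
            y += [0] * pad
            w += [0.0] * pad
        yield {"features": X, "labels": to_one_hot(y), "weights": w}
```

Because padded rows still produce predictions, the caller has to trim the output back to the true number of samples after prediction.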

		%% Cell type:code id: tags:

		``` python
		from deepchem.metrics import to_one_hot
		from deepchem.feat.mol_graphs import ConvMol

		def data_generator(dataset, epochs=1, predict=False, pad_batches=True):
		  for epoch in range(epochs):
		    if not predict:
		      print('Starting epoch %i' % epoch)
		    for ind, (X_b, y_b, w_b, ids_b) in enumerate(
		        dataset.iterbatches(
		            batch_size, pad_batches=pad_batches, deterministic=True)):
		      d = {}
		      for index, label in enumerate(labels):
		        d[label] = to_one_hot(y_b[:, index])
		      d[weights] = w_b
		      multiConvMol = ConvMol.agglomerate_mols(X_b)
		      d[atom_features] = multiConvMol.get_atom_features()
		      d[degree_slice] = multiConvMol.deg_slice
		      d[membership] = multiConvMol.membership
		      for i in range(1, len(multiConvMol.get_deg_adjacency_lists())):
		        d[deg_adjs[i - 1]] = multiConvMol.get_deg_adjacency_lists()[i]
		      yield d
		```

		%% Cell type:markdown id: tags:

		Now, we can train the model using `TensorGraph.fit_generator(generator)` which will use the generator we've defined to train the model.

		%% Cell type:code id: tags:

		``` python
		# Epochs set to 1 to render tutorials online.
		# Set epochs=10 for better results.
		tg.fit_generator(data_generator(train_dataset, epochs=1))
		```

		%% Output

		Starting epoch 0
		Ending global_step 251: Average loss 530.84
		TIMING: model fitting took 6.949 s

		530.8396410260882

		%% Cell type:markdown id: tags:

		Now that we have trained our graph convolutional method, let's evaluate its performance. We again have to use our defined generator to evaluate model performance.

		%% Cell type:code id: tags:

		``` python
		metric = dc.metrics.Metric(
		    dc.metrics.roc_auc_score, np.mean, mode="classification")

		def reshape_y_pred(y_true, y_pred):
		    """
		    TensorGraph.Predict returns a list of arrays, one for each output.
		    We also have to remove the padding on the last batch.
		    Metrics take results of shape (samples, n_task, prob_of_class).
		    """
		    n_samples = len(y_true)
		    retval = np.stack(y_pred, axis=1)
		    return retval[:n_samples]


		print("Evaluating model")
		train_predictions = tg.predict_on_generator(data_generator(train_dataset, predict=True))
		train_predictions = reshape_y_pred(train_dataset.y, train_predictions)
		train_scores = metric.compute_metric(train_dataset.y, train_predictions, train_dataset.w)
		print("Training ROC-AUC Score: %f" % train_scores)

		valid_predictions = tg.predict_on_generator(data_generator(valid_dataset, predict=True))
		valid_predictions = reshape_y_pred(valid_dataset.y, valid_predictions)
		valid_scores = metric.compute_metric(valid_dataset.y, valid_predictions, valid_dataset.w)
		print("Valid ROC-AUC Score: %f" % valid_scores)
		```

		%% Output

		Evaluating model
		computed_metrics: [0.83463194036351052, 0.86218739964675661, 0.84894031662657832, 0.80217986671584707, 0.70559942152332189, 0.79751934844253025, 0.8103057689046107, 0.71659210162938414, 0.80849247997327445, 0.72071717294380933, 0.83433314746710274, 0.78304357554399506]
		Training ROC-AUC Score: 0.793712
		computed_metrics: [0.78221936377578793, 0.78993055555555547, 0.81705388431256543, 0.77777071682765631, 0.66802272727272727, 0.67197702777122181, 0.64295604015230179, 0.72305596655628368, 0.74692724275959499, 0.63050611290902547, 0.80023473616134511, 0.73880275624461667]
		Valid ROC-AUC Score: 0.732455

		%% Cell type:markdown id: tags:

		Success! The model we've constructed behaves nearly identically to `GraphConvModel`. If you're looking to build your own custom models, you can follow the example we've provided here to do so. We hope to see exciting constructions from your end soon!

		%% Cell type:code id: tags:

		``` python
		```
