colab (c1783911) · Commits · 钟慕尧 / deepchem

examples/tutorials/01_The_Basic_Tools_of_the_Deep_Life_Sciences.ipynb

+22 −63

		%% Cell type:markdown id: tags:

		# Tutorial 1: The Basic Tools of the Deep Life Sciences
		Welcome to DeepChem's introductory tutorial for the deep life sciences. This series of notebooks is a step-by-step guide to the new tools and techniques needed to do deep learning for the life sciences. We'll start from the basics, assuming that you're new to machine learning and the life sciences, and build up a repertoire of tools and techniques that you can use to do meaningful work in the life sciences.

		Scope: This tutorial will encompass both the machine learning and data handling needed to build systems for the deep life sciences.

		## Colab

		This tutorial and the rest in the sequence are designed to be done in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.

		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/01_The_Basic_Tools_of_the_Deep_Life_Sciences.ipynb)


		## Why do the DeepChem Tutorial?

		1) Career Advancement: Applying AI in the life sciences is a booming
		industry at present. There are a host of newly funded startups and initiatives
		at large pharmaceutical and biotech companies centered around AI. Learning and
		mastering DeepChem will bring you to the forefront of this field and will
		prepare you to enter a career in this field.

		2) Humanitarian Considerations: Disease is the oldest cause of human
		suffering. From the dawn of human civilization, humans have suffered from pathogens,
		cancers, and neurological conditions. One of the greatest achievements of
		the last few centuries has been the development of effective treatments for
		many diseases. By mastering the skills in this tutorial, you will be able to
		stand on the shoulders of the giants of the past to help develop new
		medicine.

		3) Lowering the Cost of Medicine: The art of developing new medicine is
		currently an elite skill that can only be practiced by a small core of expert
		practitioners. By enabling the growth of open source tools for drug discovery,
		you can help democratize these skills and open up drug discovery to more
		competition. Increased competition can help drive down the cost of medicine.

		## Getting Extra Credit
		If you're excited about DeepChem and want to get more involved, there are a couple of things that you can do right now:

		* Star DeepChem on GitHub! - https://github.com/deepchem/deepchem
		* Join the DeepChem forums and introduce yourself! - https://forum.deepchem.io
		* Say hi on the DeepChem gitter - https://gitter.im/deepchem/Lobby
		* Make a YouTube video teaching the contents of this notebook.


		## Prerequisites

		This tutorial will assume some basic familiarity with the Python data science ecosystem. We will assume that you have familiarity with libraries such as Numpy, Pandas, and TensorFlow. We'll provide some brief refreshers on basics through the tutorial so don't worry if you're not an expert.

		## Setup

		The first step is to get DeepChem up and running. We recommend using Google Colab to work through this tutorial series. You'll need to run the following commands to get DeepChem installed on your colab notebook. Note that this will take something like 5 minutes to run on your colab instance.

		%% Cell type:code id: tags:

		``` python
		!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
		!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
		!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
		!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
		import sys
		sys.path.append('/usr/local/lib/python3.7/site-packages/')
		```

		%% Cell type:markdown id: tags:

		You can of course run this tutorial locally if you prefer. In that case, don't run the above cell, since it will download and install Anaconda on your local machine. In either case, we can now import the `deepchem` package to play with.

		%% Cell type:code id: tags:

		``` python
		# Run this cell to see if things work
		import deepchem as dc
		```

		%% Output

		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
		warnings.warn(msg, category=FutureWarning)
		RDKit WARNING: [16:55:20] Enabling RDKit 2019.09.3 jupyter extensions
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])
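
%% Cell type:markdown id: tags:

If the import above fails, or you simply want to confirm which packages are visible to the notebook kernel before going further, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name `is_installed` is made up for illustration and is not part of DeepChem.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# Report on the packages this tutorial relies on.
for name in ("deepchem", "numpy", "tensorflow"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

The `FutureWarning` lines in the output above are deprecation notices from dependencies rather than errors, so a successful import can still print them.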

		%% Cell type:markdown id: tags:

		# Basic Data Handling in DeepChem
		What does it take to do deep learning on the life sciences? Well, the first thing we'll need to do is actually handle some data. How can we start handling some basic data? To begin with, let's just take a look at some synthetic data.

		To generate some basic synthetic data, we will use Numpy to create some basic arrays.

		%% Cell type:code id: tags:

		``` python
		import numpy as np

		data = np.random.random((4, 4))
		labels = np.random.random((4,)) # labels of shape (4,): one label per row of data
		```

		%% Cell type:markdown id: tags:

		We've given these arrays some evocative names: "data" and "labels." For now, don't worry too much about the names, but just note that the arrays have different shapes. Let's take a quick look to get a feeling for these arrays.

		%% Cell type:code id: tags:

		``` python
		data, labels
		```

		%% Output

		(array([[0.8409268 , 0.54779812, 0.15791233, 0.83406089],
		[0.18134816, 0.90800565, 0.73072345, 0.260484 ],
		[0.00528476, 0.6911 , 0.96467034, 0.90761902],
		[0.701121 , 0.43129514, 0.49785863, 0.01989653]]),
		array([0.302203 , 0.43742869, 0.55756899, 0.92048307]))
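
%% Cell type:markdown id: tags:

To make the shape difference concrete, the sketch below rebuilds arrays of the same shapes and inspects them. The fixed seed is an addition for reproducibility and is not in the original cell, so the values will differ from the output above.

```python
import numpy as np

np.random.seed(0)  # assumption: seeded only so repeated runs match

data = np.random.random((4, 4))   # 4 samples, 4 features each
labels = np.random.random((4,))   # one label per sample

print(data.shape)    # (4, 4)
print(labels.shape)  # (4,)

# The leading dimension (number of samples) must agree between the two.
assert data.shape[0] == labels.shape[0]
```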

		%% Cell type:markdown id: tags:

		In order to be able to work with this data in DeepChem, we need to wrap these arrays so DeepChem knows how to work with them. DeepChem has a `Dataset` API that it uses to facilitate its handling of datasets. For handling of Numpy datasets, we use DeepChem's `NumpyDataset` object.

		%% Cell type:code id: tags:

		``` python
		from deepchem.data.datasets import NumpyDataset

		dataset = NumpyDataset(data, labels)
		```

		%% Cell type:markdown id: tags:

		Ok, now what? We have these arrays in a `NumpyDataset` object. What can we do with it? Let's try printing out the object.

		%% Cell type:code id: tags:

		``` python
		dataset
		```

		%% Output

		<deepchem.data.datasets.NumpyDataset at 0x109845d68>

		%% Cell type:markdown id: tags:

		Ok, that's not terribly informative. It's telling us that `dataset` is a Python object that lives somewhere in memory. Can we recover the two arrays that we used to construct this object? Luckily, the DeepChem API allows us to recover the two original arrays through the `dataset.X` and `dataset.y` attributes of the object.

		%% Cell type:code id: tags:

		``` python
		dataset.X, dataset.y
		```

		%% Output

		(array([[0.8409268 , 0.54779812, 0.15791233, 0.83406089],
		[0.18134816, 0.90800565, 0.73072345, 0.260484 ],
		[0.00528476, 0.6911 , 0.96467034, 0.90761902],
		[0.701121 , 0.43129514, 0.49785863, 0.01989653]]),
		array([0.302203 , 0.43742869, 0.55756899, 0.92048307]))

		%% Cell type:markdown id: tags:

		This set of transformations raises a few questions. First, what was the point of it all? Why would we want to wrap objects this way instead of working with the raw Numpy arrays? The simple answer is to have a unified API for working with larger datasets. Suppose that `X` and `y` are so large that they can't fit easily into memory. What would we do then? Being able to work with an abstract `dataset` object proves very convenient in that case. In fact, you'll have reason to use this feature of `Dataset` later in the tutorial series.

		What else can we do with the `dataset` object? It turns out that it can be useful to be able to walk through the datapoints in the `dataset` one at a time. For that, we can use the `dataset.itersamples()` method.

		%% Cell type:code id: tags:

		``` python
		for x, y, _, _ in dataset.itersamples():
		    print(x, y)
		```

		%% Output

		[0.8409268 0.54779812 0.15791233 0.83406089] 0.3022030049033878
		[0.18134816 0.90800565 0.73072345 0.260484 ] 0.4374286946177691
		[0.00528476 0.6911 0.96467034 0.90761902] 0.5575689868183242
		[0.701121 0.43129514 0.49785863 0.01989653] 0.9204830702131853
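
%% Cell type:markdown id: tags:

The unified-API point made above can be illustrated with a toy sketch in plain Python. Both classes below are hypothetical stand-ins, not DeepChem classes: one keeps everything in memory, the other loads one shard at a time, yet `mean_label` works with either because it only relies on `itersamples()`. The `(x, y, w, id)` tuple layout mirrors the four values unpacked from `itersamples()` in the cell above.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float, float, int]  # (x, y, w, id)
Shard = Tuple[List[List[float]], List[float]]       # (X, y) for one shard


class InMemoryToyDataset:
    """Hypothetical dataset that holds all samples in memory."""

    def __init__(self, X: List[List[float]], y: List[float]):
        self.X, self.y = X, y

    def itersamples(self) -> Iterator[Sample]:
        for i, (x, label) in enumerate(zip(self.X, self.y)):
            yield x, label, 1.0, i


class ShardedToyDataset:
    """Hypothetical dataset that loads one shard at a time."""

    def __init__(self, shard_loaders: List[Callable[[], Shard]]):
        self.shard_loaders = shard_loaders

    def itersamples(self) -> Iterator[Sample]:
        next_id = 0
        for load_shard in self.shard_loaders:
            X, y = load_shard()  # only this shard is held in memory
            for x, label in zip(X, y):
                yield x, label, 1.0, next_id
                next_id += 1


def mean_label(dataset) -> float:
    """Works with any object exposing itersamples()."""
    total, count = 0.0, 0
    for _, label, _, _ in dataset.itersamples():
        total += label
        count += 1
    return total / count


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [1.0, 2.0, 3.0, 4.0]

in_memory = InMemoryToyDataset(X, y)
sharded = ShardedToyDataset([
    lambda: (X[:2], y[:2]),  # stands in for reading shard 0 from disk
    lambda: (X[2:], y[2:]),  # stands in for reading shard 1 from disk
])

print(mean_label(in_memory), mean_label(sharded))  # 2.5 2.5
```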

		%% Cell type:markdown id: tags:

		There are a couple of other fields that the `dataset` object tracks. The first is `dataset.ids`. This is a listing of unique identifiers for the datapoints in the dataset.

		%% Cell type:code id: tags:

		``` python
		dataset.ids
		```

		%% Output

		array([0, 1, 2, 3], dtype=object)

		%% Cell type:markdown id: tags:

		In addition, the `dataset` object has a field `dataset.w`. This is the "example weight" associated with each datapoint. Since we haven't explicitly assigned the weights, this is simply going to be all ones.

		%% Cell type:code id: tags:

		``` python
		dataset.w
		```

		%% Output

		array([1., 1., 1., 1.], dtype=float32)

		%% Cell type:markdown id: tags:

		What if we want to set nontrivial weights for a dataset? One time we might want to do this is if we have a dataset where there are only a few positive examples to play with. It's pretty straightforward to do this with DeepChem.

		%% Cell type:code id: tags:

		``` python
		w = np.random.random((4,)) # initialize weights with a random vector of shape (4,)
		dataset_with_weights = NumpyDataset(data, labels, w) # creates numpy dataset object
		dataset_with_weights.w
		```

		%% Output

		array([0.28356599, 0.10301754, 0.4463241 , 0.43093856])
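
%% Cell type:markdown id: tags:

Random weights are only a placeholder. For the few-positives case mentioned above, one common choice is to weight each class inversely to its frequency, so that positives and negatives contribute equally in total. The sketch below is one way to compute such weights in plain Python; `balancing_weights` is a hypothetical helper, not a DeepChem function, and it assumes binary 0/1 labels.

```python
from typing import List


def balancing_weights(labels: List[int]) -> List[float]:
    """Give each class the same total weight (assumes binary 0/1 labels)."""
    n_total = len(labels)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = n_total - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one example of each class")
    w_pos = n_total / (2.0 * n_pos)
    w_neg = n_total / (2.0 * n_neg)
    return [w_pos if label == 1 else w_neg for label in labels]


labels = [1, 0, 0, 0, 0, 0, 0, 0]  # one positive among eight examples
weights = balancing_weights(labels)
print(weights[0])        # 4.0, the weight of the single positive
print(sum(weights[1:]))  # total weight of the seven negatives, about 4.0
```

The resulting list could then be passed as the third argument to `NumpyDataset`, in the same position as `w` in the cell above.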

		%% Cell type:markdown id: tags:

		## MNIST Example

		Just to get a better understanding, we'll use the venerable MNIST dataset and use `NumpyDataset` to store the data. We're going to make use of the `tensorflow-datasets` package to facilitate our data reading. You'll need to install this package in order to make use of it.

		%% Cell type:code id: tags:

		``` python
		# Install tensorflow-datasets
		# TODO(rbharath): Switch to stable version on release
		!pip install -q --upgrade tfds-nightly tf-nightly
		## TODO(rbharath): Switch to stable version on release
		# TODO(rbharath): This only works on TF2. Uncomment once we've upgraded.
		#!pip install -q --upgrade tfds-nightly tf-nightly
		```

		%% Cell type:code id: tags:

		``` python
		# TODO(rbharath): This cell will only work with TF2 installed. Swap to this as default soon.

		import tensorflow_datasets as tfds
		#import tensorflow_datasets as tfds

		data_dir = '/tmp/tfds'
		#data_dir = '/tmp/tfds'

		# Fetch full datasets for evaluation
		## Fetch full datasets for evaluation
		# tfds.load returns tf.Tensors (or tf.data.Datasets if batch_size != -1)
		# You can convert them to NumPy arrays (or iterables of NumPy arrays) with tfds.as_numpy
		mnist_data, info = tfds.load(name="mnist", batch_size=-1, data_dir=data_dir, with_info=True)
		mnist_data = tfds.as_numpy(mnist_data)
		train_data, test_data = mnist_data['train'], mnist_data['test']
		num_labels = info.features['label'].num_classes
		h, w, c = info.features['image'].shape
		num_pixels = h * w * c

		# Full train set
		train_images, train_labels = train_data['image'], train_data['label']
		train_images = np.reshape(train_images, (len(train_images), num_pixels))
		train_labels = one_hot(train_labels, num_labels)

		# Full test set
		test_images, test_labels = test_data['image'], test_data['label']
		test_images = np.reshape(test_images, (len(test_images), num_pixels))
		test_labels = one_hot(test_labels, num_labels)
		#mnist_data, info = tfds.load(name="mnist", batch_size=-1, data_dir=data_dir, with_info=True)
		#mnist_data = tfds.as_numpy(mnist_data)
		#train_data, test_data = mnist_data['train'], mnist_data['test']
		#num_labels = info.features['label'].num_classes
		#h, w, c = info.features['image'].shape
		#num_pixels = h * w * c

		## Full train set
		#train_images, train_labels = train_data['image'], train_data['label']
		#train_images = np.reshape(train_images, (len(train_images), num_pixels))
		#train_labels = one_hot(train_labels, num_labels)

		## Full test set
		#test_images, test_labels = test_data['image'], test_data['label']
		#test_images = np.reshape(test_images, (len(test_images), num_pixels))
		#test_labels = one_hot(test_labels, num_labels)
		```

		%% Output

		---------------------------------------------------------------------------
		AttributeError Traceback (most recent call last)
		<ipython-input-32-db19541136ab> in <module>
		----> 1 import tensorflow_datasets as tfds
		2
		3 data_dir = '/tmp/tfds'
		4
		5 # Fetch full datasets for evaluation
		~/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow_datasets/__init__.py in <module>
		44 # needs to happen before anything else, since the imports below will try to
		45 # import tensorflow, too.
		---> 46 from tensorflow_datasets.core import tf_compat
		47 tf_compat.ensure_tf_install()
		48
		~/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow_datasets/core/__init__.py in <module>
		19 # import tensorflow, too.
		20 from tensorflow_datasets.core import tf_compat
		---> 21 tf_compat.ensure_tf_install()
		22
		23 from tensorflow_datasets.core.api_utils import disallow_positional_args # pylint:disable=g-import-not-at-top
		~/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow_datasets/core/tf_compat.py in ensure_tf_install()
		64 required="1.13.0",
		65 present=tf.__version__))
		---> 66 _patch_tf(tf)
		67
		68
		~/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow_datasets/core/tf_compat.py in _patch_tf(tf)
		78 if v_1_13 <= tf_version < v_2:
		79 TF_PATCH = "tf1_13"
		---> 80 _patch_for_tf1_13(tf)
		81
		82
		~/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow_datasets/core/tf_compat.py in _patch_for_tf1_13(tf)
		94 if not hasattr(tf.autograph.experimental, "do_not_convert"):
		95 tf.autograph.experimental.do_not_convert = (
		---> 96 tf.contrib.autograph.do_not_convert)
		97
		98
		AttributeError: module 'tensorflow.compat.v2' has no attribute 'contrib'
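
%% Cell type:markdown id: tags:

Separately from the import error above, note that the cell calls a `one_hot` function that is never defined or imported in this notebook. A minimal NumPy sketch of such a helper is shown below, together with the image flattening step. The array sizes are tiny stand-ins for MNIST, and this `one_hot` is an assumption about the intended behavior rather than the author's implementation.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Map integer class labels of shape (n,) to an (n, num_classes) 0/1 array."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


# Stand-in for a batch of 3 grayscale images: height 2, width 2, 1 channel.
images = np.arange(12, dtype=np.float32).reshape((3, 2, 2, 1))
h, w, c = images.shape[1:]
flat_images = np.reshape(images, (len(images), h * w * c))  # shape (3, 4)

encoded = one_hot(np.array([2, 0, 1]), num_classes=3)
print(flat_images.shape)  # (3, 4)
print(encoded)
```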

		%% Cell type:code id: tags:

		``` python
		from tensorflow.examples.tutorials.mnist import input_data

		mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
		# Load the numpy data of MNIST into NumpyDataset
		train = NumpyDataset(mnist.train.images, mnist.train.labels)
		valid = NumpyDataset(mnist.validation.images, mnist.validation.labels)
		```

		%% Output

		WARNING:tensorflow:From <ipython-input-33-7ea47016376a>:3: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
		Instructions for updating:
		Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
		Instructions for updating:
		Please write your own downloading logic.
		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
		Instructions for updating:
		Please use tf.data to implement this functionality.
		Extracting MNIST_data/train-images-idx3-ubyte.gz
		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
		Instructions for updating:
		Please use tf.data to implement this functionality.
		Extracting MNIST_data/train-labels-idx1-ubyte.gz
		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
		Instructions for updating:
		Please use tf.one_hot on tensors.
		Extracting MNIST_data/t10k-images-idx3-ubyte.gz
		Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
		Instructions for updating:
		Please use alternatives such as official/mnist/dataset.py from tensorflow/models.

		%% Cell type:markdown id: tags:

		Let's take a look at some of the data we've loaded so we can visualize our samples.

		%% Cell type:code id: tags:

		``` python
		import matplotlib.pyplot as plt

		# Visualize one sample
		sample = np.reshape(train.X[5], (28, 28))
		plt.imshow(sample)
		plt.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		## Converting a Numpy Array to `tf.data.Dataset`


		Let's say you want to use the `tf.data` module instead of DeepChem's data handling library. Doing this is straightforward and is quite similar to getting a `NumpyDataset` object from numpy arrays.

		%% Cell type:code id: tags:

		``` python
		import tensorflow as tf
		data_small = np.random.random((4,5))
		label_small = np.random.random((4,))
		dataset = tf.data.Dataset.from_tensor_slices((data_small, label_small))
		print ("Data\n")
		print (data_small)
		print ("\n Labels")
		print (label_small)
		```

		%% Output

		Data

		[[0.76703639 0.32740734 0.83969256 0.88836494 0.13575798]
		[0.81691255 0.93198925 0.31409331 0.97220957 0.39390713]
		[0.85819869 0.05097942 0.57634862 0.42789271 0.08894737]
		[0.36893407 0.16985698 0.29056158 0.88201174 0.08395812]]

		Labels
		[0.42013801 0.14442763 0.74540663 0.48853841]

		%% Cell type:markdown id: tags:

		## Extracting the numpy dataset from tf.data

		To extract the Numpy arrays from a `tf.data.Dataset`, you first define an iterator over the dataset and then, inside a TensorFlow session, run the iterator to get the data instances one at a time. Let's have a look at how it's done.

		%% Cell type:code id: tags:

		``` python
		iterator = dataset.make_one_shot_iterator() # iterator
		next_element = iterator.get_next()
		numpy_data = np.zeros((4, 5))
		numpy_label = np.zeros((4,))
		sess = tf.Session() # tensorflow session
		for i in range(4):
		    data_, label_ = sess.run(next_element) # each call returns one (data row, label) pair from the previous step
		    numpy_data[i, :] = data_
		    numpy_label[i] = label_

		print ("Numpy Data")
		print(numpy_data)
		print ("\n Numpy Label")
		print(numpy_label)
		```

		%% Output

		WARNING:tensorflow:From <ipython-input-37-f67e6d094179>:1: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
		Instructions for updating:
		Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
		Numpy Data
		[[0.76703639 0.32740734 0.83969256 0.88836494 0.13575798]
		[0.81691255 0.93198925 0.31409331 0.97220957 0.39390713]
		[0.85819869 0.05097942 0.57634862 0.42789271 0.08894737]
		[0.36893407 0.16985698 0.29056158 0.88201174 0.08395812]]

		Numpy Label
		[0.42013801 0.14442763 0.74540663 0.48853841]
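
%% Cell type:markdown id: tags:

The round trip above (slice the arrays into per-example pairs, pull them out one by one, reassemble) can be mimicked without TensorFlow. The sketch below is only an analogy in plain Python, with a hypothetical `from_slices` helper standing in for `tf.data.Dataset.from_tensor_slices`; it also shows what "one-shot" means for an iterator.

```python
from typing import Iterator, List, Tuple


def from_slices(
    data: List[List[float]], labels: List[float]
) -> Iterator[Tuple[List[float], float]]:
    """Pair the i-th row of `data` with the i-th label (slicing along axis 0)."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    return zip(data, labels)  # a zip object can be consumed only once


data_small = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
label_small = [1.0, 0.0, 1.0]

iterator = from_slices(data_small, label_small)

# Pull elements out one at a time and reassemble them.
rebuilt_data, rebuilt_labels = [], []
for row, label in iterator:
    rebuilt_data.append(row)
    rebuilt_labels.append(label)

print(rebuilt_data == data_small, rebuilt_labels == label_small)  # True True
print(list(iterator))  # [] because the iterator is already exhausted
```

The deprecation warning in the output above points to the same idea for newer TensorFlow versions: iterate over the dataset directly with `for ... in dataset:`.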

		%% Cell type:markdown id: tags:

		Now that you have the Numpy arrays `numpy_data` and `numpy_label`, you can convert them to a `NumpyDataset`.

		%% Cell type:code id: tags:

		``` python
		dataset_ = NumpyDataset(numpy_data, numpy_label) # convert to NumpyDataset
		dataset_.X # printing just to check if the data is same!!
		```

		%% Output

		array([[0.76703639, 0.32740734, 0.83969256, 0.88836494, 0.13575798],
		[0.81691255, 0.93198925, 0.31409331, 0.97220957, 0.39390713],
		[0.85819869, 0.05097942, 0.57634862, 0.42789271, 0.08894737],
		[0.36893407, 0.16985698, 0.29056158, 0.88201174, 0.08395812]])

		%% Cell type:markdown id: tags:

		## Converting NumpyDataset to `tf.data`

		This can be easily done with the `make_iterator()` method of `NumpyDataset`, which converts the `NumpyDataset` to `tf.data`. Let's look at how it's done!

		%% Cell type:code id: tags:

		``` python
		iterator_ = dataset_.make_iterator() # Using make_iterator for converting NumpyDataset to tf.data
		next_element_ = iterator_.get_next()

		sess = tf.Session() # tensorflow session
		data_and_labels = sess.run(next_element_) # data_and_labels holds the data at index 0 and the labels at index 1


		print ("Numpy Data")
		print(data_and_labels[0]) # Data in the first index
		print ("\n Numpy Label")
		print(data_and_labels[1]) # Labels in the second index
		```

		%% Output

		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
		Instructions for updating:
		tf.py_func is deprecated in TF V2. Instead, there are two
		options available in V2.
		- tf.py_function takes a python function which manipulates tf eager
		tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
		an ndarray (just call tensor.numpy()) but having access to eager tensors
		means `tf.py_function`s can use accelerators such as GPUs as well as
		being differentiable using a gradient tape.
		- tf.numpy_function maintains the semantics of the deprecated tf.py_func
		(it is not differentiable, and manipulates numpy arrays). It drops the
		stateful argument making all functions stateful.

		Numpy Data
		[[0.81691255 0.93198925 0.31409331 0.97220957 0.39390713]
		[0.36893407 0.16985698 0.29056158 0.88201174 0.08395812]
		[0.85819869 0.05097942 0.57634862 0.42789271 0.08894737]
		[0.76703639 0.32740734 0.83969256 0.88836494 0.13575798]]

		Numpy Label
		[0.14442763 0.48853841 0.74540663 0.42013801]
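
%% Cell type:markdown id: tags:

Note that the rows in the output above come back in a different order than they were stored. What matters is that each row still travels with its own label. A quick way to check this is sketched here in plain Python, with values copied from the outputs above; only the first entry of each row is used, as a stand-in for the whole row.

```python
# (first feature of the row, label) pairs as stored in the NumpyDataset
stored = [
    (0.76703639, 0.42013801),
    (0.81691255, 0.14442763),
    (0.85819869, 0.74540663),
    (0.36893407, 0.48853841),
]

# The same pairs in the order returned by the iterator
returned = [
    (0.81691255, 0.14442763),
    (0.36893407, 0.48853841),
    (0.85819869, 0.74540663),
    (0.76703639, 0.42013801),
]

same_order = stored == returned
same_pairs = sorted(stored) == sorted(returned)
print(same_order, same_pairs)  # False True
```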

		%% Cell type:markdown id: tags:

		# Using Splitters to split DeepChem Datasets

		In this section we will have a look at the various splitters that are present in the DeepChem library and how each of them can be used.

		### Index Splitter

		We start with the `IndexSplitter`. This splitter divides the dataset by index, in order, according to the fractions provided by the user. It returns three range objects, which can then be used to iterate over the train, valid and test portions of the dataset.

		Each of the splitters inherits two methods from the base class: `train_test_split`, which splits the data into training and testing sets, and `train_valid_test_split`, which splits the data into training, validation and test sets.

		Note: All the splitters default to an 80/10/10 split between train, valid and test respectively. This can be changed by specifying `frac_train`, `frac_valid` and `frac_test` with the fractions you want.
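
To make the fractions concrete, here is a minimal sketch of an index-based split in plain Python. It illustrates the idea only: `index_split` is a hypothetical function, and DeepChem's own `IndexSplitter` may round or validate differently.

```python
import math
from typing import Tuple


def index_split(
    num_samples: int,
    frac_train: float = 0.8,
    frac_valid: float = 0.1,
    frac_test: float = 0.1,
) -> Tuple[range, range, range]:
    """Split indices 0..num_samples-1, in order, into train/valid/test ranges."""
    if not math.isclose(frac_train + frac_valid + frac_test, 1.0):
        raise ValueError("fractions must sum to 1")
    train_end = int(frac_train * num_samples)
    valid_end = int((frac_train + frac_valid) * num_samples)
    return (
        range(0, train_end),
        range(train_end, valid_end),
        range(valid_end, num_samples),
    )


train_idx, valid_idx, test_idx = index_split(10)
print(list(train_idx), list(valid_idx), list(test_idx))
# [0, 1, 2, 3, 4, 5, 6, 7] [8] [9]
```

Because the split is purely by position, the result depends on the order of the rows in the dataset; other fractions can be passed in the same way as the defaults.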

		%% Cell type:code id: tags:

		``` python
		!wget https://raw.githubusercontent.com/deepchem/deepchem/master/deepchem/models/tests/example.csv
		```

		%% Output

		--2020-03-05 18:21:06-- https://raw.githubusercontent.com/deepchem/deepchem/master/deepchem/models/tests/example.csv
		Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.40.133
		Connecting to raw.githubusercontent.com (raw.githubusercontent.com)\|151.101.40.133\|:443... connected.
		HTTP request sent, awaiting response... 200 OK
		Length: 568 [text/plain]
		Saving to: ‘example.csv’

		example.csv 100%[===================>] 568 --.-KB/s in 0s

		2020-03-05 18:21:06 (24.6 MB/s) - ‘example.csv’ saved [568/568]


		%% Cell type:code id: tags:

		``` python
		import os

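		# `__file__` is not defined in a notebook, so the literal string '__file__' is resolved against the current working directory
		current_dir=os.path.dirname(os.path.realpath('__file__'))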
		input_data=os.path.join(current_dir,'example.csv')
		```

		%% Cell type:markdown id: tags:

		We then featurize the data using any one of the featurizers present.

		%% Cell type:code id: tags:

		``` python
		import deepchem as dc

		tasks=['log-solubility']
		featurizer=dc.feat.CircularFingerprint(size=1024)
		loader = dc.data.CSVLoader(tasks=tasks, smiles_field="smiles",featurizer=featurizer)
		dataset=loader.featurize(input_data)
		```

		%% Output

		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
		warnings.warn(msg, category=FutureWarning)
		RDKit WARNING: [18:23:34] Enabling RDKit 2019.09.3 jupyter extensions
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from /Users/bharath/Code/deepchem/examples/tutorials/example.csv
		Loading shard 1 of size 8192.
		Featurizing sample 0
		TIMING: featurizing shard 0 took 0.023 s
		TIMING: dataset construction took 0.036 s
		Loading dataset from disk.

		%% Cell type:code id: tags:

		``` python
		from deepchem.splits.splitters import IndexSplitter
		```

		%% Cell type:code id: tags:

		``` python
		splitter = IndexSplitter()
		train_data, valid_data, test_data = splitter.split(dataset)
		```

		%% Cell type:code id: tags:

		``` python
		train_data = list(train_data)
		valid_data = list(valid_data)
		test_data = list(test_data)
		```

		%% Cell type:code id: tags:

		``` python
		len(train_data),len(valid_data),len(test_data)
		```

		%% Output

		(8, 1, 1)

		%% Cell type:markdown id: tags:

		As we can see, when the user does not specify how to split the data, it is split by default into 80% training, 10% validation, and 10% test data.
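
To make the fractions concrete, here is a minimal sketch of how an index-based split can be computed. It is an illustration only, not DeepChem's implementation: the helper name `index_split` is hypothetical, and using `round` for the cut points is an assumption made to sidestep floating-point error.

```python
def index_split(n_samples, frac_train=0.8, frac_valid=0.1):
    """Split range(n_samples) into contiguous train/valid/test index lists.

    Hypothetical helper for illustration; the test set receives whatever
    remains after the training and validation fractions are taken.
    """
    # Cut points sit at the cumulative fractions of the dataset size.
    # round() guards against floating-point error in the products.
    train_end = round(frac_train * n_samples)
    valid_end = round((frac_train + frac_valid) * n_samples)
    indices = list(range(n_samples))
    return indices[:train_end], indices[train_end:valid_end], indices[valid_end:]


default_split = index_split(10)
custom_split = index_split(10, frac_train=0.7, frac_valid=0.2)
print([len(part) for part in default_split])  # [8, 1, 1]
print([len(part) for part in custom_split])   # [7, 2, 1]
```

Because the indices are cut in order, the first rows of the dataset always land in the training set under this scheme.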

		When we pass `frac_train`, `frac_valid`, and `frac_test` explicitly, the dataset is split according to our specifications.

		%% Cell type:code id: tags:

		``` python
		train_data, valid_data, test_data = splitter.split(dataset, frac_train=0.7, frac_valid=0.2, frac_test=0.1)
		train_data = list(train_data)
		valid_data = list(valid_data)
		test_data = list(test_data)
		len(train_data), len(valid_data), len(test_data)
		```

		%% Output

		(7, 2, 1)

		%% Cell type:markdown id: tags:

		## Specified Splitter

		The next splitter in the library is the `SpecifiedSplitter`. It needs a column in the input file that states, for each row, whether that row is for training, validation, or testing.

		%% Cell type:code id: tags:

		``` python
		from deepchem.splits.splitters import SpecifiedSplitter

		# CSV file whose `split` column marks each row as train, valid, or test.
		input_file = '../../deepchem/models/tests/user_specified_example.csv'

		# Featurize the SMILES strings as 1024-bit circular fingerprints.
		tasks = ['log-solubility']
		featurizer = dc.feat.CircularFingerprint(size=1024)
		loader = dc.data.CSVLoader(tasks=tasks, smiles_field="smiles", featurizer=featurizer)
		dataset = loader.featurize(input_file)

		# Name of the column that holds the split assignment for each row.
		split_field = 'split'

		splitter = SpecifiedSplitter(input_file, split_field)
		```

		%% Output

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from ../../deepchem/models/tests/user_specified_example.csv
		Loading shard 1 of size 8192.
		Featurizing sample 0
		TIMING: featurizing shard 0 took 0.020 s
		TIMING: dataset construction took 0.039 s
		Loading dataset from disk.

		%% Cell type:code id: tags:

		``` python
		train_data, valid_data, test_data = splitter.split(dataset)
		```

		%% Cell type:markdown id: tags:

		When we split the data using the specified splitter, it reads the `split_field` column, in which the user must mark whether each row should be used as training, validation, or test data. The values to use in the `split_field` are `train`, `valid`, and `test`.
		Note: The input is case insensitive.
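
The mapping from labels to indices can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the library's code: `split_by_labels` and the `split_column` values are hypothetical.

```python
def split_by_labels(labels):
    """Group row indices by their split label, ignoring case (hypothetical helper)."""
    splits = {"train": [], "valid": [], "test": []}
    for index, label in enumerate(labels):
        key = label.lower()  # makes 'Train', 'TRAIN', and 'train' equivalent
        if key not in splits:
            raise ValueError(f"Unrecognized split label: {label!r}")
        splits[key].append(index)
    return splits["train"], splits["valid"], splits["test"]


# Made-up contents of a `split` column for a 10-row file.
split_column = ["train"] * 6 + ["valid", "VALID", "test", "Test"]
train_idx, valid_idx, test_idx = split_by_labels(split_column)
print(train_idx, valid_idx, test_idx)
# [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```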

		%% Cell type:code id: tags:

		``` python
		train_data, valid_data, test_data
		```

		%% Output

		([0, 1, 2, 3, 4, 5], [6, 7], [8, 9])

		%% Cell type:markdown id: tags:

		## Indice Splitter

		Another splitter present in the framework is `IndiceSplitter`. It takes `valid_indices` and `test_indices`, which are lists of the indices of the validation data and test data in the dataset, respectively. All remaining indices are used for training.
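
Conceptually, the training indices are just the complement of the two lists you pass in. The following sketch shows that idea in plain Python; `indice_split` is a hypothetical helper, not part of DeepChem.

```python
def indice_split(n_samples, valid_indices, test_indices):
    """Return (train, valid, test) index lists; train is everything not held out."""
    held_out = set(valid_indices) | set(test_indices)
    train_indices = [i for i in range(n_samples) if i not in held_out]
    return train_indices, list(valid_indices), list(test_indices)


example_split = indice_split(10, valid_indices=[7], test_indices=[9])
print(example_split)
# ([0, 1, 2, 3, 4, 5, 6, 8], [7], [9])
```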

		%% Cell type:code id: tags:

		``` python
		from deepchem.splits.splitters import IndiceSplitter

		splitter = IndiceSplitter(valid_indices=[7], test_indices=[9])
		splitter.split(dataset)
		```

		%% Output

		([0, 1, 2, 3, 4, 5, 6, 8], [7], [9])

		%% Cell type:markdown id: tags:

		## RandomGroupSplitter

		The `RandomGroupSplitter` splits the data on the basis of groupings.

		An example use case is when there are multiple conformations of the same molecule that share the same topology. This splitter guarantees that the resulting splits preserve groupings.

		Note that it doesn't use dynamic programming or anything fancy to pick groups so that the split sizes best match `frac_train`, `frac_valid`, or `frac_test`. It simply permutes the groups themselves. As such, use it with caution if the number of elements per group varies significantly.

		The parameter that must be provided to the splitter is `groups`, an array-like list of hashables with the same length as the dataset.
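
A rough sketch of a group-preserving split is shown here. It assumes the fractions are applied to the shuffled list of groups rather than to individual samples, which is why uneven group sizes can skew the result. The function `random_group_split`, its `seed` argument, and the use of truncation for the cut points are illustrative assumptions, not the library's code.

```python
import random


def random_group_split(groups, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle the distinct groups, then hand whole groups to each split."""
    # Collect the sample indices that belong to each group.
    members = {}
    for index, group in enumerate(groups):
        members.setdefault(group, []).append(index)

    # Permute the groups themselves; no attempt is made to balance sizes.
    group_ids = list(members)
    random.Random(seed).shuffle(group_ids)

    # Cut the shuffled group list at the cumulative fractions (truncating).
    train_end = int(frac_train * len(group_ids))
    valid_end = int((frac_train + frac_valid) * len(group_ids))

    def collect(ids):
        return [index for group in ids for index in members[group]]

    return (collect(group_ids[:train_end]),
            collect(group_ids[train_end:valid_end]),
            collect(group_ids[valid_end:]))


example_groups = [0, 4, 1, 2, 3, 7, 0, 3, 1, 0]
train_part, valid_part, test_part = random_group_split(example_groups)
print(train_part, valid_part, test_part)
```

With six distinct groups and the default fractions, four groups go to training and one each to validation and testing, so the number of samples per split depends on which groups happen to be drawn.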

		%% Cell type:code id: tags:

		``` python
		from deepchem.splits.splitters import RandomGroupSplitter

		# One group label per sample; samples sharing a label must stay together.
		groups = [0, 4, 1, 2, 3, 7, 0, 3, 1, 0]
		solubility_dataset = dc.data.tests.load_solubility_data()

		splitter = RandomGroupSplitter(groups=groups)

		train_idxs, valid_idxs, test_idxs = splitter.split(solubility_dataset)
		```

		%% Output

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from /Users/bharath/Code/deepchem/deepchem/data/tests/../../models/tests/example.csv
		Loading shard 1 of size 8192.
		Featurizing sample 0
		TIMING: featurizing shard 0 took 0.022 s
		TIMING: dataset construction took 0.033 s
		Loading dataset from disk.

		%% Cell type:code id: tags:

		``` python
		train_idxs, valid_idxs, test_idxs
		```

		%% Output

		([4, 7, 1, 5, 0, 6, 9], [2, 8], [3])

		%% Cell type:code id: tags:

		``` python
		# Look up the group label of every sample index in each split.
		train_data = [groups[i] for i in train_idxs]
		valid_data = [groups[i] for i in valid_idxs]
		test_data = [groups[i] for i in test_idxs]
		```

		%% Cell type:code id: tags:

		``` python
		print("Groups present in the training data =", train_data)
		print("Groups present in the validation data =", valid_data)
		print("Groups present in the testing data =", test_data)
		```

		%% Output

		Groups present in the training data = [3, 3, 4, 7, 0, 0, 0]
		Groups present in the validation data = [1, 1]
		Groups present in the testing data = [2]

		%% Cell type:markdown id: tags:

		So when the groups are assigned properly, the `RandomGroupSplitter` splits the data accordingly and preserves the groupings: no group appears in more than one split.
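
Rather than checking this by eye, it can be confirmed programmatically. The short check below reuses the `groups` list and the split indices printed above; the variable `groups_preserved` is introduced here only for illustration.

```python
groups = [0, 4, 1, 2, 3, 7, 0, 3, 1, 0]
train_idxs, valid_idxs, test_idxs = [4, 7, 1, 5, 0, 6, 9], [2, 8], [3]

# The set of group labels that ended up in each split.
train_groups = {groups[i] for i in train_idxs}
valid_groups = {groups[i] for i in valid_idxs}
test_groups = {groups[i] for i in test_idxs}

# Groupings are preserved when no label is shared between any two splits.
groups_preserved = (train_groups.isdisjoint(valid_groups)
                    and train_groups.isdisjoint(test_groups)
                    and valid_groups.isdisjoint(test_groups))
print(groups_preserved)  # True
```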

		%% Cell type:markdown id: tags:

		## Scaffold Splitter

		The `ScaffoldSplitter` splits the data based on the scaffolds of small molecules. It generates a scaffold for each molecule from its SMILES string, then sorts the data into scaffold sets.
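
The sketch below illustrates one way scaffold sets can be turned into splits, assuming the scaffolds have already been computed. The scaffold labels are placeholders rather than real scaffold SMILES, and the function `scaffold_split` with its greedy largest-first assignment is an assumption for illustration, not a statement of how the library does it.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Assign whole scaffold sets to train, valid, or test (hypothetical helper)."""
    # Group sample indices by scaffold.
    scaffold_sets = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        scaffold_sets[scaffold].append(index)

    # Largest scaffold sets first; ties are broken by first occurrence.
    ordered_sets = sorted(scaffold_sets.values(), key=lambda s: (-len(s), s[0]))

    n_samples = len(scaffolds)
    train_cutoff = round(frac_train * n_samples)
    valid_cutoff = round((frac_train + frac_valid) * n_samples)

    train, valid, test = [], [], []
    for scaffold_set in ordered_sets:
        # A set goes to the first split that still has room for all of it.
        if len(train) + len(scaffold_set) <= train_cutoff:
            train.extend(scaffold_set)
        elif len(train) + len(valid) + len(scaffold_set) <= valid_cutoff:
            valid.extend(scaffold_set)
        else:
            test.extend(scaffold_set)
    return train, valid, test


# Placeholder scaffold labels for 10 molecules.
example_scaffolds = ["A", "B", "A", "A", "B", "C", "A", "D", "B", "A"]
scaffold_result = scaffold_split(example_scaffolds)
print(scaffold_result)
# ([0, 2, 3, 6, 9, 1, 4, 8], [5], [7])
```

Because molecules with the same scaffold never straddle two splits, the validation and test sets contain chemical structures the model has not seen during training.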

		%% Cell type:code id: tags:

		``` python
		from deepchem.splits.splitters import ScaffoldSplitter

		splitter = ScaffoldSplitter()
		solubility_dataset = dc.data.tests.load_solubility_data()
		train_data, valid_data, test_data = splitter.split(solubility_dataset, frac_train=0.7, frac_valid=0.2, frac_test=0.1)
		len(train_data), len(valid_data), len(test_data)
		```

		%% Output

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from /Users/bharath/Code/deepchem/deepchem/data/tests/../../models/tests/example.csv
		Loading shard 1 of size 8192.
		Featurizing sample 0
		TIMING: featurizing shard 0 took 0.026 s
		TIMING: dataset construction took 0.046 s
		Loading dataset from disk.

		(7, 2, 1)

		%% Cell type:markdown id: tags:

		# Congratulations! Time to join the Community!

		Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

		## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
		This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

		## Join the DeepChem Gitter
		The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!

examples/tutorials/04_Introduction_to_Graph_Convolutions.ipynb

+1 −1

Original line number	Diff line number	Diff line
		%% Cell type:markdown id: tags:

		# Tutorial Part 4: Introduction to Graph Convolutions

		In the previous sections of the tutorial, we learned about `Dataset` and `Model` objects. We learned how to load some data into DeepChem from files on disk and also learned some basic facts about molecular data handling. We then dove into some basic deep learning architectures. However, until now, we stuck with vanilla deep learning architectures and didn't really consider how to handle deep architectures specifically engineered to work with life science data.

		In this tutorial, we'll change that by going a little deeper and learning about "graph convolutions." These are one of the most powerful deep learning tools for working with molecular data. The reason for this is that molecules can be naturally viewed as graphs.

		![Molecular Graph](basic_graphs.gif)

		Note how standard chemical diagrams of the sort we're used to from high school lend themselves naturally to visualizing molecules as graphs. In the remainder of this tutorial, we'll dig into this relationship in significantly more detail. This will let us get an in-the-guts understanding of how these systems work.
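
As a small illustration of the graph view, the sketch below encodes ethanol as a list of atoms (nodes) and bonds (edges) and builds an adjacency list from them. The variable names are chosen for this example only, and hydrogens are left implicit to keep it short.

```python
# Ethanol (SMILES "CCO") with implicit hydrogens: three heavy atoms, two bonds.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# Build an undirected adjacency list: each bond links both of its atoms.
neighbors = {index: [] for index in range(len(atoms))}
for a, b in bonds:
    neighbors[a].append(b)
    neighbors[b].append(a)

for index, symbol in enumerate(atoms):
    bonded = [atoms[j] for j in neighbors[index]]
    print(f"atom {index} ({symbol}) is bonded to {bonded}")
# atom 0 (C) is bonded to ['C']
# atom 1 (C) is bonded to ['C', 'O']
# atom 2 (O) is bonded to ['C']
```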

		## Colab

		This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/03_Introduction_to_Graph_Convolutionsipynb)
		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/04_Introduction_to_Graph_Convolutions.ipynb)

		## Setup

		To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

		%% Cell type:code id: tags:

		``` python
		!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
		!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
		!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
		!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
		import sys
		sys.path.append('/usr/local/lib/python3.7/site-packages/')
		```

		%% Cell type:markdown id: tags:

		Now that our environment is installed, we can import the core `GraphConvModel` that we'll use throughout this tutorial.

		%% Cell type:code id: tags:

		``` python
		import deepchem as dc
		from deepchem.models.graph_models import GraphConvModel
		```

		%% Output

		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
		warnings.warn(msg, category=FutureWarning)
		RDKit WARNING: [12:33:54] Enabling RDKit 2019.09.3 jupyter extensions
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])

		%% Cell type:markdown id: tags:

		Now, let's use the MoleculeNet suite to load the Tox21 dataset. We need to make sure to process the data in a way that graph convolutional networks can use. To do that, we set the featurizer option to 'GraphConv'. The MoleculeNet call will return a training set, a validation set, and a test set for us to use. The call also returns `transformers`, a list of data transformations that were applied to preprocess the dataset. (Most deep networks are quite finicky and require a set of data transformations to ensure that training proceeds stably.)

		%% Cell type:code id: tags:

		``` python
		# Load Tox21 dataset
		tox21_tasks, tox21_datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv', reload=False)
		train_dataset, valid_dataset, test_dataset = tox21_datasets
		```

		%% Output

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from /var/folders/st/ds45jcqj2232lvhr0y9qt5sc0000gn/T/tox21.csv.gz
		Loading shard 1 of size 8192.
		Featurizing sample 0

		RDKit WARNING: [12:34:15] WARNING: not removing hydrogen atom without neighbors

		Featurizing sample 1000
		Featurizing sample 2000
		Featurizing sample 3000
		Featurizing sample 4000
		Featurizing sample 5000
		Featurizing sample 6000
		Featurizing sample 7000
		TIMING: featurizing shard 0 took 9.963 s
		TIMING: dataset construction took 12.151 s
		Loading dataset from disk.
		TIMING: dataset construction took 2.447 s
		Loading dataset from disk.
		TIMING: dataset construction took 1.236 s
		Loading dataset from disk.
		TIMING: dataset construction took 1.171 s
		Loading dataset from disk.
		TIMING: dataset construction took 2.298 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.366 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.258 s
		Loading dataset from disk.

		%% Cell type:markdown id: tags:

		Let's now train a graph convolutional network on this dataset. DeepChem has the class `GraphConvModel` that wraps a standard graph convolutional architecture underneath the hood for user convenience. Let's instantiate an object of this class and train it on our dataset.
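
To give a feel for what such an architecture computes, here is a deliberately simplified sketch of one graph-convolution step in plain Python: each atom sums its own features with those of its bonded neighbors, then applies a shared weight matrix and a ReLU. The function `graph_conv_step`, the toy features, and the identity weights are all assumptions for illustration; the layers inside `GraphConvModel` are more involved.

```python
def graph_conv_step(features, neighbors, weight):
    """One simplified graph-convolution step over per-atom feature vectors."""
    updated = []
    for atom, own in enumerate(features):
        # Aggregate: add the feature vectors of the atom and its bonded neighbors.
        summed = list(own)
        for other in neighbors[atom]:
            summed = [s + f for s, f in zip(summed, features[other])]
        # Transform: multiply by the shared weight matrix, then apply ReLU.
        out = [max(0.0, sum(s * w for s, w in zip(summed, column)))
               for column in zip(*weight)]
        updated.append(out)
    return updated


# Toy graph for ethanol's heavy atoms (C-C-O) with one-hot [is_C, is_O] features.
toy_neighbors = {0: [1], 1: [0, 2], 2: [1]}
toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, chosen for readability
conv_output = graph_conv_step(toy_features, toy_neighbors, toy_weight)
print(conv_output)
# [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
```

After one step, each atom's vector already reflects its immediate chemical neighborhood; stacking several steps lets information travel further along the bonds.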

		%% Cell type:code id: tags:

		``` python
		n_tasks = len(tox21_tasks)
		model = GraphConvModel(n_tasks, batch_size=50, mode='classification')

		num_epochs = 10
		losses = []
		for i in range(num_epochs):
		    # Train for one epoch at a time so the loss can be recorded after each.
		    loss = model.fit(train_dataset, nb_epoch=1)
		    print("Epoch %d loss: %f" % (i, loss))
		    losses.append(loss)
		```

		%% Output

		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
		Instructions for updating:
		Call initializer instance with the dtype argument instead of passing it to the constructor
		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/layers.py:222: The name tf.unsorted_segment_sum is deprecated. Please use tf.math.unsorted_segment_sum instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/layers.py:224: The name tf.unsorted_segment_max is deprecated. Please use tf.math.unsorted_segment_max instead.

		WARNING:tensorflow:Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'

		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/losses.py:108: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/losses.py:109: The name tf.losses.Reduction is deprecated. Please use tf.compat.v1.losses.Reduction instead.

		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:318: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
		Instructions for updating:
		Use tf.where in 2.0, which has the same broadcast rule as np.where

		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

		Epoch 0 loss: 0.179272
		Epoch 1 loss: 0.179948
		Epoch 2 loss: 0.170968
		Epoch 3 loss: 0.144263
		Epoch 4 loss: 0.154905
		Epoch 5 loss: 0.161570
		Epoch 6 loss: 0.157813
		Epoch 7 loss: 0.144116
		Epoch 8 loss: 0.160063
		Epoch 9 loss: 0.144864

		%% Cell type:markdown id: tags:

		Let's plot these losses so we can take a look at how the loss changes over the process of training.

		%% Cell type:code id: tags:

		``` python
		import matplotlib.pyplot as plot

		plot.ylabel("Loss")
		plot.xlabel("Epoch")
		x = range(num_epochs)
		y = losses
		plot.scatter(x, y)
		plot.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		We see that the loss trends downward over the ten epochs, although it fluctuates from one epoch to the next.

		Let's try to evaluate the performance of the model we've trained. For this, we need to define a metric, a measure of model performance. `dc.metrics` already holds a collection of metrics. For this dataset, it is standard to use the ROC-AUC score, the area under the receiver operating characteristic curve (which measures the tradeoff between the true positive rate and the false positive rate as the classification threshold varies). Luckily, the ROC-AUC score is already available in DeepChem.
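
To make the metric concrete, here is a minimal pure-Python sketch of ROC-AUC computed as the fraction of (positive, negative) pairs that the scores rank correctly, then averaged across tasks. The helper `roc_auc` and the toy labels and scores are illustrative assumptions, not DeepChem code; in practice `dc.metrics.roc_auc_score` does this work for you.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Two hypothetical tasks scored on the same four compounds.
labels = [[0, 0, 1, 1], [1, 0, 1, 0]]
scores = [[0.1, 0.4, 0.35, 0.8], [0.9, 0.2, 0.6, 0.3]]

per_task = [roc_auc(y, s) for y, s in zip(labels, scores)]
mean_auc = sum(per_task) / len(per_task)
print(per_task, mean_auc)  # [0.75, 1.0] 0.875
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric stays informative even when actives are rare.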

		To measure the performance of the model under this metric, we can use the convenience function `model.evaluate()`.

		%% Cell type:code id: tags:

		``` python
		import numpy as np
		metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)

		print("Evaluating model")
		train_scores = model.evaluate(train_dataset, [metric], transformers)
		print("Training ROC-AUC Score: %f" % train_scores["mean-roc_auc_score"])
		valid_scores = model.evaluate(valid_dataset, [metric], transformers)
		print("Validation ROC-AUC Score: %f" % valid_scores["mean-roc_auc_score"])
		```

		%% Output

		Evaluating model
		computed_metrics: [0.8595891713475186, 0.9208810011239563, 0.9147081134144165, 0.8827564343909045, 0.7891199471022603, 0.8785310463438729, 0.8952509980966841, 0.8515397554668475, 0.8856747579741513, 0.8522724355082143, 0.9230658251868931, 0.8917149130167253]
		Training ROC-AUC Score: 0.878759
		computed_metrics: [0.8572700737149359, 0.8533399470899471, 0.8608064442725163, 0.8154550076258261, 0.6855681818181818, 0.7803477303123278, 0.7248182762201454, 0.8583574062331196, 0.8459618554273605, 0.7478649766271126, 0.8969316630338451, 0.8312230835486649]
		Validation ROC-AUC Score: 0.813162

		%% Cell type:markdown id: tags:

		What's going on under the hood? Could we build `GraphConvModel` ourselves? Of course! The first step is to define the inputs to our model. Conceptually, graph convolutions just require the structure of the molecule in question and a vector of features for every atom that describes the local chemical environment. In practice, however, due to TensorFlow's limitations as a general programming environment, we also have to supply some preprocessed auxiliary information.

		`atom_features` holds a feature vector of length 75 for each atom. The other inputs are required to support minibatching in TensorFlow. `degree_slice` is an indexing convenience that makes it easy to locate atoms from all molecules with a given degree. `membership` determines the membership of atoms in molecules (atom `i` belongs to molecule `membership[i]`). `deg_adjs` is a list that contains adjacency lists grouped by atom degree. For more details, check out the [code](https://github.com/deepchem/deepchem/blob/master/deepchem/feat/mol_graphs.py).
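
As a rough illustration of what `membership` and `degree_slice` encode, the sketch below builds both for a toy batch of two molecules using plain Python lists. The toy neighbor lists, the `max_degree` of 2, and the `(start, size)` layout of each slice are assumptions made for this example; the actual construction lives in `ConvMol` and may differ in detail.

```python
# Two toy molecules given as neighbor lists (hypothetical data).
# Molecule 0 is a 3-atom chain, molecule 1 is a 2-atom chain.
molecules = [
    [[1], [0, 2], [1]],
    [[1], [0]],
]
max_degree = 2

# Flatten the batch: record each atom's molecule and its degree.
atom_molecule = []
atom_degree = []
for mol_index, neighbor_lists in enumerate(molecules):
    for neighbors in neighbor_lists:
        atom_molecule.append(mol_index)
        atom_degree.append(len(neighbors))

# Reorder atoms so that atoms of equal degree are contiguous.
order = sorted(range(len(atom_degree)), key=lambda i: atom_degree[i])
membership = [atom_molecule[i] for i in order]

# One (start, size) pair per degree locates that degree's block of atoms.
degree_slice = []
start = 0
for degree in range(max_degree + 1):
    size = atom_degree.count(degree)
    degree_slice.append((start, size))
    start += size

print(membership)    # [0, 0, 1, 1, 0]
print(degree_slice)  # [(0, 0), (0, 4), (4, 1)]
```

Grouping atoms by degree lets a layer process all atoms with the same number of neighbors in one dense operation, while `membership` remembers which molecule each atom came from.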

		To define feature inputs with Keras, we use the `Input` layer. Conceptually, a model is a mathematical graph composed of layer objects. `Input` layers have to be the root nodes of the graph since they constitute the inputs.

		%% Cell type:code id: tags:

		``` python
		import tensorflow as tf
		import tensorflow.keras.layers as layers

		atom_features = layers.Input(shape=(75,))
		degree_slice = layers.Input(shape=(2,), dtype=tf.int32)
		membership = layers.Input(shape=tuple(), dtype=tf.int32)

		deg_adjs = []
		for i in range(0, 10 + 1):
		    deg_adj = layers.Input(shape=(i+1,), dtype=tf.int32)
		    deg_adjs.append(deg_adj)
		```

		%% Cell type:markdown id: tags:

		Let's now implement the body of the graph convolutional network. DeepChem has a number of layers that encode various graph operations, namely the `GraphConv`, `GraphPool`, and `GraphGather` layers. We will also apply standard neural network layers such as `Dense` and `BatchNormalization`.

		The layers we're adding effect a "feature transformation" that will create one vector for each molecule.
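
The idea of collapsing per-atom vectors into one vector per molecule can be sketched without TensorFlow. The function `sum_readout` below is a hypothetical, simplified stand-in that only sums atom vectors by molecule; it is not the `GraphGather` implementation, which applies an activation and may combine additional statistics.

```python
def sum_readout(atom_vectors, membership, n_molecules):
    """Sum atom feature vectors per molecule (a simplified readout)."""
    n_features = len(atom_vectors[0])
    molecule_vectors = [[0.0] * n_features for _ in range(n_molecules)]
    for vector, mol_index in zip(atom_vectors, membership):
        for j, value in enumerate(vector):
            molecule_vectors[mol_index][j] += value
    return molecule_vectors


# Five atoms with two features each; atoms 0, 1 and 4 belong to molecule 0.
atom_vectors = [[1.0, 0.0], [2.0, 1.0], [0.5, 0.5], [0.5, 1.5], [3.0, 2.0]]
membership = [0, 0, 1, 1, 0]

print(sum_readout(atom_vectors, membership, n_molecules=2))
# [[6.0, 3.0], [1.0, 2.0]]
```

Whatever the number of atoms in a molecule, the result has a fixed length, which is what allows ordinary dense layers to follow the readout.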

		%% Cell type:code id: tags:

		``` python
		from deepchem.models.layers import GraphConv, GraphPool, GraphGather

		batch_size = 50

		gc1 = GraphConv(64, activation_fn=tf.nn.relu)([atom_features, degree_slice, membership] + deg_adjs)
		batch_norm1 = layers.BatchNormalization()(gc1)
		gp1 = GraphPool()([batch_norm1, degree_slice, membership] + deg_adjs)
		gc2 = GraphConv(64, activation_fn=tf.nn.relu)([gp1, degree_slice, membership] + deg_adjs)
		batch_norm2 = layers.BatchNormalization()(gc2)
		gp2 = GraphPool()([batch_norm2, degree_slice, membership] + deg_adjs)
		dense = layers.Dense(128, activation=tf.nn.relu)(gp2)
		batch_norm3 = layers.BatchNormalization()(dense)
		readout = GraphGather(batch_size=batch_size, activation_fn=tf.nn.tanh)([batch_norm3, degree_slice, membership] + deg_adjs)
		logits = layers.Reshape((n_tasks, 2))(layers.Dense(n_tasks*2)(readout))
		softmax = layers.Softmax()(logits)
		```

		%% Output

		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'

		%% Cell type:markdown id: tags:

		Let's now create the `KerasModel`. To do that, we specify the inputs and outputs to the model. We also have to define a loss for the model, which tells the network the objective to minimize during training.
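
For intuition about the objective, the following sketch computes a softmax followed by categorical cross-entropy for a single two-class prediction using only the standard library. The helper names `softmax` and `cross_entropy` are illustrative assumptions and are unrelated to DeepChem's internal loss code.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities))


undecided = softmax([0.0, 0.0])    # equal probability for both classes
confident = softmax([-4.0, 4.0])   # strongly favors class 1

print(cross_entropy([0, 1], undecided))  # ln 2, roughly 0.693
print(cross_entropy([0, 1], confident))  # much closer to 0
```

The loss shrinks as the model places more probability on the correct class, so minimizing it pushes the softmax output toward the one-hot labels.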

		%% Cell type:code id: tags:

		``` python
		inputs = [atom_features, degree_slice, membership] + deg_adjs
		outputs = [softmax]
		keras_model = tf.keras.Model(inputs=inputs, outputs=outputs)
		loss = dc.models.losses.CategoricalCrossEntropy()
		model = dc.models.KerasModel(keras_model, loss=loss)
		```

		%% Cell type:markdown id: tags:

		Now that we've successfully defined our graph convolutional model, we need to train it. We can call `fit()`, but we need to make sure that each minibatch of data populates all the `Input` objects that we've created. For this, we need to create a Python generator that, for each batch of data, yields the lists of inputs, labels, and weights (as NumPy arrays) to use for that step of training.
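
The generator pattern itself can be shown on toy data. The sketch below is a simplified, hypothetical stand-in for the real batching: `batch_generator` and its padding strategy (repeat the first sample with zero weight) are assumptions for illustration and are not how `iterbatches` is necessarily implemented.

```python
def to_one_hot(label, n_classes=2):
    """Encode an integer class label as a one-hot list."""
    return [1 if k == label else 0 for k in range(n_classes)]


def batch_generator(features, labels, batch_size, epochs=1):
    """Yield (inputs, labels, weights) lists, padding the last batch."""
    for _ in range(epochs):
        for start in range(0, len(features), batch_size):
            x = list(features[start:start + batch_size])
            y = [to_one_hot(label) for label in labels[start:start + batch_size]]
            w = [1.0] * len(x)
            # Pad short batches; padded rows get zero weight so they
            # do not contribute to the loss.
            while len(x) < batch_size:
                x.append(x[0])
                y.append(y[0])
                w.append(0.0)
            yield ([x], [y], [w])


features = [[0.1], [0.2], [0.3], [0.4], [0.5]]
labels = [0, 1, 1, 0, 1]
for inputs, one_hot, weights in batch_generator(features, labels, batch_size=2):
    print(len(inputs[0]), one_hot[0], weights[0])
```

Each yielded element is a list because a model may have several inputs, outputs, and weight arrays; here there is exactly one of each.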

		%% Cell type:code id: tags:

		``` python
		from deepchem.metrics import to_one_hot
		from deepchem.feat.mol_graphs import ConvMol

		def data_generator(dataset, epochs=1, predict=False, pad_batches=True):
		    for epoch in range(epochs):
		        for ind, (X_b, y_b, w_b, ids_b) in enumerate(
		                dataset.iterbatches(
		                    batch_size, pad_batches=pad_batches, deterministic=True)):
		            multiConvMol = ConvMol.agglomerate_mols(X_b)
		            inputs = [multiConvMol.get_atom_features(), multiConvMol.deg_slice, np.array(multiConvMol.membership)]
		            for i in range(1, len(multiConvMol.get_deg_adjacency_lists())):
		                inputs.append(multiConvMol.get_deg_adjacency_lists()[i])
		            labels = [to_one_hot(y_b.flatten(), 2).reshape(-1, n_tasks, 2)]
		            weights = [w_b]
		            yield (inputs, labels, weights)
		```

		%% Cell type:markdown id: tags:

		Now, we can train the model using `KerasModel.fit_generator(generator)` which will use the generator we've defined to train the model.

		%% Cell type:code id: tags:

		``` python
		num_epochs = 10
		losses = []
		for i in range(num_epochs):
		    loss = model.fit_generator(data_generator(train_dataset, epochs=1))
		    print("Epoch %d loss: %f" % (i, loss))
		    losses.append(loss)
		```

		%% Output

		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'

		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
		"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

		Epoch 0 loss: 0.187656
		Epoch 1 loss: 0.176989
		Epoch 2 loss: 0.172129
		Epoch 3 loss: 0.137272
		Epoch 4 loss: 0.159109
		Epoch 5 loss: 0.157422
		Epoch 6 loss: 0.153595
		Epoch 7 loss: 0.144544
		Epoch 8 loss: 0.146739
		Epoch 9 loss: 0.143846

		%% Cell type:markdown id: tags:

		Let's now plot these losses and take a quick look.

		%% Cell type:code id: tags:

		``` python
		plot.title("Keras Version")
		plot.ylabel("Loss")
		plot.xlabel("Epoch")
		x = range(num_epochs)
		y = losses
		plot.scatter(x, y)
		plot.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		Now that we have trained our graph convolutional method, let's evaluate its performance. We again have to use our defined generator to evaluate model performance.

		%% Cell type:code id: tags:

		``` python
		metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)

		def reshape_y_pred(y_true, y_pred):
		    """
		    GraphConv always pads batches, so we need to remove the predictions
		    for the padding samples. Also, it outputs two values for each task
		    (probabilities of positive and negative), but we only want the positive
		    probability.
		    """
		    n_samples = len(y_true)
		    return y_pred[:n_samples, :, 1]


		print("Evaluating model")
		train_predictions = model.predict_on_generator(data_generator(train_dataset, predict=True))
		train_predictions = reshape_y_pred(train_dataset.y, train_predictions)
		train_scores = metric.compute_metric(train_dataset.y, train_predictions, train_dataset.w)
		print("Training ROC-AUC Score: %f" % train_scores)

		valid_predictions = model.predict_on_generator(data_generator(valid_dataset, predict=True))
		valid_predictions = reshape_y_pred(valid_dataset.y, valid_predictions)
		valid_scores = metric.compute_metric(valid_dataset.y, valid_predictions, valid_dataset.w)
		print("Valid ROC-AUC Score: %f" % valid_scores)
		```

		%% Output

		Evaluating model
		computed_metrics: [0.8597628498713714]
		Training ROC-AUC Score: 0.859763
		computed_metrics: [0.7793962313756978]
		Valid ROC-AUC Score: 0.779396

		%% Cell type:markdown id: tags:

		Success! The model we've constructed performs comparably to `GraphConvModel`, reaching a validation ROC-AUC of about 0.78 versus about 0.81 above. If you're looking to build your own custom models, you can follow the example we've provided here to do so. We hope to see exciting constructions from your end soon!

		%% Cell type:markdown id: tags:

		# Congratulations! Time to join the Community!

		Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

		## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
		This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

		## Join the DeepChem Gitter
		The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!

examples/tutorials/05_Putting_Multitask_Learning_to_Work.ipynb

+2 −2

		%% Cell type:markdown id: tags:

		# Tutorial Part 4: Putting Multitask Learning to Work
		# Tutorial Part 5: Putting Multitask Learning to Work

		This notebook walks through the creation of multitask models on MUV [1]. The goal is to demonstrate that multitask methods outperform singletask methods on MUV.

		## Colab

		This tutorial and the rest in this sequence are designed to be done in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.

		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/04_Putting_Multitask_Learning_to_Work.ipynb)
		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/05_Putting_Multitask_Learning_to_Work.ipynb)


		## Setup

		To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

		%% Cell type:code id: tags:

		``` python
		!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
		!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
		!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
		!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
		import sys
		sys.path.append('/usr/local/lib/python3.7/site-packages/')
		```

		%% Cell type:markdown id: tags:

		The MUV dataset is a challenging benchmark in molecular design that consists of 17 different "targets" where there are only a few "active" compounds per target. The goal of working with this dataset is to build a machine learning model that accurately predicts activity on held-out compounds. To get started, let's download the MUV dataset for us to play with.

		%% Cell type:code id: tags:

		``` python
		import os
		import deepchem as dc

		current_dir = os.path.dirname(os.path.realpath("__file__"))
		dataset_file = "medium_muv.csv.gz"
		full_dataset_file = "muv.csv.gz"

		# We use a small version of MUV to make online rendering of notebooks easy.
		# Replace dataset_file with full_dataset_file to run the full version of this notebook.
		dc.utils.download_url("https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/%s" % dataset_file,
		                      current_dir)

		dataset = dc.utils.save.load_from_disk(dataset_file)
		print("Columns of dataset: %s" % str(dataset.columns.values))
		print("Number of examples in dataset: %s" % str(dataset.shape[0]))
		```

		%% Output

		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
		warnings.warn(msg, category=FutureWarning)
		RDKit WARNING: [19:04:44] Enabling RDKit 2019.09.3 jupyter extensions
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint8 = np.dtype([("qint8", np.int8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint16 = np.dtype([("qint16", np.int16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		_np_qint32 = np.dtype([("qint32", np.int32, 1)])
		/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
		np_resource = np.dtype([("resource", np.ubyte, 1)])

		Columns of dataset: ['MUV-466' 'MUV-548' 'MUV-600' 'MUV-644' 'MUV-652' 'MUV-689' 'MUV-692'
		'MUV-712' 'MUV-713' 'MUV-733' 'MUV-737' 'MUV-810' 'MUV-832' 'MUV-846'
		'MUV-852' 'MUV-858' 'MUV-859' 'mol_id' 'smiles']
		Number of examples in dataset: 10000

		%% Cell type:markdown id: tags:

		Now, let's visualize some compounds from our dataset.

		%% Cell type:code id: tags:

		``` python
		from rdkit import Chem
		from rdkit.Chem import Draw
		from itertools import islice
		from IPython.display import Image, display, HTML

		def display_images(filenames):
		    """Helper to pretty-print images."""
		    for filename in filenames:
		        display(Image(filename))

		def mols_to_pngs(mols, basename="test"):
		    """Helper to write RDKit mols to png files."""
		    filenames = []
		    for i, mol in enumerate(mols):
		        filename = "MUV_%s%d.png" % (basename, i)
		        Draw.MolToFile(mol, filename)
		        filenames.append(filename)
		    return filenames

		num_to_display = 12
		molecules = []
		for _, data in islice(dataset.iterrows(), num_to_display):
		    molecules.append(Chem.MolFromSmiles(data["smiles"]))
		display_images(mols_to_pngs(molecules))
		```

		%% Output


		%% Cell type:markdown id: tags:

		There are 17 datasets total in MUV as we mentioned previously. We're going to train a multitask model that attempts to build a joint model to predict activity across all 17 datasets simultaneously. There's some evidence [2] that multitask training creates more robust models.

		As fair warning, from my experience, this effect can be quite fragile. Nonetheless, it's a tool worth trying given how easy DeepChem makes it to build these models. To get started towards building our actual model, let's first featurize our data.

		%% Cell type:code id: tags:

		``` python
		MUV_tasks = ['MUV-692', 'MUV-689', 'MUV-846', 'MUV-859', 'MUV-644',
		'MUV-548', 'MUV-852', 'MUV-600', 'MUV-810', 'MUV-712',
		'MUV-737', 'MUV-858', 'MUV-713', 'MUV-733', 'MUV-652',
		'MUV-466', 'MUV-832']

		featurizer = dc.feat.CircularFingerprint(size=1024)
		loader = dc.data.CSVLoader(
		tasks=MUV_tasks, smiles_field="smiles",
		featurizer=featurizer)
		dataset = loader.featurize(dataset_file)
		```

		%% Output

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from medium_muv.csv.gz
		Loading shard 1 of size 8192.
		Featurizing sample 0
		Featurizing sample 1000
		Featurizing sample 2000
		Featurizing sample 3000
		Featurizing sample 4000
		Featurizing sample 5000
		Featurizing sample 6000
		Featurizing sample 7000
		Featurizing sample 8000
		TIMING: featurizing shard 0 took 10.505 s
		Loading shard 2 of size 8192.
		Featurizing sample 0
		Featurizing sample 1000
		TIMING: featurizing shard 1 took 2.240 s
		TIMING: dataset construction took 13.162 s
		Loading dataset from disk.

		%% Cell type:markdown id: tags:

		We'll now want to split our dataset into training, validation, and test sets. We're going to do a simple random split using `dc.splits.RandomSplitter`. It's worth noting that this will provide overestimates of real generalizability! For better real world estimates of prospective performance, you'll want to use a harder splitter.
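To make the idea concrete, here is a minimal sketch of what a random split does, written in plain Python rather than with DeepChem. The function name `random_split_indices`, the 80/10/10 fractions, and the fixed seed are assumptions chosen for illustration; they are not taken from `dc.splits.RandomSplitter`.

```python
import random

def random_split_indices(n_samples, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into train/valid/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(frac_train * n_samples)
    n_valid = int(frac_valid * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]   # whatever is left goes to the test set
    return train, valid, test

train_idx, valid_idx, test_idx = random_split_indices(10000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because molecules are assigned at random, close structural analogues can land on both sides of the split, which is one reason the resulting scores tend to be optimistic.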

		%% Cell type:code id: tags:

		``` python
		splitter = dc.splits.RandomSplitter(dataset_file)
		train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(
		dataset)
		# NOTE THE RENAMING: the validation and test sets are swapped here.
		valid_dataset, test_dataset = test_dataset, valid_dataset
		```

		%% Output

		Computing train/valid/test indices
		TIMING: dataset construction took 0.611 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.308 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.279 s
		Loading dataset from disk.

		%% Cell type:markdown id: tags:

		Let's now get started building some models! We'll do some simple hyperparameter searching to build a robust model.

		%% Cell type:code id: tags:

		``` python
		import numpy as np
		import numpy.random

		params_dict = {"activation": ["relu"],
		"momentum": [.9],
		"batch_size": [50],
		"init": ["glorot_uniform"],
		"data_shape": [train_dataset.get_data_shape()],
		"learning_rate": [1e-3],
		"decay": [1e-6],
		"nb_epoch": [1],
		"nesterov": [False],
		"dropouts": [(.5,)],
		"nb_layers": [1],
		"batchnorm": [False],
		"layer_sizes": [(1000,)],
		"weight_init_stddevs": [(.1,)],
		"bias_init_consts": [(1.,)],
		"penalty": [0.],
		}


		n_features = train_dataset.get_data_shape()[0]
		def model_builder(model_params, model_dir):
		    model = dc.models.MultitaskClassifier(
		        len(MUV_tasks), n_features, **model_params)
		    return model

		metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
		optimizer = dc.hyper.HyperparamOpt(model_builder)
		best_dnn, best_hyperparams, all_results = optimizer.hyperparam_search(
		params_dict, train_dataset, valid_dataset, [], metric)
		```

		%% Output

		Fitting model 1/1
		hyperparameters: {'activation': 'relu', 'momentum': 0.9, 'batch_size': 50, 'init': 'glorot_uniform', 'data_shape': (1024,), 'learning_rate': 0.001, 'decay': 1e-06, 'nb_epoch': 1, 'nesterov': False, 'dropouts': (0.5,), 'nb_layers': 1, 'batchnorm': False, 'layer_sizes': (1000,), 'weight_init_stddevs': (0.1,), 'bias_init_consts': (1.0,), 'penalty': 0.0}
		WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
		Instructions for updating:
		Call initializer instance with the dtype argument instead of passing it to the constructor
		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:237: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/losses.py:108: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/losses.py:109: The name tf.losses.Reduction is deprecated. Please use tf.compat.v1.losses.Reduction instead.


		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))

		computed_metrics: [nan, nan, 0.6733333333333333, nan, 0.9825581395348837, nan, nan, nan, nan, 0.9116766467065869, nan, nan, nan, 0.5046583850931677, nan, 0.7608024691358024, nan]
		Model 1/1, Metric mean-roc_auc_score, Validation set 0: 0.766606
		best_validation_score so far: 0.766606
		computed_metrics: [1.0, nan, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, nan, 1.0, nan, 1.0]
		Best hyperparameters: ('relu', 0.9, 50, 'glorot_uniform', (1024,), 0.001, 1e-06, 1, False, (0.5,), 1, False, (1000,), (0.1,), (1.0,), 0.0)
		train_score: 1.000000
		validation_score: 0.766606

		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))
		/Users/bharath/Code/deepchem/deepchem/metrics/__init__.py:368: UserWarning: Error calculating metric mean-roc_auc_score: Only one class present in y_true. ROC AUC score is not defined in that case.
		warnings.warn("Error calculating metric %s: %s" % (self.name, e))

		%% Cell type:markdown id: tags:
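Before moving on, it helps to see how the validation score in the output above relates to the per-task values. The sketch below copies the `computed_metrics` list from the validation output and averages only the entries that are not `nan` (tasks where only one class was present, so ROC AUC is undefined). That the reported score is this nan-skipping mean is an inference from the numbers, not something confirmed from DeepChem's source; the variable names are made up for illustration.

```python
import math

nan = float("nan")
# Per-task validation ROC AUC values copied from the output above.
computed_metrics = [nan, nan, 0.6733333333333333, nan, 0.9825581395348837,
                    nan, nan, nan, nan, 0.9116766467065869, nan, nan, nan,
                    0.5046583850931677, nan, 0.7608024691358024, nan]

# Keep only the tasks for which ROC AUC could be computed.
defined = [m for m in computed_metrics if not math.isnan(m)]
mean_auc = sum(defined) / len(defined)

print("tasks with a defined ROC AUC:", len(defined), "of", len(computed_metrics))
print("mean over defined tasks: %.6f" % mean_auc)
```

The mean agrees with the reported `validation_score` of 0.766606 to six decimal places, but it rests on only 5 of the 17 tasks, so it should be read with caution.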

		# Congratulations! Time to join the Community!

		Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

		## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
		This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

		## Join the DeepChem Gitter
		The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!

		# Bibliography

		[1] https://pubs.acs.org/doi/10.1021/ci8002649

		[2] https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00146

examples/tutorials/06_Going_Deeper_on_Molecular_Featurizations.ipynb

+36 −4

Original line number	Diff line number	Diff line
		%% Cell type:markdown id: tags:

		# Tutorial Part 4: Going Deeper On Molecular Featurizations
		# Tutorial Part 6: Going Deeper On Molecular Featurizations

		One of the most important steps of doing machine learning on molecular data is transforming this data into a form amenable to the application of learning algorithms. This process is broadly called "featurization" and involves turning a molecule into a vector or tensor of some sort. There are a number of different ways of doing such transformations, and the choice of featurization is often dependent on the problem at hand.

		In this tutorial, we explore the different featurization methods available for molecules. These featurization methods include:

		1. `ConvMolFeaturizer`
		2. `WeaveFeaturizer`
		3. `CircularFingerprint`
		4. `RDKitDescriptors`
		5. `BPSymmetryFunctionInput`
		6. `CoulombMatrix`
		7. `CoulombMatrixEig`
		8. `AdjacencyFingerprint`

		## Colab

		This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/06_Going_Deeper_on_Molecular_Featurizations.ipynb)

		## Setup

		To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

		%% Cell type:code id: tags:

		``` python
		!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
		!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
		!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
		!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
		import sys
		sys.path.append('/usr/local/lib/python3.7/site-packages/')
		```

		%% Cell type:markdown id: tags:

		Let's start with some basic imports

		%% Cell type:code id: tags:

		``` python
		from __future__ import print_function
		from __future__ import division
		from __future__ import unicode_literals

		import numpy as np
		from rdkit import Chem

		from deepchem.feat import ConvMolFeaturizer, WeaveFeaturizer, CircularFingerprint
		from deepchem.feat import AdjacencyFingerprint, RDKitDescriptors
		from deepchem.feat import BPSymmetryFunctionInput, CoulombMatrix, CoulombMatrixEig
		from deepchem.utils import conformers
		```

		%% Cell type:markdown id: tags:

		We use `propane` ($CH_3 CH_2 CH_3$) as a running example throughout this tutorial. Many of the featurization methods use conformers of the molecules. A conformer can be generated using the `ConformerGenerator` class in `deepchem.utils.conformers`.

		%% Cell type:markdown id: tags:

		### RDKitDescriptors

		%% Cell type:markdown id: tags:

		`RDKitDescriptors` featurizes a molecule by computing descriptor values for a specified set of descriptors. Intrinsic to the featurizer is a set of allowed descriptors, which can be accessed using `RDKitDescriptors.allowedDescriptors`.

		The featurizer uses the descriptors in `rdkit.Chem.Descriptors.descList`, checks if they are in the list of allowed descriptors and computes the descriptor value for the molecule.
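The filter-then-compute logic described above can be sketched without RDKit. Everything in this block is a hypothetical stand-in: the molecule is just a list of element symbols, and `toy_desc_list`, `toy_allowed`, and `featurize` are invented names that only mimic the shape of `descList` (pairs of name and function) and of the allowed set.

```python
# Toy "molecule": propane as a flat list of element symbols (3 C, 8 H).
propane = ["C"] * 3 + ["H"] * 8

# Hypothetical (name, function) pairs, mimicking the layout of a descriptor list.
toy_desc_list = [
    ("HeavyAtomCount", lambda mol: sum(1 for a in mol if a != "H")),
    ("NumAtoms", lambda mol: len(mol)),
    ("NumHeteroatoms", lambda mol: sum(1 for a in mol if a not in ("C", "H"))),
]

# Hypothetical allow-list: only descriptors named here are computed.
toy_allowed = {"HeavyAtomCount", "NumHeteroatoms"}

def featurize(mol):
    """Compute every allowed descriptor, in list order."""
    return [fn(mol) for name, fn in toy_desc_list if name in toy_allowed]

print(featurize(propane))
```

The result is a fixed-length vector with one entry per allowed descriptor, which is the same shape of output the real featurizer produces for a single molecule.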

		%% Cell type:code id: tags:

		``` python
		example_smile = "CCC"
		example_mol = Chem.MolFromSmiles(example_smile)
		```

		%% Cell type:markdown id: tags:

		Let's check the allowed list of descriptors. As you will see shortly, there's a wide range of chemical properties that RDKit computes for us.

		%% Cell type:code id: tags:

		``` python
		for descriptor in RDKitDescriptors.allowedDescriptors:
		    print(descriptor)
		```

		%% Output

		NumAromaticHeterocycles
		EState_VSA7
		EState_VSA6
		MolMR
		BertzCT
		SMR_VSA10
		NHOHCount
		MinPartialCharge
		HallKierAlpha
		MinEStateIndex
		Chi1n
		Chi4n
		ExactMolWt
		VSA_EState8
		SMR_VSA5
		SMR_VSA9
		NumAliphaticCarbocycles
		VSA_EState2
		SlogP_VSA6
		VSA_EState7
		PEOE_VSA7
		NumHeteroatoms
		Chi1v
		PEOE_VSA2
		SMR_VSA4
		PEOE_VSA9
		HeavyAtomCount
		NumRadicalElectrons
		EState_VSA3
		NumValenceElectrons
		EState_VSA5
		PEOE_VSA10
		EState_VSA11
		EState_VSA10
		SMR_VSA7
		Chi1
		RingCount
		NumHDonors
		LabuteASA
		VSA_EState1
		Chi2v
		NumSaturatedCarbocycles
		SMR_VSA8
		Chi3v
		EState_VSA9
		Kappa2
		NumAliphaticHeterocycles
		Chi0
		SMR_VSA1
		SMR_VSA2
		PEOE_VSA1
		MolLogP
		NumAliphaticRings
		MinAbsPartialCharge
		BalabanJ
		Kappa1
		PEOE_VSA13
		EState_VSA4
		SlogP_VSA11
		MolWt
		SMR_VSA3
		Chi2n
		VSA_EState3
		MaxEStateIndex
		PEOE_VSA11
		Ipc
		MaxAbsPartialCharge
		Chi0n
		VSA_EState10
		VSA_EState5
		EState_VSA1
		FractionCSP3
		Kappa3
		MaxPartialCharge
		PEOE_VSA6
		SlogP_VSA7
		NumHAcceptors
		NumAromaticCarbocycles
		SMR_VSA6
		Chi3n
		HeavyAtomMolWt
		SlogP_VSA8
		VSA_EState9
		PEOE_VSA3
		SlogP_VSA5
		NumRotatableBonds
		PEOE_VSA14
		SlogP_VSA9
		NOCount
		VSA_EState4
		PEOE_VSA5
		Chi0v
		NumSaturatedHeterocycles
		EState_VSA8
		SlogP_VSA12
		Chi4v
		SlogP_VSA10
		NumSaturatedRings
		MinAbsEStateIndex
		SlogP_VSA2
		SlogP_VSA1
		PEOE_VSA8
		PEOE_VSA4
		TPSA
		SlogP_VSA4
		SlogP_VSA3
		NumAromaticRings
		MaxAbsEStateIndex
		EState_VSA2
		VSA_EState6
		PEOE_VSA12

		%% Cell type:code id: tags:

		``` python
		rdkit_desc = RDKitDescriptors()
		features = rdkit_desc._featurize(example_mol)

		print('The number of descriptors present are: ', len(features))
		```

		%% Output

		The number of descriptors present are: 111

		%% Cell type:markdown id: tags:

		### BPSymmetryFunction

		%% Cell type:markdown id: tags:

		`Behler-Parrinello Symmetry function` or `BPSymmetryFunction` featurizes a molecule by computing the atomic number and coordinates for each atom in the molecule. The features can be used as input for symmetry functions, like `RadialSymmetry`, `DistanceMatrix` and `DistanceCutoff`. More details on these symmetry functions can be found in [this paper](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.98.146401). These functions can be found in `deepchem.feat.coulomb_matrices`.

		The featurizer takes in `max_atoms` as an argument. As input, it takes in a conformer of the molecule and computes:

		1. coordinates of every atom in the molecule (in Bohr units)
		2. the atomic numbers for all atoms.

		These features are concatenated and padded with zeros to account for the differing number of atoms across molecules.
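The layout described above can be sketched in plain Python. This is not DeepChem's implementation: `bp_input`, the made-up three-atom geometry, and the conversion constant (1 angstrom taken as about 1.8897 Bohr) are assumptions for illustration. Each row holds the atomic number followed by x, y, z, matching the column order of the featurized output shown below, and unused rows stay at zero.

```python
ANGSTROM_TO_BOHR = 1.8897261  # approximate conversion factor

def bp_input(atomic_numbers, coords_angstrom, max_atoms):
    """Return max_atoms rows of [Z, x, y, z] in Bohr, zero-padded."""
    if len(atomic_numbers) > max_atoms:
        raise ValueError("molecule has more atoms than max_atoms")
    rows = [
        [float(z)] + [c * ANGSTROM_TO_BOHR for c in xyz]
        for z, xyz in zip(atomic_numbers, coords_angstrom)
    ]
    # One all-zero row for every unused atom slot.
    padding = [[0.0, 0.0, 0.0, 0.0] for _ in range(max_atoms - len(rows))]
    return rows + padding

# Made-up geometry for a three-atom molecule (O, H, H), in angstroms.
toy_numbers = [8, 1, 1]
toy_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
toy_features = bp_input(toy_numbers, toy_coords, max_atoms=5)
for row in toy_features:
    print(row)
```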

		%% Cell type:code id: tags:

		``` python
		example_smile = "CCC"
		example_mol = Chem.MolFromSmiles(example_smile)
		engine = conformers.ConformerGenerator(max_conformers=1)
		example_mol = engine.generate_conformers(example_mol)
		```

		%% Cell type:markdown id: tags:

		Let's now take a look at the actual featurized matrix that comes out.

		%% Cell type:code id: tags:

		``` python
		bp_sym = BPSymmetryFunctionInput(max_atoms=20)
		features = bp_sym._featurize(mol=example_mol)
		features
		```

		%% Output

		array([[ 6. , 2.33166293, -0.52962788, -0.48097309],
		[ 6. , 0.0948792 , 1.07597567, -1.33579553],
		[ 6. , -2.40436371, -0.29483572, -0.90388318],
		[ 1. , 2.18166462, -0.95639011, 1.569049 ],
		[ 1. , 4.1178375 , 0.51816193, -0.81949623],
		[ 1. , 2.39319787, -2.32844253, -1.56157176],
		[ 1. , 0.29919987, 1.51730566, -3.37889252],
		[ 1. , 0.08875543, 2.88229706, -0.26437996],
		[ 1. , -3.99100651, 0.92016315, -1.54358853],
		[ 1. , -2.66167993, -0.71627602, 1.136556 ],
		[ 1. , -2.45014726, -2.08833123, -1.99406318],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ],
		[ 0. , 0. , 0. , 0. ]])

		%% Cell type:markdown id: tags:

		A simple check for the featurization would be to count the different atomic numbers present in the features.

		%% Cell type:code id: tags:

		``` python
		atomic_numbers = features[:, 0]
		from collections import Counter

		unique_numbers = Counter(atomic_numbers)
		print(unique_numbers)
		```

		%% Output

		Counter({0.0: 9, 1.0: 8, 6.0: 3})

		%% Cell type:markdown id: tags:

		For propane, we have $3$ `C-atoms` and $8$ `H-atoms`, and these numbers are in agreement with the results shown above. There's also the additional padding of 9 atoms, to equalize with `max_atoms`.

		%% Cell type:markdown id: tags:

		### CoulombMatrix

		%% Cell type:markdown id: tags:

		`CoulombMatrix` featurizes a molecule by computing the Coulomb matrices for different conformers of the molecule and returning them as a list.

		A Coulomb matrix tries to encode the energy structure of a molecule. The matrix is symmetric, with the off-diagonal elements capturing the Coulombic repulsion between pairs of atoms and the diagonal elements capturing atomic energies using the atomic numbers. More information on the functional forms used can be found [here](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.108.058301).

		The featurizer takes in `max_atoms` as an argument and also has options for removing hydrogens from the molecule (`remove_hydrogens`), generating additional random Coulomb matrices (`randomize`), and getting only the upper triangular matrix (`upper_tri`).
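For reference, the functional form in the linked paper sets the diagonal entries to $0.5 Z_i^{2.4}$ and the off-diagonal entries to $Z_i Z_j / |R_i - R_j|$. The sketch below applies it to a hydrogen molecule in plain Python; `coulomb_matrix`, the 1.4 Bohr bond length, and the zero-padding to `max_atoms` are illustrative assumptions rather than DeepChem's code.

```python
import math

def coulomb_matrix(atomic_numbers, coords, max_atoms):
    """Build a zero-padded max_atoms x max_atoms Coulomb matrix."""
    n = len(atomic_numbers)
    matrix = [[0.0] * max_atoms for _ in range(max_atoms)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: atomic energy term from the atomic number alone.
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                distance = math.sqrt(
                    sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                )
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / distance
    return matrix

h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)], max_atoms=3)
for row in h2:
    print(row)
```

Only interatomic distances and atomic numbers enter the matrix, which is why rotating or translating the molecule leaves it unchanged.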

		%% Cell type:code id: tags:

		``` python
		example_smile = "CCC"
		example_mol = Chem.MolFromSmiles(example_smile)

		engine = conformers.ConformerGenerator(max_conformers=1)
		example_mol = engine.generate_conformers(example_mol)

		print("Number of available conformers for propane: ", len(example_mol.GetConformers()))
		```

		%% Output

		Number of available conformers for propane: 1

		%% Cell type:code id: tags:

		``` python
		coulomb_mat = CoulombMatrix(max_atoms=20, randomize=False, remove_hydrogens=False, upper_tri=False)
		features = coulomb_mat._featurize(mol=example_mol)
		```

		%% Cell type:markdown id: tags:

		A simple check for the featurization is to see if the feature list has the same length as the number of conformers.

		%% Cell type:code id: tags:

		``` python
		print(len(example_mol.GetConformers()) == len(features))
		```

		%% Output

		True

		%% Cell type:markdown id: tags:

		### CoulombMatrixEig

		%% Cell type:markdown id: tags:

		`CoulombMatrix` is invariant to molecular rotation and translation, since the interatomic distances and atomic numbers do not change. However, the matrix is not invariant to random permutations of the atoms' indices. To deal with this, the `CoulombMatrixEig` featurizer was introduced, which uses the eigenvalue spectrum of the Coulomb matrix and is invariant to random permutations of the atoms' indices.

		`CoulombMatrixEig` inherits from `CoulombMatrix` and featurizes a molecule by first computing the coulomb matrices for different conformers of the molecule and then computing the eigenvalues for each coulomb matrix. These eigenvalues are then padded to account for variation in number of atoms across molecules.

		The featurizer takes in `max_atoms` as an argument and also has options for removing hydrogens from the molecule (`remove_hydrogens`) and generating additional random Coulomb matrices (`randomize`).
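The permutation argument is easy to check numerically. The sketch below uses NumPy on a small hand-written symmetric matrix (the values are made up and stand in for a Coulomb matrix), reorders the atoms, and compares the sorted eigenvalues. Sorting in descending order and zero-padding to `max_atoms` are assumptions about a reasonable convention, not a statement of what `CoulombMatrixEig` does internally.

```python
import numpy as np

# Made-up symmetric 3x3 matrix standing in for a Coulomb matrix.
m = np.array([[36.9, 5.0, 2.0],
              [5.0, 0.5, 0.7],
              [2.0, 0.7, 0.5]])

perm = [2, 0, 1]                  # relabel the atoms
m_perm = m[np.ix_(perm, perm)]    # permute rows and columns together

def padded_spectrum(matrix, max_atoms):
    """Eigenvalues sorted in descending order, zero-padded to max_atoms."""
    eigvals = np.sort(np.linalg.eigvalsh(matrix))[::-1]
    return np.pad(eigvals, (0, max_atoms - len(eigvals)), mode="constant")

same_matrix = np.array_equal(m, m_perm)
same_spectrum = np.allclose(padded_spectrum(m, 5), padded_spectrum(m_perm, 5))
print("matrices identical:", same_matrix)
print("spectra identical:", same_spectrum)
```

The matrices differ entry by entry after relabeling, while the padded spectra agree, which is the property that makes the eigenvalue featurization independent of atom ordering.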

		%% Cell type:code id: tags:

		``` python
		example_smile = "CCC"
		example_mol = Chem.MolFromSmiles(example_smile)

		engine = conformers.ConformerGenerator(max_conformers=1)
		example_mol = engine.generate_conformers(example_mol)

		print("Number of available conformers for propane: ", len(example_mol.GetConformers()))
		```

		%% Output

		Number of available conformers for propane: 1

		%% Cell type:code id: tags:

		``` python
		coulomb_mat_eig = CoulombMatrixEig(max_atoms=20, randomize=False, remove_hydrogens=False)
		features = coulomb_mat_eig._featurize(mol=example_mol)
		```

		%% Cell type:code id: tags:

		``` python
		print(len(example_mol.GetConformers()) == len(features))
		```

		%% Output

		True

		%% Cell type:markdown id: tags:

		### Adjacency Fingerprints

		%% Cell type:code id: tags:
		%% Cell type:markdown id: tags:

		``` python
		```
		TODO(rbharath): This tutorial still needs to be expanded out with the additional fingerprints.

		%% Cell type:markdown id: tags:

		# Congratulations! Time to join the Community!

		Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

		## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
		This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

		## Join the DeepChem Gitter
		The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!

examples/tutorials/07_Uncertainty_In_Deep_Learning.ipynb

+2 −2

Original line number	Diff line number	Diff line
		%% Cell type:markdown id: tags:

		# Tutorial Part 5: Uncertainty in Deep Learning
		# Tutorial Part 7: Uncertainty in Deep Learning

		A common criticism of deep learning models is that they tend to act as black boxes. A model produces outputs, but doesn't give enough context to interpret them properly. How reliable are the model's predictions? Are some predictions more reliable than others? If a model predicts a value of 5.372 for some quantity, should you assume the true value is between 5.371 and 5.373? Or that it's between 2 and 8? In some fields this situation might be good enough, but not in science. For every value predicted by a model, we also want an estimate of the uncertainty in that value so we can know what conclusions to draw based on it.

		DeepChem makes it very easy to estimate the uncertainty of predicted outputs (at least for the models that support it—not all of them do). Let's start by seeing an example of how to generate uncertainty estimates. We load a dataset, create a model, train it on the training set, predict the output on the test set, and then derive some uncertainty estimates.

		## Colab

		This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/06_Uncertainty_In_Deep_Learning.ipynb)
		[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/07_Uncertainty_In_Deep_Learning.ipynb)

		## Setup

		To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

		%% Cell type:code id: tags:

		``` python
		!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
		!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
		!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
		!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
		import sys
		sys.path.append('/usr/local/lib/python3.7/site-packages/')
		```

		%% Cell type:markdown id: tags:

		We'll use the SAMPL dataset from the MoleculeNet suite to run our experiments in this tutorial. Let's load up our dataset for our experiments, and then make some uncertainty predictions.

		%% Cell type:code id: tags:

		``` python
		import deepchem as dc
		import numpy as np
		import matplotlib.pyplot as plot

		tasks, datasets, transformers = dc.molnet.load_sampl(reload=False)
		train_dataset, valid_dataset, test_dataset = datasets

		model = dc.models.MultitaskRegressor(len(tasks), 1024, uncertainty=True)
		model.fit(train_dataset, nb_epoch=200)
		y_pred, y_std = model.predict_uncertainty(test_dataset)
		```

		%% Output

		Loading raw samples now.
		shard_size: 8192
		About to start loading CSV from /var/folders/st/ds45jcqj2232lvhr0y9qt5sc0000gn/T/SAMPL.csv
		Loading shard 1 of size 8192.
		Featurizing sample 0
		TIMING: featurizing shard 0 took 0.698 s
		TIMING: dataset construction took 0.729 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.030 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.017 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.016 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.028 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.016 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.016 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.023 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.007 s
		Loading dataset from disk.
		TIMING: dataset construction took 0.006 s
		Loading dataset from disk.
		WARNING:tensorflow:Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

		WARNING:tensorflow:Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING: Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38119908>>: AttributeError: module 'gast' has no attribute 'Num'
		WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:237: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.


		%% Cell type:markdown id: tags:

		All of this looks exactly like any other example, with just two differences. First, we add the option `uncertainty=True` when creating the model. This instructs it to add features to the model that are needed for estimating uncertainty. Second, we call `predict_uncertainty()` instead of `predict()` to produce the output. `y_pred` is the predicted outputs. `y_std` is another array of the same shape, where each element is an estimate of the uncertainty (standard deviation) of the corresponding element in `y_pred`. And that's all there is to it! Simple, right?

		Of course, it isn't really that simple at all. DeepChem is doing a lot of work to come up with those uncertainties. So now let's pull back the curtain and see what is really happening. (For the full mathematical details of calculating uncertainty, see https://arxiv.org/abs/1703.04977)

		To begin with, what does "uncertainty" mean? Intuitively, it is a measure of how much we can trust the predictions. More formally, we expect that the true value of whatever we are trying to predict should usually be within a few standard deviations of the predicted value. But uncertainty comes from many sources, ranging from noisy training data to bad modelling choices, and different sources behave in different ways. It turns out there are two fundamental types of uncertainty we need to take into account.

		### Aleatoric Uncertainty

		Consider the following graph. It shows the best fit linear regression to a set of ten data points.

		%% Cell type:code id: tags:

		``` python
		# Generate some fake data and plot a regression line.
		x = np.linspace(0, 5, 10)
		y = 0.15*x + np.random.random(10)
		plot.scatter(x, y)
		fit = np.polyfit(x, y, 1)
		line_x = np.linspace(-1, 6, 2)
		plot.plot(line_x, np.poly1d(fit)(line_x))
		plot.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		The line clearly does not do a great job of fitting the data. There are many possible reasons for this. Perhaps the measuring device used to capture the data was not very accurate. Perhaps `y` depends on some other factor in addition to `x`, and if we knew the value of that factor for each data point we could predict `y` more accurately. Maybe the relationship between `x` and `y` simply isn't linear, and we need a more complicated model to capture it. Regardless of the cause, the model clearly does a poor job of predicting the training data, and we need to keep that in mind. We cannot expect it to be any more accurate on test data than on training data. This is known as aleatoric uncertainty.

		How can we estimate the size of this uncertainty? By training a model to do it, of course! At the same time it is learning to predict the outputs, it is also learning to predict how accurately each output matches the training data. For every output of the model, we add a second output that produces the corresponding uncertainty. Then we modify the loss function to make it learn both outputs at the same time.
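
As a rough illustration of "learning both outputs at the same time", here is a minimal NumPy sketch of a loss in the style of the paper linked above. It assumes the second output is the log of the predicted variance; the function name `heteroscedastic_loss` is made up for this example, and this is not necessarily the exact loss DeepChem uses internally.

```python
import numpy as np

def heteroscedastic_loss(y_true, y_pred, log_var):
    """Gaussian negative log-likelihood (up to a constant) with a predicted variance.

    log_var is the assumed second model output: the log of the predicted variance.
    """
    y_true, y_pred, log_var = map(np.asarray, (y_true, y_pred, log_var))
    squared_error = (y_true - y_pred) ** 2
    # A large predicted variance discounts the squared error, but the
    # + log_var term penalizes claiming a large variance everywhere.
    return np.mean(0.5 * np.exp(-log_var) * squared_error + 0.5 * log_var)

# For a fixed error of 2.0, scan candidate variances and find the cheapest one.
candidate_var = np.linspace(0.5, 10.0, 96)
losses = [heteroscedastic_loss([2.0], [0.0], [np.log(v)]) for v in candidate_var]
best_var = candidate_var[int(np.argmin(losses))]
print(best_var)  # close to 4.0, the squared error
```

The loss is lowest when the predicted variance matches the squared error, which is why minimizing it pushes the second output toward an honest estimate of how far off the first output tends to be.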

		### Epistemic Uncertainty

		Now consider these three curves. They are fit to the same data points as before, but this time we are using 10th degree polynomials.

		%% Cell type:code id: tags:

		``` python
		plot.figure(figsize=(12, 3))
		line_x = np.linspace(0, 5, 50)
		for i in range(3):
		    plot.subplot(1, 3, i+1)
		    plot.scatter(x, y)
		    fit = np.polyfit(np.concatenate([x, [3]]), np.concatenate([y, [i]]), 10)
		    plot.plot(line_x, np.poly1d(fit)(line_x))
		plot.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		Each of them perfectly interpolates the data points, yet they clearly are different models. (In fact, there are infinitely many 10th degree polynomials that exactly interpolate any ten data points. Each curve above was also forced through a different extra point at `x = 3`, which is what selects one polynomial from that family.) They make identical predictions for the data we fit them to, but for any other value of `x` they produce different predictions. This is called epistemic uncertainty. It means the data does not fully constrain the model. Given the training data, there are many different models we could have found, and those models make different predictions.

		The ideal way to measure epistemic uncertainty is to train many different models, each time using a different random seed and possibly varying hyperparameters. Then use all of them for each input and see how much the predictions vary. This is very expensive to do, since it involves repeating the whole training process many times. Fortunately, we can approximate the same effect in a less expensive way: by using dropout.

		Recall that when you train a model with dropout, you are effectively training a huge ensemble of different models all at once. Each training sample is evaluated with a different dropout mask, corresponding to a different random subset of the connections in the full model. Usually we only perform dropout during training and use a single averaged mask for prediction. But instead, let's use dropout for prediction too. We can compute the output for lots of different dropout masks, then see how much the predictions vary. This turns out to give a reasonable estimate of the epistemic uncertainty in the outputs.
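
To make the idea concrete, here is a small NumPy sketch of prediction-time dropout. It is only an illustration under stated assumptions: the weights `W1`, `b1`, and `W2` are random stand-ins for a trained network, and the helper names `predict_with_dropout` and `mc_dropout_predict` are hypothetical, not DeepChem functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights standing in for a trained one-hidden-layer network.
W1 = rng.normal(size=(1, 64))
b1 = rng.normal(size=64)
W2 = rng.normal(size=(64, 1)) / 8.0

def predict_with_dropout(x, drop_prob, rng):
    """One stochastic forward pass with a fresh dropout mask."""
    hidden = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layer
    mask = rng.random(hidden.shape) >= drop_prob   # keep each unit with prob 1 - drop_prob
    hidden = hidden * mask / (1.0 - drop_prob)     # inverted dropout scaling
    return hidden @ W2

def mc_dropout_predict(x, n_masks=50, drop_prob=0.2, rng=rng):
    """Mean and spread of the predictions over many dropout masks."""
    samples = np.stack([predict_with_dropout(x, drop_prob, rng) for _ in range(n_masks)])
    return samples.mean(axis=0), samples.std(axis=0)

x_demo = np.linspace(0, 5, 10).reshape(-1, 1)
y_mean, y_epistemic = mc_dropout_predict(x_demo)
print(y_mean.shape, y_epistemic.shape)
```

The standard deviation across masks plays the role of the epistemic uncertainty. With `drop_prob=0.0` every pass is identical and the spread collapses to zero, which is a quick sanity check that the variation really comes from the masks.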

		### Uncertain Uncertainty?

		Now we can combine the two types of uncertainty to compute an overall estimate of the error in each output:

		$$\sigma_\text{total} = \sqrt{\sigma_\text{aleatoric}^2 + \sigma_\text{epistemic}^2}$$
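
In code, the combination is a one-liner. The sketch below assumes the two contributions are already available as arrays of standard deviations; the function name `total_uncertainty` is illustrative only.

```python
import numpy as np

def total_uncertainty(sigma_aleatoric, sigma_epistemic):
    """Combine two standard deviations by adding them in quadrature."""
    return np.sqrt(np.square(sigma_aleatoric) + np.square(sigma_epistemic))

sigma_total = total_uncertainty(np.array([3.0, 0.5]), np.array([4.0, 0.0]))
print(sigma_total)
```

Contributions of 3 and 4 combine to 5, and when one contribution is zero the total equals the other. The result is a single standard deviation per output.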

		This is the value DeepChem reports. But how much can you trust it? Remember how I started this tutorial: deep learning models should not be used as black boxes. We want to know how reliable the outputs are. Adding uncertainty estimates does not completely eliminate the problem; it just adds a layer of indirection. Now we have estimates of how reliable the outputs are, but no guarantees that those estimates are themselves reliable.

		Let's go back to the example we started with. We trained a model on the SAMPL training set, then generated predictions and uncertainties for the test set. Since we know the correct outputs for all the test samples, we can evaluate how well we did. Here is a plot of the absolute error in the predicted output versus the predicted uncertainty.

		%% Cell type:code id: tags:

		``` python
		abs_error = np.abs(y_pred.flatten()-test_dataset.y.flatten())
		plot.scatter(y_std.flatten(), abs_error)
		plot.xlabel('Standard Deviation')
		plot.ylabel('Absolute Error')
		plot.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		The first thing we notice is that the axes have similar ranges. The model clearly has learned the overall magnitude of errors in the predictions. There also is clearly a correlation between the axes. Values with larger uncertainties tend on average to have larger errors.

		Now let's see how well the values satisfy the expected distribution. If the standard deviations are correct, and if the errors are normally distributed (which is certainly not guaranteed to be true!), we expect about 95% of the values to be within two standard deviations, and about 99.7% to be within three standard deviations. Here is a histogram of errors as measured in standard deviations.

		%% Cell type:code id: tags:

		``` python
		plot.hist(abs_error/y_std.flatten(), 20)
		plot.show()
		```

		%% Output



		%% Cell type:markdown id: tags:

		Most of the values are in the expected range, but there are a handful of outliers at much larger values. Perhaps this indicates the errors are not normally distributed, but it may also mean a few of the uncertainties are too low. This is an important reminder: the uncertainties are just estimates, not rigorous measurements. Most of them are pretty good, but you should not put too much confidence in any single value.
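
The same check can be made numerically instead of by eye. The sketch below uses a hypothetical helper, `coverage`, and synthetic, perfectly calibrated errors as a stand-in for the test-set arrays, so that it runs on its own.

```python
import numpy as np

def coverage(abs_error, y_std, n_std):
    """Fraction of samples whose absolute error is within n_std standard deviations."""
    z = np.asarray(abs_error) / np.asarray(y_std)
    return float(np.mean(z <= n_std))

# Synthetic stand-in data: normal errors whose true spread equals the stated one.
rng = np.random.default_rng(0)
demo_std = rng.uniform(0.5, 2.0, size=100_000)
demo_abs_error = np.abs(rng.normal(0.0, demo_std))

for n_std in (1, 2, 3):
    print(n_std, round(coverage(demo_abs_error, demo_std, n_std), 3))
```

For well-calibrated normal errors the fractions come out near 0.68, 0.95, and 0.997. Calling `coverage(abs_error, y_std.flatten(), 2)` on the arrays from this tutorial would show how far the model's estimates are from that ideal.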

		%% Cell type:markdown id: tags:

		# Congratulations! Time to join the Community!

		Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

		## Star DeepChem on GitHub
		Starring DeepChem on GitHub helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

		## Join the DeepChem Gitter
		The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!
